Exploring Unity DOTS and ECS: is it a game changer?
Unity DOTS allows developers to use the full potential of modern processors and deliver highly optimized, efficient games – and we think it’s worth paying attention to.
It’s been over five years since Unity first announced the development of its Data-Oriented Technology Stack (DOTS). Now, with the long-term support (LTS) release, Unity 2022.3.0f1, we’ve finally seen an official version ship. But why does Unity DOTS matter so much to the game development industry, and what advantages does this technology offer?
Hello, everyone! My name is Denis Kondratev, and I'm a Unity Developer at MY.GAMES. If you've been eager to understand what Unity DOTS is and whether it's worth exploring, this is the perfect opportunity to dig into the topic, and in this article we’ll do just that.
What is the Entity Component System (ECS)?
At its core, DOTS implements the Entity Component System (ECS) architectural pattern. Put simply, ECS is built on three fundamental elements: Entities, Components, and Systems.
Entities, on their own, have no inherent functionality or description. Instead, they serve as containers for various Components, which endow them with specific characteristics for game logic, object rendering, sound effects, and more.
Components, in turn, come in different types, but they only store data; they have no processing logic of their own.
Completing the ECS framework are Systems, which process Components, handle Entity creation and destruction, and manage the addition or removal of Components.
For instance, when creating a "Space Shooter" game, the playing field will feature multiple objects: the player's spaceship, enemies, asteroids, loot, you name it.
All of these objects are considered entities in their own right, devoid of any distinct features. However, by assigning different components to them, we can imbue them with unique attributes.
To demonstrate: since all of these objects have positions on the game field, we can create a position component that holds an object's coordinates. For the player's spaceship, enemies, and asteroids, we can add health components; the system responsible for handling collisions will then manage the health of these entities. We can also attach an enemy-type component to the enemies, so the enemy control system can drive their behavior based on their assigned type.
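To make this a little more concrete, here's a minimal sketch of what such components and a system might look like with Unity's Entities package. The names (Position, Health, RegenerationSystem) are purely illustrative, and the API shape assumes Entities 1.0, so treat this as a sketch rather than production code:

using Unity.Entities;
using Unity.Mathematics;

// Components are plain data with no behavior of their own.
public struct Position : IComponentData
{
    public float2 Value;
}

public struct Health : IComponentData
{
    public float Value;
}

// A system processes every entity that has the components it queries for.
public partial struct RegenerationSystem : ISystem
{
    public void OnUpdate(ref SystemState state)
    {
        var deltaTime = SystemAPI.Time.DeltaTime;

        // Slowly restore health on every entity that has a Health component.
        foreach (var health in SystemAPI.Query<RefRW<Health>>())
        {
            health.ValueRW.Value += 1f * deltaTime;
        }
    }
}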
This is a simplified, rudimentary overview, and the reality is somewhat more complex. Nonetheless, I trust that the fundamental concept of ECS is clear. With that out of the way, let's delve into the advantages of this approach.
The benefits of the Entity Component System
One of the main advantages of the Entity Component System (ECS) approach is the architectural design it promotes. Object-oriented programming (OOP) carries a significant legacy with patterns like inheritance and encapsulation, and even experienced programmers can make architectural mistakes in the heat of development, leading to refactoring or tangled logic in long-term projects.
In contrast, ECS provides a simple and intuitive architecture. Everything falls naturally into isolated components and systems, which makes projects easier to understand and develop; even novice developers pick up this approach quickly and make few errors.
ECS favors composition: instead of complex inheritance hierarchies, you create isolated components and behavior systems. These components and systems can be easily added or removed, allowing for flexible changes to an entity's characteristics and behavior, and this approach greatly enhances code reusability.
Another key advantage of ECS is performance. In ECS, data is stored in memory in a contiguous, optimized manner, with identical data types placed close to each other. This speeds up data access, reduces cache misses, and improves memory access patterns. Moreover, systems that operate on separate blocks of data are easier to parallelize across cores, leading to exceptional performance gains compared to traditional approaches.
Exploring the packages of Unity DOTS
Unity DOTS encompasses a set of technologies provided by Unity Technologies that implement the ECS concept in Unity. It includes several packages designed to enhance different aspects of game development; let’s cover a few of those now.
The core of DOTS is the Entities package, which facilitates the transition from familiar MonoBehaviours and GameObjects to the Entity Component System approach. This package forms the foundation of DOTS-based development.
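In practice, the bridge between the two worlds is baking: regular GameObjects with authoring MonoBehaviours are converted into entities and components during conversion. Here's a rough sketch of what that can look like, reusing the illustrative Health component from the sketch above (the exact API shape is an assumption based on Entities 1.x):

using Unity.Entities;
using UnityEngine;

// The MonoBehaviour is only used at edit time as an "authoring" component.
public class HealthAuthoring : MonoBehaviour
{
    public float StartingHealth = 100f;

    // The Baker converts the GameObject's data into ECS components on an entity.
    private class Baker : Baker<HealthAuthoring>
    {
        public override void Bake(HealthAuthoring authoring)
        {
            var entity = GetEntity(TransformUsageFlags.Dynamic);
            AddComponent(entity, new Health { Value = authoring.StartingHealth });
        }
    }
}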
The Unity Physics package introduces a new approach to handling physics in games, achieving remarkable speed through parallelized computations.
Additionally, the Havok Physics for Unity package allows integration with the modern Havok Physics engine. This engine offers high-performance collision detection and physical simulation, powering popular games such as The Legend of Zelda: Breath of the Wild, Doom Eternal, Death Stranding, Mortal Kombat 11, and more.
Death Stranding, like many other video games, uses the popular Havok Physics engine
The Entities Graphics package focuses on rendering in DOTS. It enables efficient gathering of rendering data and works seamlessly with existing render pipelines like the Universal Render Pipeline (URP) or High Definition Render Pipeline (HDRP).
One more thing: Unity has also been actively developing a networking technology called Netcode. It includes packages like Unity Transport for low-level multiplayer development, Netcode for GameObjects for the traditional approach, and the noteworthy Netcode for Entities package, which aligns with DOTS principles. These packages are relatively new and will continue to evolve.
Enhancing performance in Unity DOTS and beyond
Several technologies closely related to DOTS can be used both within the DOTS framework and beyond it. The Job System package provides a convenient way to write code with parallel computations. It revolves around dividing work into small chunks called jobs, each of which performs computations on its own data. The Job System then distributes these jobs evenly across worker threads for efficient execution.
To ensure code safety, the Job System supports the processing of blittable data types. Blittable data types have the same representation in managed and unmanaged memory and require no conversion when passed between managed and unmanaged code. Examples of blittable types include byte, sbyte, short, ushort, int, uint, long, ulong, float, double, IntPtr, and UIntPtr. One-dimensional arrays of blittable primitive types and structures containing exclusively blittable types are also considered blittable.
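For example, the first struct below is blittable, while the second one is not (both types are made up purely for illustration):

// Blittable: only unmanaged, fixed-size fields, so it can be used inside a job.
public struct Projectile
{
    public float Speed;
    public int Damage;
}

// Not blittable: the string field is a managed reference.
public struct NamedProjectile
{
    public string Name;
    public float Speed;
}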
Types containing a variable-length array of blittable types, however, are not considered blittable themselves. To address this limitation, Unity developed the Collections package, which provides a set of unmanaged data structures for use in jobs. These collections store their data in unmanaged memory using Unity's own mechanisms, and it is the developer's responsibility to deallocate them by calling Dispose().
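As a quick illustration of that responsibility, here's a small self-contained sketch using NativeArray (an assumed example, not code from the project discussed below):

using Unity.Collections;

public static class CollectionsExample
{
    public static int SumOfSquares(int count)
    {
        // NativeArray lives in unmanaged memory; the allocator defines its intended lifetime.
        var values = new NativeArray<int>(count, Allocator.Temp);
        var sum = 0;
        for (var i = 0; i < count; i++)
        {
            values[i] = i * i;
            sum += values[i];
        }
        // Unmanaged memory isn't garbage collected, so we release it explicitly.
        values.Dispose();
        return sum;
    }
}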
Another important package is the Burst compiler, which can be used together with the Job System to generate highly optimized native code. Although it imposes certain restrictions on the code you can write, Burst provides a significant performance boost.
Measuring performance with Job System and Burst Compiler
As mentioned, the Job System and Burst compiler are not direct components of DOTS, but they provide valuable assistance in writing efficient, fast parallel computations. Let's test their capabilities on a practical example: implementing Conway's Game of Life. In this algorithm, the field is divided into cells, each of which can be either alive or dead. On each turn, we count the live neighbors of every cell and update its state according to a small set of rules.
Here’s the implementation of this algorithm using the traditional approach:
private void SimulateStep()
{
    Profiler.BeginSample(nameof(SimulateStep));
    for (var i = 0; i < width; i++)
    {
        for (var j = 0; j < height; j++)
        {
            var aliveNeighbours = CountAliveNeighbours(i, j);
            var index = i * height + j;
            var isAlive = aliveNeighbours switch
            {
                2 => _cellStates[index],
                3 => true,
                _ => false
            };
            _tempResults[index] = isAlive;
        }
    }
    _tempResults.CopyTo(_cellStates);
    Profiler.EndSample();
}
private int CountAliveNeighbours(int x, int y)
{
    var count = 0;
    for (var i = x - 1; i <= x + 1; i++)
    {
        if (i < 0 || i >= width) continue;
        for (var j = y - 1; j <= y + 1; j++)
        {
            if (j < 0 || j >= height) continue;
            // Skip the cell itself; only its eight neighbours count.
            if (i == x && j == y) continue;
            if (_cellStates[i * height + j])
            {
                count++;
            }
        }
    }
    return count;
}
I’ve added Profiler markers to measure the time taken by the calculations. The cell states are stored in a one-dimensional array called _cellStates. We first write the intermediate results to _tempResults and then copy them back to _cellStates once the calculations are complete. This is necessary because writing results directly into _cellStates would affect the calculations for neighboring cells.
I created a field of 1000x1000 cells and ran the program to measure the performance. Here are the results:
As seen from the results, the calculations took 380 ms.
Now let's apply the Job System and Burst compiler to improve performance. First, we'll create the job responsible for running Conway's Game of Life:
public struct SimulationJob : IJobParallelFor
{
    public int Width;
    public int Height;
    [ReadOnly] public NativeArray<bool> CellStates;
    [WriteOnly] public NativeArray<bool> TempResults;

    public void Execute(int index)
    {
        var i = index / Height;
        var j = index % Height;
        var aliveNeighbours = CountAliveNeighbours(i, j);
        var isAlive = aliveNeighbours switch
        {
            2 => CellStates[index],
            3 => true,
            _ => false
        };
        TempResults[index] = isAlive;
    }

    private int CountAliveNeighbours(int x, int y)
    {
        var count = 0;
        for (var i = x - 1; i <= x + 1; i++)
        {
            if (i < 0 || i >= Width) continue;
            for (var j = y - 1; j <= y + 1; j++)
            {
                if (j < 0 || j >= Height) continue;
                // Skip the cell itself; only its eight neighbours count.
                if (i == x && j == y) continue;
                if (CellStates[i * Height + j])
                {
                    count++;
                }
            }
        }
        return count;
    }
}
I have given the CellStates field the [ReadOnly] attribute, which lets any worker thread read any element of the array. The TempResults field, on the other hand, is marked [WriteOnly], and in a parallel job writing is only allowed at the index passed into the Execute(int index) method; attempting to write to a different index trips the safety system. This keeps the data safe when working across multiple threads.
Now, from the regular code, let's launch our Job:
private void SimulateStepWithJob()
{
    Profiler.BeginSample(nameof(SimulateStepWithJob));
    var job = new SimulationJob
    {
        Width = width,
        Height = height,
        CellStates = _cellStates,
        TempResults = _tempResults
    };
    var jobHandler = job.Schedule(width * height, 4);
    jobHandler.Complete();
    job.TempResults.CopyTo(_cellStates);
    Profiler.EndSample();
}
After filling in all the necessary data, we schedule the job with the Schedule() method. It's important to note that scheduling doesn't execute the calculations immediately: the call is made from the main thread, while the actual work is performed by workers distributed across different threads. To wait for the job to finish, we call jobHandler.Complete(); only then can we copy the result back to _cellStates.
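By the way, Complete() doesn't have to be called immediately after Schedule(). One possible pattern (just a sketch using the same fields as above, not how the measurements below were taken) is to schedule the job early and only wait for it at the point where the result is actually needed:

private JobHandle _simulationHandle;

private void Update()
{
    var job = new SimulationJob
    {
        Width = width,
        Height = height,
        CellStates = _cellStates,
        TempResults = _tempResults
    };
    // Work starts on worker threads; the main thread keeps running.
    _simulationHandle = job.Schedule(width * height, 4);
}

private void LateUpdate()
{
    // Block the main thread only where the result is needed.
    _simulationHandle.Complete();
    _tempResults.CopyTo(_cellStates);
}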
Let's measure the speed:
Execution is now almost ten times faster, at roughly 42 ms. In the Profiler window, we can see that the workload was distributed among 17 workers. That's slightly fewer than the number of processor threads in the test environment, an Intel® Core™ i9-10900 with 10 cores and 20 threads. The results will vary on processors with fewer cores, but the Job System still makes full use of whatever processing power is available.
But that's not all: it's time to bring in the Burst compiler, which provides significant code optimization but comes with certain restrictions. To enable it, simply add the [BurstCompile] attribute to SimulationJob:
[BurstCompile]
public struct SimulationJob : IJobParallelFor
{
    ...
}
Let's measure again:
The results exceed even the most optimistic expectations: the code now runs almost 200 times faster than the initial version, and computing 1 million cells takes no more than 2 ms. In the Profiler, the sections executed by Burst-compiled code are highlighted in green.
Conclusion
Multithreaded computation isn't always necessary, and the Burst compiler can't always be used, but there's a clear trend in processor development toward multi-core architectures, and we should be prepared to harness their full power. ECS, and Unity DOTS specifically, fit this paradigm perfectly.
I believe that Unity DOTS deserves attention, at the very least. While it may not be the best solution for every case, ECS can prove its worth in many games.
The Unity DOTS framework, with its data-oriented, multithreaded approach, offers tremendous potential for optimizing the performance of Unity games. By adopting the Entity Component System architecture and leveraging technologies like the Job System and the Burst compiler, developers can unlock new levels of performance and scalability.
As game development evolves and hardware advances, embracing Unity DOTS becomes increasingly valuable: it lets developers harness the full potential of modern processors and deliver highly optimized, efficient games. Unity DOTS may not be the ideal fit for every project, but for those seeking performance-driven development and scalability, it's well worth exploring and considering for adoption.