DEV Community


Rewriting a 2000s graphics engine

Igor Segalla ・ 8 min read

I have always been interested in and curious about computer graphics, trying to understand the whole process involved in rendering even a simple 3D object. How can millions of seemingly meaningless numbers be drawn on the screen, with animations, lighting, textures and everything else?

To become a little more familiar with the subject, I started reading the book Introduction to 3D Game Programming with DirectX 9.0c: A Shader Approach. It covers everything from basic concepts (such as what a 3D object actually is) to more advanced ones, such as the use of shaders (the reason for my choice).

Oh, and yes, I know the fixed-function rendering pipeline is obsolete, but I chose DirectX 9 because its implementation is simpler.


Introduction To 3D Game Programming With Directx 9.0C: A Shader Approach (Wordware Game and Graphics Library) (Frank Luna)

Today, I'm going to tell you a little about my experience rewriting a very old graphics engine and optimizing it using more modern concepts.

The work involved in this project was complex and quite extensive. Since I don't want to go into too much technical detail and make this post tedious, I'll focus on the main parts involved and summarize my solutions.


Application

This project was carried out on a game from the 2000s (nothing better for tangible results).

My goal was to optimize this game as much as I could compared to the old design, while also gaining experience and a deeper understanding of computer graphics.

Problems

Before starting the project, I had to take into account some existing problems that would have to be dealt with along the way:

  • Fixed point: as the game is from the 2000s, floating-point operations were still very costly, so the entire graphics part of the game was written with integer variables. Believe me, you never want to do matrix math with integers, without even being able to think about SSE.
  • DirectX 6: for a decent optimization, it was necessary to upgrade to DirectX 9 (the closest modern API to DirectX 6) to gain access to more advanced rendering features.
  • Hard-coding: the entire graphics part of the game was hand-rolled, without any high-level graphics libraries. The only DirectX API functions used were DrawPrimitiveUP, Clear and SetTexture. In other words, all of the vertex transformation, shading and skinning was done "by hand".
  • Compatibility: since I didn't want to lose compatibility, I chose DirectX 9 (which still supports Windows XP), targeting a minimum of Shader Model 2.0 (SM2) and processors with SSE2.

SSE (Streaming SIMD Extensions) is a set of SIMD instructions designed by Intel. They are additional instructions that can increase performance when the same operation is performed on multiple pieces of data.
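For illustration only (this is not the engine's code), here is what a single SSE2 instruction buys you: four float additions at once, something unthinkable with the original integer math:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Add two arrays of 4 floats with one SIMD instruction
// instead of four scalar additions.
void add4(const float* a, const float* b, float* out) {
    __m128 va = _mm_loadu_ps(a);   // load 4 unaligned floats
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

The same pattern extends to the matrix-vector products that dominate skinning, which is why SSE2 was made a minimum requirement.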

Removing the use of pre-transformed vertices (RHW)

The first thing to be done, and one of the key parts of the whole effort, was removing the use of pre-transformed vertices. Without this, it would be impossible to use more modern concepts such as vertex buffers, indexing, hardware skinning, etc.

Transformed vertices means the game performed all the transformations involved in rendering on its own, without any help from a graphics library.

In short, what needed to be done was:

  1. Remove the vertex transformation into projection space
  2. Remove the vertex transformation into view space (camera)

To remove these two transformations, the vertices had to be passed to the rendering buffer in world space, rather than already transformed.

This way, the transformation could be performed by the Direct3D API itself, via SetTransform(D3DTS_VIEW, &view) and SetTransform(D3DTS_PROJECTION, &proj), which is much more performant than the game's hard-coded version.
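Conceptually, SetTransform just hands these matrices to the pipeline, which multiplies every vertex by them. A simplified sketch of that math in plain C++ (row-vector convention, as Direct3D uses; the names here are mine, not the engine's):

```cpp
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };  // row-major, row-vector convention

// v * M, Direct3D-style (vertex as a row vector on the left).
Vec4 transform(const Vec4& v, const Mat4& M) {
    Vec4 r;
    r.x = v.x*M.m[0][0] + v.y*M.m[1][0] + v.z*M.m[2][0] + v.w*M.m[3][0];
    r.y = v.x*M.m[0][1] + v.y*M.m[1][1] + v.z*M.m[2][1] + v.w*M.m[3][1];
    r.z = v.x*M.m[0][2] + v.y*M.m[1][2] + v.z*M.m[2][2] + v.w*M.m[3][2];
    r.w = v.x*M.m[0][3] + v.y*M.m[1][3] + v.z*M.m[2][3] + v.w*M.m[3][3];
    return r;
}

// World -> view -> projection; the rasterizer later divides by w.
Vec4 worldToClip(const Vec4& worldPos, const Mat4& view, const Mat4& proj) {
    return transform(transform(worldPos, view), proj);
}
```

The difference is that the driver and GPU do this far faster than the game's integer implementation ever could.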

Render using DrawPrimitive

Another issue with the game's rendering was that it drew all vertices using DrawPrimitiveUP.

The DrawPrimitiveUP function is used when you pass vertex data from user memory, created and transformed on your own, on every draw call. It is generally used for dynamic geometry (which was not our case).

This put a tremendous cost on the CPU, since many memory operations were performed (mainly memcpy). The main idea in this part of the project was to start using vertex buffers and thus draw everything with the DrawPrimitive function.

To start using vertex buffers, I also had to remove the game's own local-to-world transformation and start using the API function SetTransform(D3DTS_WORLD, &m).

The vertex buffer is built while the 3D file is loaded; this way, we have a buffer that is assembled only once and remains static for the entire execution of the game, avoiding all those memory operations (if you know how much they cost, you can imagine the relief this gave the CPU).
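A hypothetical sketch of that "build once, draw many times" pattern (a std::vector standing in for the IDirect3DVertexBuffer9; none of this is the engine's actual code):

```cpp
#include <cstddef>
#include <vector>

struct Vertex { float x, y, z, u, v; };

// The mesh's vertices are copied into the buffer exactly once,
// at load time; every subsequent frame just references it.
struct StaticMesh {
    std::vector<Vertex> vertexBuffer;  // stands in for a D3D9 vertex buffer

    explicit StaticMesh(const std::vector<Vertex>& loaded)
        : vertexBuffer(loaded) {}      // the only copy, done at load

    // Per-frame: no memcpy at all; with D3D9 this would be
    // SetStreamSource followed by DrawPrimitive.
    const Vertex* data() const { return vertexBuffer.data(); }
    std::size_t count() const { return vertexBuffer.size(); }
};
```

Contrast this with DrawPrimitiveUP, which forces the runtime to copy the whole vertex array out of user memory on every single draw call.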

Vertex Transformation

Process involved during the transformation of the 3D object to 2D rendering on the screen. Source: http://www.shangdixinxi.com/detail-1085554.html

Vertex Indexing

With all the 3D objects already using vertex buffers, we have taken a lot of weight off, but we can still improve performance.

A 3D object is formed by a list of triangles, and those triangles are formed by a list of vertices. Where two triangles meet, some vertices are shared between them. The problem is that in the vertex buffer these shared vertices are treated as if they were different, so when the vertices are sent to the GPU, thousands of them are sent unnecessarily.

To optimize this, in addition to the Vertex Buffer, we must now create an Index Buffer, which is a list that points to the position (index) of the vertex within the Vertex Buffer.

Vertex Indexing

Source: OpenGL Tutorial

To build this index buffer, both the vertex position (XYZ) and the UV texture coordinates must be considered when processing the mesh's triangles. When a vertex coincides with one that was read before, its existing index is reused and appended to the index list.
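The deduplication described above can be sketched like this (plain C++, my own illustration rather than the engine's code): each unique (position, UV) pair gets one vertex-buffer slot, and each triangle corner becomes an index into it.

```cpp
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

// One unique (position, UV) combination per vertex-buffer slot.
struct Vertex {
    float x, y, z;   // position
    float u, v;      // texture coordinates
    bool operator<(const Vertex& o) const {
        return std::tie(x, y, z, u, v) < std::tie(o.x, o.y, o.z, o.u, o.v);
    }
};

// Turns a flat triangle list (3 corners per triangle, duplicates
// included) into a deduplicated vertex buffer plus an index buffer.
void buildIndexed(const std::vector<Vertex>& triList,
                  std::vector<Vertex>& vb,
                  std::vector<std::uint32_t>& ib) {
    std::map<Vertex, std::uint32_t> seen;
    for (const Vertex& v : triList) {
        auto it = seen.find(v);
        if (it == seen.end()) {
            it = seen.emplace(v, static_cast<std::uint32_t>(vb.size())).first;
            vb.push_back(v);
        }
        ib.push_back(it->second);
    }
}
```

For a quad made of two triangles, six input corners collapse into four unique vertices and six small indices, which is exactly the saving that scales to thousands of vertices on a real mesh.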

Skeletal Animation

One of the most performance-critical parts of the entire project is the skinning of meshes. Skinning can be summarized as the process responsible for deforming the 3D object according to its skeleton.

The process consists of iterating over all the vertices of the object and multiplying each one by the local matrix of the bone it is linked to. As you can imagine, doing this on the CPU, with integer variables and matrix operations, and, to make matters worse, on every frame of the game, is very expensive and was one of the weakest points of the whole renderer.

To optimize skinning, we move this work onto the GPU. How? Shaders!

A shader is a small program, executed by the GPU, used for processing the vertices and pixels of a 3D model. I used HLSL (High-Level Shading Language), developed by Microsoft.
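The original snippet was embedded as an image, so here is a hedged reconstruction of what vertex skinning in HLSL typically looks like; GetSkinMatrix is the helper the post names, while the palette size and everything else are assumptions:

```hlsl
// Bone palette uploaded by the CPU once per model, per frame.
// Array size is limited by the shader model's constant registers.
float4x3 g_BoneMatrices[26];

float4x3 GetSkinMatrix(int boneIndex)
{
    return g_BoneMatrices[boneIndex];
}

float4 SkinVertex(float4 localPos, int4 boneIndices, float4 weights)
{
    // Blend the vertex by up to four bone matrices,
    // weighted by how strongly each bone influences it.
    float3 p = 0;
    p += mul(localPos, GetSkinMatrix(boneIndices.x)) * weights.x;
    p += mul(localPos, GetSkinMatrix(boneIndices.y)) * weights.y;
    p += mul(localPos, GetSkinMatrix(boneIndices.z)) * weights.z;
    p += mul(localPos, GetSkinMatrix(boneIndices.w)) * weights.w;
    return float4(p, 1.0f);
}
```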

GetSkinMatrix() is the function in charge of fetching the local matrix of the bone to which each vertex is linked.

Culling

Culling is the rendering step responsible for "selecting" which objects are visible to the camera and determining whether they should be rendered. For this, each 3D object in the game must contain a kind of delimiter that represents its useful volume. I used a bounding sphere for models (sets of meshes) and a bounding box for meshes.

Culling

Comparison between Bounding Sphere, Bounding Box and Quadtree

With these delimiters in place, we can perform the culling steps that determine whether each object should be rendered. For this project, I used frustum culling for all 3D objects (if an object is not inside the camera's view volume, it is cut) and quadtree node culling for terrain objects (maps).
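A bounding-sphere frustum test boils down to checking the sphere against the six frustum planes. A generic sketch, not the engine's implementation (plane convention assumed: normalized, inward-facing normals):

```cpp
struct Plane { float a, b, c, d; };   // a*x + b*y + c*z + d >= 0 is "inside"
struct Sphere { float x, y, z, r; };

// True if the sphere is at least partially inside all six planes.
bool sphereInFrustum(const Sphere& s, const Plane planes[6]) {
    for (int i = 0; i < 6; ++i) {
        float dist = planes[i].a * s.x + planes[i].b * s.y
                   + planes[i].c * s.z + planes[i].d;
        if (dist < -s.r) return false;  // fully behind this plane: cull
    }
    return true;
}
```

A sphere test is cheap (six dot products), which is why it suits a per-model check before the finer per-mesh bounding-box test.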

Frustum Culling

A Survey of Visibility for Walkthrough Applications — Scientific Figure on ResearchGate. Source: https://www.researchgate.net/figure/Three-types-of-visibility-culling-techniques-1-View-Frustum-Culling-2-back-face_fig1_2440562

Outcomes

With the project finished, there was an improvement in FPS (frames per second), though a modest one on certain occasions. In the best cases, FPS improved up to 3x over the previous graphics engine.

As this project took a few months to complete, there was no prepared test environment for a before-and-after comparison, so I chose some common scenarios and measured how many FPS the new graphics engine reached.

Results 1
Results 2
Results 3

Test with Monsters

Upgraded graphics engine rendering 45 monsters

Test with Characters

Upgraded graphics engine rendering 300 and 600 characters (body model, head model and items)

The biggest noticeable FPS difference between the old graphics engine and the new one occurs when characters are rendered on screen. For characters, the skinning of the head, body and equipped items must all be considered. Since the old engine performed skinning on the CPU, its FPS was much lower than the current engine's (which skins on the GPU).

In addition to the considerable FPS improvement, the new graphics engine made it possible to add several new graphical features, such as:

Dynamic Shadows

Dynamic Shadows: With the new graphics engine, it was possible to implement dynamic shadows on the characters, with the minimum requirement of Shader Model 3.0. Before, it was completely impossible to have a feature like this.

Lighting Map

For maps, dynamic shadows were not used, for visual and performance reasons. To get around this, the lighting map technique was used to produce the shadow effect.

Ocean Effect

Ocean Effect: this effect was done as an experiment, but the result was very nice. A refraction system, reflection and some material effects (scrolling) were implemented.

Intersection Shader

Intersection of objects: an effect widely used in water shaders and, lately, in Battle Royale-style games, where it represents the safe zone.

Inner Glow

Inner Glow: internal glow effect for any 3D model. You can control color, brightness intensity, distance, etc.

Dissolve Effect

Dissolve effect: when a monster dies, instead of lying on the ground and vanishing out of nowhere, there is a specific effect for that.

Conclusions

Although the performance of a graphics engine is quite relative, depending on hardware, enabled features, resolution, game settings, etc., I expected a more significant FPS improvement, mainly given the time dedicated and the complexity of the project as a whole.

In contrast, the new graphics engine allowed the development of several new features that would have been impossible previously, in addition to all the knowledge acquired with the project.

There are still a lot of things that can be optimized in the future and bring even higher performance, such as:

  • Build a State Manager (a kind of cache of DirectX render states)
  • Use a circular buffer as a cache for animation arrays
  • Grouping frames by type of animations
  • Refactor some Shaders
  • Avoid division and multiplication operations by 256.f
  • Improve Culling with new techniques (Back-Face Culling and Occlusion)

The entire project is available in a public repository on GitHub, licensed under the MIT license.

GitHub logo igorsegallafa / delta3d

Simple rendering engine made for the Priston Tale game.

Delta3D

Project of a basic game engine that I created to learn 3D programming. The library was made using DirectX 9 with shaders and C++17.

License

Licensed under the MIT License.

Features

  • Support for Pixel Shader 2.0 and 3.0.
  • Support for Hardware and Software Skinning.
  • Support for Lighting Map (self-illumination map).
  • Support for old devices.
  • Support for material overlay.
  • Support for Vertex Color.
  • Supports up to 128 bones in Skinning.
  • Support for the SMD file format from the Priston Tale game.
  • Use of SSE2 for floating-point optimization.
  • Static Quad Tree for terrain rendering.
  • Particle Engine.
  • Mesh rendering sort (transparent meshes render last).
  • Distance fade in the Pixel Shader.
  • Initial implementation of a reflection plane.
  • Dynamic Lighting.
  • Material transformation (scroll).
  • Material with animated texture (sequence of frames).
  • Camera implementation.
  • Dynamic Event and Timer implementation.
  • Frustum Culling.
  • Value animation with easing.
  • Debug Renderer.
  • Good performance for scenes with a lot of skinned meshes and big…

Discussion (1)

James Turner

Super interesting post - great work on rewriting the engine!
