Andrew Bt

Insane WPF Scatter Chart performance with Parallel Rendering

Today one of our first-ever customers (dan danna naah!) contacted us after three years and asked about our Scatter Chart performance. They are still using v1.x of SciChart and wanted to know whether the latest version can draw a million points in a scatter chart at interactive frame rates, and what performance improvements we have achieved over the years.

This was a chance for us to show off, I thought!! :-)

Well, we have focused on performance across the major versions (see our article How Fast is SciChart WPF Chart), and SciChart has gotten faster as a result, as you can see below and in that article:

[Image: ScatterPerformance]

Scatter performance in our tests has improved from about 1.8 FPS for 250k points in v1.7 to 12FPS in v3.2, a speed improvement of more than 6x. This is good, but not quite good enough: the customer would like to draw 1,000,000 points in a scatter chart...

Parallel Rendering for Scatter Charts

An old FAQ on the SciChart forums shows a quick hack to enable multi-threaded rendering for the EllipsePointMarker type. This gives a decent boost in scatter chart performance on dual- and quad-core machines.

Unfortunately the sample no longer compiles, because it only works with SciChart v3.1. So we took it as a starting point and reworked it for SciChart v3.4.

A small update to the SciChart BasePointMarker API

We needed to add a small, backward-compatible enhancement to the BasePointMarker type. It has been added to v3.4.2.6726 and above, which you can get from our NuGet feed. The change notifies the point marker when a batch begins and ends, which lets us do some caching.

A new ParallelEllipsePointMarker type

With our new slightly modified API, we can create a new ParallelEllipsePointMarker type that takes advantage of the Begin/End batching.

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Windows;
// Also requires the SciChart namespaces that define BasePointMarker, IRenderContext2D, IPen2D and IBrush2D.

public class ParallelEllipsePointMarker : BasePointMarker
{
    private float _width;
    private float _height;
    private readonly List<Point> _points = new List<Point>();
    private IPen2D _pen;
    private IBrush2D _brush;

    /// <summary>
    /// When overridden in a derived class, draws the point markers at specified collection of <see cref="Point" /> centers
    /// </summary>
    /// <param name="context">The RenderContext to draw with</param>
    /// <param name="centers">The Centres of the point markers</param>
    /// <param name="pen">The default Stroke pen (if current pen is not set)</param>
    /// <param name="brush">The default Fill brush (if current brush is not set)</param>
    /// <seealso cref="IRenderContext2D" />
    /// <seealso cref="IPen2D" />
    /// <seealso cref="IBrush2D" />
    protected override void DrawInternal(IRenderContext2D context, IEnumerable<Point> centers, IPen2D pen, IBrush2D brush)
    {
        Action<Point> drawOp = (center) =>
        {
            context.DrawEllipse(pen, brush, center, _width, _height);
        };

        // This will only work for some drawing functions, like DrawEllipse. Consider this experimental
        Parallel.ForEach(centers, drawOp);
    }

    // Called once per point: cache the point (and the current pen and brush) instead of drawing immediately
    protected override void DrawInternal(IRenderContext2D context, double x, double y, IPen2D pen, IBrush2D brush)
    {
        _pen = pen;
        _brush = brush;
        _points.Add(new Point(x, y));
    }

    // Start of a batch: cache the marker size and reset the point list
    public override void Begin(IRenderContext2D context, IPen2D defaultPen, IBrush2D defaultBrush)
    {
        base.Begin(context, defaultPen, defaultBrush);

        _width = (float) Width;
        _height = (float) Height;
        _points.Clear();
    }

    // End of a batch: draw all cached points in parallel
    public override void End(IRenderContext2D context)
    {
        DrawInternal(context, _points, _pen, _brush);
        base.End(context);
    }
}

The above class keeps a List of Points which is cleared in Begin() and drawn in End(). Every call to the single-point DrawInternal overload adds that x,y point to the list. In End() we draw all the points using Parallel.ForEach.

Performance of ParallelEllipsePointMarker

The ParallelEllipsePointMarker is roughly 3x faster than the standard EllipsePointMarker. Take a look below. While our standard point-marker achieves barely 4.5FPS for 1,000,000 points, the ParallelEllipsePointMarker is able to push out around 15FPS on an i7 Quad-core workstation.

[Image: ParallelEllipsev3.4.2]

Overdraw and Contention

This is good but there is some overdraw. Imagine if you had 1,000,000 scatter points drawn all at the same X,Y location. The above implementation would be grossly inefficient. Also, the Parallel.ForEach spawns goodness knows how many threads and context switching between them will start to become an overhead. Having fewer threads active will be a good thing for performance. Ideally, there should be no more threads than there are CPUs.
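One way to keep the number of active workers in check is sketched below. It is only an illustration, assuming plain .NET and no SciChart types: ParallelOptions.MaxDegreeOfParallelism caps concurrency at the logical core count, and Partitioner.Create hands each worker a contiguous range of indices rather than one item at a time. The class name, method name and the work delegate are hypothetical stand-ins for a per-point draw call, and we have not measured whether this helps inside the renderer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParallelSketch
{
    // Runs work(i) for every i in [0, count), in contiguous chunks,
    // with at most one concurrent worker per logical core.
    // Returns the number of chunks that were processed.
    public static int ProcessInChunks(int count, Action<int> work)
    {
        if (count <= 0)
            return 0; // Partitioner.Create rejects empty ranges

        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = Environment.ProcessorCount
        };

        int chunks = 0;

        // Each range is a Tuple<int, int> of (fromInclusive, toExclusive)
        Parallel.ForEach(Partitioner.Create(0, count), options, range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                work(i); // hypothetical stand-in for drawing point i

            Interlocked.Increment(ref chunks);
        });

        return chunks;
    }
}
```

Calling BoundedParallelSketch.ProcessInChunks(points.Count, i => Draw(points[i])) would visit every index exactly once, where Draw is whatever thread-safe operation you need to run per point.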

ClusteredParallelEllipsePointMarker

What about clustering and parallel drawing? ... Whoa there! Don't get too excited!

We had a go at a simple, almost lossless clustered parallel point marker. See the code below:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Windows;
// Also requires the SciChart namespaces that define BasePointMarker, IRenderContext2D, IPen2D and IBrush2D.

public class ClusteredParallelEllipsePointMarker : BasePointMarker
{
    private float _width;
    private float _height;
    private IPen2D _pen;
    private IBrush2D _brush;
    private byte[,] _pointsDrawn; // One byte per viewport pixel: 0x0 = empty, 0x1 = draw, 0xFF = ignore
    private int _viewportWidth;
    private int _viewportHeight;

    /// <summary>
    /// When overridden in a derived class, draws the point markers at specified collection of <see cref="Point" /> centers
    /// </summary>
    /// <param name="context">The RenderContext to draw with</param>
    /// <param name="centers">The Centres of the point markers</param>
    /// <param name="pen">The default Stroke pen (if current pen is not set)</param>
    /// <param name="brush">The default Fill brush (if current brush is not set)</param>
    /// <seealso cref="IRenderContext2D" />
    /// <seealso cref="IPen2D" />
    /// <seealso cref="IBrush2D" />
    protected override void DrawInternal(IRenderContext2D context, IEnumerable<Point> centers, IPen2D pen, IBrush2D brush)
    {
        // This method is ignored by v3.2 and up...
    }

    // Called once per point: mark the pixel in the byte array instead of drawing immediately
    protected override void DrawInternal(IRenderContext2D context, double x, double y, IPen2D pen, IBrush2D brush)
    {
        _pen = pen;
        _brush = brush;
        int ix = (int)x;
        int iy = (int)y;

        // If out of bounds, skip
        if (ix < 0 || iy < 0 || ix >= _viewportWidth || iy >= _viewportHeight)
            return;

        // If pixel already marked as ignored, skip
        if (_pointsDrawn[ix, iy] == 0xFF)
            return;

        // Set pixel as marked to draw
        _pointsDrawn[ix, iy] = 0x1;

        // The section below introduces a very slight loss by also marking the pixels
        // surrounding a point marker as ignored (using 0xFF), so nothing is drawn there either.
        // Comment it out if you wish to have a true lossless clustered parallel point marker.

        // Points on the viewport edge have no complete ring of neighbours, so skip the marking
        if (ix < 1 || iy < 1 || ix >= _viewportWidth-1 || iy >= _viewportHeight-1)
            return;

        // Set surrounding pixels as marked to ignore
        _pointsDrawn[ix - 1, iy - 1] = 0xFF; // x x x
        _pointsDrawn[ix - 1, iy] = 0xFF;     // x o x
        _pointsDrawn[ix - 1, iy + 1] = 0xFF; // x x x
        _pointsDrawn[ix, iy - 1] = 0xFF;
        _pointsDrawn[ix, iy + 1] = 0xFF;     // when a point is drawn at o, do not allow any points to be drawn at x
        _pointsDrawn[ix + 1, iy - 1] = 0xFF;
        _pointsDrawn[ix + 1, iy] = 0xFF;
        _pointsDrawn[ix + 1, iy + 1] = 0xFF;
    }

    // Start of a batch: size (or clear) the byte array and cache the marker size
    public override void Begin(IRenderContext2D context, IPen2D defaultPen, IBrush2D defaultBrush)
    {
        base.Begin(context, defaultPen, defaultBrush);

        _viewportWidth = (int)context.ViewportSize.Width;
        _viewportHeight = (int)context.ViewportSize.Height;

        // Only reallocate when the viewport size has changed; otherwise reuse and clear
        if (_pointsDrawn == null || _pointsDrawn.GetLength(0) != _viewportWidth ||
            _pointsDrawn.GetLength(1) != _viewportHeight)
        {
            _pointsDrawn = new byte[_viewportWidth, _viewportHeight];
        }
        else
        {
            Array.Clear(_pointsDrawn, 0, _pointsDrawn.Length);
        }
        _width = (float)Width;
        _height = (float)Height;
    }

    // End of a batch: draw one ellipse per pixel marked 0x1, one parallel work item per row
    public override void End(IRenderContext2D context)
    {
        Parallel.For(0, _viewportHeight, y =>
        {
            for (int x = 0; x < _viewportWidth; x++)
            {
                if (_pointsDrawn[x, y] == 0x1)
                {
                    context.DrawEllipse(_pen, _brush, new Point(x,y), _width, _height);
                }
            }
        });

        base.End(context);
    }
}

So how does it work?

In Begin() we allocate a byte array the same size as the viewport. For every point-marker to draw, we set the byte at its pixel position to 0x1. Any pixel with no point-marker remains 0x0.

For extra credit, we can optionally set the pixels surrounding a 0x1 (point to draw) to 0xFF: a point to ignore. This means that if two or more scatter points are to be drawn within a 1-pixel distance of each other, only one scatter point will be drawn, reducing overdraw.

Also, because we're using a byte array and not a List of points, we avoid allocating memory on each pass. The byte array is only recreated if the viewport is resized.

Finally, we use Parallel.For to iterate over rows of the byte array. Sure, there are some cache misses here as we're reading from all over the byte array, but at least we have a maximum of one Task or thread per row, not one per pixel :)
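The clustering idea can be tried out without SciChart at all. The sketch below is a simplified stand-in, not the class above: OccupancyGridSketch and Cluster are hypothetical names, points are plain Tuple<double, double> values, and unlike the listing it bounds-checks each neighbour individually and never overwrites a pixel that is already marked. It returns the pixels that would be drawn, so you can see how much overdraw is removed.

```csharp
using System;
using System.Collections.Generic;

public static class OccupancyGridSketch
{
    private const byte Empty = 0x0;   // nothing at this pixel
    private const byte Draw = 0x1;    // draw one marker here
    private const byte Ignore = 0xFF; // too close to a marker, skip

    // Returns the (x, y) pixels that would receive a marker.
    public static List<Tuple<int, int>> Cluster(
        IEnumerable<Tuple<double, double>> points, int width, int height, bool markNeighbours)
    {
        var grid = new byte[width, height];

        foreach (var p in points)
        {
            int ix = (int)p.Item1;
            int iy = (int)p.Item2;

            // Skip points outside the viewport
            if (ix < 0 || iy < 0 || ix >= width || iy >= height) continue;

            // Skip pixels that are already drawn or ignored
            if (grid[ix, iy] != Empty) continue;

            grid[ix, iy] = Draw;
            if (!markNeighbours) continue;

            // Mark the surrounding ring of pixels as ignored
            for (int dx = -1; dx <= 1; dx++)
            for (int dy = -1; dy <= 1; dy++)
            {
                int nx = ix + dx, ny = iy + dy;
                if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                if (grid[nx, ny] == Empty) grid[nx, ny] = Ignore;
            }
        }

        // Collect every pixel marked for drawing
        var result = new List<Tuple<int, int>>();
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                if (grid[x, y] == Draw) result.Add(Tuple.Create(x, y));

        return result;
    }
}
```

With this sketch, any number of points that land on the same pixel collapse to a single entry, and with markNeighbours set to true three points on adjacent pixels in a row reduce to two.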

Performance of the ClusteredParallelEllipsePointMarker

The performance is pretty insane. Take a look at the below. While our standard EllipsePointMarker achieves barely 4.5FPS for 1,000,000 points, the ClusteredParallelEllipsePointMarker above is able to push out a whopping ~30FPS on an i7 Quad-core workstation.

[Image: ClusteredParallelEllipsev3.4.2]

A note about DirectX

The attached sample allows you to compare HighSpeed (CPU) vs. DirectX scatter performance. If you run it, you'll probably be disappointed that DirectX is considerably slower than parallel-enabled CPU rendering.* We've done some analysis and can see some low-hanging fruit to vastly improve the DirectX rendering of scatter charts. Note, however, that DirectX is considerably faster for line charts.

* Update: SciChart v4 now has vastly improved DirectX Scatter Chart performance compared to SciChart v3

Also note that Parallel-enabled point markers are only compatible with the HighSpeed (CPU) renderer. The other renderers are not suitable to parallelise on the CPU, although DirectX by its very nature is highly parallel and we will be working to improve it.

Conclusion

ClusteredParallelEllipsePointMarker achieves nearly 30FPS at 1,000,000 points, whereas our previous speed test clocked SciChart v3.2's EllipsePointMarker at 4.5FPS for 1,000,000 points and SciChart v1.7's point-marker at 1.8FPS for 250k points. In points drawn per second that is about 30,000,000 versus 450,000, so our solution is more than 60x faster at drawing scatter charts than SciChart v1.x.

So, how did we do? ;-)

Download the sample below

ParallelPointMarkers

Requires SciChart v3.4.2.6726 or above to run.

To test performance, run in Release mode with the debugger Detached!

 
