DEV Community

Cover image for How to Effectively Handle Millions of Rows in Bold BI
Bold BI by Syncfusion
Bold BI by Syncfusion

Posted on

How to Effectively Handle Millions of Rows in Bold BI

Introduction

Many companies have adopted business intelligence tools for breaking down their data to gain business insight. Creating interactive dashboards with useful visualizations helps business owners to actively monitor their key metrics and KPIs. Such dashboards need to load and respond to user interactions quickly. A dashboard’s poor loading performance affects its widgets as well as the connected database’s querying performance. In this blog post, I’ll show you how to handle millions of rows of data with Microsoft SQL Server’s database columnar storage and indexing.

Let’s look at:

Setting up Microsoft SQL Server database

Let’s take a sample data set of over 3.7 million records. For this example, we are using a marketing data set in a Microsoft SQL Server database running in a local machine with the following environments.

Operating System Windows 10
CPU 1.60GHz 1.80 GHz
RAM 16.0 GB 
Server Application Microsoft SQL Server Management Studio v17.9.1

Executing a simple count statement, it takes six seconds for the first run to complete query execution. The second run takes only three seconds since it utilizes the Microsoft SQL Server default caching operations of previous query results.

Query 1st run 2nd run
SELECT COUNT(Channel_Id) FROM Marketing_Details_new  6 Seconds 3 Seconds

You’ll find two more sample queries with recordings for testing purposes during both runs.

Query 1st run 2nd run
SELECT TOP 10 [Date], Sum(Cast([Expense] AS DECIMAL))

FROM Marketing_Details_new

GROUP BY [Date] ORDER BY [Date]
8 Seconds 3 Seconds
SELECT Sum(Cast([Expense] AS DECIMAL)) AS NEWCOLUMN

FROM Marketing_Details_new
9 Seconds 4 Seconds

Techniques for improved performance

Performance improvement techniques in SQL database

Using the following methods, we can speed up the data access and aggregation:

Indexing

A SQL index is a set of lookup tables designed to help users find items they require regularly. An index is a data structure that is small, fast, and designed for speedy searches. It’s great for linking relational tables and searching large databases.

By converting the rowstore to a columnstore, you can gain high performance for analytic queries that require scanning large amounts of data. Execute the following command to create the clustered columnstore.

CREATE CLUSTERED COLUMNSTORE INDEX MarketingColumnarIndex ON Marketing_Details_new;

To learn more about columnstores in Microsoft SQL Server, refer to the Microsoft documentation.

Starting with Microsoft SQL Server 2016, you can combine rowstore indices and columnstore indices on the same table. You can create a nonclustered row index on the same table by executing the following command. It will improve table seek performance, so queries that look for matching values or return a range of values will perform better in a row index than a columnar index.

CREATE INDEX MarketingIndex ON Marketing_Details_new([Date],[DestinationURL],[AdCost],

[Visitors],[Visitors after landing page],[Sessions],[Impressions],[Clicks],[Revenue],

[Expense],[Channel_Id],[Channel_Name],[Product_Used],[Leads],[Wons])

To verify whether the index has been created, try running the following command. This will describe the table details fully. At the bottom of the results, you can see that two indices are present.

sp_help [Marketing_Details_new]
Indexing performance

Now, rerun the sample queries to test the indexing performance.

Query Before Indexing After Indexing
SELECT COUNT(Channel_Id) FROM Marketing_Details_new 6 Seconds 1 Second
SELECT TOP 10 [Date], Sum(Cast([Expense] AS DECIMAL))

FROM Marketing_Details_new

GROUP BY [Date] ORDER BY [Date]
8 Seconds <1 Second
SELECT Sum(Cast([Expense] AS DECIMAL)) AS NEWCOLUMN

FROM Marketing_Details_new
9 Seconds <1 Second

You can see that there is monumental improvement in SQL query performance once the columnar indexing and row indexing are done. To learn more about combining column and row indexing for data warehousing needs, refer to this documentation.

Aggregated tables and views as data marts

Aggregated tables or views as data marts in SQL are database tables that contain aggregated values by combining multiple rows of data. They are used to perform dynamic calculations and reduce the input/output, CPU, RAM, and swapping requirements. Also, they increase the SQL query performance with large data sets.

Now, let’s run the following queries to create the aggregated table and view as data marts with the marketing data set.

SELECT CONVERT(Date, getdate()) AS [Date],

COUNT([DestinationURL]) AS [DestinationURL],

SUM([AdCost]) AS [AdCost],

SUM([Visitors]) AS [Visitors],

SUM([Visitors after landing page]) AS [Visitors after landing page],

SUM([Sessions]) AS [Sessions],

SUM([Impressions]) AS [Impressions],

SUM([Clicks]) AS [Clicks],

SUM([Revenue]) AS [Revenue],

SUM([Expense]) AS [Expense],

COUNT([Channel_Id]) AS [Channel_Id],

COUNT([Channel_Name]) AS [Channel_Name],

COUNT([Product_Used]) AS [Product_Used],

SUM([Leads]) AS [Leads],

SUM([Wons]) AS [Wons]INTO marketingAggregated

FROM Marketing_Details_new GROUP BY Date ORDER BY Date ASC;
CREATE VIEW [dbo].[Marketing_View]([Date],[DestinationURL],

[AdCost],[Visitors],[Visitors after landing page],[Sessions],

[Impressions],[Clicks],[Revenue],[Expense],[Channel_Id],

[Channel_Name],[Product_Used],[Leads],[Wons]) AS

SELECT CONVERT(Date, getdate()),COUNT([DestinationURL]),

SUM([AdCost]),

SUM([Visitors]),

SUM([Visitors after landing page]),

SUM([Sessions]),

SUM([Impressions]),

SUM([Clicks]),

SUM([Revenue]),

SUM([Expense]),

COUNT([Channel_Id]),

COUNT([Channel_Name]),

COUNT([Product_Used]),

SUM([Leads]),SUM([Wons])

FROM Marketing_Details_new GROUP BY Date;

Once the aggregated table or view is created, rerun the sample queries to test the SQL query performance.

Query Original table Aggregated table Aggregated view
SELECT COUNT(Channel_Id) FROM marketingAggregated 6 Seconds <1 Second 3 Seconds
SELECT TOP 10 [Date], Sum(Cast([Expense] AS DECIMAL))FROM marketingAggregatedGROUP BY [Date] ORDER BY [Date] 8 Seconds <1 Second 3 Seconds
SELECT Sum(Cast([Expense] AS DECIMAL)) AS NEWCOLUMNFROM marketingAggregated 9 Seconds <1 Second 3 Seconds

You can see the improvement in SQL query performance once the aggregated table is created. It takes less than a second to fetch the data from the aggregated table. To learn more about SQL aggregate functions, refer to this documentation.

Table partitioning

Table partitioning is a database process. Large tables are divided into smaller parts based on the values in one or more table columns. The table partitioning helps you maintain the large tables, reduce the overall response time, and read and load data for the SQL operations.

Once the partition is done, let’s rerun the sample queries to check the SQL query performance.

Query Before Partition After Partition
SELECT COUNT(Channel_Id) FROM Marketing_Details_new 6 Seconds 4 Seconds
SELECT TOP 10 [Date], Sum(Cast([Expense] AS DECIMAL))FROM Marketing_Details_newGROUP BY [Date] ORDER BY [Date] 8 Seconds 4 Seconds
SELECT Sum(Cast([Expense] AS DECIMAL)) AS NEWCOLUMNFROM Marketing_Details_new 9 Seconds 3 Seconds

To learn more about table partitioning in SQL, refer to this documentation.

Creating a dashboard using Microsoft SQL Server database

Let’s build a dashboard connecting to this Microsoft SQL Server database using Bold BI®. To learn about how to connect with Microsoft SQL Server from Bold BI, read this blog post.  Use the following widgets and data configuration to prepare a dashboard with the marketing data set.

Widget Title

Widget Type

Data Configuration

Date

Date Picker

  • Column(s): Date

Revenue vs. Expenses

KPI card

  • Value(s): Revenue and Expense
    • Aggregation: Sum

Visitors

Number card

  • ·Measure: Visitors
    • Aggregation: Sum

Sessions

Number card

  • Measure: Sessions
    • Aggregation: Sum

Clicks

Number card

  • Measure: Clicks
    • Aggregation: Sum

Leads vs. Won

Line chart

  • Value(s): Leads and Wins
    • Aggregation: Sum
  • Column(s): Date
    • Format: Year

Page Totals

Doughnut chart

  • ·Column(s): 
    • Sessions
    • Clicks
    • Impressions
    • Visitors

Details

Grid

  • Column(s): 
    • Channel_id
    • Channel_name
    • DestinationURL
  • Hidden Column(s): Date
    • Format: Date

Now, you can see the completed dashboard in the following image.

Marketing Summary Dashboard

Performance metrics

The average performance of this dashboard is calculated with the previous techniques. To calculate the query execution time in Microsoft SQL Server, use the SQL profiler trace option by connecting with the database server.

Average Time (in Seconds)

Processes

Original table

Indexing

Aggregated tables/views as data marts

Table partitioning

Query execution time in Microsoft SQL Server

10.94

3.43

0.02

5.85

Dashboard rendering time

13.77

6.88

3.09

6.07

Total display time in dashboard

24.71

10.31

3.11

11.92

Conclusion

We have used three techniques to improve the performance of SQL queries using a large data set. Every technique improves the handling of millions of data points. We find that aggregated tables or views as data marts load dashboards faster than the other techniques. But you can choose any of these techniques to increase the query performance for your case.

Bold BI® helps you integrate dashboards in your applications written in  ASP.NET CoreASP.NET MVCASP.NETRuby on Rails, and more, seamlessly. It will help you save time and prevent you from doing redundant work. Click this link to explore its features. To learn more about embedding dashboards into your applications, refer to this blog and our help documentation.

Get started with Bold BI by signing up for a free trial and create interactive business intelligence dashboards. You can contact us by submitting questions through the Bold BI website or, if you already have an account, you can log in to submit your support questions.

Top comments (0)