Learn the Basics and Best Practices of Using the Sigma Computing Business Analysis Tool
Writers: Aliénor Lougerstay, Agustin Pace and Augusto Rosa
What is Sigma Computing?
Sigma is a cloud-based analytics and business intelligence platform built for the Modern Data Stack. Unlike reporting tools that rely on in-memory databases to keep data quickly available, Sigma works against all the data in your cloud data warehouse, letting you explore, analyze, collaborate, and share insights faster and more easily than with legacy tools. Sigma uses the power of cloud data warehouses like the Snowflake Data Cloud to scale data queries effortlessly.
Sigma’s Top Features
- Support for Snowflake, BigQuery, Redshift, Databricks, PostgreSQL, AlloyDB, and MySQL, plus the ability to upload your own CSV files
- Excel-like spreadsheet interface, plus dashboarding and reporting interfaces
- “Bring your own Data” with Input Tables
- Drill Anywhere
- Live Edit Collaboration
Sigma Best Practices for Large Scale Reporting
We found that Sigma’s dashboard-building best practices are best suited to ad hoc reporting needs. However, a different set of guidelines may be required to build out the reporting infrastructure needed by larger data organizations.
Here are some of the best practices that Infostrux has found to be effective at scale.
Often, larger organizations that have long relied on their data for decision-making have built reporting infrastructure around a one-underlying-SQL-query-to-one-chart relationship. This was optimal at a time when reporting tools were limited and data analysts were the only point of contact with the data. With the evolving data landscape, everyone can become a data consumer, and tools now offer greater flexibility, which means some of those legacy systems will need to be refactored. We recommend moving towards a one-dataset-to-one-dashboard (or tab) relationship, meaning essentially one underlying SQL query feeding many charts.
Advantages of this include:
- Performance: With a single query to your data warehouse, the data for all charts in your dashboard is refreshed.
- Data consistency and accuracy: When relying on only one data source, your data is always consistent.
- Cross-filtering: Clicking on one field will filter the rest of the dashboard, allowing easy exploration.
- Easier filtering: Instead of applying each filter to 10 charts with similar names, apply the filter once to the underlying dataset.
- Maintainability: Has your data source changed? Great, now you only have one place to update.
- Expandability: Updating the underlying dataset with, for example, an additional join can add new insights while preserving all the advantages above, versus creating a brand-new independent chart with a separate data connection that will not tie in.
Now that we have covered the desired query-to-chart relationship, let’s explore why we recommend using datasets, a separate Sigma feature, instead of Sigma’s suggested approach of building your reporting table in a hidden tab and basing charts off that in-dashboard table.
Datasets
Even with a great framework for building dashboards, getting the one we want depends on well-prepared underlying data, which is where datasets come in. Datasets are created with Sigma’s internal tool, and once you create one, you can either:
- Use Sigma’s dataset-building tool and connect directly to Snowflake tables
- Use Sigma’s dataset-building tool and connect to other datasets
- Write custom SQL queries
- Connect Sigma to UDFs for computing results
Building reports
When building reports, keep in mind that aggregations and grouping should be done at the visualization level, not the dataset level.
Formatting data
- IDs: For unique IDs composed of digits only, format them as text at the dataset level; otherwise, they will be treated as numbers and behave unexpectedly.
- Number formats: Format your fields (i.e., currency fields, whole numbers, etc.) at the dataset level. This will save you time and make your dataset easier to use.
- Dates: Dates are timestamps regardless of the formatting, and truncating removes the time information. Do not truncate dates at the dataset level; truncate in the grouping at the chart level.
- Converting timestamps across timezones: To convert NTZ to EST, leave the original timestamp untouched and create a secondary timestamp with the conversion. For example, if our timestamp is created_at, we rename it to created_at_ntz, hide that column, and then create a new column in the dataset called created_at (or created_at_est) defined as follows: ConvertTimezone([CREATED AT NTZ], "America/New_York")
- Migration advice on week start days: If you are migrating from another BI system, its week start day for weekly aggregations may differ from Sigma’s. For example, one BI tool we migrated from used Monday as the week start day, while Sigma defaulted to Sunday. When you need to truncate a date to a week starting on Monday, use this calculation: DateTrunc("week_starting_monday", [CREATED AT])
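To make the formatting guidance above concrete, here is a minimal Python sketch that mirrors what Sigma’s ConvertTimezone and DateTrunc("week_starting_monday", …) formulas compute, plus the leading-zero pitfall behind the IDs-as-text rule. Treating the NTZ timestamp as UTC is our assumption for illustration; whether that holds depends on how your data was loaded.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def convert_timezone(ts_ntz: datetime, tz: str) -> datetime:
    # Treat the naive (NTZ) timestamp as UTC, then convert it to the
    # target timezone. Assumes the stored NTZ values represent UTC.
    return ts_ntz.replace(tzinfo=ZoneInfo("UTC")).astimezone(ZoneInfo(tz))

def date_trunc_week_starting_monday(ts: datetime) -> datetime:
    # Truncate to midnight of the Monday that starts the week.
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=midnight.weekday())  # Monday == 0

created_at_ntz = datetime(2023, 6, 15, 18, 30)  # a Thursday, no timezone
created_at_est = convert_timezone(created_at_ntz, "America/New_York")
week_start = date_trunc_week_starting_monday(created_at_ntz)

print(created_at_est.isoformat())  # 2023-06-15T14:30:00-04:00
print(week_start.date())           # 2023-06-12 (the Monday of that week)

# Why digit-only IDs should be formatted as text:
raw_id = "00042"
print(int(raw_id))  # 42 -- the leading zeros are lost as a number
```

The same behavior is why we create the converted column in the dataset once instead of repeating the conversion in every chart.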
Filtering data
- When filters don’t require end-user editing and are reused across all charts, such as “valid rows only” filters, it’s best to filter at the dataset level instead of the report level.
- If you’re creating end-user filters that apply to everything in the report, such as date filters covering all charts, target the whole dataset instead of each individual chart. This simplifies validation and maintenance.
- Where applicable, use synced filters. These filters enable the propagation of selected values from one tab to the others, streamlining the process and making it more efficient.
- Leverage default filter values to streamline your work. A useful tip is to select “Last 1 Month” for date filters, for example. This ensures that everyone who opens the dashboard will have useful data immediately, without the need to update the dates every time. For charts that are used to dive into item-level data, pre-filter on a single item to avoid long queries that load data for all items.
- When it comes to filtering datasets directly from the dashboard level and making the selected value accessible through the underlying Custom SQL statement, parameters can come in handy. However, it’s important to use them sparingly as they tend to add complexity to the process.
Filter for dynamic aggregations
If the reporting tool that you are migrating from has Daily/Weekly/Monthly/Yearly aggregation, there are two options:
- 1. Using a filter: A filter lives at the report level and is ideal if only one visualization needs the aggregation while the others do not. In this case, create an [Aggregation] column in the visualization or dataset you want to aggregate, using the following custom field: DateTrunc([Aggregation], [TIMESTAMP_FIELD]). Then add a filter on the dashboard targeting that field, and use the field in your groupings to aggregate data at the desired granularity.
- 2. Using a parameter: A parameter is selectable at the dashboard level but affects the underlying dataset itself. It’s best suited when all the charts for the dataset need to be aggregated. To use an aggregation parameter, create a new column inside the dataset using the same custom field as above. Then create a filter targeting the dataset parameter in the dashboard, and use that field in your groupings to aggregate data at the desired granularity.
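Both options boil down to a truncation keyed by the selected granularity. As a rough Python illustration of what DateTrunc([Aggregation], [TIMESTAMP_FIELD]) computes for each selector value (the granularity names and the Sunday week start are assumptions based on the defaults discussed above):

```python
from datetime import date, datetime, timedelta

def date_trunc(granularity: str, ts: datetime) -> date:
    # Mimics DateTrunc for the values a Daily/Weekly/Monthly/Yearly
    # aggregation selector would offer.
    d = ts.date()
    if granularity == "day":
        return d
    if granularity == "week":  # assumes Sigma's default Sunday week start
        return d - timedelta(days=(d.weekday() + 1) % 7)
    if granularity == "month":
        return d.replace(day=1)
    if granularity == "year":
        return d.replace(month=1, day=1)
    raise ValueError(f"unknown granularity: {granularity}")

rows = [datetime(2023, 6, 15, 9, 0), datetime(2023, 6, 18, 12, 0)]
aggregation = "month"  # the value selected by the dashboard filter/parameter
buckets = {date_trunc(aggregation, ts) for ts in rows}
print(sorted(buckets))  # [datetime.date(2023, 6, 1)]
```

With a filter, this computation happens per visualization; with a parameter, the truncated column lives in the dataset, so every chart groups on the same buckets.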
Conclusion
Sigma Computing is a cloud-based reporting tool designed for the Modern Data Stack. Unlike other reporting tools, Sigma uses all the data in a cloud data warehouse to explore, analyze, collaborate, and share insights faster and easier.
Best practices for building dashboards include matching one dataset to one dashboard, with any number of charts aggregating the data in different forms; this brings faster performance, data consistency and accuracy, cross-filtering, easier filtering, and maintainability.
When building reports, aggregations and grouping should be done at the visualization level, not the dataset level, and formatting data such as dates and IDs at the dataset level can save time and make the dataset easier to use.
At Infostrux, we recommend Sigma Computing if you are in the Snowflake ecosystem, as it is one of the most Snowflake-friendly BI tools.
Who we are:
- I’m Aliénor Lougerstay, Architect at Infostrux Solutions. You can follow me on LinkedIn.
- I’m Agustin, Senior Engineer at Infostrux Solutions. You can follow me on LinkedIn.
- I’m Augusto Rosa, VP of Engineering for Infostrux Solutions. You can follow me on LinkedIn.
Thanks for reading our blog post. Subscribe to Infostrux Medium Blogs https://medium.com/infostrux-solutions for the most interesting Data Engineering and Snowflake news.