<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Albert Wong</title>
    <description>The latest articles on DEV Community by Albert Wong (@albertatstarrocks).</description>
    <link>https://dev.to/albertatstarrocks</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1253783%2Ffdac30bc-993a-47bd-b969-489c4d2f6297.jpg</url>
      <title>DEV Community: Albert Wong</title>
      <link>https://dev.to/albertatstarrocks</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/albertatstarrocks"/>
    <language>en</language>
    <item>
      <title>Popular Open Source replacements for Business Intelligence Tools Power BI, Tableau, Looker, MicroStrategy</title>
      <dc:creator>Albert Wong</dc:creator>
      <pubDate>Tue, 23 Jan 2024 19:07:11 +0000</pubDate>
      <link>https://dev.to/albertatstarrocks/popular-open-source-replacements-for-business-intelligence-tools-power-bi-tableau-looker-microstrategy-220p</link>
      <guid>https://dev.to/albertatstarrocks/popular-open-source-replacements-for-business-intelligence-tools-power-bi-tableau-looker-microstrategy-220p</guid>
      <description>&lt;p&gt;These are four popular open source replacements for Microsoft Power BI, Tableau, Looker, MicroStrategy, and other business intelligence and data visualization tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Apache Superset&lt;/li&gt;
&lt;li&gt;Metabase&lt;/li&gt;
&lt;li&gt;Lightdash&lt;/li&gt;
&lt;li&gt;Streamlit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Apache Superset is a modern, open-source data exploration and data visualization platform. It is designed to be fast, lightweight, and easy to use, making it a good choice for users of all skill levels, from data analysts to business executives.&lt;/p&gt;

&lt;p&gt;Superset provides a variety of features for data exploration and visualization, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A powerful SQL editor (SQL Lab) for querying any database that has a SQLAlchemy dialect, including relational databases, cloud data warehouses, and query engines over data lakes.&lt;/li&gt;
&lt;li&gt;A drag-and-drop interface for creating charts and dashboards.&lt;/li&gt;
&lt;li&gt;A variety of built-in chart types, including line charts, bar charts, pie charts, and geospatial charts.&lt;/li&gt;
&lt;li&gt;The ability to add custom chart types through visualization plugins (built with JavaScript/TypeScript and React).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Superset also provides a number of features that make it easy to share and collaborate on data insights, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The ability to publish charts and dashboards so other users can view them in a web browser.&lt;/li&gt;
&lt;li&gt;The ability to export charts and dashboards to PDF, CSV, and other formats.&lt;/li&gt;
&lt;li&gt;The ability to define roles and permissions to control access to data and insights.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Superset is a popular choice for data exploration and visualization because it is easy to use, powerful, and flexible. Read more about Apache Superset at &lt;a href="http://superset.apache.org"&gt;http://superset.apache.org&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Metabase is an open-source business intelligence platform that helps you and your team make better decisions with data. It’s easy to use, powerful, and flexible.&lt;/p&gt;

&lt;p&gt;Here are some of the things you can do with Metabase:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ask questions about your data without writing SQL.&lt;/li&gt;
&lt;li&gt;Create beautiful visualizations of your data.&lt;/li&gt;
&lt;li&gt;Share your insights with others with interactive dashboards.&lt;/li&gt;
&lt;li&gt;Collaborate on data analysis with your team.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Metabase is used by companies of all sizes, from startups to Fortune 500 companies. Read more about Metabase at &lt;a href="http://metabase.com"&gt;http://metabase.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Lightdash is an open-source business intelligence platform that makes data exploration and analytics accessible to everyone. It is built on top of dbt, a popular data transformation tool, and provides a variety of features to help users make better decisions with their data.&lt;/p&gt;

&lt;p&gt;Some of the key features of Lightdash include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Self-service data exploration: Lightdash allows users to explore their data without writing any SQL.&lt;/li&gt;
&lt;li&gt;Powerful data visualization: Lightdash provides a variety of built-in charts and dashboards, as well as the ability to create custom visualizations.&lt;/li&gt;
&lt;li&gt;Collaboration and sharing: Lightdash makes it easy to share data and insights with others.&lt;/li&gt;
&lt;li&gt;Security and governance: Lightdash provides a variety of features to help organizations keep their data secure and governed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Read more about Lightdash at &lt;a href="http://lightdash.com"&gt;http://lightdash.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. In just a few minutes you can build and deploy powerful data apps.&lt;/p&gt;

&lt;p&gt;Streamlit is a Python library that works directly with popular Python data libraries such as NumPy, Pandas, and Matplotlib. This makes it easy for Python developers to get started with Streamlit and to build data apps with the tools they already use.&lt;/p&gt;

&lt;p&gt;Streamlit provides a number of features that make it easy to create and share data apps, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Interactive widgets: Streamlit provides a variety of interactive widgets that can be used to create user interfaces for data apps. These widgets include buttons, sliders, text boxes, and drop-down menus.&lt;/li&gt;
&lt;li&gt;Data visualization: Streamlit provides a variety of built-in charts and dashboards that can be used to visualize data. Streamlit also supports custom visualizations using Plotly and other Python libraries.&lt;/li&gt;
&lt;li&gt;Deployment: Streamlit makes it easy to deploy data apps to the web. Streamlit apps can be deployed to a variety of hosting providers, including Heroku, AWS Elastic Kubernetes Service (EKS), and Google Kubernetes Engine (GKE).&lt;/li&gt;
&lt;/ul&gt;
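
&lt;p&gt;To make the widget and chart features above concrete, here is a minimal sketch of a Streamlit script. It assumes Streamlit is installed (pip install streamlit); the file name app.py, the growth slider, and the generated revenue series are illustrative placeholders, not real data. Save the script as app.py and start it with streamlit run app.py.&lt;/p&gt;

&lt;pre class="highlight python"&gt;&lt;code&gt;import streamlit as st

# Page title rendered at the top of the app.
st.title("Sales explorer")

# Interactive widget: returns the current slider value (5 until the user moves it).
growth = st.slider("Monthly growth (%)", min_value=0, max_value=20, value=5)

# Illustrative data only: a compounding series driven by the slider value.
months = list(range(1, 13))
revenue = [round(100 * (1 + growth / 100) ** (m - 1), 2) for m in months]

# Built-in chart element; a dict maps the column name to its values.
st.line_chart({"revenue": revenue})
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Each time the slider moves, Streamlit reruns the script from top to bottom, which is why no callback code is needed to refresh the chart.&lt;/p&gt;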

&lt;p&gt;Read more about Streamlit at &lt;a href="http://streamlit.io"&gt;http://streamlit.io&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course you’ll need to pair these Business Intelligence Tools with an OLAP database. I recommend StarRocks.&lt;/p&gt;

&lt;p&gt;StarRocks is a next-generation, massively parallel processing (MPP) database designed to make real-time analytics easy for enterprises. It is built to power sub-second queries at scale, making it ideal for a wide range of use cases, including user-facing analytics, real-time dashboards, ad hoc querying, and machine learning models.&lt;/p&gt;

&lt;p&gt;StarRocks offers a number of advantages over other databases for real-time analytics, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High concurrency: StarRocks can handle a large number of concurrent users without sacrificing performance.&lt;/li&gt;
&lt;li&gt;Real-time insights: StarRocks can provide real-time insights into data by using a variety of techniques, such as incremental aggregation and materialized views.&lt;/li&gt;
&lt;li&gt;Fresh mutable data: StarRocks supports real-time upserts and deletes (for example, through its Primary Key tables), so queries reflect data that is constantly changing.&lt;/li&gt;
&lt;li&gt;Efficient resource isolation: StarRocks can efficiently isolate resources so that different users do not interfere with each other’s queries.&lt;/li&gt;
&lt;li&gt;Ease of use: StarRocks is easy to use and administer. It provides a variety of tools and features to make it easy to get started and to manage your cluster.&lt;/li&gt;
&lt;/ul&gt;
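
&lt;p&gt;The idea behind incremental aggregation can be sketched in a few lines of Python. This is a toy model, not StarRocks code: the IncrementalSum class is hypothetical and simply keeps a SUM per group up to date as rows are inserted or deleted, which is how an incrementally maintained aggregate avoids rescanning the whole table on every change.&lt;/p&gt;

&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import defaultdict


class IncrementalSum:
    """Hypothetical toy view: SUM(amount) GROUP BY key, kept current per row."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def insert(self, key, amount):
        # Apply the new row as a delta: O(1) work instead of a full rescan.
        self.totals[key] += amount
        self.counts[key] += 1

    def delete(self, key, amount):
        # A delete is a negative delta; an update is a delete plus an insert.
        self.totals[key] -= amount
        self.counts[key] -= 1
        if self.counts[key] == 0:
            # Drop groups that no longer have any rows.
            del self.totals[key], self.counts[key]

    def query(self, key):
        # Reads hit the maintained aggregate, never the raw rows.
        return self.totals.get(key, 0.0)


view = IncrementalSum()
for region, amount in [("east", 10.0), ("west", 5.0), ("east", 7.0)]:
    view.insert(region, amount)
view.delete("west", 5.0)
print(view.query("east"), view.query("west"))  # 17.0 0.0
&lt;/code&gt;&lt;/pre&gt;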

&lt;p&gt;StarRocks is used by Airbnb to replace its Apache Druid, ClickHouse, and Trino environments. Read more about StarRocks and this Apache Druid, ClickHouse, and Trino replacement use case at &lt;a href="http://starrocks.io"&gt;http://starrocks.io&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. StarRocks received InfoWorld’s 2023 BOSSIE Award for best open source software.&lt;/p&gt;

</description>
      <category>analytics</category>
    </item>
    <item>
      <title>OLAP cubes are dead; change my mind</title>
      <dc:creator>Albert Wong</dc:creator>
      <pubDate>Tue, 23 Jan 2024 18:48:31 +0000</pubDate>
      <link>https://dev.to/albertatstarrocks/olap-cubes-are-dead-change-my-mind-411k</link>
      <guid>https://dev.to/albertatstarrocks/olap-cubes-are-dead-change-my-mind-411k</guid>
      <description>&lt;p&gt;A database cube, also known as a data cube, is a multidimensional data structure that is used for data analysis and reporting. It organizes data into dimensions and measures, which allows users to slice and dice the data to gain insights into their business.&lt;/p&gt;

&lt;p&gt;Database cubes were popular in the early days of data warehousing, but they have largely been replaced by other technologies, such as columnar databases (Snowflake, StarRocks) and distributed computing frameworks (Apache Spark).&lt;/p&gt;

&lt;p&gt;There are a few reasons for the decline of database cubes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They are complex and expensive to set up and maintain. Database cubes typically require dedicated servers and specialized software, and they can be difficult to scale to large datasets because the number of precomputed aggregates grows quickly with each added dimension.&lt;/li&gt;
&lt;li&gt;They are not well-suited for real-time analytics. Database cubes are typically built by batch processes that precompute aggregates on a schedule, so their results lag behind the source data.&lt;/li&gt;
&lt;li&gt;They are not as flexible as other technologies. Database cubes are typically designed for specific use cases, such as sales analysis or financial reporting. They can be difficult to adapt to new requirements or to support user-facing analytics (self-service analytics or ad-hoc queries).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Other technologies, such as columnar databases and distributed computing frameworks, offer a number of advantages over database cubes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They are more scalable and cost-effective. Columnar databases and distributed computing frameworks can be used to process large datasets efficiently and at a lower cost.&lt;/li&gt;
&lt;li&gt;They are more flexible. Columnar databases and distributed computing frameworks can be used for a variety of use cases, including real-time analytics and machine learning.&lt;/li&gt;
&lt;li&gt;They are easier to use. Columnar databases and distributed computing frameworks are more user-friendly than database cubes, and they can be used with a variety of programming languages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Overall, database cubes are a legacy technology that has been largely replaced by other technologies. Other technologies offer a number of advantages over database cubes, including scalability, flexibility, cost-effectiveness, and ease of use.&lt;/p&gt;

&lt;p&gt;Here are some examples of technologies that can be used to replace database cubes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Columnar storage formats and databases, such as Apache Parquet files and open source OLAP databases like StarRocks&lt;/li&gt;
&lt;li&gt;Distributed computing frameworks, such as Apache Spark and Hadoop&lt;/li&gt;
&lt;li&gt;In-memory data stores, such as Redis and Memcached&lt;/li&gt;
&lt;li&gt;Cloud-based data warehouses, such as Google BigQuery and Amazon Redshift&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are considering using a database cube, I recommend that you evaluate the alternatives carefully. Other technologies may offer a better fit for your needs.&lt;/p&gt;

&lt;p&gt;Read more about StarRocks at &lt;a href="http://starrocks.io"&gt;http://starrocks.io&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. StarRocks received InfoWorld’s 2023 BOSSIE Award for best open source software.&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>database</category>
    </item>
    <item>
      <title>Trino / Presto (PrestoDB) alternatives</title>
      <dc:creator>Albert Wong</dc:creator>
      <pubDate>Tue, 23 Jan 2024 18:34:36 +0000</pubDate>
      <link>https://dev.to/albertatstarrocks/trino-presto-prestodb-alternatives-58h2</link>
      <guid>https://dev.to/albertatstarrocks/trino-presto-prestodb-alternatives-58h2</guid>
      <description>&lt;p&gt;When it comes to Trino and PrestoDB competitors, it’s important to consider them within two main categories:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Direct Competitors: These are tools that offer similar functionality and target the same use cases as Trino, primarily focusing on distributed SQL querying across data warehouses and data lakes. Here are some of the biggest names:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Starburst: Founded by the creators of the original Presto project, Starburst is a well-established commercial distribution of Trino with additional features and enterprise support.&lt;/li&gt;
&lt;li&gt;Dremio: This data lakehouse platform runs its own Apache Arrow-based query engine (rather than Trino) and focuses on simplifying data access and governance.&lt;/li&gt;
&lt;li&gt;Ahana: Ahana offered a managed service and enterprise features built on PrestoDB rather than Trino; it was acquired by IBM in 2023.&lt;/li&gt;
&lt;li&gt;Kloudfuse: Focused on data virtualization, Kloudfuse leverages Trino as its query engine while providing a unified data access layer across various sources.&lt;/li&gt;
&lt;/ul&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Alternative Technologies: While not direct competitors in the strictest sense, these tools cater to similar needs with different approaches:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;Apache Spark: This popular distributed processing framework can also run SQL queries on large datasets through Spark SQL, but typical Spark pipelines involve more code (Scala, Python, or Java) than Trino’s pure SQL interface.&lt;/li&gt;
&lt;li&gt;PrestoDB: Trino itself began as a fork of the original Presto project. PrestoDB is the continuation of that original project, now hosted by the Linux Foundation’s Presto Foundation, and it follows a different development path with a distinct community.&lt;/li&gt;
&lt;li&gt;Other SQL engines: OLAP databases such as open source StarRocks, or cloud data warehouses such as Snowflake and Redshift, may be considered alternatives for specific use cases, depending on data size, scalability, and cost requirements.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To learn more about how StarRocks differs from Trino, see &lt;a href="https://www.starrocks.io/blog/comparison-starrocks-vs-trino"&gt;https://www.starrocks.io/blog/comparison-starrocks-vs-trino&lt;/a&gt;&lt;/p&gt;

</description>
      <category>analytics</category>
      <category>database</category>
    </item>
    <item>
      <title>Data Lakehouse using Open Source StarRocks</title>
      <dc:creator>Albert Wong</dc:creator>
      <pubDate>Tue, 23 Jan 2024 18:26:35 +0000</pubDate>
      <link>https://dev.to/starrocks/data-lakehouse-using-open-source-starrocks-4b11</link>
      <guid>https://dev.to/starrocks/data-lakehouse-using-open-source-starrocks-4b11</guid>
      <description>&lt;p&gt;A data lakehouse is a revolutionary data architecture that merges the best of both data lakes and data warehouses. Think of it as a single, comprehensive data "home" where you can store, process, and analyze all your data – structured, unstructured, and semi-structured – in a flexible and efficient way.&lt;/p&gt;

&lt;p&gt;Value of Data Lakehouses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Democratized data access: Everyone, from data scientists to business analysts, can access and explore all data in one place.&lt;/li&gt;
&lt;li&gt;Increased agility and insights: Analyze data as needed, regardless of schema or format, leading to faster discovery and innovation.&lt;/li&gt;
&lt;li&gt;Reduced costs and complexity: Eliminates the need for multiple data platforms, streamlining data management and reducing overhead.&lt;/li&gt;
&lt;li&gt;Faster and more accurate analytics: Leverage diverse data sources to build richer models and make better data-driven decisions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How StarRocks Uniquely Solves Data Lakehouse Challenges:&lt;/p&gt;

&lt;p&gt;Traditional data lakehouses often face these hurdles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Performance bottlenecks: Processing large volumes and diverse data formats can be slow and cumbersome.&lt;/li&gt;
&lt;li&gt;High operational costs: Scaling and managing a complex data lakehouse infrastructure can be expensive.&lt;/li&gt;
&lt;li&gt;Limited accessibility: Non-technical users might struggle to navigate and analyze data effectively.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;StarRocks tackles these challenges with its unique capabilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Hybrid storage architecture: Combines columnar storage for fast analytical scans with optional row-oriented storage for high-concurrency point lookups, and handles both structured and semi-structured (for example, JSON) data.&lt;/li&gt;
&lt;li&gt;Massively scalable architecture: Scales horizontally by adding nodes to handle petabytes of data and large numbers of concurrent users.&lt;/li&gt;
&lt;li&gt;Real-time analytics: Processes data streams in real-time, enabling instant insights and reactive decision-making.&lt;/li&gt;
&lt;li&gt;Works with familiar BI tools: StarRocks is compatible with the MySQL protocol, so BI tools such as Apache Superset, Metabase, and Tableau can connect to it for self-service dashboards and visualizations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data lakehouses hold the key to unlocking the full potential of your data, and StarRocks offers a unique solution to overcome the usual obstacles. Its sub-second query engine, hybrid storage, scalability, real-time processing, and compatibility with familiar BI tools make it a powerful platform for building a truly unified and insightful data lakehouse.&lt;/p&gt;

&lt;p&gt;StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. StarRocks received InfoWorld’s 2023 BOSSIE Award for best open source software.&lt;/p&gt;

</description>
      <category>datalakehouse</category>
      <category>starrocks</category>
      <category>opensource</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Breaking Free from Proprietary Clouds (Snowflake, RedShift, BigQuery): Top Open Source Alternatives to OLAP Databases</title>
      <dc:creator>Albert Wong</dc:creator>
      <pubDate>Wed, 10 Jan 2024 22:37:13 +0000</pubDate>
      <link>https://dev.to/albertatstarrocks/breaking-free-from-proprietary-clouds-snowflake-redshift-bigquery-top-open-source-alternatives-to-olap-databases-23e3</link>
      <guid>https://dev.to/albertatstarrocks/breaking-free-from-proprietary-clouds-snowflake-redshift-bigquery-top-open-source-alternatives-to-olap-databases-23e3</guid>
      <description>&lt;p&gt;While cloud-based OLAP databases like Snowflake, RedShift, and BigQuery offer convenience, they often come with vendor lock-in and escalating costs. Fortunately, a thriving landscape of open source alternatives empowers you to take control of your data warehouse and unlock significant cost savings. Here's a look at some of the leading contenders:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1px5ps4e0y2yqeubs1l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1px5ps4e0y2yqeubs1l.png" alt="Image description" width="700" height="362"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ClickHouse: Enterprise-Grade Performance and Scalability&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Unleash lightning-fast queries: Achieve millisecond-level response times, even with billions of rows.&lt;/li&gt;
&lt;li&gt;Columnar storage for efficiency: Optimize performance for analytical workloads with column-based data organization.&lt;/li&gt;
&lt;li&gt;Handle diverse data types: Seamlessly analyze structured, semi-structured, and geospatial data.&lt;/li&gt;
&lt;li&gt;Scalability without limits: Effortlessly scale horizontally to handle ever-growing datasets.&lt;/li&gt;
&lt;/ul&gt;
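
&lt;p&gt;As a rough illustration of why columnar storage helps analytical workloads, the sketch below stores the same three made-up rows in a row layout and a column layout. It models I/O simply by counting how many values each layout must read to answer SELECT SUM(revenue); real engines such as ClickHouse add compression, vectorized execution, and other optimizations on top of this basic idea.&lt;/p&gt;

&lt;pre class="highlight python"&gt;&lt;code&gt;# The same table in two physical layouts (illustrative data).
rows = [
    {"user_id": 1, "country": "US", "revenue": 12.5},
    {"user_id": 2, "country": "DE", "revenue": 7.0},
    {"user_id": 3, "country": "US", "revenue": 3.5},
]

# Row layout: each record's fields are stored together (the list above).
# Column layout: one contiguous list per column.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# SELECT SUM(revenue) gives the same answer in both layouts...
row_total = sum(r["revenue"] for r in rows)
col_total = sum(columns["revenue"])

# ...but in this simple I/O model a row store fetches whole records,
# while a column store fetches only the column the query references.
values_read_row = sum(len(r) for r in rows)
values_read_col = len(columns["revenue"])

print(row_total, col_total)              # 23.0 23.0
print(values_read_row, values_read_col)  # 9 3
&lt;/code&gt;&lt;/pre&gt;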

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4s8en6i8qf1lphoe253t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4s8en6i8qf1lphoe253t.png" alt="Image description" width="720" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;StarRocks: Blazing Fast Analytics for Massive Datasets&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MPP architecture for parallel processing: Distribute workloads across multiple nodes for unparalleled speed and scalability.&lt;/li&gt;
&lt;li&gt;Seamless integration with data lakes: Query data directly from your data lake, eliminating data movement.&lt;/li&gt;
&lt;li&gt;Compatible with popular BI tools: Connect with Tableau, Power BI, and more for seamless visualization and analysis.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;DuckDB: Lightweight and Embedded Analytics&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ideal for smaller datasets and embedded use cases: Delivers fast performance for smaller-scale analytics or integration within applications.&lt;/li&gt;
&lt;li&gt;Zero-configuration setup: Get started quickly without complex installation or configuration.&lt;/li&gt;
&lt;li&gt;SQL support for familiarity: Use familiar SQL syntax for querying and data manipulation.&lt;/li&gt;
&lt;/ul&gt;
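
&lt;p&gt;To illustrate the zero-configuration, embedded model, here is a short sketch using DuckDB’s Python package (pip install duckdb). The sales table and its values are made up for the example; connecting without a file path opens an in-memory database inside the Python process, with no server to install or configure.&lt;/p&gt;

&lt;pre class="highlight python"&gt;&lt;code&gt;import duckdb

# No path argument: an in-memory database that lives inside this process.
con = duckdb.connect()

# Illustrative table and rows.
con.execute("CREATE TABLE sales (region VARCHAR, amount INTEGER)")
con.execute("INSERT INTO sales VALUES ('east', 10), ('west', 5), ('east', 7)")

# Plain SQL for an analytical aggregate.
rows = con.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('east', 17), ('west', 5)]
con.close()
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Passing a file path to duckdb.connect() persists the database to disk instead of keeping it in memory.&lt;/p&gt;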

&lt;p&gt;&lt;strong&gt;Choose Your Open Source Adventure&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The right open source OLAP database for you depends on your specific needs and infrastructure. Evaluate factors such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data volume and query complexity&lt;/li&gt;
&lt;li&gt;Performance requirements&lt;/li&gt;
&lt;li&gt;Scalability needs&lt;/li&gt;
&lt;li&gt;Cloud or on-premises deployment&lt;/li&gt;
&lt;li&gt;Integration with existing tools and technologies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Explore these open source options to harness the power of analytics without compromising cost, flexibility, or control.&lt;/p&gt;

</description>
      <category>dataengineering</category>
      <category>datalakehouse</category>
      <category>datawarehouse</category>
      <category>starrocks</category>
    </item>
  </channel>
</rss>
