De' Clerke

Posted on Jun 7

I Shipped 12 BI Dashboards With 5 Different Tools. Here Is the Honest Comparison.

#datascience #dataengineering #python #sql

Most BI tool comparisons are written by someone who spent a weekend with each option, deployed a toy dataset, and wrote up their impressions. This is not that.

Over the past three months I shipped 12 dashboards across 5 tools: Streamlit, Plotly Dash, Apache Superset, Evidence.dev, and Grafana. Each one was the visualization layer on a real data pipeline with Airflow, DuckDB, dbt, and live API data. I ran into real failures, real deployment constraints, and real differences in where each tool fits -- and where it does not.

This article covers all of it.

The Setup

Every project I built follows the same general pattern: an Airflow 3.0 pipeline pulls data from somewhere, dbt transforms it into mart tables, and a visualization layer sits on top. The question was always: what goes in that last layer?

Here is what I ended up using and why:

Tool	Projects	Stack
Streamlit	Kenya Fiscal Intelligence, Kenya Human Development, Kenya Agricultural Pulse, EAC Economic Lens, LoanRisk Analytics, Kenya Tenders	Python + WB API + DuckDB
Plotly Dash	Call Center Analytics	DuckDB + dbt
Apache Superset	Kenya Real Estate Pipeline, Ecommerce Analytics	PostgreSQL + DuckDB + dbt
Evidence.dev	BizPulse Kenya, Kenya Economic Pulse, LedgerSync	PostgreSQL + Delta Lake
Grafana	Saruk Electronics Tracker	PostgreSQL + dbt

None of these were chosen at random. Each one came from a specific requirement -- deployment target, interactivity level, pipeline stack, or time constraints. Let me go through each one.

Streamlit: The Data Engineer's Default

Streamlit is where I start when the requirement is flexible and the timeline is tight. It is Python all the way down, which means I can query DuckDB, call an API, run a pandas transformation, and render a chart in the same file. No context switching.

For the Kenya BI Dashboards series I built four separate Streamlit apps, each pulling live data from the World Bank REST API:

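import pandas as pd
import requests
import streamlit as st

@st.cache_data(ttl=86400)  # cache each (country, indicator) result for 24 hours
def fetch_indicator(country: str, indicator: str) -> pd.DataFrame:
    url = f"https://api.worldbank.org/v2/country/{country}/indicator/{indicator}"
    r = requests.get(url, params={"format": "json", "per_page": 100}, timeout=15)
    r.raise_for_status()  # fail on HTTP errors instead of parsing an error body
    data = r.json()[1]  # element 0 is paging metadata, element 1 holds the observations
    return pd.DataFrame([
        {"year": int(d["date"]), "value": d["value"]}
        for d in data if d["value"] is not None  # drop missing years but keep genuine zeros
    ]).sort_values("year")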

With ttl=86400, each cached result expires after 24 hours, so the API is hit at most once per day for a given country and indicator. On Streamlit Cloud, the parquet files I commit to the repo serve as the cold-start fallback. That pattern -- cache aggressively, commit a data snapshot -- is what makes Streamlit Cloud viable for production.
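The fallback half of that pattern can be sketched in a few lines. This is a minimal illustration rather than the exact code from my apps: the helper name live_or_snapshot is made up for this example, and it takes two zero-argument callables so it does not care whether the snapshot is parquet, CSV, or anything else.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def live_or_snapshot(fetch_live: Callable[[], T], load_snapshot: Callable[[], T]) -> T:
    """Try the live source first; fall back to the committed snapshot on any failure.

    Both arguments are zero-argument callables (hypothetical names, not a library API).
    """
    try:
        return fetch_live()
    except Exception as exc:  # network errors, bad JSON, unexpected payload shape
        logger.warning("Live fetch failed (%s); using snapshot instead", exc)
        return load_snapshot()
```

In an app this would be called along the lines of live_or_snapshot(lambda: fetch_indicator("KEN", "NY.GDP.MKTP.CD"), lambda: pd.read_parquet("data/gdp.parquet")), where the indicator code and the snapshot path are placeholders for whatever the project actually uses.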

What Streamlit does well: iteration speed, Python-native logic in the dashboard, @st.cache_data for heavy computations, st.session_state for multi-page state persistence, and one-click deploy to Streamlit Cloud.

Where it gets awkward: reactive filtering. In Streamlit, every widget interaction re-runs the entire script. For simple filters this is fine. Once you need dependent dropdowns, cross-filter behavior between charts, or real callback logic, the model starts to feel wrong. That is where Dash earns its place.

The Plotly 6 breaking changes. Every Streamlit project I built in 2026 hit these. Three things changed silently between Plotly 5 and 6:

First, numpy.bool_ is no longer accepted in layout parameters. If you compute a boolean from pandas and pass it directly to a Plotly call, you get a TypeError. Wrap with bool().

Second, titlefont is removed. The old syntax fig.update_layout(titlefont=dict(size=16)) silently does nothing in Plotly 6. The replacement is fig.update_layout(title=dict(font=dict(size=16))).

Third, 8-digit hex colors silently drop the alpha channel. fillcolor="#00d26a40" no longer works. You need fillcolor="rgba(0,210,106,0.25)". I wrote a small helper that I now copy into every project:

def hex_to_rgba(hex_color: str, alpha: float = 1.0) -> str:
    """Convert "#rgb" or "#rrggbb" to an "rgba(r,g,b,a)" string."""
    h = hex_color.lstrip("#")
    if len(h) == 3:  # expand shorthand, e.g. "0d6" -> "00dd66"
        h = "".join(c * 2 for c in h)
    r, g, b = int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)
    return f"rgba({r},{g},{b},{alpha})"

Only the first of these fails loudly. The other two just produce wrong output. Check your Plotly version first if your charts look off after an upgrade.
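If a codebase already has 8-digit hex colors scattered through it, a small converter that reads the alpha from the last two hex digits saves retyping each one. This is a sketch under that assumption; hex8_to_rgba is a name I am using here for illustration, and it rounds the alpha to three decimals.

```python
def hex8_to_rgba(hex_color: str) -> str:
    """Convert an 8-digit hex color ("#rrggbbaa") to an "rgba(r,g,b,a)" string."""
    h = hex_color.lstrip("#")
    if len(h) != 8:
        raise ValueError(f"expected 8 hex digits, got {hex_color!r}")
    # Split into four two-digit channels: red, green, blue, alpha.
    r, g, b, a = (int(h[i:i + 2], 16) for i in (0, 2, 4, 6))
    # The alpha byte is 0-255 in hex notation but 0-1 in rgba() notation.
    return f"rgba({r},{g},{b},{round(a / 255, 3)})"


print(hex8_to_rgba("#00d26a40"))  # rgba(0,210,106,0.251)
```

The alpha byte 0x40 is 64 out of 255, which is why the result is 0.251 rather than the hand-rounded 0.25.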

Plotly Dash: When You Need Real Reactivity

The Call Center Analytics project needed something Streamlit could not deliver cleanly: a five-page dashboard where filtering by date range on page one should update the agent leaderboard on page three, and where clicking a row in a table should drill into that agent's trend line.

That is callbacks, and callbacks are Dash's native model.

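import plotly.express as px
import plotly.graph_objects as go
from dash import Input, Output, State, callback

# `df` is assumed to be a module-level DataFrame of daily agent metrics.

@callback(
    Output("agent-trend", "figure"),
    Input("agent-table", "active_cell"),   # fires when a cell is clicked
    State("agent-table", "data"),          # read without triggering the callback
)
def update_agent_trend(active_cell, table_data):
    if not active_cell:
        return go.Figure()  # nothing selected yet: render an empty figure
    agent = table_data[active_cell["row"]]["agent_name"]
    dff = df[df["agent_name"] == agent]
    return px.line(dff, x="date", y="calls_resolved", title=f"{agent} -- Daily Resolution")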

The Input / Output / State model is explicit about data flow in a way that Streamlit's implicit re-run is not. For complex interactivity, that explicitness is a feature.
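One habit that pairs well with this model is keeping the lookup logic in a plain function, so the callback stays thin and the logic can be tested without a running Dash server. The sketch below is illustrative: agent_from_active_cell is a hypothetical helper, and it assumes the row index is relative to the rows passed in, so with paging or sorting enabled you would likely want to pass the rows currently displayed rather than the full dataset.

```python
from typing import Any, Optional


def agent_from_active_cell(
    active_cell: Optional[dict[str, Any]],
    table_data: list[dict[str, Any]],
) -> Optional[str]:
    """Return the agent_name for the clicked cell, or None if nothing usable was clicked."""
    if not active_cell:
        return None
    row = active_cell.get("row")
    # Guard against a stale index, e.g. the table was filtered after the click.
    if not isinstance(row, int) or not 0 <= row < len(table_data):
        return None
    return table_data[row].get("agent_name")
```

Inside the callback this becomes agent = agent_from_active_cell(active_cell, table_data), followed by returning an empty figure when the result is None.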

What Dash does well: reactive multi-page apps, the DataTable component (sortable, filterable, paginated, no extra libraries), cross-filtering between charts, and fine-grained control over what triggers what.

Where it gets awkward: deployment. Dash is a Flask server. You need to manage process lifecycle, reverse proxying if you want HTTPS, and there is no equivalent of Streamlit Cloud's one-click deploy. For internal tools or Docker-hosted projects it is fine. For public-facing demos that need to be live at a URL, Streamlit Cloud wins on friction.

One gotcha with Dash Bootstrap Components: dbc.themes.DARKLY sets a dark theme, but Plotly figures still need template="plotly_dark" and paper_bgcolor set manually. The Bootstrap theme does not propagate into the Plotly canvas.
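A small shared set of layout defaults keeps this from being repeated in every figure. This is a sketch, not a Dash or Plotly feature: the names DARK_FIGURE_LAYOUT and dark_layout are mine, and the transparent backgrounds are an assumption that you want the Bootstrap page background to show through.

```python
# Layout keys that Plotly figures need when the page uses a dark Bootstrap theme.
DARK_FIGURE_LAYOUT = {
    "template": "plotly_dark",
    "paper_bgcolor": "rgba(0,0,0,0)",  # transparent, so the page background shows
    "plot_bgcolor": "rgba(0,0,0,0)",
}


def dark_layout(**overrides):
    """Return the dark defaults merged with per-figure overrides (overrides win)."""
    return {**DARK_FIGURE_LAYOUT, **overrides}
```

Each figure then gets fig.update_layout(**dark_layout(height=400)) or similar, and a single figure can still override paper_bgcolor if it needs a solid panel.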

Apache Superset: When Your Stakeholders Use the Dashboard

The Kenya Real Estate Pipeline and the Ecommerce Analytics project both ended up on Superset, and for the same reason: the expected audience was non-technical. Superset has a point-and-click chart builder. You do not need to write code to add a filter or change a chart type. A business analyst can use it without opening a terminal.

Setup is Docker Compose:

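services:
  superset:
    image: apache/superset:4.1.1
    # Assumed here so that `docker exec -it superset ...` below finds the container.
    container_name: superset
    environment:
      SUPERSET_SECRET_KEY: "change-this-in-production"
    ports:
      - "8088:8088"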

Then three commands on first run:

docker exec -it superset superset db upgrade
docker exec -it superset superset fab create-admin \
  --username admin --email admin@admin.com --password admin
docker exec -it superset superset init

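Connecting Superset to DuckDB requires duckdb-engine installed in the Superset container, and the connection string has one critical detail: use ?read_only=true. Airflow writes to the DuckDB file on a schedule and Superset reads from it. Without the read-only flag, Superset opens the file in read-write mode and holds a lock that collides with the Airflow writer, which shows up as file lock errors mid-query. Read-only mode does not make simultaneous access safe, though: DuckDB allows only one read-write process on a file at a time, so a dashboard query can still fail while a pipeline run has the file open for writing.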

duckdb:////app/data/analytics.duckdb?read_only=true
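The four slashes are easy to get wrong: three belong to the URI scheme and the fourth is the leading slash of the absolute path. A tiny builder makes that explicit. This is a sketch; duckdb_sqlalchemy_uri is a hypothetical helper name, and it only handles absolute POSIX paths like the one inside the container.

```python
from pathlib import PurePosixPath


def duckdb_sqlalchemy_uri(db_path: str, read_only: bool = True) -> str:
    """Build a duckdb-engine style SQLAlchemy URI for an absolute file path."""
    path = PurePosixPath(db_path)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got {db_path!r}")
    # "duckdb:///" plus "/app/..." yields the four-slash form.
    uri = f"duckdb:///{path}"
    if read_only:
        uri += "?read_only=true"
    return uri


print(duckdb_sqlalchemy_uri("/app/data/analytics.duckdb"))
```

The printed value is what goes into the SQLAlchemy URI field when adding the database in Superset.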

What Superset does well: chart library is deep (30+ chart types including time-series, heatmaps, treemaps, geospatial), SQL Lab for ad-hoc queries, role-based access control, and it looks polished out of the box.

Where it gets awkward: it is heavy. Seven Docker services minimum. Initial build takes several minutes. The Explore page loses unsaved changes on refresh (no autosave). And the DuckDB integration is second-class compared to PostgreSQL -- time grains do not work with DuckDB's date_trunc, so you end up writing custom SQL for what should be a dropdown option.

For PostgreSQL backends, Superset is near-perfect. For DuckDB, use it with that read-only caveat and accept the limitations.

Evidence.dev: When the Story Is in the Data

Evidence.dev takes a different approach from everything else on this list. You write SQL query blocks directly in Markdown files, and the results become available as variables that feed components:

```sql debt_trend
SELECT year, govt_debt_pct_gdp, interest_pct_revenue
FROM gold.fiscal_summary
WHERE country = 'KEN'
ORDER BY year
```

<LineChart data={debt_trend} x="year" y="govt_debt_pct_gdp" />

Kenya's debt-to-GDP ratio reached **{debt_trend[debt_trend.length-1].govt_debt_pct_gdp.toFixed(1)}%** in {debt_trend[debt_trend.length-1].year}.

The key insight is that the narrative and the data live in the same file. You write around the numbers. This is the right model for analytical reports -- fiscal briefings, quarterly reviews, data quality documentation -- where the goal is communication, not exploration.

I used Evidence.dev on three projects: BizPulse Kenya (weekly sentiment briefing), Kenya Economic Pulse (macro indicator report), and LedgerSync (fiscal reconciliation audit report). All three had the same shape: a data engineering pipeline produced the numbers, and Evidence.dev turned those numbers into a readable document with charts embedded in the prose.

What Evidence.dev does well: the SQL-in-Markdown model is genuinely fast for static reports, Svelte under the hood means the output is a fast static site, and deploy to Vercel is one command.

The production gotcha. Evidence.dev 40.x has a broken dev server. Running npm run dev throws a lodash ESM error:

Error [ERR_REQUIRE_ESM]: require() of ES Module .../lodash-es/lodash.js

The fix is to skip the dev server entirely and use build-and-serve:

npm run build && npx serve build

This is not documented prominently. It took me longer to find than it should have. The dev server issue is a known regression in the 40.x line.

Where Evidence.dev gets awkward: interactivity is limited. Dropdown filters and date pickers exist, but anything complex requires writing Svelte components. If your users need to explore the data -- not just read a pre-built narrative -- use one of the other tools.

Grafana: When Your Data Is Already in Postgres and You Need Operational Monitoring

The Saruk Electronics Tracker pipeline writes daily price history to PostgreSQL. The natural choice for monitoring that kind of time-series operational data is Grafana. Every chart is a SQL query against the database. The panels auto-refresh. There is no Python to write.

-- One point per day and category. Grafana's time-series format expects a column named time.
SELECT
    DATE_TRUNC('day', scraped_at) AS time,
    category,
    AVG(price_kes) AS avg_price
FROM price_history
WHERE scraped_at BETWEEN $__timeFrom() AND $__timeTo()
GROUP BY 1, category
ORDER BY 1

The $__timeFrom() and $__timeTo() macros are Grafana's time range variables. The time range picker in the dashboard header drives them automatically.

The Grafana 13 PostgreSQL gotcha. This one cost me hours. Grafana 13 rewrote the PostgreSQL plugin as grafana-postgresql-datasource, and it now requires the database name in jsonData.database in addition to the top-level database field. The health check passes either way. The error only surfaces when a panel runs its first query:

You do not currently have a default database configured.

The fix is in your provisioning YAML:

datasources:
  - name: PostgreSQL
    type: grafana-postgresql-datasource
    uid: postgres-ds
    url: postgres:5432
    database: analytics
    jsonData:
      database: analytics   # this line is the fix
      sslmode: disable
      postgresVersion: 1500

The second gotcha: do not use template variable substitution for the datasource UID in provisioned dashboards. ${DS_POSTGRESQL} does not resolve correctly when dashboards are loaded from a provisioning directory. Hardcode the UID in every panel's datasource field instead.
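Hardcoding the UID by hand across dozens of panels gets tedious, so a short script can do the substitution on an exported dashboard before it goes into the provisioning directory. This is a sketch under a few assumptions: pin_datasource_uid is a hypothetical helper, the placeholder string is the one Grafana wrote into the export, and the sample dashboard below is a trimmed stand-in for a real one.

```python
import json
from typing import Any


def pin_datasource_uid(node: Any, placeholder: str, uid: str) -> Any:
    """Recursively replace every string equal to `placeholder` with a fixed UID."""
    if isinstance(node, dict):
        return {key: pin_datasource_uid(value, placeholder, uid) for key, value in node.items()}
    if isinstance(node, list):
        return [pin_datasource_uid(item, placeholder, uid) for item in node]
    if node == placeholder:
        return uid
    return node


# Trimmed, made-up dashboard: the placeholder appears at panel and target level.
dashboard = {
    "title": "Price Tracker",
    "panels": [
        {
            "datasource": {"type": "grafana-postgresql-datasource", "uid": "${DS_POSTGRESQL}"},
            "targets": [{"datasource": {"uid": "${DS_POSTGRESQL}"}, "rawSql": "SELECT 1"}],
        }
    ],
}

pinned = pin_datasource_uid(dashboard, "${DS_POSTGRESQL}", "postgres-ds")
print(json.dumps(pinned, indent=2))
```

The function returns a new structure instead of mutating the input, so the original export stays intact if you want to diff the two.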

What Grafana does well: time-series visualization is its native language, alerting is built-in, the dashboard JSON is version-controllable, and it runs on almost nothing resource-wise compared to Superset.

Where it gets awkward: non-time-series reports feel forced. Grafana is optimized for "how is this metric behaving over time." For cross-sectional analysis, category breakdowns, or anything that looks like a report rather than a monitoring panel, the other tools are better.

The Decision Framework

After 12 projects, here is how I think about the choice:

Who uses the dashboard?

You (the engineer) or a technical teammate: Streamlit or Grafana
A business analyst who needs to build their own charts: Superset
An exec or stakeholder reading a report: Evidence.dev
Someone who needs complex cross-filtering: Dash

What does the underlying data look like?

Time-series operational data in PostgreSQL: Grafana
Analytical mart tables in DuckDB: Streamlit or Superset (read-only)
Aggregate report data: Evidence.dev
Complex relational data requiring SQL exploration: Superset SQL Lab

Where does it need to run?

Publicly accessible URL, no server to manage: Streamlit Cloud or Evidence.dev on Vercel
Internal tool, Docker Compose is fine: any of them
Embedded in another application: Dash (Flask) or Streamlit (embeddable via iframe)

How long do you have?

Under a day: Streamlit
A few days, complex interactivity required: Dash
A few days, non-technical audience: Superset
Writing a data narrative: Evidence.dev

What I Would Change

If I were starting over, I would reach for Streamlit by default earlier and stop second-guessing it. It handles 80% of dashboard requirements and deploys in minutes. The re-run model is a constraint, but it is a constraint you work around once and then forget.

I would also set up the hex_to_rgba helper and the Plotly 6 compatibility checks at the start of every project instead of discovering them mid-build. Most of the Plotly 6 changes are not loud. They silently produce wrong output. That is the worst kind of bug in a visualization layer.

Evidence.dev is underused in the data engineering community. If you are building an end-of-sprint data summary, a pipeline audit report, or any kind of structured analytical document, it is faster than any of the other tools for that use case. The SQL-in-Markdown model is genuinely good.

Grafana is the right choice exactly when you are already using PostgreSQL and you need time-series monitoring. Outside that narrow case, the ergonomics work against you.

Superset is the right choice when the audience is non-technical and you need a real chart builder. The Docker setup cost is real but one-time. After that, analysts can build their own views without bothering you.

The code patterns behind all of these -- caching strategies, Plotly 6 compatibility, the WB API direct-request pattern, DuckDB connection strings, Evidence.dev build workarounds, Grafana 13 provisioning -- are in my BI and Data Analysis cheatsheet along with the rest of my reference docs.

If you have questions about any of these tools or want to see the full pipeline code, the repos are all public on my GitHub. Follow me on dev.to for more articles from real data engineering projects.