Bruno Enrique ANCCO SUAÑA

Posted on Sep 14

📊Beyond the Standard: Exploring Modern Python Visualization Tools

#python #devops #datascience #tutorial

In the world of data science, moving from a static Jupyter notebook to a dynamic, interactive web application is a game-changer. It allows stakeholders to explore data, test hypotheses, and gain insights on their own. While tools like Tableau or Power BI have their place, a code-first approach using Python offers unparalleled flexibility and power.

This article dives into three powerful Python libraries for building dashboards and reports: Streamlit, Dash, and Bokeh. We'll explore the philosophy behind each, build a simple interactive dashboard with all three, and walk through deploying our app to the cloud, complete with a GitHub repository and CI/CD automation.

🚀 The Contenders
Let's meet our three visualization frameworks. Each has a unique approach to turning Python scripts into interactive web apps.

1. Streamlit ✨
The Pitch: The fastest way to build and share data apps.

Streamlit is the go-to for data scientists who want to create beautiful, functional apps with minimal effort and without thinking about traditional web development. Its core philosophy is simplicity. You write a Python script as you normally would, and Streamlit intelligently re-runs your code from top to bottom whenever a user interacts with a widget.
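To make the rerun model concrete, here is a toy simulation in plain Python. It does not use Streamlit. The hypothetical `script()` function stands in for an app file that re-executes on every widget interaction, and `functools.lru_cache` plays a role loosely similar to `@st.cache_data`. The real decorator also hashes arguments and returns copies of cached data, which this sketch does not model.

```python
# Toy model of Streamlit's execution model (NOT Streamlit code):
# every widget interaction re-runs the whole script, while a memoized
# loader plays a role similar to @st.cache_data in the real app.
from functools import lru_cache

load_calls = 0


@lru_cache(maxsize=None)
def load_data() -> tuple:
    """Stand-in for an expensive CSV download; the cache makes it run only once."""
    global load_calls
    load_calls += 1
    return ("Adelie", "Chinstrap", "Gentoo")


def script(selected_species: str) -> str:
    """Hypothetical stand-in for app.py, executed top to bottom on every interaction."""
    species = load_data()  # served from the cache after the first run
    shown = species if selected_species == "All" else (selected_species,)
    return f"showing {len(shown)} species"


# Simulate three widget interactions; each one triggers a full rerun.
outputs = [script(choice) for choice in ["All", "Gentoo", "All"]]
print(outputs)     # ['showing 3 species', 'showing 1 species', 'showing 3 species']
print(load_calls)  # 1
```

The expensive load runs once while the cheap filtering runs three times, which is why caching matters so much under this execution model.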

Pros:

✅Ridiculously easy to learn.
✅Minimal boilerplate code.
✅Automatic updates on widget interaction.
✅A rich ecosystem of custom components.

Cons:

❌The "rerun everything" model can be inefficient for very complex or long-running apps.
❌Less control over the fine-grained layout and styling compared to others.

2. Plotly Dash 📊
The Pitch: Build analytical web apps for Python. No JavaScript required.

Dash, created by the team behind the Plotly charting library, is a powerful framework for building production-ready, enterprise-grade applications. It provides a more structured "blank canvas" where you define the layout using Python classes that mimic HTML and then connect interactive components using explicit "callbacks."
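A rough sketch of what "explicit callbacks" means, written as a self-contained toy rather than with the real Dash API: each callback declares the property it writes and the properties it reads, and a change to one property re-runs only the callbacks that list it as an input. The `callback()` and `set_value()` helpers and the `"id.property"` keys are invented for illustration.

```python
# Toy model of Dash-style explicit callbacks (NOT the Dash API).
from typing import Callable

# Current value of every component property, keyed as "component-id.property".
state = {
    "species-dropdown.value": "All",
    "x-axis-dropdown.value": "bill_length_mm",
    "title.children": "",
}
# Registered callbacks as (output key, input keys, function).
callbacks: list[tuple[str, list[str], Callable[..., str]]] = []


def callback(output: str, inputs: list[str]):
    """Hypothetical decorator: register fn as computing `output` from `inputs`."""
    def register(fn):
        callbacks.append((output, inputs, fn))
        return fn
    return register


@callback("title.children", ["species-dropdown.value", "x-axis-dropdown.value"])
def update_title(species, x_axis):
    return f"{species}: {x_axis}"


def set_value(prop: str, value: str) -> None:
    """Simulate a user changing one component property."""
    state[prop] = value
    for output, inputs, fn in callbacks:
        if prop in inputs:
            # Input values are passed positionally, in declaration order.
            state[output] = fn(*(state[i] for i in inputs))


set_value("species-dropdown.value", "Gentoo")
print(state["title.children"])  # Gentoo: bill_length_mm
```

Real Dash resolves this dependency graph for you; the point here is only that the wiring is declared up front instead of being inferred by re-running the whole script.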

Pros:

✅Highly customizable and flexible layouts.
✅Scalable for complex, multi-page applications.
✅Excellent for apps requiring precise state management.
✅Part of the robust Plotly ecosystem.

Cons:

❌Steeper learning curve with more boilerplate code.
❌Requires a deeper understanding of concepts like layouts and callbacks.

3. Bokeh 🎨
The Pitch: Interactive visualizations for modern web browsers.

Bokeh is, at its core, a visualization library, but it comes with a powerful server component that allows you to build full-fledged interactive applications. It excels at handling large datasets and streaming data efficiently. Its strength lies in the granular control it gives you over every plot element and its powerful data linking and selection tools.
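Bokeh's streaming support centres on `ColumnDataSource.stream(new_data, rollover=N)`, which appends new rows and, if `rollover` is set, discards the oldest ones, so only the new data has to be sent to the browser rather than the whole dataset. The sketch below models just the column bookkeeping in plain Python; its `stream()` function is a hypothetical stand-in, not Bokeh code.

```python
# Toy model of streaming into a columnar data source with a rollover limit,
# mimicking the idea behind ColumnDataSource.stream(new_data, rollover=N).


def stream(data: dict[str, list], new_data: dict[str, list], rollover: int) -> dict[str, list]:
    """Append new rows to every column, then keep only the last `rollover` rows."""
    if set(new_data) != set(data):
        raise ValueError("new_data must provide exactly the same columns")
    if len({len(col) for col in new_data.values()}) != 1:
        raise ValueError("all new columns must have the same length")
    return {name: (data[name] + new_data[name])[-rollover:] for name in data}


data = {"t": [0, 1, 2], "value": [10.0, 11.5, 9.8]}
data = stream(data, {"t": [3, 4], "value": [12.1, 12.4]}, rollover=4)
print(data)  # {'t': [1, 2, 3, 4], 'value': [11.5, 9.8, 12.1, 12.4]}
```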

Pros:

✅Excellent for high-performance interactivity on large datasets.
✅Provides a high level of control over plot design and interactions.
✅Can be used as a standalone library or with its server for full apps.

Cons:

❌Can be more verbose for creating full dashboard layouts compared to Streamlit.
❌The API can feel less "Pythonic" initially than Streamlit's.

🐧 The Demo Project: Palmer Penguins Explorer
To compare these tools, we'll build the same simple app in all three: an interactive scatter plot explorer for the famous Palmer Penguins dataset. Users will be able to select the species to display and choose the variables for the X and Y axes.

First, let's install our libraries:

pip install streamlit pandas plotly dash bokeh

1. Streamlit Example
Notice how clean and readable this is. We use st.sidebar to place our widgets and the main area for the plot. The code reads like a simple script.

streamlit_app/app.py

import streamlit as st
import pandas as pd
import plotly.express as px

# --- Page Configuration ---
# Call set_page_config first: many Streamlit versions reject it
# once any other st.* command has already run
st.set_page_config(
    page_title="🐧 Palmer Penguins Explorer",
    page_icon="🧊",
    layout="centered"
)

# Load the dataset
@st.cache_data # Cache the data so it is downloaded once, not on every rerun
def load_data():
    df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv')
    df.dropna(inplace=True)
    return df

df = load_data()

st.title("🐧 Palmer Penguins Explorer")
st.markdown("Explore the Palmer Penguins dataset using **Streamlit**.")

# --- Sidebar for User Inputs ---
st.sidebar.header("📊 Chart Controls")

# Species selector
species_list = ['All'] + sorted(df['species'].unique().tolist())
selected_species = st.sidebar.selectbox("Select Species", species_list)

# Axis selectors
numeric_columns = df.select_dtypes(include=['float64', 'int64']).columns.tolist()
x_axis = st.sidebar.selectbox("Select X-axis", numeric_columns, index=numeric_columns.index('bill_length_mm'))
y_axis = st.sidebar.selectbox("Select Y-axis", numeric_columns, index=numeric_columns.index('bill_depth_mm'))

# --- Data Filtering ---
if selected_species != 'All':
    filtered_df = df[df['species'] == selected_species]
else:
    filtered_df = df

# --- Display Chart ---
st.subheader(f"Scatter Plot: {x_axis} vs. {y_axis}")

if not filtered_df.empty:
    fig = px.scatter(
        filtered_df,
        x=x_axis,
        y=y_axis,
        color='species',
        hover_name='species',
        title=f'Relationship between {x_axis} and {y_axis}'
    )
    st.plotly_chart(fig, use_container_width=True)
else:
    st.warning("No data available for the selected species.")

st.markdown("---")
st.write("Data Source:")
st.dataframe(filtered_df.head())
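The species filter and the numeric-column lookup recur in all three apps. One option, sketched here under the assumption that you are happy to add a small shared module, is to factor them into plain pandas functions (`filter_species` and `numeric_columns` are hypothetical names) so they can be tested without starting any UI. The sketch uses `select_dtypes(include="number")`, which matches every numeric dtype rather than only `float64` and `int64`.

```python
from typing import Optional

import pandas as pd


def filter_species(df: pd.DataFrame, species: Optional[str]) -> pd.DataFrame:
    """Return every row for 'All' (or an empty selection), otherwise one species."""
    if species is None or species == "All":
        return df
    return df[df["species"] == species]


def numeric_columns(df: pd.DataFrame) -> list:
    """List the columns that can sensibly go on a scatter-plot axis."""
    return df.select_dtypes(include="number").columns.tolist()


# A tiny inline sample using the same column names as the penguins CSV.
sample = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.1, 46.1, 50.0],
    "body_mass_g": [3750, 4500, 5700],
})
print(len(filter_species(sample, "Gentoo")))  # 2
print(numeric_columns(sample))  # ['bill_length_mm', 'body_mass_g']
```

Each app would then call these helpers inside its own widget or callback code, keeping the framework-specific parts thin.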

2. Dash Example
Dash is more structured. We define a static layout and then create a @app.callback function that listens for changes to the inputs (species-dropdown, x-axis-dropdown, y-axis-dropdown) and updates the output (penguin-scatter-plot).

dash_app/app.py

import dash
from dash import dcc, html, Input, Output
import pandas as pd
import plotly.express as px

# Load the dataset
df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv')
df.dropna(inplace=True)

# --- App Initialization ---
app = dash.Dash(__name__)
server = app.server # Expose server for deployment

numeric_columns = df.select_dtypes(include=['float64', 'int64']).columns.tolist()
species_list = [{'label': 'All', 'value': 'All'}] + [{'label': s, 'value': s} for s in sorted(df['species'].unique())]

# --- App Layout ---
app.layout = html.Div(style={'fontFamily': 'sans-serif'}, children=[
    html.H1("🐧 Palmer Penguins Explorer (Dash)", style={'textAlign': 'center'}),
    html.P("Explore the Palmer Penguins dataset using Plotly Dash.", style={'textAlign': 'center'}),

    html.Div(style={'display': 'flex', 'padding': '20px'}, children=[
        # Controls Div
        html.Div(style={'width': '25%', 'paddingRight': '20px'}, children=[
            html.H3("📊 Chart Controls"),
            html.Label("Select Species"),
            dcc.Dropdown(id='species-dropdown', options=species_list, value='All'),
            html.Br(),
            html.Label("Select X-axis"),
            dcc.Dropdown(id='x-axis-dropdown', options=[{'label': i, 'value': i} for i in numeric_columns], value='bill_length_mm'),
            html.Br(),
            html.Label("Select Y-axis"),
            dcc.Dropdown(id='y-axis-dropdown', options=[{'label': i, 'value': i} for i in numeric_columns], value='bill_depth_mm'),
        ]),

        # Graph Div
        html.Div(style={'width': '75%'}, children=[
            dcc.Graph(id='penguin-scatter-plot')
        ])
    ])
])

# --- Callback for Interactivity ---
@app.callback(
    Output('penguin-scatter-plot', 'figure'),
    [Input('species-dropdown', 'value'),
     Input('x-axis-dropdown', 'value'),
     Input('y-axis-dropdown', 'value')]
)
def update_graph(selected_species, x_axis, y_axis):
    if selected_species == 'All' or selected_species is None:
        filtered_df = df
    else:
        filtered_df = df[df['species'] == selected_species]

    fig = px.scatter(
        filtered_df,
        x=x_axis,
        y=y_axis,
        color='species',
        hover_name='species',
        title=f'Relationship between {x_axis} and {y_axis}'
    )
    fig.update_layout(transition_duration=500)
    return fig

# --- Run the App ---
if __name__ == '__main__':
    # app.run replaces app.run_server, which was deprecated and then removed in Dash 3
    app.run(debug=True)

3. Bokeh Example
Bokeh requires us to think more about data sources and how glyphs (like circles) are updated. Here, we create a ColumnDataSource and a callback function that modifies the data within this source when a widget changes.

bokeh_app/app.py

import pandas as pd
from bokeh.plotting import figure, curdoc
from bokeh.models import ColumnDataSource, Select
from bokeh.layouts import column, row
from bokeh.palettes import Category10_3

# Load the dataset
df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv')
df.dropna(inplace=True)

# Map species to colors
species_unique = sorted(df['species'].unique())
color_map = {species: Category10_3[i] for i, species in enumerate(species_unique)}
df['color'] = df['species'].map(color_map)

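# Create an empty ColumnDataSource with the columns the glyph and tooltips use;
# update_plot() fills it in before the document is served
source = ColumnDataSource(data=dict(x=[], y=[], species=[], color=[]))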

# --- Create Widgets ---
numeric_columns = df.select_dtypes(include=['float64', 'int64']).columns.tolist()
species_list = ['All'] + species_unique

x_axis_select = Select(title="Select X-axis", value="bill_length_mm", options=numeric_columns)
y_axis_select = Select(title="Select Y-axis", value="bill_depth_mm", options=numeric_columns)
species_select = Select(title="Select Species", value="All", options=species_list)

# --- Create the Plot ---
p = figure(height=500, width=700, title="Penguin Scatter Plot", tooltips=[("Species", "@species"), ("X", "@x"), ("Y", "@y")])
# scatter() draws circle markers by default; circle() with size= is deprecated since Bokeh 3.4
p.scatter(x="x", y="y", source=source, size=10, color="color", legend_field="species")
p.xaxis.axis_label = x_axis_select.value
p.yaxis.axis_label = y_axis_select.value

# --- Define the Callback ---
def update_plot(attr, old, new):
    # Filter data based on species selection
    if species_select.value == 'All':
        filtered_df = df
    else:
        filtered_df = df[df.species == species_select.value]

    # Update source data
    source.data = {
        'x': filtered_df[x_axis_select.value],
        'y': filtered_df[y_axis_select.value],
        'species': filtered_df['species'],
        'color': filtered_df['color']
    }

    # Update axis labels
    p.xaxis.axis_label = x_axis_select.value
    p.yaxis.axis_label = y_axis_select.value

# Attach the callback to the 'value' property of each widget
for widget in [x_axis_select, y_axis_select, species_select]:
    widget.on_change('value', update_plot)

# --- Arrange Layout ---
controls = column(species_select, x_axis_select, y_axis_select, width=200)
layout = row(controls, p)

# Initialize the plot data
update_plot(None, None, None) 

curdoc().add_root(layout)
curdoc().title = "🐧 Palmer Penguins Explorer (Bokeh)"
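Bokeh calls every `on_change` callback with three arguments: the property name, its old value, and its new value. The toy class below (plain Python, with a hypothetical `ToySelect` standing in for `Select`) mimics that protocol to show what `update_plot` receives. Because `update_plot` ignores its arguments and reads the widgets directly, calling `update_plot(None, None, None)` is a valid way to populate the plot at startup.

```python
# Toy model of the observer pattern behind widget.on_change('value', callback).
# Plain Python, not Bokeh.
from typing import Any, Callable


class ToySelect:
    """Hypothetical stand-in for a Select widget with one observable 'value'."""

    def __init__(self, value: str) -> None:
        self._value = value
        self._callbacks: dict[str, list[Callable[[str, Any, Any], None]]] = {}

    def on_change(self, attr: str, callback: Callable[[str, Any, Any], None]) -> None:
        """Register a callback for changes to the named attribute."""
        self._callbacks.setdefault(attr, []).append(callback)

    @property
    def value(self) -> str:
        return self._value

    @value.setter
    def value(self, new: str) -> None:
        old, self._value = self._value, new
        if old != new:  # this toy only notifies on an actual change
            for cb in self._callbacks.get("value", []):
                cb("value", old, new)


events = []
select = ToySelect("bill_length_mm")
select.on_change("value", lambda attr, old, new: events.append((attr, old, new)))
select.value = "body_mass_g"
select.value = "body_mass_g"  # same value again: no callback fires
print(events)  # [('value', 'bill_length_mm', 'body_mass_g')]
```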

📦GitHub Repository
A working implementation is available here:
🔗Other Visualization Tools
🚀Deploy

✅Conclusion: Which Tool Should You Use?

Choose Streamlit if: You are a data scientist who needs to build a beautiful, interactive tool quickly. You value simplicity and speed over granular control. Perfect for prototypes, internal tools, and ML model demos.
Choose Dash if: You are building a complex, production-ready application that requires a specific layout, multi-page functionality, and robust state management. You are comfortable with more boilerplate and a callback-driven architecture.
Choose Bokeh if: Your primary need is highly interactive, high-performance plotting, especially with large or streaming datasets. You want fine-grained control over your visualizations and are comfortable building the UI around them.

DEV Community
