The Catalyst: Chasing KPIs Without an Enterprise License
We’ve all been there: SonarCloud gives you fantastic insights for a single repository, but the moment your engineering manager asks for a "bird's-eye view" of security debt and code coverage across 50 microservices, things get messy.
My journey started exactly there. I needed to provide weekly auditing KPIs, but our organization didn't have access to the built-in PDF reporting reserved for SonarCloud Enterprise plans. I refused to compile these reports manually every Friday.
What started as a quick-and-dirty proof of concept (PoC) to pull a few metrics evolved into a full-fledged internal tool. However, taking a Streamlit app from a simple script to an enterprise-grade, performant application introduced unique hurdles—specifically around memory limits, storage coupling, and corporate authentication.
Here is a deep dive into the architecture, the trade-offs I made, and how I tackled memory bloat and separation of concerns.
1. Moving Beyond the Monolith: Architectural Trade-offs
Streamlit is famous for fast prototyping, but a single, monolithic app.py script becomes unmanageable quickly in a production environment. To prioritize maintainability and team collaboration, I adopted a modular folder structure that mimics a traditional MVC (Model-View-Controller) framework.
Design Decision: By strictly separating routing (app.py), business logic (data_service.py), and presentation (dashboard_view.py), the application becomes highly testable. The trade-off is a slightly higher initial cognitive load compared to standard Streamlit scripts, but the return on investment for long-term maintainability is immense.
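For orientation, here is a simplified sketch of that layout. The file names come from the post itself; the exact tree and folder names are my illustration, not the repository's verbatim structure:

```
src/dashboard/
├── app.py                 # Routing / orchestration ("controller")
├── services/
│   └── data_service.py    # Business logic: SonarCloud API calls, aggregation ("model")
├── views/
│   └── dashboard_view.py  # Presentation: charts, tables, layout ("view")
└── database/
    └── factory.py         # Storage factory (section 3 below)
```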
2. Tackling Out-of-Memory (OOM) Errors: Parquet Compression
Retrieving massive datasets—thousands of SonarCloud metric rows spanning months of history—quickly bloats Streamlit's Session State and RAM usage. In my early PoC, appending historical data for multiple projects often led to container restarts due to memory exhaustion.
The Optimization: Instead of storing raw Pandas DataFrames in memory, I implemented a mechanism to compress DataFrames into Parquet binaries before storing them in Streamlit's Session State. During UI rendering, the active data is rapidly decompressed on the fly, and the uncompressed DataFrame is explicitly deleted from memory the moment it is no longer needed.
# In our main orchestration file (app.py):
if 'metrics_data_parquet' in st.session_state:
    # 1. Decompress the lightweight binary on the fly
    metrics_data = decompress_from_parquet(st.session_state['metrics_data_parquet'])
    if not metrics_data.empty:
        # Render the UI components
        display_dashboard(metrics_data, [data_project], projects, data_branch)
    # 2. Explicitly release the uncompressed DataFrame from memory
    del metrics_data
Trade-off: This introduces a slight CPU overhead for decompression on each rerun, but it stabilizes memory consumption, preventing orphaned DataFrames from lingering between reruns and keeping RAM usage predictable under heavy load.
3. The Storage Factory Pattern: Decoupling the Database
I started with Azure Table Storage as the backend, but I wanted to avoid tight coupling. If we ever needed to migrate to PostgreSQL or MongoDB, I didn't want to rewrite the service layer.
I implemented a Storage Factory Pattern in database/factory.py.
# The service layer requests a client, oblivious to the underlying Azure implementation
storage = get_storage_client()
# Later, we pass the storage client into our decoupled data service:
st.session_state['metrics_data_parquet'] = fetch_metrics_data(projects, days, branch, storage)
Why this matters: Beyond future-proofing the application, this abstraction dramatically improves the developer experience. Engineers can run Azurite (a local emulator for Azure Storage) via Docker Compose, entirely bypassing the need for cloud credentials during local feature development.
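The pattern boils down to coding the service layer against an abstract interface and letting the factory pick the concrete backend. The sketch below is my illustration of that shape: the class names, method signatures, and the in-memory backend are hypothetical stand-ins for the real Azure Table Storage and Azurite clients:

```python
from abc import ABC, abstractmethod


class StorageClient(ABC):
    """Minimal storage interface the data service depends on."""

    @abstractmethod
    def upsert_entity(self, table: str, entity: dict) -> None: ...

    @abstractmethod
    def query_entities(self, table: str, filter_expr: str = "") -> list[dict]: ...


class InMemoryStorageClient(StorageClient):
    """Toy backend, standing in for AzureTableStorageClient / an Azurite-backed client."""

    def __init__(self) -> None:
        self._tables: dict[str, list[dict]] = {}

    def upsert_entity(self, table: str, entity: dict) -> None:
        self._tables.setdefault(table, []).append(entity)

    def query_entities(self, table: str, filter_expr: str = "") -> list[dict]:
        # Filtering elided; a real backend would translate filter_expr
        return list(self._tables.get(table, []))


def get_storage_client(backend: str = "memory") -> StorageClient:
    """Factory: resolve a backend name to a concrete client."""
    if backend == "memory":
        return InMemoryStorageClient()
    raise ValueError(f"Unknown storage backend: {backend}")
```

Adding PostgreSQL or MongoDB later then means writing one new `StorageClient` subclass and one new branch in the factory; the service layer never changes.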
4. Securing the Application: Entra ID (Azure AD) Integration
Internal corporate metrics are sensitive. Native Streamlit doesn't offer Single Sign-On (SSO) out of the box, so basic auth wasn't going to cut it.
I integrated the Microsoft Authentication Library (msal) paired with streamlit-cookies-manager. The flow checks for a secure auth cookie; if it is missing, the app generates a random state token and redirects the user through an OAuth2 authorization code flow, with the state check mitigating Cross-Site Request Forgery (CSRF). Once authenticated, the user's profile is fetched from Microsoft Graph to populate the session.
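Two pieces of that flow can be sketched briefly. The state token is just cryptographically random material; building the authorization URL goes through msal's `ConfidentialClientApplication`. The function names and the deferred import are my illustration (the real app also wires in the cookie check via streamlit-cookies-manager, omitted here):

```python
import secrets


def generate_state_token() -> str:
    """CSRF-mitigating state value: stored server-side, echoed back by the IdP."""
    return secrets.token_urlsafe(32)


def build_auth_url(client_id: str, tenant_id: str, redirect_uri: str, state: str) -> str:
    """Construct the OAuth2 authorization-code-flow URL via msal.

    Import is deferred so the rest of the module works without msal
    installed (e.g. in demo mode).
    """
    import msal

    app = msal.ConfidentialClientApplication(
        client_id,
        authority=f"https://login.microsoftonline.com/{tenant_id}",
    )
    return app.get_authorization_request_url(
        scopes=["User.Read"],
        state=state,
        redirect_uri=redirect_uri,
    )
```

On the redirect back, the app compares the returned `state` to the stored token before exchanging the authorization code for tokens.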
5. The "Offline" Demo Mode
One of the biggest friction points in open-source or enterprise tooling is the onboarding phase. Waiting for IT to provision API keys or database access kills momentum.
To solve this, I built a --demo-mode flag. It bypasses the MSAL authentication and cloud database dependencies entirely, injecting synthetic, locally generated SonarCloud data into the UI.
# 1. Generate synthetic data locally
python src/dashboard/demo/demo_generator.py
# 2. Run the Streamlit app passing the demo flag
streamlit run src/dashboard/app.py -- --demo-mode
Let's Build Together

What started as a hacked-together script to appease my manager has evolved into a mature, architecturally sound application. However, there is always room for optimization.
I am open-sourcing this project and actively looking for community contributions! Whether it is optimizing the async API ingestion, polishing the CSS overrides, or adding new storage backends, I'd love your input.
Clone the repo at SonarCloudDashboard, spin it up in --demo-mode in under 30 seconds, and let me know what you think. What patterns do you rely on to scale your Python dashboards? Drop a comment or open a PR!