Satoshi S. for Web Dev Path

Posted on Apr 6 • Edited on Jun 22

My Experience with Solana

#solana #python #mlh #data

Intro

I’ll start with an update on my current work.

Last November, I began working with JB Music Therapy to build a music wellness application. We recently completed a large user testing phase with over 250 participants, including music therapy students. Now, we’re migrating the web application to Flutter in preparation for releasing it on the Google Play Store and Apple App Store.

Since I don’t yet have production-level mobile development experience, I’ve been actively learning through a Udemy course while working on the migration. It’s been a challenging but rewarding process.

As a contract worker, I have a flexible schedule. To take advantage of that, I decided to apply to the MLH Fellowship again. I’ve participated in the program three times before, and each time I’ve gained valuable skills and experience.

This time, I was accepted into the Web3 track and assigned to work with the Solana Foundation team. In this post, I’ll share my experience working with Solana.

If you're interested in my previous experiences in the program, please check the posts below.

Project We Were Assigned To

Usually, MLH Fellowship participants contribute to existing large-scale open-source projects under the guidance of core maintainers, but this time was a bit different. Our team was tasked with building a system to aggregate stablecoin metrics from multiple data providers and present them in a unified dashboard for the Solana Developer Website.

Why is this important? Because metrics can vary depending on the data provider. Differences in calculation methods, definitions, and data sources often lead to inconsistencies. Having a single place to visually compare these metrics makes it easier to understand and validate the data.

To support this, we are building a data pipeline on Databricks that pulls data from five major cryptocurrency analytics platforms: Allium, Artemis, Blockworks, DefiLlama, and Dune.

We are in the process of open-sourcing the data retrieval repository so that other developers can easily add new data providers or extend the available metrics in the future.

Explore the API

As a starting point, we decided to retrieve four core metrics.

metrics = [
  {
    'name': 'Supply',
    'unit': 'USD',
    'description': 'Total circulating supply of stablecoins on Solana, denominated in USD.'
  },
  {
    'name': 'Transfer Volume',
    'unit': 'USD',
    'description': 'Total transfer volume of stablecoins on Solana, denominated in USD.'
  },
  {
    'name': 'Transfer Count',
    'unit': 'Count',
    'description': 'Total number of stablecoin transfer transactions on Solana.'
  },
  {
    'name': 'Active Addresses',
    'unit': 'Count',
    'description': 'Number of unique addresses interacting with stablecoins on Solana.'
  }
]

We explored each API using notebooks to retrieve these metrics and understand their data structures.

During this process, we encountered several challenges, such as handling paginated responses, working with SQL-based APIs, and adapting to different data schemas across providers.
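To illustrate the pagination challenge, here is a minimal sketch that assumes a cursor-based API. The page shape (`rows`, `next_cursor`) and the `fetch_page` callable are hypothetical stand-ins, not any provider's real client.

```python
from typing import Callable, Iterator, Optional

# Assumed page shape (hypothetical): {"rows": [...], "next_cursor": str or None}
Page = dict


def iter_rows(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield rows from every page until the API stops returning a cursor."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["rows"]
        cursor = page.get("next_cursor")
        if cursor is None:
            break


# Fake two-page API with made-up values, used only to exercise the loop.
_FAKE_PAGES = {
    None: {"rows": [{"date": "2025-01-01", "value": 1.0}], "next_cursor": "p2"},
    "p2": {"rows": [{"date": "2025-01-02", "value": 2.0}], "next_cursor": None},
}


def fake_fetch_page(cursor: Optional[str]) -> Page:
    return _FAKE_PAGES[cursor]


rows = list(iter_rows(fake_fetch_page))
```

Keeping the paging loop in a generator means the calling code only ever sees a flat stream of rows, whichever pagination style a provider happens to use.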

Ingestion Pipeline

After exploring data provider APIs, we moved on to building the ingestion pipeline. We set up a database on Databricks and stored the data in a consistent format. During this process, we also began structuring the codebase with the goal of open-sourcing it in the future.
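As a rough picture of what "a consistent format" can mean, the sketch below maps two differently shaped payloads onto one row type. The `MetricRow` fields, the `normalize` helper, and the raw field names are all assumptions made for illustration, not the schema we actually stored.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetricRow:
    """Hypothetical unified row; one record per provider, metric, and day."""
    provider: str
    metric: str
    day: date
    value: float
    unit: str


def normalize(provider: str, metric: str, unit: str, raw: dict,
              date_key: str, value_key: str) -> MetricRow:
    """Map a provider-specific payload onto the shared row shape."""
    return MetricRow(
        provider=provider,
        metric=metric,
        # Keep only the YYYY-MM-DD part so dates and timestamps line up.
        day=date.fromisoformat(str(raw[date_key])[:10]),
        value=float(raw[value_key]),
        unit=unit,
    )


# Two made-up payloads with different field names and date formats.
row_a = normalize("provider_a", "Supply", "USD",
                  {"date": "2025-01-01", "supply_usd": "100.5"},
                  "date", "supply_usd")
row_b = normalize("provider_b", "Supply", "USD",
                  {"day": "2025-01-01T00:00:00Z", "value": 100.0},
                  "day", "value")
```

Once every provider lands in the same shape, comparing a metric across providers becomes a plain group-by on `metric` and `day`.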

Dashboard

Databricks provides built-in dashboarding capabilities, which we used to visualize the aggregated data. I leveraged the AI Agent to generate plots quickly, allowing us to compare metrics across providers. Databricks also provides embeddable HTML components, which we plan to integrate into the final Solana developer page.

Repository Structure Proposal

We are now halfway through the Fellowship and have started organizing our code from Databricks into a structured GitHub repository.

solana-stablecoin-benchmark/
  providers/        # contributors add providers here
  normalizers/      # shared transformation logic
  pipelines/        # ingestion + normalization orchestration
  db/               # schema definitions
  config/           # config files (logging, env templates)
  tests/            # validation and correctness
  notebooks/        # optional exploration (Databricks / local analysis)
  main.py           # pipeline entry point
  requirements.txt  # dependencies
  .env.example      # API keys template
  README.md

This structure is designed to make the project easy to extend, allowing contributors to add new data providers and metrics with minimal friction.

Midterm Conclusion

Throughout the project, effective communication has been just as important as technical implementation. Since our team is distributed across multiple time zones, we rely heavily on asynchronous communication.

To support this, I've focused on clearly documenting my work so that teammates can easily understand progress and continue development without blockers. This experience has reinforced how important well-written documentation is for team productivity.

This project has been a great opportunity to deepen my experience in data engineering and collaborate in a distributed team environment.

After the Midterm Presentation

After the midterm presentation, we decided to change the scope of the open-source project.

Initially, we planned to open-source the entire Databricks data pipeline, including data retrieval, processing, and storage.

To make the project easier to maintain and more accessible to external contributors, we decided to focus on open-sourcing only the data retrieval layer through reusable Data Provider APIs.

At the same time, we expanded the scope of the metrics themselves by adding overview and network activity metrics alongside stablecoin-specific metrics.

My Contributions

During the Fellowship, I merged 8 PRs into the data retrieval repository.

Creating Data Provider Classes

One of my main responsibilities was designing and refining the Data Provider classes used by the project.

Each provider exposed data differently, so creating a unified interface required solving several practical challenges.
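A unified interface of this kind could look roughly like the sketch below. The class and method names (`DataProvider`, `fetch_metric`, `StaticProvider`) are hypothetical and do not reflect the repository's actual classes.

```python
from abc import ABC, abstractmethod
from datetime import date


class DataProvider(ABC):
    """Hypothetical common interface that every provider would implement."""
    name: str

    @abstractmethod
    def fetch_metric(self, metric: str, start: date, end: date) -> list[dict]:
        """Return rows shaped like {"date": date, "value": float}."""


class StaticProvider(DataProvider):
    """Toy provider returning canned values in place of a real API client."""
    name = "static"

    def __init__(self, data: dict[str, dict[date, float]]):
        self._data = data

    def fetch_metric(self, metric: str, start: date, end: date) -> list[dict]:
        series = self._data.get(metric, {})
        return [
            {"date": d, "value": v}
            for d, v in sorted(series.items())
            if start <= d <= end
        ]


provider = StaticProvider({
    "Supply": {date(2025, 1, 1): 1.0, date(2025, 1, 2): 2.0, date(2025, 1, 5): 5.0},
})
result = provider.fetch_metric("Supply", date(2025, 1, 1), date(2025, 1, 2))
```

With an interface like this, adding a provider means writing one subclass, and the rest of the pipeline does not need to know which API sits behind it.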

Data integrity

We scheduled daily retrieval jobs for metrics collection, but some providers published data that was not fully finalized at the time of retrieval. This occasionally caused inconsistencies in the dashboard results.

To address this issue, I added buffer logic to allow the pipeline to re-check recently updated data before treating it as finalized.
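The idea behind the buffer can be sketched as follows, assuming a daily job that re-fetches the last few loaded days in addition to any new ones. The function name and the three-day default are illustrative assumptions, not the values used in the project.

```python
from datetime import date, timedelta


def dates_to_refresh(last_loaded: date, today: date,
                     buffer_days: int = 3) -> list[date]:
    """Return the days to fetch again plus any new days up to yesterday.

    The last `buffer_days` loaded days are treated as not yet final,
    so they are requested again on every run.
    """
    start = last_loaded - timedelta(days=buffer_days - 1)
    end = today - timedelta(days=1)
    span = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(max(span, 0))]


# Last load covered Jan 10; running on Jan 12 with a three-day buffer.
window = dates_to_refresh(date(2025, 1, 10), date(2025, 1, 12), buffer_days=3)
```

For this to correct earlier values, the load step has to overwrite existing rows for the same provider, metric, and day instead of appending duplicates.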

Query Cost and API Limitations

Some providers exposed SQL-based APIs, and certain metrics, especially active address calculations, required expensive queries with high CPU usage.

In one case, our team reached the monthly usage limit for a provider twice while testing and validating metrics. Because of these limitations, we were unable to retrieve one metric reliably within the available quota.

This experience taught me how important query optimization and cost awareness are when working with external analytics platforms.

Limited Documentation

Another challenge was that some API behavior and metric definitions were not fully documented publicly.

In several situations, my mentor needed to communicate directly with provider teams to clarify API behavior and metric calculations. This showed me that real-world engineering often involves cross-team communication in addition to technical implementation.

Adding GitHub Actions Workflows

We also needed to improve repository maintainability as the project became more collaborative.

I contributed to adding GitHub Actions workflows for linting, unit testing, and PR validation checks.

These automated checks helped ensure that contributions remained stable and maintainable as the repository evolved.

Contributing to Solana-com

Lastly, I contributed to the Solana developer website by adding a new data page that embeds our Databricks dashboard.

Since the Solana website itself is open source, I first explored the repository structure and frontend architecture before creating the PR.

Successfully integrating the dashboard into the site was especially rewarding because it connected our backend data work to a user-facing experience.

This page will go live in mid-June.

Final Conclusion

Overall, this Fellowship gave me valuable experience working on real-world data engineering problems, collaborating in a distributed open-source environment, and contributing to production-facing infrastructure.

I especially enjoyed working at the intersection of data platforms, developer tooling, and Web3 analytics.

Thanks for reading about my experience with the MLH x Solana Fellowship.

After the Fellowship

Following the MLH Fellowship, I continued contributing to the project through a contract role with the Solana Foundation team. Today, the product officially launched as Solana Data. It’s rewarding to see work that started as a Fellowship project evolve into a public-facing platform used by developers and analysts across the ecosystem.