Building a Data Analytics Platform for a Fintech: My Journey into Google Cloud

Nkwa Cloud infrastructure

I had been at Nkwa, a fintech startup focused on improving financial inclusion in Cameroon, for only about four months when I faced a critical challenge: developing a data platform that could handle everything from raw ingestion to advanced analytics and regulatory reporting. Before joining Nkwa, my professional background was primarily in treasury analysis: working with financial instruments, managing liquidity, and conducting risk assessments. I had spent a good part of my early career buried in spreadsheets, bank statements, and treasury management systems, far from the world of modern data engineering. So being handed the responsibility of building an analytics reporting platform on the cloud felt both exciting and intimidating.

To add to the pressure, our small but ambitious company needed to rapidly scale its services. We were working to provide financial products to the unbanked in Cameroon—people who had never previously had access to conventional banking services. Our mission was to bring financial empowerment to these communities. As more users signed up and our operations expanded, it became clear that we needed a robust data platform. Not just one that stored data, but one that would allow us to run analytical queries, generate product and financial reports, and comply with regulatory requirements. It also needed to be cost-effective, secure, and designed for rapid iteration.

The Starting Point: Understanding the Data Sources

One of the first steps was to understand the variety and complexity of our data sources. Nkwa’s mobile app collected a wealth of user data. This data lived in Firebase Cloud Firestore, a NoSQL database known for its ease of integration with mobile apps, real-time capabilities, and horizontal scaling. But that was only one piece of the puzzle.

We also relied heavily on APIs from our mobile payment partners: MTN Mobile Money (MoMo) and Orange Money. These services handled user transactions—deposits, withdrawals, peer-to-peer transfers—and provided data through their proprietary APIs. Initially, just reading through their documentation felt like learning a new language. Each partner had its own authentication methods, pagination rules, rate limits, and data formats. I remember many late nights combing through PDF documents and developer portals, trying to understand how to pull the right transaction data without breaking their usage limits or missing important attributes.

In addition, we integrated data from Beac (beac.int), the central bank serving Cameroon and the wider CEMAC region. This was critical for financial benchmarking. Beac provided daily exchange rates, information on savings interest rates, and other macroeconomic indicators. Such data helped us benchmark our financial products and ensure we were offering competitive and fair services, while also staying compliant with regulations. Working with Beac's datasets felt more like handling official government reports and spreadsheets than consuming a modern data source, but it was no less important.

By the end of this discovery phase, I realized that our data environment was not a single homogenous source. We had streaming data from the Nkwa app, batch data from MTN and Orange APIs (which could be polled periodically), and more static yet critical data from Beac. Each source had its unique format and refresh cycle. To support analytics, we needed a platform that could handle these disparate data flows gracefully.

Choosing the Technology Stack: Why Google Cloud?

When I joined Nkwa, I had a background in Python and SQL, mostly from my treasury analyst days where I ran SQL queries on internal financial databases and wrote Python scripts for some automation tasks. However, building a scalable data platform in the cloud was new territory. I had roughly six months to get comfortable with Google Cloud. Luckily, cloud providers today offer extensive documentation, tutorials, and community support, which helped speed up my learning.

We chose Google Cloud for a few reasons. First, the company had a strategic preference for GCP due to existing infrastructure and the ecosystem’s simplicity. Also, Google Cloud services like BigQuery, Cloud Storage, and Dataflow fit very well into the modern data analytics landscape. They are fully managed, meaning we wouldn’t have to spend hours provisioning servers or managing patches. This freed us to focus on solving data problems rather than infrastructure ones.

Laying Out the Architecture

After some brainstorming and research, we established a blueprint for our data platform. We wanted a clear flow: ingestion, transformation, storage, and then analytics. The final architecture looked something like this:
1. Data Sources Layer:
• Nkwa App (Firebase Cloud Firestore):
This provided user information, transactional metadata, and behavioral data straight from our mobile application.
• External Partners (MTN Mobile Money, Orange Money):
We accessed their APIs to retrieve transactional data. At first, this required manual scheduling and polling, but we planned to automate it.
• Beac (beac.int):
We pulled in exchange rates, savings interest rates, and regulatory benchmarks.
2. Ingestion Layer:
We decided to use Apache Kafka as our data streaming service. Kafka provided a scalable, fault-tolerant way to bring all this data into one place. For event-driven data, such as user transactions arriving in real time, pushing it into Kafka felt natural. For batch data from Beac or the partner APIs, we wrote Python scripts that fetched the data and pushed it into Kafka at predefined intervals (a minimal producer sketch appears after this list).
Integrating Kafka with GCP might seem non-traditional since Pub/Sub is the native option, but we already had some Kafka expertise and found it easier at the time. I must admit, though, that if we had started fresh or wanted purely GCP-managed services, Pub/Sub might have been the better choice.
3. Processing Layer (Google Dataflow):
Once the data landed in Kafka, it was time to process and transform it. We used Google Dataflow for this. Dataflow is a fully managed service for handling both streaming and batch processing jobs, built on Apache Beam. I found Dataflow approachable, especially since I was familiar with Python and SQL. It allowed us to write pipelines in Python and let Dataflow handle the scaling.
For example, a Dataflow job could take raw transaction logs from the app, join them with user details from Firestore exports, and enrich them with the latest exchange rates from Beac. The output would be a clean, well-structured dataset ready for analysis (a simplified Beam pipeline along these lines is sketched after this list).
4. Storage and Analytics Layer:
Cloud Storage acted as a landing area for any raw files we extracted. Some APIs provided JSON or CSV dumps; we stored them in Cloud Storage before processing. It served as a cost-effective data lake, giving us a place to keep data indefinitely while controlling costs.
BigQuery was the heart of our analytics stack. It’s a serverless data warehouse that allows running SQL queries at scale. This was perfect for me since I was used to SQL from my treasury days. In BigQuery, we could store curated datasets and run complex queries without worrying about capacity planning or indexing. It also integrated seamlessly with Cloud Storage and Dataflow.
With BigQuery, we performed aggregations to produce key metrics: user growth rates, transaction volumes, average transaction sizes, cost of service delivery, and many other KPIs that the business and regulatory bodies wanted to see.
5. Orchestration (Cloud Composer):
Building the pipelines was one thing; orchestrating them was another. We introduced Cloud Composer, a fully managed Apache Airflow service, to schedule and monitor our data workflows. Composer allowed us to run daily tasks—for instance, at 2 AM every day, fetch the Beac exchange rates and interest rates, store them in Cloud Storage, process them with Dataflow, and load the results into BigQuery. At 3 AM, run aggregation queries in BigQuery to update the reporting tables. By 8 AM, the BI dashboards would have fresh data for the management team. It sounds simple now, but it took time and testing to get these dependencies right. A stripped-down version of this daily DAG is sketched after this list.
6. BI and Reporting Tools:
Once our analytics-friendly datasets were in BigQuery, they could be accessed by various tools. Whether it was a simple SQL client, a business intelligence tool, or even a data science notebook, BigQuery served as a single source of truth. The management team could view dashboards to track user growth, the finance team could pull reports to comply with regulatory bodies, and data scientists could run more complex models to predict user churn or transaction fraud.
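
To make the ingestion layer (item 2 above) more concrete, here is a minimal sketch of the poll-and-publish pattern: a Python script fetches a page of transactions from a partner API and pushes each record into a Kafka topic. The endpoint URL, environment variable names, and topic name are placeholders of my own, not the production values; it assumes the `requests` and `kafka-python` packages.

```python
"""Minimal sketch of the batch-to-Kafka ingestion pattern described above.

The partner API endpoint, environment variables, and topic name are
illustrative placeholders, not Nkwa's real configuration.
"""
import json
import os

import requests
from kafka import KafkaProducer

# Placeholder endpoint: each partner (MTN MoMo, Orange Money) has its own
# authentication scheme, pagination rules, and rate limits.
PARTNER_API_URL = os.environ.get("PARTNER_API_URL", "https://api.example.com/transactions")
API_TOKEN = os.environ["PARTNER_API_TOKEN"]

producer = KafkaProducer(
    bootstrap_servers=os.environ.get("KAFKA_BROKERS", "localhost:9092"),
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

def poll_and_publish() -> int:
    """Fetch one page of transactions and publish each record to Kafka."""
    response = requests.get(
        PARTNER_API_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    transactions = response.json().get("transactions", [])
    for txn in transactions:
        producer.send("partner-transactions", value=txn)
    producer.flush()  # make sure everything is delivered before the script exits
    return len(transactions)

if __name__ == "__main__":
    print(f"Published {poll_and_publish()} transactions")
```

In practice this script would run on a schedule (later handled by Cloud Composer) and would also need retry logic and careful logging, which saved me countless hours of guesswork.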
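The processing layer (item 3) can be illustrated with a simplified Apache Beam pipeline of the kind Dataflow runs: read raw JSON records, enrich them with an exchange rate, and write the result to BigQuery. The bucket, project, table, and field names, and the hard-coded rate, are assumptions for the sketch, not the real pipeline.

```python
"""Simplified Beam pipeline illustrating the enrichment step described above.
Bucket, project, table, and field names are illustrative."""
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

EXCHANGE_RATE_XAF_USD = 0.0016  # placeholder; in practice taken from the Beac feed

def enrich(record: dict) -> dict:
    """Attach a USD equivalent so downstream reports can compare amounts."""
    record["amount_usd"] = round(record["amount_xaf"] * EXCHANGE_RATE_XAF_USD, 2)
    return record

def run() -> None:
    # Pass --runner=DataflowRunner, --project, --region, etc. to run on Dataflow.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadRawJson" >> beam.io.ReadFromText("gs://example-raw-bucket/transactions/*.json")
            | "ParseJson" >> beam.Map(json.loads)
            | "Enrich" >> beam.Map(enrich)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.transactions",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )

if __name__ == "__main__":
    run()
```

The same pipeline code runs locally on a small sample (which is how I tested transformations in my sandbox) and on Dataflow at full scale simply by switching the runner options.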
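Finally, the orchestration layer (item 5) boils down to an Airflow DAG running in Cloud Composer. The sketch below mirrors the 2 AM Beac refresh described above, with the task callables left as stubs; the DAG id and task names are my own placeholders.

```python
"""Sketch of the daily Cloud Composer (Airflow) workflow described above.
Task callables are stubs; the real DAG triggers the ingestion script,
the Dataflow job, and the BigQuery aggregation queries."""
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def fetch_beac_rates(**context):
    """Stub: download the Beac exchange and interest rates into Cloud Storage."""
    ...

def run_dataflow_job(**context):
    """Stub: launch the Dataflow transformation pipeline."""
    ...

def refresh_reporting_tables(**context):
    """Stub: run the BigQuery aggregation queries that feed the dashboards."""
    ...

with DAG(
    dag_id="daily_beac_refresh",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",  # 2 AM every day, as described above
    catchup=False,
) as dag:
    fetch = PythonOperator(task_id="fetch_beac_rates", python_callable=fetch_beac_rates)
    transform = PythonOperator(task_id="run_dataflow_job", python_callable=run_dataflow_job)
    aggregate = PythonOperator(task_id="refresh_reporting_tables", python_callable=refresh_reporting_tables)

    # Getting these dependencies right was most of the work.
    fetch >> transform >> aggregate
```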

Challenges and How I Overcame Them

Adapting from treasury analysis to building a data platform in four months was not without hurdles. My previous experience gave me a good grasp of data modeling and some programming basics, but understanding distributed systems, real-time data processing, and cloud-native architectures was new.
• Learning Curve on GCP:
The documentation and tutorials helped, but it was still overwhelming at times. I often started with small experiments on my personal GCP sandbox. I would spin up a Dataflow job with a small sample of data and test how transformations worked. I learned how BigQuery’s SQL syntax differed slightly from traditional SQL systems I knew. Over time, I got more confident and started to see patterns and best practices.
• Integrating Partner APIs:
Reading documentation from MTN and Orange took patience. Sometimes the docs were incomplete, and I had to open support tickets or check community forums for answers. Dealing with authentication tokens, handling rate limits, and ensuring we had the right API keys in place were all chores that required care and attention to detail. I learned to log every step of the ingestion process and keep careful track of errors. Good logging saved me countless hours of guesswork.
• Working with Beac Data:
Beac data was not exposed through a modern API; it was closer to a data feed or manual download. We had to write scripts to fetch this data, parse the sometimes messy formats, and convert them into a structured schema. It felt like going through official financial bulletins line by line, but the payoff was huge. Once this data was cleanly integrated, we could benchmark our product rates effectively.
• Scaling and Cost Management:
One of the trickier aspects was ensuring that costs didn't skyrocket as we brought more data into the system. BigQuery charges based on the amount of data processed, and Dataflow costs can add up if pipelines run continuously. We learned to optimize queries, partition BigQuery tables by date, and compress data in Cloud Storage (see the partitioning sketch after this list). Over time, we established governance practices: reviewing queries regularly, archiving older data that wasn't frequently accessed, and consolidating transformations.
• Time Pressure and the Need for Incremental Wins:
Only four months into the job, I couldn't afford to build everything perfectly from day one. Instead, I focused on delivering incremental wins. First, I set up a basic pipeline for one data source. Then I added another source, then another transformation step, and so on. Each small success boosted my confidence and helped the team trust the new platform. By the end of the four months, we had a decent system in place, and I had learned a tremendous amount along the way.
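
To show what the date-partitioning mentioned above looks like in practice, here is a sketch using the google-cloud-bigquery client: it creates a table partitioned on a date column and runs a query that filters on that column, so BigQuery only scans the partitions it needs. The project, dataset, table, and field names are illustrative, not our production schema.

```python
"""Sketch of the cost-control pattern described above: a date-partitioned
BigQuery table plus a query that filters on the partition column.
Project, dataset, table, and field names are illustrative."""
from google.cloud import bigquery

client = bigquery.Client(project="example-project")

schema = [
    bigquery.SchemaField("transaction_id", "STRING"),
    bigquery.SchemaField("amount_xaf", "NUMERIC"),
    bigquery.SchemaField("transaction_date", "DATE"),
]

table = bigquery.Table("example-project.analytics.transactions_partitioned", schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="transaction_date",  # partition by the business date of the transaction
)
client.create_table(table, exists_ok=True)

# Filtering on the partition column means only the last 30 days are scanned
# (and billed), instead of the full table.
query = """
    SELECT transaction_date, COUNT(*) AS txn_count, SUM(amount_xaf) AS volume_xaf
    FROM `example-project.analytics.transactions_partitioned`
    WHERE transaction_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY transaction_date
    ORDER BY transaction_date
"""
for row in client.query(query).result():
    print(row.transaction_date, row.txn_count, row.volume_xaf)
```

The same pattern underpins the daily KPI aggregations: reporting queries that always filter on the partition column stay fast and predictable in cost even as the table grows.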

Leveraging Existing Skills and Learning New Ones

My background in Python and SQL was a real lifesaver. Python helped in writing custom Dataflow pipelines and orchestrating tasks. SQL was crucial for querying BigQuery and understanding how to structure our tables for efficient querying. Even though I was new to cloud data engineering, my previous skill set bridged the gap. I could incrementally grow my knowledge of GCP services without feeling completely lost.

As I got more comfortable, I realized that having some frontend development skills would be beneficial, too. At first, it seemed unrelated, but understanding frontend principles allowed me to appreciate the importance of well-organized data structures for reporting and visualization tools. I took a few courses on Scrimba, an online platform that offers interactive coding tutorials. Learning frontend development basics made it easier for me to understand how the BI team would use our data. If they needed a certain data point to display on the dashboard, I knew exactly how to structure it in BigQuery to make their lives easier.

This cross-training had another benefit: it improved communication with other team members. When I talked to the frontend developers, I could speak their language a bit. When I discussed transformations with data scientists, I knew how to provide the cleanest inputs for their models. This holistic view of the data’s journey—from ingestion all the way to user-facing dashboards—made me more effective in my role.

Lessons Learned

As I reflect on those first four months at Nkwa, I realize how pivotal they were in my career. Shifting from a treasury analyst to a data platform developer in a fintech startup taught me lessons that I would carry forward:
1. Embrace Change and Uncertainty:
I stepped out of my comfort zone. The tools, the ecosystem, even the type of data I was dealing with—it all changed. But by embracing that change, I discovered a world of technology and processes that make finance more accessible.
2. Start Small and Iterate:
Instead of trying to build the perfect platform from day one, I focused on small wins. One pipeline at a time, one data source at a time. This incremental approach helped me learn faster and show progress to stakeholders.
3. Leverage Existing Skills and Build New Ones:
My Python and SQL background provided a strong foundation. On top of that, I learned GCP’s services and picked up some frontend development concepts from Scrimba. This combination made me a more versatile contributor.
4. Documentation and Communication are Key:
Reading partner API documentation carefully, asking questions on forums, and writing detailed internal documentation for my pipelines saved time and prevented mistakes.
5. Cost and Performance Considerations Matter:
At scale, even small inefficiencies become expensive. I learned to think about query optimization, data partitioning, and workflow scheduling from day one.
6. Understand the Business Context:
The data platform wasn’t just a technical feat; it served the company’s mission of financial inclusion. By understanding the business goals—such as compliance reporting to regulators and improving product offerings for the unbanked—I was able to design data models and pipelines that aligned with what our stakeholders truly needed.

Conclusion

In just a few months, I went from examining treasury reports to orchestrating a complex data analytics platform on Google Cloud. I learned how to integrate disparate data sources—Firebase Firestore for our app data, MTN and Orange Money APIs for transaction records, and Beac for macroeconomic indicators—into a cohesive system. I leveraged Apache Kafka for ingestion, Google Dataflow for transformations, and BigQuery for analytics, all orchestrated by Cloud Composer. Our BI and data science teams could then produce actionable insights, empowering the business and satisfying regulatory requirements.

It wasn’t easy, but that’s what made the journey worthwhile. The lessons I took away—both technical and personal—shaped my approach to problem-solving and teamwork. Today, that data platform continues to evolve, supporting Nkwa’s vision of bringing financial services to those who need them most. And I’m proud to have played a part in building it.
