Originally published on nfolyo blog
In nFolyo MVP Postmortem – What We Learned Building a Portfolio Tracker, we briefly introduced our tech stack and architecture. This post delves deeper into the technical decisions behind nFolyo, outlining our current architecture and its key limitations. We'll then present our approach for the upcoming v0.5.0 release, where we're restructuring our architecture with scalability and responsiveness in mind.
Architecture Overview
Before discussing any restructuring, it's crucial to understand how our services and web apps currently interact and their limitations.
The nFolyo Client App serves as our user-facing frontend. Built with Next.js and TypeScript, it uses a combination of Radix UI and custom-made components. This app performs minimal data calculations, delegating most data processing to our Python services.
nFolyo Core, built with Flask in Python, is responsible for updating user accounts. It provides importers for various brokers (currently only Interactive Brokers and Freetrade) to import activities such as trades, dividends, fees, cash transactions, and corporate actions. During import, all broker activity undergoes standardization to strip private information, eliminate differences between brokers, and ensure consistent data handling. We'll explore this process in a future post.
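To make the idea of standardization concrete, here is a minimal sketch of what a broker-independent activity record could look like. The schema, the `BROKER_A_TYPES` mapping, and the raw field names (`kind`, `ticker`, `ccy`) are all hypothetical and chosen only for illustration; they are not nFolyo's actual importer code.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from enum import Enum
from typing import Optional


class ActivityType(Enum):
    TRADE = "trade"
    DIVIDEND = "dividend"
    FEE = "fee"
    CASH = "cash"
    CORPORATE_ACTION = "corporate_action"


@dataclass(frozen=True)
class Activity:
    """Broker-independent activity record (hypothetical schema)."""
    type: ActivityType
    symbol: Optional[str]
    currency: str
    amount: Decimal
    trade_date: date


# Hypothetical mapping from one broker's labels to the shared types.
BROKER_A_TYPES = {
    "BUY": ActivityType.TRADE,
    "SELL": ActivityType.TRADE,
    "DIV": ActivityType.DIVIDEND,
    "FEE": ActivityType.FEE,
}


def normalize_broker_a(row: dict) -> Activity:
    """Copy only the fields the tracker needs. Anything else in the raw row
    (for example an account number) is dropped by never being copied."""
    return Activity(
        type=BROKER_A_TYPES[row["kind"]],
        symbol=row.get("ticker"),
        currency=row["ccy"].upper(),
        amount=Decimal(row["amount"]),
        trade_date=date.fromisoformat(row["date"]),
    )
```

Whitelisting fields, rather than deleting known private ones, means a new private column in a broker export is ignored by default.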
To compute metrics like market value, PnL, and TWRR, nFolyo Core fetches pricing and foreign exchange data from the nFolyo Finance service. nFolyo Finance serves as our price API that fetches, aggregates, and standardizes pricing data from external providers (such as Yahoo Finance), ensuring reliable data and fast retrieval.
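For readers unfamiliar with TWRR (time-weighted rate of return), here is a minimal sketch of the textbook chain-linking calculation. It assumes each cash flow lands at the start of its sub-period, and it is not nFolyo's `calculateHoldingPerformanceTWRR` implementation, which also has to handle FX and corporate actions.

```python
def twrr(start_value: float, periods: list) -> float:
    """Chain-link sub-period returns into a time-weighted return.

    periods is a list of (net_cash_flow, end_value) tuples, one per
    sub-period, with the cash flow assumed to arrive at the start of
    that sub-period.
    """
    growth = 1.0
    previous_value = start_value
    for net_cash_flow, end_value in periods:
        base = previous_value + net_cash_flow  # capital at work in this sub-period
        if base != 0:                          # skip sub-periods with nothing invested
            growth *= end_value / base
        previous_value = end_value
    return growth - 1.0


# 100 grows to 110, then 100 is deposited and the 210 grows to 231.
# Both sub-periods return 10%, so the deposit does not distort the result.
result = twrr(100.0, [(0.0, 110.0), (100.0, 231.0)])
```

Because every sub-period needs an aligned valuation and cash flow, a daily TWRR series over several years of history adds up quickly across many holdings.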
You'll notice the nFolyo Admin web app in the diagram. We've recently moved all admin functionality from the nFolyo Client App to a separate VPN-protected web app. nFolyo Admin, built with Next.js and shadcn/ui, lets us manage beta test users, handle user accounts, and monitor performance through benchmarking data.
One detail omitted from our diagram for simplicity: both nFolyo Core and Finance services write to the nFolyo Benchmarker Database, which feeds data to the benchmarking tools in our admin web app. The benchmarker is a separate Python module we wrote to help us better understand our performance bottlenecks.
What Works, and What Doesn't
Let's be honest: nFolyo is slow. There are two main reasons: we're running on low-tier Azure services, and our architecture doesn't allow for proper task parallelization. Before scaling up our infrastructure, we need to address the fundamental architectural bottlenecks. Otherwise, we'll hit the same issues with larger user volumes and more features built on a shaky foundation, making future refactoring difficult.
Dissecting the /update-account Endpoint
Let's examine the /update-account endpoint that handles account updates. It's our worst-case scenario, implemented as a monolithic function for quick MVP development.
It performs these steps:
- Collects all holding symbols and unique currency pairs from the user's holdings
- Fetches latest price/FX data from nFolyo Finance
- Updates holdings market value, PnL, and other metrics
- Aggregates holding values into themes and account metrics (market value, PnL, etc)
- For each holding:
  - Fetches historical prices and FX data
  - Calculates TWRR, market value over time, and cash inflows and outflows
- Aggregates these to themes and the overall account historical charts
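The steps above can be sketched as one sequential function. This is a simplified illustration and not the real endpoint: the holding dictionary keys and the injected callables (`get_latest`, `get_history`, `calc_series`) are hypothetical stand-ins for calls to nFolyo Finance and the performance module.

```python
from collections import defaultdict


def update_account(holdings, base_ccy, get_latest, get_history, calc_series):
    # 1. Collect symbols and the unique currency pairs that need an FX rate.
    symbols = sorted({h["symbol"] for h in holdings})
    pairs = sorted({(h["currency"], base_ccy)
                    for h in holdings if h["currency"] != base_ccy})

    # 2. Fetch the latest prices and FX rates in one round trip.
    prices, fx = get_latest(symbols, pairs)

    # 3 and 4. Value each holding, then roll values up into themes and the account.
    themes = defaultdict(float)
    for h in holdings:
        rate = 1.0 if h["currency"] == base_ccy else fx[(h["currency"], base_ccy)]
        h["market_value"] = h["quantity"] * prices[h["symbol"]] * rate
        themes[h["theme"]] += h["market_value"]
    account_value = sum(themes.values())

    # 5. The slow part: history and TWRR, one holding after another.
    series = {}
    for h in holdings:
        history = get_history(h["symbol"], h["currency"])
        series[h["symbol"]] = calc_series(h, history)

    # 6. Theme and account history can only be built once every holding is done.
    return {"account_value": account_value,
            "themes": dict(themes),
            "series": series}
```

Nothing is returned until the last line, which is why the quick results from steps 3 and 4 reach the user no sooner than the slow ones from step 5.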
As shown in the sequence diagram, there are many steps in a single call. During this time, nFolyo Core is blocked from serving other requests while the user's frontend displays the "updating your account" component.
While we've improved the component in v0.4.0 to provide more information about the current stage of the account update, it remains a long wait. We're not displaying information to the user as it becomes available (e.g., holding, theme, and account market value and PnL).
Benchmarking the Bottleneck
In the benchmarker screenshot below, you'll see /update-account taking 20 minutes to complete! While this example is particularly bad due to concurrent requests at peak concurrent users (CCU), it illustrates the worst-case scenario we need to address.
Notice that getHistoricalData takes about 1 minute 20 seconds: unacceptable, but not our biggest problem!
The performance.calculateHoldingPerformanceTWRR (111) entry is even worse, consuming 80% of the total time! This severely impacts user experience; nobody wants to wait minutes to view their account data. While some delay during initial import might be acceptable, it shouldn't be the norm.
Several factors contribute to this poor performance:
- We process holdings sequentially, in this case, 111 holdings one after another.
- The calculateHoldingPerformanceTWRR() function itself is inefficient. Pandas struggles with large datasets, especially when combining multiple tables (historical prices, historical FX data, corporate actions) for TWRR calculations.
- Theme and account TWRR calculations must wait for all holdings to complete before they can start.
Solutions: Task Splitting, Parallelization, and New Price Provider
What are we doing to address these issues?
First, we'll tackle the 80% bottleneck by parallelizing the TWRR calculations. Instead of using Python threads, we're adopting a more scalable message queue pattern. We'll split or completely convert nFolyo Core into a Celery worker, queueing tasks to Redis or RabbitMQ.
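The shape of the queue-and-worker pattern can be sketched with the standard library alone. This is a deliberate stand-in, not Celery code: `queue.Queue` plays the role of Redis or RabbitMQ, and threads play the role of worker processes, so the sketch runs without any infrastructure. It shows how work is distributed, not the speed-up, because CPU-bound work needs separate processes, which is what Celery workers provide.

```python
import queue
import threading


def run_workers(tasks, handler, num_workers=2):
    """Put every task on a queue and let num_workers consumers drain it."""
    q = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            task = q.get()
            if task is None:              # sentinel: no more work for this worker
                q.task_done()
                return
            try:
                value = handler(task)
            except Exception as exc:      # keep the worker alive on a failed task
                value = exc
            with lock:
                results[task] = value
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task in tasks:                    # the producer side: enqueue and move on
        q.put(task)
    for _ in threads:                     # one sentinel per worker
        q.put(None)
    q.join()
    for t in threads:
        t.join()
    return results
```

The producer never decides which worker handles which holding. Adding capacity only means starting more consumers, which is the property that makes the pattern attractive for 111 independent TWRR calculations.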
This approach will allow us to queue all TWRR tasks and scale up workers for faster processing when needed. Even with just two Celery workers running in parallel, we could nearly halve the time spent on TWRR calculations.
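A rough estimate shows what this means for the whole request. It assumes the 80% share from the benchmark parallelizes perfectly across workers and that queueing adds no overhead. Both assumptions are optimistic, so treat the numbers as an upper bound on the gain.

```python
def estimated_total_minutes(total_minutes, parallel_fraction, workers):
    """Amdahl-style estimate: only the parallel share shrinks with more workers."""
    serial_part = total_minutes * (1 - parallel_fraction)
    parallel_part = total_minutes * parallel_fraction / workers
    return serial_part + parallel_part


# Worst case from the benchmark: 20 minutes, about 80% of it in TWRR calculations.
for workers in (1, 2, 4, 8):
    minutes = estimated_total_minutes(20, 0.8, workers)
    print(f"{workers} worker(s): about {minutes:.1f} minutes")
```

With two workers the TWRR stage drops from about 16 to about 8 minutes, and the whole request from 20 to roughly 12. The remaining serial 20% is why parallelization is paired with the other changes below.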
Secondly, we're switching price providers to reduce the 1-minute-20-second historical price retrieval to 10–20 seconds. Combined with scheduled background price and account updates, this should reduce friction between user actions and data visibility.
We'll optimize TWRR calculation performance by exploring Polars, a performance-focused alternative to Pandas, while improving our TWRR algorithm.
Finally, we'll split the /update-account endpoint into smaller tasks (separating quick operations like price updates from slower TWRR calculations). This lets users view current market values while TWRR charts process in the background.
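A minimal sketch of that split, assuming a request handler that runs the quick stage inline and defers the slow stage. The names `handle_update_request`, `refresh_prices`, and `enqueue`, along with the `"calculate_twrr"` task name, are hypothetical; `enqueue` stands in for whatever call ends up submitting a task to the message queue.

```python
def handle_update_request(holdings, refresh_prices, enqueue):
    """Return quick results immediately and hand slow work to the queue."""
    # Quick stage: latest prices and market values, done within the request.
    market_values = refresh_prices(holdings)

    # Slow stage: one queued task per holding, processed later by workers.
    task_ids = [enqueue("calculate_twrr", h) for h in holdings]

    # The client can render market values now and track the pending tasks.
    return {"market_values": market_values, "pending_tasks": task_ids}
```

The response time then depends only on the quick stage, and the returned task IDs give the frontend something to subscribe to for progress.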
Evolving the Architecture
Let's examine the suggested architecture diagram below that accommodates these solutions. This diagram omits some services and apps (like admin app, common and benchmarker DBs) to highlight the fundamental architectural changes.
The key improvement is offloading heavy tasks to Python Celery workers. These can scale up and run in the background while keeping the UI responsive and showing data as it becomes available.
This architecture, coupled with Server-Sent Events, should allow us to provide more frequent task status updates and eliminate our current, expensive polling approach. That polling was the result of fast development for the MVP and Azure function limitations (a 45-second timeout).
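For context, a Server-Sent Events stream is plain text: each message is a set of `field: value` lines terminated by a blank line. The sketch below formats task updates that way. The event names and payload shape are assumptions for illustration, and the web framework wiring is left out.

```python
import json


def sse_event(event, data):
    """Format one Server-Sent Events message: an event name, a JSON payload
    on a single data line, and the blank line that ends the message."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def task_updates(stages):
    """Turn (event_name, payload) pairs into an SSE stream, one message each."""
    for name, payload in stages:
        yield sse_event(name, payload)


# Hypothetical updates for one account refresh.
stream = task_updates([
    ("market_values", {"account": 50.0}),
    ("twrr", {"symbol": "AAA", "done": 1, "total": 111}),
])
```

Served with the `text/event-stream` content type, a stream like this lets the browser's `EventSource` receive each update as it is produced, instead of the client asking repeatedly whether the update has finished.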
This is a suggested architecture, meaning that things are subject to change as we progress our development. At the moment we believe we'll split out a big chunk of the nFolyo Core business logic to nFolyo Core Celery Workers, starting with the heaviest tasks. nFolyo Core will remain to receive requests and orchestrate the heavier tasks.
The scheduled jobs will probably either queue tasks directly to the message queue or go through nFolyo Core endpoints. We're also exploring whether the nFolyo Client App itself should queue tasks directly. Many possibilities are open, and we'll revise our suggested architecture once they become more concrete.
Conclusion
In this post, we examined nFolyo's current architecture and tech stack, identified key bottlenecks, and highlighted the scalability challenges we're facing. We used /update-account as a case study to demonstrate where and why performance breaks down.
To address these issues, we're revising our architecture and refactoring critical services:
- Introducing a message queue to enable parallel task execution.
- Offloading heavy computations to Celery workers, with nFolyo Core orchestrating and dispatching tasks.
- Breaking up monolithic functions into smaller, async-friendly tasks, with Server-Sent Events providing updates for a more responsive UI.
- Scheduling background updates to execute heavy calculations ahead of user interaction.
- Switching to a faster, more efficient price provider.
As we roll out v0.5.0, we'll share more learnings and code details. For now, we're focused on making nFolyo faster, smoother, and ready to scale.