Creating Stock Data | building stocksimpy 3

#buildinpublic #programming #beginners #python

StockSimPy is a lightweight Python library for simple stock backtesting. The goal is to understand Pandas, experiment with stock strategies better, and create an easy-to-use alternative to more complex backtesting tools. This is part 3 of the series where I build this library in public.

After finishing the basic indicator calculation functions, I needed a way to keep track of all the stock information in an organised, reusable format. That’s where StockData comes in: it acts as a container for everything you’ll need in backtesting or simulation.

I initially thought it would be easy to code, since it only needed to hold the data and offer some simple import and export functions, but I was quite wrong. It turns out that working with data can be messy.

Data Validation

When importing stock data, you can’t assume the columns are always consistent. Strategies require the use of different features, but some fields are essential:

The tricky part, though, is naming conventions. What do I mean?

Let's take “Open” as an example; it could show up as “OPEN”, “open”, “OpeN”, “open_price”, “OpenPrice”, “openPrice”, and many other wild naming styles.

Lowercasing handles some cases, but what about the ones with “price” in the name? Then I thought: I could simply search for the substring “open” in the lowercased column name. This covers all the cases I mentioned above, but it wouldn’t work if the column is named something else entirely.
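The substring idea can be sketched roughly like this. It is a minimal illustration, not the actual StockSimPy code: the function name normalize_columns and the list of required fields (open, high, low, close, volume) are assumptions made for the example.

```python
import pandas as pd

# Assumed set of essential fields, used only for this illustration.
REQUIRED = ["open", "high", "low", "close", "volume"]


def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns to canonical lowercase names via substring matching."""
    rename = {}
    for field in REQUIRED:
        # Collect every column whose lowercased name contains the field name.
        matches = [c for c in df.columns if field in str(c).lower()]
        if not matches:
            raise ValueError(f"Missing required column: {field!r}")
        if len(matches) > 1:
            # For example, "Close" and "Adj Close" both contain "close".
            raise ValueError(f"Ambiguous columns for {field!r}: {matches}")
        rename[matches[0]] = field
    return df.rename(columns=rename)


raw = pd.DataFrame(
    {
        "OpenPrice": [1.0],
        "HIGH": [2.0],
        "low": [0.5],
        "close_price": [1.5],
        "Volume": [100],
    }
)
clean = normalize_columns(raw)
print(list(clean.columns))
```

Raising on ambiguous matches is one possible design choice here. Substring matching is greedy, so failing loudly seems safer than silently picking the wrong column.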

A more comprehensive approach might be to create a full-blown synonym-matching system. But that might be overkill for now. Still, I might add it as a feature in the future if somebody requests it.

Data Import

The most important feature of StockData is importing data—without that, it’s just an empty shell.

I was quite skeptical about creating these import functions at first. I considered leaving import up to the user — just pass in a Pandas DataFrame — but having built-in loaders felt more convenient. So far, StockData supports imports from:

SQLite
CSV
Excel
Pandas DataFrame
Python dictionary
JSON

(This process felt quite repetitive as I was just using built-in pandas functions or just straight-up copying documentation.)

To simplify things, I added an auto_loader() function that picks the correct import based on the file extension of the source parameter. I used **kwargs so users can pass additional parameters through to the underlying loader.
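Here is a rough sketch of how extension-based dispatch like this could look. The real auto_loader() may differ: the _READERS mapping, the exact signature, and the error handling are assumptions for illustration, and only a few of the supported formats are covered.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Map file extensions to pandas readers (a subset, for illustration only).
_READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".xlsx": pd.read_excel,
    ".xls": pd.read_excel,
}


def auto_loader(source: str, **kwargs) -> pd.DataFrame:
    """Pick a reader from the file extension and forward extra options to it."""
    suffix = Path(source).suffix.lower()
    try:
        reader = _READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None
    return reader(source, **kwargs)


# Demo: write a tiny CSV to a temporary folder and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "prices.csv"
    path.write_text("Date,Close\n2024-01-02,10.5\n2024-01-03,11.0\n")
    # parse_dates and index_col are passed straight through to pd.read_csv.
    df = auto_loader(str(path), parse_dates=["Date"], index_col="Date")

print(df)
```

Because the keyword arguments are forwarded untouched, anything the underlying pandas reader accepts works without the loader having to know about it.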

On top of that, StockData integrates directly with yfinance (optional dependency). This allows fetching live stock data for a given ticker and date range, making it much more practical.

For testing purposes, there’s also a generate_mock_data() function. It isn’t designed for real backtesting but is useful for experimenting with new features.
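To give an idea of what such a generator can look like, here is a small random-walk sketch. It is not the library's implementation: the parameters (days, start_price, seed), the column names, and the random-walk model are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd


def generate_mock_data(
    days: int = 30, start_price: float = 100.0, seed: int = 0
) -> pd.DataFrame:
    """Build a synthetic OHLCV frame from a simple random walk."""
    rng = np.random.default_rng(seed)
    # Daily returns drawn from a normal distribution, compounded into closes.
    returns = rng.normal(loc=0.0, scale=0.01, size=days)
    close = start_price * np.cumprod(1.0 + returns)
    # Each open is the previous close; the first open is the start price.
    open_ = np.concatenate(([start_price], close[:-1]))
    # Pad the open/close range so High is the maximum and Low the minimum.
    spread = np.abs(rng.normal(0.0, 0.005, size=days)) * close
    high = np.maximum(open_, close) + spread
    low = np.minimum(open_, close) - spread
    volume = rng.integers(1_000, 10_000, size=days)
    # Business days only, to mimic trading days.
    index = pd.bdate_range("2024-01-01", periods=days, name="Date")
    return pd.DataFrame(
        {"Open": open_, "High": high, "Low": low, "Close": close, "Volume": volume},
        index=index,
    )


mock = generate_mock_data(days=5, seed=42)
print(mock.round(2))
```

Fixing the seed keeps the output reproducible, which is handy when the mock data feeds unit tests.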

Data Export

Here is a question: why export data you already imported? Two reasons:

Users might want to inspect or clean their data after transformations.
I will soon integrate the indicator functions from earlier posts with StockData, so exporting the results will be handy.

Export currently supports all the same formats mentioned in import, plus SQL. There is also a flexible to_custom() function that lets you define your own export method.
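One way a callback-style export like to_custom() could work is sketched below. The post does not show the real signature, so the StockData stand-in class, the exporter parameter, and the to_pipe_text helper are all hypothetical.

```python
from typing import Any, Callable

import pandas as pd


class StockData:
    """Stripped-down stand-in for the real class, holding one DataFrame."""

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df

    def to_custom(self, exporter: Callable[..., Any], **kwargs: Any) -> Any:
        # Hand a copy to the callback so it cannot mutate the stored data.
        return exporter(self.df.copy(), **kwargs)


def to_pipe_text(df: pd.DataFrame, sep: str = "|") -> str:
    """Hypothetical user-defined exporter: one delimited text line per row."""
    header = sep.join(str(c) for c in df.columns)
    rows = [sep.join(str(v) for v in row) for row in df.itertuples(index=False)]
    return "\n".join([header, *rows])


data = StockData(pd.DataFrame({"Close": [10.5, 11.0], "Volume": [100, 200]}))
text = data.to_custom(to_pipe_text, sep=";")
print(text)
```

The appeal of this shape is that the library does not need to know about every output format: any function that accepts a DataFrame can act as an exporter.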

The twist was that this step turned out to be more about data flexibility than about "storing data." With StockData in place, stocksimpy now has a solid foundation for testing.

If you want to use this library in the future or have ideas for features I could add, let me know: ask in the comments or connect with me on socials. I want to make this project something useful.

Follow the rest of the series and watch me build in public.