Ali Sher

Grants to Investments Part 1: The Data

I was brainstorming some ideas for my next project while browsing the resources I have at my current co-op at the Ontario Public Service when an idea struck me.

Government grants are a huge opportunity for companies to jumpstart their journeys, and the attention AI is getting makes the topic even hotter. Just browse the Government of Canada's Grants and Contributions page at https://search.open.canada.ca/grants/ and you will see a myriad of listings.

So I thought, why not use this wealth of resources to create a solution that helps people judge which public sectors are getting the most funding?

This is where my current project steps in. Put simply, I am designing a solution that helps people find investment opportunities using publicly available grant data.

The steps are simple (in theory). The project will feature an ETL pipeline that ingests grant data and feeds it to a model that determines which sectors are the hottest of the week (or some other timeframe). An LLM will then provide a quick summary of the AI opportunities in each sector and of whether that sector (along with a listing of its grants and companies) is worth looking into. Ideally, I will also pull from a second source, such as a market data API, to support these findings. (Note to self: make clear that none of this is investment advice.)

The data will help people see where government spending (their money!) is going, and which companies are benefiting from it, not only for themselves but, as good companies do, for the public as well.

But first things first: how do I create a model that determines which category a grant fits into? To start, I have selected the following categories/sectors to look at:

CATEGORIES = [
    "Housing & Shelter",
    "Education & Training",
    "Employment & Entrepreneurship",
    "Business & Innovation",
    "Health & Wellness",
    "Environment & Energy",
    "Community & Nonprofits",
    "Research & Academia",
    "Indigenous Programs",
    "Public Safety & Emergency Services",
    "Agriculture & Rural Development",
    "Arts, Culture & Heritage",
    "Civic & Democratic Engagement",
]

These are focal points for the Canadian government and should serve as a good basis for building a classification model. The flow is simple:

  1. Extract grants for a given time period.
  2. Use a classification model to determine which sector each grant belongs to.
  3. Use an LLM/algorithm to determine which sectors are hottest.
  4. Compare the grant data to market data (extracting recipient names, descriptions, etc.).
  5. Provide a summary to the user via the frontend and save weekly results to the database.
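To make step 3 a bit more concrete, here is a minimal Python sketch of one possible "algorithm" option. It assumes that "hottest" simply means the largest total grant value within a time window, which is only one of several reasonable definitions. The `Grant` record, its field names, and the `hottest_sectors` function are hypothetical names chosen for illustration; they do not come from the real dataset or from the project repository.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Grant:
    """One already-classified grant (hypothetical field names)."""
    recipient: str
    category: str
    value: float       # agreement value in CAD (assumed unit)
    start_date: date


def hottest_sectors(grants, start, end, top_n=3):
    """Rank categories by total grant value for grants starting in [start, end].

    Returns a list of (category, total_value, grant_count) tuples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g in grants:
        if start <= g.start_date <= end:
            totals[g.category] += g.value
            counts[g.category] += 1
    # Largest total first; ties are broken alphabetically so output is stable.
    ranked = sorted(totals, key=lambda c: (-totals[c], c))
    return [(c, totals[c], counts[c]) for c in ranked[:top_n]]


if __name__ == "__main__":
    # Made-up sample records, purely to show the call shape.
    sample = [
        Grant("Example Clinic", "Health & Wellness", 120_000.0, date(2024, 1, 2)),
        Grant("Example Homes", "Housing & Shelter", 300_000.0, date(2024, 1, 3)),
        Grant("Example Labs", "Health & Wellness", 250_000.0, date(2024, 1, 5)),
    ]
    for category, total, n in hottest_sectors(sample, date(2024, 1, 1), date(2024, 1, 7)):
        print(f"{category}: ${total:,.0f} across {n} grant(s)")
```

Ranking by grant count, or by week-over-week growth, would be just as defensible; the point is only that step 3 can start out as a plain aggregation before an LLM gets involved.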

The question now is how to train the model. Well, let's build the training set ourselves! I have created a Python script that uses the open.canada.ca API to download a CSV of grants, which are then categorized by an LLM. This labeled dataset will serve to train the model down the road. For now, you can find the data mining and collection script here: https://github.com/Sher213/GrantsInvestments/tree/main
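For a rough idea of what the labeling step can look like, here is a small Python sketch. It is not the script from the repository. The CSV column names (`title`, `description`), the prompt wording, the `UNLABELED` fallback, and the `ask_llm` callable are all assumptions made for illustration; `ask_llm` stands in for whichever LLM client is actually used, so no real API is called here.

```python
import csv
import io

CATEGORIES = [
    "Housing & Shelter", "Education & Training",
    "Employment & Entrepreneurship", "Business & Innovation",
    "Health & Wellness", "Environment & Energy",
    "Community & Nonprofits", "Research & Academia",
    "Indigenous Programs", "Public Safety & Emergency Services",
    "Agriculture & Rural Development", "Arts, Culture & Heritage",
    "Civic & Democratic Engagement",
]


def build_prompt(title, description):
    """Build a single-label classification prompt (wording is an assumption)."""
    options = "\n".join(f"- {c}" for c in CATEGORIES)
    return (
        "Classify the following Canadian government grant into exactly one "
        "of these categories. Reply with the category name only.\n"
        f"{options}\n\nTitle: {title}\nDescription: {description}"
    )


def parse_label(reply):
    """Return the matching category, or None if the reply is not a known one."""
    cleaned = reply.strip().strip('"').casefold()
    for category in CATEGORIES:
        if category.casefold() == cleaned:
            return category
    return None


def label_grants(rows, ask_llm):
    """Yield each row with a 'category' key added.

    ask_llm is any callable taking a prompt string and returning a reply string.
    """
    for row in rows:
        reply = ask_llm(build_prompt(row["title"], row["description"]))
        yield {**row, "category": parse_label(reply) or "UNLABELED"}


if __name__ == "__main__":
    # Made-up CSV content with hypothetical column names.
    sample_csv = "title,description\nClinic upgrade,New diagnostic equipment\n"
    rows = csv.DictReader(io.StringIO(sample_csv))
    # A stand-in "LLM" that always gives the same answer.
    for labeled in label_grants(rows, lambda prompt: "Health & Wellness"):
        print(labeled)
```

Checking the reply against `CATEGORIES` matters because an LLM can answer with something outside the list; marking those rows instead of guessing keeps bad labels out of the training set.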

To really challenge myself, I will build the ETL (and the model) entirely in Rust! I think it will be a really fun and novel experience.

More to come!

Ali
