This is a submission for the Bright Data Web Scraping Challenge: Most Creative Use of Web Data for AI Models
Nowadays, it is easy to find open-source trading bots that automate trading in the hope of turning a profit. In this article, I will share my latest project, which uses the Bright Data Web Scraper API and an open-source LLM to create a simple trading signal dashboard.
The idea itself is not new: many trading bots already use publicly available data from news and social media to create trading signals. Heck, even a goldfish could turn more profit trading stocks than people in r/WallStreetBets, as demonstrated in Michael Reeves' video.
What I Built
I built a dashboard where you can view a so-called "trading signal" derived from sentiment analysis of various news sources. In this project, I used articles from BBC, CNN, and Reuters as the sentiment sources. Then, I used Yahoo Finance to get historical price data for four stocks (AAPL, META, MSFT, NVDA). I also used Ollama with Llama 3.1 to classify the sentiment of each news article.
Overall, the process is not complicated: get the data, perform sentiment analysis, and overlay the sentiment on the historical stock price plot. I used Streamlit and Plotly to plot the data.
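To make the overlay step concrete, here is a minimal sketch of how each article could be matched to a point on the price line before plotting. It is not the code from my notebooks: the function name, the tuple layout, and the rule of snapping an article to the next available trading day are all assumptions for illustration, and the prices in the usage note are made up.

```python
from bisect import bisect_left
from datetime import date


def align_to_trading_days(prices, articles):
    """Place each article on the first trading day on or after its publish date.

    prices:   list of (date, close) tuples sorted by date (hypothetical layout).
    articles: list of (publish_date, sentiment) tuples, sentiment is +1 or -1.
    Returns a list of (trading_day, close, sentiment) marker tuples.
    """
    trading_days = [day for day, _ in prices]
    markers = []
    for published, sentiment in articles:
        # Index of the first trading day that is >= the publish date.
        i = bisect_left(trading_days, published)
        if i == len(trading_days):
            continue  # published after the last known price, nothing to overlay
        day, close = prices[i]
        markers.append((day, close, sentiment))
    return markers
```

For example, with made-up prices for 2024-12-20 and 2024-12-23, an article published on 2024-12-21 lands on the 2024-12-23 point. The resulting markers can then be drawn as green or red dots on top of the price line.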
Demo
You can check the web app here.
Bright Data Hackathon: Trading Signal using Sentiment Analysis
This repo contains the source code for my submission to the Bright Data Web Scraping Hackathon on DEV.to.
Setup
Use uv to install dependencies. Clone this repo and run uv sync to install the packages.
Running the Project
Trigger the data collection API to scrape news from multiple sources.
python scripts/scrape_api.py --api-key YOUR_API_KEY discover --output-file ./data/snapshot-bbc.jsonl --keywords 'apple,facebook meta,microsoft,nvidia' --engine bbc
python scripts/scrape_api.py --api-key YOUR_API_KEY discover --output-file ./data/snapshot-cnn.jsonl --keywords 'apple,facebook meta,microsoft,nvidia' --engine cnn
python scripts/scrape_api.py --api-key YOUR_API_KEY discover --output-file ./data/snapshot-reuters.jsonl --keywords 'apple,facebook meta,microsoft,nvidia' --engine reuters
Copy the contents of all three snapshot files into a single file (./data/snapshot-all.jsonl in the command below), then download the scraped data.
python scripts/scrape_api.py --api-key YOUR_API_KEY download --snapshots-file ./data/snapshot-all.jsonl --output-path ./data/scraped
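The merge step that precedes the download command can be done by hand, but a small helper works too. This is a sketch only: merge_jsonl is a hypothetical helper that is not part of the repo, and it assumes each snapshot file holds one record per line, as the .jsonl extension suggests.

```python
from pathlib import Path


def merge_jsonl(sources, target):
    """Concatenate the non-empty lines of several JSONL files into one file.

    Returns the number of records written. Assumes one record per line.
    """
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with target.open("w", encoding="utf-8") as out:
        for source in sources:
            with Path(source).open(encoding="utf-8") as src:
                for line in src:
                    line = line.strip()
                    if line:  # skip blank lines between records
                        out.write(line + "\n")
                        count += 1
    return count


# Example call, using the file names from the commands above:
# merge_jsonl(
#     [f"./data/snapshot-{engine}.jsonl" for engine in ("bbc", "cnn", "reuters")],
#     "./data/snapshot-all.jsonl",
# )
```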
Then, run these notebooks in order:
Note: You will also need Ollama with Llama 3.1 to run the LLM extraction notebook.
Here's what you can explore in the web app:
Trading Signal: on this page you can see when each news article was posted and its sentiment analysis result. In general, if the article expresses a hopeful/positive opinion, the sentiment value is +1; otherwise it is -1.
- Green = positive outlook/bullish.
- Red = negative outlook/bearish.
What we want is a green dot followed by a rising stock price and, vice versa, a red dot followed by a falling stock price. But as we can see above, this is not always the case.
Sentiment Analysis: on this page you can see some statistics from the sentiment analysis process, along with each news headline and its corresponding sentiment. Sometimes, the LLM cannot reliably classify the sentiment.
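One way to put a number on "not always the case" is to count how often a dot is followed by a price move in the same direction. The sketch below is not part of the dashboard: signal_hit_rate is a hypothetical helper, the one-day horizon is an arbitrary choice, and flat days are skipped instead of being counted as misses.

```python
def signal_hit_rate(closes, signals, horizon=1):
    """Fraction of signals whose sign matches the later price move.

    closes:  closing prices in date order.
    signals: list of (index_into_closes, sentiment), sentiment is +1 or -1.
    horizon: how many trading days ahead to look (an assumption, default 1).
    Returns None when no signal can be scored.
    """
    hits = 0
    scored = 0
    for index, sentiment in signals:
        later = index + horizon
        if later >= len(closes):
            continue  # no price available yet to judge this signal
        change = closes[later] - closes[index]
        if change == 0:
            continue  # flat move, neither a hit nor a miss
        scored += 1
        if (change > 0) == (sentiment > 0):
            hits += 1
    return hits / scored if scored else None
```

A hit rate close to 0.5 would suggest that the sentiment dots carry about as much information as a coin flip over that horizon.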
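To show where unreliable classifications can come from, here is a minimal sketch of a headline classifier that talks to a local Ollama server and falls back to None when the reply is ambiguous. The prompt wording, the keyword sets, and both function names are assumptions for illustration and are not the ones used in my notebook; the endpoint is Ollama's default local address.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

# Hypothetical prompt, not the one from the project notebook.
PROMPT = (
    "Classify the outlook of this news headline for the company's stock. "
    "Answer with exactly one word: positive or negative.\n\nHeadline: {headline}"
)


def parse_sentiment(reply):
    """Map a free-form LLM reply to +1, -1, or None when it is ambiguous."""
    words = set(re.findall(r"[a-z]+", reply.lower()))
    positive = bool(words & {"positive", "bullish"})
    negative = bool(words & {"negative", "bearish"})
    if positive == negative:
        return None  # both labels or neither label appeared in the reply
    return 1 if positive else -1


def classify(headline, model="llama3.1"):
    """Ask a local Ollama server for a label; requires Ollama to be running."""
    payload = json.dumps(
        {"model": model, "prompt": PROMPT.format(headline=headline), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.load(response)["response"]
    return parse_sentiment(reply)
```

Keeping the None case separate from -1 makes it possible to count how many headlines the model could not label, instead of silently treating them as negative. The keyword matching is deliberately naive: a reply such as "not positive" would still be read as +1.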
How I Used Bright Data
I mainly used the Web Scraper API from Bright Data to collect the news articles. Bright Data supports a wide selection of websites for scraping, which greatly streamlined the modelling process in this project. I only used a small subset of the news websites offered by Bright Data and already got a working prototype. We could definitely expand this project by adding more websites, maybe even r/WallStreetBets, to get more trading signals.
Even if the website you want to scrape is not available in the Web Scraper API, you can always write your own script or use the Scraping Browser service to build your own data collection pipeline. I actually did this for my first submission.
Prize Categories
Although I entered this project under the third prompt, I believe it could fit the second prompt too.
Final Thoughts
This was a really unexpected project. Originally, I planned to submit just one project, but at the last minute I got inspired by watching Michael Reeves' video and by helping my friend finish his research thesis, which also uses stock market data. Also, I can't believe I finished this project in less than 8 hours.
I definitely would not have finished this project so quickly if I had needed to write the scraping script from scratch and wait for the data. Thanks to the Bright Data Web Scraper API, I could quickly get the data I needed.