DEV Community

Fahmi Noor Fiqri

Posted on Dec 29, 2024

Trading Signal from Sentiment Analysis using Bright Data API

#devchallenge #brightdatachallenge #ai #webdev

This is a submission for the Bright Data Web Scraping Challenge: Most Creative Use of Web Data for AI Models

Nowadays, it is easy to find open-source trading bots that automate trading in the hope of turning a profit. In this article, I will share my latest project, which uses the Bright Data Web Scraper API and an open-source LLM to create a simple trading signal dashboard.

The idea itself is not new, and there are many trading bots that use publicly available data from news and social media to create trading signals. Heck, even a goldfish could turn more profit trading stocks than the people in r/WallStreetBets, as demonstrated in Michael Reeves' video.

What I Built

I built a dashboard where you can view a so-called "trading signal" derived from sentiment analysis of various news sources. In this project, I used articles from BBC, CNN, and Reuters as the sentiment source. Then, I used Yahoo Finance to get historical price data for four stocks (AAPL, META, MSFT, NVDA). I also used Ollama with Llama 3.1 to predict the sentiment of each news article.

Overall, the process is not complicated: get the data, perform sentiment analysis, and overlay the sentiment on the historical stock price plot. I used Streamlit and Plotly to plot the data.

Demo

You can check the web app here.

fahminlb33 / devto-brightdata-scraping-sentiment

Bright Data Hackathon: Trading Signal using Sentiment Analysis

This repo contains the source code for my submission to the Bright Data Web Scraping Hackathon at DEV.to.

Setup

Use uv to install dependencies. Clone this repo and run uv sync to install the packages.

Running the Project

Trigger the data collection API to scrape news from multiple sources.

python scripts/scrape_api.py --api-key YOUR_API_KEY discover --output-file ./data/snapshot-bbc.jsonl --keywords 'apple,facebook meta,microsoft,nvidia' --engine bbc
python scripts/scrape_api.py --api-key YOUR_API_KEY discover --output-file ./data/snapshot-cnn.jsonl --keywords 'apple,facebook meta,microsoft,nvidia' --engine cnn
python scripts/scrape_api.py --api-key YOUR_API_KEY discover --output-file ./data/snapshot-reuters.jsonl --keywords 'apple,facebook meta,microsoft,nvidia' --engine reuters

Copy the contents of all three snapshot files into a single file (./data/snapshot-all.jsonl in the command below), then download the scraped data.

python scripts/scrape_api.py --api-key YOUR_API_KEY download --snapshots-file ./data/snapshot-all.jsonl --output-path ./data/scraped
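The manual copy step before the download command can also be scripted. Below is a minimal sketch, assuming the snapshot files are plain JSON Lines (one JSON object per line); the function name merge_jsonl is hypothetical and is not part of the repo.

```python
import json


def merge_jsonl(inputs, output):
    """Concatenate several JSON Lines files into one, skipping blank lines.

    Assumes every non-blank line is a complete JSON value.
    Returns the number of records written.
    """
    count = 0
    with open(output, "w", encoding="utf-8") as out:
        for path in inputs:
            with open(path, encoding="utf-8") as src:
                for line in src:
                    line = line.strip()
                    if not line:
                        continue
                    # Fail early on a malformed record (raises ValueError).
                    json.loads(line)
                    out.write(line + "\n")
                    count += 1
    return count
```

With the file names used above, the call would be merge_jsonl(["./data/snapshot-bbc.jsonl", "./data/snapshot-cnn.jsonl", "./data/snapshot-reuters.jsonl"], "./data/snapshot-all.jsonl").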

Then, run these notebooks in order:

Note: You will also need Ollama with Llama 3.1 to run the LLM extraction notebook.

Here's what you can explore in the web app:

Trading Signal: on this page you can see when news articles were posted and their sentiment analysis results. In general, if an article expresses a hopeful or positive opinion, the sentiment value is +1; otherwise it is -1.

Green = positive outlook/bullish.
Red = negative outlook/bearish.
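To make the overlay concrete, here is a rough sketch of how each article could be turned into a colored dot on the price chart. It is not the code from the repo: the function build_markers, the input shapes, and the choice to skip articles published on non-trading days are all assumptions for illustration.

```python
# Color convention taken from the legend: +1 is green, -1 is red.
COLORS = {1: "green", -1: "red"}


def build_markers(articles, closes):
    """Place one marker per article on the closing price of its publication date.

    articles: iterable of (iso_date, score) pairs, where score is +1 or -1.
    closes:   dict mapping iso_date -> closing price.
    Articles dated on days without a price (weekends, holidays) are skipped.
    """
    markers = []
    for day, score in articles:
        if day not in closes or score not in COLORS:
            continue
        markers.append({"x": day, "y": closes[day], "color": COLORS[score]})
    return markers
```

The resulting list has one x/y/color entry per dot, which could then be passed to a scatter trace drawn on top of the price line.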

What we want is a green dot followed by a rising stock price and, vice versa, a red dot followed by a falling stock price. But as the chart shows, this is not always the case.
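One way to put a number on "not always the case" is to count how often the dot's direction matches the next trading day's price move. This is only a sketch under my own assumptions (next-day close as the comparison, flat days ignored); signal_hit_rate is a hypothetical helper and the dashboard does not compute this.

```python
def signal_hit_rate(signals, closes):
    """Fraction of signals whose sign matches the next trading day's move.

    signals: iterable of (iso_date, score) pairs, where score is +1 or -1.
    closes:  list of (iso_date, close) pairs, one per trading day.
    Signals without a price on their date, without a following trading day,
    or followed by an unchanged close are ignored.
    Returns None when no signal can be evaluated.
    """
    ordered = sorted(closes)  # ISO dates sort chronologically as strings
    index = {day: i for i, (day, _) in enumerate(ordered)}
    hits = 0
    total = 0
    for day, score in signals:
        i = index.get(day)
        if i is None or i + 1 >= len(ordered):
            continue
        change = ordered[i + 1][1] - ordered[i][1]
        if change == 0:
            continue
        total += 1
        if (change > 0) == (score > 0):
            hits += 1
    return hits / total if total else None
```

A value near 0.5 would mean the signal is no better than a coin flip over the sampled period.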

Sentiment Analysis: on this page you can see some statistics from the sentiment analysis process, along with each news headline and its corresponding sentiment. Sometimes, the LLM cannot reliably classify the sentiment.
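For readers who want to try the classification step themselves, here is a minimal sketch that asks a local Ollama server for a one-word label and treats anything ambiguous as unclassified. The prompt wording, the function names, and the parsing rule are my own assumptions for illustration, not the prompt or code from the notebook. It uses Ollama's /api/generate endpoint on the default local port.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical prompt; the notebook's actual prompt may differ.
PROMPT = (
    "Classify the outlook of this news headline for the company's stock. "
    "Answer with exactly one word: positive or negative.\n\nHeadline: {headline}"
)


def parse_sentiment(reply):
    """Map a free-text model reply to +1, -1, or None when it is ambiguous."""
    text = reply.strip().lower()
    has_pos = "positive" in text
    has_neg = "negative" in text
    if has_pos and not has_neg:
        return 1
    if has_neg and not has_pos:
        return -1
    return None  # neither label, or both: leave the article unclassified


def classify(headline, model="llama3.1"):
    """Send one headline to Ollama and return +1, -1, or None."""
    payload = json.dumps({
        "model": model,
        "prompt": PROMPT.format(headline=headline),
        "stream": False,  # ask for a single JSON object instead of a stream
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.loads(response.read().decode("utf-8"))
    return parse_sentiment(body.get("response", ""))
```

Returning None instead of forcing a label keeps unreliable answers out of the chart; those articles can be counted separately in the statistics.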

How I Used Bright Data

I mainly used the Web Scraper API from Bright Data to collect the news articles. Bright Data supports a wide selection of websites for scraping, which greatly streamlined the modelling process in this project. I used only a small subset of the news websites offered by Bright Data and already got a working prototype. We could definitely expand this project by adding more websites, maybe even r/WallStreetBets, to get more trading signals.

Even if the website you want to scrape is not available in the Web Scraper API, you can always write your own scraper or use the Scraping Browser service to build a custom data collection script. I actually did this for my first submission.

Prize Categories

Although I entered this project in the hackathon category for the third prompt, I believe it could fit the second prompt too.

Final Thoughts

This was a really unexpected project. Originally, I planned to submit just one project, but at the last minute I got inspired by watching Michael Reeves' video and by helping my friend finish his research thesis, which also used stock market data. Also, I can't believe I finished this project in less than 8 hours.

I definitely would not have finished this project so quickly if I had to write the scraping script from scratch and wait for the data. Thanks to the Bright Data Web Scraper API, I could quickly get the data I needed.
