DEV Community

James Briggs


Identify Stocks on Reddit with SpaCy (NER in Python)

We will learn how to process unstructured text data from Reddit and extract organization names, so that the results of any further analysis can be assigned automatically to the correct stocks.

Organizations are mentioned in each subreddit in a variety of formats. Typically we will find two formats:

  • Organization name, e.g. Tesla or Tesla Motors
  • Ticker symbol, e.g. TSLA, tsla, or $TSLA

We also need to differentiate between tickers and other abbreviations or slang. Some of these are ambiguous: AI, for example, can mean artificial intelligence or refer to the ticker symbol for C3.ai.
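To make the ticker formats and the ambiguity problem concrete, here is a minimal rule-based sketch using only Python's standard library. It is not the spaCy NER approach itself, just an illustration of why plain pattern matching is not enough. The `KNOWN_TICKERS` and `AMBIGUOUS` sets, the `find_tickers` helper, and the rule "ambiguous symbols only count when prefixed with $" are all assumptions made for this example.

```python
import re

# Hypothetical lookup tables; a real pipeline would load these from a listing.
KNOWN_TICKERS = {"TSLA", "AI", "AAPL"}
# Symbols that collide with common abbreviations or slang (assumed list).
AMBIGUOUS = {"AI"}

# Optional "$" prefix followed by 1-5 letters. The lookbehind and the trailing
# word boundary stop the pattern from matching inside longer words.
TICKER_RE = re.compile(r"(?<![\w$])(\$?)([A-Za-z]{1,5})\b")


def find_tickers(text):
    """Return normalized (uppercase) tickers in order of appearance."""
    found = []
    for match in TICKER_RE.finditer(text):
        has_dollar = match.group(1) == "$"
        symbol = match.group(2).upper()
        if symbol not in KNOWN_TICKERS:
            continue
        # Assumed heuristic: an ambiguous symbol counts only as "$AI", not "AI".
        if symbol in AMBIGUOUS and not has_dollar:
            continue
        found.append(symbol)
    return found


comment = "Tesla is up, so I bought $TSLA and more tsla. AI hype is real, but $AI is a stock."
print(find_tickers(comment))  # ['TSLA', 'TSLA', 'AI']
```

Note what this sketch misses: the organization name "Tesla" is not linked to TSLA, and a bare "AI" that really does refer to C3.ai is dropped. Gaps like these are why a context-aware NER model is the better fit.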

So, we need a reasonably competent NER process to classify our data accurately.
