Data gathering with python. An Example (part 1)

ViksaaSkool
software engineering, stand up comedy. very interesting person.
・2 min read

Big data, data science, machine learning, AI, deep learning, Python: to a marketing person, a catchy selling point; to a mathematician, an interesting real-life implementation of the language of nature; to a software engineer, tools for solving real-life problems (and helping people become even lazier).

Don't be put off by the poetic start of this post; there will be code.

I, and I presume many software engineers nowadays, want to get into data science and comprehend enough of the underlying processes and math to properly apply algorithms and approaches to data sets and turn them into useful applications.
Where do you start when you try to get into data science (in my case, NLP: natural language processing)?

The data.
obviously

I had this problem: I wanted to gather tweets from stand-up comedians in order to analyze them. My weapon of choice: Python and Colab (Google's version of Jupyter notebooks).

Problems:

  1. How to find the comedians?
  2. How to find the comedians on Twitter?
  3. How to gather the tweet data? (in a separate post, coming soon)

Solution:

  1. Wikipedia: find a list of all English-speaking comedians
  2. Automated Google search for Twitter handles
  3. Use the Twitter API, which is a bit restrictive lately (or try to scrape?)

The code:
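Here is a minimal sketch of the parsing side of steps 1 and 2, using only the standard library. It is an illustration under assumptions, not a finished scraper: it assumes the Wikipedia list page has already been downloaded as an HTML string, that the first link in each list item is the comedian's name, and that the search step hands back a plain list of result URLs. The names `ListItemLinkParser`, `parse_comedian_names`, `handle_from_url`, `first_handle` and the `RESERVED_PATHS` set are all made up for this example.

```python
import re
from html.parser import HTMLParser
from typing import Iterable, List, Optional
from urllib.parse import urlparse


class ListItemLinkParser(HTMLParser):
    """Collect the text of the first link inside every <li> element."""

    def __init__(self) -> None:
        super().__init__()
        self.names: List[str] = []
        self._in_li = False
        self._in_link = False
        self._took_link = False  # only the first link per list item counts
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self._took_link = False
        elif tag == "a" and self._in_li and not self._took_link:
            self._in_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            name = "".join(self._buffer).strip()
            if name:
                self.names.append(name)
            self._in_link = False
            self._took_link = True
        elif tag == "li":
            self._in_li = False


def parse_comedian_names(html: str) -> List[str]:
    """Step 1: pull candidate names out of a list page's HTML.

    A real Wikipedia page also has navigation lists, so the result
    would still need filtering; that part is left out of this sketch.
    """
    parser = ListItemLinkParser()
    parser.feed(html)
    return parser.names


# Assumed, incomplete set of twitter.com paths that are not user profiles.
RESERVED_PATHS = {"search", "hashtag", "home", "i", "intent", "share", "explore"}
TWITTER_HOSTS = {"twitter.com", "www.twitter.com", "mobile.twitter.com"}
# Handles are at most 15 characters: letters, digits and underscores.
HANDLE_RE = re.compile(r"^[A-Za-z0-9_]{1,15}$")


def handle_from_url(url: str) -> Optional[str]:
    """Step 2: return the handle if the URL points at a Twitter profile or tweet."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in TWITTER_HOSTS:
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if not parts:
        return None
    candidate = parts[0]
    if candidate.lower() in RESERVED_PATHS or not HANDLE_RE.match(candidate):
        return None
    return candidate


def first_handle(result_urls: Iterable[str]) -> Optional[str]:
    """Take the first search result that looks like a Twitter profile."""
    for url in result_urls:
        handle = handle_from_url(url)
        if handle is not None:
            return handle
    return None
```

For each name from `parse_comedian_names`, you would run a search for something like the name plus "twitter", pass the result URLs to `first_handle`, and keep the pair if a handle comes back. Taking the first match is a shortcut, so some handles will be wrong and worth a manual check.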

Resulting file:
[Image: the resulting file]
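One way to produce such a file, as a sketch: this assumes a CSV with one row per comedian and hypothetical `name` and `twitter_handle` columns, so the actual layout may differ. `write_handles` is a name made up for this example.

```python
import csv
from typing import Dict, Optional, TextIO


def write_handles(handles: Dict[str, Optional[str]], out: TextIO) -> int:
    """Write name/handle pairs as CSV and return the number of rows written.

    Comedians without a handle (None) are skipped.
    The column names are an assumption made for this sketch.
    """
    writer = csv.writer(out)
    writer.writerow(["name", "twitter_handle"])
    written = 0
    for name, handle in sorted(handles.items()):
        if handle is None:
            continue
        writer.writerow([name, handle])
        written += 1
    return written
```

To write to disk, open the target with `open("comedians.csv", "w", newline="", encoding="utf-8")` and pass the file object as `out`; the file name here is just a placeholder.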

Now it's on to step 3...
Easier approach: use the Twitter API via one of the many Python SDKs (note: you need to apply for an API key and wait for a decision). The downside: it can't search the past.
More challenging approach: scrape Twitter (brace yourself for the "Too Many Requests" issue), something that I'll cover in a separate post in the near future.
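"Too Many Requests" is HTTP status 429, and the usual way to live with it is to wait and retry with growing pauses. A minimal sketch of that idea, assuming a hypothetical `fetch` callable that returns a `(status_code, body)` tuple; `fetch_with_backoff` and its parameters are made up for this example and are not tied to any particular scraping library:

```python
import random
import time
from typing import Callable, Tuple


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, str]],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch() until it stops answering 429, backing off exponentially.

    The wait doubles on every retry (base_delay, 2x, 4x, ...) plus a little
    random jitter so parallel workers do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # out of retries, give up below
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        sleep(delay)
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

The `sleep` parameter is only there so the waiting can be swapped out when experimenting; in normal use you leave it at the default and pass a `fetch` that performs the actual HTTP request.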

[SPOILER ALERT]: use nasty!