DEV Community

Ali Deishidi

Crawling Glassdoor

A week ago I read a post here claiming that Ruby would kill Python in the future. As always, there was plenty of debate in the comment section of that post. Someone pointed out that it does not matter which language kills which: every language or tool is suited to something, and you have to pick the right tool for the job.

I think that is the right answer. However, it is also important to consider what the market (or industry) thinks about tools and languages. Do companies want Ruby programmers as much as they want Python developers?

To find out, I created a small Ruby project which does these tasks:

  • Crawl Glassdoor pages for software job ads in predefined cities
  • Store the pages
  • Count the number of occurrences of each keyword (such as Python or Ruby)
  • Generate yml and png files to visualize how much the industry needs each skill

Here is the combined output for ten cities around the world. Remember, this is the number of ads that contain each keyword. For example, if an ad contains Java, it increases the total number of ads counted for that keyword.

[Chart: Software Languages]

And here is the number of mentions for each technology:

[Chart: Software Technologies]

So, finally, which language is in more demand than the others?

The answer is easy: Java. But if you are looking at scripting languages, the answer is Python, then JavaScript, and then Ruby. However, there are interesting findings when you compare the results for individual cities.

For example, this is the output for Amsterdam:
[Chart: Amsterdam stats]
Notice anything unusual? Yes, Scala is in demand even more than Python or JavaScript!

See the rest of these reports here. They include charts for New York, Berlin, London, Toronto, Singapore, Dubai, Tallinn, and other cities.

About the project

The structure is simple. First, there is a configuration file in which you can define the cities, keywords, categories, and so on.

urls:
  - Tallinn;https://www.glassdoor.ca/Job/tallinn;jobs
job_types:  
  - software
  - back-end
  - front-end
category:
  - languages
  - technologies
languages:
  - java
  - javascript 
  - c 
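
As a rough illustration of how such a file could be read, here is a minimal sketch using Ruby's standard YAML library. It is not the project's actual loader. The helper `parse_url_entry` and the field names `city`, `base_url`, and `suffix` are assumptions made for this example; the sample only shows that each `urls` entry holds three parts separated by semicolons.

```ruby
require 'yaml'

# Inline copy of part of the sample config; the real project reads it from a file.
config_text = <<~CONFIG
  urls:
    - Tallinn;https://www.glassdoor.ca/Job/tallinn;jobs
  job_types:
    - software
    - back-end
  languages:
    - java
    - javascript
    - c
CONFIG

# Split one "part;part;part" entry into named fields.
# The field names are guesses for this sketch, not taken from the project.
def parse_url_entry(entry)
  city, base_url, suffix = entry.split(';', 3)
  { city: city, base_url: base_url, suffix: suffix }
end

config = YAML.safe_load(config_text)
targets = config.fetch('urls').map { |entry| parse_url_entry(entry) }
# Normalize keywords so stray whitespace or casing in the file does not matter.
keywords = config.fetch('languages').map { |keyword| keyword.to_s.strip.downcase }

targets.each { |target| puts "#{target[:city]}: #{target[:base_url]}" }
```

Using `fetch` instead of `[]` makes a missing section fail loudly rather than silently producing an empty crawl.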

Then, when you run client.rb, it takes the first URL from the configuration file, crawls the web page, saves the URL-specific parameters for each listing on that page, fetches the second page, and repeats this until the last page.
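
The page-by-page loop described here can be sketched roughly as follows. This is only an illustration under stated assumptions: `collect_listing_ids`, the `data-id` attribute, and the fake pages are all hypothetical, and the network is replaced by a lambda so the sketch runs offline. How client.rb really fetches and parses Glassdoor pages is not shown in this post.

```ruby
# Collect listing identifiers page by page until a page is missing or empty.
# fetch_page:  callable taking a page number, returning HTML or nil.
# extract_ids: callable taking HTML, returning an array of identifiers.
def collect_listing_ids(fetch_page, extract_ids, max_pages: 100)
  ids = []
  (1..max_pages).each do |page|
    html = fetch_page.call(page)
    break if html.nil?

    page_ids = extract_ids.call(html)
    break if page_ids.empty?

    ids.concat(page_ids)
  end
  ids.uniq
end

# Stand-in for the network: three fake pages instead of real responses.
fake_pages = {
  1 => '<a data-id="101"></a><a data-id="102"></a>',
  2 => '<a data-id="103"></a>',
  3 => ''
}

fetch_page = ->(page) { fake_pages[page] }
# The data-id attribute is invented for this example.
extract_ids = ->(html) { html.scan(/data-id="(\d+)"/).flatten }

listing_ids = collect_listing_ids(fetch_page, extract_ids)
puts listing_ids.inspect
```

Passing the fetcher in as a callable keeps the loop testable, and the `max_pages` cap guards against a site that never returns an empty page.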

After that, another class crawls the website again. This time it downloads each whole ad page and saves it to disk.

The third class then builds a count for each of the predefined keywords by scanning every document saved in the previous step. The results are then saved as a yml file.
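
The counting step can be sketched like this. It is an illustration, not the project's class: `keyword_pattern` and `count_ads_per_keyword` are hypothetical names, and the three ads are made up. The sketch counts each ad at most once per keyword and matches keywords as whole tokens, so that `c` is not counted inside `c++`, `c#`, or ordinary words.

```ruby
# Build a case-insensitive pattern that matches the keyword only when it is
# not glued to a letter, digit, underscore, "+" or "#" on either side.
def keyword_pattern(keyword)
  /(?<![\w+#])#{Regexp.escape(keyword)}(?![\w+#])/i
end

# Return a hash of keyword => number of documents mentioning it at least once.
def count_ads_per_keyword(documents, keywords)
  keywords.to_h do |keyword|
    pattern = keyword_pattern(keyword)
    [keyword, documents.count { |text| text.match?(pattern) }]
  end
end

# Made-up ads standing in for the pages saved in the previous step.
ads = [
  'We need a Java and JavaScript developer.',
  'Strong C++ or C# skills required; Java is a plus.',
  'Embedded C programmer wanted.'
]

counts = count_ads_per_keyword(ads, ['java', 'javascript', 'c', 'c#', 'c++'])
counts.each { |keyword, total| puts "#{keyword}: #{total}" }
```

A plain substring search would count every ad containing the letter "c", which is why the lookbehind and lookahead are worth the extra noise for short keywords.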

Here is a sample of the output:

languages:
  java: 324
  javascript: 196
  c: 75
  c#: 140
  c++: 144
technologies:
  kafka: 41
  nosql: 60
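
A result file of this shape can be written and read back with Ruby's standard YAML library. The sketch below reuses the sample numbers above and ranks each category by count, which is one plausible ordering for a chart. The ranking step and the file path in the comment are assumptions, not something the project is known to do.

```ruby
require 'yaml'

# Counts copied from the sample output above.
results = {
  'languages' => { 'java' => 324, 'javascript' => 196, 'c' => 75, 'c#' => 140, 'c++' => 144 },
  'technologies' => { 'kafka' => 41, 'nosql' => 60 }
}

yaml_text = results.to_yaml
# File.write('result/tallinn.yml', yaml_text) would persist it (hypothetical path).

# Read the YAML back and sort each category from most to least mentioned.
loaded = YAML.safe_load(yaml_text)
ranked = loaded.transform_values do |category_counts|
  category_counts.sort_by { |_keyword, count| -count }
end

ranked['languages'].each { |keyword, count| puts format('%-12s %d', keyword, count) }
```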

Finally, with the help of the Gruff gem, we generate images from the YAML files.

Side notes

  • This could be helpful if you are investigating your next career path or your next language to learn. It is nothing more serious than that.
  • The project is quite configurable. Just update the config file: add the city you want, its first URL, the keywords you are looking for, and the categories. Then run it (it takes a few minutes to get all the data) and check the output in the result folder. link to project
  • Have fun!

Top comments (6)

Vincent Milum Jr

As someone who actively looks at these listings daily trying to find jobs, just seeking out keywords really isn't the best approach here.

Most job listings in software will list something along the lines of: "Experience in one or more of the following languages" - even ones they don't use at the company, because they know a dev can transition from one language to another.

Java just so happens to have been a very common language in CS programs at schools over the past decade or two, with Python being one of the main languages now. So job postings, regardless of what the job is ACTUALLY for, will list these languages, because they know the dev can transition into their environment.

Ali Deishidi

I agree with you Vincent. But consider this project as a big picture of what companies want.

Jeremy Grifski

Yay, my java articles aren’t pointless! Nice article, by the way. :)

HS

Joking aside, it's gonna take a lot of battle testing for other techs to prove they can replace the JVM stuff (frameworks, libraries...) that can be used with Java, hence Java will take some time to go out of usefulness. Look at Groovy today and compare it to last year on TIOBE (more searches means more interest, and more interest means more devs want to work in it or learn it for some reason). Basically you need more articles about Java, as it's still not profitable to switch existing systems to "cool" languages :D

Orim Dominic Adah

Goodness! I was thinking of dropping Java for Kotlin in Android cos Google supports Kotlin for Android now, but there's Java right up there.. With a King's crown 😭😭

Succinct article 👏

HS

Kotlin could be just a temporary solution, as Dart and Flutter are getting more focus if you take a look at Fuchsia OS, and many people like to do cross-platform, not just Android, since customers expect both products and it's easier (requires less money from the employers) to develop in one technology. Kotlin did not manage to provide a good enough replacement for services, although people still switch to it because of the hype.

I recently wrote an article about JVM stuff and indexes on such ranking pages. The thing is, Groovy was niche and Kotlin was talked about too much, but this month Groovy climbed to #20 from #68 last year and is still above Kotlin today (30.4.2019). It was a quick jump if you check the history. Kotlin is still 35th.

Hyped technologies usually go away as quickly as they come. Remember Node.js: people still use it, but no one thinks it's gonna replace Java or C# the way they did before. It may in the future, but fewer and fewer people think so.

FP was also the boom recently, but many developers refuse to switch to it, and this is one of the reasons people also talked about Kotlin. But hey, for FP on the JVM do you need another language, or is Scala good enough? Pure FP people say Scala is not FP enough because it supports other stuff too. Well, guess what: people care about what is easy to work in, quick to work in, and easy to understand. Not a lot of people care about math, and so not a lot of people care about the next cool language. They know some stuff, they work in it, and they are proven to create good things in it, while their employers care only about market feedback. Why would employers wait for them to learn another tech, or switch to people who know it already but have no proven quality products behind them?

It's a matter of money I think.