Get top 50 web traffic sites with Python

We can get the top 50 web traffic sites from these two site traffic monitoring services:

and the top 50 web traffic sites are:

With the Python requests and BeautifulSoup modules, we can automatically list the top 50 web sites from these two monitoring services.

First, we create a dict that stores the URL of each monitoring service and its selector (for BeautifulSoup's select):

webRankSites = {
  "Alexa": {
    "url": "",
    "selector": "div.DescriptionCell"
  },
  "SimilarWeb": {
    "url": "",
    "selector": "td.topRankingGrid-cell.topWebsitesGrid-cellWebsite.showInMobile"
  }
}

How do we define the selectors? We need to inspect the page content of each service and find the element that holds the site list:

  1. Alexa:
    [Screenshot: Alexa selector]
    As the developer tools show, each web site name sits in a div element with the class DescriptionCell, so the selector is "div.DescriptionCell".

  2. SimilarWeb:
    [Screenshot: SimilarWeb selector]
    Each web site name sits in a td element with 3 classes: topRankingGrid-cell, topWebsitesGrid-cellWebsite and showInMobile. Chaining the classes gives the selector "td.topRankingGrid-cell.topWebsitesGrid-cellWebsite.showInMobile".
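
To see how these two selectors behave, here is a small sketch that runs them against inline HTML instead of the live pages. The HTML fragments, the example-one.com / example-two.com names and the site_names helper are all made up for illustration; the real pages are more complex and may have changed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the Alexa layout: one div per cell,
# the site name lives in the div that carries the DescriptionCell class.
alexa_html = """
<div class="tr">
  <div class="td">1</div>
  <div class="td DescriptionCell"><p><a href="#">example-one.com</a></p></div>
</div>
<div class="tr">
  <div class="td">2</div>
  <div class="td DescriptionCell"><p><a href="#">example-two.com</a></p></div>
</div>
"""

# Hypothetical fragment imitating the SimilarWeb layout: only the second
# cell has all three classes, so only it should match the chained selector.
similarweb_html = """
<table>
  <tr>
    <td class="topRankingGrid-cell showInMobile">1</td>
    <td class="topRankingGrid-cell topWebsitesGrid-cellWebsite showInMobile">example-one.com</td>
  </tr>
</table>
"""

def site_names(html, selector):
    """Return the stripped text of every element matching the CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(selector)]

alexa_sites = site_names(alexa_html, "div.DescriptionCell")
similarweb_sites = site_names(
    similarweb_html,
    "td.topRankingGrid-cell.topWebsitesGrid-cellWebsite.showInMobile",
)
print(alexa_sites)
print(similarweb_sites)
```

A selector like "td.a.b.c" only matches td elements that carry all of the listed classes, which is why the rank cell in the second fragment is skipped.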

Second, we fetch each URL with requests.get and apply the BeautifulSoup selector to get the web site list (myheaders is needed for the SimilarWeb service, since a request without a user-agent gets response status code 403):

import requests
from bs4 import BeautifulSoup

myheaders = {"user-agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Firefox/78.0(HTTP_USER_AGENT)"}

for site in webRankSites:
  print("site: " + site)
  resp = requests.get(webRankSites[site]["url"], headers = myheaders)
  soup = BeautifulSoup(resp.text, 'html.parser')
  # select() returns every element that matches the CSS selector
  items = soup.select(webRankSites[site]["selector"])
  i = 1
  for item in items:
    print(str(i) + ". " + item.text.strip())
    i += 1

Then we get the result:

site: Alexa
site: SimilarWeb
Wow, the result can be fetched automatically and it looks great. Wanna try? Check this demo:

And enjoy it! Happy coding!!

Since SimilarWeb answers the requests from this code without the ranking list, the SimilarWeb list will not appear. I've checked resp.text:

      <title>Pardon Our Interruption</title>
              <div class="Title">Pardon Our Interruption...</div>
               <div class="Paragraph">As you were browsing something about your browser made us think you were a bot. There are a few reasons this might happen:</div>
               <ul class="ListContainer">
                  <li class="ListItem">You're a power user moving through this website with super-human speed.</li>
                  <li class="ListItem">You've disabled JavaScript in your web browser.</li>

And I have no idea how to solve this interruption. Any ideas?
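
While debugging, it may help to detect this block page explicitly rather than silently printing an empty list. The sketch below is only one possible check: it uses a plain regular expression on the title instead of BeautifulSoup, the looks_blocked helper is a hypothetical name, and the "Top Websites Ranking" title of the normal page is invented for the example. It assumes the block page keeps the "Pardon Our Interruption" title shown above.

```python
import re

# Title text taken from the block page shown above.
BLOCK_TITLE = "Pardon Our Interruption"

def looks_blocked(html):
    """Return True if the page <title> contains the block page title."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return bool(match) and BLOCK_TITLE.lower() in match.group(1).lower()

# Two tiny hypothetical pages: one blocked, one normal.
blocked_page = "<html><head><title>Pardon Our Interruption</title></head><body></body></html>"
normal_page = "<html><head><title>Top Websites Ranking</title></head><body></body></html>"

print(looks_blocked(blocked_page))
print(looks_blocked(normal_page))
```

In the loop above, one could call looks_blocked(resp.text) before parsing and print a warning for that service when it returns True.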

Someone provided a method that adds a header with
"authority": ""
and it helps to get the result from (the code is already updated)