DEV Community

Discussion on: Scrape Google Scholar with Python

deepeshsagar

Hi, this is a very useful article. It definitely saves time. I'm planning to get all articles from the last month. Can you tell me how to do that? Two questions: 1) By default, the search results are sorted by relevance. How do I set this to "sort by date"? 2) Each result shows how many days ago the article was published. How do I get that?

Dmitriy Zub ☀️

Hi, @deepeshsagar! I'm glad the article helped you!

Use the sortby=pubdate query parameter, which sorts the results by publication date.

In the articles example, the link would look like this: https://scholar.google.com/citations?hl=en&user=m8dFEawAAAAJ&sortby=pubdate

Or you can pass a params dict, which is more readable and easier to understand:

import requests

params = {
    "user": "m8dFEawAAAAJ",  # author ID
    "sortby": "pubdate",     # sort by publication date
    "hl": "en"               # interface language
}

html = requests.get("https://scholar.google.com/citations", params=params)
# further code..
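If it helps to see how the params dict and the plain link relate, here is a minimal sketch that builds the same URL with the standard library only, so nothing is actually requested. The helper name build_profile_url is hypothetical and just for illustration; it is not part of the blog post's code.

```python
from urllib.parse import urlencode

BASE_URL = "https://scholar.google.com/citations"


def build_profile_url(user_id, sort_by_date=True, lang="en"):
    """Build an author-profile URL (hypothetical helper, illustration only)."""
    params = {"hl": lang, "user": user_id}
    if sort_by_date:
        # Same parameter as in the reply above: sort by publication date.
        params["sortby"] = "pubdate"
    # urlencode keeps the dict's insertion order when building the query string.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_profile_url("m8dFEawAAAAJ"))
print(build_profile_url("m8dFEawAAAAJ", sort_by_date=False))
```

Printing both variants makes it easy to open them side by side and compare which articles come first.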

I updated the code on Replit so you can test it in the browser (try removing the sortby param and see how the first articles change).

Dmitriy Zub ☀️ • Edited

@deepeshsagar I've just updated the blog post, and you can now extract all available articles from the author page. This is possible because of the pagination I've added.
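For anyone curious what paging through an author page could look like, here is a rough sketch. It assumes the profile page accepts cstart (offset of the first article) and pagesize (articles per page) as query parameters; those names are not given in this thread, and the helpers page_params and page_urls are hypothetical, so treat this as an illustration rather than the code from the blog post. It only builds the URLs and leaves fetching, parsing, and deciding when to stop to the caller.

```python
from urllib.parse import urlencode

BASE_URL = "https://scholar.google.com/citations"


def page_params(user_id, page, page_size=100):
    """Query parameters for one page of an author's articles (hypothetical helper)."""
    return {
        "user": user_id,
        "hl": "en",
        "sortby": "pubdate",
        # Assumption: cstart is the offset of the first article on the page
        # and pagesize is the number of articles per page.
        "cstart": page * page_size,
        "pagesize": page_size,
    }


def page_urls(user_id, pages, page_size=100):
    """URLs for the first `pages` pages, in order (hypothetical helper)."""
    return [
        f"{BASE_URL}?{urlencode(page_params(user_id, page, page_size))}"
        for page in range(pages)
    ]


for url in page_urls("m8dFEawAAAAJ", pages=3):
    print(url)
```

Each dict from page_params could be passed as params to requests.get in the same way as the dict in the earlier reply.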

🐱‍👤