<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Thinh Ong</title>
    <description>The latest articles on DEV Community by Thinh Ong (@thinhong).</description>
    <link>https://dev.to/thinhong</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F808409%2F94a1781c-98ab-4eae-a85f-d3a1c42ba12f.png</url>
      <title>DEV Community: Thinh Ong</title>
      <link>https://dev.to/thinhong</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thinhong"/>
    <language>en</language>
    <item>
      <title>Academic portfolio: scrape publications from your Google Scholar profile with React</title>
      <dc:creator>Thinh Ong</dc:creator>
      <pubDate>Thu, 03 Feb 2022 05:53:33 +0000</pubDate>
      <link>https://dev.to/thinhong/academic-portfolio-scrape-publications-from-your-google-scholar-profile-with-react-14m2</link>
      <guid>https://dev.to/thinhong/academic-portfolio-scrape-publications-from-your-google-scholar-profile-with-react-14m2</guid>
      <description>&lt;p&gt;"Publish or perish": publications matter enormously in research. If you have a personal website, updating your publication list by hand is a pain, so why not scrape it from Google Scholar instead? Then you only need to maintain your Google Scholar profile, and whenever a new article is published it shows up on your personal website automatically. Here I use React and style it with Chakra UI.&lt;/p&gt;

&lt;h2&gt;1. Set up a cors-anywhere server&lt;/h2&gt;

&lt;p&gt;Browsers enforce the same-origin policy, and Google Scholar does not send the CORS headers that would let your page read its responses, so you'll come across a CORS error like this when you try to fetch data from it.&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0Au2bJB5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/de8nbaynfpqil7y1krk1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0Au2bJB5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/de8nbaynfpqil7y1krk1.png" alt="CORS error" width="541" height="86"&gt;&lt;/a&gt;&lt;br&gt;
To overcome this, we need to set up a proxy server. You can create a Heroku account (free at the time of writing) and deploy a &lt;a href="https://github.com/Rob--W/cors-anywhere"&gt;cors-anywhere&lt;/a&gt; server with these simple commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/Rob--W/cors-anywhere.git
cd cors-anywhere/
npm install
heroku create
git push heroku master
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you have your own cors-anywhere server with a URL like this: &lt;code&gt;https://safe-mountain-7777.herokuapp.com/&lt;/code&gt;.&lt;/p&gt;
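
&lt;p&gt;To make the proxying concrete, here is a minimal sketch of how a proxied request URL is put together: cors-anywhere expects the full target URL appended to the proxy's own URL. The helper name &lt;code&gt;buildProxiedUrl&lt;/code&gt; is hypothetical and only for illustration, and the proxy address is the example one above, so substitute your own.&lt;/p&gt;

```javascript
// Hypothetical helper: prefix a target URL with the cors-anywhere proxy URL.
function buildProxiedUrl(proxyUrl, targetUrl, params) {
    // Make sure the proxy URL ends with a slash before appending the target.
    const base = proxyUrl.endsWith('/') ? proxyUrl : proxyUrl + '/';
    // Serialize the query parameters (e.g. user id and language).
    const query = new URLSearchParams(params).toString();
    return query ? base + targetUrl + '?' + query : base + targetUrl;
}

// Example proxy address and Scholar user id; replace both with your own values.
const proxied = buildProxiedUrl(
    'https://safe-mountain-7777.herokuapp.com',
    'https://scholar.google.com/citations',
    { user: 'PkfvVs0AAAAJ', hl: 'en' }
);
console.log(proxied);
```

&lt;p&gt;The printed string is the proxy address followed by the complete Scholar URL and its query string, which is the shape of URL the app requests later on.&lt;/p&gt;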

&lt;h2&gt;2. Create a React app and install dependencies&lt;/h2&gt;

&lt;p&gt;This will take some time, so please bear with it. In a terminal, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npx create-react-app scholarscraper
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Personally, I use Chakra UI to style my website. We'll use axios to fetch the HTML and cheerio to extract data from it, so let's install them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd scholarscraper
npm i @chakra-ui/react
npm i axios
npm i cheerio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;3. Edit the App.js file&lt;/h2&gt;

&lt;p&gt;I'll explain the code step by step; the full App.js file is at the end of this section.&lt;/p&gt;

&lt;p&gt;First, we import all the libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import axios from 'axios';
import {Text, Link, ChakraProvider, Container} from "@chakra-ui/react";
import {useEffect, useState} from "react";
import * as cheerio from 'cheerio';
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside &lt;code&gt;function App() {}&lt;/code&gt;, we do the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We set &lt;code&gt;PROXY_URL&lt;/code&gt; to the cors-anywhere server we deployed previously, and &lt;code&gt;URL&lt;/code&gt; to the Google Scholar citations page.&lt;/li&gt;
&lt;li&gt;Our articles are stored in the state variable &lt;code&gt;articles&lt;/code&gt;, an array initialized with &lt;code&gt;useState([])&lt;/code&gt;.
&lt;/li&gt;
&lt;li&gt;We make a GET request to Scholar through the proxy, which is as simple as requesting &lt;code&gt;PROXY_URL + URL&lt;/code&gt;, and pass your user id in the params. This is the id in your Scholar profile URL:
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TQ4pnFuv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/163vmuoxlcrzskwnmexw.png" alt="User id" width="493" height="39"&gt;
&lt;/li&gt;
&lt;li&gt;We extract the elements with cheerio. Here I extract the title, authors, journal, number of citations and some links. If you want to extract more data, inspect the Scholar page to find the relevant classes and follow the same pattern.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    const PROXY_URL = 'https://safe-mountain-7777.herokuapp.com/';
    const URL = 'https://scholar.google.com/citations';
    const [articles, setArticles] = useState([]);

    useEffect(() =&amp;gt; {
        axios.get(PROXY_URL + URL, {
            params: {
                'user': 'PkfvVs0AAAAJ',
                'hl': 'en'
            }
        })
        .then(res =&amp;gt; {
            let $ = cheerio.load(res.data);
            let arrayArticles = [];
            $('#gsc_a_b .gsc_a_t').each((index, element) =&amp;gt; {
                const title = $(element).find('.gsc_a_at').text();
                const link = $(element).find('.gsc_a_at').attr('href');
                const author = $(element).find('.gsc_a_at + .gs_gray').text();
                const journal = $(element).find('.gs_gray + .gs_gray').text();
                arrayArticles.push({'title': title, 'link': link, 'author': author, 'journal': journal});
            })
            $('#gsc_a_b .gsc_a_c').each((index, element) =&amp;gt; {
                const cited = $(element).find('.gs_ibl').text();
                const citedLink = $(element).find('.gs_ibl').attr('href');
                arrayArticles[index]['cited'] = cited;
                arrayArticles[index]['citedLink'] = citedLink;
            })
            setArticles(arrayArticles);
        })
        .catch(err =&amp;gt; console.error(err))
    }, [])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
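
&lt;p&gt;The second loop above attaches citation data by row index, which assumes both selectors return the same number of rows. As a minimal sketch of a more defensive merge, the hypothetical helper &lt;code&gt;mergeCitations&lt;/code&gt; below works on plain arrays and fills in defaults when a citation entry is missing or empty. The sample data is made up and only stands in for the scraped rows.&lt;/p&gt;

```javascript
// Hypothetical helper: attach citation data to articles by row index.
// Rows without a matching or non-empty citation entry get defaults instead of throwing.
function mergeCitations(articles, citations) {
    return articles.map(function (article, index) {
        const citation = citations[index] || {};
        return {
            ...article,
            cited: citation.cited || '0',
            citedLink: citation.citedLink || ''
        };
    });
}

// Made-up sample data standing in for the scraped rows.
const sampleArticles = [
    { title: 'Paper A', link: '/citations?view_op=view_citation', author: 'A. Author', journal: 'Journal 1' },
    { title: 'Paper B', link: '/citations?view_op=view_citation', author: 'B. Author', journal: 'Journal 2' }
];
const sampleCitations = [
    { cited: '12', citedLink: 'https://scholar.google.com/scholar?cites=1' }
];

const merged = mergeCitations(sampleArticles, sampleCitations);
console.log(merged[0].cited, merged[1].cited);
```

&lt;p&gt;Showing a count of 0 for articles without citation data is a design choice of this sketch; you may prefer to hide the "Cited by" line for those rows instead.&lt;/p&gt;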



&lt;p&gt;Finally, render the UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;   return (
        &amp;lt;ChakraProvider&amp;gt;
            &amp;lt;Container maxW={'container.md'}&amp;gt;
                {articles.map(article =&amp;gt; {
                    return (
                        &amp;lt;div key={article.link}&amp;gt;
                            &amp;lt;Link href={`https://scholar.google.com${article.link}`} isExternal&amp;gt;
                                &amp;lt;Text fontWeight={600} color={'teal.800'}&amp;gt;{article.title}&amp;lt;/Text&amp;gt;
                            &amp;lt;/Link&amp;gt;
                            &amp;lt;Text color={'gray.600'}&amp;gt;{article.author}&amp;lt;/Text&amp;gt;
                            &amp;lt;Text color={'gray.600'}&amp;gt;{article.journal}&amp;lt;/Text&amp;gt;
                            &amp;lt;Link href={article.citedLink} isExternal&amp;gt;
                                &amp;lt;Text color={'gray.600'}&amp;gt;Cited by {article.cited}&amp;lt;/Text&amp;gt;
                            &amp;lt;/Link&amp;gt;
                        &amp;lt;/div&amp;gt;
                    )
                })}
            &amp;lt;/Container&amp;gt;
        &amp;lt;/ChakraProvider&amp;gt;
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
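
&lt;p&gt;The title link scraped from Scholar is a relative path, which is why the render code prefixes it with the Scholar host. As an alternative sketch, the standard &lt;code&gt;URL&lt;/code&gt; constructor can resolve the link against a base and leaves already-absolute links untouched. The helper name &lt;code&gt;toAbsoluteUrl&lt;/code&gt; and the sample paths are hypothetical.&lt;/p&gt;

```javascript
// Hypothetical helper: resolve a scraped href against the Scholar host.
const SCHOLAR_ORIGIN = 'https://scholar.google.com';

function toAbsoluteUrl(href) {
    // Missing hrefs fall back to an empty string.
    if (!href) {
        return '';
    }
    // new URL(relative, base) resolves relative paths and keeps absolute URLs as they are.
    return new URL(href, SCHOLAR_ORIGIN).toString();
}

console.log(toAbsoluteUrl('/citations?view_op=view_citation'));
console.log(toAbsoluteUrl('https://scholar.google.com/scholar?cites=1'));
```

&lt;p&gt;With a helper like this, both the title link and the "Cited by" link could go through the same function, whichever form the scraped value takes.&lt;/p&gt;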



&lt;p&gt;The full App.js file is here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import axios from 'axios';
import {Text, Link, ChakraProvider, Container} from "@chakra-ui/react";
import {useEffect, useState} from "react";
import * as cheerio from 'cheerio';

function App() {
    const PROXY_URL = 'https://safe-mountain-7777.herokuapp.com/';
    const URL = 'https://scholar.google.com/citations';
    const [articles, setArticles] = useState([]);

    useEffect(() =&amp;gt; {
        axios.get(PROXY_URL + URL, {
            params: {
                'user': 'PkfvVs0AAAAJ',
                'hl': 'en'
            }
        })
        .then(res =&amp;gt; {
            let $ = cheerio.load(res.data);
            let arrayArticles = [];
            $('#gsc_a_b .gsc_a_t').each((index, element) =&amp;gt; {
                const title = $(element).find('.gsc_a_at').text();
                const link = $(element).find('.gsc_a_at').attr('href');
                const author = $(element).find('.gsc_a_at + .gs_gray').text();
                const journal = $(element).find('.gs_gray + .gs_gray').text();
                arrayArticles.push({'title': title, 'link': link, 'author': author, 'journal': journal});
            })
            $('#gsc_a_b .gsc_a_c').each((index, element) =&amp;gt; {
                const cited = $(element).find('.gs_ibl').text();
                const citedLink = $(element).find('.gs_ibl').attr('href');
                arrayArticles[index]['cited'] = cited;
                arrayArticles[index]['citedLink'] = citedLink;
            })
            setArticles(arrayArticles);
        })
        .catch(err =&amp;gt; console.error(err))
    }, [])

    return (
        &amp;lt;ChakraProvider&amp;gt;
            &amp;lt;Container maxW={'container.md'}&amp;gt;
                {articles.map(article =&amp;gt; {
                    return (
                        &amp;lt;div key={article.link}&amp;gt;
                            &amp;lt;Link href={`https://scholar.google.com${article.link}`} isExternal&amp;gt;
                                &amp;lt;Text fontWeight={600} color={'teal.800'}&amp;gt;{article.title}&amp;lt;/Text&amp;gt;
                            &amp;lt;/Link&amp;gt;
                            &amp;lt;Text color={'gray.600'}&amp;gt;{article.author}&amp;lt;/Text&amp;gt;
                            &amp;lt;Text color={'gray.600'}&amp;gt;{article.journal}&amp;lt;/Text&amp;gt;
                            &amp;lt;Link href={article.citedLink} isExternal&amp;gt;
                                &amp;lt;Text color={'gray.600'}&amp;gt;Cited by {article.cited}&amp;lt;/Text&amp;gt;
                            &amp;lt;/Link&amp;gt;
                        &amp;lt;/div&amp;gt;
                    )
                })}
            &amp;lt;/Container&amp;gt;
        &amp;lt;/ChakraProvider&amp;gt;
    )
}

export default App;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now start the app and enjoy your work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;npm start
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The app will look like this:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--uIzJVPao--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pnj3sjgqsnmj2aw1ng8z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--uIzJVPao--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pnj3sjgqsnmj2aw1ng8z.png" alt="Demo" width="880" height="403"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Good luck!&lt;/p&gt;

</description>
      <category>react</category>
      <category>webdev</category>
      <category>beginners</category>
      <category>chakraui</category>
    </item>
  </channel>
</rss>
