Jakub Jabłoński

Web-Scraping Discord Bot

🚨 There's a new, more advanced version of this bot that you can check out for free 🚨
Visit notifast.me to learn more

Intro

You can get a serious advantage on the internet when you can code your own tools. This post is about how I got my apartment with a web-scraping Discord bot.

The problem

The apartment market in Poland is a mess in the current state of the world. Prices are crazy high because of inflation, and demand is even higher with millions of people from Ukraine fleeing the war. There I was, needing to find a new place to rent in less than 3 weeks. There were a few popular websites for listings, but the problem was that good offers were gone within minutes of being posted. I needed a way to get notified as soon as possible.

The solution

The solution was very simple. To get an advantage I decided to create a bot that checks the site every single minute and sends me a message on Discord when it finds a new listing.

I decided to go with the following stack:

  • Node.js
  • Prisma
  • Discord.js
  • Puppeteer

Hosted on the Heroku free tier (R.I.P.)

The results

First, in a single day I put together a quick working prototype in JS with one hardcoded URL, and it worked like a charm for a few days. 90% of the time I was the first one to call, but it was still difficult because some people were blindly paying before even seeing the place.

Then, after checking out additional websites, I decided to make it more generic and open it up for other people to use. I switched it to TypeScript and created a simple Discord bot where you can add your own URL and get notified when a new item appears.

The new version of the bot, checking 3 different websites, ran for a little over a week, notifying me about hundreds of new listings, and I finally got the perfect place. I'm absolutely sure I wouldn't have gotten it without the bot.

This is how it turned out, along with simple instructions on how to use it:

Adding a new scraper job

The bot works with practically any site, even ones with dynamic content (Amazon, eBay, etc.). The only requirement is that the service has unique links to its items.

Use the /create-job command and fill in the form (a rough sketch of how such a command could be declared follows the list):

  • name - name of the job
  • url - URL that the bot will scrape
  • selector - selector for an a tag with the link to the item
  • interval - interval of the job in minutes (minimum 1 minute)
  • active? - default true; whether the job should be active right away (you can always enable/disable a job with the [enable/disable]-job commands)
  • channel? - defaults to the channel where the command was run; this is the channel that gets messaged when a new item appears
  • clean? - default true; whether query params should be ignored (essential for some sites like eBay)

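For illustration, here's a rough sketch of how a command like /create-job could be declared with discord.js slash command builders. The option names mirror the form fields above; the actual definitions live in the repo and may differ.

import { SlashCommandBuilder, ChannelType } from 'discord.js'

// Hypothetical sketch only: the real command definitions are in the repo
const createJobCommand = new SlashCommandBuilder()
  .setName('create-job')
  .setDescription('Create a new scraper job')
  .addStringOption((o) =>
    o.setName('name').setDescription('Name of the job').setRequired(true)
  )
  .addStringOption((o) =>
    o.setName('url').setDescription('URL that the bot will scrape').setRequired(true)
  )
  .addStringOption((o) =>
    o.setName('selector').setDescription('Selector for the a tag with the item link').setRequired(true)
  )
  .addIntegerOption((o) =>
    o.setName('interval').setDescription('Interval in minutes').setMinValue(1).setRequired(true)
  )
  .addBooleanOption((o) =>
    o.setName('active').setDescription('Start the job right away (default true)')
  )
  .addChannelOption((o) =>
    o
      .setName('channel')
      .setDescription('Channel to notify (defaults to the current one)')
      .addChannelTypes(ChannelType.GuildText)
  )
  .addBooleanOption((o) =>
    o.setName('clean').setDescription('Ignore query params in links (default true)')
  )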

After that you can do basic CRUD operations on jobs with commands like /list-jobs, /update-job, /delete-job...

Wait...

After that you simply wait for the job to run, and it will send a message when new items are found:


You can also run a job manually with /run-job. Note that the first run will pick up all the elements that are already on the website.


Code bits

You can check out the whole project here: https://github.com/jjablonski-it/webscraper-bot

Main function

This is the main function that runs when the bot starts. For every guild the bot is in, it checks whether the guild exists in the database and saves it if not, then registers the Discord slash commands for that guild. After that it starts the web server and the interval jobs.

const client = await getClient()
const existingGuilds = await getGuilds()

client.guilds.cache.forEach((guild) => {
  // Save the guild in the database if it isn't there yet
  const guildExists = existingGuilds.some((g) => g.id === guild.id)
  if (!guildExists) {
    saveGuild(guild.id)
  }
  // Register the slash commands for this guild
  registerCommands(CONFIG.CLIENT_ID, guild.id)
})

runWebServer()
runIntervalJobs()
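
The registerCommands function isn't shown in this post. For illustration, here's a minimal sketch of how guild slash commands can be registered with the discord.js REST helper; the command list and the token environment variable are assumptions, so check the repo for the real implementation.

import { REST, Routes, SlashCommandBuilder } from 'discord.js'

// Illustrative sketch only: the real registerCommands lives in the repo and may differ.
// The command array and DISCORD_TOKEN env variable are assumptions.
const commands: SlashCommandBuilder[] = [] // e.g. [createJobCommand, listJobsCommand, ...]
const rest = new REST({ version: '10' }).setToken(process.env.DISCORD_TOKEN!)

export const registerCommands = async (clientId: string, guildId: string) => {
  // Overwrite the guild's slash commands with the bot's current command set
  await rest.put(Routes.applicationGuildCommands(clientId, guildId), {
    body: commands.map((command) => command.toJSON()),
  })
}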

Web scraper

This is the very basic web-scraping function that extracts the links of all elements matching the selector on the page.

// `browser` is assumed to be a shared Puppeteer browser instance created at startup
export const scrapeLinks = async (
  url: string,
  selector: string
): Promise<Array<string>> => {
  const page = await browser.newPage()
  await page.setCacheEnabled(false)
  await page.goto(url)
  // Wait until the (possibly dynamic) content matching the selector is rendered
  await page.waitForSelector(selector)
  const links = await page.$$(selector)
  // Read the href property of every matched element handle
  const res = await Promise.all(
    links.map((handle) => handle.getProperty('href'))
  )
  const hrefs = (await Promise.all(
    res.map((href) => href.jsonValue())
  )) as string[]
  await page.close()
  return hrefs
}
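
The post doesn't show how new items are detected, but conceptually it comes down to normalizing the scraped links (stripping query params when clean is enabled) and diffing them against the links already stored for the job. A minimal sketch, with findNewLinks as a hypothetical helper rather than the repo's actual code:

// Hypothetical helper: illustrates the idea, not the repo's implementation.
// `knownLinks` would come from the database.
export const findNewLinks = (
  scraped: string[],
  knownLinks: string[],
  clean: boolean
): string[] => {
  // Optionally strip query params so the same item isn't reported twice
  // (essential for sites like eBay that append tracking parameters)
  const normalize = (link: string) => {
    if (!clean) return link
    const { origin, pathname } = new URL(link)
    return origin + pathname
  }
  const known = new Set(knownLinks.map(normalize))
  return scraped.map(normalize).filter((link) => !known.has(link))
}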

Web server

This exists only to health-check the app and to keep the Heroku free dyno awake.

const server = createServer((_req, res) => {
  res.write('OK')
  res.end()
})

export const runWebServer = () => {
  const { PORT } = CONFIG
  server.listen(PORT, () => console.log(`Server listening on port ${PORT}`))
}

Interval jobs

This is the function that handles all the jobs from all Discord channels. It ticks once a minute and runs every active job whose interval divides the current tick count.

export const runIntervalJobs = async () => {
  let i = 0
  const run = async () => {
    const jobs: Job[] = await getJobs()
    // Run the active jobs whose interval divides the current tick count
    const jobsToRun = jobs.filter(
      ({ interval, active }) => i % interval === 0 && active
    )
    console.log(`${jobsToRun.length} jobs in queue`)
    await runJobs(jobsToRun)
    i++
  }
  run()
  // Tick once a minute (every 10 seconds in development)
  const intervalId = setInterval(run, 1000 * (isDev ? 10 : 60))
  return intervalId
}
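
To give a fuller picture, this is roughly what processing a single job could look like when you glue together scrapeLinks, the findNewLinks sketch above, and a Discord message. Everything below is illustrative: the Job shape and the declared helpers are assumptions, not the repo's actual code.

import { Client, TextChannel } from 'discord.js'

// Hypothetical glue code, not the repo's runJobs implementation.
// The Job shape below is a guess based on the /create-job form fields.
type Job = {
  id: number
  name: string
  url: string
  selector: string
  channelId: string
  clean: boolean
}

declare const client: Client // the logged-in discord.js client
declare const scrapeLinks: (url: string, selector: string) => Promise<string[]> // shown above
declare const findNewLinks: (scraped: string[], known: string[], clean: boolean) => string[] // earlier sketch
declare const getKnownLinks: (jobId: number) => Promise<string[]> // hypothetical Prisma query
declare const saveLinks: (jobId: number, links: string[]) => Promise<void> // hypothetical Prisma mutation

export const runJob = async (job: Job) => {
  const scraped = await scrapeLinks(job.url, job.selector)
  const newLinks = findNewLinks(scraped, await getKnownLinks(job.id), job.clean)
  if (newLinks.length === 0) return

  // Persist the new links so they aren't reported again on the next run
  await saveLinks(job.id, newLinks)

  // Send the links to the channel configured for this job
  const channel = (await client.channels.fetch(job.channelId)) as TextChannel | null
  await channel?.send(`New items for "${job.name}":\n${newLinks.join('\n')}`)
}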

Conclusion

I'm very happy with the results and I'm sure I will use this bot in the future. It will keep running until the Heroku free tier dies, but if anyone is interested I will move it somewhere else and maybe improve it a bit further. It works, but it's nowhere near perfect, so feel free to contribute if you are interested.

Bot icon generated by DALL·E 2

Top comments (1)

anishkgt

Enjoyed it halfway, but the commands on Discord are missing and I cannot delete, add, or update anything after creating a job.