Let's build a simple web crawler application on AWS that sends a notification whenever a product has a special offer.
I love peanut butter, and whenever it is on offer I like to stock up. So I decided to create a simple application that sends me a notification whenever there is an offer for this product. To create this application, I need:
- The product URL.
- The exact location of the offer on the web page, so I can extract it. For this product, the offer is in the #pap-banner-text-value HTML element.
You probably want to be notified about different things, so feel free to customize the application logic. Perhaps you would like a notification when a GPU, an Xbox, or a PS4 is back in stock, or you want to pull data from an API and send notifications based on predefined criteria. What the application notifies you about is up to you.
Architecture
The AWS services we will use to create this application are:
- AWS EventBridge - For scheduling Lambda function invocations
- AWS Lambda - For crawling the website and publishing messages to an SNS topic
- AWS SNS - For sending email notifications
We'll also use Node.js for the Lambda function and the Serverless Framework for managing the infrastructure and deploying the application.
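For reference, when an EventBridge schedule fires, it invokes the Lambda function with a "Scheduled Event" payload. The sketch below shows an example of that payload; the id, account, region, time, and rule name are placeholder values, not values from this project. The isScheduledEvent helper is hypothetical: the crawler does not need any field of the event, so the helper only makes the trigger explicit, for instance when logging.

```javascript
// Example of the payload EventBridge sends to a Lambda function on a schedule.
// The id, account, region, time, and rule name below are placeholder values.
const sampleScheduledEvent = {
  version: "0",
  id: "00000000-0000-0000-0000-000000000000",
  "detail-type": "Scheduled Event",
  source: "aws.events",
  account: "123456789012",
  time: "2024-01-01T08:00:00Z",
  region: "eu-west-1",
  resources: ["arn:aws:events:eu-west-1:123456789012:rule/hypothetical-daily-rule"],
  detail: {},
};

// Hypothetical helper: true when the invocation came from an EventBridge schedule.
function isScheduledEvent(event) {
  return (
    event != null &&
    event.source === "aws.events" &&
    event["detail-type"] === "Scheduled Event"
  );
}

console.log(isScheduledEvent(sampleScheduledEvent)); // true
console.log(isScheduledEvent({})); // false
```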
Setting up the development environment
First, we have to install the Serverless CLI.
npm install -g serverless
After installation, we have to configure the AWS credentials. If you don't have any, you can generate new credentials in the AWS console.
serverless config credentials --provider aws --key 1234 --secret 5678
Once the CLI is installed and configured, we can create a new project.
serverless create --template aws-nodejs --path offer-notification-application
The above command creates a skeleton project with a serverless.yml file, where we'll define our infrastructure, and a handler.js file, where we'll implement our Lambda function.
Implementation
Based on the architecture diagram above, we'll have a Lambda function that is invoked once a day. It fetches the target website's content and, whenever it finds an offer, publishes a message to an SNS topic. Because the function publishes to an SNS topic, it must have permission to do so (sns:Publish).
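The permission mentioned above boils down to an IAM policy statement that allows sns:Publish on the topic. As a minimal sketch, here it is built as a plain JavaScript object; in this project it would normally be declared in serverless.yml instead, and the topic ARN below is a made-up placeholder. The snsPublishPolicy helper is hypothetical.

```javascript
// Builds a least-privilege IAM policy that only lets the function publish to one topic.
function snsPublishPolicy(topicArn) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: "sns:Publish",
        Resource: topicArn, // restrict to this topic instead of "*"
      },
    ],
  };
}

// Placeholder ARN for illustration only.
const policy = snsPublishPolicy("arn:aws:sns:eu-west-1:123456789012:offer-notifications");
console.log(JSON.stringify(policy, null, 2));
```

Scoping Resource to the single topic, rather than "*", keeps the function's permissions as narrow as possible.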
We also need an SNS topic to publish the offers to, and an email subscription to that topic so that we are notified whenever a new message is published.
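Email subscribers receive each published message as an email, and the message's Subject is used as the email subject line. The sketch below builds the input for an SNS Publish call (TopicArn, Subject, and Message are Publish parameters); the buildOfferMessage helper, the ARN, and the product URL are hypothetical placeholders. SNS rejects subjects with line breaks or of 100 characters or more, so the helper flattens and trims it.

```javascript
// Hypothetical helper that builds the input for an SNS Publish call.
// With an email subscription, Subject becomes the email subject line.
function buildOfferMessage(topicArn, productUrl, offerText) {
  // SNS requires the subject to be under 100 characters with no line breaks.
  const subject = `New offer: ${offerText}`.replace(/[\r\n]+/g, " ").slice(0, 99);
  return {
    TopicArn: topicArn,
    Subject: subject,
    Message: `There is a new offer for the product you are watching:\n\n${offerText}\n\n${productUrl}`,
  };
}

const params = buildOfferMessage(
  "arn:aws:sns:eu-west-1:123456789012:offer-notifications", // placeholder ARN
  "https://example.com/peanut-butter", // placeholder product URL
  "2 FOR 1"
);
console.log(params.Subject); // New offer: 2 FOR 1
```

Which AWS SDK the function uses is not stated here: with the AWS SDK for JavaScript v3, an object like this can be passed to new PublishCommand(params) from @aws-sdk/client-sns; with the v2 SDK, to sns.publish(params).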
The following serverless.yml file describes the architecture outlined above.
We also need a .env file containing an EMAIL variable set to the email address where we want to receive the notifications (for example, EMAIL=you@example.com).
Now that we have defined our infrastructure, we can move on to the application logic.
To keep the project better structured, let's create a new src directory and move the handler.js file into it.
Our Lambda function needs to fetch the target website's content and check whether it contains an interesting offer. For this, we need two additional packages: axios to fetch the page and cheerio to parse the HTML. Let's install them with the following command.
npm install axios cheerio
Now we have everything we need to implement the application's core logic. For this product, it looks like the following.
As you can see, the fetchOffer function fetches the website content, and since we already know that the offer is in the #pap-banner-text-value HTML element, we can easily extract its content with cheerio.
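A minimal sketch of such a fetchOffer function, assuming axios and cheerio are installed. PRODUCT_URL is a placeholder, since the real product URL is not given, and the optional deps parameter is an addition of this sketch so the HTTP call and the HTML parser can be replaced in tests; by default it loads axios and cheerio lazily.

```javascript
// Placeholder: replace with the real product page URL.
const PRODUCT_URL = "https://example.com/peanut-butter";
// The element that holds the offer text on this product page.
const OFFER_SELECTOR = "#pap-banner-text-value";

// Fetches the product page and returns the trimmed offer text, or null if there is none.
// `deps` is optional (hypothetical, for testing); without it, axios and cheerio are used.
async function fetchOffer(url = PRODUCT_URL, deps = {}) {
  const httpGet = deps.httpGet ?? (async (u) => (await require("axios").get(u)).data);
  const loadHtml = deps.loadHtml ?? require("cheerio").load;

  const html = await httpGet(url);
  const $ = loadHtml(html);
  // .text() returns an empty string when the element is missing.
  const text = $(OFFER_SELECTOR).text().trim();
  return text.length > 0 ? text : null;
}
```

In the Lambda function itself, calling await fetchOffer() without arguments would fetch the real page.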
Because I only want notifications for offers like "2 FOR 1" or "30 % off", I need to check whether the offer matches one of the regular expressions.
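A minimal sketch of such a check, using two hypothetical patterns that match offers of the form "2 FOR 1" and "30 % off" (case-insensitive, with optional spaces). The names OFFER_PATTERNS and isInterestingOffer are illustrative.

```javascript
// Hypothetical patterns for the offers worth being notified about.
const OFFER_PATTERNS = [
  /\b\d+\s*for\s*\d+\b/i, // e.g. "2 FOR 1", "3 for 2"
  /\b\d+\s*%\s*off\b/i, // e.g. "30 % off", "25% OFF"
];

// Returns true when the offer text matches at least one pattern.
function isInterestingOffer(text) {
  return typeof text === "string" && OFFER_PATTERNS.some((pattern) => pattern.test(text));
}

console.log(isInterestingOffer("2 FOR 1")); // true
console.log(isInterestingOffer("30 % off")); // true
console.log(isInterestingOffer("New recipe")); // false
```

The patterns deliberately omit the g flag: with g, RegExp.prototype.test keeps state in lastIndex between calls, so repeated checks with the same regex object could return inconsistent results.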
The handler function invokes fetchOffer and, whenever it returns an offer, publishes it to the SNS topic.
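A minimal sketch of that handler flow. To keep it self-contained and testable, the steps are passed in through a hypothetical createHandler factory; in the deployed function, fetchOffer would be the real page fetcher, isInterestingOffer the regular-expression check, and publish a wrapper around an SNS Publish call. All three names are assumptions of this sketch, as is placing the offer filter in the handler.

```javascript
// Hypothetical factory that wires the crawler's steps together.
// fetchOffer: async () => string | null
// isInterestingOffer: (text) => boolean
// publish: async (text) => void, e.g. a wrapper around SNS Publish
function createHandler({ fetchOffer, isInterestingOffer, publish }) {
  return async function handler() {
    const offer = await fetchOffer();
    if (offer && isInterestingOffer(offer)) {
      await publish(offer);
      return { published: true, offer };
    }
    return { published: false };
  };
}

// Example wiring with in-memory fakes instead of the website and SNS.
const sent = [];
const handler = createHandler({
  fetchOffer: async () => "2 FOR 1",
  isInterestingOffer: (text) => /\b\d+\s*for\s*\d+\b/i.test(text),
  publish: async (text) => {
    sent.push(text);
  },
});

handler().then((result) => console.log(result, sent));
```

In src/handler.js, the function returned by createHandler would be exported under whatever name the handler setting in serverless.yml references.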
It's very simple, isn't it?
Deployment
Now we can deploy the application to AWS with a single command.
serverless deploy
Once the deployment succeeds, we should receive a subscription confirmation email at the configured address. After we confirm the subscription, we'll receive an email for every message published to the SNS topic.
To remove the deployed application, run the following command.
serverless remove
Testing
We can easily test the deployed application manually by invoking the Lambda function with the following command.
serverless invoke --function crawl
If the site currently has a matching offer, we should receive an email about it.
Summary
To create this application, we used Serverless to define the infrastructure and deploy the application, AWS Lambda to run our code, EventBridge scheduled events to trigger the Lambda function, and SNS to send email notifications to subscribers. As you have seen, implementing and deploying this application to AWS with Serverless was very easy.
You can check out the repository on GitHub.