My buddy Adam asked me if I could build a piece of functionality for his business, Adam & Co (names are fictional). I had recently earned my AWS Developer Associate certification, and this seemed like the perfect opportunity to build a project entirely on AWS services and apply some of what I had learned.
Project Overview
People are going into foreclosure on their homes in the county where Adam lives. As part of the foreclosure process, the owner's information is listed on the local county website. Adam & Co needs to reach out to these people to help prevent them from losing their homes.
However, they do not have the time or resources to go through the site manually and sort the records to make sure they are not reaching out to the same people twice.
User Story
Adam & Co wanted a system that automates going through the county records, finds only the newly published listings, and delivers them in an email, so that they can focus on reaching out to potential clients efficiently and effectively.
Solution
To solve this, I was hoping I could just use a Lambda function and an API. However, the county did not have the most up-to-date technology and offered no public APIs.
So I had to build a scraper. I chose Puppeteer, which is very powerful and which I have experience with. That choice came with a cost: with Puppeteer, my node_modules became too large to fit within Lambda's deployment package size limit, so I built a small Express app and ran it on EC2.
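To give a feel for the shape of that app, here is a minimal sketch of the Express + Puppeteer setup. The route name, port, URL, and CSS selectors are all placeholders of mine, not the county site's actual markup:

```javascript
const express = require('express');
const puppeteer = require('puppeteer');

const app = express();

// Placeholder URL -- the real county site differs
const COUNTY_URL = 'https://county.example.gov/foreclosures';

async function scrapeListings() {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(COUNTY_URL, { waitUntil: 'networkidle2' });

    // Pull the case ID and owner info out of each table row
    // (selectors and column order are assumptions)
    return await page.$$eval('table.listings tr', (rows) =>
      rows.map((row) => {
        const cells = row.querySelectorAll('td');
        return {
          caseId: cells[0]?.textContent.trim(),
          owner: cells[1]?.textContent.trim(),
          address: cells[2]?.textContent.trim(),
        };
      })
    );
  } finally {
    await browser.close();
  }
}

// The scheduled trigger hits this endpoint to kick off a run
app.post('/scrape', async (req, res) => {
  try {
    const listings = await scrapeListings();
    res.json({ scraped: listings.length });
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(3000);
```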
The Express app runs Puppeteer to scrape the site and inserts the records into DynamoDB. To prevent old listings from being re-added, I use each record's case ID as the primary key and reject duplicates. I had assumed this would be the hardest problem to solve, but it ended up being the easiest.
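The dedup can be expressed as a single conditional write: DynamoDB rejects the put if an item with that case ID already exists. A minimal sketch, where the table name and attribute names are my assumptions rather than the real schema:

```javascript
const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// Returns true if the record was new, false if the case ID already existed.
// 'ForeclosureListings' and 'caseId' are assumed names for illustration.
async function insertIfNew(record) {
  try {
    await docClient
      .put({
        TableName: 'ForeclosureListings',
        Item: { ...record, createdAt: new Date().toISOString() },
        // Fails the write when an item with this partition key exists
        ConditionExpression: 'attribute_not_exists(caseId)',
      })
      .promise();
    return true;
  } catch (err) {
    if (err.code === 'ConditionalCheckFailedException') {
      return false; // duplicate -- a listing we've already seen
    }
    throw err;
  }
}
```

Because DynamoDB evaluates the condition atomically, there is no need to read before writing or to sort and diff the records in application code.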
The Express server is triggered by a CloudWatch Events schedule every Monday morning, right before Adam & Co gets to the office. After the scraping is done, the app grabs the records that were created that day and emails them using AWS SES. The app consists of three main parts:
- Scrape the site for all listings
- Insert the records into DynamoDB
- Email only the newly created records using SES (sketched below)
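The Monday-morning trigger can be written as a CloudWatch Events cron expression, e.g. `cron(0 12 ? * MON *)` for noon UTC each Monday (the exact hour here is my assumption). For the last step, assuming the conditional insert above tells us which records were new, the SES send might look like this; the sender and recipient addresses are placeholders:

```javascript
const AWS = require('aws-sdk');
const ses = new AWS.SES({ region: 'us-east-1' });

// Sends only the records that were newly created this run.
// Addresses are placeholders for illustration.
async function emailNewRecords(newRecords) {
  if (newRecords.length === 0) return;

  const body = newRecords
    .map((r) => `${r.caseId}: ${r.owner}, ${r.address}`)
    .join('\n');

  await ses
    .sendEmail({
      Source: 'scraper@example.com',
      Destination: { ToAddresses: ['team@example.com'] },
      Message: {
        Subject: { Data: `New foreclosure listings: ${newRecords.length}` },
        Body: { Text: { Data: body } },
      },
    })
    .promise();
}
```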
Technologies used:
- EC2
- CodeCommit
- Node.js
- Puppeteer
- CloudWatch Events
- SES
- DynamoDB
Here is a short video of the app running locally (not headless like in production); certain portions are blurred to protect personal information:
Let me know what you think, and if you have any questions!