Build a Website Snapshot Service in 5 Minutes With StdLib and Node.js
Janeth May 16
In this tutorial, I will show you how to schedule snapshots of a website using StdLib! Whether you want to trigger a snapshot of a website every month, week, or day, or even every minute, StdLib’s got you covered! The best part is that you won’t have to fill up your inbox or storage space with screenshots: they are stored automatically in the Wayback Machine at https://web.archive.org/.
If you don’t know it, the Internet Archive is a non-profit that has been building a digital library of the Internet since 1996! Through StdLib’s snapshot service, the community can have a greater say in what gets preserved for the future in the Wayback Machine!
So, let’s get to it!
What You’ll Need Beforehand
- 1x Command Line Terminal With Node.js Installed
- 5x Minutes (or 300x Seconds)
Minute 1: StdLib Account Setup
You’ll need a StdLib account to deploy your daily snapshot service. Getting started with StdLib is easy — head over to our website, choose a username and sign up for free!
Once you have created an account, all of your services will be stored and published under your username. For example, your snapshot service handler will be called:
lib.<username>.DailySnapShot (or whatever creative name you decide to give your service).
Minute 2: Install the StdLib Command Line Tools
Before you begin deploying services to StdLib, you’ll need to install our open source command line tools. If you don’t have at least Node.js version 8.x installed, you can download the latest version, along with npm, from https://nodejs.org/.
Once complete, install the StdLib CLI by opening up a terminal and running:
$ npm install lib.cli -g
This gives you access to the lib command for service management and execution. Next, create a stdlib directory for your StdLib services:
$ mkdir stdlib
$ cd stdlib
$ lib init
You’ll be asked to log in using the credentials you created your account with. That’s it, you are ready to build and deploy!
Minute 3: Creating a StdLib Service
You’ll now want to create a StdLib service for your snapshot. I’ve provided a @JanethL/DailySnapShot source (template) so that you can get your service up and running with very little effort. In the stdlib directory you just created, type:
$ lib create -s @JanethL/DailySnapShot
Next, you’ll be asked to enter a Service Name. I named my service TrumpsTwitterArchive because I chose to monitor the tweets that Trump deletes; you should choose a name relevant to the website you are monitoring. Once your service has been created, enter the service directory by running:
$ cd <username>/<servicename>
Fire up your favorite text editor to open the directory, for example with:
$ code .
Once your editor is up, go to the /functions/__main__.js file in your service’s directory and change the URL on line 7 to the website you want to snapshot.
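The template’s code isn’t reproduced in this post, so the block below is only a rough sketch of what a snapshot function could look like, not the actual @JanethL/DailySnapShot source. It assumes the Wayback Machine’s public Save Page Now endpoint (requesting https://web.archive.org/save/ followed by a page’s URL asks the archive to capture that page), assumes a StdLib function is exported as an async function whose parameters carry defaults, and uses the global fetch available in Node.js 18 and later. The helper names buildSaveUrl and takeSnapshot are hypothetical:

```javascript
// functions/__main__.js: a hypothetical sketch, NOT the actual
// @JanethL/DailySnapShot template source.
// Assumption: the Wayback Machine's public "Save Page Now" endpoint captures
// a page when you request https://web.archive.org/save/<url>.
const { URL } = require('url');

const SAVE_ENDPOINT = 'https://web.archive.org/save/';

// Build the Save Page Now request URL for a target site.
// new URL() throws a TypeError on malformed input, so bad URLs fail fast.
function buildSaveUrl(target) {
  return SAVE_ENDPOINT + new URL(target).href;
}

// Request a capture. fetchImpl defaults to the global fetch (Node.js 18+)
// and can be swapped out for testing.
async function takeSnapshot(target, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(buildSaveUrl(target), { redirect: 'follow' });
  return {
    requested: target,
    status: response.status,
    // After redirects this may be the new snapshot's address
    // (https://web.archive.org/web/<timestamp>/<url>); verify it in a browser.
    finalUrl: response.url,
  };
}

/**
* Ask the Wayback Machine to take a snapshot of a website.
* @param {string} url The website to archive; replace the default with your own
* @returns {object}
*/
module.exports = async (url = 'https://example.com/') => takeSnapshot(url);
```

Passing fetchImpl explicitly keeps the network call swappable, so the URL-building logic can be checked without contacting the archive.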
The final step: open your /package.json file and give your service a short description. My service description reads: “Takes a snapshot of Trump’s Twitter and stores it on https://web.archive.org.” This step isn’t required to create your service, but I recommend it because it helps you stay organized and helps others understand what your service does.
Make sure to save your changes, then return to your terminal and deploy your service by running:
$ lib up dev
To run a scheduled task, you need to push an immutable release version with lib release:
$ lib release
Awesome! Your service is now available at:
Minute 4: Setting a Task on StdLib to Trigger Your Snapshot Service
We are almost done! Head over to your StdLib dashboard at https://dashboard.stdlib.com/dashboard/#/, scroll down the left sidebar menu, and click “Scheduled Tasks.” Here you can search for your released service.
Once you find and select the service you want to run as a task, choose which function within the service you want to execute. You can give your task a name and select how often you want your service to trigger a snapshot, anywhere from once a minute to once a week. After filling out the function parameters, you can run a test execution to make sure your task does what you want it to do. The snapshot is stored immediately in the Wayback Machine. To check that it was stored, copy and paste the resulting URL into your browser; you should see a snapshot of your selected website.
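Wayback Machine snapshot addresses generally follow the pattern https://web.archive.org/web/<timestamp>/<original URL>, where the timestamp is 14 digits (YYYYMMDDhhmmss) in UTC. The hypothetical helper below splits such a URL so you can confirm when your test snapshot was captured. The optional suffix it tolerates after the timestamp (such as id_) is an assumption about URL variants you may run into:

```javascript
// A hypothetical helper: split a Wayback Machine snapshot URL of the form
// https://web.archive.org/web/<timestamp>/<original> into its capture time
// and archived address. Timestamps are 14 digits (YYYYMMDDhhmmss) in UTC.
// Returns null when the input doesn't match that pattern.
function parseSnapshotUrl(snapshotUrl) {
  // An optional suffix such as "id_" after the timestamp is tolerated (assumption).
  const match = /^https?:\/\/web\.archive\.org\/web\/(\d{14})[a-z_]*\/(.+)$/.exec(snapshotUrl);
  if (!match) return null;
  const [, ts, original] = match;
  const part = (start, end) => Number(ts.slice(start, end));
  const capturedAt = new Date(Date.UTC(
    part(0, 4),
    part(4, 6) - 1, // JavaScript months are 0-based
    part(6, 8),
    part(8, 10),
    part(10, 12),
    part(12, 14)
  ));
  return { capturedAt, original };
}

module.exports = { parseSnapshotUrl };
```

For example, passing https://web.archive.org/web/20180516120000/https://twitter.com/realDonaldTrump returns a capturedAt of 2018-05-16T12:00:00.000Z and an original of https://twitter.com/realDonaldTrump.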
And that’s it! You should see your scheduled task listed under “My Tasks”!
Minute 5: Explore the Wayback Machine
Whenever you need to get hold of your snapshots, all you have to do is go to https://archive.org/ and search for the website’s URL. You will be able to find the precise date and time at which your service took each snapshot.
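If you’d rather look snapshots up programmatically, the Internet Archive also offers a public Wayback Availability API at https://archive.org/wayback/available, which returns JSON describing the archived capture closest to an optional timestamp. The sketch below assumes that response shape (an archived_snapshots.closest object with url and timestamp fields) and uses the global fetch from Node.js 18 or later; the function names are hypothetical:

```javascript
// A hypothetical lookup helper for the Wayback Availability API.
// Assumed response shape:
// { "archived_snapshots": { "closest": { "available": true, "url": "...",
//   "timestamp": "YYYYMMDDhhmmss", "status": "200" } } }
const { URLSearchParams } = require('url');

const AVAILABILITY_API = 'https://archive.org/wayback/available';

// Build the lookup URL. timestamp is optional and may be truncated
// (e.g. "20180516"); the API returns the capture closest to it.
function buildAvailabilityUrl(site, timestamp) {
  const params = new URLSearchParams({ url: site });
  if (timestamp) params.set('timestamp', timestamp);
  return `${AVAILABILITY_API}?${params.toString()}`;
}

// Extract the closest snapshot from a parsed JSON body, or null if none exists.
function closestSnapshot(body) {
  const closest = body && body.archived_snapshots && body.archived_snapshots.closest;
  if (!closest || !closest.available) return null;
  return { url: closest.url, timestamp: closest.timestamp };
}

// Query the API. fetchImpl defaults to the global fetch (Node.js 18+).
async function findSnapshot(site, timestamp, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(buildAvailabilityUrl(site, timestamp));
  if (!response.ok) {
    throw new Error(`Availability API returned HTTP ${response.status}`);
  }
  return closestSnapshot(await response.json());
}

module.exports = { buildAvailabilityUrl, closestSnapshot, findSnapshot };
```

For example, findSnapshot('twitter.com/realDonaldTrump', '20180516') would resolve to the capture closest to May 16, 2018, or to null if the site has never been archived.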
Thanks for reading! I hope this article has been helpful in showing you how easy it is to schedule a task using Standard Library!
I would love for you to comment here, email me at Janeth [at] stdlib [dot] com, or follow us on Twitter at @StdLibHQ or @mss_ledezma! Please let me know if you’ve built anything exciting that you would like the StdLib team to feature or share!
Janeth Ledezma is the Community Manager for StdLib and a recent graduate of UC Berkeley (go Bears!). When she isn’t learning Arabic or working out, you can find her riding the Muir Woods loop or exploring Marin County with a group of riders.