Here at Diffbot, we enjoy hacking away at fun new app concepts. Not all of them turn out great, but Crawly was (and still is) a hit. It passively generates around 15% of inbound organic traffic to Diffbot. It's a shame the poor thing hasn't been owned or maintained for over 8 years. Frankly, it's a miracle this thing still builds.
This week, at the unanimous behest of Diffbot support and dev ops, I took up the challenge of getting this little app patched. The good news is that I did it in 2 days. The bad news is that it was still a miserable experience, even for a tiny hackathon project.
In the hopes of helping a fellow dev, I've organized my notes into a step-by-step guide. I'll point out some places where I got stuck, but in general I'll try to stick to the things that worked. Because every project is unique, your mileage will vary. But at least you won't be starting with this useless Google search result.
How Crawly is Built
Crawly is an Express app built on the bones of hackathon-starter, a boilerplate package for Node apps. Surprisingly, that project is still alive and well-maintained in 2024. But there was no guarantee it still used the same dependencies (it doesn't), and there was a good chance even more dependencies had been added (they had). So I abandoned any plan to merge in the latest version.
Crawly's main dependencies are express@4.13.3, mongodb@3.6.3, jade@1.11.0 (templating), passport@0.3.2, and request@2.67.0. It also included mocha and phantomjs for testing.
Bootstrap@3.3.5 is vendorized and compiled at build time.
The app is dockerized for deployment and runs on a node:10 image.
Some notable omissions:
- No webpack. Bundling dependencies was still pretty cutting edge in 2014. Instead, Express middleware compiled assets like SCSS into CSS at run time. (ew)
- No tests. There are some assertions here and there, but I'm not surprised. This was a hackathon project, after all.
Step 1: Upgrade Node v10 to v22
The keystone upgrade. We'll switch Node to v22 and reinstall node modules.
$ nvm use 22.5.1
Delete your node_modules folder and package-lock.json, then run npm i.
Get used to these messages. We're going to see a lot of them. For now, we can ignore all of the deprecation warnings and focus on the errors, specifically:
npm ERR! code 7
npm ERR! path ......./node_modules/kerberos
npm ERR! command failed
npm ERR! command sh -c prebuild-install || node-gyp rebuild
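What's going on here: one of the modules, kerberos, fails during its install script, which tries prebuild-install (downloading a prebuilt binary) and falls back to compiling from source with node-gyp rebuild.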
Modules with pre- or post-install steps, especially native addons, will often fail when they don't support the installed version of Node. To get past this, we first need to update these packages to versions that do support Node v22.
Tip: If you're dockerizing your app, pick a Node image first and use the same Node version locally. See Step 4: Docker up.
Step 2: Update dependencies in package.json
I used npm-check-updates. It works pretty well!
$ npx npm-check-updates --format group
This prints the available updates grouped by update type (patch, minor, major).
I took the easy road and updated them all at once. Who knows? Could work.
$ npx npm-check-updates -u
$ npm i
I love how useless these deprecation warnings are. Half of them come from nested dependencies, and it's impossible to identify the parent dependency without a successful install.
It appears we got past the kerberos install error this time. But node-sass is still failing a post-install build step.
npm ERR! code 1
npm ERR! path /Users/jc/Diffbot/diffbot-crawler/node_modules/node-sass
npm ERR! command failed
npm ERR! command sh -c node scripts/build.js
I've run into this issue before. node-sass is deprecated and replaced by sass.
We'll remove node-sass and the corresponding node-sass-middleware packages from package.json, replacing them with the latest version of sass.
$ npm uninstall --save node-sass node-sass-middleware
$ npm install --save sass
Excellent! We're on our way. Before we move forward, let's be sure to update our import references in the code.
Because node-sass has been completely replaced with sass, we'll run a simple find and replace in the project for node-sass and node-sass-middleware to ensure all imports of these packages are replaced.
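If an editor-wide find and replace feels too loose, a short script can list every remaining reference. This is a rough sketch with a hypothetical helper name (findNodeSassImports); it assumes CommonJS require() or ES import syntax in .js, .cjs, or .mjs files, and it skips node_modules and .git.

```javascript
// Hypothetical helper: report every require()/import of node-sass or
// node-sass-middleware left in the project's own source files.
const fs = require('fs');
const path = require('path');

const PATTERN = /require\(\s*['"](node-sass(?:-middleware)?)['"]\s*\)|from\s+['"](node-sass(?:-middleware)?)['"]/g;
const SKIP_DIRS = new Set(['node_modules', '.git']);

function findNodeSassImports(rootDir) {
  const hits = [];
  const walk = (dir) => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        if (!SKIP_DIRS.has(entry.name)) walk(full);
      } else if (/\.(c|m)?js$/.test(entry.name)) {
        const lines = fs.readFileSync(full, 'utf8').split('\n');
        lines.forEach((line, i) => {
          for (const match of line.matchAll(PATTERN)) {
            // Group 1 is the require() form, group 2 the import form
            hits.push({ file: full, line: i + 1, module: match[1] || match[2] });
          }
        });
      }
    }
  };
  walk(rootDir);
  return hits;
}

module.exports = { findNodeSassImports };
```

An empty result from findNodeSassImports(process.cwd()) means the swap is complete, at least for JavaScript files.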
- var sass = require('node-sass-middleware');
+ var sass = require('sass');
Step 3: npm run start
All required dependencies are installed, despite copious deprecation warnings. Let's try to get this app running.
$ npm run start
var MongoStore = require('connect-mongo')(session);
^
TypeError: Class constructor MongoStore cannot be invoked without 'new'
A class constructor error from connect-mongo. This is the first of our startup errors. Yours likely won't be the same as ours, but they're all pretty googlable. I'll share the ones we ran into.
I resolved this MongoStore constructor error with a simple syntax change outlined in this StackOverflow answer.
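For reference, connect-mongo v4 and later export the MongoStore class directly and build a store with MongoStore.create(), rather than being called with session. A sketch assuming connect-mongo v4+ and express-session; buildSessionOptions, MONGODB_URI, and SESSION_SECRET are hypothetical names, and the resave/saveUninitialized values are placeholders you should match to your existing config. The store class is passed in so the options can be built without a live database.

```javascript
// Sketch of connect-mongo v4+ usage: MongoStore.create() replaces the old
// require('connect-mongo')(session) call. buildSessionOptions is hypothetical.
function buildSessionOptions(MongoStore, { mongoUrl, secret }) {
  return {
    resave: false, // placeholder: keep your existing value
    saveUninitialized: false, // placeholder: keep your existing value
    secret,
    store: MongoStore.create({ mongoUrl }),
  };
}

// Usage in app.js (assumes express-session and connect-mongo v4+ are installed):
//   const session = require('express-session');
//   const MongoStore = require('connect-mongo');
//   app.use(session(buildSessionOptions(MongoStore, {
//     mongoUrl: process.env.MONGODB_URI,
//     secret: process.env.SESSION_SECRET,
//   })));

module.exports = { buildSessionOptions };
```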
dotenv.load({ path: '.env' });
^
TypeError: dotenv.load is not a function
Another easy syntax change. I followed the latest recommended syntax.
- dotenv.load({ path: '.env' });
+ var dotenv = require('dotenv').config();
app.use(sass({
^
TypeError: sass is not a function
Sass is back. Remember that Express middleware I mentioned earlier? When we removed node-sass-middleware, we also removed the shortcut function that compiled SCSS files to CSS for us in Express.
These days, it's a little heavy to compile SCSS at run time. Most web projects use a tool like webpack to compile stylesheets and inject references to them at build time. At run time, the server only serves lightweight static files.
I'm not in the mood to install webpack and write a config just to patch a tiny legacy app. The middleware wasn't doing much: its only output was a single main.css file generated from a main.scss file.
We'll write a simple function to do this instead.
/**
 * Compiles main.scss to main.css.
 * Assumes app.js already requires fs, path, and sass:
 *   var fs = require('fs');
 *   var path = require('path');
 *   var sass = require('sass');
 */
const compileScss = () => {
  const scssFilePath = path.join(__dirname, 'public', 'css', 'main.scss');
  const cssFilePath = path.join(__dirname, 'public', 'css', 'main.css');

  // Bail out early if there's nothing to compile
  if (!fs.existsSync(scssFilePath)) {
    console.error(`Couldn't find ${scssFilePath}. See compileScss in app.js.`);
    return;
  }

  try {
    // sass.compile() is synchronous and throws on syntax errors
    const result = sass.compile(scssFilePath, {
      sourceMap: true,
      style: 'expanded'
    });
    if (result.css) {
      // Ensure the output directory exists
      fs.mkdirSync(path.dirname(cssFilePath), { recursive: true });
      // Write the compiled CSS to a file
      fs.writeFileSync(cssFilePath, result.css);
    } else {
      console.error("Couldn't generate main.css. See compileScss in app.js.");
    }
  } catch (e) {
    console.error(e);
  }
};
We'll stick this function in app.listen() so it runs once when Express starts.
/**
 * Start Express server.
 */
app.listen(app.get('port'), function() {
  compileScss();
  console.log('Express server listening on port %d in %s mode', app.get('port'), app.get('env'));
});
Onwards!
app.use(expressValidator());
^
TypeError: expressValidator is not a function
Another syntax issue. We'll get rid of this package altogether. It was included as part of hackathon-starter, but the original project owner never wrote any validation logic with it.
$ npm uninstall --save express-validator
Express started! But we're not out of the woods yet. The compileScss() function we wrote failed to build the final CSS file. The .404 parsing error above is a simple fix: CSS class selectors can't start with a digit.
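To hunt down other offenders before Sass does, a rough heuristic can flag class selectors that start with a digit. This is a hypothetical helper, not a CSS parser: it only inspects the text before { on each line and skips @-rules, so a selector list split across lines can slip through. Renaming the class is the simplest fix; escaping the leading digit (for example .\34 04) also works.

```javascript
// Hypothetical helper: a rough, line-based check (not a CSS parser) for class
// selectors whose name starts with a digit, like `.404`. Only the text before
// `{` is inspected, and @-rules are skipped, so values such as `margin: .5em`
// and media queries don't trigger false positives.
function findNumericClassSelectors(scss) {
  const problems = [];
  scss.split('\n').forEach((line, i) => {
    const braceIndex = line.indexOf('{');
    if (braceIndex === -1) return;
    const selector = line.slice(0, braceIndex);
    if (selector.trim().startsWith('@')) return;
    // A "." preceded by start-of-line, whitespace, a combinator, "&", or "("
    // and followed by a digit is an invalid class selector
    for (const match of selector.matchAll(/(^|[\s,>+~&(])\.(\d[\w-]*)/g)) {
      problems.push({ line: i + 1, className: match[2] });
    }
  });
  return problems;
}

module.exports = { findNumericClassSelectors };
```

Passing it the contents of public/css/main.scss returns one { line, className } entry per suspicious selector.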
While we're still seeing a bunch of deprecated package and syntax warnings, the app works.
Step 4: Docker up
Our last major dependency to work out is Docker and, by extension, the separate but dependent mongodb container.
The Dockerfiles are pretty straightforward. Let's get the easy stuff out of the way.
# FROM node:10
FROM node:22-bullseye-slim
# image: mongo:4.4
image: mongodb/mongodb-community-server:5.0.24-ubuntu2004
Note that instead of simply going with :latest, I chose specific image tags to make the setup less likely to break inadvertently on future version bumps.
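Pinning is easy to undo by accident, so a tiny check can flag Dockerfile FROM lines and compose image: lines that have no tag or use :latest. This is a line-based sketch with a hypothetical helper name (findUnpinnedImages); it ignores commented lines, treats digest-pinned images (@sha256:) as pinned, and will wrongly flag FROM lines that reference an earlier multi-stage build stage.

```javascript
// Hypothetical lint: flag Dockerfile FROM lines and docker compose `image:`
// lines whose image reference has no tag or uses :latest.
function findUnpinnedImages(text) {
  const problems = [];
  text.split('\n').forEach((raw, i) => {
    const line = raw.trim();
    if (line.startsWith('#')) return; // skip comments such as "# FROM node:10"
    const match = line.match(/^(?:FROM\s+(?:--platform=\S+\s+)?|image:\s*)(\S+)/i);
    if (!match) return;
    const ref = match[1].replace(/^['"]|['"]$/g, '');
    if (ref.includes('@sha256:')) return; // pinned by digest
    // The tag follows a ":" in the last path segment, so a registry port
    // (localhost:5000/crawly) isn't mistaken for a tag
    const lastSegment = ref.slice(ref.lastIndexOf('/') + 1);
    const tag = lastSegment.includes(':') ? lastSegment.split(':').pop() : null;
    if (!tag || tag === 'latest') problems.push({ line: i + 1, image: ref });
  });
  return problems;
}

module.exports = { findUnpinnedImages };
```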
At the time of writing, the node:22-bullseye-slim image runs Node v22.5.1. In hindsight, I would've checked this before getting the app running on my local machine. Luckily, going from 20.3.1 to 22.5.1 wasn't a breaking change for this app.
$ nvm install 22.5.1
$ nvm use 22.5.1
package.json
...
"engines": {
"node": "22.5.x"
},
...
For good measure, we'll also create a .nvmrc file.
v22.5.1
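Neither file stops the app from starting on the wrong Node version: npm treats engines as advisory unless engine-strict is set, and .nvmrc only takes effect when a tool such as nvm reads it. Below is a hypothetical startup guard (satisfiesXRange and assertNodeVersion are my names) that assumes a simple exact or x-wildcard range like "22.5.x". It is not a full semver implementation; the semver package's satisfies() handles real ranges.

```javascript
// Hypothetical guard: compare the running Node version against a simple
// range such as "22.5.x" or "v22.5.1" (x and * act as wildcards per part).
function satisfiesXRange(version, range) {
  const actual = version.replace(/^v/, '').split('.');
  const wanted = range.trim().replace(/^v/, '').split('.');
  return wanted.every((part, i) => part === 'x' || part === '*' || part === actual[i]);
}

// Throw early so a mismatched image or local setup fails loudly at startup
function assertNodeVersion(range, version = process.version) {
  if (!satisfiesXRange(version, range)) {
    throw new Error(`Expected Node ${range} but running ${version}`);
  }
}

// Usage at the top of app.js (assumes package.json sits next to it):
//   assertNodeVersion(require('./package.json').engines.node);

module.exports = { satisfiesXRange, assertNodeVersion };
```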
Drew, our dev ops lead, has thankfully maintained our Docker setup over time, so there wasn't much in the way of Docker updates to make.
My initial docker compose up led to an infinitely restarting mongo container with an AuthenticationFailed error. There wasn't clear guidance anywhere, but I was able to fix this by adding authSource=admin to the MongoDB URL (StackOverflow).
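Some hedged context on why this works: unless authSource says otherwise, the driver authenticates against the database named in the connection string, and a root user created through the container's init environment variables lives in the admin database. Your setup may differ. The sketch below appends authSource=admin without clobbering existing options. withAuthSource and MONGODB_URI are hypothetical names, and it assumes a single connection string with percent-encoded credentials.

```javascript
// Hypothetical helper: add authSource=admin to a MongoDB connection string
// unless an authSource is already present.
function withAuthSource(uri, authSource = 'admin') {
  const [base, query = ''] = uri.split('?');
  const params = new URLSearchParams(query);
  if (!params.has('authSource')) params.set('authSource', authSource);

  // Options follow a "/" after the host list, e.g. mongodb://host:27017/?opts
  const hostAndPath = base.slice(base.indexOf('://') + 3);
  const normalizedBase = hostAndPath.includes('/') ? base : `${base}/`;
  return `${normalizedBase}?${params.toString()}`;
}

// Usage (the MONGODB_URI variable name is an assumption):
//   mongoose.connect(withAuthSource(process.env.MONGODB_URI));

module.exports = { withAuthSource };
```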
Isn't she beautiful? Except...
From the console:
/home/node/app/node_modules/mongoose/lib/model.js:545
throw new MongooseError('Model.prototype.save() no longer accepts a callback');
^
MongooseError: Model.prototype.save() no longer accepts a callback
Alas, callbacks out. Async in. But it was way too much work to refactor the code to use async functions, so I migrated all callbacks to use promises instead. (StackOverflow)
This was the single most time-consuming step of the entire project. Every model CRUD call had to be updated. That meant going through all the model functions listed in the Mongoose docs, finding them in the project, and replacing the callbacks with promises.
Here's an example of one.
// crawl.save(function(err) {
//   if (err) {
//     // Error
//   }
//   else {
//     // Success
//   }
// })
crawl.save()
  .then(() => {
    // Success
  })
  .catch((err) => {
    // Error
  });
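If a migration like this is too big to finish in one pass, an alternative not used in this project is Node's built-in util.callbackify. It wraps a promise-returning function so existing callback-style call sites keep working while you convert them gradually. A sketch; saveDoc is a hypothetical helper name.

```javascript
// Alternative sketch: adapt the promise-returning Mongoose save() back to
// Node-style callbacks (err, result) with util.callbackify.
const util = require('util');

const saveDoc = util.callbackify((doc) => doc.save());

// Usage, replacing crawl.save(function (err) { ... }):
//   saveDoc(crawl, function (err, savedCrawl) {
//     if (err) { /* Error */ } else { /* Success */ }
//   });

module.exports = { saveDoc };
```

The trade-off is that the callback code stays around, so this buys time rather than finishing the migration.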
We have lift off.
Final Notes
Keep in mind that despite how I've laid this guide out, my actual experience involved over 50 open tabs of research and comfort hugs with my dog. And it wasn't even a huge project!
If you've been given the unfortunate task of updating a legacy project, I hope this helps.