Preparation
URLs show up all the time in HTML files and elsewhere.
Most of the time you can't tell whether a URL is broken unless you actually send an HTTP request such as GET or POST from your server-side code. Handling failures there can slow your program down and make your code more complex. Why don't we simply check first that all the URLs we are going to use are OK?
Start Coding
I think the best choice for making a CLI program that checks URLs is Node.js. There are hundreds of useful packages for command-line arguments, colored output and HTTP requests. Also, the MEAN stack is very popular in web development right now, so why not use Node.js?
The very first thing is to install 'axios', 'yargs' and 'chalk' for HTTP requests, the CLI and colored output, respectively. They make life easier!!
npm i axios yargs chalk
Make a custom argument
Thanks to 'yargs', we can make a CLI program really easily.
.command() defines a command. .usage() and .example() show the usage and examples (how to use) of the commands. .alias() gives an option an alias. More info is in the yargs documentation.
const yargs = require("yargs");

yargs
.usage("Usage: url-tester <command> [options] <optionalFilename>")
.command("start", "Test to find any broken URL")
.example(
"url-tester start -f foo1.html, foo2.txt (You can multiple files, delimiter is ',') ",
" Test if there is any broken URL in the files"
)
.example(
"url-tester start -f -a",
" Test broken URL in the only 'html' files in the current dir"
)
.alias("f", "file")
.alias("a", "all")
.demandOption(["f"])
.describe("f", "Load all specified files (delimiter is ',')")
.describe("a", "Load all HTML files in the current dir")
.version()
.alias("v", "version")
.help()
.alias("h", "help")
.demandCommand().argv;
The above code produces something like this:
Usage: url-tester <command> [options] <optionalFilename>
Commands:
index.js start Test to find any broken URL
Options:
-f, --file Load all specified files (delimiter is ',') [required]
-a, --all Load all HTML files in the current dir
-v, --version Show version number [boolean]
-h, --help Show help [boolean]
Examples:
url-tester start -f foo1.html, foo2.txt Test if there is any broken URL in
(You can multiple files, delimiter is the files
',')
url-tester start -f -a Test broken URL in the only 'html'
files in the current dir
Find URL in your files!
You might want to check specific files, or all HTML files in the current directory. This is simply chosen by options.
[command] -f -a checks all HTML files in the current folder, and [command] -f filename (can be multiple files) checks just specific files.
const fs = require("fs");
let files = [];

// decide whether the option is -f or -a
if (yargs.argv.a || typeof yargs.argv.f !== "string") {
const tmpFiles = fs.readdirSync(__dirname, { encoding: "utf-8" });
With [command] -f -a, find all HTML files in the current path and test every URL in each file.
// if -a, store all HTML files in the files variable
files = tmpFiles.filter((file) => {
return file.toLowerCase().endsWith(".html");
});
} else if (typeof yargs.argv.f === "string") {
files = [yargs.argv.f];
If files are specified:
// if -f filename.txt, collect all the given files into the files variable
if (yargs.argv._.length > 1) {
for (let i = 1; i < yargs.argv._.length; i++) {
files.push(yargs.argv._[i]);
}
}
}
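The file selection above can be tried in isolation with a small sketch. The helper name selectFiles and its parameters are hypothetical (they are not part of the original program); the argv objects are written by hand and only approximate what yargs produces.

```javascript
// Hypothetical helper: picks the files to scan from a yargs-style argv object.
// `argv` and `dirEntries` are passed in so the logic can run without yargs or fs.
function selectFiles(argv, dirEntries) {
  // -a was given, or -f was given without a filename (then it is not a string)
  if (argv.a || typeof argv.f !== "string") {
    return dirEntries.filter((name) => name.toLowerCase().endsWith(".html"));
  }
  // -f <name>: the first file is argv.f; any further names land in argv._
  // after the command name, so skip index 0
  return [argv.f, ...argv._.slice(1).map(String)];
}

// Hand-written argv objects, assumed to resemble what yargs would produce
const entries = ["index.html", "About.HTML", "notes.txt"];
console.log(selectFiles({ _: ["start"], f: true, a: true }, entries));
console.log(selectFiles({ _: ["start", "foo2.txt"], f: "foo1.html" }, entries));
```

Note that in this logic extra filenames are separated by spaces, because they arrive as positional arguments.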
Then find all the URLs in the files:
const regex = /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)/g;
const findURL = fileData.match(regex);
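The snippet above assumes fileData already holds the file contents. As a minimal sketch of the whole step, the following reads a file and returns its unique URLs. The helper names extractUrls and extractUrlsFromFile and the sample file are hypothetical; the regex is the one used above. Since match() returns null when nothing is found, the sketch falls back to an empty array.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Same pattern as in the article
const URL_REGEX = /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)/g;

// Hypothetical helper: returns the unique URLs found in a string
function extractUrls(text) {
  // String.prototype.match returns null when there is no match
  const found = text.match(URL_REGEX) || [];
  return [...new Set(found)];
}

// Hypothetical helper: reads a file as UTF-8 text and extracts its URLs
function extractUrlsFromFile(filePath) {
  return extractUrls(fs.readFileSync(filePath, "utf-8"));
}

// Usage: write a small sample file into a temp directory and scan it
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "url-tester-"));
const sample = path.join(dir, "sample.html");
fs.writeFileSync(
  sample,
  '<a href="https://example.com/page">x</a> http://foo.org/a?b=1 https://example.com/page'
);
console.log(extractUrlsFromFile(sample));
fs.unlinkSync(sample);
fs.rmdirSync(dir);
```

Removing duplicates is a design choice of this sketch, so that the same URL is not requested twice per file.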
Now you have found all the URLs in your files. It is time to test them!
Send header request
We only need to check the headers, not the response data. To send a header-only request, axios provides a simple API: axios.head(url[, config]).
axios automatically follows redirects when the status is 302, 307 or 308, so when I accessed response.status the values were all 200. (If I am wrong, please let me know through github!) BUT I still implemented code to check whether the status was one of them.
if (response.status === 301) {
// implementation
} else if (response.status === 307) {
.
.
.
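One possible way to actually see the redirect statuses is to ask axios not to follow redirects. The sketch below is an assumption about how that could be done, not the code of this project: headStatus, labelStatus and stubClient are hypothetical names, and the timeout value is arbitrary. maxRedirects and validateStatus are axios request config options. A stub stands in for axios so the example runs offline.

```javascript
// Hypothetical helper: sends a HEAD request without following redirects.
// `client` is anything with an axios-like head(url, config) method,
// for example the axios module itself.
async function headStatus(client, url) {
  const response = await client.head(url, {
    maxRedirects: 0, // axios option: do not follow redirects (Node.js)
    validateStatus: (status) => status < 400, // resolve for 2xx and 3xx
    timeout: 5000, // assumed value, in milliseconds
  });
  return response.status;
}

// Hypothetical helper: maps a status code to a label for reporting
function labelStatus(status) {
  if (status >= 200 && status < 300) return "ok";
  if ([301, 302, 303, 307, 308].includes(status)) return "redirect";
  return "other";
}

// Stub client standing in for axios, so the sketch makes no network calls
const stubClient = {
  head: async (url, config) => ({ status: url.endsWith("/moved") ? 301 : 200 }),
};

headStatus(stubClient, "https://example.com/moved").then((status) => {
  console.log(status, labelStatus(status));
});
```

With the real library the call would look like headStatus(require("axios"), url).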
If the status is OK (200), everything is alright. If the status is bad (404) or there is some other error, the user should be told.
console.log(
chalk.black.bgGreen.bold(
"In " + file + " file, the URL: " + url + " is success: "
)
);
console.log(chalk.green.underline.bold("STATUS: " + response.status));
}
} catch (error) {
// The server replied with an error status such as 404:
if (error.response) {
console.log(
chalk.white.bgRed.bold(
"In " + file + " file, the URL: " + url + " is a bad url: "
)
);
return console.log(
chalk.red.underline.bold("STATUS: " + error.response.status)
);
}
Other errors, such as a timeout or a non-existent URL, are handled separately:
// non-exist URL
if (error.code == "ENOTFOUND") {
console.log(
chalk.white.bgGrey.bold(
"In " + file + " file, the URL: " + url + " is unknown url: "
)
);
console.log(chalk.white(error.code));
// timeout error
} else if (error.code == "ETIMEDOUT") {
console.log(
chalk.white.bgGrey.bold(
"In " + file + " file, the URL: " + url + " is TIMEOUT: "
)
);
console.log(chalk.white.underline(error.code));
} else {
// server error or other error : error.code will indicate which error it has
console.log(
chalk.white.bgGrey.bold(
"In " + file + " file, the URL: " + url + " has following issue: "
)
);
console.log(chalk.white.underline(error.code));
}
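The branching above can also be kept in one small function that only decides the category and leaves the printing to the caller. This is a sketch with a hypothetical name, classifyError. Including ECONNABORTED is an assumption: as far as I know, that is the code axios uses by default when its own timeout option fires.

```javascript
// Hypothetical helper: turns an axios-style error into a category string
function classifyError(error) {
  // The server answered, but with an error status (for example 404)
  if (error.response) return "bad";
  // DNS lookup failed: the host does not exist
  if (error.code === "ENOTFOUND") return "unknown";
  // ETIMEDOUT comes from the socket; ECONNABORTED is assumed to be what
  // axios reports when its `timeout` option is exceeded
  if (error.code === "ETIMEDOUT" || error.code === "ECONNABORTED") {
    return "timeout";
  }
  // Anything else: error.code indicates which error it is
  return "other";
}

// Usage with hand-made error objects
console.log(classifyError({ response: { status: 404 } }));
console.log(classifyError({ code: "ENOTFOUND" }));
console.log(classifyError({ code: "ETIMEDOUT" }));
console.log(classifyError({ code: "ECONNRESET" }));
```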
To improve
The basic features of this simple CLI program are done. However, there is still plenty of room to improve its performance and functionality. I am still working on it, and it will be steadily improved step by step. If you are interested in this open source project, please visit the github repo.