DEV Community

KiminLee

Let's find broken URLs in your files!!

Preparation

URLs show up all the time in HTML files and other documents.
Most of the time you don't know whether a URL is broken unless you actually send an HTTP request such as GET or POST from your server-side code. Handling those failures at runtime can slow down your program and make your code more complex. Why don't we first simply check that the URLs we are going to use are all OK?

Start Coding

I think the best language for a CLI program that checks URLs is Node.js. There are hundreds of useful packages for parsing command-line arguments, coloring output, and sending HTTP requests. Also, the MEAN stack is very popular in web development right now, so why not use Node.js?

The very first step is installing 'axios', 'yargs' and 'chalk' for HTTP requests, the CLI, and colored output, respectively. They make life easier!!

npm i axios yargs chalk

Make a custom argument

Thanks to 'yargs', we can build a CLI program really easily.
.command() defines a command, .usage() and .example() set the usage line and the examples shown in the help output, and .alias() gives an option an alternative name (for example -f and --file). More info is available in the yargs documentation.

yargs
  .usage("Usage: url-tester <command> [options] <optionalFilename>")
  .command("start", "Test to find any broken URL")
  .example(
    "url-tester start -f foo1.html, foo2.txt (You can multiple files, delimiter is ',') ",
    " Test if there is any broken URL in the files"
  )
  .example(
    "url-tester start -f -a",
    " Test broken URL in the only 'html' files in the current dir"
  )
  .alias("f", "file")
  .alias("a", "all")
  .demandOption(["f"])
  .describe("f", "Load all specified files (delimiter is ',')")
  .describe("a", "Load all HTML files in the current dir")
  .version()
  .alias("v", "version")
  .help()
  .alias("h", "help")
  .demandCommand().argv;

The above code produces help output like this:

Usage: url-tester <command> [options] <optionalFilename>

Commands:
  index.js start  Test to find any broken URL

Options:
  -f, --file     Load all specified files (delimiter is ',')          [required]
  -a, --all      Load all HTML files in the current dir
  -v, --version  Show version number                                   [boolean]
  -h, --help     Show help                                             [boolean]

Examples:
  url-tester start -f foo1.html, foo2.txt   Test if there is any broken URL in  
  (You can multiple files, delimiter is     the files
  ',')
  url-tester start -f -a                    Test broken URL in the only 'html'  
                                            files in the current dir

Find URLs in your files!

You might want to check specific files, or every HTML file in the current directory. The options decide which:
[command] -f -a checks all HTML files in the current folder, and [command] -f filename checks only the named files (more than one can be given).

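// decide whether the option is -f or -a
// note: __dirname is the folder containing this script; use process.cwd()
// instead to scan the folder the command is run from
if (yargs.argv.a || typeof yargs.argv.f !== "string") {
  const tmpFiles = fs.readdirSync(__dirname, { encoding: "utf-8" });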

With [command] -f -a, find all HTML files in the current path, then test every URL in each file.

  // if -a, store all files into the files variable
  files = tmpFiles.filter((file) => {
    return file.toLowerCase().endsWith(".html");
  });
} else if (typeof yargs.argv.f === "string") {
  files = [yargs.argv.f];

If files are specified:

  // if -f filename.txt, put the named file and any extra file names into the files variable.
  if (yargs.argv._.length > 1) {
    for (let i = 1; i < yargs.argv._.length; i++) {
      files.push(yargs.argv._[i]);
    }
  }
}
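
The branching above can be condensed into one small function. This is only a minimal sketch, not the project's actual code: `selectFiles` and its `dirEntries` parameter are hypothetical names, and `argv` is assumed to have the shape yargs produces (`a`, `f`, and the positional array `_`, whose first entry is the command name).

```javascript
// Hypothetical helper: decide which files to scan from yargs-style arguments.
//   argv.a     -> true when -a / --all was given
//   argv.f     -> a file name when -f <name> was given, true when -f had no value
//   argv._     -> positionals; argv._[0] is the command ("start")
//   dirEntries -> names in the directory to scan, e.g. from fs.readdirSync()
function selectFiles(argv, dirEntries) {
  if (argv.a || typeof argv.f !== "string") {
    // "all" mode: keep only the HTML files, ignoring case in the extension
    return dirEntries.filter((name) => name.toLowerCase().endsWith(".html"));
  }
  // "file" mode: the -f value plus any extra positionals after the command
  const extra = argv._.slice(1).map(String);
  return [argv.f, ...extra];
}

console.log(selectFiles({ a: true, f: true, _: ["start"] }, ["a.html", "b.txt", "C.HTML"]));
// -> [ 'a.html', 'C.HTML' ]
console.log(selectFiles({ f: "foo1.html", _: ["start", "foo2.txt"] }, []));
// -> [ 'foo1.html', 'foo2.txt' ]
```

Keeping the decision in a function that takes the directory listing as an argument makes it easy to try out without touching the file system.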

Then find all URLs in the files:

  const regex = /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)/g;
  // match() returns null when the file contains no URL, so fall back to an empty array
  const findURL = fileData.match(regex) || [];
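
To see what the pattern picks up, here is a small self-contained sketch. The helper name `extractUrls` and the sample string are made up for illustration, and removing duplicates is my own assumption, not something the original code does.

```javascript
// Same pattern as above, kept in one place.
const URL_PATTERN = /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)/g;

// Hypothetical helper: return each distinct URL found in a string.
function extractUrls(text) {
  // match() with a global regex returns null when nothing matches.
  const found = text.match(URL_PATTERN) || [];
  // A Set drops duplicates while keeping first-seen order.
  return [...new Set(found)];
}

const sample =
  '<a href="https://example.com/page">one</a> ' +
  '<a href="https://example.com/page">two</a> http://foo.org';

console.log(extractUrls(sample));
// -> [ 'https://example.com/page', 'http://foo.org' ]
```

Checking each distinct URL once avoids sending the same request several times when a link is repeated in a file.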

Now you have found all the URLs in your files. It is time to test them!

Send a HEAD request

We only need to check the response headers, not the response body. To send such a request, axios provides a simple API: axios.head(url[, config]).
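
A minimal sketch of such a check might look like the following. `checkUrl` is a hypothetical name, and the HTTP client is passed in as a parameter (an assumption made here so the sketch runs without network access); in the real tool the client would simply be axios.

```javascript
// Hypothetical helper: send a HEAD request and report the status code.
// `client` is anything with an axios-style head(url, config) method.
async function checkUrl(url, client) {
  // timeout (in ms) is an axios request option; 5000 is an arbitrary choice.
  const response = await client.head(url, { timeout: 5000 });
  return { url, status: response.status };
}

// Stand-in client so the sketch runs offline; it always answers 200.
const fakeClient = {
  head: async (url, config) => ({ status: 200, headers: {} }),
};

checkUrl("https://example.com", fakeClient).then((result) => console.log(result));
// -> { url: 'https://example.com', status: 200 }
```

With axios installed, the same function can be called as checkUrl(url, require("axios")).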

axios follows redirects automatically (this is controlled by its maxRedirects request option), so for URLs that answer with 302, 307 or 308, response.status was always 200 when I read it. (If I am wrong, please let me know through GitHub!) Still, I implemented the code to check the status in case it is one of them.

if (response.status === 301) {
  // implementation
} else if (response.status === 307) {
  .
  .
  .
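
If the redirect status itself should be reported, one option is to ask axios not to follow redirects. The sketch below relies on two axios request options, maxRedirects and validateStatus; the helper name `checkWithoutRedirects`, the injected `client` parameter, and the stand-in client are assumptions for illustration only.

```javascript
// Hypothetical helper: HEAD request that reports redirects instead of following them.
// `client` is anything with an axios-style head(url, config) method, e.g. axios.
async function checkWithoutRedirects(url, client) {
  const response = await client.head(url, {
    // 0 tells axios (in Node.js) not to follow 3xx responses.
    maxRedirects: 0,
    // By default axios rejects anything outside 2xx; accept 3xx as well.
    validateStatus: (status) => status >= 200 && status < 400,
  });
  const redirectCodes = [301, 302, 307, 308];
  return {
    url,
    status: response.status,
    redirected: redirectCodes.includes(response.status),
    // The Location header holds the redirect target when the server sends one.
    location: (response.headers || {}).location || null,
  };
}

// Stand-in client so the sketch runs offline; it always answers with a 301.
const redirectingClient = {
  head: async (url, config) => ({
    status: 301,
    headers: { location: "https://example.com/new" },
  }),
};

checkWithoutRedirects("http://example.com/old", redirectingClient).then((r) =>
  console.log(r.status, r.redirected, r.location)
);
// -> 301 true https://example.com/new
```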

If the status is OK (200), everything is all right. If the status is bad (404) or some other error occurs, the user should be told.

console.log(
          chalk.black.bgGreen.bold(
            "In " + file + " file, the URL: " + url + " is success: "
          )
        );
        console.log(chalk.green.underline.bold("STATUS: " + response.status));
      }
    } catch (error) {
      // The server answered with an error status such as 404:
      if (error.response) {
        console.log(
          chalk.white.bgRed.bold(
            "In " + file + " file, the URL: " + url + " is a bad url: "
          )
        );
        return console.log(
          chalk.red.underline.bold("STATUS: " + error.response.status)
        );
      }

Other errors, such as a timeout or a non-existent URL:

      // non-existent URL (the host name could not be resolved)
      if (error.code === "ENOTFOUND") {
        console.log(
          chalk.white.bgGrey.bold(
            "In " + file + " file, the URL: " + url + " is unknown url: "
          )
        );
        // color the text first, then print it
        console.log(chalk.white(error.code));

        // timeout error
      } else if (error.code === "ETIMEDOUT") {
        console.log(
          chalk.white.bgGrey.bold(
            "In " + file + " file, the URL: " + url + " is TIMEOUT: "
          )
        );
        console.log(chalk.white.underline(error.code));
      } else {
        // server error or other error: error.code indicates which error occurred
        console.log(
          chalk.white.bgGrey.bold(
            "In " + file + " file, the URL: " + url + " has following issue: "
          )
        );
        console.log(chalk.white.underline(error.code));
      }
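
The error handling above boils down to a small decision table. As a rough sketch, it could be pulled into a function that only classifies the error and leaves the printing to the caller. `classifyError` and its labels are hypothetical names; the ECONNABORTED case is included because that is the code axios uses by default when its own timeout option expires.

```javascript
// Hypothetical helper: turn an axios-style error into a short label.
function classifyError(error) {
  // error.response exists when the server answered with a rejected status (e.g. 404).
  if (error.response) {
    return { kind: "bad-status", detail: error.response.status };
  }
  // DNS lookup failed: the host does not exist.
  if (error.code === "ENOTFOUND") {
    return { kind: "unknown-host", detail: error.code };
  }
  // ETIMEDOUT comes from the socket; ECONNABORTED is what axios reports
  // by default when its own `timeout` option expires.
  if (error.code === "ETIMEDOUT" || error.code === "ECONNABORTED") {
    return { kind: "timeout", detail: error.code };
  }
  // Anything else: error.code says what went wrong.
  return { kind: "other", detail: error.code };
}

console.log(classifyError({ response: { status: 404 } }));
// -> { kind: 'bad-status', detail: 404 }
console.log(classifyError({ code: "ENOTFOUND" }));
// -> { kind: 'unknown-host', detail: 'ENOTFOUND' }
```

The caller can then pick the chalk style from the kind field instead of repeating the console.log calls in every branch.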

To improve

The basic feature of this simple CLI program is done. However, there is plenty of room to improve its performance and functionality. I am still working on it, and it will improve step by step. If you are interested in this open source project, please visit the
github repo