DEV Community

Vetle Leinonen-Roeim

Command Line Warrior - filter out non-existing URLs

Problem description

Given a file of URLs, how can we filter out any that return HTTP 404 Not Found and write the URLs that exist to a new file?

TL;DR

cat urls.txt \
  | xargs -I{} sh -c 'curl -sIL {} -w "%{http_code}" -o /dev/null \
    | grep -q -v 404 && echo {}' > ok_urls.txt

Explanation

First, we pipe the list of URLs to xargs using cat:

cat urls.txt | xargs ...

Using xargs, we read the input from cat and execute a shell command for each line. The -I{} option tells xargs to replace each occurrence of the string {} in that command with the input line (in this case, a URL).

Since we also need to output the URL we got as input, we use {} twice: first when checking the URL, then when printing the URL once it has passed the check.
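The substitution can be tried without touching the network. This is a minimal sketch, assuming two placeholder URLs on example.com and a hypothetical variable name, report, that only exists to hold the output:

```shell
# Feed two placeholder URLs to xargs, one per line.
# Every {} in the command is replaced with the current input line.
report=$(
  printf '%s\n' 'https://example.com/a' 'https://example.com/b' \
    | xargs -I{} echo 'checking {} ... keeping {}'
)
printf '%s\n' "$report"
# checking https://example.com/a ... keeping https://example.com/a
# checking https://example.com/b ... keeping https://example.com/b
```

Each input line produces one invocation of echo, and both occurrences of {} are filled in with the same URL.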

To run multiple commands for each line, we tell xargs to run a shell (sh) and hand it the commands as a single string with -c.
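One thing to watch for: when {} is pasted straight into the command string, the shell parses the URL as code, so a URL containing & or ; can split the command. A possible variation, not the approach used in this post, is to pass the URL as a positional parameter and refer to it as "$1". The sketch below assumes a placeholder URL and uses echo in place of curl so it runs offline; lines is a hypothetical variable name:

```shell
# The URL is passed as an argument ($1) instead of being pasted into the script.
# The "_" fills $0, so the URL lands in $1.
lines=$(
  printf '%s\n' 'https://example.com/search?q=a&lang=en' \
    | xargs -I{} sh -c 'echo "checking: $1"; echo "keeping: $1"' _ {}
)
printf '%s\n' "$lines"
# checking: https://example.com/search?q=a&lang=en
# keeping: https://example.com/search?q=a&lang=en
```

Both commands run in the same shell invocation and see the complete URL, including the part after &.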

In the next part of the script, we use curl to access the URL. -s makes curl silent, so it does not print a progress meter; -I sends a HEAD request, so we only get the response headers; and -L follows any redirects. To get only the status code, we use -w "%{http_code}". The -w flag tailors what curl prints once the transfer is done, and with -L the code is that of the final response. -o /dev/null discards everything else curl would print.
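To look at the status codes on their own before filtering, the curl call can be wrapped in a small function. This is only a sketch: http_status is a hypothetical helper name, and the script is assumed to be saved under a name of your choosing and called with URLs as arguments.

```shell
# http_status is a hypothetical helper, not something curl provides.
http_status() {
  # -s: no progress meter, -I: HEAD request (headers only), -L: follow redirects,
  # -o /dev/null: discard the headers, -w: print only the final status code.
  curl -sIL -o /dev/null -w '%{http_code}' "$1"
}

# Print "<status> <url>" for every URL given on the command line.
for url in "$@"; do
  printf '%s %s\n' "$(http_status "$url")" "$url"
done
```

Run with no arguments, the loop does nothing; run with one or more URLs, it prints one line per URL with the status code first.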

To filter out 404 Not Found, we use grep -v, which selects lines that do not contain 404, and -q, which makes grep print nothing. That way we test only the exit status of grep: 0 if the status code is anything other than 404, 1 if it is 404.
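The effect of the filter can be checked without curl by feeding grep a few status codes by hand. In this sketch, verdict is a hypothetical helper name and the codes are sample values; 000 is what curl prints for %{http_code} when it gets no HTTP response at all, for example when the host cannot be reached.

```shell
# verdict is a hypothetical helper: it prints "keep" or "drop" for one status code.
verdict() {
  # printf writes the code without a trailing newline, like curl -w does.
  if printf '%s' "$1" | grep -q -v 404; then
    echo keep
  else
    echo drop
  fi
}

for code in 200 301 404 500 000; do
  echo "$code $(verdict "$code")"
done
# 200 keep
# 301 keep
# 404 drop
# 500 keep
# 000 keep
```

Only 404 is dropped. Server errors and unreachable hosts pass the filter, so if those should be excluded as well, the pattern would need to be extended.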

If you combine two commands with &&, the second one runs only if the first was successful. By putting && echo {} after grep, the URL is printed only if grep succeeded. Remember that {} is replaced with the URL by xargs!
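The behaviour of && is easy to see with the standard true and false commands, which do nothing except succeed and fail. A minimal sketch:

```shell
true  && echo "ran after true"    # prints: ran after true
false && echo "ran after false"   # prints nothing, because false exits with status 1
echo "exit status of the line above: $?"
# exit status of the line above: 1
```

In the one-liner, grep plays the role of true or false, and echo {} is the command that runs only on success.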

Finally, we redirect the list of URLs that passed the check to a new file, ok_urls.txt, and we're done!

Happy hacking,
Vetle
