Philipp Hansch

Posted on • Originally published at phansch.net on

Testing your Jekyll Website with HTMLProofer

While I was adding Capybara tests for my Jekyll Website, I also stumbled upon HTMLProofer.

HTMLProofer lets you test your HTML output. It can check that images have alt attributes, that links are working, and a few more things.

I decided to give it a try on my website. Initially I was surprised at the amount of output and decided that it was too much to fix all at once. So at first, I only had it check for broken links. It found that about 30 of 200 outgoing links were not working.

Most of them were part of my Hackership series, where I sometimes linked to local startups that apparently have not survived until today. Unfortunately, link rot is a thing.

Before I dealt with all the broken links, I started to integrate HTMLProofer into the test suite by adding a custom Rake task:

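require 'html-proofer'

task :html_proofer do
  # Jekyll output directory, next to this Rakefile
  build_dir = File.join(File.dirname(__FILE__), '_site')
  # Build only if the output is missing. The check uses build_dir so that
  # it looks at the same directory that HTMLProofer scans below.
  unless File.directory?(build_dir)
    sh 'jekyll', 'build', '-d', build_dir, '-V'
  end
  opts = {
    url_ignore: [/localhost/],  # skip links to the local dev server
    empty_alt_ignore: true,     # do not report images with an empty alt
    file_ignore: [/slides/],    # skip generated slide decks
    typhoeus: {
      # Relaxed SSL verification and a 30 second timeout per request
      ssl_verifyhost: 0,
      ssl_verifypeer: false,
      timeout: 30
    }
  }
  HTMLProofer.check_directory(build_dir, opts).run
end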

Running rake html_proofer builds the site and then runs HTMLProofer on the Jekyll output with the given options. You can check the Travis CI integration in script/ci.rb and .travis.yml.
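To make the "links only" idea more concrete, here is a minimal sketch of what a narrower options hash might look like. It assumes the HTMLProofer 3.x option names used in the Rake task above, plus the `checks_to_ignore` option and the check names `ImageCheck` and `ScriptCheck` from that release line. Other versions configure checks differently, so treat those names as assumptions to verify against the installed gem. The helper `link_only_options` is a made-up name for illustration.

```ruby
# Sketch only: build options for a link-focused HTMLProofer run.
# `link_only_options` is a hypothetical helper, not part of HTMLProofer.
# `checks_to_ignore` and the check names are assumed from HTMLProofer 3.x.
def link_only_options(ignored_urls = [/localhost/])
  {
    url_ignore: ignored_urls,
    # Switch off the image and script checks so the link check remains
    checks_to_ignore: %w[ImageCheck ScriptCheck]
  }
end

opts = link_only_options
puts opts.inspect

# With the html-proofer gem installed, the hash could be passed on like this:
#   HTMLProofer.check_directory('_site', opts).run
```

Keeping the hash in a small method makes it easy to widen the checks again later by editing one place.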

If you run into SSL issues with HTMLProofer, you may have to install libcurl4-openssl-dev on Travis.

The last thing I did was to fix the links, as it was the least compelling part of the task. There are many reasons why a link may be broken, and almost every cause can be handled differently.

  • A missing article may be caused by a new URL structure and forgotten redirects, so I looked around on these sites and tried to find the correct link if possible.

  • Broken domains are a lost cause most of the time, although some startups had renamed themselves or were bought up, so using the new domain makes sense there.

  • Domains may be unreachable only temporarily, so instead of removing the link, I whitelist it.
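The whitelisting in the last point could be sketched roughly as follows. This is only an illustration: the host names and the helper `ignore_patterns` are hypothetical, and it assumes the `url_ignore` option from the Rake task above accepts regular expressions, as it does there for `/localhost/`.

```ruby
# Hypothetical hosts that are only temporarily unreachable.
TEMPORARILY_DOWN = %w[example.org status.example.com].freeze

# Turn each host into a regex that matches http(s) URLs on exactly that host.
# Regexp.escape keeps the dots literal, and the trailing group makes sure
# "example.org" does not also match "example.org.evil.com".
def ignore_patterns(hosts)
  hosts.map { |host| %r{\Ahttps?://#{Regexp.escape(host)}(/|\z)} }
end

opts = {
  url_ignore: [/localhost/] + ignore_patterns(TEMPORARILY_DOWN)
}
puts opts[:url_ignore].length
```

Keeping the hosts in one list makes it easy to remove an entry again once the domain is back, so the link gets checked on the next run.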

This all took some time, but it paid off, and now I can be certain that there are no broken links in any of the HTML on this website.

As mentioned in the beginning, HTMLProofer has a couple more nice features, but I haven't gotten around to trying them yet. In the next post I will probably have a look at them.

Top comments (1)

Brett Stevenson • Edited

This is very cool, there's also a jekyll plugin that I found after reading this. After running it for the first time I got a similar abundance of results, I figure I'll have to chip away at it over the next few days...
