
Niels Swimburger.NET

Originally published at swimburger.net

PowerShell Script: Scan documentation for broken links

Documentation often links to other locations on the web using URLs. Unfortunately, many URLs change over time, and it is easy to introduce an incorrect URL through a typo or a slip of the finger.

Here's a small PowerShell script you can run against your documentation repositories. It tells you which URLs do not resolve to HTTP status code 200, either directly or after following redirects:

Param(
    [Parameter(Mandatory=$true)]
    [string] $DocsRootPath,
    [string] $FileFilter = "*.md"
)
# use as ./CrawlDocsForBrokenLinks -DocsRootPath path/to/docs

# URL regex for Markdown, taking the []() and [][] link syntaxes into account.
# A URL ends at whitespace, a bracket, a closing parenthesis, or the end of the line.
$UrlRegex = '(?:https?):\/\/[a-z0-9\.:].*?(?=[\s\]\[\)]|$)';
Get-ChildItem -Path $DocsRootPath -File -Recurse -Filter $FileFilter `
    | Select-String -Pattern $UrlRegex -AllMatches `
    | ForEach-Object { 
    [Microsoft.PowerShell.Commands.MatchInfo]$MatchInfo = $PSItem; 
    $MatchInfo.Matches `
        | Where-Object { $_.Value.StartsWith('http://') -or $_.Value.StartsWith('https://') } `
        | ForEach-Object {
            # Strip quotes that may surround the URL
            $Value = $PSItem.Value.Trim('"').Trim("'");

            try {
                # -ErrorAction Stop makes every failure land in the catch block
                $null = Invoke-WebRequest `
                    -Uri $Value `
                    -UseBasicParsing `
                    -ErrorAction Stop;
            }
            catch {
                # Response is $null when no HTTP response arrived, which prints as status 0
                $Response = $PSItem.Exception.Response;
                Write-Output "$([int]$Response.StatusCode) - $($MatchInfo.Path):$($MatchInfo.LineNumber) ($($Value))";
            }
        };
};

The code does the following:

  • Finds files recursively under the given path, filtering to Markdown files only
  • Extracts the URLs inside those files using a regular expression
  • Makes an HTTP request for each URL and, if the request is not successful, writes to the console:
    • the status code
    • the path to the file
    • the line number where the URL was found
    • the URL itself
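
To make the extraction step more concrete, here is a minimal sketch that runs a single-alternative form of the script's pattern against a few sample lines. The sample lines and the example.com, example.org, and example.net URLs are made up for illustration, and the pattern includes an end-of-line alternative in the lookahead so that a URL at the very end of a line is captured too.

```powershell
# Hypothetical sample lines; none of these come from a real repository.
$SampleLines = @(
    'See the [docs](https://example.com/docs) for details.',
    '[home]: http://example.org/home [next]',
    'Trailing link: https://example.net/end',
    'No link on this line.'
)

# A URL runs until whitespace, one of ] [ ), or the end of the line.
$UrlRegex = 'https?:\/\/[a-z0-9\.:].*?(?=[\s\]\[\)]|$)'

# IgnoreCase mirrors Select-String, which is case-insensitive by default.
$Options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase

$Urls = foreach ($Line in $SampleLines) {
    [regex]::Matches($Line, $UrlRegex, $Options) | ForEach-Object { $_.Value }
}

$Urls
# https://example.com/docs
# http://example.org/home
# https://example.net/end
```

The lazy `.*?` combined with the lookahead is what keeps the closing `)` or `]` of the Markdown link syntax out of the captured URL.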

Save the code to a file named CrawlDocsForBrokenLinks.ps1, then open a PowerShell session and invoke it like this:

./CrawlDocsForBrokenLinks.ps1 -DocsRootPath path/to/docs
# Output looks like this
#    404 - C:\Users\nswimberghe\source\repos\azure-docs\README.md:22 (https://marketplace.visualstudio.com/items?itemName=docsmsft)
#    404 - C:\Users\nswimberghe\source\repos\azure-docs\ThirdPartyNotices.md:3 (https://creativecommons.org/licenses/by/4)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\best-practices-availability-paired-regions.md:90 (https://github.com/uglide/azure-content/blob/master/articles/resiliency/resiliency-technical-guidance)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\cloud-services-php-create-web-role.md:166 (http://127.0.0.1:81)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\cloud-services-php-create-web-role.md:170 (http://127.0.0.1:81)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\nodejs-use-node-modules-azure-apps.md:27 (https://github.com/woloski/nodeonazure-blog/blob/master/articles/startup-task-to-run-npm-in-azure)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-java-phone-call-example.md:175 (http://localhost:8080/TwilioCloud/callform)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-nodejs-how-to-use-voice-sms.md:134 (https://CHANGE_ME.azurewebsites.net/outbound_call)
#    404 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-nodejs-how-to-use-voice-sms.md:244 (https://www.twilio.com/blog/2013/04/introduction-to-twilio-client-with-node-js)
#    404 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-nodejs-how-to-use-voice-sms.md:246 (https://www.twilio.com/blog/2012/09/building-a-real-time-sms-voting-app-part-1-node-js-couchdb)
#    404 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-nodejs-how-to-use-voice-sms.md:247 (https://www.twilio.com/blog/2013/06/pair-programming-in-the-browser-with-twilio)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-php-how-to-use-voice-sms.md:97 (https://github.com/twilio/twilio-php/blob/master/README)
#    404 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-php-how-to-use-voice-sms.md:145 (http://readthedocs.org/docs/twilio-php/en/latest/usage/rest)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-php-how-to-use-voice-sms.md:255 (https://github.com/twilio/twilio-php/blob/master/README)
#    0 - C:\Users\nswimberghe\source\repos\azure-docs\articles\partner-twilio-php-make-phone-call.md:26 (https://github.com/twilio/twilio-php/blob/master/README)
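
The entries with status 0 deserve a short note. In the script, the status comes from casting `$Response.StatusCode` to an integer, and when the request fails before any HTTP response arrives (for example, nothing is listening on a localhost URL), there is no response object to read from. The sketch below shows that behavior in isolation; `Get-StatusNumber` is a hypothetical helper, not part of the script, and the example assumes strict mode is off, which is the PowerShell default.

```powershell
# Hypothetical helper that mirrors the cast used in the script's catch block.
function Get-StatusNumber {
    param($Response)
    # Reading a property of $null yields $null, and [int]$null is 0.
    [int]$Response.StatusCode
}

# Stand-in for a response object that carries a status code.
$WithResponse = [pscustomobject]@{
    StatusCode = [System.Net.HttpStatusCode]::NotFound
}

Get-StatusNumber -Response $WithResponse   # 404
Get-StatusNumber -Response $null           # 0
```

So a 0 usually points to a URL that could not be reached at all, rather than one the server rejected.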

You can redirect the output to a file like this:

./CrawlDocsForBrokenLinks.ps1 -DocsRootPath path/to/docs > brokenlinks.log
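
Once the results are in a file, you may want a quick summary per status code. The following is a rough sketch that parses lines in the script's `status - path:line (url)` format; the sample lines and paths are invented, and in practice you would feed it `Get-Content brokenlinks.log` instead of the `$LogLines` array.

```powershell
# Hypothetical log lines in the script's output format.
$LogLines = @(
    '404 - C:\docs\README.md:22 (https://example.com/missing)',
    '0 - C:\docs\setup.md:90 (http://localhost:8080/app)',
    '404 - C:\docs\guide.md:3 (https://example.org/gone)'
)

$Entries = foreach ($LogLine in $LogLines) {
    # Named groups pull the four fields out of each line.
    if ($LogLine -match '^(?<Status>\d+) - (?<Path>.+):(?<Line>\d+) \((?<Url>.+)\)$') {
        [pscustomobject]@{
            Status = [int]$Matches.Status
            Path   = $Matches.Path
            Line   = [int]$Matches.Line
            Url    = $Matches.Url
        }
    }
}

# Count the entries per status code.
$Entries | Group-Object Status | Select-Object Name, Count
```

From here, something like `$Entries | Where-Object Status -eq 404` narrows the list down to the links that are definitely dead.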

When you open the log file in VS Code, you can Ctrl+click the path:linenumber combination, and VS Code will open the file and put your cursor on the correct line!

Hopefully, this makes it easier to maintain working URLs in your documentation. Good luck!
