Joshua Tzucker

Originally published at joshuatz.com

GitHub Language Stats Bar for a Pure Markdown Repo

If you have ever used GitHub, you have probably noticed the language statistics bar (or “language breakdown bar”):

[Image: the GitHub language statistics bar]

It has no effect on how your code works, but it tells other devs at a glance what kind of tech is likely in your repo, and it does influence the results when searching for a repo across GitHub.

Unfortunately, if you create a repo and fill it with nothing but Markdown files (.md), the language breakdown bar will not appear at all, and if you add some non-Markdown files, it will show only those. What gives?

Both the problem and the solution are revealed with a little bit of background knowledge.

How is the language stats bar calculated?

Internally, GitHub uses a tool called Linguist to detect and classify code files. What you might not know, though, is that Linguist also has a lot of rules that prevent a file from contributing to the statistics. A file is excluded from stats if it is classified as any of these:

  • Vendored code
  • Generated code
  • Documentation
  • Data
  • Prose

For details, see the Linguist readme.

Why does Markdown not show up?

The first reason Markdown does not show up is that Linguist classifies it as “Prose”, which, as we saw above, is ignored in repo language statistics.

The second reason might or might not apply to you: filename and folder structure. For example, if all your Markdown files are named “readme.md” or live in a folder called “Documentation”, they are classified as “Documentation” and excluded. You can see this in the regular expression patterns in Linguist's “documentation.yml” file.
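To get a feel for how path-based classification works, here is a rough shell sketch. The three patterns are illustrative only, written in the style of “documentation.yml” rather than copied from it, and the helper name looks_like_documentation is made up for this example. Check the real file for the authoritative list.

```shell
#!/usr/bin/env bash
# Illustrative patterns only, in the style of Linguist's documentation.yml.
# This is NOT the real list; it is an assumption for demonstration purposes.
doc_patterns='(^|/)[Dd]ocumentation/|(^|/)README(\.|$)|(^|/)[Rr]eadme(\.|$)'

# Hypothetical helper: succeeds if the path matches one of the patterns above.
looks_like_documentation() {
  printf '%s\n' "$1" | grep -Eq "$doc_patterns"
}

for path in README.md readme.md Documentation/setup.md cheatsheets/web-dev.md; do
  if looks_like_documentation "$path"; then
    echo "$path: looks like documentation (would be excluded)"
  else
    echo "$path: not matched by these patterns"
  fi
done
```

With these sample patterns, the two readme files and the file under “Documentation” are flagged, while cheatsheets/web-dev.md is not. That file is still excluded by the “Prose” rule, which is a separate check.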

Solution

There used to be no solution to this issue. Over time, however, GitHub added "overrides" to Linguist, so users can now change the default classification of files with git attributes (see issue #3964, which was solved by PR #3807, and the example given in #3806).

How do we implement overrides? It is actually surprisingly pain-free.

First, create a ".gitattributes" file in the root of your repo. Git attributes are a way to assign custom, arbitrary attributes to files that match a path pattern. Within that file, use the standard git attributes syntax: each line is a filename or pattern followed by the attributes to set on matching files.

The linguist attributes you can override are:

  • linguist-documentation
  • linguist-language
  • linguist-vendored
  • linguist-generated
  • linguist-detectable

For example, here is a .gitattributes file where I have forced GitHub to count all Markdown files towards my language stats, except for readme.md:

# Linguist overrides
*.md linguist-detectable=true
README.md linguist-detectable=false
readme.md linguist-detectable=false

For Markdown, toggling “linguist-detectable” is usually all it takes, since it overrides the type-based rule that ignores “Prose”. As far as I can tell, though, it does not override the vendored, generated, or documentation classifications, so a file that matches one of those is still excluded. If you want to be sure that Markdown is not excluded, you can also override any attribute that might prevent it from being counted, like so:

*.md linguist-vendored=false
*.md linguist-generated=false
*.md linguist-documentation=false
*.md linguist-detectable=true

Final Notes:

Keep in mind that this solution applies to many other issues with GitHub language stats: the same git attributes let you override Linguist's vendored, generated, documentation, and language classifications as well.

Also, Pro Tip: You can easily check if your file patterns are correct in your .gitattributes file by using the "git check-attr" command. For example, to check the attributes of a markdown file with the filename "web-dev.md", in my folder "cheatsheets", I would use:

git check-attr -a cheatsheets/web-dev.md
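If you would rather try this out before touching a real project, the following is a minimal sketch that runs the same check in a throwaway repo. It assumes git is installed. The attr_value helper is a made-up convenience for this example, not a git command, and the file paths are the same illustrative ones used above.

```shell
#!/usr/bin/env bash
set -eu

# Scratch repository, so nothing here touches a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Same overrides as in the .gitattributes example above.
cat > .gitattributes <<'EOF'
# Linguist overrides
*.md linguist-detectable=true
README.md linguist-detectable=false
readme.md linguist-detectable=false
EOF

# Hypothetical helper: print only the value of one attribute for one path.
# git check-attr prints "<path>: <attribute>: <value>", so strip up to the last ": ".
attr_value() {
  git check-attr "$1" -- "$2" | sed 's/.*: //'
}

# The files do not need to exist; check-attr only evaluates the patterns.
echo "cheatsheets/web-dev.md -> $(attr_value linguist-detectable cheatsheets/web-dev.md)"
echo "README.md              -> $(attr_value linguist-detectable README.md)"
```

The cheatsheet should report true and README.md should report false, because later lines in .gitattributes take precedence over earlier ones for the same attribute. Note that this only confirms what git sees; the language bar itself is recalculated by GitHub after you push.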
