require 'uri'

def valid?(link)
  uri = URI.parse link
  return false if uri.scheme != 'https'
  return false if uri.userinfo
  return false if uri.host != 'gist.github.com'
  return false if uri.port != 443 # I think it has to be this if its https?
  return false if uri.fragment
  return false if uri.query

  # idk if old gist ids could be arbitrary chars,
  # but the 10 or so I looked at all seemed to be hex
  path, gist = File.split uri.path
  return false unless gist && gist.match?(/\A[0-9a-f]{32}\z/)

  path, user = File.split path
  return false unless user && user.match?(/\A[a-zA-Z0-9\-]{1,39}\z/)

  path == '/'
end
Plus, a bunch of test cases:
valid = [
  # suggested one
  "https://gist.github.com/QuincyLarson/4bb1682ce590dc42402b2edddbca7aaa",
  # max name size (based on the existing regex, I didn't verify that it matches Github's rules)
  "https://gist.github.com/aaaaaaaaaabbbbbbbbbbccccccccccddddddddd/4bb1682ce590dc42402b2edddbca7aaa",
  # min name size
  "https://gist.github.com/a/4bb1682ce590dc42402b2edddbca7aaa",
  # each of the gist id char values (seems to be hex)
  "https://gist.github.com/a/0123456789abcdef0123456789abcdef",
  # user can have a dash in their name
  "https://gist.github.com/quincy-larson/4bb1682ce590dc42402b2edddbca7aaa",
  # just for contrast with some of the tests below
  "https://gist.github.com/a/0000000000111111111122222222223a",
]

invalid = [
  # the same as the valid one, except it's http. I actually thought browsers
  # would refuse to make http requests from https sites, but either way, it
  # should be disallowed as it's MITMable
  "http://gist.github.com/QuincyLarson/4bb1682ce590dc42402b2edddbca7aaa",
  # host
  "https://evil.com/QuincyLarson/4bb1682ce590dc42402b2edddbca7aaa",
  # auth credentials
  "https://user:pass@gist.github.com/QuincyLarson/4bb1682ce590dc42402b2edddbca7aaa",
  # http port (not totally sure this matters)
  "https://gist.github.com:80/QuincyLarson/4bb1682ce590dc42402b2edddbca7aaa",
  # query
  "https://gist.github.com/QuincyLarson/4bb1682ce590dc42402b2edddbca7aaa?q",
  # fragment
  "https://gist.github.com/QuincyLarson/4bb1682ce590dc42402b2edddbca7aaa#f",
  # this one exceeds the max name size
  "https://gist.github.com/aaaaaaaaaabbbbbbbbbbccccccccccddddddddde/4bb1682ce590dc42402b2edddbca7aaa",
  # less than min name size
  "https://gist.github.com//4bb1682ce590dc42402b2edddbca7aaa",
  # invalid gist id chars (outside the hex range)
  "https://gist.github.com/a/0000000000111111111122222222223x",
  # quick test showed that GH was case sensitive about the hex in the gist id
  "https://gist.github.com/a/0000000000111111111122222222223A",
  # too few path segments
  "https://gist.github.com",
  "https://gist.github.com/",
  "https://gist.github.com/a",
  # too many path segments
  "https://gist.github.com/QuincyLarson/4bb1682ce590dc42402b2edddbca7aaa/4bb1682ce590dc42402b2edddbca7aaa",
  # these ones were used to bypass validity in the blog
  "//evil.com/script#gist.github.com",
  "https://gist.github.com@evil.com/",
  "https://gist.github.com/AntonyGarand/a8a0b4a36a040edc6051e888afce8fab/raw/4deb366ddaf0597e82fea808f7f4cb3ad763d98f/poc.js",
]

valid.all? { |link| valid? link }    # => true
invalid.none? { |link| valid? link } # => true
@ben, isn't there a URI class or something like that in Ruby? I think it would handle parsing links much better than a custom regex.
You can probably simplify the code slightly using URI::regexp. Is this what you mean?
Kinda. I think Ruby provides something like:
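As a rough sketch of the idea (not necessarily the exact snippet meant here), Ruby's standard `uri` library splits a link into components that can then be checked one by one. The sample link is the one from the test cases in this thread:

```ruby
require 'uri'

# Sample link taken from the test cases in this thread.
uri = URI.parse('https://gist.github.com/QuincyLarson/4bb1682ce590dc42402b2edddbca7aaa')

uri.scheme # => "https"
uri.host   # => "gist.github.com"
uri.port   # => 443 (the default port for https)
uri.path   # => "/QuincyLarson/4bb1682ce590dc42402b2edddbca7aaa"
```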
This would work for preventing non-gist.github.com hosts, but I think we strayed away from this because it wouldn't prevent Bypass 2, where JS is injected via a raw gist link.
We could do something like this:
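To illustrate the Bypass 2 concern, here is a minimal sketch of a host-only check. `gist_host?` is a hypothetical helper made up for this example, not code from the post; the links are the bypass links from the test cases in this thread:

```ruby
require 'uri'

# Hypothetical host-only check, for illustration only.
def gist_host?(link)
  URI.parse(link).host == 'gist.github.com'
end

# Raw gist link serving JavaScript: same host, so the check passes.
raw_link = 'https://gist.github.com/AntonyGarand/a8a0b4a36a040edc6051e888afce8fab/raw/4deb366ddaf0597e82fea808f7f4cb3ad763d98f/poc.js'
gist_host?(raw_link) # => true

# The userinfo trick is caught, because the real host is evil.com.
gist_host?('https://gist.github.com@evil.com/') # => false
```

So the host check stops the look-alike hosts, but the path still has to be validated separately.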
I think with the regex though it would be redundant to check the host.
How about something like this?
Although, we should probably check whether you're allowed to have unicode in your username 🤔
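For what it's worth, here is a small sketch of how non-ASCII usernames behave with the code as written. The username `jösh` is made up for the example, and whether GitHub actually allows such names is not verified here:

```ruby
require 'uri'

# Username pattern copied from the valid? method in this thread.
USERNAME = /\A[a-zA-Z0-9\-]{1,39}\z/

'quincy-larson'.match?(USERNAME) # => true
"j\u00F6sh".match?(USERNAME)     # => false ("jösh": ö is outside the ASCII ranges)

# URI.parse itself raises on non-ASCII input, so a caller of valid?
# may want to rescue URI::InvalidURIError and treat the link as invalid.
begin
  URI.parse("https://gist.github.com/j\u00F6sh/4bb1682ce590dc42402b2edddbca7aaa")
  parsed = true
rescue URI::InvalidURIError
  parsed = false
end
parsed # => false
```

In other words, a unicode username would currently end in an exception rather than a plain false.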
@joshcheek gonna take that valid? method for checking giphy links. :)
Also that's pretty great. Might end up implementing it for the gist Liquid tag.