re: Pwned Together: Hacking VIEW POST

re: @ben , isn't there an URI class or something like that in Ruby? I think it should handle parsing links much better than custom regex.

Probably you can slightly simplify the code using URI::regexp, is this what you mean?


Kinda. I think Ruby provides something like:

# pseudocode
link = URI.parse "" #

This would work for preventing hosts, but I think we strayed away from this because it wouldn't prevent Bypass 2, where JS is injected via a raw gist link.

We could do something like this:

URI.parse(link).host == "" &&
  (link =~ /^https\:\/\/gist\.github\.com\/([a-zA-Z0-9\-]){1,39}\/([a-zA-Z0-9]){32}\s/)

I think with the regex though it would be redundant to check the host.

How about something like this?

require 'uri'

def valid?(link)
  uri = URI.parse link
  return false if uri.scheme != 'https'
  return false if uri.userinfo
  return false if != ''
  return false if uri.port != 443 # I think it has to be this if its https?
  return false if uri.fragment
  return false if uri.query

  # idk if old gist ids could be arbitrary chars,
  # but the 10 or so I looked at all seemed to be hex
  path, gist = File.split uri.path
  return false unless gist && gist.match?(/\A[0-9a-f]{32}\z/)

  path, user = File.split path
  return false unless user && user.match?(/\A[a-zA-Z0-9\-]{1,39}\z/)

  path == '/'

Plus, a bunch of test cases:

valid = [
  # suggested one
  # max name size (based on the existing regex, I didn't verify that it matches Github's rules)
  # min name size
  # each of the gist id char values (seems to be hex)
  # user can have a dash in their name
  # just for contrast with some of the tests below

invalid = [
  # the same as the valid one, except it's http. I actually thought browsers
  # would refuse to make http requests from https sites, but either way, it
  # should be disallowed as it's MITMable
  # host
  # auth credentials
  # http port (not totally sure this matters)
  # query
  # fragment
  # this one exceeds the max name size
  # less than min name size
  # invalid gist id chars (outside the hex range)
  # quick test showed that GH was case sensitive about the hex in the gist id
  # too few path segments
  # too many path segments
  # these ones were used to bypass validity in the blog

valid.all?    { |link| valid? link } # => true
invalid.none? { |link| valid? link } # => true

Although, we should probably check whether you're allowed to have unicode in your username 🤔

@joshcheek gonna take that valid? method for checking giphy links. :)

Also that's pretty great. Might end up implementing it for the gist Liquid tag.

code of conduct - report abuse