DEV Community

Vladislav Kopylov

How to sanitize XML tags in Rails

I once noticed that we can sanitize XML tags with the rails-html-sanitizer and loofah gems, and I want to share what I learned.

For example, imagine we have a string that contains some HTML tags.

html_string = <<-STR
<p>
  <span>some text is here</span>
  <a><img src="lala.png" /></a>
</p>
STR

We want to sanitize the string but keep the <img> tag and its src attribute.

scrubber = Rails::Html::PermitScrubber.new
scrubber.tags = ['img']
scrubber.attributes = ['src']
html_fragment = Loofah.fragment(html_string)
html_fragment.scrub!(scrubber)

puts html_fragment.to_s

This works as expected. Every other tag is stripped, its text is kept, and the result is:

# some text is here
# <img src="lala.png">

Unfortunately, this approach fails for tags whose names contain the characters : or -, and XML tags often contain them.

xml_string = <<-STR
<item>
  <title>A Life in Russia</title>
  <description>What do you knot about Russia?</description>
  <dc:creator>Sasha Troianovski</dc:creator>
  <media:content height="150" medium="image" url="https://static.worldtimes.com/images/2099/02/13/world/some_photo.jpg" width="151"/>
  <media:credit>Sasha Troianovski for The World Times</media:credit>
  <media:description>Amazing travel to Russia</media:description>
</item>
STR

Say we want to sanitize this new string but keep the media:content, media:credit and media:description tags.

scrubber = Rails::Html::PermitScrubber.new
scrubber.tags = ['media:content', 'media:credit', 'media:description']
html_fragment = Loofah.fragment(xml_string)
html_fragment.scrub!(scrubber)

puts html_fragment.to_s

Unfortunately, it doesn't work properly. The three media: tags we wanted to keep are stripped along with everything else:

# A Life in Russia
# What do you knot about Russia?
# Sasha Troianovski

# Sasha Troianovski for The World Times
# Amazing travel to Russia

How do we solve the problem? Loofah can work with XML, but we have to switch to its XML parser by calling .xml_fragment instead of .fragment.

scrubber = Rails::Html::PermitScrubber.new
scrubber.tags = ['media:content', 'media:credit', 'media:description']
xml_fragment = Loofah.xml_fragment(xml_string)
xml_fragment.scrub!(scrubber)

puts xml_fragment.to_s

And here is the result:

# A Life in Russia
# What do you knot about Russia?
# Sasha Troianovski
# <media:content height="150" width="151"/>
# <media:credit>Sasha Troianovski for The World Times</media:credit>
# <media:description>Amazing travel to Russia</media:description>

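It works 😊 One caveat is visible in the output: media:content lost its medium and url attributes. This scrubber has no scrubber.attributes list, so it appears to fall back to a default attribute allowlist. Set scrubber.attributes, as in the HTML example, if you need to keep them.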
