How To Sanitize HTML In Rails That Actually Works

Assuming you don’t want to simply escape everything, sanitizing user input is one of the relative weak points of the Rails framework. On SpeakerRate, where users can use Markdown to format comments and descriptions, we’ve run up against some of the limitations of Rails’ built-in sanitization features, so we decided to dig in and fix it ourselves.

In creating our own sanitizer, we had three goals. First, we want to let a subset of HTML in. As the Markdown documentation clearly states, “for any markup that is not covered by Markdown’s syntax, you simply use HTML itself.” In keeping with the Markdown philosophy, we can’t simply strip all HTML from incoming comments, so the included HTML::WhiteListSanitizer is the obvious starting point.

Additionally, we want to escape, rather than remove, non-approved tags, since some commenters want to discuss the merits of, say, <h2 class="h2">. Contrary to its documentation, WhiteListSanitizer simply removes all non-whitelisted tags. Someone opened a ticket about this issue in August of 2008 and included a patch, but the ticket was marked as resolved without the patch ever being applied. Probably for the best, as the patch introduces a new bug.
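To make the difference between removing and escaping concrete, here is a minimal plain-Ruby sketch. It is not the Rails sanitizer: it uses a simple regular expression instead of a real tokenizer, and `ALLOWED_TAGS`, `TAG_RE`, `rewrite_disallowed`, `strip_disallowed`, and `escape_disallowed` are all hypothetical names made up for this illustration.

```ruby
# Hypothetical whitelist for illustration only.
ALLOWED_TAGS = %w[strong em a].freeze

# Matches an opening, closing, or self-closing tag and captures its name.
# A regex is a simplification; the real sanitizer works on parsed tokens.
TAG_RE = %r{</?([A-Za-z][A-Za-z0-9]*)[^<>]*>}

# Leaves whitelisted tags alone and hands every other tag to the block.
def rewrite_disallowed(html)
  html.gsub(TAG_RE) do |tag|
    name = Regexp.last_match(1).downcase
    ALLOWED_TAGS.include?(name) ? tag : yield(tag)
  end
end

# Removing: the non-whitelisted tag disappears from the output.
def strip_disallowed(html)
  rewrite_disallowed(html) { "" }
end

# Escaping: the leading "<" becomes "&lt;", so the browser shows the tag
# as text instead of rendering it.
def escape_disallowed(html)
  rewrite_disallowed(html) { |tag| tag.sub("<", "&lt;") }
end

comment = "Use <h2> sparingly, but <strong>do</strong> use it."
puts strip_disallowed(comment)
puts escape_disallowed(comment)  # => Use &lt;h2> sparingly, but <strong>do</strong> use it.
```

With stripping, the commenter's mention of the tag silently vanishes; with escaping, readers still see what was typed.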

Finally, we want to escape unclosed tags even if they belong to the whitelist. An unclosed <strong> tag can wreak havoc on the rest of a page, not to mention what a <div> can do. Self-closing tags are okay.
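The bookkeeping behind this goal can be sketched in plain Ruby: keep a stack of open tag names, pop on a matching close, and treat whatever is left at the end as unclosed. This is only an approximation under stated assumptions. It uses a regex rather than a tokenizer, and `TAG_WITH_SLASHES`, `unclosed_tags`, and `escape_unclosed` are hypothetical names invented for the example.

```ruby
# Captures: optional leading "/", the tag name, optional trailing "/".
TAG_WITH_SLASHES = %r{<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*?(/?)>}

# Returns the names of tags that are left unbalanced in the input.
def unclosed_tags(html)
  open_tags = []
  html.scan(TAG_WITH_SLASHES) do |closing, name, self_closing|
    next unless self_closing.empty?   # self-closing tags are okay

    name = name.downcase
    if !closing.empty? && open_tags.first == name
      open_tags.shift                 # matching close: pop it off the stack
    else
      open_tags.unshift(name)         # opening tag or stray close: push it
    end
  end
  open_tags.uniq
end

# Escapes every tag whose name ended up in the unclosed list.
def escape_unclosed(html)
  unclosed = unclosed_tags(html)
  html.gsub(TAG_WITH_SLASHES) do |tag|
    name = Regexp.last_match(2).downcase
    unclosed.include?(name) ? tag.sub("<", "&lt;") : tag
  end
end

text = "This is <strong>bold and <em>fine</em>, with a <br /> break"
puts escape_unclosed(text)
```

Here the dangling `<strong>` is escaped so it cannot bold the rest of the page, while the balanced `<em>` pair and the self-closing `<br />` pass through untouched.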

With these requirements in mind, we subclassed HTML::WhiteListSanitizer and fixed it up. Introducing, then:

HTML::StathamSanitizer. User-generated markup, you’re on notice: this sanitizer will take its shirt off and use it to kick your ass. At this point, I’ve written more about the code than code itself, so without further ado:

    module HTML
      class StathamSanitizer < WhiteListSanitizer

        protected

        # Once the parent class has processed every token, options[:parent]
        # holds the names of tags that were opened but never closed. Escape
        # any surviving tag with one of those names.
        def tokenize(text, options)
          super.map do |token|
            if token.is_a?(HTML::Tag) && options[:parent].include?(token.name)
              token.to_s.gsub(/</, "&lt;")
            else
              token
            end
          end
        end

        def process_node(node, result, options)
          result << case node
            when HTML::Tag
              # Track open tags: pop on a matching close, push otherwise.
              # Self-closing tags are left out of the stack.
              if node.closing == :close && options[:parent].first == node.name
                options[:parent].shift
              elsif node.closing != :self
                options[:parent].unshift node.name
              end

              process_attributes_for node, options

              if options[:tags].include?(node.name)
                node
              else
                # Drop tags on the bad_tags list entirely; escape the rest.
                bad_tags.include?(node.name) ? nil : node.to_s.gsub(/</, "&lt;")
              end
            else
              bad_tags.include?(options[:parent].first) ? nil : node.to_s.gsub(/</, "&lt;")
          end
        end
      end
    end