php - Best Practice: User generated HTML cleaning - Stack Overflow


I'm coding a WYSIWYG editor with designMode="on" on an iframe. The editor works fine and I store the code as is in the database.

Before outputting the HTML I need to "clean" it with PHP on the server side to avoid cross-site scripting and other scary things. Is there some sort of best practice on how to do this? What tags can be dangerous?

UPDATE: Typo fixed, it's What You See Is What You Get. Nothing new :)


Asked May 5, 2010 at 14:26 by Martin; edited May 5, 2010 at 14:33.
  • If you're determined to implement this yourself, you'd better have a look at ha.ckers/xss.html - a list of known attacks in various browsers. – FalseVinylShrub Commented May 6, 2010 at 13:44
  • Great question - I have wondered how stackoverflow protects itself... – JDelage Commented Mar 25, 2011 at 17:03

4 Answers


The best practice is to allow only certain things you know aren't dangerous, and remove/escape all the rest. See the paper Automated Malicious Code Detection and Removal on the Web (OWASP AntiSamy) for a discussion on this (the library is for Java, but the principles apply for any language).
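To make the allowlist idea concrete, here is a minimal sketch in PHP using the built-in DOM extension. The tag list, the attribute list, and the function names (`sanitizeHtml`, `cleanChildren`, `isSafeUrl`) are assumptions chosen for illustration, not something from AntiSamy or the question. It drops every element that is not on the list together with its content, strips every attribute that is not on the list, and only keeps `href` values that start with `http://`, `https://`, or `mailto:`. Treat it as a demonstration of the principle rather than a vetted sanitizer.

```php
<?php
// Hypothetical allowlist: tag name => attributes permitted on that tag.
const ALLOWED_TAGS = [
    'p' => [], 'br' => [], 'b' => [], 'strong' => [], 'i' => [], 'em' => [],
    'ul' => [], 'ol' => [], 'li' => [], 'a' => ['href'],
];

// Accept only absolute http(s) and mailto URLs; this rejects javascript: and data:.
function isSafeUrl(string $url): bool
{
    return preg_match('#^(?:https?://|mailto:)#i', trim($url)) === 1;
}

function cleanChildren(DOMNode $parent): void
{
    // Copy the child list first, because it is live and we modify it below.
    foreach (iterator_to_array($parent->childNodes, false) as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            continue; // plain text is escaped again when the tree is serialized
        }
        $tag = $child instanceof DOMElement ? strtolower($child->tagName) : null;
        if ($tag === null || !isset(ALLOWED_TAGS[$tag])) {
            // Unlisted elements, comments, etc. are dropped with their content.
            $parent->removeChild($child);
            continue;
        }
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            $name = strtolower($attr->name);
            $keep = in_array($name, ALLOWED_TAGS[$tag], true)
                && ($name !== 'href' || isSafeUrl($attr->value));
            if (!$keep) {
                $child->removeAttribute($attr->name);
            }
        }
        cleanChildren($child);
    }
}

function sanitizeHtml(string $dirty): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on messy input
    $doc->loadHTML(
        '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $dirty . '</body></html>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    cleanChildren($body);

    // Serialize only what is left inside <body>.
    $clean = '';
    foreach ($body->childNodes as $child) {
        $clean .= $doc->saveHTML($child);
    }
    return $clean;
}
```

Calling `sanitizeHtml()` on input that mixes a `<p onclick="...">`, a `<script>` block, and a `javascript:` link should keep the paragraph and its text formatting while dropping the handler, the script, and the link target. Deleting unlisted elements outright (instead of unwrapping them) loses some text, but it is the more conservative default.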

If you're really bent on allowing this, you should use a white list approach.

The best approach is probably to disallow HTML and use a simplified markup format instead; you can pre-render to HTML and store that in the database if performance is a concern. Avoiding these sorts of problems is one of the big reasons for using Markdown, Textile, reStructuredText, etc.

NOTE: I linked to GitHub-Flavored Markdown (GFM), not Standard Markdown (SM). GFM addresses some common problems that end-users have with SM.
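The reason a simplified markup format sidesteps the problem can be shown with a toy renderer. The sketch below is not Markdown or GFM; it is a made-up subset (`**bold**`, `*italic*`, blank-line paragraphs) and `renderSimpleMarkup` is a hypothetical name. The point is the order of operations: escape all user input first, then generate a small, fixed set of tags yourself. A real project would use an existing Markdown library instead.

```php
<?php
function renderSimpleMarkup(string $source): string
{
    // 1. Escape everything, so no user-supplied HTML survives.
    $html = htmlspecialchars($source, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // 2. Re-introduce a fixed set of tags from the markup tokens.
    //    "." does not match newlines here, so a token cannot span lines.
    $html = preg_replace('/\*\*(.+?)\*\*/', '<strong>$1</strong>', $html);
    $html = preg_replace('/\*(.+?)\*/', '<em>$1</em>', $html);

    // 3. Blank lines separate paragraphs.
    $paragraphs = preg_split('/\R{2,}/', trim($html));
    return '<p>' . implode("</p>\n<p>", $paragraphs) . '</p>';
}

echo renderSimpleMarkup("Hello **world**\n\n<script>alert(1)</script>"), "\n";
// prints:
// <p>Hello <strong>world</strong></p>
// <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The rendered HTML can then be stored alongside the source text if rendering on every request is too slow.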

I looked into the same question recently with Perl as the server-side language.

While doing so I ran into HTML Purifier which may be what you want. But obviously as it's in PHP and not Perl, I didn't actually test it out.

Also, my research led me to conclude that this is a very tricky business, so if possible consider using a simplified markup language like Markdown, as suggested by Hank Gay.
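For anyone who does want to try HTML Purifier from PHP, its basic usage looks roughly like the sketch below. It assumes the library is installed (for example the Composer package `ezyang/htmlpurifier`) and that its autoloader has already been loaded; the wrapper name `purifyUserHtml` and the particular allowlist are illustrative choices, not recommendations from this answer.

```php
<?php
// Assumes HTML Purifier is installed and autoloaded before this is called,
// e.g. via require 'vendor/autoload.php' in a Composer project.
function purifyUserHtml(string $dirty): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Illustrative allowlist; tune it to what the editor can actually produce.
    $config->set('HTML.Allowed', 'p,br,b,strong,i,em,ul,ol,li,a[href]');
    $config->set('URI.AllowedSchemes', ['http' => true, 'https' => true, 'mailto' => true]);

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirty);
}
```

Whether to purify on save or on output is a design choice; purifying on output means later changes to the configuration also apply to content that is already stored.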

If you are familiar with ASP.NET, just call Server.HtmlEncode() to convert special characters like < and > to "&lt;" and "&gt;".

In PHP, you can use the htmlspecialchars() function.

Once the special characters are encoded, the browser displays the markup as text instead of interpreting it, which prevents cross-site scripting.
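A short sketch of the PHP side, assuming UTF-8 content; the wrapper name `escapeHtml` is made up for the example. Note that this escapes all markup, so any formatting produced by a WYSIWYG editor would be shown as literal tags rather than rendered.

```php
<?php
// Escape user input for output in an HTML text or attribute context.
function escapeHtml(string $text): string
{
    // ENT_QUOTES escapes both quote styles; ENT_SUBSTITUTE replaces invalid
    // byte sequences instead of returning an empty string.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$dirty = '<b onclick="alert(1)">bold</b>';
echo escapeHtml($dirty), "\n";
// prints: &lt;b onclick=&quot;alert(1)&quot;&gt;bold&lt;/b&gt;
```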
