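By default, `.` in a Python regular expression does not match a newline. So what do you do with a JavaScript string that spans several lines, like the one below?
The code below uses `js2py`, a library that runs JavaScript from Python. Here it evaluates the snippet to assemble the real URL.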
import re
import js2py
txt = '''
(new Image()).src = 'https://weixin.sogou.com/approve?uuid=' + 'b9be9b04-7bcd-4a70-b412-70e1eb33fd1c' + '&token=' + '0177FFA5CCF44B442226BA55C2563A922371B60D5DF19CE0' + '&from=inner';
setTimeout(function () {
var url = '';
url += 'http://mp.w';
url += 'eixin.qq.co';
url += 'm/s?src=11&';
url += 'timestamp=1';
url += '576115412&v';
url += 'er=2029&sig';
url += 'nature=3OfX';
url += 'g*vTl0xc6Uv';
url += 'afcTMAEg9B8';
url += 'Ed0UQLlh744';
url += '19o9uA1j0KFuh1W99OnNadkNegwwNkr5B7kI4g7k9vQzqb-BPoSoEESUUcMlerw99vocCRWur0Fp9fVATo*2aTRYiUo&new=1';
url.replace("@", "");
window.location.replace(url)
},100);
'''
# `.*?` is used here, but by default `.` cannot match the newlines
url_var = re.search(r'(var url.*?url\.replace\("@", ""\);)', txt).group(1)
url_rendered = js2py.eval_js(url_var)
print(url_rendered)
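Run as written, the code above raises an error: `.` cannot cross the line breaks, so `re.search` returns `None` and `.group(1)` raises an `AttributeError`.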
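The failure can be reproduced on a much smaller input. A minimal sketch, assuming a hypothetical two-line string `js` in place of the full script:

```python
import re

# Hypothetical two-line snippet standing in for the longer script above.
js = 'var url = "";\nurl.replace("@", "");'

# Without re.DOTALL, `.` stops at the newline, so nothing matches.
match = re.search(r'var url.*?url\.replace', js)
print(match)  # None

# Calling .group() on None is what produces the error.
try:
    match.group(0)
except AttributeError as exc:
    print(type(exc).__name__)  # AttributeError
```

Checking the result of `re.search` for `None` before calling `.group()` avoids the exception, but it does not make the pattern match.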
One fix is to use `[\s\S]*?` in place of `.*?`. The character class `[\s\S]` matches any character, including a newline.
import re
import js2py
txt = '''
(new Image()).src = 'https://weixin.sogou.com/approve?uuid=' + 'b9be9b04-7bcd-4a70-b412-70e1eb33fd1c' + '&token=' + '0177FFA5CCF44B442226BA55C2563A922371B60D5DF19CE0' + '&from=inner';
setTimeout(function () {
var url = '';
url += 'http://mp.w';
url += 'eixin.qq.co';
url += 'm/s?src=11&';
url += 'timestamp=1';
url += '576115412&v';
url += 'er=2029&sig';
url += 'nature=3OfX';
url += 'g*vTl0xc6Uv';
url += 'afcTMAEg9B8';
url += 'Ed0UQLlh744';
url += '19o9uA1j0KFuh1W99OnNadkNegwwNkr5B7kI4g7k9vQzqb-BPoSoEESUUcMlerw99vocCRWur0Fp9fVATo*2aTRYiUo&new=1';
url.replace("@", "");
window.location.replace(url)
},100);
'''
# `[\s\S]*?` is used here so the match can span newlines
url_var = re.search(r'(var url[\s\S]*?url\.replace\("@", ""\);)', txt).group(1)
url_rendered = js2py.eval_js(url_var)
print(url_rendered)
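The difference between the two patterns is easy to check in isolation. The sketch below uses a short hypothetical string `js`, not the real page, and leaves out js2py:

```python
import re

# Hypothetical three-line snippet; the real script is much longer.
js = 'var url = "";\nurl += "a";\nurl.replace("@", "");'

# `.` does not cross newlines by default, so this finds nothing.
dot_match = re.search(r'var url.*?url\.replace', js)

# `[\s\S]` means "whitespace or non-whitespace", i.e. any character.
class_match = re.search(r'var url[\s\S]*?url\.replace', js)

print(dot_match)             # None
print(class_match.group(0))  # the text from `var url` through `url.replace`
```

Because `[\s\S]` does not depend on any flag, it behaves the same way no matter how the pattern is compiled.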
A second fix is to set `re.DOTALL`, which makes `.` match newlines as well, as follows:
import re
txt = '''
(new Image()).src = 'https://weixin.sogou.com/approve?uuid=' + 'b9be9b04-7bcd-4a70-b412-70e1eb33fd1c' + '&token=' + '0177FFA5CCF44B442226BA55C2563A922371B60D5DF19CE0' + '&from=inner';
setTimeout(function () {
var url = '';
url += 'http://mp.w';
url += 'eixin.qq.co';
url += 'm/s?src=11&';
url += 'timestamp=1';
url += '576115412&v';
url += 'er=2029&sig';
url += 'nature=3OfX';
url += 'g*vTl0xc6Uv';
url += 'afcTMAEg9B8';
url += 'Ed0UQLlh744';
url += '19o9uA1j0KFuh1W99OnNadkNegwwNkr5B7kI4g7k9vQzqb-BPoSoEESUUcMlerw99vocCRWur0Fp9fVATo*2aTRYiUo&new=1';
url.replace("@", "");
window.location.replace(url)
},100);
'''
pattern = re.compile(r'(var url.*?url\.replace\("@", ""\);)', re.DOTALL)
res = pattern.search(txt).group(1)
print(res)
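`re.DOTALL` can be spelled in other ways too. As a small sketch on a hypothetical string `js`: `re.S` is the short alias for the same flag, and the inline form `(?s)` at the start of the pattern turns it on without a flags argument.

```python
import re

# Hypothetical three-line snippet standing in for the real script.
js = 'var url = "";\nurl += "a";\nurl.replace("@", "");'

# Three equivalent ways to let `.` match a newline.
with_flag = re.search(r'var url.*?url\.replace', js, re.DOTALL)
with_alias = re.search(r'var url.*?url\.replace', js, re.S)
with_inline = re.search(r'(?s)var url.*?url\.replace', js)

# All three return the same multi-line match.
print(with_flag.group(0) == with_alias.group(0) == with_inline.group(0))  # True
```

The inline form is convenient when the pattern is passed to code that does not expose a flags parameter.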