Q: XPath selector does not filter by class when iterating over a SelectorList

Stack Overflow user

Asked 2019-07-21 23:02:25

Answers 5 · Views 264 · Followers 0 · Votes 0

I am scraping this website: https://www.oddsportal.com/darts/europe/european-championship/results/

The site uses JavaScript to render the table data, so I am using the scrapy-splash plugin in a Docker container.

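I want to filter out all rows with the class 'dark center' while iterating over the SelectorList 'tableRows'. However, during iteration the XPath selector queries the whole SelectorList on every pass instead of the individual item.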

tableRows = response.xpath('//table[contains(@id, "tournamentTable")]/tbody/tr')

for row in tableRows:
    print(row)
    if row.xpath('//*[contains(@class, "dark center")]') is not None:
        print(True)

My output:

<Selector xpath='//table[contains(@id, "tournamentTable")]/tbody/tr' data='<tr class="dark center" xtid="39903"><th'>
True
<Selector xpath='//table[contains(@id, "tournamentTable")]/tbody/tr' data='<tr class="center nob-border"><th class='>
True

Why does the row with the class 'center nob-border' also print True?

python

python-3.x

scrapy


5 Answers

Stack Overflow user

Accepted answer

Answered 2019-07-22 00:33:13

Your XPath is wrong. Have a look at this answer: you left out the dot in the second XPath expression. In short:

# Searches from the document root, whichever selector it is called on.
row.xpath('//*[contains(@class, "dark center")]')

# In effect, it is the same as
response.xpath('//*[contains(@class, "dark center")]')

# Searching relative to the current element (what you actually need) is
row.xpath('./*[contains(@class, "dark center")]')
# or possibly .//*[contains(@class, "dark center")], depending on the DOM structure

Big update here. Aha... I was being silly. Your code actually has two mistakes. The first is the XPath expression I mentioned. The second is the comparison operator.

row.xpath('any XPath here') is not None

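will always evaluate to True. The function returns a list, which can be empty but can never be None. That's all there is to it. I have also improved the XPath selector. In the end, the exact code you need is: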

tableRows = response.xpath('//table[contains(@id, "tournamentTable")]/tbody/tr')

for row in tableRows:
    print(row)
    if row.xpath('./self::tr[contains(@class, "dark center")]'):
        print(True)
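
The comparison mistake can be checked without Scrapy. This is only a sketch: a plain empty list stands in for the result of `row.xpath(...)` when nothing matches, which is a reasonable substitute because the returned SelectorList is a list subclass. The variable names are made up for illustration.

```python
# Stand-in for row.xpath(...) when no node matches; a SelectorList
# is a list subclass, so a plain empty list behaves the same way here.
no_matches = []

# An empty list is still an object, so this check can never fail.
always_true = no_matches is not None

# Empty lists are falsy, so a plain truth test tells matches from no matches.
has_match = bool(no_matches)

print(always_true, has_match)
```

This is why the corrected loop uses `if row.xpath(...):` rather than comparing against None.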

Votes 1

Stack Overflow user

Answered 2019-07-22 09:54:16

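The main problem here is that the downloaded page does not contain dark center at all. These classes are created by JavaScript code after the page has loaded. If you search for them in View Page Source, you will not find them.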

However, the data you want is available at another URL, something like: https://www.oddsportal.com/ajax-sport-country-tournament-archive/14/rwHQ6U5F/X0/1/0/1/?_=1563816992686

$ curl -s https://www.oddsportal.com/ajax-sport-country-tournament-archive/14/rwHQ6U5F/X0/1/0/1/\?_\=1563816992686 | cut -c -500

-|-{"s":1,"d":{"html":"<table class=\" table-main\" id=\"tournamentTable
\"><colgroup><col width=\"50\" \/><col width=\"*\" \/><col width=\"50\" 
\/><col width=\"50\" \/><col width=\"50\" \/><col width=\"50\" \/><col width=\"50\" \/><\/colgroup><tbody><tr class=\"dark center\" xtid=\"39903\"
><th class=\"first2 tl\" colspan=\"7\"><a class=\"bfl sicona s14\" href=\"
\/darts\/\">Darts<\/a><span class=\"bflp\">\u00bb<\/span><a class=\"bfl\"
href=\"\/darts\/europe\/\"><span class=\"ficon f-6\">&nbsp;<\/
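
A rough sketch of how such a response could be unpacked, assuming the body is a short prefix followed by a JSON object whose `d.html` field holds the table markup, as the truncated output above suggests. The `raw` string is a shortened, made-up payload in that shape, not the real response, and what follows the JSON in the real response is unknown.

```python
import json

# Hypothetical, shortened payload in the shape shown above.
raw = r'-|-{"s":1,"d":{"html":"<table id=\"tournamentTable\"><tbody><tr class=\"dark center\" xtid=\"39903\"><\/tr><\/tbody><\/table>"}}'

# Skip the non-JSON prefix and decode the first JSON object found;
# raw_decode tolerates any extra text that might follow the object.
start = raw.index("{")
payload, _end = json.JSONDecoder().raw_decode(raw, start)

# The table markup is an ordinary HTML string once JSON escapes are decoded.
html = payload["d"]["html"]
print(html)
```

The resulting `html` string could then be handed to a selector, for example `scrapy.Selector(text=html)`, and queried with the relative XPath from the accepted answer.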