
How to get unique links in Nokogiri

Stack Overflow user
Asked 2013-08-27 17:09:33
Answers: 3, Views: 339, Followers: 0, Votes: 0

I have the HTML below, in which two of the links share a duplicate href.

<div class="pages">
  <a href="/search_results.aspx?f=Technology&Page=1" class="active">1</a>
  <a href="/search_results.aspx?f=Technology&Page=2">2</a>
  <a href="/search_results.aspx?f=Technology&Page=3">3</a>
  <a href="/search_results.aspx?f=Technology&Page=4">4</a>
  <a href="/search_results.aspx?f=Technology&Page=5">5</a>
  <a href="/search_results.aspx?f=Technology&Page=2">next &rsaquo;</a>
  <a href="/search_results.aspx?f=Technology&Page=6">last &raquo;</a>
</div> 

# p => is the page that has this html
# The below gives 7 as expected. But I don't need next/last links as they are duplicate    
p.css(".pages a").count

# So I tried uniq, which didn't work

p.css(".pages").css("a").uniq            #=> didn't work
p.css(".pages").css("a").to_a.uniq       #=> didn't work

3 Answers

Stack Overflow user

Accepted answer

Posted 2013-08-27 17:16:37

Try extracting the "href" attribute from each matched element (el.attr('href')) and calling uniq on the resulting strings:

html = Nokogiri::HTML(your_html_string)
html.css('a').map { |el| el.attr('href') }.uniq
# /search_results.aspx?f=Technology&Page=1
# /search_results.aspx?f=Technology&Page=2
# /search_results.aspx?f=Technology&Page=3
# /search_results.aspx?f=Technology&Page=4
# /search_results.aspx?f=Technology&Page=5
# /search_results.aspx?f=Technology&Page=6
Votes: 4

Stack Overflow user

Posted 2013-08-27 18:17:05

You can also do this with #xpath. Here is how I would do it:

require 'nokogiri'

doc = Nokogiri::HTML::Document.parse <<-HTML
<div class="pages">
  <a href="/search_results.aspx?f=Technology&Page=1" class="active">1</a>
  <a href="/search_results.aspx?f=Technology&Page=2">2</a>
  <a href="/search_results.aspx?f=Technology&Page=3">3</a>
  <a href="/search_results.aspx?f=Technology&Page=4">4</a>
  <a href="/search_results.aspx?f=Technology&Page=5">5</a>
  <a href="/search_results.aspx?f=Technology&Page=2">next &rsaquo;</a>
  <a href="/search_results.aspx?f=Technology&Page=6">last &raquo;</a>
</div> 
HTML

doc.xpath("//a/@href").map(&:to_s).uniq
# => ["/search_results.aspx?f=Technology&Page=1",
#     "/search_results.aspx?f=Technology&Page=2",
#     "/search_results.aspx?f=Technology&Page=3",
#     "/search_results.aspx?f=Technology&Page=4",
#     "/search_results.aspx?f=Technology&Page=5",
#     "/search_results.aspx?f=Technology&Page=6"]
Votes: 3

Stack Overflow user

Posted 2013-08-27 18:37:59

Another way to do the same job is to select the unique values within the XPath expression itself:

require 'nokogiri'

doc = Nokogiri::HTML::Document.parse <<-HTML
<div class="pages">
  <a href="/search_results.aspx?f=Technology&Page=1" class="active">1</a>
  <a href="/search_results.aspx?f=Technology&Page=2">2</a>
  <a href="/search_results.aspx?f=Technology&Page=3">3</a>
  <a href="/search_results.aspx?f=Technology&Page=4">4</a>
  <a href="/search_results.aspx?f=Technology&Page=5">5</a>
  <a href="/search_results.aspx?f=Technology&Page=2">next &rsaquo;</a>
  <a href="/search_results.aspx?f=Technology&Page=6">last &raquo;</a>
</div> 
HTML

doc.xpath("//a[not(@href = preceding-sibling::a/@href)]/@href").map(&:to_s)
# => ["/search_results.aspx?f=Technology&Page=1",
#     "/search_results.aspx?f=Technology&Page=2",
#     "/search_results.aspx?f=Technology&Page=3",
#     "/search_results.aspx?f=Technology&Page=4",
#     "/search_results.aspx?f=Technology&Page=5",
#     "/search_results.aspx?f=Technology&Page=6"]
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/18471613
