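Steps:
1. First, install the "save as we" plugin in your browser (it saves a web page as an HTML file).
(Firefox, QQ Browser, 360 Browser, Google Chrome, and others all support this plugin.)
2. Open a Baidu Wenku document; Word, PDF, and other formats all work (《富甲美国》 is used as the example here).
3. Click "save as we"; when the prompt appears, choose continue save to save the page as HTML.
4. The groundwork is now done; all that is missing is the parser.
5. Build the HTML parsing tool: on the form, add a button (Button2 in the code below), a RichTextBox (RichTextBox1), a TextBox (TextBox1), and an OpenFileDialog component (OpenFileDialog1).
6. Here is the code: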
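Imports HtmlAgilityPack
Imports System.Text

Public Class Form1
    ' Loads the HTML named in TextBox1 and writes the text of each paragraph to RichTextBox1.
    Sub Get_YBQ()
        If TextBox1.Text <> "" Then
            RichTextBox1.Clear()
            Dim url As String = TextBox1.Text
            ' Encoding.Default is the system ANSI code page on .NET Framework;
            ' change it if the saved page uses a different encoding.
            Dim wc As New HtmlWeb With {
                .OverrideEncoding = Encoding.Default,
                .AutoDetectEncoding = True
            }
            Try
                ' Load inside the Try block so that a bad path or URL is reported instead of crashing.
                Dim htmldoc As HtmlDocument = wc.Load(url)
                Dim rootNode As HtmlNode = htmldoc.DocumentNode
                ' Select every <p> directly under <div class="ie-fix">.
                Dim xl As HtmlNodeCollection = rootNode.SelectNodes("//div[@class='ie-fix']/p")
                ' SelectNodes returns Nothing when there is no match.
                If xl IsNot Nothing Then
                    For Each node As HtmlNode In xl
                        ' One paragraph per line.
                        RichTextBox1.AppendText(node.InnerText & Environment.NewLine)
                    Next
                End If
            Catch ex As Exception
                MessageBox.Show(ex.Message)
            End Try
        End If
    End Sub

    Private Sub Button2_Click(sender As Object, e As EventArgs) Handles Button2.Click
        OpenFileDialog1.Title = "Select an HTML document"
        OpenFileDialog1.Filter = "HTML files|*.html|HTM files|*.htm"
        ' Parse only if the user actually picked a file; FileName is not reliably empty after a cancel.
        If OpenFileDialog1.ShowDialog() = DialogResult.OK Then
            TextBox1.Text = OpenFileDialog1.FileName
            Get_YBQ()
        End If
    End Sub
End Class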
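7. The parsing routine accepts either a URL typed into the text box or the path of a local HTML file, although the button above only wires up the local-file case. Online fetching is not used here because Baidu Wenku pages are protected and their source cannot be retrieved directly.
8. If you run into problems, join the QQ group and ask there.
9. Disclaimer: this HTML parser is for technical exchange only. Do not use it for illegal purposes; you bear the consequences if you do. Thank you for your cooperation!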
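
The XPath query from step 6 can also be tried without a form or a saved file. The following is a minimal sketch, assuming a console project that references HtmlAgilityPack. The module name XPathSketch, the function ExtractParagraphs, and the sample markup are hypothetical and only illustrate the query; a real saved page contains far more markup and may not match this structure.

```vbnet
Imports System
Imports System.Text
Imports HtmlAgilityPack

Module XPathSketch
    ' Hypothetical helper: returns the text of every <p> directly under
    ' <div class="ie-fix">, one paragraph per line.
    Function ExtractParagraphs(html As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing (not an empty collection) when nothing matches.
        Dim nodes As HtmlNodeCollection =
            doc.DocumentNode.SelectNodes("//div[@class='ie-fix']/p")

        Dim sb As New StringBuilder()
        If nodes IsNot Nothing Then
            For Each node As HtmlNode In nodes
                ' DeEntitize converts entities such as &amp; back to plain characters.
                sb.AppendLine(HtmlEntity.DeEntitize(node.InnerText))
            Next
        End If
        Return sb.ToString()
    End Function

    Sub Main()
        ' Made-up sample markup, for illustration only.
        Dim sample As String =
            "<html><body><div class='ie-fix'>" &
            "<p>First paragraph</p><p>Tom &amp; Jerry</p>" &
            "</div></body></html>"
        Console.Write(ExtractParagraphs(sample))
    End Sub
End Module
```

Keeping the extraction in a function that takes a string separates the query from the WinForms controls, so the XPath can be adjusted and rechecked quickly if the saved page uses a different class name.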