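Scrapy makes it easy to collect data from the web: it handles a large share of the work for you, so you do not have to spend significant effort building everything yourself.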
Scrapy Tutorial
This article assumes that you have already installed Scrapy.
...import SgmlLinkExtractor
from bbs.items import BbsItem
class forumSpider(CrawlSpider):
    # name of...bbs.sjtu.edu.cn']
    start_urls = [ 'https://bbs.sjtu.edu.cn/bbsall' ]
    link_extractor = {
        'page': SgmlLinkExtractor...(allow = '/bbsdoc,board,\w+\.html$'),
        'page_down': SgmlLinkExtractor(allow = '/bbsdoc,board,...\w+,page,\d+\.html$'),
        'content': SgmlLinkExtractor(allow = '/bbscon,board,\w+,file,M\.