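The script should find the addresses of the subpages that contain articles and collect the necessary data from them. The data is collected by processing the HTML documents and should then be stored in a database. I have not finished the word counter yet, but so far I have two loops (2.1, 2.2) that are supposed to go into each article and extract its content and the author's name.
UserWarning: "link/" looks like a URL. Beautiful Soup is not an HTTP client.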
You should probably
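
The warning suggests that the string "link/" itself was handed to the parser instead of the HTML behind that address. Below is a minimal sketch of the fetch-then-parse-then-store flow described above, not a definitive implementation. It uses only the standard library (`urllib.request`, `html.parser`, `sqlite3`) in place of Beautiful Soup so that it runs without extra packages. The markup (`<span class="author">`, `<p>`), the `articles` table, and the names `ArticleParser`, `parse_article`, `fetch`, and `store_articles` are all assumptions made for illustration; the real pages and schema may differ.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.request import urlopen


class ArticleParser(HTMLParser):
    """Collects the author name and paragraph text from one article page.

    Assumed (hypothetical) markup: <span class="author">...</span> and <p>...</p>.
    Nested tags inside <p> are not handled; this is only a sketch.
    """

    def __init__(self):
        super().__init__()
        self.author = ""
        self.paragraphs = []
        self._target = None  # "author", "p", or None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._target = "p"
            self.paragraphs.append("")
        elif tag == "span" and dict(attrs).get("class") == "author":
            self._target = "author"

    def handle_endtag(self, tag):
        if tag in ("p", "span"):
            self._target = None

    def handle_data(self, data):
        if self._target == "p":
            self.paragraphs[-1] += data
        elif self._target == "author":
            self.author += data


def parse_article(html):
    """Return (author, content) extracted from an HTML document string."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    content = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return parser.author.strip(), content


def fetch(url):
    """Download the document first; the parser must receive HTML, not a URL."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def store_articles(conn, urls, fetch_html=fetch):
    """Fetch, parse, and save each article into a hypothetical 'articles' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(url TEXT PRIMARY KEY, author TEXT, content TEXT)"
    )
    for url in urls:
        author, content = parse_article(fetch_html(url))
        conn.execute(
            "INSERT OR REPLACE INTO articles VALUES (?, ?, ?)",
            (url, author, content),
        )
    conn.commit()
```

With Beautiful Soup the same rule applies: pass the downloaded text (for example the result of `fetch(url)`) to `BeautifulSoup(...)`, not the URL string. A relative address such as "link/" also has to be turned into an absolute URL before it can be downloaded. The `fetch_html` parameter is there only so the loop can be tried with canned HTML before pointing it at the real site.
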
SELECT a.art_id AS art_id, a.ecs_id AS ecs_id, a.parent_id AS parent_id, a.primarytext AS primarytextutprisTodate, pp.price AS innpris, pp.fromdate AS innprisFromdate, pp.todate AS innprisTodate
LEFT JOIN ecs_purchaseprice pp ON a.art_id=pp.a