Web scraping script not working

Stack Overflow user
Asked on 2019-03-20 02:44:09
Answers 1 · Views 33 · Followers 0 · Votes 0

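I have been trying to build a web scraping script that monitors a website's HTML for any changes; when it finds that the HTML has changed, it is supposed to email and text me. The problem I've run into is that the script never sees any change: it just restarts after 60 seconds. There are no errors at all. I don't know if I'm missing something in the code that keeps it from searching, so that it just carries on and restarts.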

The code is below:

import time
print('>>> Time Imported')
time.sleep(1)
from bs4 import BeautifulSoup as soup
print('>>> BeautifulSoup Imported')
time.sleep(1)
import requests
print('>>> Requests Imported')
time.sleep(1)
import ssl
print('>>> SSL Imported')
time.sleep(1)
import smtplib
print('>>> smtplib Imported')
time.sleep(1)
from lxml import html
print('>>> LMXL and HTML Imported')
time.sleep(1)
from twilio.rest import Client
print('Twilio Imported')
time.sleep(1)
# End Imports

#start Script
while True:
    url = 'http://A****.com'
    print('>>> We have connected to ' +url)
    time.sleep(1)

    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    print('>>> Headers Initiating')
    time.sleep(1)

    page_response = requests.get(url, timeout=5)
    print('>>> We got a response from ' +url)
    time.sleep(1)

    page_content = soup(page_response.content, "html.parser") # Takes 1 Min 48 Seconds to run
    print('>>> Content Imported')
    time.sleep(2)

    print('>>> To prove i have connected, here is ' +url+ ' headers')
    time.sleep(2)
    print(' ')
    print(page_content.title)
    #tree = html.fromstring(page_response.content)
    #price = tree.xpath('//span[@class="bid-price-val current-bid"]/text()')
    #print(price)
    time.sleep(2)
    print(' ')
    time.sleep(1)
    print('>>> Initiating WebMonitor, If a change is found. That will be the next line')
    time.sleep(7)

    if str(soup).find('["330000"]') == -1:
        time.sleep(60)                       #The script restarts here 
                                             #never sees the change
                                             #Even tho there was one
        continue
    else:
        print('>>> Theres been a change in '+url)
        from twilio.rest import TwilioRestClient
        accountSID = 'A*******'
        authToken = 'a********'
        twilioCli = TwilioRestClient(accountSID, authToken)
        myTwilioNumber = '1******'
        myCellPhone = '7*****'
        message = client.messages.create(
            body = "There has been a change at "+url,
            from_= "+14955551234",
            to = "7862199047",
            )

        print(message.sid)

        msg = 'Subject: This is the script talking, Check '+url
        fromaddr = 'r****'
        toaddrs = ['m****','2','3']

        server = smtplib.SMTP('smtp.gmail.com', 587)
        server.starttls()
        server.login("r****", 'r****')

        print('From: ' + fromaddr)
        print('To: ' + str(toaddrs))
        print('Message: ' + msg)
        server.sendmail(fromaddr, toaddrs, msg)
        server.quit()
        break
    #def monitor():

1 Answer

Stack Overflow user

Answered on 2019-03-20 02:56:06

It looks like your problem is on this line:

 if str(soup).find('["330000"]') == -1:

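When you write str(soup), you are converting the Beautiful Soup class itself to a string, because soup is the name the BeautifulSoup class was imported under. That does not do what you want; it just produces a string like "<class 'bs4.BeautifulSoup'>". Calling find() on that string will never find a match, so the result is always -1 whether or not anything changed. The parsed page is stored in page_content, so that is the object to convert and search.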
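A minimal sketch of the difference, assuming Beautiful Soup 4 is installed. The sample_html string is a made-up stand-in for page_response.content, and the names class_text and page_text are illustrative only; the marker is the one from the question.

```python
from bs4 import BeautifulSoup as soup

# Hypothetical page body standing in for page_response.content.
sample_html = (
    '<html><head><title>Demo</title></head>'
    '<body><span>["330000"]</span></body></html>'
)
marker = '["330000"]'

# Parse the page, as the question's script does.
page_content = soup(sample_html, "html.parser")

# str() of the imported class gives the class repr, not any page markup.
class_text = str(soup)

# str() of the parsed document gives the page markup as text.
page_text = str(page_content)

print(class_text)                     # repr of the BeautifulSoup class
print(class_text.find(marker))        # the marker is never in the class repr
print(page_text.find(marker) != -1)   # the marker is in the parsed page
```

In the question's loop, the same check would then be made against str(page_content) rather than str(soup). Whether finding '["330000"]' should count as "changed" or "unchanged" depends on what the page normally contains, which the question does not say.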

Votes 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/55248027
