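Below is my web-scraping code. It clicks a form and is then redirected to a page. From that page I need to extract the img src URL and export it as text to a CSV. I use the code below to extract content from td tags. When I run the same code against these cells it does not work, because the td tag has no text content, only an img tag. Any help would be greatly appreciated. I am still new to web scraping. Thanks in advance.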
            browser.find_element_by_css_selector(".textinput[value='APPLY']").click()
            # select_finder = "//tr[contains(text(), 'NB')]//a"
            select_finder = "//td[text()='NB']/../td[2]/a"
            browser.find_element_by_css_selector(".content a").click()
            assert "Application Details" in browser.title
            file_data = []
            try:
                assert "Application Details" in browser.title
                enlargement = browser.find_element_by_xpath("/html/body/center/table[15]/tbody/tr[3]/td[2]/b").text
                enlargement_answer1 = browser.find_element_by_xpath("/html/body/center/table[15]/tbody/tr[4]/td[2]").text
                enlargement_answer2 = browser.find_element_by_xpath("/html/body/center/table[15]/tbody/tr[4]/td[3]").text
                enlargement_text = enlargement + enlargement_answer1 + enlargement_answer2
                considerations = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[4]/td[2]/b").text
                considerations_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[4]/td[3]").text
                considerations_text = considerations + considerations_answer
                alteration = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[4]/td[6]/b").text
                alteration_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[4]/td[7]").text
                alteration_text = alteration + alteration_answer
                units = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[5]/td[3]/b").text
                units_answer = browser.find_element_by_xpath("/html/body/center/table[15]/tbody/tr[5]/td[4]").text
                units_text = units + units_answer
                occupancy = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[6]/td[3]/b").text
                occupancy_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[6]/td[4]").text
                occupancy_text = occupancy + occupancy_answer
                coo = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[7]/td[3]/b").text
                coo_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[7]/td[4]").text
                coo_text = coo + coo_answer
                floors = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[8]/td[3]/b").text
                floors_answer = browser.find_element_by_xpath("/html/body/center/table[16]/tbody/tr[8]/td[4]").text
                floors_text = floors + floors_answer
            except (NoSuchElementException, AssertionError):
                # str has no .append(), and these names may not exist yet if the
                # lookup failed early, so assign fallback values instead.
                floors_text = "No Zoning Characteristics Present"
                coo_text = occupancy_text = units_text = "n/a"
                alteration_text = considerations_text = enlargement_text = "n/a"
            with open('DOB.csv', 'a') as f:
                wr = csv.writer(f, dialect='excel')
                wr.writerow((block_number, lot_number, houseno, street, condo_text,
                             vacant_text, city_owned_text, file_data, floors_text, coo_text, occupancy_text, units_text, alteration_text,
                              considerations_text, enlargement_text ))
            browser.close()

Posted on 2018-06-14 22:53:12
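Since you say you are new to web scraping, I encourage you to read http://selenium-python.readthedocs.io/locating-elements.html. You are using XPath exclusively, and in the way the documentation advises against.
The documentation says: "You can use XPath to either locate the element in absolute terms (not advised), or relative to an element that does have an id or name attribute." Try using one of the other locators to get the image.
For example: driver.find_element_by_css_selector("img[src='images/box_check.gif']")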
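Once the img element is located, its URL can be read as text with get_attribute("src") and written to the CSV like any other value. The following is a minimal sketch rather than the answerer's code: the helper names image_sources, cell_value and write_row are hypothetical, and it assumes the cell is a Selenium WebElement whose find_elements accepts the locator strategy as the string "css selector" (the value of By.CSS_SELECTOR).

```python
import csv
import io


def image_sources(cell):
    """Return the src of every <img> inside a WebElement such as a <td>."""
    # find_elements returns an empty list when nothing matches, so no
    # NoSuchElementException handling is needed here.
    images = cell.find_elements("css selector", "img")
    return [img.get_attribute("src") for img in images]


def cell_value(cell, default="n/a"):
    """Use the cell's text if it has any, otherwise its image URLs."""
    text = cell.text.strip()
    if text:
        return text
    sources = image_sources(cell)
    # Several images in one cell are joined so the CSV field stays a string.
    return ";".join(sources) if sources else default


def write_row(stream, values):
    """Write one CSV row to an already opened text stream."""
    csv.writer(stream, dialect="excel").writerow(values)
```

In the question's script, cell would be the td located with find_element_by_xpath (without the trailing .text), and cell_value(cell) could then be placed in the tuple passed to wr.writerow.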
https://stackoverflow.com/questions/50845744