I am working on a project for which I need a dataset of images available on the internet along with their URLs. For this I have to download a few thousand images, so I plan to download them from image-hosting sites such as https://www.pexels.com/ and https://pixabay.com/, and from a few similar sites such as Flickr.
"""
dumpimages.py
Downloads all the images on the supplied URL, and saves them to the
specified output file ("/test/" by default)
Usage:
python dumpimages.py http://example.com/ [output]
"""
from bs4 import BeautifulSoup as bs
from urllib.request import (
urlopen, urlparse, urlunparse, urlretrieve)
import os
import sys
def main(url, out_folder="/test/"):
"""Downloads all the images at 'url' to /test/"""
soup = bs(urlopen(url))
parsed = list(urlparse(url))
for image in soup.findAll("img"):
print("Image: %(src)s" % image)
filename = image["src"]
# filename = filename.replace("/","|")
filename = image["src"].split("/")[-1]
parsed[2] = image["src"]
outpath = os.path.join(out_folder, filename)
if image["src"].lower().startswith("http"):
urlretrieve(image["src"], outpath)
else:
urlretrieve(urlunparse(parsed), outpath)
def _usage():
print("usage: python imgcrawl.py http://example.com [outpath]")
if __name__ == "__main__":
url = sys.argv[-1]
out_folder = "/test/"
if not url.lower().startswith("http"):
out_folder = sys.argv[-1]
url = sys.argv[-2]
if not url.lower().startswith("http"):
_usage()
sys.exit(-1)
main(url, out_folder)
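For quick testing, main() can also be called directly from Python rather than through the shell; a minimal sketch, with http://example.com/ standing in for any page that contains img tags:

from dumpimages import main
main("http://example.com/", "./test/")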
For this I wrote the simple Python script shown above: given a web page as input, it downloads all the images available on that page. However, I would like to extend it so that, given a site's home page, it downloads all the images available anywhere on that site. If there is any alternative way of obtaining images together with their URL data, I would greatly appreciate the help.
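Extending this to a whole site essentially means putting a crawler on top of the downloader: keep a queue of same-domain pages, download the images on each page, and enqueue every internal link found there. Below is a minimal breadth-first sketch of that idea; the helper name crawl_site, the max_pages cap, and the ./test/ default are illustrative choices, not part of the original script.

import os
from collections import deque
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, urlretrieve

from bs4 import BeautifulSoup


def crawl_site(start_url, out_folder="./test/", max_pages=100):
    """Breadth-first crawl of a single domain, downloading every <img> found."""
    os.makedirs(out_folder, exist_ok=True)
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}  # pages already discovered, so each is visited at most once
    while queue and len(seen) <= max_pages:
        page = queue.popleft()
        try:
            soup = BeautifulSoup(urlopen(page), "html.parser")
        except Exception as err:  # unreachable pages are skipped, not fatal
            print("skipped %s: %s" % (page, err))
            continue
        # download every image on this page, resolving relative src values
        for img in soup.find_all("img", src=True):
            img_url = urljoin(page, img["src"])
            filename = os.path.basename(urlparse(img_url).path) or "unnamed"
            try:
                urlretrieve(img_url, os.path.join(out_folder, filename))
            except Exception as err:
                print("failed %s: %s" % (img_url, err))
        # enqueue unseen links that stay on the same domain
        for link in soup.find_all("a", href=True):
            nxt = urljoin(page, link["href"]).split("#")[0]
            if urlparse(nxt).netloc == domain and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)


crawl_site("http://example.com/")

As an alternative to scraping, Pexels, Pixabay, and Flickr all expose official APIs that return image URLs directly, which is usually a more reliable way to build a large image/URL dataset than parsing HTML; whichever route is taken, each site's robots.txt and terms of service apply.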
Posted on 2018-12-05 10:14:37
Very happy to say that I did exactly the same thing in Python. Please have a look at my repository on GitHub: https://github.com/digitaldreams/image-crawler-python
https://stackoverflow.com/questions/49974970