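I'm trying to scrape data from a website using Puppeteer. Every time I request data, I get the first page's data, even when I pass the URL of any other page. In Chrome, the URL shows the correct page data, but when I request it from my API or from Postman, I always get the first page's data. Here is my script: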
async function main() {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.setViewport({ width: 1200, height: 720 })
await page.goto('https://member.daraz.pk/user/login', { waitUntil: 'networkidle0' }); // wait until page load
await page.type('input[type="text"]', 'username', { delay: 10 });
await page.type('input[type="password"]', 'pass', { delay: 10 });
// click and wait for navigation
await page.click('.next-btn-large');
await page.waitFor(8000);
const page1 = await browser.newPage();
await page1.setViewport({ width: 1200, height: 720 })
await page.waitFor(1000);
for (let i = 1; i < 10; i++) {
await page.goto(`https://www.daraz.pk/air-conditioners/gree/?page=${i}`, { waitUntil: 'networkidle0' });
// always return first page data
}
}
main();
Posted on 2020-02-05 23:17:10
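The script I suggested in the comments was reading the image src values, and the page only loads those images once they are visible. So if the correct tab isn't the visible one, the images may never load. This is on-demand image loading built into the page. It's better to look at other parts of the page that aren't loaded this way, and I've modified my script to do that.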
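If you do still need the lazily loaded image data, one thing to try is making the scraping tab the active tab before navigating. This is only a minimal sketch: `gotoVisible` is a made-up helper name, `page` is assumed to be a Puppeteer `Page`, and I haven't verified that this alone is enough for this particular site.

```javascript
// Hypothetical helper (not part of Puppeteer): activate the tab, then navigate.
// `page` is assumed to be a Puppeteer Page; bringToFront() and goto() are Page methods.
async function gotoVisible(page, url) {
  // Make this tab the active one so visibility-dependent content has a chance to load.
  await page.bringToFront();
  // Same navigation options as the scripts in this thread.
  return page.goto(url, { waitUntil: 'networkidle0' });
}
```

Inside the loop it would be called as `await gotoVisible(page, url)` in place of the plain `page.goto(...)` call.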
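Here's a script that works for me. I don't know which data you need from the page, but this gets the sku-simple value and the title of every product on the page. For brevity, I only print the first 10 products of each page to the console, and I dialed the loop back to only 3 pages. Obviously, you can adjust that as you like. I also removed the username/pwd from my script, since I see it's no longer public in your question. You can fill in your own.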
const puppeteer = require('puppeteer');
async function main() {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.setViewport({ width: 1200, height: 720 })
await page.goto('https://member.daraz.pk/user/login', { waitUntil: 'networkidle0' }); // wait until page load
await page.type('input[type="text"]', 'xxx', { delay: 10 });
await page.type('input[type="password"]', 'yyy', { delay: 10 });
// click and wait for navigation
await page.click('.next-btn-large');
await page.waitFor(8000);
const page1 = await browser.newPage();
await page1.setViewport({ width: 1200, height: 720 })
await page.waitFor(1000);
// page.on('console', msg => console.log('PAGE LOG:', msg.text()));
for (let i = 1; i <= 3; i++) {
await page.goto(`https://www.daraz.pk/air-conditioners/gree/?page=${i}`, { waitUntil: 'networkidle0' });
let srcs = await page.$$eval(".c2prKC", elements => {
return elements.map(el => {
let skuSimple = el.getAttribute("data-sku-simple");
let link = el.querySelector(".c16H9d a");
let title = "<unknown>";
if (link) {
title = link.getAttribute("title");
}
return {skuSimple, title};
});
});
console.log(`Data for page ${i}:`);
console.log(srcs.slice(0,10));
}
//await browser.close();
}
main();
I see output like this in the console, so it is clearly fetching the pages and retrieving data from the DOM of each page:
Data for page 1:
[
{
skuSimple: 'GR678HL0KV5HWNAFAMZ-4744951',
title: 'Gree Inverter AC - GS-18CITH12G - 1.5 ton - Inverter Air Conditioner - Cozy Series - Heat N Cool - Grey'
},
{
skuSimple: 'GR678HL09YUCKNAFAMZ-3940302',
title: 'Gree GS-12FITH1W - Fairy Inverter Air Conditioner Series - White'
},
{
skuSimple: 'GR678HL0RTUHWNAFAMZ-3940305',
title: 'Gree GS-18FITH1W - Fairy Inverter Air Conditioner Series - White'
},
{
skuSimple: 'GR678HL1E0WZSNAFAMZ-1741958',
title: 'Gree Split Air Conditioner - GS-12LM4 - 1 Ton - White'
},
{
skuSimple: '2779851_PK-1252862621',
title: 'Gree 18CITHI 12G- DC Inverter AC - 1.5 Ton'
},
{
skuSimple: 'GR678HLEOKNJNAFAMZ-668566',
title: 'Gree Gree GS-12LM -1 Ton Air Conditioner - White'
},
{
skuSimple: '114820460_PK-1266640670',
title: 'Gree Windows AC 0.75 Ton with Remote Control 60% Electricity Saving'
},
{
skuSimple: '2864384_PK-1246026961',
title: 'Gree Inverter AC - GS-12CITH12G - 1.0ton - Inverter Air Conditioner - Cozy Series - Heat N Cool - Grey'
},
{
skuSimple: '105610333_PK-1253012621',
title: 'Gree 1.0 Ton Dc Inverter AC Heat & Cool R-410A Air Conditioner - 12cith12G - Grey'
},
{
skuSimple: '105616318_PK-1253002672',
title: 'Gree 1.5 Ton Dc Inverter AC Heat & Cool R-410A Air Conditioner - 18cith12G - Grey'
}
]
Data for page 2:
[
{
skuSimple: '109636918_PK-1260070281',
title: 'New Gree DC Inverter Ac 1(ton) 12CIT'
},
{
skuSimple: '114536248_PK-1266322653',
title: 'Gree 1.0 Ton Heat & Cool DC Inverter Air conditioner 12CITH'
},
{
skuSimple: '109830097_PK-1260278793',
title: 'AC Dawlance Inspire Plus Inverter 30 1.5 Ton Split Saving 26000 Yearly'
},
{
skuSimple: '121648880_PK-1277580612',
title: 'Gs-24Lm4L - 2 Ton Ac - White - Brand Warranty'
},
{
skuSimple: '106364064_PK-1254400160',
title: 'Gree Floor Standing GF-48FW - Floor Standing Low Voltage Startup Series - White'
},
{
skuSimple: '109324039_PK-1259442545',
title: 'Gree G10 Inverter 1.5 Ton (18000 BTU) GS-18CITH2/2G Split Air Conditioner'
},
{
skuSimple: '122056481_PK-1278142392',
title: 'AC Gree 12FITH1C 1 Ton DC Inverter Split AC 50% to 70% Energy Saving'
},
{
skuSimple: '115570453_PK-1267506144',
title: 'AC Gree GS-12CITH13M Inverter 1 Ton (Wifi) Split 60% to 70% Energy Saving'
},
{
skuSimple: 'GR678HL0ZWE2CNAFAMZ-4776611',
title: 'Gree 1.5 Ton Dc Inverter Heat & Cool R-410A Air Conditioner - 18cith11B - Black'
},
{
skuSimple: '110096660_PK-1260802813',
title: 'GREE 1.0 TON SPLIT COOL ONLY AIR CONDITIONER 12LM4'
}
]
Data for page 3:
[
{
skuSimple: 'GR678HL017DY0NAFAMZ-4102700',
title: 'Gree 1.5 Ton Dc Inverter Heat & Cool R-410A Air Conditioner - 18cith11S - Silver'
},
{
skuSimple: '115554341_PK-1267490372',
title: 'Gree GS-18CITH13M Inverter 1.5 Ton (Wifi) Split Up to 60% Energy Saving'
},
{
skuSimple: '109428468_PK-1259596998',
title: 'Gree Inverter Air conditioner 2 ton'
},
{
skuSimple: '124818788_PK-1282694870',
title: 'Gree Inverter Air Conditioner - GS-24CITH11W - Cozy Inverter Series - 02ton - White'
},
{
skuSimple: '3407444_PK-1247135008',
title: 'Gree 2 Ton Dc Inverter Heat & Cool R-410A Air Conditioner - 24cith11S - Silver'
},
{
skuSimple: '109826799_PK-1260322442',
title: 'Gree GS-18CITH13M Inverter 1.5 Ton (Wifi) Split Up to 60% Energy Saving'
},
{
skuSimple: '130883483_PK-1290780443',
title: 'Gree - Inverter Split Air Conditioner - 1.5 Ton'
},
{
skuSimple: '107714050_PK-1256398549',
title: 'Gree Inverter Air conditioner 1.5 ton'
},
{
skuSimple: 'GR678HL0Q02DENAFAMZ-5098883',
title: 'GS-18LM4 - Gree Air Conditioner - 1.5 Ton - White'
},
{
skuSimple: 'GR678HL1IIQ8YNAFAMZ-5098768',
title: 'Gree Gree - GS - 12CITH12G - 1.0 ton - Inverter Air Conditioner - Grey'
}
]
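To confirm that each request really returns a different page, rather than comparing the console output by eye, the per-page results can be compared programmatically. The following is a rough sketch: `findIdenticalPages` is a hypothetical helper, and it assumes you collect each page's `{skuSimple, title}` array into one outer array. The first two sample entries use SKUs from the output above; the third is an invented repeat to show what a "first page again" result would look like.

```javascript
// Hypothetical helper: report pairs of page numbers whose SKU lists are identical.
// `pages[n]` is assumed to be the array of {skuSimple, title} objects scraped from page n + 1.
function findIdenticalPages(pages) {
  const seen = new Map(); // fingerprint -> first page number that produced it
  const duplicates = [];
  pages.forEach((items, index) => {
    // Join the SKUs in order to get a cheap fingerprint of the page content.
    const fingerprint = items.map(item => item.skuSimple).join('|');
    if (seen.has(fingerprint)) {
      duplicates.push([seen.get(fingerprint), index + 1]);
    } else {
      seen.set(fingerprint, index + 1);
    }
  });
  return duplicates;
}

const sample = [
  [{ skuSimple: 'GR678HL0KV5HWNAFAMZ-4744951' }, { skuSimple: 'GR678HL09YUCKNAFAMZ-3940302' }],
  [{ skuSimple: '109636918_PK-1260070281' }, { skuSimple: '114536248_PK-1266322653' }],
  // Invented repeat of the first entry, for illustration only.
  [{ skuSimple: 'GR678HL0KV5HWNAFAMZ-4744951' }, { skuSimple: 'GR678HL09YUCKNAFAMZ-3940302' }],
];
console.log(findIdenticalPages(sample)); // [ [ 1, 3 ] ]
```

An empty result means no two pages produced the same SKU list, which is what the output above shows.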
Posted on 2020-02-04 05:15:52
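When I write scrapers, I like to use a do block to handle the increment/decrement. It ensures the developer is deliberately monitoring and controlling the counter variable. For one thing, the for loop doesn't have a localized i.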
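let PAGES = 1;
do {
// build the URL from the current page counter (URL taken from the question)
await page.goto(`https://www.daraz.pk/air-conditioners/gree/?page=${PAGES}`, { waitUntil: 'networkidle0' });
// do whatever you want with scraped page.
PAGES++;
} while (PAGES < 10);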
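If you go with an explicit counter like this, it may also help to build the URL from the counter in one place, so it is obvious which page each request targets. A small sketch, assuming the site takes the page number in a `page` query parameter as in the question; `pageUrl` is a hypothetical helper, and only the standard `URL` class is used:

```javascript
// Hypothetical helper: set (or overwrite) the `page` query parameter on a listing URL.
function pageUrl(baseUrl, pageNumber) {
  const url = new URL(baseUrl);
  url.searchParams.set('page', String(pageNumber));
  return url.toString();
}

// Same do...while shape as above, but only collecting the URLs that would be visited.
let current = 1;
const urls = [];
do {
  urls.push(pageUrl('https://www.daraz.pk/air-conditioners/gree/', current));
  current++;
} while (current < 4);

console.log(urls);
```

Logging the URL right before each `page.goto` call is a cheap way to rule out the loop itself as the reason every request looks like page 1.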
https://stackoverflow.com/questions/60056913