javascript - How to make many Node.js requests (using the request module)
Main purpose: I'm trying to scrape data off of around 10,000 different pages using Node.js.
Problem: It scrapes through the first 500~1000 pages fast, then slows to a crawl beyond that (the point where it slows down varies), and eventually seems to get stuck forever.
I'm using the request module in Node.js to make the requests, and then I use cheerio to start scraping.
This code replicates the problem:
    var request = require('request');

    var requestsCalledCounter = 0;      // requests started so far
    var requestsCompletedCounter = 0;   // requests whose callback has fired
    var MAX_REQUESTS = 500;

    var start = function () {
        // Fires all MAX_REQUESTS requests at once, with no limit on how many are in flight.
        while (requestsCalledCounter < MAX_REQUESTS) {
            request("http://www.google.com", function (error, response, html) {
                requestsCompletedCounter++;
            });
            requestsCalledCounter++;
        }
    };

    start();
Output:
Test 1:
447/500
89.4%
timed out: no requests completed after 5 seconds
447 completed
Test 2:
427/500
85.39999999999999%
timed out: no requests completed after 5 seconds
427
Extra details that might help:
I have an array of URLs that I'm going to scrape, and I'm looping through them, making a request for every URL in the array. It has 10,000 URLs.
I agree with @cviejo in the comments: you should use an existing project. But to increase understanding, here is an implementation that will only have 10 requests outstanding at a time.
    var request = require('request');

    var requestsCalledCounter = 0;
    var requestsCompletedCounter = 0;
    var pending = 0;            // requests started but not yet finished
    var MAX_PENDING = 10;       // most requests allowed in flight at once
    var MAX_REQUESTS = 500;

    var doReq = function () {
        request("http://www.google.com", function (error, response, html) {
            requestsCompletedCounter++;
            pending--;
        });
        pending++;
        requestsCalledCounter++;
    };

    var start = function () {
        // Top up to MAX_PENDING outstanding requests.
        while (pending < MAX_PENDING && requestsCalledCounter < MAX_REQUESTS) {
            doReq();
        }
        // Check again shortly until every request has been started.
        if (requestsCalledCounter < MAX_REQUESTS) {
            setTimeout(start, 1);
        }
    };

    start();
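For comparison, here is a minimal sketch of the same "at most N outstanding" idea applied to an array of URLs, where each completed request starts the next one instead of polling on a 1 ms timer. The names `forEachLimit` and `fakeFetch` and the example.com URLs are hypothetical, made up for this illustration. `fakeFetch` is a timer-based stand-in so the sketch runs without the network; in real use the worker would presumably call `request(url, callback)` instead.

```javascript
// Hypothetical helper: calls worker(item, done) for every item in `items`,
// keeping at most `limit` workers in flight at any moment.
function forEachLimit(items, limit, worker, onFinished) {
  var results = new Array(items.length); // one slot per item, in input order
  var nextIndex = 0;                     // index of the next item to start
  var inFlight = 0;                      // workers started but not yet done
  var completed = 0;                     // workers that have called done()

  function startOne(index) {
    inFlight++;
    worker(items[index], function (error, value) {
      inFlight--;
      completed++;
      // Record failures instead of aborting the whole run.
      results[index] = error ? { error: error } : { value: value };
      if (completed === items.length) {
        onFinished(results);
      } else {
        fill(); // a slot just freed up, so start the next item if any remain
      }
    });
  }

  function fill() {
    while (inFlight < limit && nextIndex < items.length) {
      startOne(nextIndex++);
    }
  }

  if (items.length === 0) {
    onFinished(results);
    return;
  }
  fill();
}

// Stand-in for a real HTTP call (assumption: same callback shape, error first).
function fakeFetch(url, done) {
  setTimeout(function () {
    done(null, 'body of ' + url);
  }, 5);
}

// Hypothetical URL list for the demo.
var urls = [];
for (var i = 0; i < 50; i++) {
  urls.push('http://example.com/page/' + i);
}

forEachLimit(urls, 10, fakeFetch, function (results) {
  var failed = results.filter(function (r) { return r.error; }).length;
  console.log((results.length - failed) + '/' + results.length + ' completed');
});
```

Results come back in the same order as the input array, and a failed item is stored as `{ error: ... }` so one bad URL does not stop the rest of the run.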