javascript - How to make many Node.js requests (using request module)


Main purpose: I'm trying to scrape data off of around 10,000 different pages using Node.js.

Problem: it scrapes through the first 500~1000 pages fast, then turns into a turtle beyond that (the point where it slows down varies), and eventually seems stuck forever.

I'm using the request module in Node.js to make the requests, then I use cheerio to start scraping.

This code replicates the problem:

    var request = require('request');

    var requestscalledcounter = 0;
    var requestscompletedcounter = 0;
    var max_requests = 500;

    var start = function () {
        // Fires every request immediately; nothing limits how many are in flight.
        while (requestscalledcounter < max_requests) {
            request("http://www.google.com", function (error, response, html) {
                requestscompletedcounter++;
            });
            requestscalledcounter++;
        }
    };

    start();

Output:

Test 1:

447/500
89.4%

timed out: no requests completed after 5 seconds
447 completed

Test 2:

427/500
85.39999999999999%

timed out: no requests completed after 5 seconds
427

Extra details that might help:

I have an array of URLs that I'm going to scrape, and I'm looping through them, making a request for every URL in the array. It has about 10,000 URLs.

I agree with @cviejo in the comments: you should use an existing project. But to increase understanding, here is an implementation that will only have 10 requests outstanding at a time.

    var request = require('request');

    var requestscalledcounter = 0;
    var requestscompletedcounter = 0;
    var pending = 0;
    var max_pending = 10;
    var max_requests = 500;

    var doreq = function () {
        request("http://www.google.com", function (error, response, html) {
            requestscompletedcounter++;
            pending--; // a slot is free again
        });
        pending++;
        requestscalledcounter++;
    };

    var start = function () {
        // Top up to max_pending outstanding requests.
        while (pending < max_pending && requestscalledcounter < max_requests) {
            doreq();
        }
        // Check again shortly until every request has been started.
        if (requestscalledcounter < max_requests) {
            setTimeout(start, 1);
        }
    };

    start();
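
As a variation on the above, here is a rough sketch that starts the next request from the completion callback instead of polling with setTimeout, and that walks an array of items (such as your URLs) rather than a fixed count. The name `runWithLimit` and its parameters are made up for illustration and are not part of the request module; the worker is assumed to follow the usual Node convention of calling back with `(error, value)` exactly once.

```javascript
// Hypothetical helper: run worker(item, cb) over every item,
// keeping at most `limit` calls in flight at any time.
function runWithLimit(items, limit, worker, done) {
    var results = new Array(items.length);
    var next = 0;      // index of the next item to start
    var pending = 0;   // calls started but not yet finished
    var finished = 0;  // calls that have called back

    function start(index) {
        pending++;
        worker(items[index], function (error, value) {
            pending--;
            finished++;
            // Keep failures alongside successes so one bad page does not stop the run.
            results[index] = error ? { error: error } : { value: value };
            if (finished === items.length) {
                done(results);
            } else {
                launch();
            }
        });
    }

    function launch() {
        // Fill any free slots with the next unstarted items.
        while (pending < limit && next < items.length) {
            start(next++);
        }
    }

    if (items.length === 0) {
        done(results);
        return;
    }
    launch();
}
```

With the request module, the worker could be something like `function (url, cb) { request(url, function (error, response, html) { cb(error, html); }); }`, called as `runWithLimit(urls, 10, worker, onDone)`, where `urls` and `onDone` are your own array and completion handler.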
