I seem to have run into a snag. I'm using Node.js and the http module to scrape data from Amazon's free Kindle ebooks site.
Code
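var http = require('http');

// The '#2' at the end is meant to select page 2 of the listing.
var path = '/Best-Sellers-Kindle-Store-Teen-Young-Adult-Horror-eBooks/zgbs/digital-text/6064559011?tf=1#2';
var options = {
    host: 'www.amazon.com',
    port: 80,
    path: path
};
// onHttpGetSearchPage and page_number come from the rest of the script (not shown).
http.get(options, onHttpGetSearchPage.bind(null, page_number, path));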
The path is supposed to end in #1, #2, #3, #4, or #5 to pick the page, but that part seems to be ignored. I looked through the documentation and didn't see anything about it.
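One thing that may be relevant here: the part of a URL after # is a fragment, and browsers keep it on the client side instead of sending it to the server. So a server would not be expected to choose a page based on it. How the site actually selects the page is not something the docs below cover, so treat this as a rough sketch only. It uses Node's url module to show how the fragment separates from the path and query string; pageUrl, fragment, and requestOptions are just illustrative names.

```javascript
const { URL } = require('url');

// The address from the post, including the '#2' fragment.
const pageUrl = new URL(
  'http://www.amazon.com/Best-Sellers-Kindle-Store-Teen-Young-Adult-Horror-eBooks/zgbs/digital-text/6064559011?tf=1#2'
);

// The fragment is parsed separately from the path and the query string.
const fragment = pageUrl.hash;

// Options for http.get() built only from the path and query string.
const requestOptions = {
  hostname: pageUrl.hostname,
  port: 80,
  path: pageUrl.pathname + pageUrl.search
};

console.log('path:', requestOptions.path);
console.log('fragment:', fragment);
```

If the page number has to reach the server, it would need to travel some other way (for example in the query string), but which parameter the site expects is unknown here.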
Quote
Options:
host: A domain name or IP address of the server to issue the request to. Defaults to 'localhost'.
hostname: To support url.parse() hostname is preferred over host
port: Port of remote server. Defaults to 80.
localAddress: Local interface to bind for network connections.
socketPath: Unix Domain Socket (use one of host:port or socketPath)
method: A string specifying the HTTP request method. Defaults to 'GET'.
path: Request path. Defaults to '/'. Should include query string if any. E.G. '/index.html?page=12'. An exception is thrown when the request path contains illegal characters. Currently, only spaces are rejected but that may change in the future.
headers: An object containing request headers.
auth: Basic authentication i.e. 'user:password' to compute an Authorization header.
agent: Controls Agent behavior. When an Agent is used request will default to Connection: keep-alive. Possible values:
    undefined (default): use global Agent for this host and port.
    Agent object: explicitly use the passed in Agent.
    false: opts out of connection pooling with an Agent, defaults request to Connection: close.
keepAlive: {Boolean} Keep sockets around in a pool to be used by other requests in the future. Default = false
keepAliveMsecs: {Integer} When using HTTP KeepAlive, how often to send TCP KeepAlive packets over sockets being kept alive. Default = 1000. Only relevant if keepAlive is set to true.
The optional callback parameter will be added as a one time listener for the 'response' event.
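To make the last line of the quote concrete, here is a minimal sketch of how the callback could be used to read a whole page. It assumes the callback receives the response object once, on the 'response' event, as the docs say. collectBody and fetchPage are hypothetical helper names, not part of the http module, and nothing here sends a request until fetchPage is called.

```javascript
const http = require('http');

// Hypothetical helper: gather all 'data' chunks of a response into one string.
function collectBody(res, done) {
  const chunks = [];
  res.on('data', function (chunk) {
    chunks.push(Buffer.from(chunk));
  });
  res.on('end', function () {
    done(null, Buffer.concat(chunks).toString('utf8'));
  });
  res.on('error', done);
}

// Hypothetical wrapper: the function passed to http.get() is registered as a
// one-time listener for 'response' and receives the response object.
function fetchPage(options, done) {
  http.get(options, function (res) {
    collectBody(res, done);
  }).on('error', done);
}
```

It could then be called as fetchPage(options, function (err, html) { ... }) with the same options object as in the code above.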
This post was edited by carteblanche on Aug 8 2015 06:27pm