GET request increases the number of errors

zenrast · 14.04.2014

Hello,

I have a script that does this:
1) I have a list A of 200 URLs
2) I have a list B of 50 domains

Then for each URL in list A (I open each URL), I check whether any of the domains from list B appears on the page.
In other words, I check every URL against every domain.
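
A minimal sketch of the check described above, written in Python rather than as a ZennoPoster project. It assumes the "check" is a plain case-insensitive substring search of the page source; `domains_found` and the sample data are hypothetical names, not taken from the original script.

```python
def domains_found(page_source, domains):
    """Return the domains from list B that occur anywhere in the page source.

    Assumption: a case-insensitive substring match is good enough.
    """
    haystack = page_source.lower()
    return [d for d in domains if d.lower() in haystack]


# Hypothetical sample data standing in for one page from list A and for list B.
page = '<a href="http://Example.com/about">About</a>'
list_b = ["example.com", "other.org"]
matches = domains_found(page, list_b)
```

A substring match can also hit text that is not a link, so a stricter version would parse the `href` values first.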

Now to the problem:
If I check this inside zenno in "browser" mode, I get 100% successfully finished threads.
But if I use a GET request instead of the "browser", I get about 20% failures (unfinished threads).

I changed to GET because it should work faster. And indeed I get a 30% increase in speed.
The other thing I noticed is that the CPU is at MAX when I use GET.

The rest of the script is the same. I just changed "go to browser" to "get", and in both cases I save the result into a variable to process.

Any idea why I get failures with GET?

rostonix · 14.04.2014

What error do you get?

zenrast · 14.04.2014

There are no errors displayed while I run the script (not in debug mode).
It's just that there is no result in the results table/file for those "threads" that fail.

zenrast · 03.05.2014

Hi,

I think I found the error. It happens when I request a file that is not HTML. For example, I request a URL that points to a PDF file (like: cnn.com/documents/cnn.pdf).
In this case the GET method maxes out the CPU and the script starts to produce failures. I suppose the "browser" mode somehow "handled" this, but for now I need to exclude PDF-like URLs from my list of URLs to check.

rostonix · 03.05.2014

Retrieve only the headers and check whether it's a PDF or not.

rostonix · 03.05.2014

http://i.gyazo.com/a1b3f1a2fbd62b0893f04b3a7e2849e0_1.png
Among other info you'll get this header:
Content-Type: application/pdf
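
Outside ZennoPoster, the same headers-only check might look like the Python sketch below. It assumes the server answers HEAD requests and sends a Content-Type header, which not every server does; `media_type`, `is_pdf` and `head_content_type` are hypothetical helper names.

```python
import urllib.request


def media_type(content_type_header):
    """Return the bare media type, e.g. 'application/pdf', from a Content-Type value."""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_pdf(content_type_header):
    """True if the Content-Type value names a PDF."""
    return media_type(content_type_header) == "application/pdf"


def head_content_type(url, timeout=10):
    """Send a HEAD request (headers only, no body) and return Content-Type, or ''."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")
```

Usage would be along the lines of `is_pdf(head_content_type(url))` for each URL in the list.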

zenrast · 03.05.2014

Nice

Can I also check if the file is HTML? ...because maybe there are more of those non-HTML types, like .doc etc.
BTW: Is retrieving headers the same amount of "job" (like 5 sec per request) as retrieving all of the data?

rostonix · 03.05.2014

> Can I also check if the file is HTML?

Sure.

> Is retrieving headers the same amount of "job" (like 5 sec per request) as retrieving all of the data?

What do you think? :)

zenrast · 03.05.2014

For example, GET for 200 URLs is 200 * 5 sec = 1000 sec.
GET HEADERS + GET ALL is (200 + 200) requests * 5 sec = 2000 sec.

Is that correct?

rostonix · 03.05.2014

You GET the headers and check whether it's HTML; only if it is do you GET the body.
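
The two-step flow can be sketched in Python as follows. This is an illustration, not ZennoPoster code: `fetch_html` and `looks_like_html` are hypothetical names, and treating only `text/html` and `application/xhtml+xml` as HTML is an assumption.

```python
import urllib.request

# Assumption: these two media types are what counts as "HTML" here.
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def looks_like_html(content_type_header):
    """True if the Content-Type value names an HTML media type."""
    return content_type_header.split(";", 1)[0].strip().lower() in HTML_TYPES


def fetch_html(url, timeout=10):
    """Return the page source as text, or None if the URL is not HTML."""
    # Step 1: headers only.
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head, timeout=timeout) as response:
        if not looks_like_html(response.headers.get("Content-Type", "")):
            return None
    # Step 2: full GET, reached only for HTML.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Non-HTML URLs cost one small request and are skipped; HTML URLs cost two requests.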

zenrast · 03.05.2014

Yes, I get it. Unfortunately, in 99% of cases it will be HTML, so I will need to make two GET requests most of the time (doubling the GET time).
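
Where the HTTP client exposes the response headers before the body is read, a single GET may be enough. The Python sketch below relies on `urllib.request.urlopen` returning once the headers have arrived and reading the body only when `read()` is called; whether ZennoPoster's GET action can behave this way is not stated in this thread. `get_if_html` is a hypothetical name.

```python
import urllib.request


def get_if_html(url, timeout=10):
    """Issue one GET; read the body only when the headers say it is HTML."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        content_type = response.headers.get("Content-Type", "")
        media = content_type.split(";", 1)[0].strip().lower()
        if media not in ("text/html", "application/xhtml+xml"):
            # Leave the with-block without calling read(): the response is closed
            # and the rest of the body is not downloaded by this code.
            return None
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

With this approach an HTML page costs one request instead of two, and a PDF is dropped after its headers.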
