Scraping via GET request & captcha issue.

shabbysquire

Client
Registered
25.11.2012
Posts
544
Thanks
26
Points
28
Hi.

I'm using GET requests to scrape a site, but I sometimes run into a captcha page. Obviously I can't solve the captcha at that stage.

Is it possible to switch from GET to the normal ZP browser, fill in the captcha, and then go back to scraping via GET?

Thanks.
 

rostonix

Well-known member
Registered
23.12.2011
Posts
29 067
Thanks
5 715
Points
113
Only with logic actions. Check whether something in the page source tells you that the results are blocked by a captcha. If it is found, open the page in the browser.
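To illustrate the check described above, here is a minimal sketch in Python rather than in ZennoPoster actions. The marker strings and the function names `looks_blocked` and `next_step` are assumptions made up for this example; the real markers depend on what the target site actually returns when it blocks you.

```python
# Hypothetical marker strings; replace them with text that really appears
# in the target site's captcha page.
CAPTCHA_MARKERS = ("captcha", "recaptcha", "unusual traffic")


def looks_blocked(page_source, markers=CAPTCHA_MARKERS):
    """Return True if any marker occurs in the page source (case-insensitive)."""
    lowered = page_source.lower()
    return any(marker.lower() in lowered for marker in markers)


def next_step(page_source):
    """Pick the branch to follow after a GET response comes back."""
    if looks_blocked(page_source):
        return "open_in_browser"  # captcha found: load the page in the browser
    return "continue_get"         # no captcha: keep scraping via GET
```

In a project, the two return values would correspond to the two exits of the logic action: one leading to the browser and captcha recognition, the other back to the GET loop.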
 

shabbysquire

Client
Registered
25.11.2012
Posts
544
Thanks
26
Points
28
Ok thanks!

Just a few more and I'll get off your case.

Re HTTP GET: in the user agent field, can I use various UAs in spintax format? Or can I only load them from a file?

Also, when working with my own proxies in HTTP GET, does it really matter which setting I choose?
[Attached image: http get proxy.JPG]
 

bigcajones

Client
Registered
09.02.2011
Posts
1 216
Thanks
683
Points
113
You would have to spin the user agents before that action and then use the resulting variable in the UA box. And yes, you would want to set up the proxy first and then use Project proxy in the GET box if you want to hide your origin from the site.
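As a rough illustration of spinning the user agents first and passing on the result, here is a small Python sketch of a spintax expander. It is not ZennoPoster's own spintax engine; the `spin` function and the sample user agent strings are assumptions for this example only.

```python
import random
import re

# Matches an innermost {a|b|c} group, i.e. one with no braces inside it.
_INNERMOST_GROUP = re.compile(r"\{([^{}]*)\}")


def spin(text, rng=None):
    """Resolve {a|b|c} spintax groups, innermost first, picking one option each."""
    if rng is None:
        rng = random
    while True:
        match = _INNERMOST_GROUP.search(text)
        if match is None:
            return text  # nothing left to spin
        choice = rng.choice(match.group(1).split("|"))
        text = text[:match.start()] + choice + text[match.end():]


# Sample user agents, for illustration only.
UA_SPINTAX = (
    "{Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0"
    "|Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0}"
)

# Spin once before the request; this value is what would go in the UA box.
user_agent = spin(UA_SPINTAX)
```

Spinning once up front and reusing the variable keeps the same UA for the whole session, which matters if the site ties a session to a user agent.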

On your first question: it used to be possible to get the captcha image from the page, save it to disk, then Go To the URL file:///C:\images\downloaded.jpg and have a captcha service solve it. Unfortunately, Firefox has since added a security restriction on doing that, so you can't serve up an image that was downloaded via HTTP. Getting the image used to be easy because each site has its own captcha key, and the captcha image code is served on the page so that the captcha site knows which image to serve.

Now that Firefox has done this, it is best to load the page containing the captcha in the browser and then use the recognition result in your POST request.
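For what it's worth, putting the recognition result into the POST data might look roughly like the Python sketch below. The field name `captcha_response` and the helper `build_post_body` are hypothetical; the real field name has to be read from the form on the site you are scraping.

```python
from urllib.parse import urlencode


def build_post_body(form_fields, captcha_answer, captcha_field="captcha_response"):
    """Return a URL-encoded POST body with the captcha answer added.

    `captcha_field` is a placeholder name; check the site's form for the real one.
    """
    fields = dict(form_fields)  # copy so the caller's dict is not modified
    fields[captcha_field] = captcha_answer
    return urlencode(fields)


# Example: the other form fields plus the text returned by captcha recognition.
body = build_post_body({"q": "test"}, "aB3 dE")
```

The resulting string is what would be sent as the body of the POST request, alongside the same cookies the browser received when it loaded the captcha page.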
 