Basic Scraper example

  • Thread starter: Perropoly

Perropoly

Client · Joined: 21.01.2011 · Messages: 4 · Reactions: 0 · Points: 0
So far I have only watched the video tutorials about registering on websites.

Would it be possible to get an example scraping template (not a video), for example one that collects the first two pages of Google search results for the word: cat

Something like

- search for the keyword cat in Google
- scrape the titles and URLs from the first page
- store them in a file
- move to the second page
- scrape and store again

From this template I can expand and add more logic.

Thanks
Perrropoly
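
The steps listed above can be sketched outside ZennoPoster as a short Python script. This is only an illustration of the flow (search, scrape, store, next page), not a ZennoPoster template. The names `ResultParser`, `extract_results`, `search_url` and `scrape` are made up for this sketch, the assumption that each result title is a link inside an `<h3>` element is hypothetical, and the `q`/`start` URL parameters are assumed as well. The download step is passed in as a `fetch` function so the sketch does not depend on any particular HTTP library.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class ResultParser(HTMLParser):
    """Collect (title, url) pairs from links nested inside <h3> elements."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_h3 = False
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text pieces of the current link title

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
        elif tag == "a" and self._in_h3:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag == "h3":
            self._in_h3 = False


def extract_results(html):
    """Return the (title, url) pairs found in one result page."""
    parser = ResultParser()
    parser.feed(html)
    return parser.results


def search_url(keyword, page, per_page=10):
    # Assumed URL scheme: "q" is the query, "start" is the result offset.
    query = urlencode({"q": keyword, "start": page * per_page})
    return "https://www.google.com/search?" + query


def scrape(keyword, pages, fetch, out_path):
    """Fetch each result page, extract titles and URLs, write them to out_path.

    fetch is any caller-supplied function that takes a URL and returns HTML.
    Returns the number of results written.
    """
    total = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for page in range(pages):
            html = fetch(search_url(keyword, page))
            for title, url in extract_results(html):
                out.write(f"{title}\t{url}\n")
                total += 1
    return total
```

Calling `scrape("cat", 2, fetch, "test.txt")` with a suitable `fetch` would write one tab-separated title and URL per line for the first two pages. Further logic (filtering, deduplication, more pages) can be added inside the loop.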
 
thx!
I've been playing with this expression for scraping Google:

(?<=\" href\=\").*?(?=\<\/A\>\<\/H3\>\<BUTTON class\=vspib type\=submit\>\<\/BUTTON\>\<\/SPAN\>)
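
To see what this expression actually captures, here is a small Python check. The pattern is copied verbatim from above. The sample page text is hypothetical, shaped the way the expression expects it, and `parse_matches` is just a helper name for this sketch. Because the lookbehind only anchors on `" href="`, each raw match has the form URL">TITLE, so it still has to be split into its two parts.

```python
import re

# The expression from the post, copied verbatim. The backslashes before
# ", =, <, / and > are redundant but harmless in Python's re module.
PATTERN = re.compile(
    r'(?<=\" href\=\").*?'
    r'(?=\<\/A\>\<\/H3\>\<BUTTON class\=vspib type\=submit\>\<\/BUTTON\>\<\/SPAN\>)'
)

# Hypothetical page text with two results (an assumption, not real Google output).
SAMPLE = (
    '<SPAN class=tl><H3 class="r"><A class="l" href="http://example.com/cats">'
    'All about cats</A></H3><BUTTON class=vspib type=submit></BUTTON></SPAN>'
    '<SPAN class=tl><H3 class="r"><A class="l" href="http://example.org/kittens">'
    'Kittens</A></H3><BUTTON class=vspib type=submit></BUTTON></SPAN>'
)


def parse_matches(page_text):
    """Return (url, title) pairs; each raw match looks like URL">TITLE."""
    pairs = []
    for raw in PATTERN.findall(page_text):
        url, _, title = raw.partition('">')
        pairs.append((url, title))
    return pairs


print(parse_matches(SAMPLE))
```

On the sample this prints `[('http://example.com/cats', 'All about cats'), ('http://example.org/kittens', 'Kittens')]`. Note that the match starts at the first `" href="` before each closing marker, so on a real page an earlier link on the same line could be swallowed into the match.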
 
I'm cloning and adapting your example to scrape other sites. Once we have the DOM text of the page (from the web browser), how do we open it in, or transfer it to, the regexp builder in order to create the regular expression? I'm doing it with Expresso at the moment, but there must be a way to do it inside ZennoPoster.

I created the template in the action recording section, then moved it to the template editor to start building the regexp and extract data.
I found the action recording section better suited to registering and filling in forms, while for scraping you have to create the step branches in the editor. Is there a way to select and scrape from the action editor and then make modifications in the template editor?
 
I wrote this in the other thread:
While recording, click the "Page text" icon in the menu. The page source then opens inside the ZennoPoster editor, and you can click the 4th button from the top, "Copy to macros builder", to get the DOM text into the regexp builder.
 

Attachment: example.png (190.9 KB)
It's possible.
The results are saved to the file test.txt in the Results folder.

Is this google.xml still working for you guys? I'm quite sure it was a few weeks ago, but now it isn't. Weird. It won't parse any results: the execution result of the Get is blank, so no results get saved. It's beyond me what could be wrong.
 
Simply fix the regular expression.
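
A blank result is what a strict expression produces when the page markup no longer matches it exactly. The thread does not say what changed on the page, so the following Python sketch only illustrates the idea under assumptions: `STRICT` is the expression posted earlier in this thread, `LOOSE` is a hypothetical, more tolerant alternative that only requires a link inside an `<h3>`, and `NEW_MARKUP` is an invented example of changed page text.

```python
import re

# Strict expression from earlier in the thread: it needs the exact
# uppercase tags and the trailing <BUTTON>/<SPAN> markup.
STRICT = re.compile(
    r'(?<=\" href\=\").*?'
    r'(?=\<\/A\>\<\/H3\>\<BUTTON class\=vspib type\=submit\>\<\/BUTTON\>\<\/SPAN\>)'
)

# Looser, hypothetical pattern: any link that directly follows an <h3> tag,
# in either letter case, with no dependency on what comes after the link.
LOOSE = re.compile(
    r'<h3[^>]*>\s*<a\b[^>]*?\bhref="([^"]+)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)

# Invented example of changed markup (an assumption, not real Google output).
NEW_MARKUP = (
    '<h3 class="r"><a href="http://example.com/cats" onmousedown="x()">'
    '<em>Cat</em> facts</a></h3>'
)


def extract(page_text):
    """Return (url, title) pairs, with inner tags stripped from the title."""
    return [
        (url, re.sub(r"<[^>]+>", "", title).strip())
        for url, title in LOOSE.findall(page_text)
    ]


strict_hits = STRICT.findall(NEW_MARKUP)
if not strict_hits:
    # An empty list here corresponds to the blank execution result.
    print("strict expression matched nothing; trying the looser one")
print(extract(NEW_MARKUP))
```

The looser pattern trades precision for tolerance: it survives small markup changes but may also pick up `<h3>` links that are not search results, so the output is worth checking against the page text.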
 
