Need Help with This Problem

AgenturZenit

Client | Registered: 04.06.2012 | Messages: 30 | Thanks: 0 | Points: 6
I'd like to scrape/spider a website and take the content from it. Every time new content is published on the website, the scraper should scrape that content, and when content is deleted from the website, I need the scraper to tell me that it was deleted. Is this possible with Zenno?

example:
The scraper/spider scrapes the website www.example.com/profile/natalya
www.example.com/profile/Andy
www.example.com/profile/Mike
www.example.com/profile/............
and so on...

Let's say 14 days later new content is published on this website, like
www.example.com/profile/Arnold

Then the scraper/spider should take only the content of www.example.com/profile/Arnold,
not the content of the pages scraped earlier.
And when a profile is deleted, I need to be informed about it.
Can I do this with ZennoPoster? If the answer is yes, how can I do it? Can you help me?

Thanks for helping.
 

rostonix

Well-known member | Registered: 23.12.2011 | Messages: 29 067 | Thanks: 5 712 | Points: 113
Yes, sure, it's possible, but it's not a one-action task, and I don't have a simple guide for you.
You just need to organize the data. Keep an archive list (list 1) that is filled by the scraper (a project that scrapes new URLs and adds them to the list). You can use this snippet for the blacklist logic: http://www.zennoexperts.com/downloads/blacklist.xmlz
You also need a checker that periodically checks each link from list 1 and, if a link is dead, removes it from the list.
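
The bookkeeping described above can be sketched in plain C# as two set differences. This is only an illustration and not the contents of the blacklist snippet: the variable names are hypothetical, the lists are hard-coded instead of coming from project lists, and it assumes each run can collect the complete set of currently listed profile URLs. If only single links can be checked, the "removed" list would come from the dead-link checker instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: in a real project the archive would be list 1.
var archive = new List<string> {
    "www.example.com/profile/natalya",
    "www.example.com/profile/Andy",
    "www.example.com/profile/Mike",
};
// URLs found on the latest run (Mike is gone, Arnold is new).
var scraped = new List<string> {
    "www.example.com/profile/natalya",
    "www.example.com/profile/Andy",
    "www.example.com/profile/Arnold",
};

// New profiles: present now, but not yet in the archive.
List<string> added = scraped.Except(archive).ToList();
// Deleted profiles: in the archive, but no longer on the site.
List<string> removed = archive.Except(scraped).ToList();

// Bring the archive up to date for the next run.
archive.RemoveAll(url => removed.Contains(url));
archive.AddRange(added);

Console.WriteLine("New: " + string.Join(", ", added));
Console.WriteLine("Deleted: " + string.Join(", ", removed));
```

Only the URLs in `added` need to be scraped for content, and `removed` is the list of profiles to report as deleted.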
 

AgenturZenit

Thanks for the help, rostonix :-)
How do I scrape with ZennoPoster?

Thanks
 

rostonix

With regular expressions. You can use the regexp builder to create one if you can't write regular expressions from scratch.
Logic:

You use the Text processing - Regex action
What to parse: {-Page.Dom-}
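
For illustration, the same idea in plain C#: a regular expression pulls the profile links out of the page source. The HTML string, the `href="/profile/<name>"` link shape and the `profileLink` name are assumptions made up for this sketch. In the project itself the input would be the {-Page.Dom-} value, and the pattern has to match the real site.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical page source standing in for {-Page.Dom-}.
string dom = "<a href=\"/profile/natalya\">Natalya</a>" +
             "<a href=\"/profile/Andy\">Andy</a>" +
             "<a href=\"/about\">About</a>" +
             "<a href=\"/profile/Andy\">Andy again</a>";

// Assumed link shape: href="/profile/<name>". Adjust to the real site.
var profileLink = new Regex("href=\"(/profile/[^\"/?#]+)\"");

// Collect each matching link once, prefixed with the example host.
List<string> urls = profileLink.Matches(dom)
    .Cast<Match>()
    .Select(m => "www.example.com" + m.Groups[1].Value)
    .Distinct()
    .ToList();

foreach (string url in urls)
{
    Console.WriteLine(url);
}
```

The capture group keeps only the path, so the `/about` link is skipped and the duplicate Andy link is collapsed by `Distinct()`.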
 

AgenturZenit

Perfect, thanks for the help.
 

AgenturZenit

Hi, I have tested only the "blacklist" snippet, but I think I made a mistake, because every time I run the code I get an error (Action not executed).
I opened List1, List2 and List3 and put my files into them. That is the only thing I have to do, right?
 

rostonix

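// List1 = blacklist (already known entries), List2 = newly scraped data, List3 = result.
var good = project.Lists["List3"];
// Lock while working with the shared lists so parallel threads don't collide.
lock (SyncObjects.ListSyncer)
{
    List<string> blacklist = project.Lists["List1"].ToList();
    List<string> newdata = project.Lists["List2"].ToList();
    // Keep only the entries from List2 that are not in the blacklist.
    List<string> exclude = newdata.Except(blacklist).ToList();
    foreach (string data in exclude)
    {
        good.Add(data);
    }
}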



Try this
 
