Content parser

Perfecto

Client
Registered
06.08.2013
Posts
94
Thanks
5
Points
8
Hi,

What is the best way to parse the content inside HTML tags (p, h1, h2...)?
It must work across different websites.
 

lokiys

Moderator
Registered
01.02.2012
Posts
4 812
Thanks
1 187
Points
113

Perfecto

Thanks for your answer. I don't understand how to use this with Zenno. Isn't the data parsing module in Zenno enough?
 

lokiys

You asked about the best way to parse HTML, and I think HTML Agility Pack is the best way.
No, it is not the default option in ZennoPoster...
But sure, you can also use the parsing module, which is the default option in Zenno.
 

Perfecto

I just want to scrape content like this
But it must be compatible with many different sites. Is the method described in this video still the best, or have there been new features since 2012?
 

lokiys

Not sure what you mean by compatible with different sites, since you usually scrape specific content from a specific site.
But using regex is fine for scraping.
Take a look at the Parse brick in Zenno...
Right-click on the content you want to parse and choose "Parse content".
 

Perfecto

Let me try to explain my need more clearly. Sorry, English is not my native language. I need to scrape the content of certain HTML tags and keep the article in the same structure as the original. The problem with the "parse data" module is that I get all the h2 tags together, then all the p tags together... I want a result like this:
h2
p
h2
h3
p
h3
p
h2
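
One way to get that interleaved order is to walk the document once and record each wanted tag as it appears, instead of collecting one tag type at a time. The following is only a minimal sketch in Python using the standard library; the names `OrderedTagExtractor` and `extract_blocks` are hypothetical, and it is not a ZennoPoster feature. The same single-pass idea should carry over to other HTML parsers.

```python
from html.parser import HTMLParser


class OrderedTagExtractor(HTMLParser):
    """Collect (tag, text) pairs for selected tags, in document order."""

    def __init__(self, wanted=("h1", "h2", "h3", "p")):
        super().__init__(convert_charrefs=True)
        self.wanted = set(wanted)
        self.blocks = []       # finished (tag, text) pairs
        self._current = None   # wanted tag currently being collected
        self._parts = []       # text fragments of the current tag

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            self._flush()      # tolerate a previous block that was never closed
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._flush()

    def handle_data(self, data):
        # Text inside inline children (<b>, <a>, ...) is kept as well.
        if self._current is not None:
            self._parts.append(data)

    def _flush(self):
        if self._current is not None:
            text = " ".join("".join(self._parts).split())
            if text:
                self.blocks.append((self._current, text))
        self._current = None
        self._parts = []


def extract_blocks(html, wanted=("h1", "h2", "h3", "p")):
    """Return [(tag, text), ...] in the order the tags occur in the page."""
    parser = OrderedTagExtractor(wanted)
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a trailing block that was left unclosed
    return parser.blocks
```

Calling `extract_blocks(page_html)` on a page gives one list where headings and paragraphs alternate exactly as in the article, so the structure can be rebuilt afterwards.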
 

lokiys

If we are talking about scraping data, then regex is what you have to learn. Wiki - Regex
Parse content is just a helper action. So go with regex and learn it.
As for the logic of your scraping, I cannot say much, because I do not understand what your goal is...
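
As a rough illustration of the regex route, a single pattern with an alternation over tag names returns matches in page order, so headings and paragraphs stay interleaved. This is only a sketch in Python under simple assumptions (well-formed, non-nested h1 to h6 and p tags); the names `BLOCK_RE`, `INNER_TAG_RE` and `regex_blocks` are hypothetical, and regex tends to break on nested or malformed HTML.

```python
import re
from html import unescape

# One pattern for all wanted tags; \1 forces the closing tag to match the opening one.
BLOCK_RE = re.compile(r"<(h[1-6]|p)\b[^>]*>(.*?)</\1\s*>", re.IGNORECASE | re.DOTALL)
# Strips inline markup such as <b> or <a href="..."> from the captured content.
INNER_TAG_RE = re.compile(r"<[^>]+>")


def regex_blocks(html):
    """Return [(tag, text), ...] in the order the tags occur in the page."""
    blocks = []
    for match in BLOCK_RE.finditer(html):
        tag = match.group(1).lower()
        text = INNER_TAG_RE.sub("", match.group(2))
        text = " ".join(unescape(text).split())  # decode entities, collapse whitespace
        if text:
            blocks.append((tag, text))
    return blocks
```

The same pattern idea can be tried in any regex tester first; the key point is to match all wanted tags with one expression rather than running a separate expression per tag.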
 
