Content parser

  • Thread starter: Perfecto

Perfecto

Client
Registered: 06.08.2013
Messages: 108
Reactions: 9
Points: 18
Hi,

What is the best way to parse the content inside HTML tags (p, h1, h2...)?
It must be compatible with different websites.
 
Thanks for your answer. I don't understand how to use this with Zenno. Isn't the data parsing module in Zenno enough?
 
You asked about the best way to parse HTML, and I think HTML Agility Pack is the best way.
No, it is not a default option in ZennoPoster...
But sure, you can also use the parsing module that ZennoPoster provides by default.
 
I just want to scrape content like this.
But it must be compatible with many different sites. Is the method described in this video still the best, or have there been new features since 2012?
 
I'm not sure what you mean by compatible with different sites, since you usually scrape specific content from a specific site.
But using regex is fine for scraping.
Take a look at the Parse brick in Zenno...
Right-click the content you want to parse and choose "Parse content".
[Attached image: WC9evh7.png]
 
I'll try to explain my need more clearly. Sorry, English is not my native language. I need to scrape the content of some HTML tags and keep the article in the same structure as the original. The problem with the "parse data" module is that I get all the h2 tags together, all the p tags together, and so on. I want a result like this:
h2
p
h2
h3
p
h3
p
h2
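
To make the ordering requirement concrete, here is a minimal sketch of one way to get that result. It is written in Python with the standard-library `html.parser`, not in ZennoPoster, so treat it only as an illustration of the idea: walk the page once and record each heading or paragraph as it is encountered, instead of collecting each tag type separately. The names `OrderedTagExtractor`, `extract_in_order`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

# Tags whose text we want to keep, in the order they appear on the page.
WANTED = {"h1", "h2", "h3", "h4", "h5", "h6", "p"}


class OrderedTagExtractor(HTMLParser):
    """Collects (tag, text) pairs for the wanted tags in document order."""

    def __init__(self, wanted=WANTED):
        super().__init__(convert_charrefs=True)
        self.wanted = set(wanted)
        self.items = []        # finished (tag, text) pairs
        self._current = None   # tag currently being captured, if any
        self._buffer = []      # text pieces seen inside the current tag

    def _flush(self):
        # Store the block being captured (if any) with whitespace normalized.
        if self._current is not None:
            text = " ".join("".join(self._buffer).split())
            self.items.append((self._current, text))
        self._current = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted:
            # An unclosed <p> ends where the next heading or paragraph starts.
            self._flush()
            self._current = tag

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self._flush()


def extract_in_order(html, wanted=WANTED):
    parser = OrderedTagExtractor(wanted)
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a block left open at the end of the input
    return parser.items


sample = """
<article>
  <h2>First section</h2>
  <p>Intro <b>text</b>.</p>
  <h2>Second section</h2>
  <h3>Detail</h3>
  <p>More text.</p>
</article>
"""

for tag, text in extract_in_order(sample):
    print(tag, "|", text)
```

Because nothing here depends on site-specific class names or ids, the same code can be pointed at different sites, although it will also pick up paragraphs from menus, footers, and comments unless the input is first narrowed to the article container.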
 
If we are talking about scraping data, then regex is what you have to learn (see Wiki - Regex).
Parse content is just a helper action, so go with regex and learn it.
As for the logic of your scraping, I cannot say much, because I do not understand what your goals are...
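
As a rough illustration of the regex route, a single pattern with an alternation over the tag names returns matches in the order they occur in the page, which keeps headings and paragraphs interleaved. The sketch below is in Python rather than ZennoPoster, and the pattern and the `scrape_blocks` name are assumptions made for the example, not something taken from the ZennoPoster documentation. It assumes every block has a closing tag and that the wanted tags are not nested inside each other.

```python
import re
from html import unescape

# One pattern for all wanted tags; \1 requires the closing tag to match the opening one.
BLOCK_RE = re.compile(
    r"<(h[1-6]|p)\b[^>]*>(.*?)</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)
# Used to drop inline markup such as <a> or <b> inside a matched block.
TAG_RE = re.compile(r"<[^>]+>")


def scrape_blocks(html):
    """Return (tag, text) pairs for h1-h6 and p, in document order."""
    blocks = []
    for match in BLOCK_RE.finditer(html):
        tag = match.group(1).lower()
        inner = TAG_RE.sub("", match.group(2))    # strip nested inline tags
        text = " ".join(unescape(inner).split())  # decode entities, tidy spaces
        blocks.append((tag, text))
    return blocks


page = (
    '<h2 class="t">Title</h2>\n'
    '<p>Some <a href="#">link</a> &amp; text</p>\n'
    "<H3>Sub</H3><p>End</p>"
)

for tag, text in scrape_blocks(page):
    print(tag, "|", text)
```

Regex works on reasonably regular markup, but it breaks more easily than an HTML parser on unclosed or unusual tags.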
 
