HTML Cleaner - Keep specific links

Perfecto

Client
Registered
06.08.2013
Messages
94
Thanks
5
Points
8
Hi,

I made this regex to clean up HTML code and keep only the heading (H), p, strong and b tags:

Code:
<div.*?>|<span.*?>|<figcaption.*?>|<img.*?>|<hl.*?>|</hl.*?>|</span.*?>|<picture.*?>|</picture.*?>|</div.*?>|<svg.*?>|<path.*?>|<figure.*?>|</figcaption.*?>|</figure.*?>|class.*?(?=>)|</path.*?>|</svg.*?>|<source.*?>|</source.*?>|(?<!\()<a.*?>|</a.?>(?!\))|<aside.*?>|</aside.*?>|rel=".*?"|target=".*?"|<header[\w\W]*header>|Share.*?(?=<)|Previous\ article|Next\ article
It works, but I would like to go further and remove links, except those whose domain name is in a whitelist.
How can I do this? Thanks for your help.
 

EtaLasquera

Client
Registered
02.01.2017
Messages
526
Thanks
112
Points
43
I think you need to create a list to act as the whitelist and delete the items that are not in it.
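A minimal sketch of that whitelist idea, written in Python for illustration. The `WHITELIST` contents and the function names `is_whitelisted` and `strip_links` are assumptions, not something from the thread. Instead of one big alternation, each `<a ...>...</a>` element is matched as a whole and a callback decides whether to keep it or leave only its text. It assumes `href` values are double-quoted and that links are not nested.

```python
import re
from urllib.parse import urlparse

# Hypothetical whitelist: replace with the domains you want to keep.
WHITELIST = {"example.com", "example.org"}

# Match a whole <a ...>...</a> element.
# Group 1 = href value, group 2 = everything between the tags.
LINK_RE = re.compile(
    r'<a\b[^>]*?\bhref="([^"]*)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)

def is_whitelisted(url):
    """True if the URL's host is a whitelisted domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in WHITELIST)

def strip_links(html):
    """Keep whitelisted links untouched; replace all others with their inner text."""
    def repl(match):
        if is_whitelisted(match.group(1)):
            return match.group(0)   # keep the whole link
        return match.group(2)       # drop the tags, keep the text
    return LINK_RE.sub(repl, html)

sample = ('<p><a href="https://www.example.com/a">keep</a> and '
          '<a href="https://spam.test/x">drop</a></p>')
print(strip_links(sample))
```

Relative links such as `href="/page"` have no host, so this sketch treats them as not whitelisted and unwraps them too. In C# the same approach should be possible with `Regex.Replace` and a `MatchEvaluator` callback. If you use it, the `<a.*?>` and `</a.?>` alternatives would have to be taken out of the big cleaning regex, otherwise the whitelisted links get stripped anyway.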
 

Perfecto

I also need to remove the internal links while keeping their anchor text.
C#:
<a\ href="/.*?</a>
This regex matches all internal links, but how do I keep the anchor text and remove the link itself?
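One possible way, sketched in Python: put the part between the tags in a capture group and replace the whole match with that group. The function name `unwrap_internal_links` is hypothetical. The pattern is a slightly tightened variant of the one above: `[^"]*` keeps the match inside the `href` value, and `(?!/)` skips protocol-relative URLs such as `//cdn.example.com/...`, on the assumption that those should not count as internal.

```python
import re

# Internal link = href starting with a single "/".
# Group 1 captures the anchor text between the opening and closing tags.
INTERNAL_LINK_RE = re.compile(
    r'<a\s+href="/(?!/)[^"]*"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)

def unwrap_internal_links(html):
    """Replace each internal link with its anchor text only."""
    return INTERNAL_LINK_RE.sub(r"\1", html)

sample = ('<p>See <a href="/about">about us</a> or '
          '<a href="https://example.com/">site</a>.</p>')
print(unwrap_internal_links(sample))
```

In a .NET regex replace the replacement string would be `$1` instead of `\1`. Like the original pattern, this only matches links where `href` is the first attribute.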
 
