Help with some regex

shabbysquire

Client
Registered: 25.11.2012
Messages: 544
Thanks: 26
Points: 28
I'm having trouble crafting a regex and would appreciate some advice.

I'm scraping domains with a regex that uses lookahead and lookbehind.

Here are some sample URLs to capture domains from:

Code:
http://domain.com/
https://domain.com/

http://www.domain.com/
https://www.domain.com/
And my regex:

Code:
(?<=https?://|https?://www.).*?(?="|</a>|/)
I only want to capture the main domain without the www. prefix, e.g. domain.com. My regex returns www.domain.com for the www URLs instead of domain.com. I know the regex engine is eager and takes the first match it can find, but I would appreciate some help.

Cheers!
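A small sketch of why the pattern above keeps the www. prefix: the engine scans left to right, and the first position where the lookbehind succeeds is right after `http://`, which is before `www.`. Python is an assumption here (the thread does not name a regex engine), and because Python's built-in `re` module rejects variable-width lookbehind, the alternation is spelled out as four fixed-width lookbehinds. The names `ORIGINAL` and `SAMPLE` are illustrative only.

```python
import re

# Rough equivalent of (?<=https?://|https?://www.) written as four
# fixed-width lookbehinds, since Python's re requires fixed-width lookbehind.
ORIGINAL = re.compile(
    r'(?:(?<=http://)|(?<=https://)|(?<=http://www\.)|(?<=https://www\.))'
    r'.*?(?="|</a>|/)'
)

SAMPLE = """http://domain.com/
https://domain.com/
http://www.domain.com/
https://www.domain.com/"""

# The scheme-only lookbehind succeeds first, so "www." ends up in the match.
matches = ORIGINAL.findall(SAMPLE)
print(matches)
# ['domain.com', 'domain.com', 'www.domain.com', 'www.domain.com']
```

The `www.` alternative in the lookbehind is never reached for these inputs, because an earlier start position already produces a match.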
 

LexxWork

Client
Registered: 31.10.2013
Messages: 1,190
Thanks: 791
Points: 113
Just strip it after the regex match. :)
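For illustration, stripping the prefix after matching could look roughly like this in Python (the language and the helper name `strip_www` are assumptions, not something the thread specifies):

```python
def strip_www(host: str) -> str:
    """Drop a leading 'www.' from an already-extracted host name."""
    prefix = "www."
    return host[len(prefix):] if host.startswith(prefix) else host


print(strip_www("www.domain.com"))  # domain.com
print(strip_www("domain.com"))      # domain.com
```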
 

shabbysquire

I have done that, but it's just a challenge for me to improve my regex skills. ;-)

The solution is to make the regex ignore the www. prefix, so I need to figure out how!
 

shabbysquire

Done:

Code:
(?<=https?://(?:www\.)?)(?!www\.).*?(?=['/"]|</a>)
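The same idea can be sketched in Python, with one caveat: the pattern above relies on variable-width lookbehind, which some engines (for example .NET) accept but Python's built-in `re` module does not. The version below is therefore an adaptation, not the exact pattern: the lookbehind is expanded into four fixed-width alternatives, while the `(?!www\.)` negative lookahead, which is what forces the match to start after the prefix, is kept as is. `FIXED` and `SAMPLE` are illustrative names.

```python
import re

# Fixed-width stand-in for (?<=https?://(?:www\.)?), followed by the
# negative lookahead that rejects any start position sitting on "www.".
FIXED = re.compile(
    r'(?:(?<=http://)|(?<=https://)|(?<=http://www\.)|(?<=https://www\.))'
    r'(?!www\.)'
    r'''.*?(?=['/"]|</a>)'''
)

SAMPLE = """http://domain.com/
https://domain.com/
http://www.domain.com/
https://www.domain.com/"""

domains = FIXED.findall(SAMPLE)
print(domains)
# ['domain.com', 'domain.com', 'domain.com', 'domain.com']
```

Right after `http://` the negative lookahead fails on `www.`, so the engine moves on until it reaches the position after `http://www.`, where both lookarounds succeed.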
 