Scraping only partial urls

shabbysquire

Client
Registered
25.11.2012
Messages
544
Thanks
26
Points
28
I'm having an issue with scraping some websites.

I've come across a site where the href of each link on the page only gives a partial URL. For example, a normal URL might be:

Code:
http://www.ebay.co.uk/itm/Viking-Traditional-Ladies-Comfort-7-Speed-town-bike-black-/271123065658?pt=UK_Bikes_GL&var=&hash=item3f2031ab3a

But when scraping the page, I can only get a partial URL:

/itm/Viking-Traditional-Ladies-Comfort-7-Speed-town-bike-black-/271123065658?pt=UK_Bikes_GL&var=&hash=item3f2031ab3a

Even when I inspect the href on the page with Firebug, it shows the partial URL. I've tried various regex approaches, but the full URL simply isn't in the page source.

Anyway, I've saved the partial URLs to my list. How could I prefix the start of the URL, i.e. http://www.ebay.co.uk/, to the other half?
 

shabbysquire

Managed it OK. For the scraped partial URLs, use list processing to grab a line and store it in a variable. Then add a go-to-page action; I used this as the address:

http://www.website.com/{-partial_url_variable-}

Bit of a bodge, but does the job! :rolleyes:
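For anyone doing the same thing outside the program, here is a minimal Python sketch of the same idea: resolving each scraped partial URL against the site's base address. The names `BASE_URL`, `to_absolute` and `scraped` are made up for illustration, and it assumes the partial URLs are already sitting in a plain list of strings.

```python
from urllib.parse import urljoin

# Assumption: the base address of the site the links were scraped from.
BASE_URL = "http://www.ebay.co.uk/"


def to_absolute(partial_urls, base=BASE_URL):
    """Resolve each scraped href against the base URL, skipping blank lines."""
    return [urljoin(base, href.strip()) for href in partial_urls if href.strip()]


# Hypothetical list of scraped hrefs, using the example from the first post.
scraped = [
    "/itm/Viking-Traditional-Ladies-Comfort-7-Speed-town-bike-black-/271123065658"
    "?pt=UK_Bikes_GL&var=&hash=item3f2031ab3a",
]

full_urls = to_absolute(scraped)
print(full_urls[0])
```

Unlike simply gluing the two strings together, `urljoin` copes with the leading slash on the partial URL (so you don't end up with `//itm/...`) and leaves links that are already complete untouched.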
 
