Help making a Google Places scraper

  • Thread starter: Dr_Scythe

Dr_Scythe

I've had ZennoPoster for a month or two now. I'm still getting my head round everything it can do and how to go about doing things.

I'm looking for some guidance on creating a scraper that pulls Google Places results into a spreadsheet.

If anyone has some experience with this I'd love some pointers.

Either reply to this thread or hit me up on skype. ID: grantderepas
 
It is possible, although it is a mega parse job. Zenno can do this, no problem. Do you want to export the entire Google Places :cool: or a specific area / business genre?

Give an example of what you want to accomplish.
 
Lol, the entire Google Places might be a bit much.

I was thinking more along the lines of entering a keyword such as city+profession and outputting a spreadsheet of businesses of X profession in a target area.

Could the Places info be scraped into a txt document in CSV format that could then be opened in Excel?
 
I was considering creating a places scraper and then wisely (I think) decided it was above my pay grade.

Good Luck!
 
EDIT: To aid load time, I have all of the following set NOT to load: images, video, sound, run ActiveX, load ActiveX, popups. I then have the following set to make sure they DO load: scripts, Java and frames.

If you visit this URL: http://maps.google.com/maps

And search for something in this format: "pizza clarks summit pa" (Clarks Summit being a town name and PA being Pennsylvania), then transfer the DOM source to the regex portion of ZP, the following regex returns the following data:

(?<=drg:true\,laddr:\")(.*?)(?=\")

----------------------------------- match # 0 -----------------------------------
100 Old Lackawanna Trl, Clarks Summit, PA 18411-9108 (Fiorillo's Pizza)
----------------------------------- match # 1 -----------------------------------
100 East Grove Street, Clarks Summit, PA 18411-1750 (Colarusso's Cafe)
----------------------------------- match # 2 -----------------------------------
1002 South State Street, Clarks Summit, PA 18411-2249 (Dino \x26 Francesco's Pizza-Pasta)
----------------------------------- match # 3 -----------------------------------
100 Highland Avenue, Clarks Summit, PA 18411-1571 (Basilicos Pizzeria)
----------------------------------- match # 4 -----------------------------------
900 South State Street, Clarks Summit, PA 18411-1756 (Pizza Hut)
----------------------------------- match # 5 -----------------------------------
223 Northern Boulevard, S Abington Twp, PA 18411-9304 (Bellissimo Pizzeria and Ristorante)
----------------------------------- match # 6 -----------------------------------
1121 Northern Blvd, South Abington Township, PA 18411 (Domino's Pizza)
----------------------------------- match # 7 -----------------------------------
926 Lackawanna Trl, Clarks Summit, PA 18411-9278 (Wellington's Pub \x26 Eatery)
----------------------------------- match # 8 -----------------------------------
206 Grand Avenue, Clarks Summit, PA 18411-1402 (Jimmy D's Pizza \x26 More)
----------------------------------- match # 9 -----------------------------------
919 Northern Boulevard, S Abington Twp, PA 18411-2241 (Thick N Thin Pizza)
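
For anyone who wants to try the pattern outside ZennoPoster, here is a minimal Python sketch. The `sample_dom` string is a made-up fragment that only imitates the `drg:true,laddr:"..."` shape the regex relies on; the real Google Maps source is much larger and may not look like this any more. The function name `extract_listings` is hypothetical.

```python
import re

# Same pattern as in the post. The backslashes before , and " are
# unnecessary in Python but harmless, so the regex is kept verbatim.
LADDR_PATTERN = re.compile(r'(?<=drg:true\,laddr:\")(.*?)(?=\")')

# Hypothetical stand-in for the DOM source (NOT real Google Maps output).
sample_dom = (
    '{drg:true,laddr:"100 Old Lackawanna Trl, Clarks Summit, PA 18411-9108 '
    '(Fiorillo\'s Pizza)",x:1}'
    '{drg:true,laddr:"900 South State Street, Clarks Summit, PA 18411-1756 '
    '(Pizza Hut)",x:2}'
)


def extract_listings(dom):
    """Return every 'address (name)' string that follows drg:true,laddr:"."""
    # findall returns the contents of the single capture group per match.
    return LADDR_PATTERN.findall(dom)


for number, listing in enumerate(extract_listings(sample_dom)):
    print(f"match # {number}: {listing}")
```

The lookbehind `(?<=...)` anchors the match just after the opening quote, the lazy `(.*?)` takes as little as possible, and the lookahead `(?=\")` stops at the next double quote, which is why only the text between the quotes is returned.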

As you can see, it grabs the first 10 results perfectly. You can then work with each of these strings individually to do what you want. I'm thinking that replacing "(" with a comma and ")" with nothing, then appending the string to a .txt file, would make it open cleanly as a CSV in Excel.
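
The string clean-up described above could look roughly like this in Python. Note that this sketch swaps in a slightly different technique from the plain find-and-replace: it splits at the last " (" and lets the `csv` module quote the address, so the commas inside the address do not spill into extra columns. It also decodes the `\x26` escapes seen in the matches into `&`. The helper names are hypothetical.

```python
import csv
import io
import re


def decode_js_escapes(text):
    # Turn JavaScript-style hex escapes such as \x26 into real characters.
    return re.sub(r'\\x([0-9A-Fa-f]{2})',
                  lambda m: chr(int(m.group(1), 16)), text)


def split_listing(raw):
    """Split 'address (name)' into (name, address) at the last ' ('."""
    text = decode_js_escapes(raw).strip()
    address, sep, name = text.rpartition(' (')
    if not sep or not name.endswith(')'):
        # No trailing "(name)" found: keep everything as the address.
        return ('', text)
    return (name[:-1], address)


def to_csv(listings):
    """Build CSV text with a header row from raw 'address (name)' strings."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['name', 'address'])
    for raw in listings:
        writer.writerow(split_listing(raw))
    return buffer.getvalue()


print(to_csv([
    r"1002 South State Street, Clarks Summit, PA 18411-2249 "
    r"(Dino \x26 Francesco's Pizza-Pasta)",
]))
```

To append to a file instead of building a string, the same `csv.writer` can wrap a file opened with `open(path, 'a', newline='')`.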
 
Thanks for that crazy.

I'll have a play around with that tonight :)
 
As advised, (?<=drg:true\,laddr:\")(.*?)(?=\") brings up the full address and name of each Places result. How would I modify it to include the phone number?

More importantly, how would I go about working out one of these parameters myself, so I don't have to come back asking silly questions? :p

Also, my current template only writes the first result per page to the txt file. Would I have to set it up to find a result, append it to the text file on success, go back to find the next result, and then move to the next page when finding the next result fails?

Sorry again for all the likely dumb questions!
 
To get the phone number, you would have to use this regex: (?<=drg:true\,laddr:\")(.*?)(?=\")|(?<=sxph:\"\+1).*?(?=\")
The only problem is that you will get the address on one match line and the phone number on the next. If you want to save all the results on the page, put the 'all' modifier in your step branch instead of '0'. This will pull all of the matches. You could then make a counter loop and save matches 0 and 1 in the first loop, 2 and 3 in the second, and so on.
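
The pairing idea (matches 0 and 1, then 2 and 3, and so on) can be sketched in Python as follows. The `sample_dom` fragment and its phone numbers are invented for illustration, and the sketch assumes every listing has both an address and a phone number, in that order; a listing without a phone would shift every later pair. `pair_matches` is a hypothetical name.

```python
import re

# The combined pattern from the post, kept verbatim: either an address
# after drg:true,laddr:" or a phone number after sxph:"+1.
COMBINED = re.compile(
    r'(?<=drg:true\,laddr:\")(.*?)(?=\")|(?<=sxph:\"\+1).*?(?=\")'
)

# Hypothetical stand-in for the DOM source (NOT real Google Maps output).
sample_dom = (
    '{drg:true,laddr:"900 South State Street, Clarks Summit, PA 18411-1756 '
    '(Pizza Hut)",sxph:"+15705550142"}'
    '{drg:true,laddr:"1121 Northern Blvd, South Abington Township, PA 18411 '
    '(Domino\'s Pizza)",sxph:"+15705550187"}'
)


def pair_matches(dom):
    """Return (address, phone) tuples built from consecutive matches."""
    # group(0) is the whole match, whichever branch of the | matched.
    matches = [m.group(0) for m in COMBINED.finditer(dom)]
    # Walk the flat list two at a time: (0, 1), (2, 3), ...
    return [(matches[i], matches[i + 1])
            for i in range(0, len(matches) - 1, 2)]


for address, phone in pair_matches(sample_dom):
    print(address, '|', phone)
```

`finditer` with `group(0)` is used rather than `findall` because the pattern has a capture group only in its first branch, so `findall` would return empty strings for the phone matches.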

Here's a template to try out that does exactly that. You will need a Gplaces.txt file in your Resources folder containing the keywords you want to search for.

View attachment: Gplaces.xml

I'm also doing a video on this because I get a lot of questions about scraping pages and using regular expressions. I will show you how crazyflx came up with his regular expression to find the names and addresses. I'll have it uploaded tomorrow.
 
Very much looking forward to that video.

Cheers for the help!
 
Here's the video on YouTube that shows you how to parse a page with multiple regular expressions.

http://www.youtube.com/watch?v=OFdd91R4L9o
 
Thanks heaps for that video.

It has really helped me start scraping successfully.

I've got my Google Maps scraper working well.

It has also helped me build a personal scraper to get all my latest music download links for me :D
 
Glad to help.
 
