[SOLVED] Issue with scanning URLs

shabbysquire

I'm creating a bot that scans my sites for dead links (including URLs with no DNS record or host). Typically, when you visit a non-existent URL, Firefox shows a "Server not found" page.

When the debugger hits an invalid URL, PM just quits because the debug step failed. I need it to move on to the next stage and run a text-presence check for "Server not found".

Btw, is there a quicker way to scan for dead links in ZP? I used to use the Xenu tool, but found it slow, and it gave unreliable results.
 
Have you tried using HTTP requests and checking the header for a 404?
 
You should use GET requests and check headers.
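For anyone who wants to see the idea outside ZP, here is a minimal Python sketch of the same check. It is not ZennoPoster's GET action; the function name `fetch_status` and the 10-second timeout are assumptions made for illustration. It sends a GET, never reads the body, and returns `None` when no server answers at all (the "Server not found" case).

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, method="GET")
    try:
        # urlopen follows redirects on its own; the body is never read here.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except OSError:
        # DNS failure, refused connection or timeout: nothing answered.
        return None
```

A status of 200 would then mean the page exists, 404 a dead page on a live host, and `None` a host that cannot be reached.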
 
Thanks for the advice. I've never used GET and am not familiar with it. I'll give it a try though.
 
200 OK is the standard response when the page exists.
 
Just some general questions.

In GET, I've chosen to load headers only. Under the 'more' tab, the redirect box is ticked with the number 5. I assume this number (5) is the maximum number of redirects it follows before giving up?

bigcajones mentioned looking for 404s; how do I split live & dead URLs into separate lists?

Thanks.
 
You get the headers back after the request.
Search them for "200 OK". If it's found, the link is good; if not, it's a dead link.
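As a rough illustration of the live/dead split, here is a small Python sketch. It assumes the headers for each URL have already been saved as text, with `None` for URLs where the request got no answer; the name `split_links` and the example.com URLs are hypothetical.

```python
def split_links(headers_by_url):
    """Split URLs into (live, dead) lists based on the saved header text."""
    live, dead = [], []
    for url, headers in headers_by_url.items():
        # The status line is the first line of the response headers.
        status_line = headers.splitlines()[0] if headers else ""
        if "200 OK" in status_line:
            live.append(url)
        else:
            # Any other status, or no headers at all, counts as dead.
            dead.append(url)
    return live, dead


sample = {
    "http://example.com/": "HTTP/1.1 200 OK\r\nServer: demo\r\n",
    "http://example.com/gone": "HTTP/1.1 404 Not Found\r\n",
    "http://no-such-host.invalid/": None,
}
live, dead = split_links(sample)
```

Only the first line is checked, so a "200 OK" that happens to appear in some other header does not count as a match.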
 
Header parsing is pretty fast.
 
How do you parse the headers after "Get Request"? I have GET set to headers only and saved to a variable, but I don't know how to examine the headers at this point to get the info to parse. I only know how to do it in an open browser using "create check of text presence".

Any help is appreciated

Thanks
 
Use the regex action under Text processing.
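The kind of pattern that could go into such a regex step looks roughly like the one below, shown in Python so it can be tried outside ZP. The pattern and the helper name `final_status` are assumptions, not something taken from ZennoPoster. It pulls the three-digit code out of each status line and keeps the last one, in case the saved text contains the headers of several redirect hops.

```python
import re

# Matches status lines such as "HTTP/1.1 200 OK" or "HTTP/2 404".
STATUS_LINE = re.compile(r"^HTTP/\d(?:\.\d)?\s+(\d{3})", re.MULTILINE)


def final_status(raw_headers):
    """Return the last status code found in the header text, or None."""
    codes = STATUS_LINE.findall(raw_headers or "")
    return int(codes[-1]) if codes else None
```

A URL would be treated as live when `final_status` returns 200 and as dead for any other code or for `None`.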
 
