Hi guys. I'm currently building an article grabber.
Input - keywords.
Action - search on ezine, open the first 10 results one at a time (the links are found using regex), then grab each article using the "Article Extraction" icon command in Project maker.
Output - each article is saved to a designated file bearing the keyword name + sequential number.
I have two main problems:
1. Regex - currently the regex that I have built is:
(?<=<a\ href="/\?)[\w\W]*?(?=">)
This gives me the wanted result (the name of the article) together with an unwanted result (the name of the writer).
Example:
Corporate-Video-Production---Corporate-Videos-High-Impact-on-Business-Audience&id=439810
expert=Shakir_A.
Corporate-Video-Productions---Need-and-Importance&id=439816
expert=Shakir_A.
How can I omit the writer (named expert on ezine) line in one go?
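One idea that might work here is a negative lookahead right after the lookbehind, so that any link whose query string starts with `expert=` is rejected before it is captured. Below is a minimal sketch in Python, only to illustrate the pattern. The sample HTML is made up to mirror the output above, and it assumes the regex engine in the tool supports lookaheads the same way (not verified).

```python
import re

# Made-up HTML that mirrors the links in the example output above.
sample_html = '''
<a href="/?Corporate-Video-Production---Corporate-Videos-High-Impact-on-Business-Audience&id=439810">Article</a>
<a href="/?expert=Shakir_A.">Shakir A.</a>
<a href="/?Corporate-Video-Productions---Need-and-Importance&id=439816">Article</a>
<a href="/?expert=Shakir_A.">Shakir A.</a>
'''

# Same regex as before, with one addition: (?!expert=) fails the match
# whenever the text right after <a href="/? begins with "expert=".
pattern = re.compile(r'(?<=<a\ href="/\?)(?!expert=)[\w\W]*?(?=">)')

article_links = pattern.findall(sample_html)

for link in article_links:
    print(link)
```

If the article links always end in `&id=` followed by digits, requiring that suffix instead of excluding `expert=` would be another option.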
2. Using "Article Extraction" I get a lot of junk at the beginning of the file, and the headline is missing. I did not see any parameters for "Article Extraction".
If I want to create my own regex for grabbing, how can I capture the headline together with the content of the article in one go?
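A rough sketch of what "one go" could look like is a single pattern with two named groups, one for the headline and one for the body. This is in Python and everything about the markup is an assumption: the `<h1>` headline, the `article-content` div id, and the sample page are hypothetical, not the real ezine layout, so the tag names would need to be adjusted after looking at the actual page source.

```python
import re

# Hypothetical page layout; the real ezine markup is likely different.
sample_page = '''<html><body>
<div class="nav">junk before the article</div>
<h1>Corporate Video Production</h1>
<div id="article-content">
<p>First paragraph.</p>
<p>Second paragraph.</p>
</div>
<div class="footer">junk after the article</div>
</body></html>'''

# One pattern, two named groups. re.DOTALL lets "." match newlines,
# so the lazy ".*?" parts can span several lines.
article_pattern = re.compile(
    r'<h1[^>]*>(?P<headline>.*?)</h1>'                   # headline text
    r'.*?'                                               # anything in between
    r'<div id="article-content">(?P<body>.*?)</div>',    # article body
    re.DOTALL,
)

match = article_pattern.search(sample_page)
headline = match.group("headline").strip()
# Strip the remaining HTML tags from the body, leaving plain text.
body = re.sub(r"<[^>]+>", "", match.group("body")).strip()

article_text = headline + "\n\n" + body
print(article_text)
```

Note that the lazy `.*?</div>` stops at the first closing `</div>`, so this would cut the article short if the body contains nested divs.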
I want to understand regex better. I even downloaded a program called "regexmagic", but that didn't do any magic at all, it just got me banging my head on my keyboard.