Hello,
I am scraping websites through their sitemaps. To keep my databases of these websites' content up to date, I need to "compare" the files containing the sitemaps.
By "compare" I mean finding the differences between two files: the previously stored sitemap and the current one. Those differences contain the new URLs/pages I'll have to scrape.
For example, the script would find the 100 URLs in a 6-million-URL file that aren't already in the previously stored file. These are the new pages to scrape. It would then send those new URLs to a variable for the scraping script to process.
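To make the comparison concrete, here is a minimal sketch of it in Python, outside of Zenno. It assumes both sitemaps have already been flattened to plain text files with one URL per line; the function names `load_urls` and `find_new_urls` are hypothetical and only used for illustration.

```python
def load_urls(path):
    """Read a one-URL-per-line file into a set, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def find_new_urls(previous_path, current_path):
    """Return the URLs present in the current file but not in the previous one.

    The result keeps the order of the current file and drops repeats.
    """
    seen = load_urls(previous_path)  # everything already scraped
    new_urls = []
    with open(current_path, encoding="utf-8") as handle:
        for line in handle:
            url = line.strip()
            if url and url not in seen:
                seen.add(url)  # avoid reporting the same new URL twice
                new_urls.append(url)
    return new_urls
```

Only the previous file is loaded into a set; the current file is streamed line by line, so each lookup is a constant-time hash check rather than a scan of the whole list. With about 6 million URLs the set should need somewhere around a gigabyte of RAM or less, depending on URL length. The returned list is what would be handed to the scraping script.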
However, I haven't found any feature in Zenno text processing that can handle this.
How can I do it?
Thanks in advance.