WebArchiveMaster update (Web Archive parser).
1. Fixed a problem with Config.SFG - the minimum number of characters setting now works correctly.
2. Added a new configuration file, Kategory.SFG, which is responsible for excluding article announcements (short teasers). As a rule, it works only with standard CMSs and cuts off about 60% of unnecessary short announcements. The extra cleaning does reduce speed, though. By default Kategory.SFG contains "1" (enabled); to speed up parsing you can disable category cleaning by putting "0". The value can be changed while the project is running.
3. All data is now stored in a single folder, without the "www".
4. Adjusted the PHP script, but some junk will still get through: if the article text is short and the page carries a lot of other data (comments, advertising slogans that outweigh the text), the parser will inevitably capture them. If the text is more or less clean, everything unnecessary is cut off.
5. A new configuration file, Zapros.SFG. It is responsible for checking domains for availability: if a domain is live, it is skipped and the next one from the list is taken. The default is "1". If you think this loses too many domains, namely those that respond as if they were working but are in fact disabled or parked (the server returns no errors), you can put "0" and the domains will not be checked. This does, however, increase the likelihood of parsing texts that are obviously non-unique.
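
For anyone who wants to picture how the "1"/"0" switches in Kategory.SFG and Zapros.SFG behave, here is a rough Python sketch. It is not the code of WebArchiveMaster itself: the helper names `read_flag` and `domains_to_parse` are made up for illustration, the assumption that each .SFG file holds just a single digit comes only from the description above, and the actual availability check is passed in as a placeholder function.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def read_flag(path: Path, default: bool = True) -> bool:
    """Read a one-digit switch file: "1" is on, "0" is off.

    Assumption: the file holds only that digit. A missing or
    unexpected file falls back to the default (on).
    """
    try:
        value = path.read_text(encoding="utf-8").strip()
    except OSError:
        return default
    if value == "1":
        return True
    if value == "0":
        return False
    return default


def domains_to_parse(
    domains: Iterable[str],
    is_live: Callable[[str], bool],
    check_enabled: bool,
) -> List[str]:
    """Skip live domains when the check is on; keep all when it is off."""
    if not check_enabled:
        return list(domains)
    return [d for d in domains if not is_live(d)]
```

Calling `read_flag` again before each domain, rather than once at startup, is one simple way to get the "change the value while the project is running" behaviour.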
http://zennolab.com/discussion/threads/webarchivemaster-parser-vebarxiva.40540/page-5