Compare 2 lists and keep only the new lines

MarioBros

Client
Registered
19.03.2013
Messages
55
Thanks
6
Points
8
Hello,

Just for information, I already saw this topic: http://zennolab.com/discussion/showthread.php?6686-Compare-two-list&highlight=compare

I have 2 lists.

List 1 (Master-List.txt)
List 2 (New-List.txt)

Every day, I get new links and add them to Master-List.txt (List 1).
The new links of the day are written to New-List.txt (List 2).

But I need a "better" List 2 that contains only the links that are not already in List 1 (Master-List.txt).
For example, if I get 100 links today but only 50 are really new because I have never gotten them before, I want to write only these 50 new links to List 2.

I tried to create a project with the help of that topic, but there is something I don't understand. I think I need a loop to compare each line against the master list?

Thank you for your help.
 

bigcajones

Client
Registered
09.02.2011
Messages
1 216
Thanks
683
Points
113
Actually, the opposite. You need a loop that compares each line from List 2 against the merged List 1. Use a regex for the comparison. Or you can add everything to the master list and then remove the duplicates, which gives you a shiny new list.
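
A minimal sketch of the comparison described above, assuming both lists are already loaded as collections of strings. It swaps the regex for exact string matching through LINQ's `Except`, because links contain characters such as `.` and `?` that have a special meaning in regular expressions. The class and method names (`DailyLinks`, `OnlyNew`, `Merge`) are hypothetical and not part of ZennoPoster.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the names are illustrative only.
public static class DailyLinks
{
    // Lines of newList that do not occur in masterList.
    // Exact, case-sensitive match; repeated lines in newList are returned once.
    public static List<string> OnlyNew(IEnumerable<string> masterList, IEnumerable<string> newList)
    {
        return newList.Except(masterList, StringComparer.Ordinal).ToList();
    }

    // Master list with the genuinely new lines appended at the end.
    public static List<string> Merge(IEnumerable<string> masterList, IEnumerable<string> onlyNew)
    {
        return masterList.Concat(onlyNew).ToList();
    }
}
```

Calling `OnlyNew` first and `Merge` afterwards keeps the two results separate: one list with just today's new links, and one updated master list for tomorrow's run.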
 

rostonix

Well-known member
Registered
23.12.2011
Messages
29 067
Thanks
5 715
Points
113

MarioBros

In fact, I didn't understand everything you said in that topic, because you don't talk about a loop, and to me it doesn't seem possible to do this without one. But I'm not a programmer, so maybe I'm wrong.

To explain my problem a little better, let me start again:

Every day I scrape many links with Zenno (between 1000 and 2000). I must use each link only once, never twice or more.
So I created a "Master list" and add all the links to it every day. I want to use only the new links that I have never used before, which means that every day after scraping I need to compare the new links against the master list to find out which ones I have already used.

I hope my explanation is clearer this way.

So to do that I need to create a loop, right? I don't think there is any other way, is that correct?
If you could tell me step by step how to do that, it would be so cool :-)

Can Zenno handle this easily if I have thousands of links to check?
 

rostonix


bigcajones


MarioBros


Stroks

Client
Registered
09.02.2012
Messages
219
Thanks
14
Points
18
This is a much faster solution, although it still takes around 100 seconds to check a 30k-line list against a 10k-line master list. It goes through the Temp list and writes an element to a string only if that element isn't present in the master list. The master list is merged into a single string with a . (dot) between every two elements for faster execution.

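Code:
var sourceList = project.Lists["Temp"];
// Merged master list with . between elements. A dot is added at both ends
// so that the first and last elements can also match "." + keyword + ".".
string str = "." + Convert.ToString(project.Variables["master"].Value) + ".";
var liststring = new System.Text.StringBuilder();

lock(SyncObjects.ListSyncer)
{
    for(int i=0; i < sourceList.Count; i++)
    {
        var keyword = sourceList[i]; // get line from list
        // Keep the line only if the master string does NOT contain it.
        // Note: links contain dots themselves, so a link that is the tail of a
        // longer master entry (after one of its dots) is also treated as present.
        if (!str.Contains("." + keyword + ".")) liststring.Append(keyword).Append(Environment.NewLine);
    }
}

return liststring.ToString();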
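
For comparison, here is a hedged sketch of the same filter built on a `HashSet<string>` instead of a merged string. This is a swapped-in technique, not the method from the post above: each lookup is an exact match on the whole line, so a link cannot be mistaken for part of a longer master entry, and the check does not slow down as the master list grows. It assumes the lines are available as plain string collections; `NewLineFilter` and `Filter` are hypothetical names, and how the lists are loaded from the project is left out. Unlike the loop above, it also drops lines that repeat inside the temp list.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; the names are illustrative only.
public static class NewLineFilter
{
    // Returns the lines of tempLines that are not in masterLines,
    // one per line, in their original order.
    public static string Filter(IEnumerable<string> masterLines, IEnumerable<string> tempLines)
    {
        // Exact, case-sensitive set of everything already known.
        var seen = new HashSet<string>(masterLines, StringComparer.Ordinal);
        var result = new StringBuilder();

        foreach (var line in tempLines)
        {
            // Add returns false when the line is already in the set,
            // so master entries and repeats within tempLines are both skipped.
            if (seen.Add(line))
            {
                result.Append(line).Append(Environment.NewLine);
            }
        }
        return result.ToString();
    }
}
```

For example, with a master list of `a.com/x` and `www.b.com`, the temp line `b.com` is kept as new, because only whole lines are compared.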
 

shabbysquire

Client
Registered
25.11.2012
Messages
544
Thanks
26
Points
28
