Fast working on large lists

qlwik

Client
Registered
03.04.2013
Posts
207
Thanks
5
Points
18
Hi,
I need to work with large lists and I don't know how to do it fast.

For example, I have a list of 3M URLs and I want to keep 5 URLs from each domain and delete the rest. I tried doing this with load data from file, regex and output, and I also tried it with lists, but all of that is too slow; I would have to wait a year for this list to finish.

Is it possible to do this fast with ZP?
 

VladZen

Administrator
Forum team
Registered
05.11.2014
Posts
22,453
Thanks
5,913
Points
113
Which list operations did you use?
Did you try Delete lines > Matching regex, or something similar?
 

qlwik

I did it like this:
1. list bound to the large file -> 2. get the first line -> 3. extract the domain from it with a regex -> 4. loop repeated X times that gets and deletes a line containing the text (domain) and saves it to another list bound to a different file -> 5. delete all lines containing the text (domain) -> 6. go back to step 2.

X is a variable loaded from a file only once, at the beginning.
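
Step 3 above could be sketched in plain C# roughly as follows. This is only an illustration: the pattern and the names DomainExtractor and ExtractDomain are assumptions, not taken from the project in this thread. It treats everything between an optional scheme (plus an optional "www.") and the next '/', ':', '?' or '#' as the domain.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DomainExtractor
{
    // Hypothetical pattern: optional scheme, optional "www.",
    // then the host up to the first '/', ':', '?', '#' or whitespace.
    private static readonly Regex HostPattern = new Regex(
        @"^(?:[a-z][a-z0-9+.\-]*://)?(?:www\.)?([^/:?#\s]+)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the lower-cased host, or null when no host can be found.
    public static string ExtractDomain(string url)
    {
        if (string.IsNullOrWhiteSpace(url)) return null;
        Match m = HostPattern.Match(url.Trim());
        return m.Success ? m.Groups[1].Value.ToLowerInvariant() : null;
    }
}
```

The Regex object is built once and reused, which matters when the same pattern runs over millions of lines.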
 

VladZen

[Attached image: Get Line.png]
 

qlwik

[Attached image: Zrzut ekranu z 2019-04-24 16:50:02.png]
[Attached image: Zrzut ekranu z 2019-04-24 16:50:15.png]
[Attached image: Zrzut ekranu z 2019-04-24 16:50:25.png]
[Attached image: Zrzut ekranu z 2019-04-24 16:50:32.png]
 

qlwik

temp is the list bound to the big file.
 

VladZen

Hey. I showed you an easier way to get the lines containing a domain from the list. Did you try it?
 

qlwik

But that takes only one line, so I still need a loop. Or am I misunderstanding something?
 

VladZen

  • Thanks
Reactions: qlwik

EtaLasquera

Client
Registered
02.01.2017
Posts
526
Thanks
112
Points
43
I did something like this a while ago, but it worked with millions of e-mail addresses.
Code:
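var tbl = project.Tables["Table1"];
var lst = project.Lists["List1"];
try{
   // Repeat until every row of the table has been consumed.
   while (tbl.RowCount != 0){
     // The first remaining row defines the domain handled in this pass.
     string str = tbl.GetCell("A",0);
     int ini = str.IndexOf("@",0) + 1;      // index right after the @
     int end = str.IndexOf(".",ini) - ini;  // length up to the first dot after the @
     string domain = str.Substring(ini,end);
     List<int> found = new List<int>();     // indexes of all rows containing the domain
     int qtd = 0;                           // rows kept so far for this domain
     // Single scan: keep the first 5 matches, remember every match for deletion.
     for (int j = 0; j < tbl.RowCount; j++){
       string data = tbl.GetCell("A",j);
       if (data.Contains(domain)){
         if (qtd < 5){
           lst.Add(data);
           qtd++;
         }
         found.Add(j);
       }
     }
     // Row 0 always matches its own domain, so the table shrinks on every pass.
     tbl.DeleteRow(found);
   }
}
catch (Exception ex){
   // For example, a row without a dot after the @ makes Substring throw.
   project.SendErrorToLog(ex.Message);
}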
For domains, the key is how the "domain" substring is located in this part of the code:
Code:
int ini = str.IndexOf("@",0) + 1; // index right after the @ in the e-mail
int end = str.IndexOf(".",ini) - ini; // length up to the first dot after the @
string domain = str.Substring(ini,end); // for an address at yahoo.com this gives "yahoo"
For 1.4 million e-mails, that code needs 2 s to process the list.
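
For the URL case from the first post, a single pass with a per-domain counter may be worth trying, since it never rescans the list for each domain. Below is a minimal sketch in plain .NET, not using ZennoPoster list or table actions. The names DomainLimiter and KeepFirstPerDomain are made up for illustration, and it assumes every line is an absolute http or https URL; anything else is skipped. Hosts are compared as-is, so www.example.com and example.com count as different domains.

```csharp
using System;
using System.Collections.Generic;

public static class DomainLimiter
{
    // Keeps at most maxPerDomain URLs per host, preserving input order.
    // Assumption: lines that are not absolute http/https URLs are dropped.
    public static List<string> KeepFirstPerDomain(IEnumerable<string> urls, int maxPerDomain)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        var kept = new List<string>();
        foreach (string line in urls)
        {
            if (line == null) continue;
            Uri uri;
            if (!Uri.TryCreate(line.Trim(), UriKind.Absolute, out uri)) continue;
            if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps) continue;

            int seen;
            counts.TryGetValue(uri.Host, out seen); // seen stays 0 for a new host
            if (seen < maxPerDomain)
            {
                kept.Add(line);
                counts[uri.Host] = seen + 1;
            }
        }
        return kept;
    }
}
```

One way to use it would be to stream the input with File.ReadLines, pass the result of KeepFirstPerDomain(lines, 5) to File.WriteAllLines, and only then bind the output file to a list. Each line is looked at once, and the only growing state is one counter per domain plus the kept lines.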
 

Attachments

  • 14.6 KB, views: 111
  • Thanks
Reactions: Vvafel, Astraport and qlwik

qlwik

OK guys, thanks for the help. I will try both solutions.
 
