Ranges not Working with Tables?

JanPaul999 · 02.02.2014

I first tried: "read cell"...

I set the column to "A", and then experimented with ranges for the rows, like "1-5" or "0-end" or "all", but none of these would work...
I got the error "string was not in correct format".

OK, so I tried "take line" instead. I set it to "specify number" and gave it a range, like I did with "read cell".
Then I set "to variables" and stored Column A (the only column I need) in a variable. The problem here, though, is...

Even though I specified a range, it only puts 1 row of that column into my variable, but I need all of them. :S

So basically I have Excel files with a varying number of rows (usually a few hundred) and 10 columns. I'm trying to get all the values of column A into a list. I'm not interested in the other columns...

How can I do this?

And why are those methods above not working?
In the documentation it says you can use ranges with tables so I'm confused and have no idea why it won't work.

bigcajones · 02.02.2014

Maybe try something from this page...

http://zennolab.com/discussion/showthread.php?10734-Поиск-по-спискам-и-таблицам-через-C-макрос

JanPaul999 · 02.02.2014

I don't know C#, bigcajones. I'm now attempting to use regex to extract the column that I need from the CSV file.

My current (only half-working) match pattern is "http.*?(?=")", this gets all URLs in the CSV file.
The problem though is that I only want the first URL on each line, not the second one (each line has 2 URLs).

Can anyone modify this regex to grab only the first URL from each line in the CSV file? Again, my current match pattern is: http.*?(?=")

Would be awesome as I'm struggling with this.

Here is the format of the contents in the CSV file.

"SourceURL","AnchorText","SourceCitationFlow","SourceTrustFlow","FirstIndexedDate","LinkType","LinkSubType","TargetURL","TargetCitationFlow","TargetTrustFlow","FlagRedirect","FlagFrame","FlagNoFollow","FlagImages","FlagDeleted","FlagAltText","LastSeenDate","FlagMention","DateLost","ReasonLost"
"http://twb.oswshop11.nl/shop/pages.php?pageid=8","abc",53,0,08/04/2011 00:00,"TextLink","TextLink_Normal","http://www.abc.nl/",30,16,0,0,0,0,0,0,22/12/2013 00:00,0,,""
"http://likeur.startpagina.nl/","abc",26,24,22/10/2008 00:00,"TextLink","TextLink_Normal","http://www.abc.nl/",30,16,0,0,0,0,0,0,24/12/2013 00:00,0,,""

drvosjeca · 02.02.2014

Did you try this way?

http:.*?(?=",".*http)

It is simple to limit things if they always come in the same format.
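To illustrate what the lookahead in that pattern does, here is a minimal Python sketch, assuming the two sample rows posted above. The part `(?=",".*http)` only lets a match end at a `","` separator that still has another `http` later on the same line, so the last URL on a line is never returned. Python's `re` module is used purely for illustration and the variable names are made up; ZennoPoster uses .NET regex, which should treat this particular pattern the same way.

```python
import re

# The two sample rows from the post above, copied as-is.
csv_text = (
    '"http://twb.oswshop11.nl/shop/pages.php?pageid=8","abc",53,0,08/04/2011 00:00,'
    '"TextLink","TextLink_Normal","http://www.abc.nl/",30,16,0,0,0,0,0,0,22/12/2013 00:00,0,,""\n'
    '"http://likeur.startpagina.nl/","abc",26,24,22/10/2008 00:00,'
    '"TextLink","TextLink_Normal","http://www.abc.nl/",30,16,0,0,0,0,0,0,24/12/2013 00:00,0,,""\n'
)

# http:      start of a URL
# .*?        as few characters as possible (never crosses a line break)
# (?=",".*http)  stop only where a "," separator follows AND another
#                "http" appears later on the same line
pattern = re.compile(r'http:.*?(?=",".*http)')

first_urls = pattern.findall(csv_text)
for url in first_urls:
    print(url)
```

With these two rows, only the first URL of each line is printed, because the TargetURL has no further `http` after it on its line.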

JanPaul999 · 02.02.2014

drvosjeca wrote:
Did you try this way?

http:.*?(?=",".*http)

It is simple to limit things if they always come in the same format.

Awesome, thanks, that looks to be working. Can you explain the limiting part so I can figure this out myself next time?

JanPaul999 · 02.02.2014

Just tested it out and it's mostly working, but not completely, because in some instances there can be 3 URLs on 1 line, for example this line:

"http://www.kerkgebouwen-in-limburg.nl/view.jsp?content=12427","http://www.elkandre.nl/kapel/kapel.htm",21,10,24/09/2009 00:00,"TextLink","TextLink_Normal","http://www.elkandre.nl/Kapel/kapel.htm",9,6,0,0,0,0,0,0,01/12/2013 00:00,0,,""

Using that modified regex there are two links extracted from that, like this:

http://www.kerkgebouwen-in-limburg.nl/view.jsp?content=12427
http://www.elkandre.nl/kapel/kapel.htm",21,10,24/09/2009 00:00,"TextLink

Any idea how to fix that? I'm trying to understand the regex but not fully grasping it.

rostonix · 02.02.2014

You can still use "Read cell" action.
But do not use ranges within it.
Just create a loop and use counter's value in it.

lokiys · 02.02.2014

If I understand you correctly, you can do it like this:

Create a variable with a default value of 0.

Use that variable as the row number when you take column A; on the first pass it will be 0 (zero).

In the next action, save that value (A, row 0) to your predefined list.

Then add an action that increases your variable by 1, so it is now 1, and loop back to take A, row 1, and save it to your list again...

Hope that helps...
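For anyone who wants to see the counter loop outside ZennoPoster, here is a minimal Python sketch. It assumes the file is a comma-separated export with a header row, like the sample shown earlier in the thread; the function name `read_first_column` and the `skip_header` parameter are made up for this example.

```python
import csv
import io

def read_first_column(csv_text, skip_header=True):
    """Collect the value of column A (index 0) from every data row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    values = []
    # The counter plays the role of the ZennoPoster variable:
    # start at the first data row and increase by 1 on each pass.
    row_index = 1 if skip_header else 0
    while row_index < len(rows):
        row = rows[row_index]
        if row:  # skip blank lines
            values.append(row[0])
        row_index += 1
    return values

sample = (
    '"SourceURL","AnchorText","SourceCitationFlow"\n'
    '"http://likeur.startpagina.nl/","abc",26\n'
    '"http://twb.oswshop11.nl/shop/pages.php?pageid=8","abc",53\n'
)
print(read_first_column(sample))
```

Because the `csv` module parses the quoting itself, this version does not depend on how many URLs appear on a line.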

drvosjeca · 02.02.2014

I don't quite understand why you guys are trying to make this more complicated.

It's simple here... all strings in his case have the same structure no matter how many URLs are on each line, so the only thing you need to do is extend the regex a bit, and that is all.

Try this, Jan Paul, and let me know:

http://.*?(?=",".*",".*",")
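A quick way to see why the extended pattern helps is to run it on the 3-URL line from above. This is a minimal Python sketch for illustration only (variable names are made up, and ZennoPoster itself uses .NET regex). The extended lookahead requires three `","` separators after the match, and in this export layout only the first field on a line has that many following it.

```python
import re

# The problem row from the thread, copied as-is.
line = (
    '"http://www.kerkgebouwen-in-limburg.nl/view.jsp?content=12427",'
    '"http://www.elkandre.nl/kapel/kapel.htm",21,10,24/09/2009 00:00,'
    '"TextLink","TextLink_Normal","http://www.elkandre.nl/Kapel/kapel.htm",'
    '9,6,0,0,0,0,0,0,01/12/2013 00:00,0,,""'
)

# Earlier pattern: stops at any "," that still has an "http" after it.
old_matches = re.findall(r'http:.*?(?=",".*http)', line)

# Extended pattern: stops only at a "," that is followed by two more
# "," separators later on the line.
new_matches = re.findall(r'http://.*?(?=",".*",".*",")', line)

print(len(old_matches))  # 2, the second one is the over-long match
print(new_matches)
```

Note that this still relies on the column layout: if AnchorText itself ever contained `","`, the count of separators would be off.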

JanPaul999 · 03.02.2014

Thanks, works like a charm

JanPaul999 · 03.02.2014

I'm bumping up against another regex challenge now. In the HTML below I need to get the first link of this format (domain.com/download_results.php?i=82395&mode=pageranked) that comes after a specified domain in this format (www-domain-nl).

<tr class='row_color1'><td>82395</td>
<td>www-domain-nl.txt</td> // I need the first link after this domain text that has the structure of the mode=pageranked link below
<td>87</td>
<td>87</td>
<td>03 Feb</td>
<td>no</td>
<td>finished</td>
<td>
<table cellpadding='0' cellspacing='0'><tr>
<td>download: </td>
<td><a href='http://domain.com/download_results.php?i=82395&mode=all' class='download_link'>all</a></td><td> <a href='http://domain.com/download_results.php?i=82395&mode=pageranked' class='download_link'>pageranked</a></td></tr></table>
</td>
</tr>
<tr class='row_color2'><td>82394</td>
<td>www-domain2-nl.txt</td>
<td>42</td>
<td>42</td>
<td>03 Feb</td>
<td>no</td>
<td>finished</td>
<td>
<table cellpadding='0' cellspacing='0'><tr>
<td>download: </td>
<td><a href='http://domain.com/download_results.php?i=82394&mode=all' class='download_link'>all</a></td><td> <a href='http://domain.com/download_results.php?i=82394&mode=pageranked' class='download_link'>pageranked</a></td></tr></table>

Anyone know how to build this regex?

drvosjeca · 03.02.2014

I don't see a challenge here...

If this is all you have in your code, then take another look and you will see that in the first case you have that "(http removed for formatting purposes)" before the link and in the other case you don't... so that makes it simpler to get what you need.

All you need in that case is: (?<=\)).*?(?=')

Now, if there is more to this code and you just want to pick the link after a specific domain like you said, then all you need is the domain and a way to match across the line breaks.

In that case a regex like this should work: (?<=www-domain-nl[\w\W]*?\)).*?(?=')

No magic needed.

JanPaul999 · 03.02.2014

drvosjeca wrote:
If this is all you have in your code, then take another look and you will see that in the first case you have that "(http removed for formatting purposes)" before the link and in the other case you don't... so that makes it simpler to get what you need.

All you need in that case is: (?<=\)).*?(?=')

By "(http removed for formatting purposes)" I meant that the real source has http:// there, but the forum was turning it into a link, so I removed the http.

drvosjeca wrote:
Now, if there is more to this code and you just want to pick the link after a specific domain like you said, then all you need is the domain and a way to match across the line breaks.

In that case a regex like this should work: (?<=www-domain-nl[\w\W]*?\)).*?(?=')

No magic needed.

I tried that one on the above code, but it doesn't seem to work; it doesn't grab anything.

drvosjeca wrote:
I don't see a challenge here...

Yeah, it's probably really easy if you know regex well. For me it's been years since I worked with regex, so I'm very rusty in that department.

drvosjeca · 03.02.2014

Same thing, you just need to remove the brackets and add href in between...

(?<=www-domain-nl[\w\W]*?<a href.*href.*)http.*?pageranked(?=')
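The pattern above uses a variable-length lookbehind, which .NET regex supports but Python's standard `re` module does not. For anyone trying the same idea elsewhere, here is a minimal Python sketch that swaps the lookbehind for a capturing group. The function name `first_pageranked_link` is made up, and the HTML is a shortened copy of the sample posted earlier.

```python
import re

html = """<tr class='row_color1'><td>82395</td>
<td>www-domain-nl.txt</td>
<td>87</td>
<td><a href='http://domain.com/download_results.php?i=82395&mode=all' class='download_link'>all</a></td><td> <a href='http://domain.com/download_results.php?i=82395&mode=pageranked' class='download_link'>pageranked</a></td></tr></table>
<tr class='row_color2'><td>82394</td>
<td>www-domain2-nl.txt</td>
<td>42</td>
<td><a href='http://domain.com/download_results.php?i=82394&mode=all' class='download_link'>all</a></td><td> <a href='http://domain.com/download_results.php?i=82394&mode=pageranked' class='download_link'>pageranked</a></td></tr></table>
"""

def first_pageranked_link(page, domain_label):
    """Return the first mode=pageranked link after domain_label, or None."""
    # [\s\S]*? crosses line breaks lazily; [^']* keeps the capture
    # inside a single href value, so the mode=all link is skipped.
    pattern = re.escape(domain_label) + r"[\s\S]*?href='(http[^']*mode=pageranked)'"
    match = re.search(pattern, page)
    return match.group(1) if match else None

print(first_pageranked_link(html, "www-domain-nl"))
print(first_pageranked_link(html, "www-domain2-nl"))
```

The capture group does the job of the lookbehind: the domain text is matched but only the link inside the parentheses is returned.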
