This is quite a tough and long post, and I've already posted it in two forums to no avail. Either regexp is frustrating or I'm a noob; I think it's a bit of both.
1. I can't quite seem to catch the question mark in the first line. And it inexplicably captures the second string too, which doesn't make any sense to me, as I thought I had the lookbehind covered. Here are the two lines:
HTML:
<H3><A href="/question/index;_ylt=AuceFBRGAkkNJn5iiu3ZDYYjzKIX;_ylv=3?qid=20070704123624AA9H28e"><STRONG class=highlight>Accountant</STRONG>?</A></H3>
<P>...to do to get into university to be an <STRONG class=highlight>accountant</STRONG> ? what requirement do I need? how about the average...</P>
with:
Code:
(?<=<H3><A href="/question/index;\w+={[a-z, A-Z, 0-9]*_[a-z, A-Z, 0-9]|[a-z, A-Z, 0-9]*}*;_\w+=\d\?\w+=[a-z, A-Z, 0-9]*">){[a-z, A-Z, 0-9]* <STRONG class=highlight>[a-z, A-Z, 0-9]*</STRONG> [a-z, A-Z, 0-9]*\?|<STRONG class=highlight>[a-z, A-Z, 0-9]*</STRONG>[a-z, A-Z, 0-9]*|[a-z, A-Z, 0-9]*<STRONG class=highlight>[a-z, A-Z, 0-9]*</STRONG>|<STRONG class=highlight>[a-z, A-Z, 0-9]*</STRONG>}?(?=\<\</A></H3>)
in order to get only the first line. However, I get both, not as one match but as two. It baffles me why I'm picking up the second line too, because the pattern is clearly aimed only at the first line. Why it matches inside a <P> when the pattern clearly asks for an <A> beats me... I can't help thinking the issue may be somewhere in the spin (the {...|...} part), but the more I look, the more I feel I'm going to go nuts. As far as I can see, nothing inside the spin should lead to an error. And I've put the A and the H3 there, glaringly so, and yet it still matches it all. The key is in what it captures of the second line:
----------------------------------- match # 0 -----------------------------------
<STRONG class=highlight>Accountant</STRONG>
----------------------------------- match # 1 -----------------------------------
to do to get into university to be an <STRONG class=highlight>accountant</STRONG>
It seems to think there's an (?=\<\</A></H3>) after that </STRONG>, but all there is is a space, and besides, when I leave only letters with no spaces it comes up with the same result. And there's certainly no H3 to be seen, so I don't know what match 1 is referring to.
All the spin inside is there because I'm matching variations in a bigger file, which I've got covered. I'm also surprised I'm not picking up the question mark at the end of the text I'm looking for. I've tried sticking it all over the place, inside the spin, outside, with and without line breaks, to no avail. I would appreciate a hand, thanks.
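For what it's worth, here is one possible reading of the two matches, sketched with Python's re module. Several things here are assumptions: the post does not name its regex tool, Python only allows fixed-width lookbehind, and { } is not a grouping construct in most regex flavours, so the href and the lookbehind are shortened and the braces are dropped. The sketch keeps the same character class and the same shape (lookbehind, alternatives separated by a bare |, lookahead). Under those assumptions, each | splits the whole pattern, so the lookbehind guards only the first alternative and the lookahead only the last.

```python
import re

# The two sample lines from the post; the href is shortened for readability.
line1 = '<H3><A href="/question/index?qid=1"><STRONG class=highlight>Accountant</STRONG>?</A></H3>'
line2 = ('<P>...to do to get into university to be an '
         '<STRONG class=highlight>accountant</STRONG> ? what requirement do I need?</P>')
text = line1 + "\n" + line2

CLS = r"[a-z, A-Z, 0-9]*"  # the character class used in the post
STRONG = r"<STRONG class=highlight>" + CLS + r"</STRONG>"

# Simplified stand-in for the posted pattern. Without a group, each |
# splits the WHOLE pattern: the lookbehind belongs only to the first
# alternative and the lookahead only to the last one.
ungrouped = (
    r'(?<=">)' + CLS + " " + STRONG + " " + CLS + r"\?"
    + "|" + STRONG + CLS
    + "|" + CLS + STRONG
    + "|" + STRONG + r"(?=</A></H3>)"
)
loose_matches = re.findall(ungrouped, text)

# Wrapping the alternatives in (?:...) makes both lookarounds apply to
# every alternative; \?? after the group picks up the optional question mark.
grouped = (
    r'(?<=">)(?:'
    + CLS + " " + STRONG + " " + CLS
    + "|" + STRONG + CLS
    + "|" + CLS + STRONG
    + r")\??(?=</A></H3>)"
)
tight_matches = re.findall(grouped, text)

print(loose_matches)
print(tight_matches)
```

In this sketch the ungrouped pattern returns the same two strings as match 0 and match 1 above, while the grouped one returns only the first line's text, including the trailing question mark. The doubled \<\< in the posted lookahead would probably also need a second look, since it appears to ask for two < characters in a row.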
2. I'm having trouble matching line breaks of a certain kind. I'd like to match every line that comes directly before a line containing a phrase like '0 stars', '1 star' and so on.
An example of this is the following:
Sue an accountant who filed your taxes incorrectly when penalty is involved?
An accountant who handled...it ok to sue the accountant for the penalty?
0 Stars In United States - Asked by monaya - 6 answers - 3 years ago
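I want the line in the middle. So I thought about the following:
Code:
(?<=\^)$(?=\^\\d)
Without the inexplicable double escape in front of the d, that is: (?<=\^)$(?=\^\d)
But it doesn't work. I tried a ? in front of the line break, like this:
Code:
(?<=\?\^)$(?=\^\\d)
and without the escape, but that didn't work either. What am I doing wrong?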
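A minimal sketch of one way to get the middle line, again in Python's re module as an assumed stand-in for whatever tool is being used. It also assumes the lines are separated by \n and that the rating line starts with digits followed by "Star" or "Stars". In most flavours \^ is a literal caret character rather than a start-of-line anchor, so the patterns above would be looking for a ^ that is not in the text.

```python
import re

# The three sample lines from the post, joined with \n (an assumption
# about the file's line endings).
text = (
    "Sue an accountant who filed your taxes incorrectly when penalty is involved?\n"
    "An accountant who handled...it ok to sue the accountant for the penalty?\n"
    "0 Stars In United States - Asked by monaya - 6 answers - 3 years ago"
)

# The posted pattern with a single escape: \^ demands a literal caret
# character, which the sample text does not contain.
literal_caret = re.search(r"(?<=\^)$(?=\^\d)", text, re.MULTILINE)

# Unescaped ^ and $ act as line anchors under re.MULTILINE; the line
# break itself is written as \n inside the lookahead.
before_stars = re.compile(r"^.*$(?=\n\d+ stars?\b)", re.MULTILINE | re.IGNORECASE)
lines = before_stars.findall(text)

print(literal_caret)
print(lines)
```

Here the search with the escaped caret finds nothing, and the anchored pattern returns only the middle line. If the file uses \r\n line endings, the lookahead would need adjusting.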