is the back reference supported?

Tips on writing regular expressions for searching the post list

Moderators: Quade, dexter

Postby varfick » Sat Dec 12, 2015 10:38 am

I'm trying to filter out some spam that generally has a long, cryptic filename string repeated twice in the subject.
A pattern like
Code:
^(\w{0,4}\d{6}\w{9})\[\d{1,3}/\d{1,3}\] - "\1\.(rar|zip)" yEnc$
does not work: it returns all the posts instead of just the matches.
I am probably using the wrong syntax for the back reference (\1), but I could not find out exactly which regexp flavor NB uses.
Can anyone help?

Regards!
Varfick
varfick
n00b
Posts: 6
Joined: Fri Dec 11, 2015 3:29 pm

Re: is the back reference supported?

Postby Quade » Sat Dec 12, 2015 11:54 am

The one gotcha is that you can't use spaces. Spaces have special meaning. If you intend to match on spaces use \s or [ ]

Spaces act like "and". I'd try it with all the spaces compressed out and then tell me if it works for you.
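
One way to follow the advice above is to rewrite every literal space as \s before pasting the pattern in. The sketch below uses Python's re module only to check that the rewritten pattern still describes the same subject line; Python's engine is not the one Newsbin uses, and escape_spaces is a hypothetical helper name. Note that \s accepts any whitespace character, not just a space, and the sketch cannot confirm which Newsbin fields accept the result.

```python
import re


def escape_spaces(pattern: str) -> str:
    """Hypothetical helper: replace each literal space in a regex with \\s."""
    return pattern.replace(" ", r"\s")


# The pattern as first posted, with raw spaces in it.
raw = r'^(\w{0,4}\d{6}\w{9})\[\d{1,3}/\d{1,3}\] - "\1\.(rar|zip)" yEnc$'
safe = escape_spaces(raw)

# A subject line of the kind the filter is meant to catch.
subject = 'SWH6929002tuSeommSE[04/10] - "SWH6929002tuSeommSE.rar" yEnc'

print(safe)                             # the pattern with no raw spaces left
print(bool(re.search(safe, subject)))   # whether Python's engine matches it
```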
Quade
Eternal n00b
Posts: 44999
Joined: Sat May 19, 2001 12:41 am
Location: Virginia, US

Registered Newsbin User since: 10/24/97

Re: is the back reference supported?

Postby varfick » Sat Dec 12, 2015 12:28 pm

Quade wrote: The one gotcha is that you can't use spaces. Spaces have special meaning. If you intend to match on spaces use \s or [ ]

Spaces act like "and". I'd try it with all the spaces compressed out and then tell me if it works for you.

Oops! I completely forgot about those ANDy spaces =)
In the filter window, for
Code:
^(\w{2,4}\d{6}\w{9})\[\d{1,3}/\d{1,3}\]\s-\s"\1\.(zip|rar)"\syEnc$
I get "test text accepted - will be included" even if I put "garbage" in as the test string.
If I change the spaces to either \s or [ ] in the search field, I get "Invalid search string".

My NB is version 6.62(b4358).

Re: is the back reference supported?

Postby Quade » Sat Dec 12, 2015 1:13 pm

I can't tell what you're trying to match from the RE, so my suggestion is to take the RE apart and try it one section at a time. I typically test them by pulling up a list of posts and using the search entry at the top of the list.

- Putting the $ on the end means even a space after the "yEnc" will make it not match.

- The quotes probably need to be escaped.

Does it work if you just paste the RE in both places? I mean try it without a back reference.
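
The "one section at a time" approach can also be scripted outside Newsbin. The following is a minimal sketch in Python, assuming the pattern is split by hand into balanced fragments; probe and the fragment list are illustrative names, and Python's re engine stands in for whatever Newsbin uses, so a fragment that passes here may still be rejected there. Each step appends one fragment to the prefix and reports whether the prefix compiles and matches a sample subject.

```python
import re

# The pattern, cut by hand into fragments whose brackets are balanced,
# so that every cumulative prefix is a complete expression.
fragments = [
    r'^(\w{0,4}\d{6}\w{9})',
    r'\[\d{1,3}/\d{1,3}\]',
    r'\s-\s',
    r'"\1\.(rar|zip)"',
    r'\syEnc$',
]

subject = 'SWH6929002tuSeommSE[04/10] - "SWH6929002tuSeommSE.rar" yEnc'


def probe(parts, text):
    """Grow the pattern one fragment at a time and record what happens."""
    results = []
    prefix = ""
    for part in parts:
        prefix += part
        try:
            matched = re.search(prefix, text) is not None
            status = "match" if matched else "no match"
        except re.error as exc:
            # The newest fragment made the pattern invalid for this engine.
            status = f"compile error: {exc}"
        results.append((part, status))
    return results


for part, status in probe(fragments, subject):
    print(f"{status:<10} after adding {part}")
```

The first fragment whose status is not "match" is the place to look more closely.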

Re: is the back reference supported?

Postby varfick » Sun Dec 13, 2015 9:11 am

I've done that. Both parts match fine without the anchors when the back reference in the second part is replaced by the whole first piece (\w{0,4}\d{6}\w{9}).
The idea is to have the pattern
Code:
^(\w{0,4}\d{6}\w{9})\[\d{1,3}/\d{1,3}\]\s-\s"\1\.(rar|zip)"\syEnc$
match the second of these two strings but not the first, for example:
Code:
(NordicAlbino)[04/10] - "SWH6929002tuSeommSE.rar" yEnc
SWH6929002tuSeommSE[04/10] - "SWH6929002tuSeommSE.rar" yEnc

I also debugged the regex in RegexBuddy (std::regex, VC++2015, case insensitive), and it works there. That's why I asked which regex flavor you use in NB...
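
The intended behaviour can be reproduced with any engine that supports numbered back references. Below is a small sketch in Python's re module (a different flavor from both std::regex and Newsbin's engine, so it only illustrates the intent), run case-insensitively against the two sample subjects from this post.

```python
import re

# Same pattern as in the post; group 1 must reappear inside the quotes.
pattern = re.compile(
    r'^(\w{0,4}\d{6}\w{9})\[\d{1,3}/\d{1,3}\]\s-\s"\1\.(rar|zip)"\syEnc$',
    re.IGNORECASE,
)

subjects = [
    '(NordicAlbino)[04/10] - "SWH6929002tuSeommSE.rar" yEnc',
    'SWH6929002tuSeommSE[04/10] - "SWH6929002tuSeommSE.rar" yEnc',
]

# True only where the filename string occurs both before the counter
# and inside the quotes.
results = [bool(pattern.search(s)) for s in subjects]
print(results)  # [False, True]
```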

Re: is the back reference supported?

Postby Quade » Sun Dec 13, 2015 11:48 am

http://www.pcre.org/


SWH6929002tuSeommSE[04/10] - "SWH6929002tuSeommSE.rar" yEnc

^([A-Za-z]{2,6}[0-9]{4,10}[A-Za-z]{2,12})\[[0-9]{1,4}\/[0-9]{1,4}\].*\1([.]rar|[.]zip).*yEnc

^([A-Za-z]{2,6}[0-9]{4,10}[A-Za-z]{2,12})\[[0-9]{1,4}\/[0-9]{1,4}\].*\"\1([.]rar|[.]zip)\".*yEnc

These worked in the RE tester, but Newsbin doesn't like these REs for some reason.

I'll have to look into it more deeply once Newsbin is building again.

^([A-Za-z]{2,6}[0-9]{4,10}[A-Za-z]{2,12})\[[0-9]{1,4}\/[0-9]{1,4}\].*

It's good up to here but then when I add \1 to the end it fails to compile.

^([A-Za-z]{2,6}[0-9]{4,10}[A-Za-z]{2,12})\[[0-9]{1,4}\/[0-9]{1,4}\].*\1

Re: is the back reference supported?

Postby varfick » Sun Dec 13, 2015 2:02 pm

Gotcha! You use PCRE with numbered capture groups disabled! =))
The modified expression, with a named capture group and a reference by name,
Code:
^(?<first>(\w{0,4}\d{6}\w{9}))\[\d{1,3}/\d{1,3}\]\s-\s"\k<first>\.(rar|zip)"\syEnc$
does the trick. The case is closed.
Thanks for your help and an excellent product!

Regards,
Max
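
For anyone testing the named-group variant outside Newsbin: the spelling differs between flavors. The sketch below uses Python's re module, where a named group is written (?P<first>...) and referenced with (?P=first); PCRE accepts this spelling as well as the (?<first>...) and \k<first> forms used above. It illustrates the technique only and says nothing about why numbered references fail in Newsbin.

```python
import re

# Named group "first" captures the filename string; (?P=first) requires
# the same text to appear again inside the quotes.
named = re.compile(
    r'^(?P<first>\w{0,4}\d{6}\w{9})\[\d{1,3}/\d{1,3}\]'
    r'\s-\s"(?P=first)\.(rar|zip)"\syEnc$'
)

spam = 'SWH6929002tuSeommSE[04/10] - "SWH6929002tuSeommSE.rar" yEnc'
other = '(NordicAlbino)[04/10] - "SWH6929002tuSeommSE.rar" yEnc'

m = named.search(spam)
print(m.group("first") if m else None)   # the repeated filename string
print(named.search(other) is None)       # the other subject is left alone
```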

Re: is the back reference supported?

Postby Quade » Sun Dec 13, 2015 4:56 pm

As far as I can tell, it's not disabled in Newsbin. That's why I have to look at it more.

