New Regex Request > Paying 500fg For Answer And Explanation
Member
Posts: 9,231
Joined: Jan 10 2012
Gold: 2,980.00
Apr 18 2019 05:40pm
Hi guys! I'm trying to adapt a regex to a new requirement, but I'm having trouble. Here's the original regex I had:

Code
(?<=^([^"\r\n]|"([^"\\\r\n]|\\.)*")*)expressionToBeCapturedButNotInsideQuotes


This regex handles literal " characters. Now I need to convert it to match an expression anywhere except between &quot; entities (escaped ones, i.e. \&quot;, don't count as delimiters). The main problem is that the original expression relies on character classes such as [^"], which only work for a single-character delimiter and not for the multi-character sequence &quot;.
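For reference, here is a minimal C# sketch of how the original lookbehind behaves on plain " quotes. The class and method names are made up for illustration; only the pattern comes from the post above. It assumes a single-line input, since ^ only anchors at the start of the string without RegexOptions.Multiline.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OriginalLookbehindDemo
{
    // The lookbehind requires everything between the start of the input and
    // the match to be either a non-quote character or a complete "..." string
    // (backslash escapes allowed inside), so a match can never sit inside quotes.
    private const string Pattern =
        @"(?<=^([^""\r\n]|""([^""\\\r\n]|\\.)*"")*)test";

    // Hypothetical helper: returns the start index of every match.
    public static List<int> MatchIndexes(string input)
    {
        var result = new List<int>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            result.Add(m.Index);
        }
        return result;
    }

    public static void Main()
    {
        // The middle "test" is inside quotes, so only the first and last match.
        Console.WriteLine(string.Join(", ", MatchIndexes("test \"test\" test")));
        // prints: 0, 12
    }
}
```

This only works because [^"] can describe "anything that is not the delimiter" one character at a time, which is exactly what breaks once the delimiter becomes &quot;.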

Expected behavior if we're looking to match the word "test" (red means it should match):

&quot; test
test &quot;
\&quot; test
test \&quot;
&quot; test &quot; test &quot; ----> This is considered as an opening quote, so the preceding "test" should match.
&quot; \&quot; test &quot; test &quot; ----> This is considered as an opening quote, so the preceding "test" should match.

This is the current regex I have:

Code
(?<!(?<!\\)&quot;.*?)(?!.*?(?<!\\)&quot;)test


Current behavior:

&quot; test
test &quot;
\&quot; test
test \&quot;
&quot; test &quot; test &quot;
&quot; \&quot; test &quot; test &quot;


Can someone help me accomplish what I need and explain the resulting regex to me, please? I'm paying 500fg to the first person who gives me the right regex with an explanation.

Thanks!


NOTE: If you find a regex where &quot; test doesn't match (even though I would like it to match in this case), it's not that bad, but test &quot; shouldn't match.

This post was edited by Access on Apr 18 2019 05:53pm
Member
Posts: 2,903
Joined: Aug 25 2009
Gold: 170.00
Apr 22 2019 10:43am
You srsly need a parser, not a regex.

Member
Posts: 6,988
Joined: Apr 16 2019
Gold: 50.00
Apr 22 2019 02:12pm
What language are you programming in? If I could see a full block of text + know the language it would help me out a bit.

Not to say your examples weren't clear and detailed, I just don't know if regex is the best approach to this.

(As Free mentioned with the parser, there may be a cleaner path to go down)

This post was edited by 3oDAtlas on Apr 22 2019 02:13pm
Member
Posts: 9,231
Joined: Jan 10 2012
Gold: 2,980.00
Apr 22 2019 03:27pm
I have no doubt that a parser would be the best possible solution if there were no restrictions. I asked my teacher and he said there's no need for a parser (for this particular request). He also told me that using a parser is not the point of the required work.

I have the rest of the project figured out. I only need to find a way to ignore an expression between &quot; entities.

As the first regex shows, character classes make it possible to ignore matches between " characters, but is it possible to do the same with &quot;?

Language is C#

As for the code sample, I'll post one in a couple of hours (or possibly minutes).
Member
Posts: 6,988
Joined: Apr 16 2019
Gold: 50.00
Apr 22 2019 03:58pm
Quote (Access @ Apr 22 2019 05:27pm)
I have no doubt that a parser would be the best possible solution if there were no restrictions. I asked my teacher and he said there's no need for a parser (for this particular request). He also told me that using a parser is not the point of the required work.

I have the rest of the project figured out. I only need to find a way to ignore an expression between &quot; entities.

As the first regex shows, character classes make it possible to ignore matches between " characters, but is it possible to do the same with &quot;?

Language is C#

As for the code sample, I'll post one in a couple of hours (or possibly minutes).


Alright, I getcha. I'll look forward to your update!
Member
Posts: 9,231
Joined: Jan 10 2012
Gold: 2,980.00
Apr 22 2019 07:29pm
So here is a code sample to test:

Code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
using System.Net;
using System.Web;

namespace HtmlEncodeTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Match "test" only when everything before it on the line is
            // unquoted text or complete "..." strings.
            string pattern = "(?<=^([^\"\r\n]|\"([^\"\\\\\r\n]|\\\\.)*\")*)test";
            string test = "/*Ceci*/ est // un string test \"test new et un autre test";

            // testfffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff

            List<string> wordsToMatch = new List<string>
            {
                "test",
                "est"
            };

            foreach (string word in wordsToMatch)
            {
                // Same lookbehind, applied to each whole word in turn.
                pattern = $"\\b(?<=^([^\"\r\n]|\"([^\"\\\\\r\n]|\\\\.)*\")*){word}\\b";
                string color = @"hsl(240, 100%, 50%)";

                // Wrap every match found outside quotes in a colored span.
                test = Regex.Replace(test, pattern, match => Span(match, color));
            }

            Console.WriteLine(test);

            //MatchCollection matches = Regex.Matches(test, pattern, RegexOptions.Multiline);

            //foreach(Match match in matches)
            //{
            //    string replaced = Regex.Replace(test, pattern, new MatchEvaluator(Span));
            //}

            //Console.WriteLine(WebUtility.HtmlEncode("Enter a string having '&', '<', '>' or '\"' in it: "));
        }

        // Wraps the matched text in a <span> with the given CSS color.
        public static string Span(Match match, string color)
        {
            StringBuilder sb = new StringBuilder(match.Value);
            sb.Insert(0, $"<span style='color: {color}'>");
            sb.Append("</span>");

            return sb.ToString();
        }
    }
}
Member
Posts: 9,231
Joined: Jan 10 2012
Gold: 2,980.00
Apr 25 2019 06:10pm
Closed! I actually found something that suited my needs in this article: https://www.rexegg.com/regex-best-trick.html

Basically, the trick is to match both what you don't want and what you want, but wrap only what you want in a capture group. Then, when you get a match, you read the group at index 1 (the first capture group; index 0 is the whole match). If group 1 is empty, the match was one of the unwanted alternatives and can be ignored. Like so:

Code
match.Groups[1].Value


The regex logic is:

Code
theExpressionYouDon'tWantToMatch|(theExpressionYouWantToMatch)


You can also list more than one expression that you don't want to match.

Example:

Code
Expression1|Expression2|Expression3|(ExpressionToMatch)
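As a rough sketch of how that trick could be applied to the &quot; problem from the first post: the pattern below is an assumption written for illustration, not necessarily the regex that was finally used, and the class and method names are hypothetical. It also assumes that a &quot; with no closing partner on the same line does not start a quoted region.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotSkipDemo
{
    // Alternatives are tried left to right at each position:
    //   1. \\&quot;                           an escaped entity, consumed so it
    //                                         can never open or close a region
    //   2. &quot;(?:\\.|(?!&quot;).)*&quot;   a complete &quot;...&quot; region,
    //                                         matched and then thrown away
    //   3. (\btest\b)                         the wanted word, kept in group 1
    private const string Pattern =
        @"\\&quot;|&quot;(?:\\.|(?!&quot;).)*&quot;|(\btest\b)";

    // Returns the start index of every "test" found outside a quoted region.
    public static List<int> FindOutsideQuot(string input)
    {
        var hits = new List<int>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // Group 1 only succeeds when the third alternative matched.
            if (m.Groups[1].Success)
            {
                hits.Add(m.Groups[1].Index);
            }
        }
        return hits;
    }

    public static void Main()
    {
        string[] samples =
        {
            @"&quot; test",
            @"test &quot;",
            @"\&quot; test",
            @"test \&quot;",
            @"&quot; test &quot; test &quot;",
            @"&quot; \&quot; test &quot; test &quot;"
        };

        foreach (string sample in samples)
        {
            List<int> hits = FindOutsideQuot(sample);
            Console.WriteLine($"{sample} -> [{string.Join(", ", hits)}]");
        }
    }
}
```

With this sketch, the first four samples each report their single "test", and the last two report only the second "test", because the first one sits inside a closed &quot;...&quot; pair. Putting the unwanted alternatives first matters: the regex engine consumes the whole quoted region before it ever gets a chance to see the word inside it.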
Member
Posts: 12,703
Joined: May 17 2013
Gold: 12,935.00
Apr 26 2019 10:24am
Quote (Access @ 26 Apr 2019 02:10)
Closed! I actually found something that suited my needs in this article: https://www.rexegg.com/regex-best-trick.html

Basically, the trick is to match both what you don't want and what you want, but wrap only what you want in a capture group. Then, when you get a match, you read the group at index 1 (the first capture group; index 0 is the whole match). If group 1 is empty, the match was one of the unwanted alternatives and can be ignored. Like so:

Code
match.Groups[1].Value


The regex logic is:

Code
theExpressionYouDon'tWantToMatch|(theExpressionYouWantToMatch)


You can also list more than one expression that you don't want to match.

Example:

Code
Expression1|Expression2|Expression3|(ExpressionToMatch)


You're already past regular expressions when you do that. It's not regular :-)

You basically followed my first advice and went a step closer to a parser that uses groups of regular expressions to catch the substrings you are looking for.

This post was edited by Klexmoo on Apr 26 2019 10:25am
Member
Posts: 23,719
Joined: Aug 21 2007
Gold: 433.48
Trader: Trusted
Apr 26 2019 11:23am
I mean, you can just filter out any line containing the word quot and search the remaining ones for the word test.

regex:

Code
^((?!quot).)*$


will match only the lines without quot in them:

Code
&quot; test
test &quot;
\&quot; test
test \&quot;
&quot; test &quot; test &quot;
&quot; \&quot; test &quot; test &quot;
test
test


Codewise, it could look as follows:

Code
read line do
    if line matches ^((?!quot).)*$
        if line matches .*test.*
            do whatever you wanna do
        fi
    fi
done


This post was edited by Meridius on Apr 26 2019 11:23am
Member
Posts: 9,231
Joined: Jan 10 2012
Gold: 2,980.00
Apr 26 2019 12:29pm
Quote (Klexmoo @ Apr 26 2019 11:24am)
You're already past regular expressions when you do that. It's not regular :-)

You basically followed my first advice and went a step closer to a parser that uses groups of regular expressions to catch the substrings you are looking for.


Well, the point was to avoid using a parser, so if you're telling me that I achieved a similar behavior, I guess I accomplished what I wanted :P

Quote (Meridius @ Apr 26 2019 12:23pm)
I mean, you can just filter out any line containing the word quot and search the remaining ones for the word test.

regex:

Code
^((?!quot).)*$


will match only the lines without quot in them:

Code
&quot; test
test &quot;
\&quot; test
test \&quot;
&quot; test &quot; test &quot;
&quot; \&quot; test &quot; test &quot;
test
test


Codewise, it could look as follows:

Code
read line do
    if line matches ^((?!quot).)*$
        if line matches .*test.*
            do whatever you wanna do
        fi
    fi
done


That wasn't what I was looking to do. If so, it would have been so simple haha.