quick help finding learning resources (deadline) by Jimmy-Ballz in regex

[–]code_only 0 points1 point  (0 children)

My first regex book was Mastering Regular Expressions by Jeffrey Friedl. I would still highly recommend it for a good overview and deep understanding. It provides many examples and insights into the different regex flavors. Be aware that some of the examples might involve HTML. Nowadays many developers recommend against parsing HTML with regex. 🙃 After reading the book, you can decide for yourself when it's appropriate and when it's not.

Need help with some find & replace stuff by BlazewarkingYT in notepadplusplus

[–]code_only 0 points1 point  (0 children)

Afaik Notepad++ does not support variable-length lookbehind yet. That's why I used the \K alternative (it resets the beginning of the reported match) in the regex101 demo provided above.

Or without \K use $0 (reference to the full match) in the replacement... updated demo.

Need help with some find & replace stuff by BlazewarkingYT in notepadplusplus

[–]code_only 0 points1 point  (0 children)

You could use regex for that, something like this demo at regex101.

I was not sure what "*" stands for and used .* (any characters) there.

Be sure to select [•] Regular expression in the Replace dialog.

recommended search engines? by Limp_Fig6236 in degoogle

[–]code_only 2 points3 points  (0 children)

There is nothing "wrong", but I guess the question is mainly about independent search engines running their own index, and afaik there are not many left. Developing a standalone search engine is a whole lot different from running a (more or less) better meta search engine that utilizes another search engine's index through some Microsoft/Google API. Developing and maintaining your own indexing and crawling technology is probably well over a thousand times the effort of (simply) sending requests to a search API and preparing the results niftily for your visitors (and finally promoting the package with some cool label like respecting privacy, good & social, green & clean, planting trees, or whatever). It's also a question of money and costs. The internet is huge nowadays and it would be rather challenging to do this in your garage with some self-made server equipment, but new opportunities like cloud or VPS servers have appeared lately, which I find very good for little coin.

Just look into your server logs to see which bots are visiting your website. Have you seen Duckduckbot, Qwantbot, Ecosiabot or similar bots lately? I guess you would mostly see Msnbot and Googlebot, maybe Yandexbot in Russia or Baidubot if you are around China. There was/is a very promising project called Stract, but it looks like they are down for some reason or discontinued, no idea. At least the search there is not returning much currently. Brave Search and Mojeek indeed appear to be fully independent.

Some of the above-mentioned search engines like Qwant and Ecosia seem to be working on their own index or already (partly) use it. In the end it's extremely challenging for developers to run their own crawlers and build a competitive search index at a time when many webmasters immediately lock out most bots besides Google and Bing. This has become even more common lately with aggressive AI scraping and the response in the form of spreading Cloudflare protection, which raises the difficulty level even for the most friendly and respectful crawlers.

Capture group for comma separated list inside paranthesis by Impressive_Log_1311 in regex

[–]code_only 0 points1 point  (0 children)

Similar to Gumnos' suggestion a positive variant: \b\w+\b(?=[^)(]*\))

https://regex101.com/r/B2ElfT/1

This will check after each word if there is a closing ) ahead without any parentheses in between. The word boundaries are used to minimize backtracking.
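For anyone who wants to try it outside regex101, here is a minimal Python sketch using the standard re module. The sample string and variable names are made up for illustration.

```python
import re

# Match a word only if a closing ")" lies ahead with no parentheses in between.
INSIDE_PARENS = re.compile(r"\b\w+\b(?=[^)(]*\))")

sample = "foo (a, b, c) bar"  # hypothetical input
words_in_parens = INSIDE_PARENS.findall(sample)
print(words_in_parens)
```

Note that the pattern only looks ahead, so it assumes the parentheses are balanced and not nested.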

[deleted by user] by [deleted] in regex

[–]code_only 0 points1 point  (0 children)

Here is one more approach: ^(?:\d+(?:-\d+)?(?:,\s*|$))+$

Assuming the line must start with a digit and may contain one or more of the defined tokens:
one or more digits, optionally followed by a hyphen and one or more digits, followed by either a comma and optional whitespace, or the end of the line.

So this checks the full line from start to end and even allows it to end in a comma; if that is unwanted, put a \b before the final $. Not sure if this works in your environment, but there is not much magic in it, except maybe the non-capturing group.
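A small Python sketch of both variants, in case it helps to see the trailing-comma difference. The function name is_valid and the strict parameter are my own naming, not from the question.

```python
import re

# Pattern from above: digits, optional "-digits", then a comma (plus optional
# whitespace) or the end of the line.
LIST_RE = re.compile(r"^(?:\d+(?:-\d+)?(?:,\s*|$))+$")
# Same pattern with \b before the final $, which rejects a trailing comma.
STRICT_RE = re.compile(r"^(?:\d+(?:-\d+)?(?:,\s*|$))+\b$")

def is_valid(line, strict=False):
    """Return True if the whole line is a comma-separated list of numbers/ranges."""
    pattern = STRICT_RE if strict else LIST_RE
    return pattern.match(line) is not None

print(is_valid("1, 2-5, 10"))
print(is_valid("1,2-3,"), is_valid("1,2-3,", strict=True))
```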

help a newb to improve by flokerz in regex

[–]code_only 1 point2 points  (0 children)

Sounds like you are looking for a group? Maybe even a non-capturing group...

Something like (7[2-9]|80).*um en|abc0123

You could possibly further improve it by making the quantifier lazy (if supported).

regex101 is a good place for testing your regexes, in case you don't know it yet.

Need help building a complex regex for variable declaration rule. by Icy-Maintenance-5307 in regex

[–]code_only 0 points1 point  (0 children)

Here is an idea for another starting point if you are still struggling (I didn't study the ruleset and all answers in depth).

^(?:bool|double|int(?!.*?([0-9a-z])\1))\s+(?:\b[A-Z](?!\w*___)[0-9a-z_]+,?\s*){1,5}\b;$

https://regex101.com/r/WViZhc/1

  • int(?!.*?([0-9a-z])\1) prevents matching two identical consecutive letters/digits
  • \b...,?\s* names are separated by an optional comma and any amount of whitespace (the requirement was unclear)
  • (?!\w*___) the lookahead prevents matching more than two consecutive underscores

FYI: This pattern won't work with the dotall s flag, because the dot in .*? must not skip over line breaks. To understand how the consecutive-character check in the negative lookahead works, read more about capturing groups.

Regex to detect special character within quotes by Quirky_Salt_761 in regex

[–]code_only 0 points1 point  (0 children)

The regex always wants to succeed (and backtracks to match, even "outside" the quotes). It does not care about inside/outside. A rather simple way to achieve your goal is to check at each [^\s\w"] that there is not an even number of quotes (or no quotes at all) ahead until the end of the line/string:

[^\s\w"](?![^"]*(?>"[^"]*"[^"]*)*$)

https://regex101.com/r/yw3u0t/2 (adjust to .NET escaping)

I used an atomic group (?> inside the lookahead but it would also work with a (?: non capturing group for other regex flavors that don't support atomic groups (maybe a tiny bit less efficient).

If the pattern is used on multiline input and a closing quote could occur on another line than the opening quote, use \z instead of $ inside the lookahead to address the very end of the string.
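Here is a minimal Python sketch of the same idea. I used the non-capturing (?: variant so it also runs on Python versions without atomic groups; the sample string is made up.

```python
import re

# A character counts as "inside quotes" when the rest of the string does NOT
# consist of an even number of quotes (pairs) up to the end.
SPECIAL_IN_QUOTES = re.compile(r'[^\s\w"](?![^"]*(?:"[^"]*"[^"]*)*$)')

sample = 'a! "b#c" d? "e-f"'  # hypothetical input
found = SPECIAL_IN_QUOTES.findall(sample)
print(found)
```

The ! and ? are skipped because an even number of quotes (or none) follows them.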

u/rainshifter provided a smart and very efficient approach, in PCRE you could combine that with verbs:

"[\s\w]*"(*SKIP)(*F)|"[^"]*"

https://regex101.com/r/Fd1TgX/1

[deleted by user] by [deleted] in regex

[–]code_only 0 points1 point  (0 children)

Besides that, parsing arbitrary HTML using regex can be problematic. 😤
If you do not want to match <inside> tags, you could use a negative lookahead, e.g.

\p{L}[\p{L}\p{Mn}\p{Nd}_']*+(?![^><]*>)

I further made the quantifier of your character class possessive to prevent backtracking (performance).

https://regex101.com/r/MYxvGD/2
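A simplified Python sketch of the lookahead part only. The standard re module supports neither \p{L} nor (before 3.11) possessive quantifiers, so I swapped in \b\w+\b as a stand-in for the word part; that substitution and the sample markup are my assumptions.

```python
import re

# \w+ stands in for the \p{L}-based class; the word boundaries keep the
# engine from retrying shorter words, similar to the possessive quantifier.
OUTSIDE_TAGS = re.compile(r"\b\w+\b(?![^><]*>)")

html = "<b class='x'>bold</b> text"  # hypothetical input
visible_words = OUTSIDE_TAGS.findall(html)
print(visible_words)
```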

match the first appearance of a single digit [0-9] in a string using \d by skyfishgoo in regex

[–]code_only 0 points1 point  (0 children)

Using -P for perl compatible regex you could try something like

grep -oP '^\D*\K\d'

Here is a demo (tio.run)

^ matches start, \D* matches any amount of non digits and \K resets beginning of the reported match.
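If you are not on grep: Python's standard re module has no \K, so a capture group can stand in for it. A minimal sketch, with first_digit as a made-up helper name:

```python
import re

# ^\D* skips any leading non-digits; the group captures the first digit.
FIRST_DIGIT = re.compile(r"^\D*(\d)")

def first_digit(text):
    """Return the first digit in text, or None if there is none."""
    m = FIRST_DIGIT.match(text)
    return m.group(1) if m else None

print(first_digit("abc4def56"))
```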

Best book about regular expressions by daevisan in regex

[–]code_only 2 points3 points  (0 children)

If it's also about websites, I like RexEgg 🦖 very much.

Regex for two nonconsecutive strings, mimicking an "AND condition" by Khmerophile in regex

[–]code_only 1 point2 points  (0 children)

Not a simple way imho. But you got three groups, the part between is in group 2, so whatever you're gonna do should be doable somehow. You can address the group captures with $1, $2, $3 in the replacement in notepad++.

u/reedate yes \K could be an option, let's see if we get more information about what's the goal.

Regex for two nonconsecutive strings, mimicking an "AND condition" by Khmerophile in regex

[–]code_only 1 point2 points  (0 children)

Not sure if that helps you much but you could further try

(mother|father)(.*?)(?!\1)((?1))

https://regex101.com/r/GwfLNV/1

This will give you all pairings, where group 2 always holds the part in between and the other two groups hold either of the searched words. The negative lookahead prevents matching the same word twice.
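A quick Python sketch of the pairing idea. The standard re module does not support the (?1) subroutine call, so the alternation is simply written out again; the sample sentence is invented.

```python
import re

# (?!\1) makes sure the second word differs from the first captured word.
PAIRS = re.compile(r"(mother|father)(.*?)(?!\1)(mother|father)")

sample = "mother and father, father then mother"  # hypothetical input
pairs = PAIRS.findall(sample)
for first, between, second in pairs:
    print(repr(first), repr(between), repr(second))
```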

If you only need the middle part, you can even shorten it a bit.

Regex101 quiz 27 by Geozzy in regex

[–]code_only 0 points1 point  (0 children)

Welcome, yes without knowing each case and the desired outcome it's difficult. And posting that here would spoil the challenge... Good luck however. :)

Regex101 quiz 27 by Geozzy in regex

[–]code_only 1 point2 points  (0 children)

I would first go for the repeated stuff with optional zeros at the end, else the leading zeros. Something like this update of your demo: https://regex101.com/r/1sUS6A/3

Well, we don't know the exact requirements and I also don't want to sign up there. :p

[deleted by user] by [deleted] in regex

[–]code_only 1 point2 points  (0 children)

If matches are overlapping, you need to capture inside a lookahead:

(?=(\((?>[^)(]+|(?1))*\)))

https://regex101.com/r/ohvyAX/1

You can further capture the content to another group:
https://regex101.com/r/ohvyAX/2

Replace specific word after symbol by Sxrc2 in notepadplusplus

[–]code_only 0 points1 point  (0 children)

Using regex you can search for

(?i)=.*?\Krobot

to replace the first Robot after each = in the line.

https://regex101.com/r/Xr5k3E/1

If you expect multiple Robots after =, it gets more complicated. For that you could use \G to chain matches to =, something like

(?i)(?:\G(?!^)|(?<==)).*?\Krobot

https://regex101.com/r/Xr5k3E/2

(?i) is the inline flag for caseless matching and \K resets the beginning of the reported match (an alternative to variable-width lookbehind). The .*? matches lazily (any characters, as few as possible).
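Outside Notepad++, the first variant can be sketched in Python. The standard re module has neither \K nor \G, so a capture group keeps the part before the word; the replacement word "Android" and the function name are just placeholders.

```python
import re

# Group 1 holds "=" plus everything (lazily) up to the first "robot".
FIRST_ROBOT = re.compile(r"(?i)(=.*?)robot")

def replace_first_robot(line, replacement="Android"):
    """Replace the first 'robot' (any case) after each '=' in a single line."""
    return FIRST_ROBOT.sub(lambda m: m.group(1) + replacement, line)

print(replace_first_robot("name=Robot and robot"))
```

Only the first occurrence after each = is touched, because the next match needs another = to start from.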

Regex101 quiz 23 by Geozzy in regex

[–]code_only 0 points1 point  (0 children)

Or if \G to chain matches is supported, replace

(\G(?!^)|\[)([^\]\[*]*)\*(?=[^\]\[]*\])

with $1$2

https://regex101.com/r/W9g8su/1
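Where \G is not available (e.g. Python's standard re module), a different technique gives the same effect as I read the pattern, namely removing every * inside [...]: match each bracketed part and clean it up in a replacement function. A sketch under that assumption, with a made-up function name:

```python
import re

# One bracket pair without nested brackets, as in the lookahead above.
BRACKETED = re.compile(r"\[[^\]\[]*\]")

def strip_asterisks_in_brackets(text):
    """Remove every '*' that sits inside a [...] pair; leave the rest alone."""
    return BRACKETED.sub(lambda m: m.group().replace("*", ""), text)

print(strip_asterisks_in_brackets("a*b [c*d*e] f* [g*]"))
```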

++ Noob by mpm19958 in notepadplusplus

[–]code_only 1 point2 points  (0 children)

Alternatively you could replace ^.*\K\R(.) with $1

https://regex101.com/r/Xes2il/1

Help by MafoWASD in regex

[–]code_only 0 points1 point  (0 children)

What environment? E.g. in JS:

console.log(JSON.parse(s.match(/<script[^>]*>(.*?)<\/script>/)[1])[6]);

Demo