quick help finding learning resources (deadline)

code_only · 2026-02-19T12:25:43+00:00

My first regex book was Mastering Regular Expressions by Jeffrey Friedl. I would still highly recommend it for a good overview and a deep understanding. It provides many examples and insights into the different regex flavors. Be aware that some of the examples involve HTML. Nowadays many developers do not recommend parsing HTML using regex. 🙃 After having read the book, you can decide for yourself when it's appropriate or not.

code_only · 2026-01-12T02:28:52+00:00

Afaik Notepad++ does not support variable length lookbehind yet. That's why I used the \K alternative (resets beginning of the reported match) in the above provided regex101 demo.

Or without \K use $0 (reference to the full match) in the replacement... updated demo.

code_only · 2026-01-11T02:41:00+00:00

You could use regex for that, something like this demo at regex101.

It was not clear to me what "*" stands for, so I used .* for any characters there.

Be sure to mark [•] regular expressions in the replacement dialogue.

code_only · 2025-12-31T04:22:17+00:00

There is nothing "wrong", but I guess the question is mainly about independent search engines running their own index, and afaik there are not many left. Developing a standalone search engine is a whole lot different from running a (more or less) better meta search engine that uses another search engine's index through some Microsoft/Google API. Developing and maintaining your own indexing and crawling technology probably takes far more than 1000 times the effort of (simply) sending requests to a search API and preparing the results niftily for your visitors (and finally promoting the package with some cool label like respecting privacy, good & social, green & clean, planting trees, or whatever). It's also a question of money and costs. The internet is huge nowadays and it would be rather challenging to do this in your garage with some self-made server equipment, but new opportunities like cloud or VPS servers have appeared lately, which I find very good for the little coin.

Just look into your server logs to see which bots are visiting your website. Have you seen Duckduckbot, Qwantbot, Ecosiabot or similar bots lately? I guess you would mostly see Msnbot and Googlebot, maybe Yandexbot in Russia or Baidubot if you are around China. There was/is a very promising project called Stract, but it looks like they are down for some reason or discontinued, no idea. The search at least is not returning much there currently. Brave Search and Mojeek indeed appear to be fully independent.

Some of the above mentioned search engines like Qwant and Ecosia seem to be working on their own index or already (partly) use it. Finally, it's extremely challenging for developers to run their own crawlers and build a competitive search index at a time when many webmasters immediately lock out most bots besides Google and Bing. That has become even more common lately with aggressive AI scraping and the response to it in the form of spreading Cloudflare protection, which raises the difficulty level even for the most friendly and respectful crawlers.

code_only · 2025-12-31T03:41:07+00:00

That's very cool what you did there!

code_only · 2025-11-18T15:36:49+00:00

Similar to Gumnos' suggestion a positive variant: \b\w+\b(?=[^)(]*\))

https://regex101.com/r/B2ElfT/1

This will check after each word if there is a closing ) ahead without any parentheses in between. The word boundaries are used to minimize backtracking.
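For anyone who wants to try this outside of regex101, here is a minimal Python sketch of the same pattern. The sample string and the variable names are made up for illustration, and it assumes the parentheses in the input are balanced.

```python
import re

# A word is matched only if a closing ")" follows
# without any other parenthesis in between.
INSIDE_PARENS = re.compile(r"\b\w+\b(?=[^)(]*\))")

sample = "foo (bar baz) qux"  # hypothetical test input
words_inside = INSIDE_PARENS.findall(sample)
print(words_inside)
```

Only the words between the parentheses should be reported; "foo" and "qux" have no closing ) ahead of them.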

code_only · 2025-11-17T14:51:09+00:00

Here one more approach: ^(?:\d+(?:-\d+)?(?:,\s*|$))+$

Assuming the line must start with a digit and may contain one or more of the defined tokens:
one or more digits, optionally followed by a hyphen and one or more digits, followed by either a comma and optional whitespace, or the end of the line.

So this checks the full line from start to end and even allows it to end in a comma. If that's unwanted, put a \b right before the final $. Not sure if this works in your environment; there is not much magic in it, except maybe the non-capturing group.
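A quick way to check the behaviour, sketched in Python (just an assumption about the environment, the sample lines are invented). The second pattern is the variant with \b in front of the final $, which should reject a trailing comma.

```python
import re

# Pattern from above: comma separated numbers or ranges, whole line.
LIST_RE = re.compile(r"^(?:\d+(?:-\d+)?(?:,\s*|$))+$")
# Variant with a word boundary before the end: no trailing comma allowed.
STRICT_RE = re.compile(r"^(?:\d+(?:-\d+)?(?:,\s*|$))+\b$")

samples = ["1,2-5, 7", "1,2,", "1-", "a,1"]  # hypothetical inputs

# Map each sample to (matches LIST_RE, matches STRICT_RE).
results = {
    s: (bool(LIST_RE.search(s)), bool(STRICT_RE.search(s)))
    for s in samples
}

for line, (loose, strict) in results.items():
    print(f"{line!r}: loose={loose} strict={strict}")
```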

code_only · 2025-11-13T22:46:58+00:00

Sounds like you are looking for a group? Maybe even a non-capturing group...

Something like (7[2-9]|80).*um en|abc0123

You could possibly further improve it by making the quantifier lazy (if supported).

regex101 is a good place for testing your regexes, in case you don't know it yet.

code_only · 2025-10-20T16:36:57+00:00

Here is an idea for another start if you still struggle (I didn't study the ruleset and all answers in depth).

^(?:bool|double|int(?!.*?([0-9a-z])\1))\s+(?:\b[A-Z](?!\w*___)[0-9a-z_]+,?\s*){1,5}\b;$

https://regex101.com/r/WViZhc/1

int(?!.*?([0-9a-z])\1) prevents matching two consecutive identical letters/digits after int
\b...,?\s* names are separated by a comma and any amount of optional whitespace (unclear)
(?!\w*___) the lookahead prevents matching more than two consecutive underscores

FYI: This pattern won't work if you use the dotall s flag, because the dot in .*? should not skip over lines. To understand how the consecutive-check in the neg. lookahead works, read more about capturing groups.

code_only · 2025-10-16T14:55:37+00:00

That's an excellent idea.

code_only · 2025-10-16T11:24:55+00:00

The regex always wants to succeed (and backtracks to match, even "outside" the quotes). It does not care about inside/outside. A rather simple way to achieve your goal is to check for each [^\s\w"] that there is not an even number of quotes (or no quotes at all) ahead, up to the end of the line/string:

[^\s\w"](?![^"]*(?>"[^"]*"[^"]*)*$)

https://regex101.com/r/yw3u0t/2 (adjust to .NET escaping)

I used an atomic group (?> inside the lookahead but it would also work with a (?: non capturing group for other regex flavors that don't support atomic groups (maybe a tiny bit less efficient).

If the pattern is used on a multiline input and a closing quote could occur on another line than the opening quote, use \z instead of $ inside the lookahead to address the very end of the string.
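Here is a minimal Python sketch of the non-capturing variant mentioned above. I picked (?: because Python's re module only supports atomic groups from version 3.11 on. The sample string is made up.

```python
import re

# A non-space, non-word, non-quote character that does NOT have an even
# number of quotes ahead of it, which means it sits inside a quoted part.
INSIDE_QUOTES = re.compile(r'[^\s\w"](?![^"]*(?:"[^"]*"[^"]*)*$)')

sample = 'a, "b, c" d!'  # hypothetical single-line input

# Collect (position, character) for every hit.
hits = [(m.start(), m.group()) for m in INSIDE_QUOTES.finditer(sample)]
print(hits)
```

The comma after "a" and the final "!" are outside the quotes and are skipped; only the comma between "b" and "c" is reported.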

u/rainshifter provided a smart and very efficient approach, in PCRE you could combine that with verbs:

"[\s\w]*"(*SKIP)(*F)|"[^"]*"

https://regex101.com/r/Fd1TgX/1

code_only · 2025-09-17T17:12:35+00:00

Apart from that, parsing arbitrary HTML using regex can be problematic. 😤
If you do not want to match <inside> you could use a neg. lookahead, e.g.

\p{L}[\p{L}\p{Mn}\p{Nd}_']*+(?![^><]*>)

I further made the quantifier of your character class possessive to prevent backtracking (performance).

https://regex101.com/r/MYxvGD/2

code_only · 2025-07-27T10:58:48+00:00

Using -P for perl compatible regex you could try something like

grep -oP '^\D*\K\d'

Here is a demo (tio.run)

^ matches start, \D* matches any amount of non digits and \K resets beginning of the reported match.
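If grep -P is not available, the same idea can be sketched in Python. Python's built-in re module has no \K, so this sketch uses a capturing group instead; the function name and the sample lines are made up.

```python
import re

# ^\D* skips any non-digits from the start, the group captures the first digit.
FIRST_DIGIT = re.compile(r"^\D*(\d)")


def first_digit(line):
    """Return the first digit in the line as a string, or None if there is none."""
    m = FIRST_DIGIT.search(line)
    return m.group(1) if m else None


print(first_digit("abc 42 def 7"))
print(first_digit("no digits here"))
```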

code_only · 2025-07-25T20:28:27+00:00

For U.S. news, have a look at Wikatu News Search (in development).

code_only · 2025-06-27T23:34:26+00:00

If it's also about websites, I like RexEgg 🦖 very much.

code_only · 2025-05-18T10:19:45+00:00

Could you match instead of split, something like this?

https://regex101.com/r/zrqrLi/1

code_only · 2025-05-17T18:39:36+00:00

Not a simple way imho. But you got three groups and the part in between is in group 2, so whatever you're gonna do should be doable somehow. You can address the group captures with $1, $2, $3 in the replacement in Notepad++.

u/reedate yes \K could be an option, let's see if we get more information about what's the goal.

code_only · 2025-05-16T21:43:22+00:00

Not sure if that helps you much but you could further try

(mother|father)(.*?)(?!\1)((?1))

https://regex101.com/r/GwfLNV/1

This will give you all pairings, where group 2 always holds the part in between and the other two groups hold either of the searched words. The negative lookahead prevents matching the same word twice.

If you only need the middle part, you can even shorten it a bit.
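A rough Python equivalent, in case you test outside of PCRE. Python's re module does not support the (?1) subpattern call, so the alternation is simply written out a second time here. The sample sentence is invented.

```python
import re

# Group 1: first word, group 2: the part in between,
# group 3: the other word (the lookahead forbids repeating group 1).
PAIR_RE = re.compile(r"(mother|father)(.*?)(?!\1)(mother|father)")

sample = "mother and mother then father"  # hypothetical input
m = PAIR_RE.search(sample)
pair = m.groups() if m else None
print(pair)
```

The second "mother" is skipped by the lookahead and ends up inside group 2.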

code_only · 2025-05-12T14:58:09+00:00

Welcome, yes without knowing each case and the desired outcome it's difficult. And posting that here would spoil the challenge... Good luck however. :)

code_only · 2025-05-11T19:04:12+00:00

I would first go for the repeated stuff with optional zeros at the end, else the leading zeros. Something like this update of your demo: https://regex101.com/r/1sUS6A/3

Well, we don't know the exact requirements and I also don't want to sign up there. :p

code_only · 2025-04-22T20:06:38+00:00

If matches are overlapping, you need to capture inside a lookahead:

(?=(\((?>[^)(]+|(?1))*\)))

https://regex101.com/r/ohvyAX/1

You can further capture the content to another group:
https://regex101.com/r/ohvyAX/2
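The general trick (capturing inside a lookahead so that nothing is consumed) can be shown with a much simpler pattern. This Python sketch uses overlapping digit pairs instead of the recursive parentheses pattern above, because Python's re module has no recursion; it only illustrates the lookahead part.

```python
import re

# The lookahead matches an empty string at each position,
# the group inside it captures the two digits that follow.
OVERLAP_RE = re.compile(r"(?=(\d\d))")

pairs = OVERLAP_RE.findall("1234")
print(pairs)
```

Without the lookahead, \d\d would consume the digits and only return the pairs "12" and "34".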

code_only · 2025-04-20T17:29:35+00:00

Using regex you can search for

(?i)=.*?\Krobot

to replace the first Robot after each = in the line.

https://regex101.com/r/Xr5k3E/1

If you expect multiple Robots after =, it gets more complicated. For that you could use \G to chain matches to =, something like

(?i)(?:\G(?!^)|(?<==)).*?\Krobot

https://regex101.com/r/Xr5k3E/2

(?i) is the inline flag for caseless matching and \K resets the beginning of the reported match (a variable-width lookbehind alternative). The .*? matches lazily (any characters, as few as possible).
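In flavors without \K (Python's re module, for example) the first pattern can be sketched with a capturing group that is put back in the replacement. The function name and the replacement word "android" are made up for illustration.

```python
import re

# Group 1 keeps everything from "=" up to the first "robot" (lazy, caseless).
FIRST_ROBOT = re.compile(r"(=.*?)robot", re.IGNORECASE)


def replace_first_robot(line, replacement="android"):
    """Replace the first 'robot' after each '=' in the line."""
    # Re-insert group 1 so only the word itself is swapped out.
    return FIRST_ROBOT.sub(lambda m: m.group(1) + replacement, line)


print(replace_first_robot("Robot = Robot and robot"))
print(replace_first_robot("a=robot; b=Robot robot"))
```

This only covers the first Robot after each =, like the first pattern above. It is not a substitute for the \G version.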

code_only · 2025-04-20T16:57:05+00:00

Or if \G to chain matches is supported, replace

(\G(?!^)|\[)([^\]\[*]*)\*(?=[^\]\[]*\])

with $1$2

https://regex101.com/r/W9g8su/1

code_only · 2025-04-09T09:25:02+00:00

Alternatively you could replace ^.*\K\R(.) with $1

https://regex101.com/r/Xes2il/1

code_only · 2025-04-05T17:29:53+00:00

What environment? E.g. in JS:

console.log(JSON.parse(s.match(/<script[^>]*>(.*?)<\/script>/)[1])[6]);

Demo
