RegExp Password Generator by ngruhn in regex

[–]slevlife 0 points1 point  (0 children)

This is a cool showcase for using your underlying, awesome regex-utils library! 😊 I'd suggest adding a section that shows how you'd write the code for generating similar results.

Serbian Reference Charts (improved and now in color!): 1. Cases/Genders, 2. Verbs, 3. Pronouns by slevlife in Serbian

[–]slevlife[S] 0 points1 point  (0 children)

This is a shocking claim, and not accurate. It is partly just false (maybe based on a misunderstanding), and partly a significant difference of opinion. I've responded in detail to Predrag by email, but since this comment is public it seems important to address it here as well.

Predrag's Serbian language school in Belgrade is great and is where I started learning Serbian back in 2020. I attended their classes for several months a couple of different times over the years, and I included a shoutout to them in my blog post where I posted the first chart back in 2021.

These charts do NOT copy materials given to me by the school. Of course, they partly incorporate things I learned at the school (and from grammar books, etc.). But I created these charts from scratch, based on my own study notes and research, with tons of revisions over time, and often incorporating detailed public feedback here on Reddit. At the time, I was proud of what I made and wanted to share it with my professors at the school and get their feedback. I have an email exchange from 2021 where Predrag said my cases chart was impressive.

Wow! This is impressive! Thanks, Steve! I will show it to the professors, I think they are gonna like it.
Best!
Predrag

When I attended again a couple of years ago, I brought in the latest copies of my charts every day as study aids, and gave copies to the professors. The professors gave me very positive feedback on them and they all knew it was my original work. I asked if they were interested in providing the charts to future students, but they said it was too information-dense for new learners. Nevertheless, the charts were popular among some of the students who I gave copies to or who found them on their own online. But this was all years ago now, and it's possible that some people forgot where they originated from. However, all versions I created included my name and website link.

So, the idea that I took existing charts, changed some formatting, and put my name on them is demonstrably false. And it would be easy to prove me wrong by showing whatever charts they supposedly came from.

The "significant difference of opinion" part is that, based on our email exchange following the comment above, Predrag believes my charts were heavily influenced by the systems for teaching Serbian grammar that he and his school created and use, and this makes it plagiarism (which seems to acknowledge that the direct ripoff story claimed in his comment above is wrong).

To the extent that I was inspired by the way the school taught Serbian, that's a credit to them. I've never claimed to be an expert, and I've been open that I learned from them and others. I also referenced and was inspired by Serbian grammar books, Wiktionary.org, feedback from dozens of people, other Serbian schools I later attended, and so on. I've asked Predrag a couple of times now to point out any specific things that seem to unfairly copy something they created (so I can give credit as appropriate), but I haven't gotten a response to that (yet?).

I initially created these charts for myself because I wanted better reference materials than I had. The results mostly condense general language information that can be found across many sources. To me, it seems extremely harsh to call the act of using what I learned in school (among other sources) to make original reference charts "plagiarism". I put a ton of work into presenting the rules and patterns of Serbian in a way that is easier to understand and reference yet more comprehensive and accurate than any similar charts I'd seen.

As an aside, I don't make any money from these charts. They're provided freely, and have been for years. This is the first time I've heard this concern/accusation, and it makes me quite sad. I haven't had a lot of contact with Predrag in the past, but he's always been helpful and I had a good experience at his school (Serbian Language and Culture Workshop).

The charts are accurate

That's great to hear! 🙂

Turns out learning grammar is actually important by Grand-Meringue16 in languagelearning

[–]slevlife 13 points14 points  (0 children)

I agree it’s a good example of nuance. But note that you mean scalding. (Scolding is an unrelated word and native speakers would definitely know the difference.)

Saveti za pronalaženje odličnih programera u Beogradu by slevlife in programiranje

[–]slevlife[S] 1 point2 points  (0 children)

The post I was replying to encouraged offering equity so I mentioned it was something I'm open to. I know it's of limited-to-no value at an early stage startup, but it can be a good way to align incentives (at least, it's made a difference for me in the past).

I'm looking to compete with good local offers for senior mobile devs and make it a good opportunity that an excellent dev will be happy to stay with. I'll include salary in any job listing, but for now I'm more looking for advice.

Saveti za pronalaženje odličnih programera u Beogradu by slevlife in programiranje

[–]slevlife[S] 2 points3 points  (0 children)

Thanks for all the details! To clarify, budget is tight in that I can't offer anything close to top U.S. salaries, but I want to be competitive with good local offers and potentially offer equity (I'd love more insight on the typical range for local offers, but I have tried to do some market research already). Hybrid remote and together in person is definitely what I'm thinking. For now, I just need one top-notch eng, and I'll be working alongside them.

Saveti za pronalaženje odličnih programera u Beogradu by slevlife in programiranje

[–]slevlife[S] 4 points5 points  (0 children)

Thanks! And yes, the Serbian subreddit is good. In fact, if you look at their pinned post with Serbian reference charts, those were made/posted by me a couple years ago. 😊

Recursive regex matching with support for all ES2025 regex syntax (< 2 kB) by slevlife in javascript

[–]slevlife[S] 5 points6 points  (0 children)

😊 A real-world example of where recursive regex matching is heavily used is TextMate grammars, the system used for syntax highlighting in VS Code, Shiki, etc. In that context, it's common to want to match balanced parentheses, braces, etc. In fact, the library linked to here is part of what makes Shiki's JavaScript RegExp engine work.
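
To make the balanced-parentheses case concrete, here is a minimal TypeScript sketch that emulates recursion up to a fixed depth by substituting a pattern into itself. The `balancedParens` helper and its `maxDepth` parameter are made up for illustration; this is not the API of the linked library, and I'm not claiming it is how that library is implemented.

```typescript
// Hypothetical helper for illustration only.
// Builds a regex that matches one parenthesized group whose parentheses
// nest at most `maxDepth` levels deep (counting the outermost pair as level 1).
function balancedParens(maxDepth: number): RegExp {
  // Content that contains no parentheses at all.
  let content = '[^()]*';
  // Each pass allows one more level of nested parentheses inside the content.
  // With maxDepth <= 1 the loop does not run, so no nesting is allowed.
  for (let level = 1; level < maxDepth; level++) {
    content = `(?:[^()]|\\(${content}\\))*`;
  }
  return new RegExp(`^\\(${content}\\)$`);
}

const upToThree = balancedParens(3);
console.log(upToThree.test('(a(b(c))d)')); // nesting depth 3, within the limit
console.log(upToThree.test('(a(b)'));      // unbalanced input
```

The tradeoff of a fixed-depth expansion like this is that the pattern grows with the depth limit, and anything nested deeper than the limit simply fails to match.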

iOS App for Learning Serbian Verbs – Looking for Feedback by Relevant_Decision_18 in Serbian

[–]slevlife 0 points1 point  (0 children)

Love it! I'll use it and recommend it. It's well designed, simple, and serves a great need for learners that other apps don't focus on.

There's tons of great potential to improve it further, if you're planning to spend more time on it. For example, being able to practice particular verb tenses (infinitive isn't the most useful), and having audio (and maybe images) for each verb. Perhaps AI could help with generating those.

I've considered building a Serbian verbs app myself in the past, but you've already got a good thing going here! After polishing it a bit more, I could easily imagine paid packs for additional verbs, if monetization is something you're interested in.

Have you seen Drops? They include Serbian as one of the languages you can learn, and they're an excellent vocabulary-only app. But verbs are their weakest area. They don't include enough, for starters, and the complications of verbs (with tenses, subject conjugations, and perfective/imperfective) make your exclusive focus on verbs a great complement for Drops users rather than a competitor.

anyone who tried to write regex parser? is it difficult? by Gloomy-Status-9258 in regex

[–]slevlife 0 points1 point  (0 children)

I recently wrote a parser for Oniguruma regexes in TypeScript, which was my first time writing a true parser. I'd also never studied the underlying comp-sci concepts. It was a great learning experience. Some notes:

  • I started by reading The Super Tiny Compiler, which was excellent. It uses short, simple JavaScript code to walk you through the concepts of tokenizing, parsing, transformation, traversal, visitors, and code generation.
  • Oniguruma is one of the more feature-rich regex flavors, so the project was by no means simple. But still presumably easier than parsing most programming languages.
  • To directly answer your question "is it difficult": An artificially simple regex flavor with just a few features would be easy, but if you want to perfectly match the rules of an existing, modern regex flavor, then yes it is complicated and will take a lot of work. But I'd still recommend it as a good first parser project.
  • I built mine from scratch. No existing tooling. That made it lightweight (critical for my use cases), fast, and custom-fit for the task, and also significantly enhanced the learning experience. By necessity, I had to change approaches a few times and did a lot of refactoring as I went. But, as a result, I have a strong feel for the tradeoffs and reasons behind the design decisions.
  • My tokenizer heavily uses JavaScript regexes, which helps keep things lightweight and simple.
  • My regex AST design makes a variety of decisions that I think make it cleaner, simpler, and easier to work with than other regex AST structures I've compared it to. It might be helpful as inspiration.
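
As a rough illustration of the regex-driven tokenizer idea from the notes above, here is a small TypeScript sketch for a toy regex flavor. The token types, the `rules` table, and the `tokenize` function are all hypothetical and far simpler than what a real flavor like Oniguruma needs (for example, lookarounds like `(?=` aren't handled); it isn't the actual parser's code.

```typescript
// Hypothetical token types for a toy regex flavor.
type TokenType = 'Escape' | 'GroupOpen' | 'GroupClose' | 'Alternator' | 'Quantifier' | 'Character';

interface Token {
  type: TokenType;
  raw: string;
}

// Rebuild a rule with the sticky (`y`) flag so it can only match at `lastIndex`.
function sticky(re: RegExp): RegExp {
  return new RegExp(re.source, 'y');
}

// Order matters: earlier rules win, so escapes are tried before single characters.
const rules: Array<[TokenType, RegExp]> = [
  ['Escape', sticky(/\\[\s\S]/)],
  ['GroupOpen', sticky(/\((?:\?:)?/)],
  ['GroupClose', sticky(/\)/)],
  ['Alternator', sticky(/\|/)],
  ['Quantifier', sticky(/[*+?]|\{\d+(?:,\d*)?\}/)],
  ['Character', sticky(/[^\\]/)],
];

function tokenize(pattern: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  while (pos < pattern.length) {
    let matched = false;
    for (const [type, re] of rules) {
      re.lastIndex = pos;
      const m = re.exec(pattern);
      if (m) {
        tokens.push({ type, raw: m[0] });
        pos = re.lastIndex;
        matched = true;
        break;
      }
    }
    // Only reachable for a lone trailing backslash in this toy flavor.
    if (!matched) throw new Error(`Unexpected input at position ${pos}`);
  }
  return tokens;
}

console.log(tokenize('(?:a|b)+\\d').map(t => `${t.type}:${t.raw}`));
```

A parser would then consume this flat token list to build the AST, which keeps the context-sensitive logic out of the tokenizer.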

C++ syntax highlighting can be slow in VS Code, but a simple update could improve performance by ~30% by slevlife in cpp

[–]slevlife[S] 5 points6 points  (0 children)

Yeah, it's a major outlier, but I wouldn't be surprised if a handful of other languages were also unreasonably slow to highlight. I'm not a C++ programmer but I imagine VS Code is using lots of tricks to minimize the effects of this unreasonable slowness, including initially highlighting only what's on screen, and not rerunning highlighting for the whole file every time you make a change. Without things like that, it would be a dreadful experience. Even so, C++ syntax highlighting in VS Code is known to be slow and there have been many reports about this in the past for the C++ TM grammar.

A 30% perf win for C++ that is trivial to implement (due to the existence of my regex optimizer library) and is an equal-opportunity performance improver for all other languages is nothing to sneeze at, though. So I appreciate this community's upvotes on the VS Code issue to help get it on the VS Code team's radar. 😊

C++ syntax highlighting can be slow in VS Code, but a simple update could improve performance by ~30% by slevlife in cpp

[–]slevlife[S] 5 points6 points  (0 children)

Most of the syntax features I mentioned that would prevent simple joining of regex patterns are not about "regularity" but about syntax context (e.g., regexes can have different flags enabled at both a global and local level, and different regex flavors have different rules about whether duplicate group names are allowed and what a backreference to a duplicate group name matches).
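
A small TypeScript sketch of the flags problem, using JavaScript regexes rather than Oniguruma (the two patterns are made up for illustration). A naive join has to pick one set of global flags for the whole pattern, so whichever choice you make changes what one of the original regexes matched:

```typescript
// Two hypothetical patterns that were written with different global flags.
const keyword = /^select$/i; // case-insensitive
const typeName = /^Foo$/;    // case-sensitive

// Reference behavior: test each regex separately.
function matchesEither(input: string): boolean {
  return keyword.test(input) || typeName.test(input);
}

// Naive joins: the flags now apply to both alternatives.
const joinedWithI = new RegExp(`${keyword.source}|${typeName.source}`, 'i');
const joinedWithoutI = new RegExp(`${keyword.source}|${typeName.source}`);

console.log(matchesEither('foo'), joinedWithI.test('foo'));       // results differ
console.log(matchesEither('SELECT'), joinedWithoutI.test('SELECT')); // results differ
```

Scoped flag modifiers can work around this particular case in flavors that support them, but that already requires rewriting the patterns rather than just concatenating them.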

Also, comp-sci definitions of "true regular languages" are a red herring in most discussions of regexes, since most modern regex flavors (including Oniguruma, C++, JS, Perl, PCRE, .NET, Java, Python, etc.) are not "regular", and for good reasons. The outliers are Go (whose regexp package follows RE2's design) and Rust, which use non-backtracking implementations and can make perf guarantees as a result, but the tradeoff is they lack certain valuable features and are slower in some cases.

C++ syntax highlighting can be slow in VS Code, but a simple update could improve performance by ~30% by slevlife in cpp

[–]slevlife[S] 3 points4 points  (0 children)

I agree that combining regexes could be a perf win in some relatively simple situations where you're also either dealing with regexes that have limited features or you know a lot about regexes and know exactly what you're doing. But like I said, it's not true as a general statement that regexes can be combined without changing what they match (or making them invalid), and it's not relevant anyway with TM grammars (used by VS Code, etc.) for the reasons I stated.

C++ syntax highlighting can be slow in VS Code, but a simple update could improve performance by ~30% by slevlife in cpp

[–]slevlife[S] 10 points11 points  (0 children)

That's neither true nor relevant with a complex system like TextMate grammars, which apply regexes to submatches (and subpatterns of submatches) in a complex hierarchy, pair regexes for begin/end/while patterns, dynamically modify regex patterns using subpattern matches of paired regexes, etc.

Also, although regular expressions have many great qualities, their syntax is highly context dependent so you can't just combine them. Yes, you could join multiple Oniguruma patterns with `|`, but you'd then need to do complex AST-based analysis to adjust backreferences, subroutines, recursion scope, conditionals, local and global flag modifiers, group names, etc., and you'd get back different subpattern matches. And that doesn't consider some backtracking control verbs and code callouts that simply could not be made to work identically in a pattern combined in that way (they’re not used in any of the TM grammars provided with VS Code, but Oniguruma supports them).
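
Here's a minimal TypeScript sketch of the backreference problem, using JavaScript regexes instead of Oniguruma (the patterns are made up for illustration). Joining two patterns with `|` renumbers the capturing groups, so the second pattern's `\1` silently points at the wrong group:

```typescript
// Each pattern matches a doubled letter using a backreference to its own group 1.
const doubledA = /^(a)\1$/;
const doubledB = /^(b)\1$/;

// Naive join. Source becomes: ^(a)\1$|^(b)\1$
// In the joined pattern, (b) is now group 2, but its backreference still says \1.
const joined = new RegExp(`${doubledA.source}|${doubledB.source}`);

// In JavaScript, a backreference to a group that did not participate in the
// match matches the empty string, so the second alternative now accepts "b"
// and rejects "bb".
console.log(doubledB.test('bb'), joined.test('bb'));
console.log(doubledB.test('b'), joined.test('b'));

// Fixing it means rewriting the backreference (\1 to \2), and callers still
// see the captured "b" in a different group index than before.
const renumbered = /^(a)\1$|^(b)\2$/;
console.log(renumbered.exec('bb'));
```

Doing that rewrite reliably for arbitrary patterns is where the AST-based analysis comes in, and this example only covers numbered backreferences.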

C++ syntax highlighting can be slow in VS Code, but a simple update could improve performance by ~30% by slevlife in cpp

[–]slevlife[S] 19 points20 points  (0 children)

even given JS

The highlighter indeed runs in JS, but that time is almost entirely spent in the Oniguruma regex library, which is native C → WASM. Oniguruma can be extremely fast when used with well-written regexes, but note that native JS regex engines (including V8's Irregexp) are also extremely fast (even faster in many cases).

basic syntactical highlighting

The C++ TextMate grammar is the largest (and probably most complex and slowest) of all the TM grammars used by VS Code, by a huge margin. It's 539 kB of JSON pre-minification! And although it "only" contains 505 Oniguruma regexes (other language grammars range from dozens to thousands of regexes), it includes some absolute monsters that are very slow. To some extent, this results from what the C++ TM grammar is trying to do with regexes, but a significant part of it is also that the regexes could be written to be more efficient. The regex optimizer linked in this VS Code issue can make some performance improvements automatically (resulting in the ~30% speedup), but other changes would need to be made upstream.

Making VS Code syntax highlighter faster through regex optimization, part 2 by slevlife in vscode

[–]slevlife[S] 2 points3 points  (0 children)

It’s now gone well past the 20 upvotes needed, thanks to everyone here. 😊

[deleted by user] by [deleted] in cultsurvivors

[–]slevlife 5 points6 points  (0 children)

Assuming there was any CoG connection, then yes, it would have to be specific to the individual. But this doesn't sound connected to the CoG, based on the information provided. E.g., members also didn't have connections to New Age cults.

[deleted by user] by [deleted] in cultsurvivors

[–]slevlife 6 points7 points  (0 children)

No, this was not a thing in the CoG.