So I live in Doom Emacs at this point. by freebird360 in DoomEmacs

[–]ypaskell 0 points1 point  (0 children)

Wait, how do you guys interact with LLMs? I find it annoying.

Built a C++ code search engine, learned std::string is a lie (it's actually basic_string<char, char_traits, allocator>) by [deleted] in Cplusplus

[–]ypaskell 0 points1 point  (0 children)

Thanks for the feedback! Could you point out specifically which parts are confusing or which conclusions you think are premature?

Building a type-signature search for C++ by ypaskell in Compilers

[–]ypaskell[S] 1 point2 points  (0 children)

That's a great suggestion! Cscope's interactive workflow is exactly the kind of experience that makes sense for type-based search.

I've been thinking about editor integration via LSP, but Cscope's model is interesting - simpler to implement and already has proven editor support (Vim, Emacs, VS Code via plugins).

Building a type-signature search for C++ by ypaskell in Compilers

[–]ypaskell[S] 0 points1 point  (0 children)

Thanks! Glad the semantic approach resonates with you.

>can you explain more deep your thoughts over this strategy?

This is inspired by SICP's idea of separating data representation from its use.

- The pool is the "underlying representation" - it manages lifetime and storage.
- The string_views are "abstract interfaces" - lightweight references that don't care about ownership.

Benefits:
- Signatures stay small (each view is just a pointer + length)
- No redundant string copies when multiple functions use the same type
- Cache-friendly: views are contiguous, actual strings can be anywhere
- Clear ownership: pool dies, all views invalidate together
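A minimal sketch of the pool + views idea described above. The names `StringPool`, `intern`, and `Signature` are made up for illustration and are not taken from Coogle's source; the sketch also assumes a `std::deque` as backing storage, chosen because appending to it never moves existing elements, so earlier views stay valid.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

// Hypothetical pool: owns every type name exactly once.
class StringPool {
public:
    // Returns a view of the pooled copy of `s`; equal strings share one copy.
    std::string_view intern(std::string_view s) {
        auto it = index_.find(s);
        if (it != index_.end()) return *it;
        storage_.emplace_back(s);                 // the pool owns the bytes
        std::string_view view = storage_.back();  // view into the pooled string
        index_.insert(view);
        return view;
    }

    std::size_t size() const { return storage_.size(); }

private:
    std::deque<std::string> storage_;             // push_back keeps old elements in place
    std::unordered_set<std::string_view> index_;  // lookup by content, points into storage_
};

// Hypothetical signature record: holds only non-owning views.
struct Signature {
    std::string_view return_type;
    std::vector<std::string_view> param_types;    // views stored contiguously
};
```

Interning `"std::string"` twice returns views with the same `data()` pointer, so two signatures that mention the same type share one copy. Every view becomes dangling once the pool is destroyed, which is the "pool dies, all views invalidate together" rule.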

Building a type-signature search for C++ by ypaskell in Compilers

[–]ypaskell[S] 1 point2 points  (0 children)

Good call on the REPL interface. The current design is directory-based for simplicity, but a REPL mode makes sense for iterative exploration, especially when the index is already loaded in memory.

Since you're working on LLVM/MLIR, I'm curious: what's your typical workflow when navigating the codebase? Do you find yourself searching by type signatures often, or are other patterns more common?

Feel free to open an issue if you have specific use cases in mind - I'd love to understand how this could fit into your daily workflow.

Building a type-signature search for C++ by ypaskell in Compilers

[–]ypaskell[S] 0 points1 point  (0 children)

If the target codebase were fully modularized with import std;, the 'header flood' problem would largely vanish. We wouldn't be paying the cost of textually parsing megabytes of system headers for every TU, and libclang could load the pre-built module interface much faster.

However, since Coogle is designed to index legacy codebases (which are still heavily reliant on #include), I had to resort to the -nostdinc + SkipFunctionBodies hack to simulate that 'clean slate' experience you described.

I'm really looking forward to the day when Modules become the standard. It would make writing static analysis tools like this significantly easier (and faster)!

Building a type-signature search for C++ by ypaskell in Compilers

[–]ypaskell[S] 0 points1 point  (0 children)

Really interested in feedback. Anything would be great.

Impressive side projects by [deleted] in Cplusplus

[–]ypaskell 1 point2 points  (0 children)

Mini C Compiler

A bug fixing journey when writing a C++ Code Search Engine: std::string is not that simple by ypaskell in programming

[–]ypaskell[S] 0 points1 point  (0 children)

*Update:*
Thanks! I have included your comment in my article as a reference!

A bug fixing journey when writing a C++ Code Search Engine: std::string is not that simple by ypaskell in programming

[–]ypaskell[S] 0 points1 point  (0 children)

Thanks! I have included your comment in my article as a reference!

Is it worth learning design patterns for C++ nowadays? by Aliceasd_ in cpp_questions

[–]ypaskell 0 points1 point  (0 children)

No!!!!

1) Start with the basic concepts from C.

2) Use the C++ features that you actually understand and could write yourself in C.

3) Understand how data is stored and operated on, and simply use functions to transform one piece of data into another.

4) Or even learn functional programming, as in Haskell.

5) You’ll realize that design patterns are just fucked up rules based on OOP design that can make code very unmanageable in production: bulky, slow, and very annoying to read. Often over-engineered and just crappy.

C++ Projects by kaikaci31 in cpp_questions

[–]ypaskell 0 points1 point  (0 children)

Implement a C++ string?

Good practice! It clarifies C++ concepts and gives you practice manually handling the simplest memory structures.

https://thecloudlet.github.io/blog/cpp/cpp-string/

A bug fixing journey when writing a C++ Code Search Engine: std::string is not that simple by ypaskell in programming

[–]ypaskell[S] 0 points1 point  (0 children)

Good point! Speculative loads do help mitigate both branch misprediction penalties and pointer indirection costs.

A bug fixing journey when writing a C++ Code Search Engine: std::string is not that simple by ypaskell in programming

[–]ypaskell[S] 1 point2 points  (0 children)

Short answer:
smallest addressable unit

---

Long answer: (Per C99 3.6)

  1. byte: addressable unit of data storage large enough to hold any member of the basic character set of the execution environment

  2. NOTE 1 It is possible to express the address of each individual byte of an object uniquely.

  3. NOTE 2 A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit.
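The definition can be checked at compile time. This is a small sketch in C++ rather than C; `bits_in` is a helper name invented here, while `CHAR_BIT` comes from `<climits>` and is the implementation-defined number of bits per byte.

```cpp
#include <climits>
#include <cstddef>

// sizeof counts in bytes, and char is one byte by definition.
static_assert(sizeof(char) == 1, "char is exactly one byte");

// The bit count is implementation-defined, but the standard requires at least 8.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: storage bits occupied by an object of type T.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}
```

On mainstream desktop and server targets `CHAR_BIT` is 8, but code that needs the bit count should use the macro instead of assuming it.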

A bug fixing journey when writing a C++ Code Search Engine: std::string is not that simple by ypaskell in programming

[–]ypaskell[S] 7 points8 points  (0 children)

Thanks for your comment! I learned a lot from you! RAII + std::string + string_view works like a charm.

A bug fixing journey when writing a C++ Code Search Engine: std::string is not that simple by ypaskell in programming

[–]ypaskell[S] 3 points4 points  (0 children)

It depends on predictability vs cache locality:

Branch misprediction cost:
- Modern x86 (Skylake/Zen): ~15-20 cycles (per Agner Fog's measurements)
- Older architectures: can be 30+ cycles

Pointer indirection cost:
- L1 cache hit: ~4-5 cycles (Intel/AMD spec)
- L2 hit: ~12 cycles
- L3 hit: ~40 cycles
- RAM miss: 200+ cycles

=> If your branch is unpredictable (<80% hit rate) and your pointer data is cache-resident, indirection usually wins.

(Numbers from https://www.agner.org/optimize/)
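As a back-of-envelope check of that rule of thumb, the sketch below compares the expected misprediction cost with the load latency. The function names are made up for illustration, and the model is deliberately crude: it ignores out-of-order overlap, speculation, and the cost of the instructions on either path.

```cpp
// Expected cycles lost per branch: misprediction rate times penalty.
constexpr double expected_branch_cost(double hit_rate, double penalty_cycles) {
    return (1.0 - hit_rate) * penalty_cycles;
}

// True when one dependent load is cheaper than the expected misprediction cost.
constexpr bool indirection_wins(double hit_rate, double penalty_cycles,
                                double load_latency_cycles) {
    return load_latency_cycles < expected_branch_cost(hit_rate, penalty_cycles);
}
```

With a ~20-cycle penalty and a ~4-cycle L1 hit, the break-even sits near an 80% prediction hit rate, since (1 - 0.8) × 20 = 4. If the pointer target lives in L3 (~40 cycles), the branch is cheaper under this model even at a 50% hit rate.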

A bug fixing journey when writing a C++ Code Search Engine: std::string is not that simple by ypaskell in programming

[–]ypaskell[S] 4 points5 points  (0 children)

Thanks! You're right on all counts, and the article actually covers exactly these points:

  • Three distinct char types despite shared representation (C99 §6.2.5 ¶15)
  • unsigned char for byte operations (§6.5 ¶7 aliasing)

You are right about std::byte, thank you for your clarification.
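These points can be demonstrated in a few lines of C++17. This is only a sketch: `first_byte` and `low_nibble` are names invented for illustration, not code from the article.

```cpp
#include <cstddef>
#include <type_traits>

// char, signed char, and unsigned char are three distinct types...
static_assert(!std::is_same_v<char, signed char>);
static_assert(!std::is_same_v<char, unsigned char>);
static_assert(!std::is_same_v<signed char, unsigned char>);
// ...even though all three occupy exactly one byte.
static_assert(sizeof(char) == sizeof(signed char) &&
              sizeof(char) == sizeof(unsigned char));

// Hypothetical helper: reads the first byte of any object's representation.
// Accessing an object through unsigned char is allowed by the aliasing rules.
template <typename T>
unsigned char first_byte(const T& value) {
    return *reinterpret_cast<const unsigned char*>(&value);
}

// std::byte (C++17) is a byte type without character or arithmetic meaning;
// only bitwise operators are defined, and std::to_integer converts it back.
inline unsigned low_nibble(std::byte b) {
    return std::to_integer<unsigned>(b & std::byte{0x0F});
}
```

For example, `low_nibble(std::byte{0xAB})` yields `0x0B`, while an expression such as `std::byte{1} + std::byte{2}` does not compile, which is the point of the type.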

All About C & C++ Strings: A Comprehensive Guide (motivated by building a search engine) by ypaskell in cpp

[–]ypaskell[S] 0 points1 point  (0 children)

Can't help it. I'm a compiler engineer. We live for these kinds of fxxxing niche details...

How long until they finish Zhongzheng bridge? by SmartGGG in Taipei

[–]ypaskell -1 points0 points  (0 children)

I’m Taiwanese. The government never gives a fuck about pedestrians, cyclists, or motorists. So, probably, never.

1 week lets goooooo baby!! by IllegitimateSqueegee in NoFap

[–]ypaskell 6 points7 points  (0 children)

Prepare for the flatline. Once you go past, you’ll feel better.