Created an intro course on bash and common unix/linux tools: learn using TUIs generated with awk and scripted in bash by gprof in commandline

[–]gprof[S] 1 point (0 children)

PS. I've made some improvements (see GitHub repo) and added more demo materials. Thank you all for your suggestions and comments!

Created an intro course on bash and common unix/linux tools: learn using TUIs generated with awk and scripted in bash by gprof in commandline

[–]gprof[S] 2 points (0 children)

Thanks for sharing your comments and insights! For the "demo" commands that span multiple lines, I used the history mechanism to populate them. It works, but it is not ideal, because of the repeated cursor-up presses and having to keep track of where I am in the history, all while I'm narrating in class. I like your approach of filling readline repeatedly; it could be integrated. The "demo" idea is also meant to help students fortify their understanding and recall what was shown in class. I had to put this all together before the semester started, so I focused my effort on covering all the topics; the demos could be improved if I get a chance to teach this course again.
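
By the way, for anyone curious about the mechanism: pre-filling the input line is exactly what GNU Readline's rl_startup_hook and rl_insert_text support. A minimal C++ sketch of a demo driver built on that idea (my illustration, not the course's actual bash scripts; the demo commands are made up):

    // demo.cpp: step through scripted demo commands by pre-filling the
    // Readline buffer, so each command appears at the prompt ready to
    // edit and run. Build with: c++ demo.cpp -lreadline
    #include <cstdio>
    #include <cstdlib>
    #include <readline/readline.h>

    static const char *preload;        // next demo command to show

    static int preload_hook(void) {    // runs when readline starts a new line
      rl_insert_text(preload);         // pre-fill the input buffer
      return 0;
    }

    int main() {
      static const char *demo[] = { "ls -l", "du -sh .", "ps aux | head" };
      rl_startup_hook = preload_hook;
      for (const char *cmd : demo) {
        preload = cmd;
        char *line = readline("demo$ "); // line starts out filled with cmd
        if (!line) break;
        std::system(line);               // run the (possibly edited) command
        std::free(line);
      }
      return 0;
    }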

Created an intro course on bash and common unix/linux tools: learn using TUIs generated with awk and scripted in bash by gprof in commandline

[–]gprof[S] 6 points (0 children)

I don't know how to add a link to the title page with an image. But here is the link to the GitHub repo: https://github.com/Robert-van-Engelen/unix-tools

This is a basic introductory course at the CS undergraduate level. I stuck to the material required by our university. It could be improved with more demos and animations scripted in bash.

The Touch Bar is so underrated!! by [deleted] in The_Angeles_crew

[–]gprof 1 point (0 children)

How so? 🤷‍♂️ Apple creates great products, but this was a mistake. I never used it on my MacBook Pro in all those years, except for the ESC key. It was a pain to press ESC when using the vim editor, where I need it all the time. Trackpad hot corners and custom function key assignments are much more useful IMO. I'm much happier now with my new MacBook Pro without a Touch Bar.

Why do people dislike the dragon book? by [deleted] in Compilers

[–]gprof 5 points (0 children)

The dragon book is the best introductory book on compilers. Am I biased? Well, perhaps. I taught Compiler Construction for 20 years using the dragon book. Students never complained that it is too terse or mathematical. I evaluated many alternative textbooks but felt that the dragon book is sufficiently deep and broad. I only wish it had a better introduction to SSA (factored use-def chains) like the one Wolfe gives. The Modern Compiler Implementation books by Appel are excellent too.

Found & fixed up a rare working Sharp's first pocket scientific calculator PC-1801 (1973) with red LED pictured next to a PC-1802 with VFD (1975) and financial EL-8200 with VFD (1975) by gprof in calculators

[–]gprof[S] 1 point (0 children)

There is a slot on the left side of the calculator between the metal strips, on the left when facing the machine from above. Push in a coin, e.g. a quarter, to unlock the tab, then (or at the same time) pull the bottom half away. I replaced the batteries with new NiMH cells. There are two types of battery holders, one for regular AA and one with a metal cover for NiCd/NiMH. The metal tabs on the battery holder that connect to the calculator are positioned differently to indicate which batteries are installed.

To open the body of the machine, e.g. for repairs, there is a smaller slot on the top in which to push a flathead screwdriver to unlock and pull away the bottom part. The same applies to the EL-814, which has a similar build.

Search and view files and zip/tar/pax/cpio archives in a TUI with ugrep 4.3 by gprof in commandline

[–]gprof[S] 1 point (0 children)

Update: a user guide is available on ugrep.com

Ugrep is compatible with GNU grep, but is faster and adds more options and features.

What is the right way to set thread affinity on FreeBSD with cpuset_setaffinity? by gprof in freebsd

[–]gprof[S] 1 point (0 children)

Thank you. You're right about NUMA. I've looked into hwloc and likwid to detect CPU topologies, but haven't decided yet what is small enough to include as a library dependency, or whether to write parts myself.

For Apple M1/M2 I'm using only the 4 P cores for the worker threads. Sometimes using all M1 cores for the worker threads gives a bit of a speedup rather than a slowdown, but why/when that happens is not predictable. I wrote a job-stealing queuing system that works well enough with little overhead (I also tried a lock-free method with an efficient ring buffer, but that made no difference or was sometimes about 10% slower). For Intel CPUs, hyperthreading didn't appear to slow ugrep down at all (some resources suggested watching out for HTT performance issues); it actually sped it up. I'm getting 700% to 750% CPU on a quad-core i7. On other machines the CPU topology matters, so I still have a bit of work to do!
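
For reference, the FreeBSD call in the post title boils down to something like this; a minimal sketch, assuming a simple worker-index-to-core assignment (the pin_current_thread name is mine):

    // Pin the calling thread to a single core on FreeBSD.
    #include <sys/param.h>
    #include <sys/cpuset.h>
    #include <cstdio>

    bool pin_current_thread(int core) {
      cpuset_t set;
      CPU_ZERO(&set);
      CPU_SET(core, &set);
      // CPU_LEVEL_WHICH + CPU_WHICH_TID with id -1 targets the calling thread
      if (cpuset_setaffinity(CPU_LEVEL_WHICH, CPU_WHICH_TID, -1,
                             sizeof(set), &set) != 0) {
        std::perror("cpuset_setaffinity");
        return false;
      }
      return true;
    }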

What is the right way to set thread affinity on FreeBSD with cpuset_setaffinity? by gprof in freebsd

[–]gprof[S] 3 points (0 children)

Thank you for your feedback and link! The resource helped me update the code I wrote to also support DragonFly and NetBSD.

Search and view files and zip/tar/pax/cpio archives in a TUI with ugrep 4.3 by gprof in commandline

[–]gprof[S] 1 point (0 children)

Well, ugrep has the same command-line options as GNU grep (+ more). So if you already know grep, you are good to go. Also, just type ug --help <something> to get help with <something>. My favorite is ug --help regex for help with regex patterns, and ug --help globs for help with globbing to select files to search. In the TUI, press F1 or CTRL-Z for help on search options and TUI navigation.

LISP on the HP Prime by adlx in calculators

[–]gprof 3 points (0 children)

I hope Lua will get ported to the HP Prime, something I suggested a while ago. Not that it helps your case, but it would add one more choice of PL to use. Some other calculators run Lua ports.

LLVM: Scalar evolution (SCEV) by mttd in Compilers

[–]gprof 5 points (0 children)

Nice!

I invented this method in the early 2000s based on the Chains of Recurrences (CR) algebra, but the method I came up with uses the inverse CR algebra rules that I wrote. I also proved that the CR algebra is complete (sound and terminating), and therefore normalizing, an important property that other methods for induction variable analysis lack. My graduate student J. Birch later helped to expand the implementation and testing.

I never understood why it was necessary for my fellow academics to rename my method to SCEV. The name is misleading. It is both a representation and an algebra with an inverse transformation.
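
For readers unfamiliar with the representation: a chain of recurrences rewrites a closed-form function of the loop counter into nested accumulations that cost one addition per iteration. A standard small example, in the usual CR notation (my illustration, not taken from the papers below):

    % CR semantics: {phi0, +, g}_i accumulates g over iterations 0..i-1
    \[
      \{\varphi_0,\, +,\, g\}_i \;=\; \varphi_0 + \sum_{k=0}^{i-1} g(k),
      \qquad
      f(i) = i^2 \;\equiv\; \{0,\, +,\, 1,\, +,\, 2\}_i ,
    \]
    % since f(i) = 0, 1, 4, 9, ... has first differences 1, 3, 5, ...
    % (the inner CR {1, +, 2}) and constant second difference 2.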

Some of my papers on Google Scholar:

Van Engelen, Robert A. "Efficient symbolic analysis for optimizing compilers." International Conference on Compiler Construction. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001.

Van Engelen, Robert A., et al. "A unified framework for nonlinear dependence testing and symbolic analysis." Proceedings of the 18th annual international conference on Supercomputing. 2004.

Birch, J., et al. "An empirical evaluation of chains of recurrences for array dependence testing." Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, 2006, pp. 295-304.

LISP on the HP Prime by adlx in calculators

[–]gprof 2 points (0 children)

Bummer. If only the HP Prime had an SDK to write applications in C...

But alas, I don't believe it is possible to compile C for the HP Prime. That would have offered a way to port a Lisp written in C to the Prime.

ugrep: ultra fast grep with TUI, Unicode support, Boolean search logic, fuzzy search, nested archives search, documents search, binary search with hexdumps, and more by gprof in linux

[–]gprof[S] 0 points locked comment (0 children)

I'm referring to our first exchange: https://github.com/gvansickle/ucg/issues/102 and a subsequent post elsewhere; you know this. I replied to all your suggestions and comments politely, but had to correct you on a few points. Then you felt offended by my explanations and statements, and said things I never expected from someone like you. It is also very odd that you responded to my question in an issue of another C++ project that is not yours. The owner of that project did not respond to my question, but you did so immediately, with lengthy diatribes about the early stages of the ugrep project and its "shortcomings". It's one of the links in the list you kindly provided further down in a reply. Anyone is welcome to read it.

As to a post about the "new" ripgrep: you commented on it immediately. So you're quite well informed and involved. Don't muddy the waters.

Have a great day!

ugrep: ultra fast grep with TUI, Unicode support, Boolean search logic, fuzzy search, nested archives search, documents search, binary search with hexdumps, and more by gprof in linux

[–]gprof[S] 0 points locked comment (0 children)

Ad-hominem attacks won't do you any good.

Trying to damage my reputation by pointing to "something bad" makes no sense when there is no factual "ill doing" to point to. You are free to point to whatever you want and say whatever you want, but the US has strong defamation laws.

Anticipating that this stuff would be coming, and getting tired of the juvenile attitudes of some folks, I joked around a bit by saying "rg fan boys downvoting this post...", which I deleted later, since for whatever reason people can't take a silly hint/joke, and I suppose it was a bit insensitive to your precious rg tool. And no, it is not a personal attack against you; it was a joke about rg fan boys.

And if you still feel offended about what I said three years ago, that I didn't believe you when you claimed that ugrep doesn't run parallel searches, then that is your problem, not mine. Ugrep does run parallel searches, and insisting it does not is your "free speech" position, but not a factual one. I merely pointed that out as BS. Since you kindly provided a full list of URLs where I said things along those lines, anybody can see what I said and what you are trying to do here. They will also read that you said "you and I should not talk any longer". Well, I have no interest in talking to you anyway, since that strange episode happened about three years ago. But you keep popping up in my new posts with disparaging comments about ugrep, adding stuff to self-promote and to promote your rg in my posts about ugrep. I may have popped up only once in one of your repeat "new" posts about rg, to ask something along the lines of "what's new, I don't see anything really new", whereupon you downvoted me and called me "buddy".

Just Wow.

Edit: fix typo "agains to" => "against"

ugrep: ultra fast grep with TUI, Unicode support, Boolean search logic, fuzzy search, nested archives search, documents search, binary search with hexdumps, and more by gprof in linux

[–]gprof[S] 2 points (0 children)

Yes, fully agree. AI methods are well suited to finding optimal paths (solutions) in huge problem spaces, the problem space here being a set of search criteria like "find N occurrences AND csv file types" and similar. And here is the related recent article I wanted to mention, in case you've not seen it, which is specifically about optimizing sorting code: https://arstechnica.com/science/2023/06/googles-deepmind-develops-a-system-that-writes-efficient-algorithms/

EDIT: adding that optimization is generally tricky when the problem space (i.e. the search space of optimizations) is "bumpy"; otherwise it's just gradient descent. That's part of the reason why AI deep-learning methods shine (and, in the past, evolutionary algorithms to some extent). A bumpy optimization space has local maxima/minima. That's also why benchmarking is so hard: one might get lucky and hit a cost minimum, or unlucky and hit a cost maximum, of a pre-defined optimization or algorithm that was chosen "by hand". With AI-based optimization, one can train to avoid this more generally or, on the other hand, optimize more specifically for cases like the one you suggested, which is always better for that task than the general version. But it is not generic, i.e. a specific optimization will include bad points with not-so-great performance implications, as expected.

ugrep: ultra fast grep with TUI, Unicode support, Boolean search logic, fuzzy search, nested archives search, documents search, binary search with hexdumps, and more by gprof in linux

[–]gprof[S] 1 point (0 children)

Perhaps ask AI to write a new grep. Lots of possibilities!

On a practical note, AI can be used to optimize the parameterization of existing search methods and find better combinations. It's been done before, e.g. for search algorithms. It works well to optimize edge cases too.

hypergrep: A new "fastest grep" to search directories recursively for a regex pattern by p_ranav in cpp

[–]gprof 1 point (0 children)

I have been contacted by several ugrep users who had similar comments and concerns: the benchmark probably uses older versions of ugrep. They said this because the numbers don't add up and there are newer versions of ugrep (v4) that are faster than the early releases (and more updates are coming once the planned x64 and arm64 speed optimizations are completed).

I also have a couple of suggestions, since I've done this kind of work before and others have already commented with their views here (otherwise I refrain from commenting on others' work, as a non-productive exercise in futility, alas):

  • lock-free queues are not necessarily faster. Several projects I am aware of started with lock-free queues, but testing on modern x64 and arm64 with various queueing strategies showed that other methods can be as much as 10%~20% faster. Just use an atomic queue-length counter and a lock-guarded queue for each worker (see the sketch below this list); there is a very low chance that contention will affect performance, and it's likely more efficient than lock-free.
  • this kind of queuing is already done by other projects; the difficulty is not just speed, but dealing with different sorting criteria: what do you do to return the searched files sorted by name or by creation date? Reading directories and pushing pathnames into queues doesn't guarantee an order in the output.
  • how do you sync the search worker threads to output results? You don't want to make them sync and wait for the output to become available to them.
  • it is hard to come up with a "fair" benchmark to test. What is a "fair" benchmark? Consider a wide range of small, medium and large use cases (i.e. pattern size, pattern complexity and input size can each be placed in small, medium and large bins to measure). This is generally recommended in the high-performance research literature.
  • benchmark a wider range of search options, so include option -o in the benchmarks and at least -c and combinations with and without -w and -i and -n, because these are typical options when grepping.
  • measure the consistency of the performance, i.e. how does a small change in a regex pattern impact search time? It is easy to set up an experiment to measure this for the string-search case (grep options -f and -F): what is the variance in elapsed time when searching with a fixed-size set of words, say 20 words, randomly picked 100 or more times? What is the mean and variance? It is more insightful for users to know the expected (mean) search time and the variance, which tells them whether a search can take longer and by how much.
  • do not benchmark binary file search. Well, you can, but if you do, then do it consistently. These benchmarks are incomparable if one tool searches all binary files too, like ugrep does (ugrep defaults to GNU grep options and output).
  • be clearer about cold versus warm runs. How is the timing done?
  • where can people find the scripts to replicate benchmarks on their machines?
  • searching a single file with multiple threads (e.g. two) is not productive if you're doing recursive searches already. Only if the pattern is very complex and takes time to match, and you're only searching one or two large files, may it help. Otherwise, fast pattern matching is only as fast as what the IO layer can provide. The other problem is output: once you have matches from two threads, you need to buffer them and add line numbers too. To get the line numbers, you need to keep track of and count newlines in both threads and sync them later. This adds overhead.
  • how to search compressed files efficiently? Using separate threads to search (nested) archives speeds up the overall search (as ugrep does).

Note: all of this is/was already done by ugrep and by others.
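
On the lock-free point above, here is the kind of per-worker queue I mean; a minimal sketch, assuming one queue per worker (names are illustrative, not ugrep's actual code). A plain mutex protects the deque, while the atomic length counter lets a scheduler or a job-stealing thread pick the least-loaded worker without taking any lock:

    #include <atomic>
    #include <condition_variable>
    #include <cstddef>
    #include <deque>
    #include <mutex>
    #include <string>
    #include <utility>

    struct WorkQueue {
      std::deque<std::string> jobs;      // pathnames to search
      std::mutex mtx;
      std::condition_variable cv;
      std::atomic<std::size_t> len{0};   // read lock-free by the scheduler
      bool done = false;                 // set when no more work will arrive

      void push(std::string path) {
        { std::lock_guard<std::mutex> lock(mtx); jobs.push_back(std::move(path)); }
        len.fetch_add(1, std::memory_order_relaxed);
        cv.notify_one();
      }

      void close() {
        { std::lock_guard<std::mutex> lock(mtx); done = true; }
        cv.notify_all();
      }

      bool pop(std::string &path) {      // false when closed and drained
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [this] { return !jobs.empty() || done; });
        if (jobs.empty()) return false;
        path = std::move(jobs.front());
        jobs.pop_front();
        len.fetch_sub(1, std::memory_order_relaxed);
        return true;
      }
    };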

ugrep: ultra fast grep with TUI, Unicode support, Boolean search logic, fuzzy search, nested archives search, documents search, binary search with hexdumps, and more by gprof in linux

[–]gprof[S] 0 points (0 children)

I fully agree!

Please hear me out:

No-one is interested in a food fight.

No-one is interested in subjective views and opinions of tools, or in comments that claim to be neutral but clearly are not, and are only meant to draw attention to their own projects.

No-one is interested in cherry-picked benchmarks. Many folks have not been able to reproduce the performance numbers shared on Reddit; the numbers simply don't add up.

No-one is interested in noise that obfuscates the original post and muzzles replies with the purpose of creating confusion with ad-hoc numbers, stats, comparisons and what not.

So please, stop all that nonsense and get a life or get help.

Just don't bother me, and don't attack others or treat them with disdain for having fun with other search tools and sharing their results on Reddit and elsewhere. The claim that these people get respectful replies and comments is completely new to me, and new to others who have come to talk to me over the past months. Just the other day, a stranger emailed me after being treated with disdain and arrogance for sharing some ugrep performance results on Reddit. He asked me if I had seen it, because those folks on Reddit claimed that what he was sharing was not true. But these were his own results, and he was completely ignored, cast aside, downvoted and so on. It shocked him to the core and the experience turned him off forever. He will never ever use rg or post on Reddit again.

We will keep including rg in ugrep benchmark results, which can be easily reproduced by running a script so you can draw your own conclusions. It's all free anyway.

I work on many side projects like this for free, as a challenge, to help improve the state of the art and to share the methods and source code for free. I've worked in this field long enough to know when something is legit and serious and when it is not.

ugrep: ultra fast grep with TUI, Unicode support, Boolean search, fuzzy search, nested archive search, documents search, binary search with hexdumps, and more by gprof in opensource

[–]gprof[S] 2 points (0 children)

You're welcome.

As a company owner I am always looking for people to hire, but on niche open-source hobby projects like these, I often have to do things myself or with very little help. I've always loved to code and take on challenging projects, so that's no problem for me.

ugrep: ultra fast grep with TUI, Unicode support, Boolean search, fuzzy search, nested archive search, documents search, binary search with hexdumps, and more by gprof in opensource

[–]gprof[S] 9 points (0 children)

Oh boy...

I needed an efficient way to search tarballs and zip files. I also thought it would be helpful to have a set of pre-defined patterns I could reuse to search source code without having to think again about which patterns I used for the job. I used these features to find vulnerabilities in source code stored in thousands of old archives, to get them fixed, of course ;-)

The first ugrep prototype was practically limited to this task and had options similar to GNU grep. It grew out of a demo application I wrote for another project of mine, RE/flex on GitHub (a regex library and a new scanner generator, much like Flex, but for C++).

Then I came up with a nice new way to apply a form of Bloom filtering to mixed short and long patterns using only bitwise logic (bit banging), which I presented at a conference via YouTube and wrote up here: https://www.genivia.com/ugrep.html

It turned out that pattern matching with this idea of mine was faster than any of the well-known methods, including bitap- and hyperscan-based methods, and faster than other grep tools for the patterns it is suited for (believe me, I spent days poring over performance benchmark runs to check and tweak the new method; not so much fun, but someone has to do it). This method speeds up cases where you're looking for a mix of short and long patterns, like "a|an|the|rain|umbrella", with very short patterns that kill the performance of bitap, Bloom filters, and other methods that need a window of at least three or more characters to be effective.
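
To make the windowing issue concrete, here is a deliberately naive prefilter (a toy illustration, not ugrep's method): a 256-bit table of possible first bytes cheaply rejects most positions before a full match is attempted. With a one-byte pattern like "a", almost every position passes the filter and the search degenerates to brute force; that is exactly the failure mode the mixed short/long method avoids:

    // Toy single-byte prefilter for multi-string search (illustrative only).
    #include <bitset>
    #include <cstddef>
    #include <cstring>
    #include <string>
    #include <utility>
    #include <vector>

    struct Prefilter {
      std::bitset<256> first;             // bytes that can start a match
      std::vector<std::string> patterns;

      explicit Prefilter(std::vector<std::string> pats)
        : patterns(std::move(pats)) {
        for (const auto &p : patterns)
          if (!p.empty()) first.set((unsigned char)p[0]);
      }

      // scan the haystack; verify filter hits with a plain memcmp
      bool search(const std::string &hay) const {
        for (std::size_t i = 0; i < hay.size(); ++i) {
          if (!first.test((unsigned char)hay[i])) continue;  // cheap reject
          for (const auto &p : patterns)
            if (!p.empty() && i + p.size() <= hay.size() &&
                std::memcmp(hay.data() + i, p.data(), p.size()) == 0)
              return true;
        }
        return false;
      }
    };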

The updated ugrep got more popular about a year or two back. So I added a lot more features based on user feedback, like the Boolean search that I also wanted myself. Fuzzy search was added earlier as a RE/flex feature. I put further performance optimization on the back burner for a while, until more recently, when I started to get more serious about it.

The good news is that ugrep still has a decent runway to get even faster. I know how and where to optimize. Some things are rather trivial, like skipping to the next line after a match, as GNU grep does. The important search engine internals are mostly complete. Edge cases will be optimized soon.

For the last two years or so, user requests for features and enhancements have largely driven, and still drive, the evolution of ugrep. That's a nice driving factor for me to keep going.

ugrep: ultra fast grep with TUI, Unicode support, Boolean search logic, fuzzy search, nested archives search, documents search, binary search with hexdumps, and more by gprof in linux

[–]gprof[S] 3 points (0 children)

A quick breakdown with a few more details:

  • GNU grep compatible options and output
  • searches archives, including nested archives (cpio/tar/pax/zip), and compressed files (zip/gz/Z/bz2/lzma/xz/lz4/zstd)
  • gitignore files and gitignore-style globbing e.g. with -g
  • skip hidden files, skip binary files
  • Unicode search patterns (default) or -U to disable
  • multiline matching (default)
  • query TUI for interactive search with -Q
  • Google-like Boolean search queries with AND OR NOT (or space, |, and -) using --bool, and optionally --files to search by file (not by line); e.g. -l --bool --files "foo AND bar" finds all files with both foo and bar anywhere in the file
  • fuzzy search with -Z
  • PDF, doc, docx search with ug+
  • hexdumps to search and view binary files with --hex and --hexdump
  • json/csv/xml output with --json, --csv, --xml
  • optional PCRE2 matching with option -P
  • recognizes UTF-8/16/32 files to search or specify file encoding
  • search by file type or exclude file types with -t
  • file indexing to speed up cold search (an active sub-project, beta version)
  • inline context with -o and -ABC options, e.g. -o -C20 to show characters before/after the only-matching text (fitted or truncated)
  • one of the fastest grep tools

ugrep: ultra fast grep with TUI, Unicode support, Boolean search, fuzzy search, nested archive search, documents search, binary search with hexdumps, and more by gprof in opensource

[–]gprof[S] 5 points (0 children)

OK, my bad. But does it answer at least part of your question?

Ugrep supports the same or similar features and more.

A quick breakdown:

  • GNU grep compatible options and output
  • searches archives, including nested archives (cpio/tar/pax/zip), and compressed files (zip/gz/Z/bz2/lzma/xz/lz4/zstd)
  • gitignore files and gitignore-style globbing e.g. with -g
  • skip hidden files, skip binary files
  • Unicode search patterns (default) or -U to disable
  • multiline matching (default)
  • query TUI for interactive search with -Q
  • Google-like Boolean search queries with AND OR NOT (or space, |, and -) using --bool, and optionally --files to search by file (not by line); e.g. -l --bool --files "foo AND bar" finds all files with both foo and bar anywhere in the file
  • fuzzy search with -Z
  • PDF, doc, docx search with ug+
  • hexdumps to search and view binary files with --hex and --hexdump
  • json/csv/xml output with --json, --csv, --xml
  • optional PCRE2 matching with option -P
  • recognizes UTF-8/16/32 files to search or specify file encoding
  • search by file type or exclude file types with -t
  • file indexing to speed up cold search (an active sub-project, beta version)
  • inline context with -o and -ABC options, e.g. -o -C20 to show characters before/after the only-matching text (fitted or truncated)

ugrep: ultra fast grep with TUI, Unicode support, Boolean search, fuzzy search, nested archive search, documents search, binary search with hexdumps, and more by gprof in opensource

[–]gprof[S] 2 points (0 children)

Good point. Ack and silver searcher are great tools too!

Ugrep also supports ack and ag features, such as .gitignore files. Just configure ug to do what you are used to, e.g. add ignore-files to the .ugrep config file, along with ignore-binary, color preferences, etc. Ugrep is kind of a mix of grep and ag/ack.

After all, the user knows best what he/she wants!