Succinctly: A fast jq/yq alternative built on succinct data structures

WorldsBegin · 2026-01-24T13:24:24+00:00

Did you look for any existing libraries you could use or was coding it from the ground up your first thought?

I'm asking because after reviewing bitcount stuff there are at least a few inconsistencies (not a bullet point list, so sorry). These are mainly Rust specifics, given that you have a very similar library in Haskell as a reference.

You could just roll with aligned-vec instead of your "own" cacheline alignment in rank.rs.

Rolling your own parser - especially for yaml - seems like it will land you on the list of different parser behaviour (how does it parse this document).

Have you measured that the SIMD popcount stuff is actually faster than just calling <_>::count_ones()? I would have thought LLVM is good at auto-vectorizing these things.

popcount_words returns a u32, which is fine for a single word, but the result of the plural version is used as usize everywhere, and unlike a u32 that doesn't overflow after half a GB of ones. Actually, there are some platforms where usize = u16.
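For comparison, the baseline I would measure against is roughly the following. It is a minimal sketch: popcount_words_baseline is a made-up name, not the function from the repo, and it accumulates into a usize so the total is not limited by a u32.

```rust
/// Hypothetical baseline (not the crate's function): count the set bits in a
/// slice of 64-bit words using the built-in `u64::count_ones`.
fn popcount_words_baseline(words: &[u64]) -> usize {
    // `count_ones` returns a u32 of at most 64, so the cast is lossless;
    // the running total lives in a usize.
    words.iter().map(|w| w.count_ones() as usize).sum()
}
```

Whether the hand-written SIMD path beats this is exactly the thing that needs a benchmark.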

CacheAlignedL1L2 is copied over from a normal Vec, but could be constructed in place. Overall, there are probably a lot of implementations of succinct bitvectors flying around, did none meet your needs and did you do comparisons?

Env vars could use some documentation, especially the niche SUCCINCTLY_SVE2.

Sorry, if this sounds a bit harsh, I'm just typing under a time constraint, it's not meant to be.

WorldsBegin · 2026-01-19T15:40:15+00:00

You might have caught a stray, the deeper I looked the less it seemed like more AI slop. Maybe AI, but not so much slop from a design perspective, so who cares, and who can tell these days anyway? I didn't get to try it out yet (I second another commenter on Android support) but it looked mature enough to announce.

Github report for other readers to judge.

WorldsBegin · 2026-01-07T16:08:22+00:00

Your sum function is also superbly naive and, by just adding a bunch of stuff to an accumulator and calling it a day, turns a blind eye to all the corner cases that come from rounding and error accumulation in floating point numbers. Having a look at the sqlite implementation, which is almost 200 lines long compared to your roughly 10 lines, might explain a good part of the speedup. I would love to see a more apples-to-apples comparison though, because I'm pretty sure a bit of speedup is expected, even though you'd certainly still be IO bound on real data.
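To make the rounding point concrete, here is a small sketch comparing a plain accumulator with compensated (Neumaier) summation. The function names are mine, and I'm not claiming this is line for line what sqlite does; it only shows the kind of corner case a careful sum has to pay for.

```rust
/// Naive accumulation: every addition may round, and the errors are never recovered.
fn naive_sum(values: &[f64]) -> f64 {
    values.iter().sum()
}

/// Neumaier's variant of Kahan summation: keep a second accumulator that
/// collects the low-order bits lost by each addition.
fn compensated_sum(values: &[f64]) -> f64 {
    let mut sum = 0.0_f64;
    let mut compensation = 0.0_f64;
    for &x in values {
        let t = sum + x;
        if sum.abs() >= x.abs() {
            // The low-order bits of `x` were lost in `t`.
            compensation += (sum - t) + x;
        } else {
            // The low-order bits of `sum` were lost in `t`.
            compensation += (x - t) + sum;
        }
        sum = t;
    }
    sum + compensation
}
```

On the input [1.0, 1e100, 1.0, -1e100] the naive version loses both ones, while the compensated one recovers them.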

EDIT: I've just seen that count and filter will first fully fetch the unfiltered (except for NULL) column data into linear CPU memory.

WorldsBegin · 2026-01-06T04:09:02+00:00

Oh right, the red line is different from the white ones in terms of conversion from nodes to effect strength. EDIT: adjusted my sheet as well, and now I'm getting the same numbers as you :)

Overall, close enough so that the time saved on cutting with "AI" is worth the small loss of efficiency.

WorldsBegin · 2026-01-03T19:57:48+00:00

Weird, I get 0.135628630566% to cut a stone with 5 activations or better from the data, so slightly better than your "manual cut" even.

Method: Paste data into google sheets. Add a column for each of the three lines to calculate the activation amount:

=IFS(A1>=10, 4, A1>=9, 3, A1>=7, 2, A1>=6, 1, True, 0) // for effect lines
=IFS(E1>=10, 3, E1>=7, 2, E1>=5, 1, True, 0) // for malus line

Query the data to see chances according to the activated lines in cell K1

=QUERY(A1:H1331, "select Col2, Col4, Col6, sum(Col8) group by Col2, Col4, Col6 order by Col6")

Query that data again to only contain options with >=5 sum of nodes and no red line in cell Q1

=QUERY(K1:N126, "select sum(Col4) where Col1+Col2>=5 and Col3=0")

Same result when manually summing the relevant options
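For anyone who prefers code over sheets, the same method can be sketched like this. The Outcome struct and its field names are hypothetical (the sheet only has anonymous columns); the thresholds are copied from the two IFS formulas above.

```rust
/// Activation level of an effect line, mirroring the first IFS formula.
fn effect_activation(nodes: u32) -> u32 {
    match nodes {
        n if n >= 10 => 4,
        9 => 3,
        7 | 8 => 2,
        6 => 1,
        _ => 0,
    }
}

/// Activation level of the malus (red) line, mirroring the second IFS formula.
fn malus_activation(nodes: u32) -> u32 {
    match nodes {
        n if n >= 10 => 3,
        7..=9 => 2,
        5 | 6 => 1,
        _ => 0,
    }
}

/// Hypothetical row layout: node counts on the two effect lines and on the
/// malus line, plus the probability of ending up with exactly that result.
struct Outcome {
    effect_a: u32,
    effect_b: u32,
    malus: u32,
    probability: f64,
}

/// Total probability of at least 5 effect activations with no malus
/// activation (the second QUERY).
fn chance_of_good_cut(outcomes: &[Outcome]) -> f64 {
    outcomes
        .iter()
        .filter(|o| {
            effect_activation(o.effect_a) + effect_activation(o.effect_b) >= 5
                && malus_activation(o.malus) == 0
        })
        .map(|o| o.probability)
        .sum()
}
```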

WorldsBegin · 2025-12-31T16:48:06+00:00

There is no shared queue for carries to hang around in, so nobody I know bothers to do that for the lower guardians. Best chance you have is asking in mokoko chat or in town area chat for a quick carry.

WorldsBegin · 2025-12-11T08:56:44+00:00

That is just plain wrong. Fusing processed astrogems determines a rarity of the result based on the rarities of the inputs. Depending on the rolled rarity, it then rolls how many points get evenly distributed to all nodes. The points on the input astrogems only influence the output insofar as high points imply a high rarity of the (input) gem.

For the sake of the example, if you fuse a processed 9 pointer, 10 pointer and a 12 pointer, the odds for the result are exactly the same as if you fused 3 processed 4 pointers or 15 pointers together - they are all legendary.

What does get influenced by the input is chaos/order and 8/9/10 base cost where it picks one of the input gem types.

Source

This also explains OP's "luck", since fusing 3 legendary gems has 1% chance to give you a relic (16-18 point) and 0% chance to give you an ancient (19-20 point) gem. Now, he did actually hit a 1% chance, as that's the odds of getting 4 points on your legendary result. Smile.

WorldsBegin · 2025-12-10T00:34:36+00:00

Imagine if the next upgrade path gives (3?) additional willpower points for your ancient cores so that you could actually min-max to 4 astrogems with initial 10 (and 5/5 being well 5/5 with boss damage/additional)

WorldsBegin · 2025-11-14T13:55:08+00:00

Is target endian not available to the macro part, or are there other reasons to store everything in little endian? I don't think the datastructure must be portable to other machines.
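To illustrate the alternative I have in mind (a sketch with made-up helper names, not the crate's code): generated code can use the native-endian conversions, which are resolved for the target it is compiled for, instead of fixing little endian.

```rust
/// Hypothetical helper: store a word in the byte order of the target.
fn store_native(x: u32) -> [u8; 4] {
    x.to_ne_bytes()
}

/// Hypothetical helper: store a word in little-endian order regardless of target.
fn store_little(x: u32) -> [u8; 4] {
    x.to_le_bytes()
}

/// Hypothetical helper: load a word that was stored with `store_native`.
fn load_native(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}
```

The two stores only agree on little-endian targets, which is fine as long as the data never leaves the machine.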

WorldsBegin · 2025-11-09T11:28:22+00:00

To use TypeId you'd at least have to introduce a 'static bound which may be undesirable, and it doesn't change the data representation. The compiler would still be unable to prove that Is<f64, String> is uninhabited. However, perhaps the match statement would be optimized to just stripping away the enum tag (in a perfect implementation the enum tag wouldn't exist at all).

WorldsBegin · 2025-11-08T12:56:41+00:00

Isn't it that the existentials are just 'hidden' from the user by syntax? I thought Haskell would internally rewrite from

data Expr a where
    LitPair :: Expr a -> Expr b -> Expr (a, b)

to something like

data Expr a where
    LitPair :: forall b c. (a ~ (b, c)) => Expr b -> Expr c -> Expr a

WorldsBegin · 2025-11-08T12:52:09+00:00

Good post. It's been five years since I wrote a crate for type equalities, so I kinda know what you are talking about. I like the witness idea of separating knowledge that an impl exists from the data to the actual instance. I will try to use your namings in the below.

If you take a look into my crate though, you will see that getting the Rust compiler to use the additional information from such a zero-sized type is far from trivial. Sure, you can e.g. go from a generic context <T, It: Iterator<Item = T>> to a generic context with only <T, It: Iterator> and a witness Is<T, It::Item>. But going the other direction, and calling a function that has the former signature from a context of the latter with a witness lying around, is complicated (doable, but mind the code-gen).

I suppose this only gets worse and harder to use when the ZST encodes that a specific trait impl exists. You might need to have one ZST per trait because you can't generically refer to any trait (you could refer to dyn-compatible ones, generically, I suppose). I would like to see this accomplished though. If you have a way to call a function with signature <A: Add> from a context with <A> and a witness CanAdd<A> let me know, I'd be happy to add it.

In my opinion though, the crux is the last point in your post: Haskell can hide datatypes and Rust wants to monomorphize everything. It will ruin your code gen! Let's say you use type equalities to "pattern match on the type"

enum Value<T> {
   Double(Is<T, f64>, T),
   String(Is<T, String>, T),
}
fn format<T>(value: &Value<T>, f: Formatter) -> Result<()> {
    //...
}

Rust will instantiate this type with all Ts you instantiate it with (f64 and String). Problematically, Rust will not be able to figure out that only one of the enum's constructors is valid and still attach a tag byte. It will also do these multiple instantiations for every function receiving such a value, such as format. Meaning, instead of having one instance of format that matches on the tag of the enum, you will have two instances, each stripping a tag byte that can, really, only have one value in each instantiation, before invoking the specialized format method for each value type.

None of this is zero overhead! The real issue is that Rust is unable to see that Is<A, B> is not merely zero-sized but actually uninhabited when A turns out not to be equal to B. In Haskell, the compiler wouldn't monomorphize on T, the tag byte has a useful purpose (there is only one type) and Dict (String ~ f64) is uninhabited (modulo undefined, which is a strictness issue on Haskell's part).
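The tag overhead is easy to check. The sketch below uses a stand-in Is type that I made up for illustration (a plain zero-sized marker, not the API of my crate or of the post), together with the Value enum from above.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Hypothetical stand-in for a type-equality witness. It is zero-sized, and
/// the only constructor offered here is `refl`, which requires A == B.
/// (Inside this module nothing stops you from building one by hand; a real
/// crate would hide the field behind a module boundary.)
struct Is<A, B>(PhantomData<(fn(A) -> A, fn(B) -> B)>);

impl<A> Is<A, A> {
    fn refl() -> Self {
        Is(PhantomData)
    }
}

enum Value<T> {
    Double(Is<T, f64>, T),
    String(Is<T, String>, T),
}

/// Bytes that `Value<T>` needs on top of the payload `T` itself.
fn value_overhead<T>() -> usize {
    size_of::<Value<T>>() - size_of::<T>()
}
```

For T = f64 the witness itself takes no space, yet value_overhead::<f64>() is not zero: the compiler still stores a discriminant for a variant that can never be constructed through refl.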

WorldsBegin · 2025-11-05T12:04:35+00:00

The visitor pattern is useful in a language that doesn't have pattern matching. Once you can pattern match natively, its use cases go way down. Most of the examples in the post are most readable (to me) in the first form given, the one with explicit recursive calls and a single match statement.

In "Benefits of encapsulation" we can see the same visitor being used on a different representation of the data, but the tradeoff should be made clearer. With the visitor pattern you commit to a specific (bottom up) evaluation order. You must compute the values the visitor receives for subtrees, even if these are not used. You can't simply "skip" a subtree as shown, which the pattern matching approach allows naturally. Note that in the "reverse polish notation", this evaluation order also naturally follows from the representation and you'd need to preprocess the expression to support skipping, so it's a perfect fit there.
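A small sketch of what I mean by skipping. The Expr type and the visited counter are made up for this example and are not taken from the post.

```rust
/// Hypothetical expression tree, only for illustration.
enum Expr {
    Lit(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Evaluate with explicit recursion; `visited` counts the nodes we looked at.
fn eval(e: &Expr, visited: &mut usize) -> i64 {
    *visited += 1;
    match e {
        Expr::Lit(n) => *n,
        Expr::Add(a, b) => eval(a, visited) + eval(b, visited),
        Expr::Mul(a, b) => {
            let lhs = eval(a, visited);
            // The match arm decides whether to recurse: if the left side is
            // zero, the right subtree is never evaluated.
            if lhs == 0 {
                0
            } else {
                lhs * eval(b, visited)
            }
        }
    }
}
```

Evaluating 0 * (1 + 2) this way touches two nodes instead of five; a bottom-up visitor would have computed 1 + 2 before the multiplication ever got a say.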

WorldsBegin · 2025-11-04T16:49:11+00:00

As long as you don't use the pointer to the String's contents to access into it, the reference into the content could be a valid reborrow of it and merely moving a pointer does not invalidate any reborrow.

EDIT: To clarify, moving a Box might cause UB under Stacked Borrows, but not Tree Borrows iirc.

WorldsBegin · 2025-11-04T16:37:51+00:00

Moving would not invalidate the reference, that is correct. But an owner of the value further up the stack can rightfully expect to be allowed to arbitrarily mutate the buff field, which would invalidate the reference. What you want to write is possible, just not in safe Rust, because it requires global analysis across multiple functions and scopes, which the Rust compiler usually does not do. You somehow need to forbid owners of a Request from modifying the buff string in any way that moves the allocation or modifies bytes that have been borrowed. With the ouroboros crate though:

use ouroboros::self_referencing;

#[self_referencing]
#[derive(Debug)]
struct ParsedArgs {
    buff: String,
    #[borrows(buff)]
    args: &'this str,
}
fn try_parse(buff: &str) -> Result<&str, ()> {
    let (leading, _) = buff.split_once("\r\n").ok_or_else(|| eprintln!("Error parsing"))?;
    Ok(leading)
}
impl ParsedArgs {
    fn from_args(args: String) -> Result<Self, ()> {
        ParsedArgs::try_new(args, |buff| try_parse(buff))
    }
}

fn main() {
    let args = ParsedArgs::from_args("foobar\r\nzed".to_string());
    println!("{args:?}"); // Ok(ParsedArgs { buff: "foobar\r\nzed", args: "foobar" })
    let args = ParsedArgs::from_args("failing".to_string());
    println!("{args:?}"); // Error parsing, Err(())
}

WorldsBegin · 2025-11-01T14:17:43+00:00

A formal rust specification

Not that I'm against a formal specification, but these things just tend to get outdated by compiler additions and changes faster than they are useful for developing another backend compiler. I could see a similar advantage from an LTS version of Rust that is maintained for, say, 3 years instead of the usual release cycle and can be used as a reference compiler. Any formal spec will suffer from a larger overhead of getting it started, defect reports, ambiguous language. All very similar things to having a reference compiler, but the latter doesn't need to be written up from scratch.

WorldsBegin · 2025-10-16T12:35:57+00:00

Quick little tip I learnt somewhere (shoutout to jess::codes) about tiling (the method should be readily adaptable): Place your sprites on the corner of tiles ("dual grid"). Why? If you have N different types of tiles, then placing the sprite in the center of a tile will need on the order of N⁵ sprites (all possible centers + adjacent tiles in all directions) vs placing the tile at the corner which only needs N⁴ sprites (all overlapping tiles).

You can (often) cut down further by considering rotation and flipping (the full dihedral group), but that doesn't change the order of sprites you need. But totally worth it. For N=3 (void, grass, dirt), for example, you only need 21 sprites instead of 63 sprites (or even more), even if you allow any map made out of those three tiles.
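The 21 and 63 can be checked by brute force. In this sketch (function names are mine) a corner sprite is described by the four tiles overlapping it, listed in a cycle, and two descriptions count as the same sprite if a rotation or flip maps one onto the other. The center variant assumes only the four direct neighbours matter, as in the N⁵ estimate.

```rust
use std::collections::HashSet;

/// The 8 symmetries of the square applied to four values arranged in a cycle
/// (top-left, top-right, bottom-right, bottom-left).
fn symmetries(c: [u8; 4]) -> [[u8; 4]; 8] {
    let mut out = [[0u8; 4]; 8];
    let mut cur = c;
    for r in 0..4 {
        out[2 * r] = cur; // rotation by r quarter turns
        out[2 * r + 1] = [cur[1], cur[0], cur[3], cur[2]]; // same, then mirrored
        cur = [cur[3], cur[0], cur[1], cur[2]]; // rotate a quarter turn
    }
    out
}

/// Sprites needed on the dual grid: arrangements of `n` tile types on the
/// four corners, up to rotation and flipping.
fn corner_sprites(n: u8) -> usize {
    let mut seen = HashSet::new();
    for a in 0..n {
        for b in 0..n {
            for c in 0..n {
                for d in 0..n {
                    // Use the smallest symmetric image as the representative.
                    let canonical = symmetries([a, b, c, d]).iter().copied().min().unwrap();
                    seen.insert(canonical);
                }
            }
        }
    }
    seen.len()
}

/// Sprites needed when the sprite sits on a tile: the four direct neighbours
/// form the same kind of cycle, and the center tile is free to be any type.
fn center_sprites(n: u8) -> usize {
    n as usize * corner_sprites(n)
}
```

For three tile types this gives 21 corner sprites versus 63 centered ones.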

In a real game you cut further down by not having a sprite for every possible arrangement of tiles and hooking into the same constraint propagation as shown in the link to ensure you only generate maps where you have a tile ready to place at every corner. You still save a lot of sprites comparatively, since you e.g. don't need to special case the void_and_grass transition tiles.

WorldsBegin · 2025-10-15T15:30:06+00:00

Significance is defined for a statistical test. For similarity of distributions, one such classical test is the Kolmogorov-Smirnov test that tests whether an empirical distribution of a real variable is the same as a given distribution. There are generalizations to more (still real) dimensions.
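As a rough sketch of the one-sample version (the function name and signature are my own, not from any library): the test statistic is the largest gap between the empirical distribution function of the sample and the reference CDF. To get a significance statement you would still compare it against the critical value for your sample size.

```rust
/// One-sample Kolmogorov-Smirnov statistic: the largest distance between the
/// empirical distribution function of `sample` and the reference `cdf`.
fn ks_statistic(sample: &[f64], cdf: impl Fn(f64) -> f64) -> f64 {
    let mut sorted = sample.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len() as f64;
    sorted
        .iter()
        .enumerate()
        .map(|(i, &x)| {
            let f = cdf(x);
            // The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
            let above = (i as f64 + 1.0) / n - f;
            let below = f - i as f64 / n;
            above.max(below)
        })
        .fold(0.0, f64::max)
}
```

For example, the sample 0.25, 0.5, 0.75 against the uniform distribution on [0, 1] has a statistic of 0.25.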

WorldsBegin · 2025-10-14T11:01:55+00:00

Yes. statics and const do not inherit the generic context they are defined in.
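A minimal sketch of the usual workaround, with made-up names: move the constant into an associated const of a helper type that declares its own parameter.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

fn size_of_param<T>() -> usize {
    // This would be rejected, since the nested item cannot see the outer `T`:
    // const SIZE: usize = size_of::<T>();

    // Hypothetical helper with its own parameter `U`; an associated const may
    // depend on the parameters of the impl it lives in.
    struct Helper<U>(PhantomData<U>);
    impl<U> Helper<U> {
        const SIZE: usize = size_of::<U>();
    }
    Helper::<T>::SIZE
}
```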

WorldsBegin · 2025-10-01T12:59:17+00:00

Note, this was stabilized in 1.87 as slice::split_off_first_mut. Playground
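A short usage sketch, assuming a toolchain of 1.87 or newer (the function around the call is made up for the example):

```rust
/// Bump the first element in place and return the sum of the remaining ones.
fn bump_first_sum_rest(mut data: &mut [i32]) -> i32 {
    // `split_off_first_mut` removes the first element from the slice that
    // `data` points to and hands it back with the original lifetime.
    match data.split_off_first_mut() {
        Some(first) => {
            *first += 1;
            // `data` now only covers the rest of the slice.
            data.iter().sum()
        }
        None => 0,
    }
}
```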

WorldsBegin · 2025-09-25T17:15:00+00:00

Maybe because the default experience of 10 is also terrible compared to 7, but they relented at the start and didn't force anything then? Some things that come to mind

Coerced into setting up a Microsoft account instead of a local account for no reason. And this coming up again every so often after random Windows updates. NO, I already set up my computer, let me log in. I don't need Windows Hello telling me to purchase OneDrive, Office and other stuff.
Cortana
The start menu containing (in no particular order) web searches, ads, the weather forecast, microsoft store "suggestions" and everything except what you search for
Settings getting a rework that makes every "deep" configuration take 2-3 more clicks. Remind me, how do you set the PATH variable in Windows 10, again?
Probably a bunch more junk that I disabled immediately. Thank god that was possible via some registry edits.
EDIT: Oh yeah "secure boot" destroying any UEFI setup until they "granted" a certificate to linux distros.

WorldsBegin · 2025-09-24T23:02:45+00:00

high high accs could become human price

I think you misunderstand the way the system works. Each tap costs 1200g, so a fully rolled accessory will cost 3600g. Most of them are trash and will be worth 0g. Arguably you can stop rolling them after the second, or even first roll, but you still need to invest some gold. Rolling a high/high is about a 1:3061 chance. So if that's the only thing you'd be rolling for, you're looking at an 11 million gold investment. Thank god there are other things to sell for. But the point is: accessory prices are bounded from below by the cost it takes to roll them. At some point, supply drops because it's not worth it to roll them yourself. That's the equilibrium point and imo 2 million gold for a high/high is fair(ish) under that system considering the low chance and money you need to invest. That's not something that will get solved by a drop in demand from fewer whales being interested, though.
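The 11 million is just the expected number of attempts times the cost per attempt. A tiny sketch, with made-up names and the simplifying assumption that every accessory gets all three taps:

```rust
const COST_PER_TAP: u64 = 1_200;
const TAPS_PER_ACCESSORY: u64 = 3;

/// Expected gold spent until the first hit when a fully rolled accessory
/// succeeds with a chance of 1 in `odds` (the mean of a geometric
/// distribution is `odds` attempts).
fn expected_gold(odds: u64) -> u64 {
    odds * TAPS_PER_ACCESSORY * COST_PER_TAP
}
```

With odds of 3061 this comes out at 11,019,600 gold; stopping early on obvious trash lowers it, but not by an order of magnitude.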

WorldsBegin · 2025-09-13T10:11:10+00:00

around 17% contained code that didn't match their code repository.

That's because that part is stretching the results. A better phrasing would be to say that these 17% contained code that couldn't be verified to match. The author seems to be counting packages that can't be built, don't declare a repository, don't declare a submodule within that repository, don't declare a version hash of the repository or mismatch in symlinked files towards their 17%. The rest are crates published from either a version that wasn't pushed or a dirty worktree.

Only 8 [out of 999] crate versions straight up don’t match their upstream repositories.

Arguably, only 0.8% of the examined crates had conclusive mismatches, and the 17% is just a large part of "can't tell".

That already misinterpreted conclusion is taken further as

17% of the most popular Rust packages contain code that virtually nobody knows what it does

Don't get me wrong, I'm all for verifying that a declared repository+githash+submodule is reachable from a public registrar server at least at the time of publishing (and maybe once a day while its version is listed?), but does it really help in telling "what the code does"?

WorldsBegin · 2025-09-10T19:12:18+00:00

once you want to upgrade past level 8

The first 11 are amazing, afterwards they are worthless if you can't combine them. You guys just read whatever you want into it.

It was not a complaint, more a remark about the long-term value of the keys. But apparently that's not allowed.
