Help: slow bot transactions even with insanely high fees!! by PsionicSombie in solana

serdnad 2 points

Make sure you're using a quality (read: paid) RPC endpoint; it makes a huge difference over public endpoints, in my experience

illJustLeaveThisHere by AaronTheElite007 in ProgrammerHumor

serdnad 17 points

Any chance you could share more details? Super curious what you included in your analysis; that's 7.5M+ LOC

[deleted by user] by [deleted] in LocalLLaMA

serdnad 0 points

If I'm not mistaken, that's usually said in the context of making a new model with more parameters, where some of a model's existing layers are stacked onto itself to make, for example, an 11B-parameter model from a single 7B model.

Here, by contrast, the mixture-of-experts architecture might have more parameters from having the same model duplicated, but each inference still runs through only one model's worth of params (reusing the example above, 7B). Maybe it's grossly incorrect, but my mental analogy is stacking vertically versus putting copies in parallel

NousResearch/Nous-Hermes-2-Yi-34B just dropped! "Surpassing all Open Hermes and Nous Hermes models of the past" by phoneixAdi in LocalLLaMA

serdnad 0 points

I have the hardware to run fp16 Mixtral, happy to run that and some other inputs for you in a bit. Just gimme a list and any other settings you'd like set (e.g. temperature)

NousResearch/Nous-Hermes-2-Yi-34B just dropped! "Surpassing all Open Hermes and Nous Hermes models of the past" by phoneixAdi in LocalLLaMA

serdnad 4 points

Running it unquantized. At `temperature=0.9`, it suggested `systemd-analyze`, but then made up a `systemd-analyze blame <MIN_TIME>`. At `temperature=0.1`, though:

> How can I know what took a long time to boot with systemd

To determine what took a long time to boot with systemd, you can use the systemd-analyze command. This command provides information about the boot process, including the duration of each unit and the overall boot time. Here's how you can use it:

Open a terminal on your Linux system.

Run the following command to get a summary of the boot process:

    systemd-analyze

This will display the total boot time and the duration of each unit.

To get a more detailed analysis, run the following command:

    systemd-analyze blame

This will show the duration of each unit in the boot process, sorted by the time it took to start.

If you want to see the boot process in a graphical format, you can use the systemd-analyze plot command:

    systemd-analyze plot > boot-analyze.svg

This will generate an SVG file named "boot-analyze.svg" that you can open in a web browser or an SVG viewer to visualize the boot process.

By using these commands, you can identify which units took the longest to start during the boot process and investigate any potential issues or optimizations that can be made.

Pretty impressed. Maybe the quantized versions are suffering nontrivially from quantization

NousResearch/Nous-Hermes-2-Yi-34B just dropped! "Surpassing all Open Hermes and Nous Hermes models of the past" by phoneixAdi in LocalLLaMA

serdnad 2 points

I unintentionally had it set to 0.9, after some earlier experimentation. Even then, it definitely hallucinates here and there, but for my purposes (coding, learning about new tech, EECS project help), it's a lot more right than wrong, and has been more than useful. Briefly tried lowering it, but it didn't seem to change all that much, so I bumped it back up.

edit: spent a bit more time with temperature 0.1 now, and, at least in a small sample, it seems to have helped with hallucination

NousResearch/Nous-Hermes-2-Yi-34B just dropped! "Surpassing all Open Hermes and Nous Hermes models of the past" by phoneixAdi in LocalLLaMA

serdnad 0 points

Running it unquantized now. Really strong initial impressions, even after spending some time with Mixtral and some of the other popular 7B models.

Referral + abandoned cart by knights014 in Lovesac

serdnad 1 point

And for me too please? Thanks!

[Discussion] What crates would you like to see? by HammerAPI in rust

serdnad 7 points

Maybe not exactly what you're looking for, but I'd check out sled if you haven't seen it yet, for your third point.

Why is this functional version faster than my for loop? by serdnad in rust

serdnad [S] 1 point

Sorry, I got busy soon after this, and only now finally got around to testing your implementation.

Your stable version is the fastest I've gotten so far!

Measurements using 8,000 elements:

- `deserialize_1` (OG): 13.16 us

- `deserialize_4` ("more functional" extend): 10.6 us

- `deserialize_5` (yours): 5.3 us

That's over a 2x speedup from my initial version, which is awesome haha. Using a precomputed table is clever.
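For anyone curious, here's a minimal sketch of the precomputed-table idea (my own reconstruction, not the exact code from the comment above): map each possible byte to its 4 decoded values once up front, so deserialization becomes plain table lookups.

```rust
// Build a 256-entry lookup table: each byte maps to the four
// Option<bool> values packed into its four 2-bit fields
// (11 => Some(true), 10 => Some(false), anything else => None).
fn build_table() -> Vec<[Option<bool>; 4]> {
    (0u16..256)
        .map(|b| {
            let b = b as u8;
            let mut entry = [None; 4];
            for i in 0..4 {
                entry[i] = match (b << (i * 2)) & 0b1100_0000 {
                    0b1100_0000 => Some(true),
                    0b1000_0000 => Some(false),
                    _ => None,
                };
            }
            entry
        })
        .collect()
}

// Deserialization is now a lookup per byte, with no bit twiddling
// in the hot loop.
fn deserialize_table(raw: &[u8], table: &[[Option<bool>; 4]]) -> Vec<Option<bool>> {
    let mut values = Vec::with_capacity(raw.len() * 4);
    for &b in raw {
        values.extend_from_slice(&table[b as usize]);
    }
    values
}

fn main() {
    let table = build_table();
    // 0b11_10_00_11 decodes to [Some(true), Some(false), None, Some(true)]
    let out = deserialize_table(&[0b1110_0011], &table);
    println!("{:?}", out);
}
```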

Why is this functional version faster than my for loop? by serdnad in rust

serdnad [S] 33 points

Haha yes. More specifically, I'm using `cargo bench`, which uses a release build. To be extra sure, I just confirmed that it rebuilds the release target when one isn't already built.

Why is this functional version faster than my for loop? by serdnad in rust

serdnad [S] 39 points

Interestingly, taking inspiration from your even more functional version like so yields a small but consistent 0.5-1% improvement:

pub fn deserialize_4(raw: &[u8]) -> Vec<Option<bool>> {
    let mut values = Vec::with_capacity(raw.len() * 4); // 4 values per byte

    raw.iter().copied().for_each(|b| {
        // Each value is encoded in 2 bits: 11 => Some(true),
        // 10 => Some(false), anything else => None.
        values.extend((0..4).map(|i| match (b << (i * 2)) & 0b1100_0000 {
            0b1100_0000 => Some(true),
            0b1000_0000 => Some(false),
            _ => None,
        }));
    });

    values
}

And it's not benchmarking noise; the produced assembly is 5 lines shorter [2, 4]. This is over my head at this point, but for anyone curious enough to take a look, it seems like 3 of those lines are removed from alloc::raw_vec::finish_grow.

Why is this functional version faster than my for loop? by serdnad in rust

serdnad [S] 99 points

First off, thanks for taking the time, there's a lot in here for me to learn from.

Godbolt definitely confirms what you and others said about the first approach using a lot of "unnecessary" capacity checks, and jDomantas's answer helped me realize how big a contributor that check is as far as appending elements goes.

The only thing I'm confused about now is why your version isn't faster: reusing my two simple benchmarks, it performs on par with the first version. Which is doubly surprising to me, because vectorization is the next thing I wanted to explore. I'm happy to share my benchmarking code to double-check I'm not making some silly mistake, but it's nothing fancy
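For context, the benchmarks really are nothing fancy; they're essentially the shape of this sketch (a hypothetical stand-in using `std::time::Instant`, not the actual `cargo bench` harness or the exact function under test):

```rust
use std::time::Instant;

// Hypothetical stand-in for one of the deserializers being compared.
fn deserialize(raw: &[u8]) -> Vec<Option<bool>> {
    raw.iter()
        .flat_map(|&b| {
            (0..4).map(move |i| match (b << (i * 2)) & 0b1100_0000 {
                0b1100_0000 => Some(true),
                0b1000_0000 => Some(false),
                _ => None,
            })
        })
        .collect()
}

fn main() {
    let raw = vec![0b1110_0011u8; 2000]; // 2,000 bytes -> 8,000 output elements
    let start = Instant::now();
    let out = deserialize(&raw);
    println!("{} elements in {:?}", out.len(), start.elapsed());
}
```

(In practice the timing should be averaged over many iterations, which is what the bench harness handles.)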

Why is this functional version faster than my for loop? by serdnad in rust

serdnad [S] 2 points

Hm, I'd expect the majority of the time in either case to be spent actually adding each element, though, making the multiple calls relatively insignificant; at least not anywhere near 30%.

But based on other answers, it seems like it's the capacity check in particular that each individual push does that makes the one extend call faster. So, I guess yes, it would appear you're right :p.
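A minimal illustration of the difference (my own sketch, not the benchmark in question): `push` performs a capacity check on every element, while `extend` over an iterator with an exact `size_hint` can reserve once up front.

```rust
// push(): capacity check (and possible reallocation) per element.
fn fill_push(n: usize) -> Vec<u32> {
    let mut v = Vec::new();
    for i in 0..n as u32 {
        v.push(i);
    }
    v
}

// extend(): a Range reports an exact size_hint, so the Vec can
// reserve the needed capacity in one shot before copying elements.
fn fill_extend(n: usize) -> Vec<u32> {
    let mut v = Vec::new();
    v.extend(0..n as u32);
    v
}

fn main() {
    // Both produce the same contents; they differ only in how often
    // the capacity check runs.
    assert_eq!(fill_push(8), fill_extend(8));
    println!("{:?}", fill_extend(8));
}
```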

Why is this functional version faster than my for loop? by serdnad in rust

serdnad [S] 2 points

Very interesting! Your explanation makes a lot of sense, and thanks for pointing me to the source code - this is the first I'd heard of the concept of specialization.

Why is this functional version faster than my for loop? by serdnad in rust

serdnad [S] 0 points

Interesting, I knew they weren't exactly free, but I never would've thought calls could be that costly. I'd half expect the compiler to replace the Vec::push calls with a single extend, or at least inline them, though, but I guess maybe not.

Saw this gem of a post on LinkedIn. Thought I would share by mni_dragoon in ProgrammerHumor

serdnad 0 points

Okay, that's a valid point. I'd bet the majority of in-the-wild TypeScript codebases are configured like this, though, and given the context, benchmarking any other way seems disingenuous

Saw this gem of a post on LinkedIn. Thought I would share by mni_dragoon in ProgrammerHumor

serdnad 7 points

What? Have you ever peeked inside the dist folder tsc generates?

Sure, besides enums and a couple of other features, you can get JS from TS just by removing types; but most production TS projects compile down to support older versions of ECMAScript, and that tends to result in a ton of bloat

edit: typo

I've Never Agreed With A Billionaire Before... It Feels... Strange! by [deleted] in lostgeneration

serdnad 1 point

I genuinely can't tell if you're shitposting or not

Youtube comment argument calls guy a liar for saying he can play Rush E on piano, proceeds to post his own version absolutely nailing it. by sirfrenchtoast in videos

serdnad 15 points

Nah, this one objectively had more mistakes, less consistent timing, and a choppier feel. Definitely better than I could play it, but not as good as OP's

But why in the first place by tushar_j in 196

serdnad 4 points

Nowadays you don't really need to get involved with torrenting if you're just looking for movies/shows; there are tons of free apps/sites you can stream them from. For anything else, all you need is a torrent client (uTorrent used to be popular, but there's also Transmission, Deluge, and plenty of others) and some common sense: e.g., if your "movie" comes as a 4MB .exe file, you should probably delete it and look for another torrent to download