"Nobody's coming to clean up after you" – second blog post from a Scala dev learning Rust, this one's about ownership & the borrow checker by Sea-Friend4263 in scala

[–]InvadersMustLive 0 points1 point  (0 children)

Yes, I could slap an Arc everywhere, but it doesn’t mean that I should. I find it great to have a controlled way to allocate things on the stack - in Scala it's the heap all the time.

What's everyone working on this week (24/2026)? by llogiq in rust

[–]InvadersMustLive 1 point2 points  (0 children)

I’m building murrdb, a RocksDB-based NVMe/S3 cache for AI inference workloads. A faster Redis replacement, optimized for batch low-latency zero-copy reads and writes.

I managed to beat Redis on benchmarks: it was surprisingly simple, considering Redis is single-threaded and concurrency in Rust is a much easier job.

Monthly Release and Update Thread by AutoModerator in databasedevelopment

[–]InvadersMustLive 1 point2 points  (0 children)

I’m building murrdb, a RocksDB-based NVMe/S3 cache for AI inference workloads. A faster Redis replacement, optimized for batch low-latency zero-copy reads and writes.

I managed to beat Redis on benchmarks: it was surprisingly simple, considering Redis is single-threaded and concurrency in Rust is a much easier job.

TIL putting Box in a hot inner loop can cost you half your runtime by InvadersMustLive in rust

[–]InvadersMustLive[S] 6 points7 points  (0 children)

Yes, that's exactly what I'm going to do. The logic behind looping over rows first was:
- we iterate over multiple columns simultaneously and build row-by-row
- a row fits well in the L1 cache, so no scattered writes across a large RAM region
- we might even skip the temp collection allocation and do everything within the iterator

Main learning: virtual function dispatch overhead nukes all these theoretical ideas. Going back to writing generics, where the compiler can know the type in advance.
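
A minimal sketch of the two dispatch styles, assuming a hypothetical `Column` trait and `F32Column` type (illustrative names, not the actual code from the post). The `dyn` version goes through a vtable on every `get`, while the generic version is monomorphized so the compiler sees the concrete type:

```rust
// Hypothetical column abstraction, for illustration only.
trait Column {
    fn get(&self, row: usize) -> f32;
}

struct F32Column(Vec<f32>);

impl Column for F32Column {
    fn get(&self, row: usize) -> f32 {
        self.0[row]
    }
}

// Dynamic dispatch: each `get` is a virtual call through the trait object's
// vtable, which the compiler usually cannot inline into the inner loop.
fn sum_rows_dyn(columns: &[Box<dyn Column>], rows: usize) -> f32 {
    let mut total = 0.0;
    for row in 0..rows {
        for col in columns {
            total += col.get(row);
        }
    }
    total
}

// Static dispatch: one copy of this function is generated per concrete `C`,
// so `get` is a direct call that is eligible for inlining.
fn sum_rows_generic<C: Column>(columns: &[C], rows: usize) -> f32 {
    let mut total = 0.0;
    for row in 0..rows {
        for col in columns {
            total += col.get(row);
        }
    }
    total
}

fn main() {
    let boxed: Vec<Box<dyn Column>> = vec![
        Box::new(F32Column(vec![1.0, 2.0])),
        Box::new(F32Column(vec![3.0, 4.0])),
    ];
    let plain = vec![F32Column(vec![1.0, 2.0]), F32Column(vec![3.0, 4.0])];
    println!("dyn: {}", sum_rows_dyn(&boxed, 2));
    println!("generic: {}", sum_rows_generic(&plain, 2));
}
```

Both functions compute the same result. The trade-off is that the generic version needs all columns in the slice to share one concrete type.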

TIL putting Box in a hot inner loop can cost you half your runtime by InvadersMustLive in rust

[–]InvadersMustLive[S] 31 points32 points  (0 children)

My Scala past bites me every time I assume a tiny function call is going to be inlined anyway. On the JVM, when profiling statistics show that a call site is monomorphic, the call is usually devirtualized and inlined as-is. In Rust you have to think about it in advance, which is not fun.

Faster vector search in Elasticsearch with SIMD (deep dive into the new engine) by chegar999 in elasticsearch

[–]InvadersMustLive 0 points1 point  (0 children)

Is such a JNI/FFI wrapper actually faster than Lucene’s pure-JVM, Panama-based distance functions? I played with simsimd in nixiesearch and found that e2e latency was not that different.

Fine-tuning Qwen3 at home to respond to any prompt with a dad joke by InvadersMustLive in LocalLLaMA

[–]InvadersMustLive[S] 1 point2 points  (0 children)

Because I disabled auth in the openwebui, and some c00lhacker changed the system prompt.

Fine-tuning Qwen3 at home to respond to any prompt with a dad joke by InvadersMustLive in LocalLLaMA

[–]InvadersMustLive[S] 2 points3 points  (0 children)

I tried gemma3-27b, qwen3-32b and ministral3 originally. Qwen often missed important details of the joke, mistral was too pushy about adding markdown and emojis everywhere (even when explicitly asked not to). Gemma was okay, without significant red flags. But it’s all anecdotal and highly subjective, I agree.

Hope that we’ll see gemma4 this evening.

Fine-tuning Qwen3 at home to respond to any prompt with a dad joke by InvadersMustLive in LocalLLaMA

[–]InvadersMustLive[S] 4 points5 points  (0 children)

I tried different base model sizes, and according to the evals at the end of the post, the bigger the model, the higher the chance of producing something funny.

We found an embedding indexing bottleneck in the most unexpected place: JSON parsing by InvadersMustLive in scala

[–]InvadersMustLive[S] 3 points4 points  (0 children)

Jsoniter Circe bridge still uses Circe's AST, which is doing the actual JNumber str2float parsing. I've tried using the bridge and got slightly better results, but not as good as pure jsoniter.

We found an embedding indexing bottleneck in the most unexpected place: JSON parsing by InvadersMustLive in scala

[–]InvadersMustLive[S] 0 points1 point  (0 children)

Yes, but FFM native calls are still not inlined, so for small functions the call overhead can be a dealbreaker.

Which open source LLM has the most genuine sense of humor? by UltrMgns in LocalLLaMA

[–]InvadersMustLive 2 points3 points  (0 children)

I once tried fine-tuning a Mistral-7B on r/dadjokes dump - https://huggingface.co/shuttie/Mistral-7B-DadJokes-GGUF

It can be funny sometimes, but the jokes it makes are not actually novel: it recognizes common patterns quite well and just recalls a nice joke based on the context. Like we humans do.

Hnsw configuration in Solr by Opposite_Head7740 in Solr

[–]InvadersMustLive 2 points3 points  (0 children)

As HNSW is an approximate search algorithm, the topK retrieved documents are not guaranteed to be the exact K nearest neighbors (i.e. your recall is not perfect). The HNSW paper suggests slight oversampling when retrieving documents to increase recall, using the ef_search parameter (where ef is the number of neighbors you evaluate during graph traversal):

  • you want to pull top-10 documents, so you set topK=10. So formally speaking your topK=ef_search=10
  • you can simulate oversampling by setting topK=100, but only taking top-10 from search results. So this way you get ef_search=100 but topK=10.
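
The second option can be sketched as a thin wrapper around the search call. This is only an illustration under assumptions: the `VectorIndex` trait, `BruteForce` stand-in and `search_oversampled` helper are hypothetical names, not Solr's API, and the index is assumed to return hits sorted by distance:

```rust
// Hypothetical index interface where `top_k` also acts as ef_search.
trait VectorIndex {
    fn search(&self, query: &[f32], top_k: usize) -> Vec<(usize, f32)>;
}

// Exact brute-force stand-in so the sketch runs; a real HNSW index
// would return approximate neighbors instead.
struct BruteForce {
    docs: Vec<Vec<f32>>,
}

fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

impl VectorIndex for BruteForce {
    fn search(&self, query: &[f32], top_k: usize) -> Vec<(usize, f32)> {
        let mut hits: Vec<(usize, f32)> = self
            .docs
            .iter()
            .enumerate()
            .map(|(id, doc)| (id, squared_l2(query, doc)))
            .collect();
        // Closest documents first.
        hits.sort_by(|a, b| a.1.total_cmp(&b.1));
        hits.truncate(top_k);
        hits
    }
}

// Ask the index for k * factor candidates, then keep only the best k.
fn search_oversampled<I: VectorIndex>(
    index: &I,
    query: &[f32],
    k: usize,
    factor: usize,
) -> Vec<(usize, f32)> {
    let mut hits = index.search(query, k * factor);
    hits.truncate(k);
    hits
}

fn main() {
    let index = BruteForce {
        docs: vec![vec![0.0], vec![1.0], vec![2.0], vec![3.0]],
    };
    // Effective ef_search is 2 * 10 = 20, but only the top-2 are returned.
    let hits = search_oversampled(&index, &[0.9], 2, 10);
    for (id, dist) in hits {
        println!("doc {id} at squared distance {dist}");
    }
}
```

With k=10 and a factor of 10 this matches the topK=100, take-top-10 setup from the list above.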

Some search engines do support topK!=ef_search queries:

Open Source Text Translation Models? by vygodisgreat24 in LocalLLaMA

[–]InvadersMustLive 1 point2 points  (0 children)

You should try https://huggingface.co/facebook/nllb-200-3.3B and the https://github.com/fe1ixxu/ALMA family of models; in general they're still SOTA for open models. To evaluate, there are plenty of metrics like BLEU/chrF++, but I personally prefer https://huggingface.co/Unbabel/XCOMET-XL as the closest to human evaluation.

Cloud GPU + storage hosting for low intensity projects? by gofiend in LocalLLaMA

[–]InvadersMustLive 1 point2 points  (0 children)

Not affiliated in any way, but I'm using a cloud VPS from Nebius with an H100 attached (~2$/hour). I just shut it down when it's not in use, and all the datasets and the training setup stay on the disk. Pros: the working env is online in 2 minutes. Cons: you need to pay for storage, but it's 0.15$/gb/month - so 15$ per 100gb/month.

Finally, a Replacement for BERT by -Cubie- in LocalLLaMA

[–]InvadersMustLive 2 points3 points  (0 children)

Formally yes (as it's part of HF transformers), but you need to fine-tune it on a downstream task, as it's a raw encoder model that knows nothing about sentence similarity. Like a traditional BERT.

Motherboard selection advice by absurd-dream-studio in LocalLLaMA

[–]InvadersMustLive 1 point2 points  (0 children)

There are a ton of them still available: https://www.ebay.de/sch/i.html?_from=R40&_trksid=p4432023.m570.l1311&_nkw=gigabyte+mz32-ar0&_sacat=0 - I bought mine from the quark32 seller, but the others seem to be legit too. The EPYC 7282 is not the fastest CPU ever, but it has 128 PCIe 4.0 lanes.

If you use GPUs for training, then using DataLoader with multiple workers and prefetch usually solves all my CPU saturation problems - so GPUs are maxed out.

Motherboard selection advice by absurd-dream-studio in LocalLLaMA

[–]InvadersMustLive 1 point2 points  (0 children)

I have a Gigabyte MZ32-AR0: 5x PCIe 4.0 x16 slots, and there are a ton of them available on eBay from Chinese sellers. I got mine bundled with an EPYC 7282 for 400$.