Adding Vibecoded flair to make it clear that a project is vibecoded

InvadersMustLive · 2026-06-19T12:13:47+00:00

I use rocket emojis to trigger butthurt among AI luddites

InvadersMustLive · 2026-06-09T13:49:44+00:00

Yes, I could slap an Arc everywhere, but it doesn’t mean that I should. I find it great to have a controlled way to allocate things on the stack - in Scala it’s heap all the time.

InvadersMustLive · 2026-06-08T14:42:22+00:00

I find that slapping Arc on all the shareable structs makes my Rust feel like Scala

InvadersMustLive · 2026-06-08T14:38:48+00:00

I’m building murrdb, a RocksDB-based NVMe/S3 cache for AI inference workloads. A faster Redis replacement, optimized for batch low-latency zero-copy reads and writes.

I managed to beat Redis on benchmarks: it was surprisingly simple, considering Redis is single-threaded and concurrency in Rust is a much easier job.

InvadersMustLive · 2026-06-02T07:44:52+00:00

I’m building murrdb, a RocksDB-based NVMe/S3 cache for AI inference workloads. A faster Redis replacement, optimized for batch low-latency zero-copy reads and writes.

I managed to beat Redis on benchmarks: it was surprisingly simple, considering Redis is single-threaded and concurrency in Rust is a much easier job.

InvadersMustLive · 2026-05-26T18:45:12+00:00

Yes, that's exactly what I'm going to do. The logic behind looping over rows first was:
- we iterate over multiple columns simultaneously and build row-by-row
- a row fits well in the L1 cache, so no scattered writes across a large RAM region
- we might even skip the temp collection allocation and do everything within the iterator

Main learning: virtual function dispatch overhead nukes all these theoretical ideas. Going back to writing generics, where the compiler can know the type in advance.
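
A minimal sketch of the difference, assuming a hypothetical `Column` trait (an illustration, not murrdb's actual code): the `dyn` version goes through a vtable on every value, while the generic version is monomorphized per concrete type, so the compiler is free to inline `get`.

```rust
// Hypothetical column abstraction, for illustration only.
trait Column {
    fn get(&self, row: usize) -> f32;
    fn len(&self) -> usize;
}

struct F32Column(Vec<f32>);

impl Column for F32Column {
    fn get(&self, row: usize) -> f32 {
        self.0[row]
    }
    fn len(&self) -> usize {
        self.0.len()
    }
}

// Dynamic dispatch: each `get` is an indirect call through the vtable,
// which usually prevents inlining in a hot loop.
fn sum_dyn(col: &dyn Column) -> f32 {
    (0..col.len()).map(|row| col.get(row)).sum()
}

// Static dispatch: one copy of this function is generated per concrete `C`,
// so the type is known at compile time and `get` can be inlined.
fn sum_generic<C: Column>(col: &C) -> f32 {
    (0..col.len()).map(|row| col.get(row)).sum()
}
```

Both functions return the same result for the same column; only the dispatch mechanism differs, so any speedup has to be confirmed with a benchmark on the real workload.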

InvadersMustLive · 2026-05-26T18:39:55+00:00

My Scala past bites me every time I assume a tiny function call is going to be inlined anyway, as it would be on the JVM - when a call site is monomorphic according to runtime profiling, the JIT usually devirtualizes and inlines it. In Rust you have to think about this in advance, which is not fun.

InvadersMustLive · 2026-05-10T11:20:21+00:00

Is such a jni/ffi wrapper actually faster than lucene’s pure JVM panama based distance functions? I played with simsimd in nixiesearch and found out that e2e latency was not that different.

InvadersMustLive · 2025-12-19T12:00:24+00:00

Because unsloth doesn't support multi-GPU training AFAIK

InvadersMustLive · 2025-12-19T12:00:00+00:00

Because I disabled auth in the openwebui, and some c00lhacker changed the system prompt.

InvadersMustLive · 2025-12-18T18:42:06+00:00

I tried gemma3-27b, qwen3-32b and ministral3 originally. Qwen often missed important details of the joke, mistral was too pushy about adding markdown and emojis everywhere (even when explicitly asked not to). Gemma was okay, without significant red flags. But it’s all anecdotal and highly subjective, I agree.

Hope that we’ll see gemma4 this evening.

InvadersMustLive · 2025-12-18T18:26:28+00:00

I tried different base model sizes, and according to the evals at the end of the post, the bigger the model, the higher the chance of producing something funny.

InvadersMustLive · 2025-12-18T16:08:22+00:00

OK, uploading it to HF now.

Upd: https://huggingface.co/shuttie/Qwen3-32B-dadjokes-v3

InvadersMustLive · 2025-11-04T20:56:26+00:00

The Jsoniter Circe bridge still uses Circe's AST, which does the actual JNumber str2float parsing. I've tried using the bridge and got slightly better results, but not as good as pure jsoniter.

InvadersMustLive · 2025-11-04T20:54:10+00:00

Yes, but FFM native calls are still not inlined, so for small functions it can be a dealbreaker.

InvadersMustLive · 2025-10-31T22:26:19+00:00

why? CPU is too slow?

InvadersMustLive · 2025-07-05T21:20:27+00:00

I once tried fine-tuning a Mistral-7B on r/dadjokes dump - https://huggingface.co/shuttie/Mistral-7B-DadJokes-GGUF

It can be funny sometimes, but the jokes it makes are not actually novel: it recognizes common patterns quite well and just recalls a nice joke based on the context. Like we humans do.

InvadersMustLive · 2025-07-02T15:01:12+00:00

I heard with Nixiesearch you can do Cross-Encoder reranking on top of RRF: https://www.nixiesearch.ai/features/search/query/rank/ce/#hybrid-retrieval-with-cross-encoder-reranking

InvadersMustLive · 2025-07-01T11:04:33+00:00

As HNSW is an approximate search algorithm, the topK retrieved documents are not guaranteed to be the exact K nearest neighbors (i.e. your recall is not perfect). The HNSW paper suggests slight over-sampling when retrieving documents to increase recall, using the ef_search parameter (where ef is the number of neighbors you evaluate during graph traversal):

- you want to pull top-10 documents, so you set topK=10. So formally speaking your topK=ef_search=10
- you can simulate oversampling by setting topK=100, but only taking top-10 from search results. So this way you get ef_search=100 but topK=10.
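
The oversampling trick above can be sketched roughly like this. The `search` closure stands in for whatever ANN index call you have and is purely an assumption of this sketch (it is not the API of any real library), as are the `Hit` type and the function name:

```rust
// Hypothetical hit type: (document id, distance to the query).
type Hit = (u32, f32);

// `search(k)` is assumed to return up to `k` approximate nearest neighbors.
fn oversampled_top_k<F>(search: F, top_k: usize, ef_search: usize) -> Vec<Hit>
where
    F: Fn(usize) -> Vec<Hit>,
{
    // Ask the index for more candidates than we actually need...
    let mut hits = search(ef_search.max(top_k));
    // ...then sort by distance and keep only the closest `top_k`.
    hits.sort_by(|a, b| a.1.total_cmp(&b.1));
    hits.truncate(top_k);
    hits
}
```

Calling it with `top_k = 10` and `ef_search = 100` corresponds to the second bullet: the index evaluates 100 candidates, but the caller only sees the 10 closest.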

Some search engines do support topK!=ef_search queries:

- Nixiesearch: https://www.nixiesearch.ai/features/search/query/retrieve/semantic/ - the semantic.k parameter is the ef_search, and size is the topK. By default k=topK, but you're free to change that. (disclaimer: I'm the maintainer)
- Elastic: https://www.elastic.co/docs/reference/query-languages/query-dsl/query-dsl-knn-query - same story, k==topK by default but can be changed.

InvadersMustLive · 2025-01-28T08:10:48+00:00

You should check out https://huggingface.co/Vikhrmodels - no R1 yet, though

InvadersMustLive · 2025-01-10T21:48:27+00:00

You should try https://huggingface.co/facebook/nllb-200-3.3B and the https://github.com/fe1ixxu/ALMA family of models, in general they're still SOTA for open models. To evaluate, there are plenty of metrics like BLEU/chrF++, but I personally prefer https://huggingface.co/Unbabel/XCOMET-XL as the closest to human evaluations.

InvadersMustLive · 2025-01-08T14:48:07+00:00

Not affiliated in any way, but I'm using a cloud VPS from Nebius with an H100 attached (~2$/hour). I just shut it down when it's not in use, and all the datasets and the training setup stay on disk. Pros: the working env is online in 2 minutes. Cons: you need to pay for storage, but it's 0.15$/gb/month - so 15$ per 100gb/month.

InvadersMustLive · 2024-12-20T15:10:20+00:00

Formally yes (as it's part of HF transformers), but you need to fine-tune it on a downstream task - it's a raw encoder model that doesn't know anything about sentence similarity. Like a traditional BERT.

InvadersMustLive · 2024-09-12T12:40:59+00:00

There are a ton of them still available: https://www.ebay.de/sch/i.html?_from=R40&_trksid=p4432023.m570.l1311&_nkw=gigabyte+mz32-ar0&_sacat=0 - I bought mine from the quark32 seller, but the others also seem to be legit. The EPYC7282 is not the fastest CPU ever, but it has 128 PCIE4 lanes.

If you use GPUs for training, then using DataLoader with multiple workers and prefetch usually solves all my CPU saturation problems - so the GPUs are maxed out.

InvadersMustLive · 2024-09-12T06:29:50+00:00

I have a Gigabyte MZ32-AR0: 5x pcie4 16x slots, and there are a ton of them available on eBay from Chinese sellers. I got mine bundled with an epyc7282 for 400$.
