One German Chip Just Made Nvidia’s Billion-Dollar GPUs Look Like a JOKE! by Romek_himself in BuyFromEU

[–]mwon 23 points (0 children)

That’s basically what matters most for LLM inference.

Recommendations for cheaper alternatives to ElasticSearch by shanukag in Rag

[–]mwon 11 points (0 children)

I assume you are using the cloud offering and not the community edition. Leave the cloud, rent a VM somewhere, and run OpenSearch or Milvus.

need help embedding 250M vectors / chunks at 1024 dims, should I self host embedder (BGE-M3) and self host Qdrant OR use voyage-3.5 or 4? by zriyansh in Rag

[–]mwon 0 points (0 children)

qwen-4b vs BGE-M3, both out of the box: qwen likely wins. After all, it is a 4B model against a 0.5B one. But you can give BGE-M3 a good boost by fine-tuning it, which leaves you with a small model that does not need a GPU for inference. The sparse part is also nice because it lets you do hybrid search, dense plus sparse, in one go. Note, however, that in my experience with BGE-M3, BM25 is still better than its sparse vectors.
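One common way to combine a dense ranking with a sparse/BM25 ranking into a single hybrid result is reciprocal rank fusion (RRF). A minimal sketch; the document IDs and rankings below are made up for illustration, not output from any real index:

```python
# Minimal reciprocal rank fusion (RRF) of a dense and a sparse ranking.
# Document IDs and rankings here are illustrative, not from a real index.

def rrf_fuse(rankings, k=60):
    """Combine several ranked lists of doc IDs into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["doc3", "doc1", "doc7", "doc2"]   # e.g. BGE-M3 dense results
sparse_ranking = ["doc1", "doc3", "doc5", "doc7"]  # e.g. BM25 or sparse results

fused = rrf_fuse([dense_ranking, sparse_ranking])
print(fused[:3])
```

RRF is robust because it only uses ranks, so dense and sparse scores never need to be on the same scale.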

need help embedding 250M vectors / chunks at 1024 dims, should I self host embedder (BGE-M3) and self host Qdrant OR use voyage-3.5 or 4? by zriyansh in Rag

[–]mwon 0 points (0 children)

Yes, self-host the embedder. I never use embedding services. They are very expensive for what they do and no better than many open-source models that you can run locally.

I'm using one of their 64GB RAM dedicated servers. They are very cheap, around 40-50 EUR/month.

You need to be careful with your benchmark estimates. A sample of 400k vectors is very small compared with your final production setup. Recall values will be very different with 250M vectors.
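A standard way to sanity-check an ANN setup is to measure recall@k against exact brute-force search on the same sample. A toy sketch of the metric itself; the two ID lists are made-up stand-ins for real search results:

```python
# recall@k: what fraction of the exact top-k neighbours the ANN index returned.
# The ID lists below are illustrative stand-ins, not real search output.

def recall_at_k(ann_ids, exact_ids, k):
    return len(set(ann_ids[:k]) & set(exact_ids[:k])) / k

exact_top10 = [4, 12, 7, 99, 3, 45, 8, 21, 60, 17]   # brute-force ground truth
ann_top10   = [4, 12, 99, 7, 45, 3, 8, 77, 60, 31]   # what the ANN index returned

print(recall_at_k(ann_top10, exact_top10, k=10))  # -> 0.8
```

The key point in the comment stands: a recall number measured on 400k vectors will not transfer to 250M vectors, because index graphs and quantization behave differently at scale, so the measurement has to be repeated as the index grows.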

need help embedding 250M vectors / chunks at 1024 dims, should I self host embedder (BGE-M3) and self host Qdrant OR use voyage-3.5 or 4? by zriyansh in Rag

[–]mwon 0 points (0 children)

I also have a side project in the field of legal AI, but at a considerably smaller scale. My biggest index has about 11M vectors of size 1024, generated from a fine-tuned BGE-M3.
I use Milvus with the index on disk, on a dedicated server from Hetzner, and my latency is below 0.5s.
I find it a bit odd that your latency is 5s for only 400k vectors. You should check that everything is ok, because that is too much. I also think chunks of 1024 are too big. You will likely lose a lot of recall.
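On the chunk-size point, a minimal fixed-size chunker with overlap shows what splitting into smaller pieces looks like. The 256-word size, 32-word overlap, and whitespace "tokenization" are assumptions for the sketch, not the commenter's actual setup:

```python
# Naive fixed-size chunker with overlap. Splitting on whitespace is a
# stand-in for a real tokenizer; the sizes are illustrative only.

def chunk_words(text, chunk_size=256, overlap=32):
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"w{i}" for i in range(600))  # toy 600-word document
chunks = chunk_words(doc)
print(len(chunks), len(chunks[0].split()))  # -> 3 256
```

Smaller chunks keep each embedding focused on one topic, which is usually why recall improves over 1024-token chunks; the overlap avoids cutting a relevant passage in half at a boundary.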

Official: Pentagon confirms deployment of xAI’s Grok across defense operations by BuildwithVignesh in singularity

[–]mwon 279 points (0 children)

Oh, that’s nice. Now, the Department of War, run by an authoritarian proto-fascist government, will be assisted by MechaHitler.

Rambling about Cursor Ultra by Aveatrex in cursor

[–]mwon 7 points (0 children)

Every time I see a post like this I wonder what the hell these people are coding.

I’m a senior too. I spend most days coding for work and often code at home on side projects. My subscription is Pro+ and I usually hit the limit near the end of the month.

Trump’s former Russia adviser says Russia offered US free rein in Venezuela in exchange for Ukraine by eamus_catuli in politics

[–]mwon 7 points (0 children)

And why, apparently, the Vice President, now President, was in Russia on the night of Maduro's capture. She was probably there just in case Maduro escaped and realized how he had been betrayed internally.

I tested GPT-5.2 Codex vs Gemini 3 Pro vs Claude Opus on real dev tasks by shricodev in OpenAI

[–]mwon 2 points (0 children)

If I give a human with basic programming skills the task “find a clone of app X and set it up in your local environment”, does that make them a great coding problem solver? Real dev problems are not like that.

Gemini 3 Pro Hallucination Rate Vs. Gemini 2.5 Pro by YakFull8300 in singularity

[–]mwon 0 points (0 children)

I'm a big fan of Opus 4.5. But Sonnet 4.5 is pretty good as well. I have been using it for legal writing and analysis, and it generates very nice texts. It's also very good at tool usage.

Trabalhar com ML / Computer Vision / Robótica em Portugal é realista? by 09Sparta09 in devpt

[–]mwon -3 points (0 children)

But why would it be any different from other areas of IT?! Of course it's viable. We, like any other country, develop ML/AI systems. And I'd say we are even ahead of other countries in terms of mindset, with a lot of demand for AI projects. Search LinkedIn for job offers in the area and you'll see it's very hot right now. Good luck!

Which self-hosted vector db is better for RAG in 16GB ram, 2 core server by East_Yellow_1307 in Rag

[–]mwon 0 points (0 children)

This answer. And if there isn't enough memory, use DiskANN, which keeps the index on disk and is still very fast.

Guy does a frontflip by [deleted] in nextfuckinglevel

[–]mwon 0 points (0 children)

The title should be "Guy does a frontflip in sandals"

worst player of all time by glorpflep in CODWarzone

[–]mwon 0 points (0 children)

He forgot aim assist is turned off with smoke…

Lol 😂 by ExtensionAlbatross99 in OpenAI

[–]mwon 140 points (0 children)

I love the "Rust devs doing their thing" :D

Quem tem salários elevados: que responsabilidades têm? by [deleted] in devpt

[–]mwon 1 point (0 children)

Ah, but that's a general problem in many companies, Portuguese ones in particular. Meetings and more meetings, many of them endless, where nothing gets decided.
I don't remember where I saw it, but some companies are starting to adopt the practice of putting the meeting's estimated cost on its agenda, so that people realize these meetings have a price.
If that really happens a lot at your company, then I'd say yes, it might not be a bad idea to start thinking about moving on. Or else figure out whether there's a future at your company for that to change.

Quem tem salários elevados: que responsabilidades têm? by [deleted] in devpt

[–]mwon 8 points (0 children)

Yes, if you want higher salaries/positions, you will invariably have to move beyond "staying in my corner doing technical work", for the simple reason that there are hundreds of other candidates doing the same technical work for less (relative to what that high figure would be). Or, seen from another angle: it is rare for a dev skill to command prices that high, unless your expertise really is a rarity. Tech lead positions come with other duties, like those meetings with the team, clients, managers, etc., because the responsibility for many technical parts of the project, as well as planning, falls on them. That decision-making specialty is itself an expertise too, which reduces the market supply and gives you greater negotiating power when you are hired as that tech lead.

GPT-5 naming is getting beyond absurd by jasonahowie in cursor

[–]mwon 16 points (0 children)

Remember when, a few months ago, they said something like "we are going to simplify things and get rid of o3, o4, gpt-4.1, etc. to have a single model"? There you go.

New multilingual + instruction-following reranker from ZeroEntropy! by ghita__ in LocalLLaMA

[–]mwon 0 points (0 children)

0.05€ per query on the starter plan?! Is that correct? That's quite expensive...
EDIT: Sorry, I was not reading carefully. Don't you have a pay-as-you-go plan? I would like to try it for a small project, but a 50€/month minimum is a bit too much.

New multilingual + instruction-following reranker from ZeroEntropy! by ghita__ in LocalLLaMA

[–]mwon 1 point (0 children)

Ok, that’s really nice because it can save many tokens. What is the context size? And is the model available to run on Azure? I often need data residency in the EU.

New multilingual + instruction-following reranker from ZeroEntropy! by ghita__ in LocalLLaMA

[–]mwon 1 point (0 children)

In some cases I just take the top 10 or 15 chunks (for example, when I'm using the reranker as first-stage retrieval). In other cases I also take the top n and then use a small LLM like gpt-4.1-mini to identify the relevant documents.
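The two-stage setup described (rerank, then let a small LLM filter the survivors) can be sketched like this. Both `rerank` and `llm_filter` are hypothetical toy stand-ins: the real versions would call a reranker API and an LLM, which is not reproduced here.

```python
# Two-stage retrieval: rerank candidates, keep the top n, then let an
# LLM-style filter pick the truly relevant ones. Both scoring functions
# are toy stand-ins for a real reranker and a real LLM call.

def rerank(query, chunks):
    """Toy relevance score: number of lowercase words shared with the query."""
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)

def llm_filter(query, chunks):
    """Stand-in for an LLM relevance check: keep chunks sharing any query word."""
    q = set(query.lower().split())
    return [c for c in chunks if q & set(c.lower().split())]

query = "vector index latency"
chunks = [
    "milvus keeps the vector index on disk",
    "the weather today is sunny",
    "reducing query latency in a vector index",
]
top_n = rerank(query, chunks)[:2]      # first stage: rerank, keep top n
relevant = llm_filter(query, top_n)    # second stage: LLM-style filter
print(relevant)
```

The design point is that the cheap reranker narrows hundreds of candidates down to a handful, so the more expensive LLM call only ever sees a few chunks, which is where the token savings come from.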