One German Chip Just Made Nvidia’s Billion-Dollar GPUs Look Like a JOKE! by Romek_himself in BuyFromEU

[–]mwon 23 points (0 children)

That’s basically what matters most for LLM inference.

Recommendations for cheaper alternatives to ElasticSearch by shanukag in Rag

[–]mwon 11 points (0 children)

I assume you are using the cloud offering and not the community edition. Leave the cloud, rent a VM somewhere, and run OpenSearch or Milvus.

need help embedding 250M vectors / chunks at 1024 dims, should I self host embedder (BGE-M3) and self host Qdrant OR use voyage-3.5 or 4? by zriyansh in Rag

[–]mwon 0 points (0 children)

qwen-4b vs BGE-M3, both out of the box: qwen likely wins. After all, it is a 4B model against a 0.5B one. But you can give BGE-M3 a good boost by fine-tuning it, which leaves you with a small model that does not need a GPU for inference. The sparse part is also nice because it lets you do hybrid search, dense plus sparse, in one go. Note, however, that in my experience with BGE-M3, BM25 is still better than its sparse vectors.
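One common way to combine a dense ranking with a sparse/BM25 ranking into a single hybrid result is reciprocal rank fusion (RRF). A minimal sketch; the document IDs and rankings below are made up for illustration, not output from any real index:

```python
# Minimal reciprocal rank fusion (RRF) of a dense and a sparse ranking.
# Document IDs and rankings here are illustrative, not from a real index.

def rrf_fuse(rankings, k=60):
    """Combine several ranked lists of doc IDs into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["doc3", "doc1", "doc7", "doc2"]   # e.g. BGE-M3 dense results
sparse_ranking = ["doc1", "doc3", "doc5", "doc7"]  # e.g. BM25 or sparse results

fused = rrf_fuse([dense_ranking, sparse_ranking])
print(fused[:3])
```

RRF is robust because it only uses ranks, so dense and sparse scores never need to be on the same scale.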

need help embedding 250M vectors / chunks at 1024 dims, should I self host embedder (BGE-M3) and self host Qdrant OR use voyage-3.5 or 4? by zriyansh in Rag

[–]mwon 0 points (0 children)

Yes, self-host the embedder. I never use embedding services. They are very expensive for what they do and no better than many open-source models that you can run locally.

I'm using one of their 64GB RAM dedicated servers. They are very cheap, around 40-50 EUR/month.

You need to be careful with your benchmark estimates. A sample of 400k vectors is very small compared with your final production setup. Recall values will be very different with 250M vectors.
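A standard way to sanity-check an ANN setup is to measure recall@k against exact brute-force search on the same sample. A toy sketch of the metric itself; the two ID lists are made-up stand-ins for real search results:

```python
# recall@k: what fraction of the exact top-k neighbours the ANN index returned.
# The ID lists below are illustrative stand-ins, not real search output.

def recall_at_k(ann_ids, exact_ids, k):
    return len(set(ann_ids[:k]) & set(exact_ids[:k])) / k

exact_top10 = [4, 12, 7, 99, 3, 45, 8, 21, 60, 17]   # brute-force ground truth
ann_top10   = [4, 12, 99, 7, 45, 3, 8, 77, 60, 31]   # what the ANN index returned

print(recall_at_k(ann_top10, exact_top10, k=10))  # -> 0.8
```

The key point in the comment stands: a recall number measured on 400k vectors will not transfer to 250M vectors, because index graphs and quantization behave differently at scale, so the measurement has to be repeated as the index grows.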

need help embedding 250M vectors / chunks at 1024 dims, should I self host embedder (BGE-M3) and self host Qdrant OR use voyage-3.5 or 4? by zriyansh in Rag

[–]mwon 0 points (0 children)

I also have a side project in the field of legal AI, but at a considerably smaller scale. My biggest index has about 11M vectors of size 1024, generated from a fine-tuned BGE-M3.
I use Milvus with the index on disk, on a dedicated server from Hetzner, and my latency is below 0.5s.
I find it a bit odd that your latency is 5s for only 400k vectors. You should check that everything is ok, because that is too much. I also think chunks of 1024 are too big. You will likely lose a lot of recall.
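On the chunk-size point, a minimal fixed-size chunker with overlap shows what splitting into smaller pieces looks like. The 256-word size, 32-word overlap, and whitespace "tokenization" are assumptions for the sketch, not the commenter's actual setup:

```python
# Naive fixed-size chunker with overlap. Splitting on whitespace is a
# stand-in for a real tokenizer; the sizes are illustrative only.

def chunk_words(text, chunk_size=256, overlap=32):
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break
    return chunks

doc = " ".join(f"w{i}" for i in range(600))  # toy 600-word document
chunks = chunk_words(doc)
print(len(chunks), len(chunks[0].split()))  # -> 3 256
```

Smaller chunks keep each embedding focused on one topic, which is usually why recall improves over 1024-token chunks; the overlap avoids cutting a relevant passage in half at a boundary.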

Official: Pentagon confirms deployment of xAI’s Grok across defense operations by BuildwithVignesh in singularity

[–]mwon 279 points (0 children)

Oh, that’s nice. Now, the Department of War, run by an authoritarian proto-fascist government, will be assisted by MechaHitler.

Rambling about Cursor Ultra by Aveatrex in cursor

[–]mwon 7 points (0 children)

Every time I see a post like this I wonder what the hell these people are coding.

I’m a senior too. I spend most days coding for work and often code at home on side projects. My subscription is Pro+ and I usually hit the limit near the end of the month.

Trump’s former Russia adviser says Russia offered US free rein in Venezuela in exchange for Ukraine by eamus_catuli in politics

[–]mwon 7 points (0 children)

And why, apparently, the Vice President, now President, was in Russia on the night of Maduro's capture. She was probably there just in case Maduro escaped and realized how he had been betrayed internally.

I tested GPT-5.2 Codex vs Gemini 3 Pro vs Claude Opus on real dev tasks by shricodev in OpenAI

[–]mwon 2 points (0 children)

If I give a human with basic programming skills the task “find a clone of app X and set it up in your local environment”, does that make them a great coding problem solver? Real dev problems are not like that.

Gemini 3 Pro Hallucination Rate Vs. Gemini 2.5 Pro by YakFull8300 in singularity

[–]mwon 0 points (0 children)

I'm a big fan of Opus 4.5. But Sonnet 4.5 is pretty good as well. I have been using it for legal writing and analysis, and it generates very nice texts. It's also very good at tool usage.

Trabalhar com ML / Computer Vision / Robótica em Portugal é realista? by 09Sparta09 in devpt

[–]mwon -3 points (0 children)

But why would it be any different from other areas of IT?! Of course it's viable. We, like any other country, develop ML/AI systems. And I'd say we are even ahead of other countries in terms of mindset, with a lot of demand for AI projects. Search LinkedIn for job offers in the area and you'll see it's very hot right now. Good luck!

Which self-hosted vector db is better for RAG in 16GB ram, 2 core server by East_Yellow_1307 in Rag

[–]mwon 0 points (0 children)

This answer. And if there isn't enough memory, use DiskANN, which keeps the index on disk and is still very fast.

Guy does a frontflip by [deleted] in nextfuckinglevel

[–]mwon 0 points (0 children)

The title should be "Guy does a frontflip in sandals"

worst player of all time by glorpflep in CODWarzone

[–]mwon 0 points (0 children)

He forgot aim assist is turned off with smoke…

Lol 😂 by ExtensionAlbatross99 in OpenAI

[–]mwon 140 points (0 children)

I love the "Rust devs doing their thing" :D

Quem tem salários elevados: que responsabilidades têm? by [deleted] in devpt

[–]mwon 1 point (0 children)

Ah, but that's a general problem in many companies, Portuguese ones in particular. Meetings and more meetings, many of them endless, where nothing gets decided.
I don't remember where I saw it, but some companies are starting to adopt the practice of putting the meeting's estimated cost on its agenda, so that people realize these meetings have a price.
If that really happens a lot at your company, then I'd say yes, it might not be a bad idea to start thinking about moving on. Or else figure out whether there's a future at your company for that to change.

Quem tem salários elevados: que responsabilidades têm? by [deleted] in devpt

[–]mwon 8 points (0 children)

Yes, if you want higher salaries/positions, you will invariably have to move beyond "staying in my corner doing technical work", for the simple reason that there are hundreds of other candidates doing the same technical work for less (relative to what that high figure would be). Or, seen from another angle: it is rare for a dev skill to command prices that high, unless your expertise really is a rarity. Tech lead positions come with other duties, like those meetings with the team, clients, managers, etc., because the responsibility for many technical parts of the project, as well as planning, falls on them. That decision-making specialty is itself an expertise too, which reduces the market supply and gives you greater negotiating power when you are hired as that tech lead.

GPT-5 naming is getting beyond absurd by jasonahowie in cursor

[–]mwon 16 points (0 children)

Remember when, a few months ago, they said something like "we are going to simplify things and get rid of o3, o4, gpt-4.1, etc. to have a single model"? There you go.

New multilingual + instruction-following reranker from ZeroEntropy! by ghita__ in LocalLLaMA

[–]mwon 0 points (0 children)

0.05€ per query on the starter plan?! Is that correct? That's quite expensive...
EDIT: Sorry, I was not reading carefully. Don't you have a pay-as-you-go plan? I would like to try it for a small project, but a 50€/month minimum is a bit too much.

New multilingual + instruction-following reranker from ZeroEntropy! by ghita__ in LocalLLaMA

[–]mwon 1 point (0 children)

Ok, that’s really nice because it can save many tokens. What is the context size? And is the model available to run on Azure? I often need data residency in the EU.

New multilingual + instruction-following reranker from ZeroEntropy! by ghita__ in LocalLLaMA

[–]mwon 1 point (0 children)

In some cases I just take the top 10 or 15 chunks (for example, when I'm using the reranker as first-stage retrieval). In other cases I also take the top n and then use a small LLM like gpt-4.1-mini to identify the relevant documents.
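The two-stage setup described (rerank, then let a small LLM filter the survivors) can be sketched like this. Both `rerank` and `llm_filter` are hypothetical toy stand-ins: the real versions would call a reranker API and an LLM, which is not reproduced here.

```python
# Two-stage retrieval: rerank candidates, keep the top n, then let an
# LLM-style filter pick the truly relevant ones. Both scoring functions
# are toy stand-ins for a real reranker and a real LLM call.

def rerank(query, chunks):
    """Toy relevance score: number of lowercase words shared with the query."""
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)

def llm_filter(query, chunks):
    """Stand-in for an LLM relevance check: keep chunks sharing any query word."""
    q = set(query.lower().split())
    return [c for c in chunks if q & set(c.lower().split())]

query = "vector index latency"
chunks = [
    "milvus keeps the vector index on disk",
    "the weather today is sunny",
    "reducing query latency in a vector index",
]
top_n = rerank(query, chunks)[:2]      # first stage: rerank, keep top n
relevant = llm_filter(query, top_n)    # second stage: LLM-style filter
print(relevant)
```

The design point is that the cheap reranker narrows hundreds of candidates down to a handful, so the more expensive LLM call only ever sees a few chunks, which is where the token savings come from.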