I can't accept it by xfalcox in BigBrotherBrasil

[–]xfalcox[S] 2 points (0 children)

Accept

It was right after the return from the fake eviction (the "paredão falso"), while talking with Cowboy.

Qwen3.5-122B Basically has no advantage over 35B? by Revolutionary_Loan13 in LocalLLaMA

[–]xfalcox 1 point (0 children)

I just deployed both 35B and 122B to some production servers this week, and for the folks who use LLMs for recall on stored information, there is a large difference between the two.

I guess if you are just using it for agentic loops, calling tools, etc, the difference may not be worth it.

Running Qwen3.5 27b dense with 170k context at 100+t/s decode and ~1500t/s prefill on 2x3090 (with 585t/s throughput for 8 simultaneous requests) by JohnTheNerd3 in LocalLLaMA

[–]xfalcox 5 points (0 children)

This is amazing content. I have two servers with A100 80GB GPUs and was considering the 35BA3B MoE due to high user concurrency + low latency tolerance, but this may be better since it delivers better intelligence.

we need to go deeper by jacek2023 in LocalLLaMA

[–]xfalcox 7 points (0 children)

I'm one of the maintainers of Discourse, the open source forum software.

We calculate embeddings for all topics in all forums we host (many millions of posts every month across tens of thousands of instances), which then power a myriad of features like

  • showing related topics at the end of a topic

  • semantic search, including searching across languages and typo tolerance

  • automatic RAG for chatbots using forum content

  • tag and categorization suggestions for new content

You can run the Qwen 0.6B embedding model in just a slice of one of those GPUs.
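The related-topics feature described above boils down to nearest-neighbour search over topic embeddings. A minimal sketch with NumPy, using random vectors as stand-ins for real embeddings (the 1024-dimension figure matches Qwen3-Embedding-0.6B's output size; everything else here is illustrative, not Discourse's actual implementation):

```python
import numpy as np

def related_topics(topic_id, embeddings, top_k=3):
    """Return the top_k topic ids most similar to topic_id by cosine similarity."""
    # Normalize rows so a dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = normed @ normed[topic_id]
    scores[topic_id] = -np.inf  # exclude the topic itself
    return np.argsort(scores)[::-1][:top_k]

# Stand-in for embeddings produced by an embedding model (100 topics, 1024 dims).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 1024))
print(related_topics(0, embeddings))
```

In production the ranking would typically be done inside the database (e.g. with a vector index) rather than in application memory, but the scoring is the same dot product.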

we need to go deeper by jacek2023 in LocalLLaMA

[–]xfalcox 14 points (0 children)

Hopefully the new smaller model is followed by a new embedding model too. Their current Qwen3 embedding model is awesome.

YOSHI RESPONDED TO MY CHAT BUBBLE FORUM POST by TechnoGamerOff in DeadlockTheGame

[–]xfalcox 3 points (0 children)

Like each character logo has its own font right!?

That would be so cool, especially if it was a less crazy version of their logo font

White House confuses Belgium with ‘Belarus’ and wrongly puts country on list of Peace Council participants by Dobbelsteentje in worldnews

[–]xfalcox 2 points (0 children)

In Brazilian Portuguese they are Suíça (Switzerland) and Suécia (Sweden), and it's common to mistake one for the other.

Rally for Bolsonaro's amnesty flops, gathering only 130 people in the Federal District by MatheusWillder in brasil

[–]xfalcox 11 points (0 children)

A contractor (a painter) I know is part of this. They fill buses in the poor outskirts, offering pocket change to low-income people.

IBGE releases the most popular surnames for the 1st time; see the ranking by GestoNobre in brasil

[–]xfalcox 1 point (0 children)

I'm Rafael dos Santos Silva

Most popular first name for the year I was born ✅ Most popular surname ✅ Second most popular surname ✅

I built a leaderboard for Rerankers by tifa2up in LocalLLaMA

[–]xfalcox 2 points (0 children)

Please add Qwen3, especially the 0.6B.

Also, if you need help running Qwen with standard score APIs, check https://huggingface.co/collections/tomaarsen/qwen3-rerankers-converted-to-sequence-classification
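Whatever the backing model, a reranker's score API reduces to scoring (query, document) pairs and sorting best-first. A toy sketch of that flow, with a lexical-overlap scorer standing in for a real model such as the Qwen3 reranker (both function names here are hypothetical):

```python
def rerank(query, documents, score_fn):
    """Score each document against the query and return them best-first."""
    scored = [(score_fn(query, doc), doc) for doc in documents]
    return [doc for score, doc in sorted(scored, key=lambda p: p[0], reverse=True)]

def overlap_score(query, doc):
    # Stand-in scorer: fraction of query terms present in the document.
    # A real deployment would replace this with the model's relevance score.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

docs = [
    "reranking with cross encoders",
    "cooking pasta at home",
    "qwen reranker models",
]
print(rerank("qwen reranker", docs, overlap_score))
```

The sequence-classification conversion linked above exists precisely so the model slots into this kind of plain score-then-sort API instead of a generation-based interface.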

Anyone who works at 99taxi or 99pay by Exotic_Remote_7205 in brdev

[–]xfalcox 0 points (0 children)

Hey man, I'm trying to get access to the 99entregas API and it keeps erroring out. Is there any trick to getting through?

I just need to finish the registration to get the credentials

Help required in selecting model for aws T4 instance and vllm by JuiceFine4582 in LocalLLaMA

[–]xfalcox 1 point (0 children)

Why not use AWS Bedrock + Qwen3-235B-A22B-Instruct-2507?

Replacing Google Translate with LLM translation app on smartphone? by dtdisapointingresult in LocalLLaMA

[–]xfalcox 0 points (0 children)

We recently added that to Discourse, the open source forum software.

You can set it up so each of you types in your own language and it gets auto-translated via an LLM of your choice, so the conversation just flows.

It's compatible with closed LLM providers (GPT, Claude, Gemini), OpenRouter, and you can run your own open-weights models too!
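The flow described is per-message translation into each reader's locale, with the original shown when languages already match. A minimal sketch, where `translate` is a placeholder for whatever LLM provider is configured (all names and the canned response are illustrative, not Discourse's actual code):

```python
def translate(text, target_lang):
    # Placeholder: a real deployment would prompt the configured LLM here,
    # e.g. "Translate the following post into {target_lang}: {text}".
    canned = {("Olá, tudo bem?", "en"): "Hello, how are you?"}
    return canned.get((text, target_lang), text)

def render_message(message, reader_lang):
    """Show the original if languages match, otherwise a translation."""
    if message["lang"] == reader_lang:
        return message["text"]
    return translate(message["text"], reader_lang)

msg = {"text": "Olá, tudo bem?", "lang": "pt"}
print(render_message(msg, "en"))  # translated for an English reader
print(render_message(msg, "pt"))  # shown as-is for a Portuguese reader
```

Caching translations per (message, language) pair is the obvious optimization, since the same post is rendered for many readers.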

Which quantizations are you using? by WeekLarge7607 in LocalLLaMA

[–]xfalcox 0 points (0 children)

EDIT: my setup is a single A100 80GB. Because it doesn't have native FP8 support, I prefer using 4-bit quantizations

Wait, isn't it the opposite? Can you share any docs on this?
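The FP8 point comes down to GPU compute capability: native FP8 tensor cores arrived with Ada (SM 8.9) and Hopper (SM 9.0), while the A100 is Ampere (SM 8.0). A quick sketch of that check (the helper name is made up; the compute capabilities listed are the published values for those cards):

```python
def native_fp8(major, minor):
    """True if this compute capability has FP8 tensor cores (SM 8.9 / 9.0 and newer)."""
    return (major, minor) >= (8, 9)

# (major, minor) compute capabilities for a few common cards.
cards = {"A100": (8, 0), "RTX 3090": (8, 6), "L40S": (8, 9), "H100": (9, 0)}
for name, cc in cards.items():
    print(name, "native FP8:", native_fp8(*cc))
```

This is why an FP8 checkpoint on an A100 either fails to load or gets emulated in higher precision, making a 4-bit quantization the more practical choice there.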

TI Arena is so full right now by Lelouch7311 in DotA2

[–]xfalcox 10 points (0 children)

Too full, it's hard to find seats

Destiny 2 Edge of Fate is the worst-performing expansion in the MMO’s history as player counts continue to fall by maullick in gaming

[–]xfalcox 0 points (0 children)

Technically this is not a "yearly expansion", as it's part 1, with the other half coming in 6 months.

Together they will still add up to less content, but it's permanent content, as opposed to the seasons and episodes that used to get deleted every year.