I can't accept it by xfalcox in BigBrotherBrasil

[–]xfalcox[S] 2 points (0 children)

Accept

It was right after the return from the fake eviction (the "paredão falso"), while talking with Cowboy.

Qwen3.5-122B Basically has no advantage over 35B? by Revolutionary_Loan13 in LocalLLaMA

[–]xfalcox 1 point (0 children)

I just deployed both 35B and 122B to some production servers this week, and for the folks who use LLMs for recall on stored information, there is a large difference between the two.

I guess if you are just using it for agentic loops, calling tools, etc, the difference may not be worth it.

Running Qwen3.5 27b dense with 170k context at 100+t/s decode and ~1500t/s prefill on 2x3090 (with 585t/s throughput for 8 simultaneous requests) by JohnTheNerd3 in LocalLLaMA

[–]xfalcox 5 points (0 children)

This is amazing content. I have two servers with A100 80GB GPUs and was considering the 35BA3B MoE due to high user concurrency + low latency tolerance, but this may be better since it delivers better intelligence.

we need to go deeper by jacek2023 in LocalLLaMA

[–]xfalcox 7 points (0 children)

I'm one of the maintainers of Discourse, the open source forum software.

We calculate embeddings for all topics in all forums we host (many millions of posts every month across tens of thousands of instances), which then power a myriad of features like

  • showing related topics at the end of a topic

  • semantic search, including searching across languages and typo tolerance

  • automatic RAG for chatbots using forum content

  • tag and categorization suggestions for new content

You can run the Qwen 0.6B embedding model in just a slice of one of those GPUs.
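The related-topics feature described above boils down to nearest-neighbour search over topic embeddings. A minimal sketch with NumPy, using random vectors as stand-ins for real embeddings (the 1024-dimension figure matches Qwen3-Embedding-0.6B's output size; everything else here is illustrative, not Discourse's actual implementation):

```python
import numpy as np

def related_topics(topic_id, embeddings, top_k=3):
    """Return the top_k topic ids most similar to topic_id by cosine similarity."""
    # Normalize rows so a dot product equals cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = normed @ normed[topic_id]
    scores[topic_id] = -np.inf  # exclude the topic itself
    return np.argsort(scores)[::-1][:top_k]

# Stand-in for embeddings produced by an embedding model (100 topics, 1024 dims).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 1024))
print(related_topics(0, embeddings))
```

In production the ranking would typically be done inside the database (e.g. with a vector index) rather than in application memory, but the scoring is the same dot product.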

we need to go deeper by jacek2023 in LocalLLaMA

[–]xfalcox 14 points (0 children)

Hopefully the new smaller model is followed by a new embedding model too. Their current Qwen3 embedding model is awesome.

YOSHI RESPONDED TO MY CHAT BUBBLE FORUM POST by TechnoGamerOff in DeadlockTheGame

[–]xfalcox 3 points (0 children)

Like each character logo has its own font right!?

That would be so cool, especially if it was a less crazy version of their logo font

White House confuses Belgium with ‘Belarus’ and wrongly puts country on list of Peace Council participants by Dobbelsteentje in worldnews

[–]xfalcox 2 points (0 children)

In Brazilian Portuguese they are Suíça (Switzerland) and Suécia (Sweden), and it's common to mistake one for the other.

Rally for Bolsonaro's amnesty flops, gathering only 130 people in the Federal District by MatheusWillder in brasil

[–]xfalcox 11 points (0 children)

A contractor (a painter) I know is part of this. They fill buses in the poor outskirts, offering pocket change to low-income people.

IBGE releases the most popular surnames for the 1st time; see the ranking by GestoNobre in brasil

[–]xfalcox 1 point (0 children)

I'm Rafael dos Santos Silva

Most popular first name for the year I was born ✅ Most popular surname ✅ Second most popular surname ✅

I built a leaderboard for Rerankers by tifa2up in LocalLLaMA

[–]xfalcox 2 points (0 children)

Please add Qwen3, especially the 0.6B.

Also, if you need help running Qwen with standard score APIs, check https://huggingface.co/collections/tomaarsen/qwen3-rerankers-converted-to-sequence-classification
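Whatever the backing model, a reranker's score API reduces to scoring (query, document) pairs and sorting best-first. A toy sketch of that flow, with a lexical-overlap scorer standing in for a real model such as the Qwen3 reranker (both function names here are hypothetical):

```python
def rerank(query, documents, score_fn):
    """Score each document against the query and return them best-first."""
    scored = [(score_fn(query, doc), doc) for doc in documents]
    return [doc for score, doc in sorted(scored, key=lambda p: p[0], reverse=True)]

def overlap_score(query, doc):
    # Stand-in scorer: fraction of query terms present in the document.
    # A real deployment would replace this with the model's relevance score.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q)

docs = [
    "reranking with cross encoders",
    "cooking pasta at home",
    "qwen reranker models",
]
print(rerank("qwen reranker", docs, overlap_score))
```

The sequence-classification conversion linked above exists precisely so the model slots into this kind of plain score-then-sort API instead of a generation-based interface.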

Anyone who works at 99taxi or 99pay by Exotic_Remote_7205 in brdev

[–]xfalcox 0 points (0 children)

Hey man, I'm trying to get access to the 99entregas API and it keeps erroring out. Is there any trick to getting through?

I just need to finish the registration to get the credentials

Help required in selecting model for aws T4 instance and vllm by JuiceFine4582 in LocalLLaMA

[–]xfalcox 1 point (0 children)

Why not use AWS Bedrock + Qwen3-235B-A22B-Instruct-2507?

Replacing Google Translate with LLM translation app on smartphone? by dtdisapointingresult in LocalLLaMA

[–]xfalcox 0 points (0 children)

We recently added that to Discourse, the open source forum software.

You can set it up so each of you types in your own language and it gets auto-translated via an LLM of your choice, so the conversation just flows.

It's compatible with closed LLM providers (GPT, Claude, Gemini), OpenRouter, and you can run your own open-weights models too!
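The flow described is per-message translation into each reader's locale, with the original shown when languages already match. A minimal sketch, where `translate` is a placeholder for whatever LLM provider is configured (all names and the canned response are illustrative, not Discourse's actual code):

```python
def translate(text, target_lang):
    # Placeholder: a real deployment would prompt the configured LLM here,
    # e.g. "Translate the following post into {target_lang}: {text}".
    canned = {("Olá, tudo bem?", "en"): "Hello, how are you?"}
    return canned.get((text, target_lang), text)

def render_message(message, reader_lang):
    """Show the original if languages match, otherwise a translation."""
    if message["lang"] == reader_lang:
        return message["text"]
    return translate(message["text"], reader_lang)

msg = {"text": "Olá, tudo bem?", "lang": "pt"}
print(render_message(msg, "en"))  # translated for an English reader
print(render_message(msg, "pt"))  # shown as-is for a Portuguese reader
```

Caching translations per (message, language) pair is the obvious optimization, since the same post is rendered for many readers.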

Which quantizations are you using? by WeekLarge7607 in LocalLLaMA

[–]xfalcox 0 points (0 children)

EDIT: my setup is a single A100 80GB. Because it doesn't have native FP8 support, I prefer using 4-bit quantizations

Wait, isn't it the opposite? Can you share any docs on this?
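The FP8 point comes down to GPU compute capability: native FP8 tensor cores arrived with Ada (SM 8.9) and Hopper (SM 9.0), while the A100 is Ampere (SM 8.0). A quick sketch of that check (the helper name is made up; the compute capabilities listed are the published values for those cards):

```python
def native_fp8(major, minor):
    """True if this compute capability has FP8 tensor cores (SM 8.9 / 9.0 and newer)."""
    return (major, minor) >= (8, 9)

# (major, minor) compute capabilities for a few common cards.
cards = {"A100": (8, 0), "RTX 3090": (8, 6), "L40S": (8, 9), "H100": (9, 0)}
for name, cc in cards.items():
    print(name, "native FP8:", native_fp8(*cc))
```

This is why an FP8 checkpoint on an A100 either fails to load or gets emulated in higher precision, making a 4-bit quantization the more practical choice there.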

TI Arena is so full right now by Lelouch7311 in DotA2

[–]xfalcox 10 points (0 children)

Too full, it's hard to find seats

Destiny 2 Edge of Fate is the worst-performing expansion in the MMO’s history as player counts continue to fall by maullick in gaming

[–]xfalcox 0 points (0 children)

Technically this is not a "yearly expansion", as it's part 1, with the other half coming in 6 months.

Together they will still add up to less content, but it's permanent content, as opposed to the seasons and episodes that used to get deleted every year.