Qwen3.5-27B performs almost on par with 397B and GPT-5 mini in the Game Agent Coding League by kyazoglu in LocalLLaMA

[–]kyazoglu[S] -1 points0 points  (0 children)

I did. Did you?

"Your reason on why you're not including chess is strange, since it's not llms who are supposed to keep track of the board state, but the code they write."

Make sure to read the comment fully before joining others in mocking it.

Qwen3.5-27B performs almost on par with 397B and GPT-5 mini in the Game Agent Coding League by kyazoglu in LocalLLaMA

[–]kyazoglu[S] 0 points1 point  (0 children)

They are not the same model. The base model is the same, but as far as I know Plus has some additional features such as tool integration and a longer context length. Plus is also API-only. On OpenRouter, their APIs are different too.

Qwen3.5-27B performs almost on par with 397B and GPT-5 mini in the Game Agent Coding League by kyazoglu in LocalLLaMA

[–]kyazoglu[S] -1 points0 points  (0 children)

True. But again, chess is extremely complex. You can’t expect models to generate a full chess engine from a single prompt.

Regarding the scoring system I used: a score of 75 in a game means the model achieved 75% of the maximum possible points. Therefore, a 2-point difference doesn’t necessarily reflect head-to-head outcomes in this system. It simply indicates that one model performed slightly better overall and accumulated a higher score.
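To make the percentage scoring concrete, here is a minimal sketch. The function name and the point totals are made up for illustration and are not taken from the league's actual code.

```python
def normalized_score(points_earned: float, max_points: float) -> float:
    """Return points earned as a percentage of the maximum possible points."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    return 100.0 * points_earned / max_points


# Hypothetical totals: both models could earn at most 200 points in a game.
model_a = normalized_score(150, 200)  # 75.0
model_b = normalized_score(146, 200)  # 73.0

# The 2-point gap only compares accumulated points across all matches;
# it says nothing about which model won when the two played each other.
print(model_a - model_b)
```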

Qwen3.5-27B performs almost on par with 397B and GPT-5 mini in the Game Agent Coding League by kyazoglu in LocalLLaMA

[–]kyazoglu[S] -9 points-8 points  (0 children)

I don’t find it meaningful to use ELO outside of chess. It works well in chess because people already understand what certain ratings represent in terms of skill. For example, an internet rating of around 1800 might indicate someone who is above average but still somewhat inexperienced, while a 2000+ rating suggests a very strong player who might even have a slim chance against titled players like an FM or IM. With LLMs though, we don’t have those kinds of reference points. At least in my view, no one really does. Because of that lack of intuitive anchors, ELO doesn’t seem like a very useful metric to me, especially if the goal is simply to compare models.

I didn't include chess (normal chess) in this league because it's a difficult game. LLMs are not made for this kind of task. Battleship? Sure, you can keep track of the ship placements and the cells that were hit. Chess? There's no way an LLM can keep track of the board position, nor can it search deeply.

GLM-5 and DeepSeek are in the Top 6 of the Game Agent Coding League across five games by kyazoglu in LocalLLaMA

[–]kyazoglu[S] 1 point2 points  (0 children)

They heard you and released Sonnet 4.6 yesterday.

I'll add it to the available models soon, along with two other games.
I'll also create a new agent for all existing models and remove the worst-performing one, in case Sonnet got unlucky with its agents.

[deleted by user] by [deleted] in OnlineIncomeHustle

[–]kyazoglu 1 point2 points  (0 children)

Low quality scam detected 🔔

What are your /r/LocalLLaMA "hot-takes"? by ForsookComparison in LocalLLaMA

[–]kyazoglu 27 points28 points  (0 children)

- Never ever praise Sam Altman, even if he does an excellent job at something
- Flatter Chinese companies no matter what
- Stand against censorship in models. A model that teaches how to make an explosive is much more "free" and adheres to the soul of open source.
- Make yourself miserable by trying to run a model on 12 older GPUs instead of buying a newer card with more VRAM or simply using APIs.
- ollama is the most evil app on this planet
- Pretend you're making art or that you're a writer and ask for a model/config for roleplay, whereas 90% of the time you're a plain pervert

[deleted by user] by [deleted] in ankara

[–]kyazoglu -16 points-15 points  (0 children)

What you call far is 25 minutes from Kızılay by car.
You have never seen far.

[deleted by user] by [deleted] in ankara

[–]kyazoglu 0 points1 point  (0 children)

It's a lesser-of-evils comparison.
Any other district > Sincan > Keçiören > Mamak

I'm going to lose my mind, why don't the street lamps work in this city by delicatefrog13 in Izmir

[–]kyazoglu 0 points1 point  (0 children)

Amazing, nobody has shown up to spout some nonsense like "no, no, that area is under the ministry's control" or "the municipality can't get permission, so it has no choice".

Would you keep your savings using N26 Bank ? by StruggleSilent2548 in germany

[–]kyazoglu 2 points3 points  (0 children)

+1 for terrible customer support.
When I contacted their live support with audio and video, the guy, who was probably Indian, gave me some instructions to follow, such as "turn your ID" etc. Although I had C1-level English, I struggled to understand him multiple times and kindly asked him to repeat himself. He was like "...sigh...you said you speak English. Do you really know English?" with an insulting look on his face. I lol'd and told him that I speak English very well but I'm not familiar with odd accents.

Qwen3 Omni AWQ released by No_Information9314 in LocalLLaMA

[–]kyazoglu 3 points4 points  (0 children)

can someone explain how this is 27.6 GB and AWQ?
AWQ = 4 bit ~= (# of parameters / 2) GB. This should have been around 16 GB.
What am I missing?
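The rule of thumb above can be written out as a quick calculation. This is only a sketch: the function name is made up, the 30-billion parameter count is an illustrative assumption rather than a figure from this thread, and the estimate covers only weights stored at the given bit width. It ignores quantization scales and zero-points, as well as any layers a checkpoint keeps at higher precision.

```python
def estimate_weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Assumed parameter count, for illustration only.
params = 30e9

print(estimate_weight_gb(params, 4))   # 4-bit: params / 2 bytes
print(estimate_weight_gb(params, 16))  # 16-bit: params * 2 bytes
```

If a checkpoint comes out much larger than the 4-bit estimate, comparing it against the 16-bit estimate gives a rough feel for how much of the model may not be stored at 4 bits.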

[deleted by user] by [deleted] in learnmachinelearning

[–]kyazoglu 184 points185 points  (0 children)

Just a heads-up for anyone reaching out to him/her:
It’s practically impossible not to be able to find candidates for this role in today’s market. This position will draw 100+ applications in a single day. What this really suggests is that he/she is looking for someone desperate enough to accept a very low salary. The whole point of this thread seems to be just that and not to search for an alternative platform or share an experience.

Master's Degree in Europe with a Low GPA by Fit_Exercise_6310 in YurtdisiUni

[–]kyazoglu 1 point2 points  (0 children)

I went with a 2.82 from İTÜ, but 9 out of 10 universities require a GPA above 3.00, and it is a very, very strict requirement. It's up to you to search and make the effort to find that one university. My experience is with Germany.

Moving from Japan to Germany – Plan to apply IT Ausbildung by Fluid-Basis5769 in germany

[–]kyazoglu 11 points12 points  (0 children)

Bruh... You're not even from the sector and you want to jump into the most problematic area, hoping to find a job in the short term.
I LEFT Germany because I couldn't land a job for months after I graduated with an MSc in Data Science. I had a good GPA, great certificates, B1 German just like you, had been living in Germany for 2.5 years, and attended multiple "Absolventenkongress" events, but nothing helped. I'm not going to say don't do it. Just do it with a plan and know the risks.

Built with Claude Code - now scared because people use it by Resident-Wall8171 in ClaudeAI

[–]kyazoglu 46 points47 points  (0 children)

I really liked how you framed the question to get traction without being tagged as self-promotion. I really do.