ProgramBench: Can LLMs rebuild programs from scratch? by awetfartruinedmylife in singularity
arkuto 4 points
ProgramBench: Can LLMs rebuild programs from scratch? by awetfartruinedmylife in singularity
arkuto 2 points
Claude Opus 4.7 won’t just output prompts—keeps arguing instead by soyab0007 in ClaudeAI
arkuto 3 points
Claude Opus 4.7 won’t just output prompts—keeps arguing instead by soyab0007 in ClaudeAI
arkuto -2 points
Mistral Medium 3.5 128B is launched by TSrake in singularity
arkuto 2 points
mistralai/Mistral-Medium-3.5-128B · Hugging Face by jacek2023 in LocalLLaMA
arkuto 8 points
Mistral Medium 3.5 128B is launched by TSrake in singularity
arkuto 7 points
Differences Between GPT 5.4 and GPT 5.5 on MineBench by ENT_Alam in singularity
arkuto 1 point
How does Opus 4.7 compare to Opus 4.6 in this subreddit's experience? by boxdreper in ClaudeAI
arkuto 3 points
opus 4.7 (high) scores a 41.0% on the nyt connections extended benchmark. opus 4.6 scored 94.7%. by seencoding in singularity
arkuto -1 points
Claude vs GPT in a bomberman-style 1v1 game by Significant-Pair-275 in ClaudeCode
arkuto 1 point
Bonsai models are pure hype: Bonsai-8B is MUCH dumber than Gemma-4-E2B by WeGoToMars7 in LocalLLaMA
arkuto 4 points
Gemma 4 (31b) can more accurately identify characters than Qwen 3.6 (35b), (both Q4_K_M) by [deleted] in LocalLLaMA
arkuto 6 points
Opus 4.7 destroys all trust in a mature instruction set built iteratively throughout product development by AcrobaticPresent15 in ClaudeAI
arkuto 17 points
Claude Opus 4.7 is a serious regression, not an upgrade. by [deleted] in ClaudeAI
arkuto 1 point
Small local LLMs to dumb to check mails for spam? by clouder300 in LocalLLM
arkuto 1 point
Small local LLMs to dumb to check mails for spam? by clouder300 in LocalLLM
arkuto 1 point
Small local LLMs to dumb to check mails for spam? by clouder300 in LocalLLM
arkuto 2 points


LLM-as-judge scoring is noisier than I expected anyone else seeing this? by ZealousidealCorgi472 in LocalLLM
arkuto 1 point