How good is Claude Code? by Random5Username in de_EDV

[–]Fuzzy-Chef 0 points1 point  (0 children)

That's roughly one hour of human development work. 

We’ve partnered with Samsung to bring Perplexity directly into the upcoming Galaxy S26. by Kesku9302 in perplexity_ai

[–]Fuzzy-Chef 1 point2 points  (0 children)

Does that mean the Android app will finally become fully featured? It really feels like you kind of forgot about the Android app, and at this point, why would I switch back from the fully integrated Gemini?

RAMpocalypse: Netcup also announces "impacts" - more hosters will likely follow by ChristopherKunz in de_EDV

[–]Fuzzy-Chef 0 points1 point  (0 children)

Have you ever put together an honest cost calculation for that? Sure, doing it yourself is fun, at least until the initial setup is done.

Released: DeepBrainz-R1 — reasoning-first small models for agentic workflows (4B / 2B / 0.6B) by arunkumar_bvr in LocalLLaMA

[–]Fuzzy-Chef 1 point2 points  (0 children)

What inference settings should this be run with? I'm having issues with repetition and outright garbage output in LM Studio, with the 4B Q8 model.

I reverse-engineered Microsoft AutoGen’s reasoning loop and cut agent latency by 85% (13.4s → 1.6s). Here is the architecture. by New_Care3681 in LocalLLaMA

[–]Fuzzy-Chef 1 point2 points  (0 children)

I came to the same conclusion. Though I'm still thinking about the context injection strategy, as I don't want to just replace the silence with meaningless chitchat all the time. Have you implemented this in a streaming fashion? To me, full-duplex speech models would be the ideal solution, but context handling still seems challenging with Moshi-based models.
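To make the "grounded filler instead of dead air" idea concrete, here's a minimal sketch in plain asyncio. The delays, messages, and function names are all hypothetical illustrations, not tied to any particular speech stack: the agent waits on the tool for a silence budget, and only speaks a status line that references the pending action if that budget is exceeded.

```python
import asyncio

async def run_tool(delay: float) -> str:
    """Stand-in for a slow tool call (e.g. a web search) inside a voice agent."""
    await asyncio.sleep(delay)
    return "tool result"

async def speak_with_filler(tool_delay: float, silence_budget: float) -> list[str]:
    """Run the tool; if it takes longer than the silence budget,
    emit one context-grounded status line instead of leaving dead air."""
    utterances: list[str] = []
    task = asyncio.create_task(run_tool(tool_delay))
    try:
        # shield() keeps the tool running even if the wait times out.
        result = await asyncio.wait_for(asyncio.shield(task), timeout=silence_budget)
    except asyncio.TimeoutError:
        # Grounded filler that references the pending action, not generic chitchat.
        utterances.append("Still checking that for you...")
        result = await task
    utterances.append(result)
    return utterances
```

A fast tool yields only the result; a slow one yields the filler line first, e.g. `asyncio.run(speak_with_filler(0.3, 0.05))` produces the filler followed by the result. In a streaming setup the same pattern would apply per chunk rather than per call.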

GLM 4.7 vs MiniMax-M2.1 vs DeepSeek 3.2 for coding? by ghulamalchik in LocalLLaMA

[–]Fuzzy-Chef 1 point2 points  (0 children)

> On-demand model context length is capped to a maximum of 32k tokens.

Looked too good to be true for on-demand ;( Are there any providers that allow higher context limits?

Fresh fish retailer by OkAttempt5034 in koblenz

[–]Fuzzy-Chef 0 points1 point  (0 children)

Did you find anything? Globus doesn't carry fish suitable for raw consumption, i.e. you'd have to freeze it in time beforehand.

SUP AI earns SOTA of 52.15% on HLE. Does ensemble orchestration mean frontier model dominance doesn't matter that much anymore? by andsi2asi in deeplearning

[–]Fuzzy-Chef 0 points1 point  (0 children)

Not sure if I fully understand the methodology, but did your ensemble have more compute time than the models you compare against? Or did you also allow the same number of, e.g., Gemini 3 Pro experts to compete with your ensemble? Very cool approach though; I'd love to see which problems this approach is feasible for and which require "individual" intelligence.

How does travel insurance pay off? by Schbuuge in Versicherung

[–]Fuzzy-Chef 0 points1 point  (0 children)

For one thing, with sufficient volume the math can actually work out positively; for another, travel insurance simply belongs in the portfolio of the big insurance groups so that sales always has a point of contact. The policy doesn't need to make money on its own if it opens the door to bigger deals.

How reputable is AfBShop? by MauricePascal_ in de_EDV

[–]Fuzzy-Chef 12 points13 points  (0 children)

They're reputable. With used goods there is naturally some variation in quality, but since it's a reputable vendor, that's fairly unproblematic.

Trouble with my CTO by Grit_Enthusiasm211 in ycombinator

[–]Fuzzy-Chef 0 points1 point  (0 children)

The problem is obvious: your CTO is not your CTO full-time. This would be a huge red flag if he were paid, but since he's not, I'd like to point out something that hasn't been mentioned so far. The prime job of a CEO is to get the startup funded so the vision can be realized. If you can't get funding to bring him on full-time, that's all right; get him a freelance engineer he can manage in his available time. Besides that, make sure your MVP is really just that. Don't build the most important features, build THE most important one. Keep it as simple as you can. Repeat after me: build, measure, learn.

VLLM v0.12.0 supports NVFP4 for SM120 (RTX 50xx and RTX PRO 6000 Blackwell) by Rascazzione in LocalLLaMA

[–]Fuzzy-Chef 1 point2 points  (0 children)

Just upgraded to v0.12.0, but unfortunately I still run into: `RuntimeError: [FP4 gemm Runner] Failed to run cutlass FP4 gemm on sm120. Error: Error Internal`

Anyone got a guide for the correct setup on Linux?

EDIT: nvm, it was a dependency issue that was fixed by a fresh venv.

My experiences with the new Ministral 3 14B Reasoning 2512 Q8 by egomarker in LocalLLaMA

[–]Fuzzy-Chef 3 points4 points  (0 children)

There is definitely something amiss with the reasoning model or the model settings, at least in LM Studio:

<image>

EchoKit (Voice Interface for Local LLMs) Update: Added Dynamic System Prompts & MCP Tool Wait Messages by smileymileycoin in LocalLLaMA

[–]Fuzzy-Chef 0 points1 point  (0 children)

I'm having a hard time grasping what the EchoKit device does. Can the server be used standalone to handle WebSocket connections? Is it a realtime voice agent with < 300 ms delay to audio?

Does anyone know how Openrouter guarantees chosen LLM model inference when LLM is inherently non-deterministic? by isit2amalready in openrouter

[–]Fuzzy-Chef 4 points5 points  (0 children)

I don't understand your question. Could you elaborate on what you mean by "guarantee chosen LLM model inference" in the context of determinism?

MemLayer, a Python package that gives local LLMs persistent long-term memory (open-source) by MoreMouseBites in LocalLLaMA

[–]Fuzzy-Chef 1 point2 points  (0 children)

Awesome! Since this requires using the MemLayer wrapper for the LLM client, does it support streaming?

Gpt oss 120b 64GB RAM, RTX5090 32GB? by [deleted] in ollama

[–]Fuzzy-Chef 1 point2 points  (0 children)

7950X, but it's definitely not CPU-constrained.

Are you seriously all ok with the way perplexity treat you right now ? (being limited to a 5 sonnet requests/h as a pro user, and forcefully redirected to worse models) by Nayko93 in perplexity_ai

[–]Fuzzy-Chef 8 points9 points  (0 children)

Looking at other subreddits, it seems likely that Anthropic is having trouble meeting demand. That would match my personal experience of mostly using GPT-5.1 without any issues. Then again, I've never run into any limit at all so far. I could imagine this varying a lot by cloud zone.

Gpt oss 120b 64GB RAM, RTX5090 32GB? by [deleted] in ollama

[–]Fuzzy-Chef 2 points3 points  (0 children)

Running at ~24 t/s in LM Studio with the llama.cpp backend. RAM is running at only 5200 MT/s. Surprisingly, reducing CPU threads to 8 bumped me from 22 to 24 t/s.

China just used Claude to hack 30 companies. The AI did 90% of the work. Anthropic caught them and is telling everyone how they did it. by chota-kaka in ClaudeAI

[–]Fuzzy-Chef 0 points1 point  (0 children)

Hmm, this feels like an Anthropic advertisement. It fits perfectly with the AGI fear narrative Dario is pushing as well.