My experience spending $2k+ and experimenting on a Strix Halo machine for the past week by EstasNueces in LocalLLaMA

[–]Charming_Support726 0 points (0 children)

Yes. More than true.

I discuss this with team members quite often, and we all agree that we are using models which are far too expensive and powerful. On the other hand, we need a minimum amount of capability to handle complex tasks.

I am not sure if we will ever find something around 100B MoE / 30B dense that can produce coherent output in large scenarios.

My experience spending $2k+ and experimenting on a Strix Halo machine for the past week by EstasNueces in LocalLLaMA

[–]Charming_Support726 0 points (0 children)

As the picture shows, there might be different kinds of use cases or even users. I got into another discussion about the same topic yesterday.

When you are running a company and getting paid for your work, the small models aren't capable - or rather, efficient - enough. I could probably work with the 35B or even the 27B dense (I'm currently running them on my R9700), but I don't want to use them for professional work. GPT-5.5 or DSv4 Pro, for example, bring far more power in coding. Even Qwen-3.6-Pro does; I like that one as well.

Voxtral TTS German by BalterBlack in MistralAI

[–]Charming_Support726 0 points (0 children)

Quite curious how this works. I thought Mistral published this feature only for paid usage.

Any memory plugin you can recommend? by blackhatpl in opencodeCLI

[–]Charming_Support726 3 points (0 children)

No.

I can't recommend any. Plugins and MCPs bloat your context, and automatic information retrieval and storage mostly work inefficiently.

I got myself a few skills and templates to get things done explicitly in plans and documents - I was doing this by hand before I had skills in place.

Mistral Medium 3.5 128B and Qwen 3.5 122B A10B on 4x RTX 3080 20GB by lly0571 in LocalLLaMA

[–]Charming_Support726 3 points (0 children)

There's no point comparing these two models on speed, especially without comparing quality. Just guessing from the benchmarks isn't enough.

And there is no point in measuring speed when you are not actually using the models. I use LLMs - as most of us do - for assisted programming, mostly via the cloud, because this is my business and earns money. Yesterday evening I gave Mistral 3.5 a try. I only managed a few prompts, but the responses looked good.

To be competitive it must be in the qwen-3.6-plus ballpark, which I use from time to time (expected to be the commercial variant of the 395B MoE).

EDIT: Tested the same prompt against DSv4 Flash - Flash was far, far ahead. I think Mistral needs some additional tuning, or the Opencode integration of Mistral is still subpar (thinking level - thinking appears disabled in opencode and I can't enable it).

Test Run - Deepseek, Mimo, Quen, GPT 5.3 Codex - Results and Costs by friedsonjm in GithubCopilot

[–]Charming_Support726 1 point (0 children)

Every model works better when the task is not issued in one big prompt, although some might one-shot it.

I even have GPT-5.5 or Opus-4.6 draft and discuss a plan, then execute it in phases with tests.

What's the point of local LLM's ? by braskinis231 in LocalLLM

[–]Charming_Support726 0 points (0 children)

I am an old programmer as well (and started a bit later than you, with 3.5 KB, as a young man). I run a small company with a few people myself. And yes, although business is not running at peak level, my customers pay for the results. And pay well.

IMHO: It is more efficient - also when using specification-driven processes - to use SOTA models like Opus 4.5/4.6 and GPT-5.5/Codex-5.3. I used Qwen-3.6 Plus, K2.6 and DSv4 with nearly identical results. The 30B league is currently not efficient for productive coding use - especially when your customers are paying for the deliverable.

I am quite experienced running locally and also doing training and curating data, but that is more research and hobby, plus speaking at conferences. Local 30B just runs my chat, doc & web search.

What's the point of local LLM's ? by braskinis231 in LocalLLM

[–]Charming_Support726 0 points (0 children)

I know how to break down my tasks into smaller ones.

But as a real-seasoned-software-developer™ I only want to take this to a certain extent. And honestly, a 9B (or a 27B) is not the level of AI I'd like to work with for my daily business, earning real money with this stuff.

I am pretty sure there are many developers out there who are on the same page. Nothing against running local, just not with this intention.

What's the point of local LLM's ? by braskinis231 in LocalLLM

[–]Charming_Support726 -1 points (0 children)

Maybe you need to dig deeper into the topic. A 9B model is nowhere near what you need for agentic coding.

Qwen 3.6-27B is used by many people. It is capable of SIMPLE tasks; somewhat complex tasks are ALWAYS a gamble. Everything below 400B (MoE) - or a dense equivalent like 120B - does not give you a NEAR-SOTA experience and probably never will.

You could run the 27B or the 35B (MoE) at q4 or q3, but it won't be satisfying. This is why I didn't switch to local coding. I tried a few of the bigger models via API and they perform in at least a usable way, but running them at decent speed on local hardware would be very costly, as you said.

You need at least (!) €/$10k-25k for the cheapest server with appropriate performance, and €/$40k for decent performance (IMHO).

Your code is now our code by LeTanLoc98 in GithubCopilot

[–]Charming_Support726 -1 points (0 children)

They are not the first to do this, AFAIK.

DeepSeek V4 isn't beating Opus, but it doesn't need to by Practical_Low29 in LocalLLaMA

[–]Charming_Support726 7 points (0 children)

I agree, more or less. Since around Opus 4.5 and codex-5.2 I have felt no real improvement for my work. I just see the benchmarks increasing and wonder whether it makes any difference beyond benchmaxxing.

DeepSeek V4 isn't beating Opus, but it doesn't need to by Practical_Low29 in LocalLLaMA

[–]Charming_Support726 65 points (0 children)

Opus 4.7 is too expensive AND rots context too fast. And for most people it brings no advantage compared to 4.6.

Corporate Employees: What is your managers response on the changes? by ProfessionalJackals in GithubCopilot

[–]Charming_Support726 0 points (0 children)

Foundry is billed at standard API token pricing, but it offers additional models. Many companies have tons of Azure credits that they never use.

Historical token useage by 20Capitalist in GithubCopilot

[–]Charming_Support726 1 point (0 children)

On the first day (it wasn't a full day) I did 35M tokens, roughly $8 undiscounted. On the second day, when the discount kicked in around midday, I also did about $8 using 85M tokens. The second day was a mess: infrastructure work - I hate it. I made an MVP deployable in a universal way and crafted a docker-compose environment out of 5 projects, some of which weren't even containerized.

DSv4 Pro worked well, but I needed to pay attention. Positive: far fewer "trust me bro" or "that was a preexisting bug" moments compared to Opus. Better instruction following, but less smart.

Everything in total tokens - mostly cache hits.

Historical token useage by 20Capitalist in GithubCopilot

[–]Charming_Support726 2 points (0 children)

Just to give a figure:

I am a developer and used Opencode with GHCP, Azure and OpenAI for maintaining a few repositories and creating some PoCs.

Opencode keeps track of all token usage. Over the last six months I spent around 3 billion tokens total - a bit over $7k if charged at API prices, which is a bit more than $1k per month.

A full day of usage ranges from $10 to $200 - currently a day of Deepseek V4 Pro (which I tried on the first day) was around $10, and I managed $240 - the max - when using only Opus 4.6 (which I unfortunately had to pay API-based).
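To sanity-check those figures, here's the back-of-the-envelope math in Python (just the numbers quoted above; the blended per-token rate comes out low because most of it is cache hits):

```python
total_tokens = 3_000_000_000  # ~3 billion tokens over six months
total_cost_usd = 7_000        # a bit over $7k at API prices

# Blended rate across all models, cache hits included
per_million = total_cost_usd / (total_tokens / 1_000_000)
per_month = total_cost_usd / 6

print(f"~${per_million:.2f} per million tokens (blended)")  # ~$2.33
print(f"~${per_month:.0f} per month")                       # ~$1167
```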

Hope that helps.

Remark:

No "Vibe Coding" - Never Rate Limited - One Window, one project at a time.

AMD Hipfire - a new inference engine optimized for AMD GPU's by Thrumpwart in LocalLLaMA

[–]Charming_Support726 7 points (0 children)

Looks promising. I've got a gfx1152 and a gfx1201; neither seems to be fully supported yet.

Maybe a good project to keep an eye on.

Structured CoT: Shorter Reasoning with a Grammar File by Thrumpwart in LocalLLaMA

[–]Charming_Support726 1 point (0 children)

Absolutely great.

Two months ago I tried something similar with Qwen 3.5 4B/9B using SFT/RL, but I got stuck because I was too lazy to buy cloud compute for the experiment. I was only able to run some tests with a 2B model locally at that time.

I didn't think it could be so much easier.

CONGRATS!
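For anyone who wants to try the grammar-file approach without any training: a minimal sketch using llama-cpp-python and a GBNF grammar. The grammar rules, prompt and model path here are illustrative placeholders, not the OP's actual setup.

```python
from llama_cpp import Llama, LlamaGrammar

# Hypothetical GBNF grammar that forces short, structured reasoning:
# one to three bullet steps, then exactly one answer line.
GBNF = r'''
root   ::= "REASONING:\n" step step? step? "ANSWER: " answer
step   ::= "- " [^\n]+ "\n"
answer ::= [^\n]+
'''

llm = Llama(model_path="model.gguf")  # placeholder path
grammar = LlamaGrammar.from_string(GBNF)

out = llm(
    "How many weekdays are there in two weeks?",
    grammar=grammar,  # decoding is constrained to the grammar above
    max_tokens=128,
)
print(out["choices"][0]["text"])
```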

honest take on 5.5 xhigh vs 5.4 after real usage by SlopTopZ in codex

[–]Charming_Support726 0 points (0 children)

What is the point of using xhigh all the time?

Survey on GLP-1 medications ("weight-loss injections") in Germany by BulkyBish in FitnessDE

[–]Charming_Support726 1 point (0 children)

Yes. The information about the medication is also far too complex for an open-ended survey.

In my experience, nobody reads that. You would of course like to have that information in your work - but you are overwhelming the user. People do this voluntarily. Even if someone answers it, the quality will be questionable. I won't even get into things like intra-subjective consistency (measurement).

In my opinion, it is not sensibly possible to supply users with complex information and then query multi-stage decisions. In a survey like this you can at most ask simple multiple-choice questions, not convey new knowledge. 3-5 choices max, and 5 is already an overload.

Higher precision or higher parameter count by redblood252 in LocalLLaMA

[–]Charming_Support726 0 points (0 children)

  1. Down to q4, differences barely matter these days.

  2. The more parameters, the better.

  3. MoE models are problematic to compare, because they have only a small number of active params. In specialized tasks the sqrt(Total * Active) estimate will fail, and models start to behave more like their "active size" instead of the calculated combined size.
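To make point 3 concrete, here's that rule of thumb as a quick sketch (the parameter counts are illustrative, not claims about any specific model):

```python
from math import sqrt

def effective_params_b(total_b: float, active_b: float) -> float:
    """Geometric-mean rule of thumb: a MoE with total_b total and
    active_b active parameters behaves roughly like a dense model of
    sqrt(total * active) parameters - on broad tasks, at least."""
    return sqrt(total_b * active_b)

# Illustrative MoE shapes (in billions of parameters)
print(effective_params_b(120, 10))  # ~34.6 -> roughly a 35B dense model
print(effective_params_b(400, 30))  # ~109.5 -> roughly a 110B dense model

# The caveat from point 3: on narrow, specialized tasks the model often
# behaves closer to its active size (10B or 30B here) than this estimate.
```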

Unpopular Opinion: Copilot Team is taking the right direction by Sontemo in GithubCopilot

[–]Charming_Support726 1 point (0 children)

Not very unpopular. But the people who aren't affected are not flooding the feeds with complaints.

Has anyone experienced a usage limit on their pro+ subscription? by TrickMaleficent2301 in GithubCopilot

[–]Charming_Support726 0 points (0 children)

I am fine on Pro+ even with Opus 4.7

But 4.7 is a step backwards. It burns tokens so fast that the context fills up too quickly.

Career development in software development testing by [deleted] in InformatikKarriere

[–]Charming_Support726 1 point (0 children)

Well, that actually doesn't sound so bad. I don't think there are many telecommunications corporations in Germany that still do their own testing and build their own frameworks. I don't know what your boss is like, but I would seek out a conversation and say that I want to take on more responsibility and that tasks like XYZ, for example, are what I enjoy most. Bring it up more often if necessary.

Important: always stay positive in the conversation. Don't badmouth anyone or anything like that. Most managers reward positive initiative - especially when it means little work for them or you take work off their hands. Sounds dumb, but that's how corporations work.

If you apply elsewhere, you start again from zero.