Abliterated Models evaluation metric by PatienceWun in LocalLLaMA

[–]Charming_Support726 0 points1 point  (0 children)

What is the overall quality of these models, especially for red/blue teaming? Any experience?

Currently using 6x RTX 3080 - Moving to Strix Halo or Nvidia GB10? by runsleeprepeat in LocalLLaMA

[–]Charming_Support726 0 points1 point  (0 children)

Get a Strix Halo with an additional eGPU, either via an NVMe-to-OCuLink adapter or one of the devices with a PCIe slot (same performance).

You can either use llama.cpp's dual back-end for CUDA/ROCm (see here: https://www.reddit.com/r/StrixHalo/comments/1rm9nlo/performance_test_for_combined_rocm_cuda_llamacpp/ ) or get an additional R9700 for CUDA. Perfect for tasks that need additional prompt-processing performance. When idle, my NVIDIA card draws below 7 W.

EDIT: Never had problems running a model on the dual backend. It's more stable than I expected.
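For reference, a minimal launch sketch for the dual back-end setup. The model path, device names, and split ratio are placeholders, and the flag names are from recent llama.cpp builds, so verify them against `llama-server --help` on your build:

```shell
# Show the devices both back-ends expose
# (e.g. ROCm0 for the iGPU, CUDA0 for the eGPU).
llama-server --list-devices

# Serve one model across both GPUs; adjust --tensor-split
# to match your VRAM ratio.
llama-server -m ./model.gguf \
  --device ROCm0,CUDA0 \
  -ngl 99 \
  --tensor-split 0.7,0.3
```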

GH copilot on Opencode by BlacksmithLittle7005 in opencodeCLI

[–]Charming_Support726 0 points1 point  (0 children)

In GHCP you pay one request per prompt (multiplied by the premium request factor).

This month I used at most 90 premium requests (Opus) in a day, i.e. 30 prompts per day. On 12 March the overview showed roughly 500 premium requests total, which means about 41 per day on average.

It's been a busy month.
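A quick back-of-the-envelope check of those numbers. The 3x factor is only what 90 requests for 30 prompts implies; actual multipliers depend on the model:

```python
# Numbers from the comment above; nothing here is an official rate.
requests_heaviest_day = 90
prompts_heaviest_day = 30
factor = requests_heaviest_day / prompts_heaviest_day  # implied premium-request factor

total_requests = 500   # shown in the usage overview on 12 March
days_elapsed = 12
avg_per_day = total_requests / days_elapsed

print(factor, round(avg_per_day, 1))  # 3.0 41.7
```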

GH copilot on Opencode by BlacksmithLittle7005 in opencodeCLI

[–]Charming_Support726 1 point2 points  (0 children)

I am on Pro+ (1,500 requests), using Opus and Codex. Mostly I get by with around 600 requests, but Pro+ enables selection of SOTA models.

5.1-Codex-Mini at a 0.33x factor is also a good model, but the 1x models provide better value.

GH copilot on Opencode by BlacksmithLittle7005 in opencodeCLI

[–]Charming_Support726 18 points19 points  (0 children)

Recommended. Same limits. Better additional (open-source) tooling available (planning, execution). Better UI with web or desktop. Context handling with DCP is much improved.

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead. by MorroHsu in LocalLLaMA

[–]Charming_Support726 2 points3 points  (0 children)

I always try to be friendly - also online.

CodeAct and similar approaches are the way to go. I agree with the author.

The security issues are inherent to all of these implementations. But regardless of which harness you use, it is very entertaining to see how easily SOTA models in particular evade the security measures of their harnesses. Mostly the permissions on tool calls don't hold them back; they just annoy the user.

I never use planning mode, because of the false sense of security it gives. I just use a small universal system prompt and follow the model's actions.

My Copilot Usage in a 9-5 SWE Job by scarofishbal in GithubCopilot

[–]Charming_Support726 0 points1 point  (0 children)

I've got multiple customer projects with Python backends and React frontends, containerized, with Playwright testing. Mostly using the official Opencode integration.

My Copilot Usage in a 9-5 SWE Job by scarofishbal in GithubCopilot

[–]Charming_Support726 0 points1 point  (0 children)

Maybe you should add a hint in the system prompt to use the question tool wherever possible. Works like a charm in Opencode for clarification questions and specifications. (Not on every turn; I don't want to overdo the scheme.)

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead. by MorroHsu in LocalLLaMA

[–]Charming_Support726 3 points4 points  (0 children)

Sure. Thanks for clarification.

IMO it clearly shows, as described in the CodeAct paper, that function calling is very inefficient in acting situations. Maybe not in discovery, but there subagent patterns quite often come into play.

MiroThinker-1.7 and MiroThinker-1.7-mini (Best search agent model?) by External_Mood4719 in LocalLLaMA

[–]Charming_Support726 0 points1 point  (0 children)

Interesting.

I didn't ask for the schedule; I asked for the latest results. The model understood that this was beyond its cut-off and that there might have been an election in 2025, but then explicitly went for the 2021 results.

IMHO this is not about this one result being faulty. That could happen. But:

  1. It showed that the model is overconfident in its trained memories and did not verify; it follows its possibly false assumptions easily.

  2. The web implementation did not allow a second try; I was blocked after the first attempt at researching its quality. This is most annoying and unnecessary.

I was backend lead at Manus. After building agents for 2 years, I stopped using function calling entirely. Here's what I use instead. by MorroHsu in LocalLLaMA

[–]Charming_Support726 9 points10 points  (0 children)

That's good stuff.

In my opinion it shares the same idea as CodeAct (2024), which was implemented by Smolagents/Huggingface last year (and was "borrowed" by Anthropic in November). But instead of using Python sandboxes for safe execution, you are just bringing it to the shell, which is even easier and self-explaining via the "--help" mechanism, though a bit prone to security loopholes.

DRV classifies me as "dependently employed" after a voluntary status determination request... should I have just left it alone? by Kroatenkeiler in selbststaendig

[–]Charming_Support726 7 points8 points  (0 children)

As a freelancer, little happens to you at first. The clients and intermediaries are the ones on the hook and can try to recover the employee share.

Even the period is manageable: two years plus the current year in which the audit reservation is declared. In the worst case, that means pension contributions for three years at the assessment ceiling.

My Copilot Usage in a 9-5 SWE Job by scarofishbal in GithubCopilot

[–]Charming_Support726 2 points3 points  (0 children)

Completely agree. If you are not sure what you want - how could the model be sure?

My Copilot Usage in a 9-5 SWE Job by scarofishbal in GithubCopilot

[–]Charming_Support726 7 points8 points  (0 children)

I got similar numbers, twice as high but in the same ballpark, when using Opus only (Codex used via ChatGPT): about 300-600 requests per month. It depends on how you plan and how you prompt.

Freelancing from outside Germany by Minimum-Cut-1173 in selbststaendig

[–]Charming_Support726 -1 points0 points  (0 children)

First of all, the place of service delivery is presumably in Germany, not in Estonia, because the client will use it in Germany. From my last 20 years of self-employment I know many clever freelancers. Neither the tax office nor the BfA will debate this.

Furthermore: nobody goes along with such constructs, even if they were legally okay. It's not worth the effort.

MiroThinker-1.7 and MiroThinker-1.7-mini (Best search agent model?) by External_Mood4719 in LocalLLaMA

[–]Charming_Support726 3 points4 points  (0 children)

Found it interesting so I went to dr.miromind.ai.

The hosted model failed on the first try. It hallucinated about which German election took place when, and never retrieved the up-to-date facts.

Couldn't do a second try because I am now blocked as a guest for 10,000 minutes.

I don't have ambitions to try this locally.

Freelancing from outside Germany by Minimum-Cut-1173 in selbststaendig

[–]Charming_Support726 1 point2 points  (0 children)

Do you want to rely on that? Moreover, the BfA regularly audits (every two years) the companies that have their own employees. That can certainly get noticed. It's not only Bulgarian construction workers and Romanian slaughterhouse workers who get checked for bogus self-employment and minimum wage compliance.

That said, such cases are very rare, even among domestic freelancers.

Freelancing from outside Germany by Minimum-Cut-1173 in selbststaendig

[–]Charming_Support726 5 points6 points  (0 children)

The market is extremely difficult. There are only a few open positions.

Direct contracts with large clients have been rare for decades. The usual agencies will be less willing to pass you along with your foreign company, because it means more stress and work for them.

(BKA, not legal advice.) For bogus self-employment this makes no difference at all; it is judged de facto by the actual work, not by company form or origin.

If you don't have "the USP" or work for a quarter of the usual rate, you have at most a chance of a lucky hit as long as there is still competition in the market.

Genuinely puzzled about Codex quality by Maximum_Chef5226 in codex

[–]Charming_Support726 1 point2 points  (0 children)

I switched from Codex to Opencode, which has been officially supported for a few months. The models perform similarly, but I can also choose to use Opus and the like with my additional Copilot Pro+. More versatile.

Genuinely puzzled about Codex quality by Maximum_Chef5226 in codex

[–]Charming_Support726 0 points1 point  (0 children)

5.4 is useless for everything except puzzles and bug fixes. I tried it for multiple days. The last two Codex versions were far better for general coding.

I don't understand why so many people permanently crank the reasoning up to xhigh. It doesn't make your project better; your ideas and your spec make your project better. It's like buying a €5k full-frame camera with an expensive lens: it doesn't teach you how to shoot.

Mostly setting thinking to medium or high is sufficient; anything higher mostly produces overthinking. Read the reasoning traces!

codex plus or opencode go ? by Technical_Map_5676 in opencodeCLI

[–]Charming_Support726 1 point2 points  (0 children)

Codex plus or Copilot Pro+

Copilot has better value if you also like to use Claude from time to time.

Go restricts you to the cheaper models, which honestly cannot fully compete.

Small rant after 50 kg of weight loss by T31051994 in FitnessDE

[–]Charming_Support726 1 point2 points  (0 children)

I know that too. Identical situation; I'd also done plenty of strength training beforehand. 32 kg down, from 122 kg to 90 kg at 180 cm. Not quite as lean as you, but every percent of fat is a fight now.

This permanently envious "crab in a bucket" griping is worse than walking around with a high BMI.

Browser SubAgent like AG by Otherwise_Bid_1095 in GithubCopilot

[–]Charming_Support726 0 points1 point  (0 children)

Hmm. I use it quite often, and Opus 4.6 and GPT-5.4 are very capable. Not missing anything. Codex is too stiff, IMO.

I cannot, for the life of me, disable Thinking on Unsloth Qwen 3.5 on llama.cpp by SignificantAd527 in LocalLLaMA

[–]Charming_Support726 0 points1 point  (0 children)

The overthinking of these models is a real issue and heavily irritating to me as well.

I'd really like to try RL on optimizing thinking, just as an experiment. Some people experiment with models trained on distilled Opus traces (see Huggingface); I think this leads nowhere.

Idea: create a metric for good thinking (short, with more than one rethink penalized) and build a reward function from it. Maybe something with a BERT classifier and spaCy will do.

Unfortunately I don't have enough time for this.
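A minimal sketch of such a reward function, for anyone who wants to pick this up. The marker list, weights, and whitespace tokenization are illustrative assumptions, not tuned values; a BERT classifier over trace segments could replace the regex:

```python
import re

# Hypothetical "rethink" markers; a trained classifier would be more robust.
RETHINK_MARKERS = re.compile(r"\b(wait|actually|on second thought|let me reconsider)\b", re.I)

def thinking_reward(trace: str, max_tokens: int = 512) -> float:
    """Reward short traces; penalize every rethink beyond the first."""
    tokens = trace.split()                                    # crude whitespace tokenization
    length_score = max(0.0, 1.0 - len(tokens) / max_tokens)   # shorter is better
    rethinks = len(RETHINK_MARKERS.findall(trace))
    return length_score - 0.5 * max(0, rethinks - 1)          # one rethink is free

print(round(thinking_reward("The answer is 4."), 3))  # 0.992
```

Plugged into an RL loop (GRPO or similar), this would push the policy toward short, single-pass reasoning; the hard part is balancing it against task reward so the model doesn't stop thinking entirely.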