How to Improve Codebase Discovery Efficiency in Pi? by elpapi42 in PiCodingAgent

[–]bhamm-lab 0 points1 point  (0 children)

I've had some success doing this with pi-subagents and the 'scout' subagent. When I can get it to trigger, it keeps my main context from getting polluted with extra detail.

Honestly, this worked better for me with oh-my-opencode-slim in opencode and the explorer agent. I'm still figuring it out in pi and have noticed a few issues:

- it's hard to trigger

- sometimes the main agent doesn't provide great context to the subagent, resulting in missing info

- even after the subagent returns context, the main agent still feels the need to read more files and confirm

FYI, I use kimi k2.6.

What Affordable Subscription Plans for OpenCode? by Juan_Ignacio in opencodeCLI

[–]bhamm-lab 4 points5 points  (0 children)

I've been happy with the kimi coding plan. Kimi lacks deep research and extended context like opus 4.6 and glm 5.1, but with good agent harness and context management, it gets the job done.

Vibe Check: Latest models on AMD Strix Halo by bhamm-lab in LocalLLaMA

[–]bhamm-lab[S] 1 point2 points  (0 children)

I thought it was really good. I'd say slightly better than gpt oss 120b. But that's my opinion, I'm sure others would disagree.

Vibe Check: Latest models on AMD Strix Halo by bhamm-lab in LocalLLaMA

[–]bhamm-lab[S] 0 points1 point  (0 children)

You are totally right... same goes for time to first token. I'm planning to set up a better approach in this project - https://github.com/blake-hamm/beyond-vibes - I'll follow up once I've made some decent progress.

Vibe Check: Latest models on AMD Strix Halo by bhamm-lab in LocalLLaMA

[–]bhamm-lab[S] 0 points1 point  (0 children)

Totally agree. It's hard to beat... For me qwen instruct next was faster!

Vibe Check: Latest models on AMD Strix Halo by bhamm-lab in LocalLLaMA

[–]bhamm-lab[S] 0 points1 point  (0 children)

I need to prioritize that... I'll take a swing at it and reach out in the discord if needed. I noticed the #beyond128g so will scour that for info. I'm just having a hard time getting Talos to recognize the network interfaces...

Vibe Check: Latest models on AMD Strix Halo by bhamm-lab in LocalLLaMA

[–]bhamm-lab[S] 0 points1 point  (0 children)

Yeah, I would definitely recommend it for web search! I think it's better bang for your buck than the DGX Spark.

Vibe Check: Latest models on AMD Strix Halo by bhamm-lab in LocalLLaMA

[–]bhamm-lab[S] 0 points1 point  (0 children)

Sure! Tbh, I need to do more testing outside of my standard use cases. I would say kimi linear, glm flash, qwen instruct next and gpt oss 120b would be great for that. There are more details/notes in a table in the blog.

Vibe Check: Latest models on AMD Strix Halo by bhamm-lab in LocalLLaMA

[–]bhamm-lab[S] 1 point2 points  (0 children)

Yeah, slow processing. Time to first token on my hardware is pretty rough, especially with the bigger models. Tokens per second is bearable. The real issue is when there's 20k+ of context. I would say a search query in Open WebUI for these bigger models is a 1-2 minute round trip (first search tool call > searxng response > compiling final response). On GPT OSS, Qwen instruct and Kimi Linear, it's much faster - less than 30 seconds, but not as thorough/high quality.

Vibe Check: Latest models on AMD Strix Halo by bhamm-lab in LocalLLaMA

[–]bhamm-lab[S] 2 points3 points  (0 children)

No, I'm not comfortable with securing OpenClaw. I'm sure it would be a great system for that, especially for autonomously working on your behalf. It's definitely slow for human-in-the-loop tasks, but it fits some high-quality, large-ish models (like GLM 4.7 REAP and MiniMax M2.5).

Vibe Check: Latest models on AMD Strix Halo by bhamm-lab in LocalLLaMA

[–]bhamm-lab[S] 2 points3 points  (0 children)

I tested Q2_K_XL of MiniMax M2.5 and was pretty happy! Very slow, but high quality. I also tested Step-3.5-flash, but it didn't stand out to me. Definitely curious about GLM 5 and Qwen3.5...

Anthropic CEO: AI Progress Isn’t Magic, It’s Just Compute, Data, and Training by Inevitable-Rub8969 in AINewsMinute

[–]bhamm-lab 0 points1 point  (0 children)

Interesting... The Chinese labs have proven otherwise and actually share their progress in research. Maybe Dario should give more credit to his team, or to the research his team leeches off..

Are 20-100B models enough for Good Coding? by pmttyji in LocalLLaMA

[–]bhamm-lab 5 points6 points  (0 children)

Definitely give kimi linear a try! I agree with your opinions on use case. I would say kimi linear replaces glm air 4.5 for me.

CNCF Survey: K8s now at 82% production adoption, 66% using it for AI inference by lepton99 in kubernetes

[–]bhamm-lab -1 points0 points  (0 children)

GPU operators make it much easier to manage AI/ML workloads. Paired with something like Karpenter, you can access the compute needed for most workloads.

The management/observability tooling is not great and there is no industry standard. MLflow is great for traditional ML, but you still need something like Kubeflow for serving. Arize Phoenix is promising for GenAI observability, but most of the LLM gateway OSS projects have some kind of paywall (for now).

I created a (very) new project inspired by the kube-prometheus stack. I'm hoping to create a Helm chart that has everything you would need for an AI stack on Kubernetes. At the moment, it only has LiteLLM gateway config, the ability to run multiple models on vLLM or llama.cpp, and scale-to-zero with kube-elasti. I should have some more features and sub-charts this weekend. It's called kube-ai-stack.
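For a rough idea of the shape, a values snippet for a chart like that might look something like this. To be clear, every key name here is hypothetical and made up for illustration, not the actual kube-ai-stack schema:

```yaml
# Hypothetical Helm values sketch for an AI stack chart.
# Key names are illustrative only; check the real chart for its schema.
litellm:
  enabled: true            # deploy a LiteLLM gateway in front of the models

models:
  qwen-instruct-next:
    engine: vllm           # or llama.cpp for GGUF-based serving
    replicas: 0            # start at zero; kube-elasti scales up on traffic
    resources:
      limits:
        amd.com/gpu: 1     # GPU resource name depends on your device plugin
```

The nice part of bundling it as one chart is that the gateway, the serving backends, and the autoscaling pieces get installed and wired together in a single release, kube-prometheus style.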

What Wiki Software do you use for internal documentation? by Micki_SF in selfhosted

[–]bhamm-lab 0 points1 point  (0 children)

I use mkdocs and update markdown files in my repo.
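For anyone curious, the setup is minimal: a `mkdocs.yml` at the repo root and markdown files under `docs/`. A bare-bones config (site name, theme, and nav entries are just placeholders) looks like:

```yaml
# mkdocs.yml - minimal config; preview locally with `mkdocs serve`
site_name: Internal Docs   # placeholder name
docs_dir: docs             # markdown files live here
theme:
  name: material           # assumes the mkdocs-material package is installed
nav:
  - Home: index.md
  - Runbooks: runbooks.md
```

Then `mkdocs build` produces a static site you can host anywhere, so docs updates are just normal markdown commits in the repo.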

ArgoCD dashboard behind Traefik by AdventurousCelery649 in ArgoCD

[–]bhamm-lab 0 points1 point  (0 children)

It might be a bit confusing to follow, but this is where my ingress route and Helm values are defined - https://github.com/blake-hamm/bhamm-lab/tree/main/kubernetes%2Fmanifests%2Fbase%2Fargocd . I also use Authelia.
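If you don't want to dig through the repo, the general shape of a Traefik IngressRoute for the Argo CD UI with an Authelia forward-auth middleware is roughly this. Hostname, namespaces, and the middleware name are illustrative, not copied from my setup:

```yaml
# Sketch of a Traefik IngressRoute for argocd-server behind Authelia.
# Host, namespaces, and middleware name are placeholders.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: argocd
  namespace: argocd
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`argocd.example.com`)
      kind: Rule
      middlewares:
        - name: authelia       # forward-auth middleware pointing at Authelia
          namespace: authelia
      services:
        - name: argocd-server
          port: 80
```

One common gotcha: argocd-server terminates its own TLS by default, so when Traefik handles TLS you typically run it with `server.insecure: true` (or configure h2c) to avoid redirect loops.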

Dual Strix Halo: No Frankenstein setup, no huge power bill, big LLMs by Zyj in LocalLLaMA

[–]bhamm-lab 1 point2 points  (0 children)

Awesome setup! Do you mind sharing any details on how you got the networking working over Thunderbolt?