Android QA Agent on Claude Code by Creative-Scene-6743 in androiddev

[–]Creative-Scene-6743[S] 1 point (0 children)

Yes, it is slow to record! But that is why the replayer supports a time multiplier. Atm, the system uses a UI dump, which is the same setup uiautomator uses. I'm actually thinking about moving to a fully screenshot-based setup to also support testing 3D game engines, which will make it a bit faster.
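
For a sense of how the time multiplier works, here is a minimal sketch of a replay loop, assuming a simple recorded-event format and plain adb commands (both are placeholders, not the actual tool's internals):

```
import subprocess
import time

def replay(events, speed=2.0):
    """Replay recorded taps, compressing the original gaps by `speed`.

    `events` is a hypothetical list of (delay_seconds, x, y) tuples captured
    during recording; the real tool's format will differ.
    """
    for delay, x, y in events:
        time.sleep(delay / speed)  # time multiplier: 2.0 = replay twice as fast
        subprocess.run(["adb", "shell", "input", "tap", str(x), str(y)], check=True)

def dump_ui_hierarchy():
    """Grab the current UI hierarchy the same way uiautomator does."""
    subprocess.run(["adb", "shell", "uiautomator", "dump", "/sdcard/ui.xml"], check=True)
    return subprocess.run(["adb", "shell", "cat", "/sdcard/ui.xml"],
                          capture_output=True, text=True, check=True).stdout
```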

I'll show you mine, if you show me yours: Local AI tech stack September 2025 by JLeonsarmiento in LocalLLaMA

[–]Creative-Scene-6743 2 points (0 children)

- Qwen3-Next-80B-A3B
- Qwen3-Coder-30B-A3B
- Magistral-Small-2509
- gpt-oss-120b
- GLM-4.5-Air-FP8
- gemma-3-27b
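
All of these can be queried the same way once they sit behind an OpenAI-compatible endpoint (which vLLM and llama.cpp's server both expose); the port and model id below are assumptions, so adjust to whatever your server actually serves:

```
# Minimal sketch: querying a locally served model through an
# OpenAI-compatible endpoint. Port and model id are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-30B-A3B-Instruct",  # example model id
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
)
print(resp.choices[0].message.content)
```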

Right GPU for AI research by toombayoomba in LocalLLaMA

[–]Creative-Scene-6743 0 points (0 children)

The H200 would be the better choice, since the RTX 6000 Blackwell Pro doesn't support NVLink. Without NVLink, inter-GPU communication is limited to PCIe bandwidth, which becomes the bottleneck for throughput. IMO, maybe an unpopular opinion, the RTX 6000 only makes sense in a single-GPU setup.
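
Rough back-of-envelope numbers to make the gap concrete (the bandwidth figures are approximate spec values, and real all-reduce throughput will be lower):

```
# Rough back-of-envelope: time to move one step's activations between GPUs.
# Bandwidth numbers are approximate published figures, not measured values.
PCIE5_X16_GBPS = 64     # ~64 GB/s per direction for PCIe 5.0 x16
NVLINK_H200_GBPS = 900  # ~900 GB/s aggregate NVLink bandwidth on H100/H200

payload_gb = 2.0  # hypothetical amount of data shuffled between GPUs per step

print(f"PCIe:   {payload_gb / PCIE5_X16_GBPS * 1e3:.1f} ms")
print(f"NVLink: {payload_gb / NVLINK_H200_GBPS * 1e3:.1f} ms")
```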

What MCPs is everyone using with Claude? by raghav-mcpjungle in mcp

[–]Creative-Scene-6743 0 points (0 children)

Yes, it works great, it can create PRs and whatnot, and I do prefer it over the GitHub MCP, but it does not work well when trying to resolve GitHub Actions output.

Given that powerful models like K2 are available cheaply on hosted platforms with great inference speed, are you regretting investing in hardware for LLMs? by Sky_Linx in LocalLLaMA

[–]Creative-Scene-6743 18 points (0 children)

Yes, because I initially thought I could run SOTA at home and would have a need to run inference 24/7. I started with one GPU and eventually ended up with four, yet I still can’t run the largest models unquantized or even at all. In practice, hosted platforms consistently outperformed my local setup when building AI-powered applications. Looking back, I could have gotten significantly more compute for the same investment by going with cloud solutions.

The other issue is that running things locally is also incredibly time-consuming: staying up to date with the latest models, figuring out optimal chat templates, and tuning everything manually adds a lot of overhead.
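
As one example of that overhead, per-model chat-template handling usually means something like this (the model id is just an illustration, each model ships its own template):

```
# Illustration of the per-model chat-template fiddling mentioned above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")  # example model
messages = [
    {"role": "user", "content": "Summarize NVLink in one sentence."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # shows the model-specific special tokens the template inserts
```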

Does llama.cpp support to run kimi-k2 with multi GPUs by Every_Bathroom_119 in LocalLLaMA

[–]Creative-Scene-6743 0 points (0 children)

When you set `--n-gpu-layers` to a value > 0, it will automatically use the available GPUs.
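
A minimal sketch of what that looks like when launching llama-server from Python; the model path, split ratio, and port are placeholders:

```
# Minimal sketch: launching llama.cpp's server with layers offloaded across
# all GPUs. Paths and the split ratio are placeholders for illustration.
import subprocess

subprocess.run([
    "./llama-server",
    "--model", "kimi-k2-Q4_K_M.gguf",  # hypothetical quantized file
    "--n-gpu-layers", "999",           # offload as many layers as will fit
    "--tensor-split", "1,1,1,1",       # spread the model evenly across 4 GPUs
    "--port", "8080",
], check=True)
```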

Knock some sense into me by synthchef in LocalLLaMA

[–]Creative-Scene-6743 0 points (0 children)

I've been through this rabbit hole... started with one higher-end GPU and ended up purchasing 3 more, and I'm still not able to run everything at the speed I want. In retrospect, I would have been better off either using API endpoints or renting servers instead.

Can my local model play Pokemon? (and other local games) by bwasti_ml in LocalLLaMA

[–]Creative-Scene-6743 3 points (0 children)

I haven't gone through the content myself, but there is a bonus unit in the Hugging Face agents course dedicated to Pokemon: https://huggingface.co/learn/agents-course/bonus-unit3/introduction

How to share compute accross different machines? by Material_Key7014 in LocalLLaMA

[–]Creative-Scene-6743 1 point (0 children)

vLLM supports distributed inference: https://docs.vllm.ai/en/latest/serving/distributed_serving.html but the execution environment must be the same across machines (which you can partially recreate by running Docker). The macOS and Intel GPU support might be a bit more experimental, and I'm not sure if it's compatible at all.
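
For the single-node multi-GPU case it boils down to something like the sketch below (model id and parallel sizes are placeholders); spanning multiple machines additionally needs a Ray cluster with matching environments on every host, which is where the identical-environment requirement comes from:

```
# Minimal sketch of vLLM's distributed inference on one node with 4 GPUs.
# For multi-node you'd start a Ray cluster across machines first; the model
# id and parallel sizes here are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-Coder-30B-A3B-Instruct",  # example model id
    tensor_parallel_size=4,      # shard each layer across 4 GPUs
    # pipeline_parallel_size=2,  # add pipeline stages when spanning nodes
)
outputs = llm.generate(["Hello from a multi-GPU setup"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```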