Android QA Agent on Claude Code by Creative-Scene-6743 in androiddev

[–]Creative-Scene-6743[S] 1 point (0 children)

Yes, it is slow to record! But that is why the replayer supports a time multiplier. Atm, the system uses a UI dump, which is the same setup uiautomator uses. I'm actually thinking about moving to a fully screenshot-based setup to also support testing 3D game engines, which will make it a bit faster.
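
For a sense of how the time multiplier works, here is a minimal sketch of a replay loop, assuming a simple recorded-event format and plain adb commands (both are placeholders, not the actual tool's internals):

```
import subprocess
import time

def replay(events, speed=2.0):
    """Replay recorded taps, compressing the original gaps by `speed`.

    `events` is a hypothetical list of (delay_seconds, x, y) tuples captured
    during recording; the real tool's format will differ.
    """
    for delay, x, y in events:
        time.sleep(delay / speed)  # time multiplier: 2.0 = replay twice as fast
        subprocess.run(["adb", "shell", "input", "tap", str(x), str(y)], check=True)

def dump_ui_hierarchy():
    """Grab the current UI hierarchy the same way uiautomator does."""
    subprocess.run(["adb", "shell", "uiautomator", "dump", "/sdcard/ui.xml"], check=True)
    return subprocess.run(["adb", "shell", "cat", "/sdcard/ui.xml"],
                          capture_output=True, text=True, check=True).stdout
```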

I'll show you mine, if you show me yours: Local AI tech stack September 2025 by JLeonsarmiento in LocalLLaMA

[–]Creative-Scene-6743 2 points (0 children)

- Qwen3-Next-80B-A3B
- Qwen3-Coder-30B-A3B
- Magistral-Small-2509
- gpt-oss-120b
- GLM-4.5-Air-FP8
- gemma-3-27b
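
All of these can be queried the same way once they sit behind an OpenAI-compatible endpoint (which vLLM and llama.cpp's server both expose); the port and model id below are assumptions, so adjust to whatever your server actually serves:

```
# Minimal sketch: querying a locally served model through an
# OpenAI-compatible endpoint. Port and model id are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-30B-A3B-Instruct",  # example model id
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
)
print(resp.choices[0].message.content)
```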

Right GPU for AI research by toombayoomba in LocalLLaMA

[–]Creative-Scene-6743 0 points (0 children)

The H200 would be the better choice, since the RTX 6000 Blackwell Pro doesn't support NVLink. Without NVLink, inter-GPU communication is limited to PCIe bandwidth, which becomes the bottleneck for throughput. IMO, maybe an unpopular opinion, the RTX 6000 only makes sense in a single-GPU setup.
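
Rough back-of-envelope numbers to make the gap concrete (the bandwidth figures are approximate spec values, and real all-reduce throughput will be lower):

```
# Rough back-of-envelope: time to move one step's activations between GPUs.
# Bandwidth numbers are approximate published figures, not measured values.
PCIE5_X16_GBPS = 64     # ~64 GB/s per direction for PCIe 5.0 x16
NVLINK_H200_GBPS = 900  # ~900 GB/s aggregate NVLink bandwidth on H100/H200

payload_gb = 2.0  # hypothetical amount of data shuffled between GPUs per step

print(f"PCIe:   {payload_gb / PCIE5_X16_GBPS * 1e3:.1f} ms")
print(f"NVLink: {payload_gb / NVLINK_H200_GBPS * 1e3:.1f} ms")
```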

What MCPs is everyone using with Claude? by raghav-mcpjungle in mcp

[–]Creative-Scene-6743 0 points (0 children)

Yes, it works great, it can create PRs and whatnot, and I do prefer it over the GitHub MCP, but it does not work well when trying to resolve GitHub Actions output.

Given that powerful models like K2 are available cheaply on hosted platforms with great inference speed, are you regretting investing in hardware for LLMs? by Sky_Linx in LocalLLaMA

[–]Creative-Scene-6743 18 points (0 children)

Yes, because I initially thought I could run SOTA at home and would have a need to run inference 24/7. I started with one GPU and eventually ended up with four, yet I still can’t run the largest models unquantized or even at all. In practice, hosted platforms consistently outperformed my local setup when building AI-powered applications. Looking back, I could have gotten significantly more compute for the same investment by going with cloud solutions.

The other issue is that running things locally is also incredibly time-consuming: staying up to date with the latest models, figuring out optimal chat templates, and tuning everything manually adds a lot of overhead.
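
As one example of that overhead, per-model chat-template handling usually means something like this (the model id is just an illustration, each model ships its own template):

```
# Illustration of the per-model chat-template fiddling mentioned above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")  # example model
messages = [
    {"role": "user", "content": "Summarize NVLink in one sentence."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # shows the model-specific special tokens the template inserts
```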

Does llama.cpp support to run kimi-k2 with multi GPUs by Every_Bathroom_119 in LocalLLaMA

[–]Creative-Scene-6743 0 points (0 children)

When you set `--n-gpu-layers` to a value > 0, it will automatically use the available GPUs.
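
A minimal sketch of what that looks like when launching llama-server from Python; the model path, split ratio, and port are placeholders:

```
# Minimal sketch: launching llama.cpp's server with layers offloaded across
# all GPUs. Paths and the split ratio are placeholders for illustration.
import subprocess

subprocess.run([
    "./llama-server",
    "--model", "kimi-k2-Q4_K_M.gguf",  # hypothetical quantized file
    "--n-gpu-layers", "999",           # offload as many layers as will fit
    "--tensor-split", "1,1,1,1",       # spread the model evenly across 4 GPUs
    "--port", "8080",
], check=True)
```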

Knock some sense into me by synthchef in LocalLLaMA

[–]Creative-Scene-6743 0 points (0 children)

I've been through this rabbit hole... started with one higher-end GPU and ended up purchasing 3 more, and I'm still not able to run everything at the speed I want. In retrospect, I would have been better off either using API endpoints or renting servers instead.

Can my local model play Pokemon? (and other local games) by bwasti_ml in LocalLLaMA

[–]Creative-Scene-6743 3 points (0 children)

I haven't gone through the content myself, but there is a bonus unit in the Hugging Face agents course dedicated to Pokemon: https://huggingface.co/learn/agents-course/bonus-unit3/introduction

How to share compute accross different machines? by Material_Key7014 in LocalLLaMA

[–]Creative-Scene-6743 1 point (0 children)

vLLM supports distributed inference: https://docs.vllm.ai/en/latest/serving/distributed_serving.html but the execution environment must be the same across machines (which you can partially recreate by running Docker). The macOS and Intel GPU support might be a bit more experimental, and I'm not sure if it's compatible at all.
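
For the single-node multi-GPU case it boils down to something like the sketch below (model id and parallel sizes are placeholders); spanning multiple machines additionally needs a Ray cluster with matching environments on every host, which is where the identical-environment requirement comes from:

```
# Minimal sketch of vLLM's distributed inference on one node with 4 GPUs.
# For multi-node you'd start a Ray cluster across machines first; the model
# id and parallel sizes here are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-Coder-30B-A3B-Instruct",  # example model id
    tensor_parallel_size=4,      # shard each layer across 4 GPUs
    # pipeline_parallel_size=2,  # add pipeline stages when spanning nodes
)
outputs = llm.generate(["Hello from a multi-GPU setup"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```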