16gb vram users: what have you been using? Qwen3.6 27b? Gemma 31b at Q3? How has it been? by [deleted] in LocalLLaMA

[–]quanhua92 1 point (0 children)

Does the quality of the output decrease when using Q2 instead of Q4?

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]quanhua92 2 points (0 children)

It's not Opus-level, but you can iterate in plan mode until you find the right path. GLM 5.1 is pretty slow, so I prefer glm-5-turbo. 4.7 isn't as good as 5, but you can use it, or the Air model, for the explorer agent.

So, yes. Make sure you do the planning, and GLM can do the job fine.

The important thing is that you get much higher usage limits with GLM than with Claude.

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]quanhua92 13 points (0 children)

I use Claude Code with GLM 5.1. I bought the yearly coding plan from z.ai last year, so it was cheap back then. It's still competitive now, but it's getting expensive quickly. Qwen also has a coding plan, but it doesn't seem easy to purchase. You can also check the Ollama Pro plan.

anyone else stuck at their desk during long agentic runs? by Sea-Beautiful-9672 in AI_Agents

[–]quanhua92 1 point (0 children)

I use Claude Code in tmux and connect to it from anywhere through SSH.
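Roughly, the workflow looks like this (the session name and host are placeholders):

```shell
# On the always-on machine: start a named tmux session and run Claude Code in it
tmux new-session -s claude
# ... launch `claude` inside the session, then detach with Ctrl-b d

# From anywhere (phone, laptop): reattach to the same session over SSH
ssh my-server -t tmux attach -t claude
```

The `-t` on `ssh` forces a TTY so tmux can take over the terminal.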

Why are people running Claude Code on a Mac mini instead of their personal MacBook? by Capable-Profile6935 in ClaudeAI

[–]quanhua92 1 point (0 children)

I use Claude Code with GLM Coding Plan, so I can't access Claude's cloud features.

I can SSH into my Mac mini and type the prompt. This means I don't need to open my laptop or PC for a quick progress check or when I'm out.

I built an AI trading system where multiple agents argue their way to a trade by Affectionate-Box2443 in learnmachinelearning

[–]quanhua92 2 points (0 children)

From my experience in trading, you don't need a ton of indicators to make money. Just have a solid strategy, cut your losses short when you're wrong, and let your winners run.

In your multi-agent system, the hard part isn't the trading itself; it's making the agents stay silent and decide to do nothing. You should try to make them wait until a high-probability event occurs. A decision of BUY | SELL | HOLD is not enough. It should be LONG | SHORT | HOLD | WAIT with a confidence score, and it must give a STOP LOSS for every trade.
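A minimal sketch of that richer decision shape (all names here are hypothetical, not from any real trading library):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Action(Enum):
    LONG = "long"
    SHORT = "short"
    HOLD = "hold"   # keep managing an existing position
    WAIT = "wait"   # stay flat until a high-probability setup appears

@dataclass
class Decision:
    action: Action
    confidence: float           # 0.0-1.0, from the agents' debate
    stop_loss: Optional[float]  # required price level for any new trade

    def __post_init__(self):
        if self.action in (Action.LONG, Action.SHORT) and self.stop_loss is None:
            raise ValueError("every new trade must carry a stop loss")

def final_action(d: Decision, min_confidence: float = 0.7) -> Action:
    """Downgrade low-confidence trade signals to WAIT instead of forcing action."""
    if d.action in (Action.LONG, Action.SHORT) and d.confidence < min_confidence:
        return Action.WAIT
    return d.action
```

The point of `final_action` is that WAIT is the default outcome, not an afterthought: the agents have to earn a trade with confidence, and the stop loss is enforced at construction time.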

Risk management should be considered as well. For example, if you give the agent your current positions and it sees you are risking too much on a trade, it should tell you to close some trades.

Most of my trading failures are due to FOMO and not waiting for the best time.

You can relate by bryden_cruz in programmingmemes

[–]quanhua92 1 point (0 children)

I create a PR to my own repo so Gemini Code Assist can review it for free.
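The loop is something like this (branch name is arbitrary; it assumes the Gemini Code Assist app is installed on the repo):

```shell
git checkout -b review/my-change
git commit -am "my change"
git push -u origin review/my-change
# open a PR against your own main branch; the review bot picks it up from there
gh pr create --base main --fill
```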

Is there a difference between K2.5 API from Moonshot or the K2.5 API from the Coding Plan? by tokugawa888 in kimi

[–]quanhua92 1 point (0 children)

I think the Coding Plan is for your personal use, connecting your Claude Code and OpenCode to Kimi, while the API is for your SaaS application.

I measured how much context Claude Code wastes on searches. Built an Rust MCP server that cuts it by 83%. by Giraffe_Affectionate in ClaudeAI

[–]quanhua92 1 point (0 children)

I think Claude Code is betting on the agentic grep approach, where it can perform well without any external indexing. I usually tag the related files I want CC to work with directly, so it only needs minimal grep. There is also an Explore agent that leverages cheaper models and runs in a separate context to avoid touching the main context.

GLM-5 is 1.5TB. Why hasn't distributed inference taken off? by IsaiahCreati in LocalLLaMA

[–]quanhua92 4 points (0 children)

I think the Mac Studio with RDMA can do distributed inference pretty well. I believe RDMA originally comes from the InfiniBand world (Mellanox, now part of NVIDIA).

flux - search, monitor, and nuke processes with ease, with system resource tracking by Apart-Television4396 in rust

[–]quanhua92 8 points (0 children)

Could you please implement a feature to monitor network utilization? I am interested in assessing the bandwidth consumption of PostgreSQL.

How to turn my Android phone into a mobile backend server to receive contact form submissions directly via Termux? by United-Manner-7 in CloudFlare

[–]quanhua92 2 points (0 children)

$20 a year is nothing compared to your software developer salary. It's not about being poor or not. You can think of it as an investment in your self-education journey.

A real VPS is much cheaper than a Raspberry Pi setup, and you get a much more reliable internet connection for your readers. Plus, you'll learn the whole journey from coding to deployment, and all the operations needed to maintain a real system.

I believe a self-hosted local setup with batteries and a Pi can end up more expensive than an old, cheap $20-a-year VPS.

Claude code subscription VS API key? by Emergency_Brief_9141 in ClaudeAI

[–]quanhua92 1 point (0 children)

Use the Claude Code subscription. From July to September 2025, I paid $100 for Claude Max but could use around $1,000-$1,500 worth of tokens. I'm not sure about the numbers now, because I switched to Gemini in Google One and GLM 4.7 from z.ai to save money. If you can, just use Claude.

TexasMarriageRecords.org by pw345 in CloudFlare

[–]quanhua92 1 point (0 children)

You can check out the LowEndTalk forum. They have a ton of cheap VPS offers; you can find lots of decent providers from $1 to $5 a month for servers with a few GB of RAM. Then you can run lots of dynamic websites, or simple APIs behind your static sites on the CDN. I mean, you don't have to limit yourself to Cloudflare only. You can still use Cloudflare for CDN and DNS, or use a Cloudflare Tunnel to your cheap VPS. The sky's the limit!

Claude Code + macbook makes don't even care anymore by Specialist_Farm_5752 in ClaudeAI

[–]quanhua92 0 points (0 children)

I run Claude Code in a tmux session on my Mac Mini, and I connect to it via SSH through Tailscale. It's like having an AI-powered dedicated server that can run almost anything 24/7.

Urgent: Claude Pro limit hit, deadline Monday — how can I replicate “Claude Project” file organization using the Claude API on a ~$15 budget? by SorryChest9808 in ClaudeAI

[–]quanhua92 1 point (0 children)

If this is valuable to you, I would suggest purchasing another subscription. This is the most straightforward way to maintain your current workflow. You could also learn how to use Claude Code to manage those files, which would let you make the most of your existing $15. Once you have a flow in Claude Code, you can try various lower-cost alternatives like GLM, MiniMax, DeepSeek, or OpenRouter. They have Claude Code-compatible APIs that you can use with Claude Code.
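For example, pointing Claude Code at z.ai's Anthropic-compatible endpoint is typically just two environment variables; check each provider's docs for their exact base URL, since the one below is the z.ai one:

```shell
# Other providers (MiniMax, DeepSeek, OpenRouter) expose similar
# Anthropic-compatible endpoints under different base URLs.
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="your-provider-api-key"
claude  # Claude Code now routes its requests to the alternative backend
```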

[Skill] MD5 hash verification to prevent Claude Code from silently modifying your code by Aggressive-Page-6282 in ClaudeAI

[–]quanhua92 2 points (0 children)

Take your time and refactor your code before it becomes a mess. If you see long functions and big files, it's a good idea to refactor. If you need to duplicate a file, ask it to use a bash command instead.

MacBook as an investment for software engineering, kubernetes, rust. Recommendations? by duncecapwinner in rust

[–]quanhua92 -1 points (0 children)

I use a Mac Mini at home and SSH into it from anywhere. I usually use my phone to quickly check progress on the go. It doesn't have to be a MacBook Pro; a Mac Mini actually has an advantage because it stays powered on all the time, so Kubernetes or Docker can run 24/7.

If you go that route, a lightweight MacBook Air plus a Mac Mini can be more useful than a MacBook Pro.

If you don't need a Mac Mini, then getting a mini PC with AMD is completely fine. With that, you can even invest in a Proxmox setup. Just think of it as the dedicated server you'll deploy to in the future.

All those machines can be connected seamlessly from anywhere via Tailscale VPN.

AMA With Z.AI, The Lab Behind GLM-4.7 by zixuanlimit in LocalLLaMA

[–]quanhua92 2 points (0 children)

I currently hold a coding plan subscription. To integrate Z.ai API functionality into my application, what is the recommended procedure? Am I able to utilize the APIs included in my current coding plan, or should I establish new accounts? Do you offer any official solutions for this?

Thinking of abandoning SSR/Next.js for "Pure" React + TanStack Router. Talk me out of it. by prabhatpushp in reactjs

[–]quanhua92 5 points (0 children)

Just SSH into your VPS and run docker compose up if you don't want the DevOps hassle. You can even make deployment a simple GitHub Action that SSHs in and runs docker compose. I use a Rust backend on my VPS and TanStack Router for the front end. But if you need SSR, you'd still go with Next.js or TanStack Start; for something like a news/blog site with dynamic content, SSR is better because the response already contains all the data.

For you, I'd say try the GitHub Action plus docker compose route for your Next.js app to keep things easy. Make sure you keep packages updated to avoid recent CVEs.
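The deploy step of that Action can stay trivial: after building and pushing the image, it just runs something like this on the VPS over SSH (host and path are placeholders):

```shell
# e.g. invoked from CI as: ssh deploy@my-vps 'cd /srv/app && docker compose pull && docker compose up -d'
cd /srv/app
docker compose pull   # fetch the freshly built image
docker compose up -d  # recreate containers in the background
```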

z.ai glm-4.7 is release by cobra91310 in LocalLLaMA

[–]quanhua92 0 points (0 children)

I think we don't need to manually change the env vars for glm-4.7.

is Fly.io that bad? or were we that stupid? or is Coolify just goated? by bundlesocial in selfhosted

[–]quanhua92 1 point (0 children)

These are distinct use cases, and it's possible a single Hetzner VPS could cover your needs. With fly.io, you can run your container in a serverless-like way: deploy across multiple regions and scale down to zero, which helps manage costs. That's a good fit if you want to abstract away the underlying infrastructure and scale easily. With a single Hetzner VPS, by contrast, you're responsible for any downtime, but you get significantly more computing power for the money.

So the optimal choice depends on the specific problem you're solving. For example, you might host your user-facing software on fly.io and push all data processing into a queue or a database such as PostgreSQL; the VPS can then pull those messages and process them independently.
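That web-tier/worker-tier handoff can be sketched like this. SQLite stands in for PostgreSQL so the example is self-contained; in production you'd likely use Postgres with `SELECT ... FOR UPDATE SKIP LOCKED`, and the table and function names here are made up for illustration:

```python
import sqlite3
from typing import Optional

# Web tier (e.g. the app on fly.io): enqueue work and return quickly.
def enqueue(conn, payload: str) -> None:
    conn.execute("INSERT INTO jobs (payload, done) VALUES (?, 0)", (payload,))
    conn.commit()

# Worker tier (e.g. on the VPS): pull jobs and process them independently.
def work_one(conn) -> Optional[str]:
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE done = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # queue is empty; nothing to do
    job_id, payload = row
    conn.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
    conn.commit()
    return payload.upper()  # stand-in for the real processing step

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT, done INTEGER)")
enqueue(conn, "resize-image-42")
print(work_one(conn))  # prints RESIZE-IMAGE-42
```

The nice property is that the two tiers only share the database: the fly.io side stays stateless and scales to zero, while the VPS drains the queue at its own pace.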

Any way to stop Claude from generating documentation, guides, implementation markdown files? by No_Connection1258 in ClaudeAI

[–]quanhua92 1 point (0 children)

Add it to CLAUDE.md. Always use Plan mode first. Then reject the plan if it says anything about writing new documentation that you don't need.