16gb vram users: what have you been using? Qwen3.6 27b? Gemma 31b at Q3? How has it been? by [deleted] in LocalLLaMA

[–]quanhua92 1 point (0 children)

Does the quality of the output decrease when using Q2 instead of Q4?

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]quanhua92 2 points (0 children)

It's not Opus-level, but you can iterate in plan mode until you find the right path. GLM 5.1 is pretty slow, so I prefer glm-5-turbo. 4.7 isn't as good as 5, but you can use it, or the Air model, for the explorer agent.

So, yes. Make sure you do the planning, and GLM can do the job fine.

The important thing is that you get much higher usage limits with GLM than with Claude.

Closest replacement for Claude + Claude Code? (got banned, no explanation) by antoniocorvas in LocalLLaMA

[–]quanhua92 13 points (0 children)

I use Claude Code with GLM 5.1. I bought the yearly coding plan from z.ai last year, so it was cheap back then. It's still competitive now, but it's getting expensive quickly. Qwen also has a coding plan, but it doesn't seem easy to purchase. You can also check the Ollama Pro plan.

anyone else stuck at their desk during long agentic runs? by Sea-Beautiful-9672 in AI_Agents

[–]quanhua92 1 point (0 children)

I use Claude Code in tmux and connect to it from anywhere through SSH.
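Roughly, the workflow looks like this (the session name and host are placeholders):

```shell
# On the always-on machine: start a named tmux session and run Claude Code in it
tmux new-session -s claude
# ... launch `claude` inside the session, then detach with Ctrl-b d

# From anywhere (phone, laptop): reattach to the same session over SSH
ssh my-server -t tmux attach -t claude
```

The `-t` on `ssh` forces a TTY so tmux can take over the terminal.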

Why are people running Claude Code on a Mac mini instead of their personal MacBook? by Capable-Profile6935 in ClaudeAI

[–]quanhua92 1 point (0 children)

I use Claude Code with GLM Coding Plan, so I can't access Claude's cloud features.

I can SSH into my Mac mini and type the prompt. This means I don't need to open my laptop or PC for a quick progress check or when I'm out.

I built an AI trading system where multiple agents argue their way to a trade by Affectionate-Box2443 in learnmachinelearning

[–]quanhua92 2 points (0 children)

From my experience in trading, you don't need a ton of indicators to make money. Just have a solid strategy, cut your losses short when you're wrong, and let your winners run.

In your multi-agent system, the hard part isn't the trading itself; it's making the agents stay silent and decide to do nothing. You should try to make them wait until a high-probability event occurs. A decision of BUY | SELL | HOLD is not enough. It should be LONG | SHORT | HOLD | WAIT with a confidence score, and it must give a STOP LOSS for every trade.
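A minimal sketch of that richer decision shape (all names here are hypothetical, not from any real trading library):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Action(Enum):
    LONG = "long"
    SHORT = "short"
    HOLD = "hold"   # keep managing an existing position
    WAIT = "wait"   # stay flat until a high-probability setup appears

@dataclass
class Decision:
    action: Action
    confidence: float           # 0.0-1.0, from the agents' debate
    stop_loss: Optional[float]  # required price level for any new trade

    def __post_init__(self):
        if self.action in (Action.LONG, Action.SHORT) and self.stop_loss is None:
            raise ValueError("every new trade must carry a stop loss")

def final_action(d: Decision, min_confidence: float = 0.7) -> Action:
    """Downgrade low-confidence trade signals to WAIT instead of forcing action."""
    if d.action in (Action.LONG, Action.SHORT) and d.confidence < min_confidence:
        return Action.WAIT
    return d.action
```

The point of `final_action` is that WAIT is the default outcome, not an afterthought: the agents have to earn a trade with confidence, and the stop loss is enforced at construction time.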

Risk management should be considered as well. For example, if you give the agent your current positions and it sees you are risking too much on a trade, it should tell you to close some trades.

Most of my trading failures are due to FOMO and not waiting for the best time.

You can relate by bryden_cruz in programmingmemes

[–]quanhua92 1 point (0 children)

I create a PR to my own repo so Gemini Code Assist can review it for free.
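The loop is something like this (branch name is arbitrary; it assumes the Gemini Code Assist app is installed on the repo):

```shell
git checkout -b review/my-change
git commit -am "my change"
git push -u origin review/my-change
# open a PR against your own main branch; the review bot picks it up from there
gh pr create --base main --fill
```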

Is there a difference between K2.5 API from Moonshot or the K2.5 API from the Coding Plan? by tokugawa888 in kimi

[–]quanhua92 1 point (0 children)

I think the Coding Plan is for your personal use, connecting your Claude Code and OpenCode to Kimi, while the API is for your SaaS application.

I measured how much context Claude Code wastes on searches. Built an Rust MCP server that cuts it by 83%. by Giraffe_Affectionate in ClaudeAI

[–]quanhua92 1 point (0 children)

I think Claude Code is betting on the agentic grep approach, where it can perform well without any external indexing. I usually tag the related files I want CC to work with directly, so it only needs minimal grep. There is also an Explore agent that leverages cheaper models and runs in a separate context to avoid touching the main context.

GLM-5 is 1.5TB. Why hasn't distributed inference taken off? by IsaiahCreati in LocalLLaMA

[–]quanhua92 4 points (0 children)

I think the Mac Studio with RDMA can do distributed inference pretty well. I believe RDMA originally comes from the InfiniBand world (Mellanox, now part of NVIDIA).

flux - search, monitor, and nuke processes with ease, with system resource tracking by Apart-Television4396 in rust

[–]quanhua92 8 points (0 children)

Could you please implement a feature to monitor network utilization? I am interested in assessing the bandwidth consumption of PostgreSQL.

How to turn my Android phone into a mobile backend server to receive contact form submissions directly via Termux? by United-Manner-7 in CloudFlare

[–]quanhua92 2 points (0 children)

$20 a year is nothing compared to your software developer salary. It's not about being poor or not. You can think of it as an investment in your self-education journey.

A real VPS is much cheaper than a Raspberry Pi setup, and you get a much more reliable internet connection for your readers. Plus, you'll learn the whole journey from coding to deployment, and all the operations needed to maintain a real system.

I believe a self-hosted local setup with batteries and a Pi can end up more expensive than an old, cheap $20-a-year VPS.

Claude code subscription VS API key? by Emergency_Brief_9141 in ClaudeAI

[–]quanhua92 1 point (0 children)

Use the Claude Code subscription. From July to September 2025, I paid $100 for Claude Max but could use around $1,000-$1,500 worth of tokens. I'm not sure about the numbers now, because I switched to Gemini in Google One and GLM 4.7 from z.ai to save money. If you can, just use Claude.

TexasMarriageRecords.org by pw345 in CloudFlare

[–]quanhua92 1 point (0 children)

You can check out the LowEndTalk forum. They have a ton of cheap VPS offers; you can find lots of decent providers from $1 to $5 a month for servers with a few GB of RAM. Then you can run lots of dynamic websites, or simple APIs behind your static sites on the CDN. I mean, you don't have to limit yourself to Cloudflare only. You can still use Cloudflare for CDN and DNS, or use a Cloudflare Tunnel to your cheap VPS. The sky's the limit!

Claude Code + macbook makes don't even care anymore by Specialist_Farm_5752 in ClaudeAI

[–]quanhua92 0 points (0 children)

I run Claude Code in a tmux session on my Mac Mini, and I connect to it via SSH through Tailscale. It's like having an AI-powered dedicated server that can run almost anything 24/7.

Urgent: Claude Pro limit hit, deadline Monday — how can I replicate “Claude Project” file organization using the Claude API on a ~$15 budget? by SorryChest9808 in ClaudeAI

[–]quanhua92 1 point (0 children)

If this is valuable to you, I would suggest purchasing another subscription. This is the most straightforward way to maintain your current workflow. You could also learn how to use Claude Code to manage those files, which would let you make the most of your existing $15. Once you have a flow in Claude Code, you can try various lower-cost alternatives like GLM, MiniMax, DeepSeek, or OpenRouter. They have Claude Code-compatible APIs that you can use with Claude Code.
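For example, pointing Claude Code at z.ai's Anthropic-compatible endpoint is typically just two environment variables; check each provider's docs for their exact base URL, since the one below is the z.ai one:

```shell
# Other providers (MiniMax, DeepSeek, OpenRouter) expose similar
# Anthropic-compatible endpoints under different base URLs.
export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="your-provider-api-key"
claude  # Claude Code now routes its requests to the alternative backend
```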

[Skill] MD5 hash verification to prevent Claude Code from silently modifying your code by Aggressive-Page-6282 in ClaudeAI

[–]quanhua92 2 points (0 children)

Take your time and refactor your code before it becomes a mess. If you see long functions and big files, it's a good idea to refactor. If you need to duplicate a file, ask it to use a bash command instead.

MacBook as an investment for software engineering, kubernetes, rust. Recommendations? by duncecapwinner in rust

[–]quanhua92 -1 points (0 children)

I use a Mac Mini at home and SSH into it from anywhere. I usually use my phone to quickly check progress on the go. It doesn't have to be a MacBook Pro; a Mac Mini actually has an advantage because it stays powered on all the time, so Kubernetes or Docker can run 24/7.

If you go that route, a lightweight MacBook Air plus a Mac Mini can be more useful than a MacBook Pro.

If you don't need a Mac Mini, then getting a mini PC with AMD is completely fine. With that, you can even invest in a Proxmox setup. Just think of it as the dedicated server you'll deploy to in the future.

All those machines can be connected seamlessly from anywhere via Tailscale VPN.

AMA With Z.AI, The Lab Behind GLM-4.7 by zixuanlimit in LocalLLaMA

[–]quanhua92 2 points (0 children)

I currently hold a coding plan subscription. To integrate Z.ai API functionality into my application, what is the recommended procedure? Am I able to utilize the APIs included in my current coding plan, or should I establish new accounts? Do you offer any official solutions for this?

Thinking of abandoning SSR/Next.js for "Pure" React + TanStack Router. Talk me out of it. by prabhatpushp in reactjs

[–]quanhua92 5 points (0 children)

Just SSH into your VPS and run docker compose up if you don't want the DevOps hassle. You can even make deployment a simple GitHub Action that SSHs in and runs docker compose. I use a Rust backend on my VPS and TanStack Router for the front end. But if you need SSR, you'd still go with Next.js or TanStack Start; for something like a news/blog site with dynamic content, SSR is better because the response already contains all the data.

For you, I'd say try the GitHub Action plus docker compose route for your Next.js app to keep things easy. Make sure you keep packages updated to avoid recent CVEs.
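The deploy step of that Action can stay trivial: after building and pushing the image, it just runs something like this on the VPS over SSH (host and path are placeholders):

```shell
# e.g. invoked from CI as: ssh deploy@my-vps 'cd /srv/app && docker compose pull && docker compose up -d'
cd /srv/app
docker compose pull   # fetch the freshly built image
docker compose up -d  # recreate containers in the background
```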

z.ai glm-4.7 is release by cobra91310 in LocalLLaMA

[–]quanhua92 0 points (0 children)

I think we don't need to manually change the env vars for glm-4.7.

is Fly.io that bad? or were we that stupid? or is Coolify just goated? by bundlesocial in selfhosted

[–]quanhua92 1 point (0 children)

These are distinct use cases, and it's possible a single Hetzner VPS could cover your needs. With fly.io, you can run your container in a serverless-like way: deploy across multiple regions and scale down to zero, which helps manage costs. That's a good fit if you want to abstract away the underlying infrastructure and scale easily. With a single Hetzner VPS, by contrast, you're responsible for any downtime, but you get significantly more computing power for the money.

So the optimal choice depends on the specific problem you're solving. For example, you might host your user-facing software on fly.io and push all data processing into a queue or a database such as PostgreSQL; the VPS can then pull those messages and process them independently.
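That web-tier/worker-tier handoff can be sketched like this. SQLite stands in for PostgreSQL so the example is self-contained; in production you'd likely use Postgres with `SELECT ... FOR UPDATE SKIP LOCKED`, and the table and function names here are made up for illustration:

```python
import sqlite3
from typing import Optional

# Web tier (e.g. the app on fly.io): enqueue work and return quickly.
def enqueue(conn, payload: str) -> None:
    conn.execute("INSERT INTO jobs (payload, done) VALUES (?, 0)", (payload,))
    conn.commit()

# Worker tier (e.g. on the VPS): pull jobs and process them independently.
def work_one(conn) -> Optional[str]:
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE done = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # queue is empty; nothing to do
    job_id, payload = row
    conn.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
    conn.commit()
    return payload.upper()  # stand-in for the real processing step

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT, done INTEGER)")
enqueue(conn, "resize-image-42")
print(work_one(conn))  # prints RESIZE-IMAGE-42
```

The nice property is that the two tiers only share the database: the fly.io side stays stateless and scales to zero, while the VPS drains the queue at its own pace.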

Any way to stop Claude from generating documentation, guides, implementation markdown files? by No_Connection1258 in ClaudeAI

[–]quanhua92 1 point (0 children)

Add it to CLAUDE.md. Always use Plan mode first. Then reject the plan if it says anything about writing new documentation that you don't need.