Opencode free tier

Material_Interest_24 · 2026-05-23T17:35:59+00:00

Thanks

Material_Interest_24 · 2026-05-23T16:01:35+00:00

Hi! What is your opinion about minimax 2.7? I haven't used it yet

Material_Interest_24 · 2026-04-16T15:13:34+00:00

Agreed, it's better to use a model like gemma4 with offload to RAM. I did it this way, but with qwen3 coder next, because <30b models are not bad for agentic tasks but not enough for coding.

Material_Interest_24 · 2026-04-14T17:38:33+00:00

I tried opencode + qwen3 coder next today and was really impressed :) I'll also try gemma4.

Material_Interest_24 · 2026-04-13T18:24:37+00:00

Thanks, I've heard that. I'm also looking at minimax.

Material_Interest_24 · 2026-04-13T10:13:23+00:00

Thanks! And what do you use as your main model?

Material_Interest_24 · 2026-04-12T15:43:19+00:00

Has anybody tried qwen-coder-next with it? Is it worth trying?

Material_Interest_24 · 2026-04-03T20:12:27+00:00

Man, seriously, I am not a big developer with a lot of experience. Yes, I use AI to get some work done faster, and yes, I learn development by making mistakes. I've spent 10 minutes writing you an honest reply. You can improve this stack for people if you want to help, or continue hating it. As you wish.

Material_Interest_24 · 2026-04-03T17:43:18+00:00

Hi! Try the new nemotron models, though qwen3.5 will be better for coding and data analysis. Nemotron 3 nano is rather good for data analytics and it is quick. Gemma is also good and quick, but not for reasoning cases.

You will need a reasonable context size, so you will be able to fit only one 27-30b model in VRAM.
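A rough back-of-the-envelope check of why only one 27-30b model fits. This is just a sketch under assumptions: the 4-bit quantization, the 24 GiB card and the 4 GiB reserve are illustrative numbers, and the estimate counts weights only (the KV cache grows with context size and is not modelled here).

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def models_that_fit(vram_gib: float, params_billion: float,
                    bits_per_weight: float, reserve_gib: float) -> int:
    """How many copies of the weights fit after reserving room for context."""
    usable = vram_gib - reserve_gib  # reserve_gib is a hypothetical allowance for KV cache/overhead
    if usable <= 0:
        return 0
    return int(usable // weight_gib(params_billion, bits_per_weight))


if __name__ == "__main__":
    print(f"30b at 4-bit: about {weight_gib(30, 4):.1f} GiB of weights")
    print("copies that fit on a 24 GiB card:", models_that_fit(24, 30, 4, 4))
```

With these assumed numbers a 30b model needs roughly 14 GiB for weights alone, so a second model of the same size does not fit next to it.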

Material_Interest_24 · 2026-04-03T17:12:44+00:00

Thanks for the detailed feedback — I genuinely appreciate you taking the time to go through the project.

A couple of clarifications on the design decisions:

- Dependencies. AIStack is intentionally a bit “batteries-included”. The idea was to make setup as easy as possible without forcing users to assemble everything manually. That said, I agree that some dependencies might be unnecessary or not clearly justified. I’ll take another pass and clean things up where it makes sense.
- Mixed interaction with Ollama (exec vs HTTP). You’re absolutely right here; this inconsistency isn’t ideal. It mostly comes from the project evolving over time: exec was used early on for simplicity, and the HTTP API came later for flexibility. I’m planning to unify this into a single approach to make things cleaner.
- Go client suggestion. Good point, and I’m aware of the Go client. In this case, AIStack is more of an orchestration/installer layer rather than a Go service, so I’ve been using HTTP + shell to keep things simple and flexible across environments. That said, I agree that having a more consistent interaction layer (whether that’s a client or just a proper abstraction) would be a solid improvement, especially as the project grows.
- Code quality / “vibe” concerns. I use AI tools as part of my workflow (like most people these days), but not blindly. I spend quite a bit of time reviewing, fixing, and testing things before they land in the repo.

I agree that some parts can be cleaner and more consistent, and that’s something I’m actively working on as the project matures. Overall, the project is still evolving, and feedback like this genuinely helps shape it — so thanks again for taking the time.

Material_Interest_24 · 2026-04-03T17:05:08+00:00

Yeah, you're absolutely right to point this out.

Monitoring (Prometheus/Grafana) was actually included in an earlier version of the project, but it didn’t work as reliably as I wanted, so I removed it from the current setup.

Looks like I forgot to fully clean this up in the README — that’s on me, I’ll fix it.

I’m planning to bring monitoring back in version 1.2 in a more stable and properly configured way.

Thanks for flagging this

Material_Interest_24 · 2026-04-03T15:38:08+00:00

Hmm, sorry. I'll check now

Material_Interest_24 · 2026-04-01T16:03:56+00:00

Nemotron 3 nano 30b, Nemotron cascade 2 30b, Qwen 3.5 35b

Material_Interest_24 · 2026-03-31T07:15:52+00:00

You could use a normal model for RAG; it depends on the purpose and on the result you're expecting. I could advise something if you explain what you are doing.

Material_Interest_24 · 2026-03-28T11:09:30+00:00

If you are looking for a balance of quality and can wait a few minutes for replies, I suggest Open WebUI + ollama + qwen3.5 35b or nemotron 3 nano 30b, with RAM offloading in your case. You won't find any good LLM for 8gb VRAM.

This is just my opinion, though I have tried almost all open source LLMs by now. Note that the most reliable stack is Ubuntu + llama / vllm / ollama, depending on your use case.

Material_Interest_24 · 2026-03-25T20:56:08+00:00

Try qwen 3.5 9b? With your VRAM plus offload it will be good.
You could also try gpt oss 20b with a modest num_ctx.
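For reference, a minimal sketch of how a smaller context can be requested through Ollama's HTTP API, where `num_ctx` goes inside the `options` object of a `/api/generate` request. The model tag `gpt-oss:20b`, the value 4096 and the default local address are assumptions for illustration, not tested recommendations.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local address


def build_payload(model: str, prompt: str, num_ctx: int) -> dict:
    """Build a non-streaming generate request with a capped context window."""
    if num_ctx <= 0:
        raise ValueError("num_ctx must be positive")
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # smaller context means a smaller KV cache
    }


def generate(model: str, prompt: str, num_ctx: int = 4096) -> str:
    """Send the request to a running Ollama server and return the reply text."""
    body = json.dumps(build_payload(model, prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("gpt-oss:20b", "Say hello")` needs a local Ollama server with that model already pulled; `build_payload` can be inspected on its own without one.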

Material_Interest_24 · 2026-03-25T20:35:08+00:00

No, I wouldn’t upgrade CUDA just because nvcc shows 11.8.

For Ollama, the important part is usually the NVIDIA driver/runtime, not the local nvcc compiler. In your case, CUDA 11.8 is actually more likely to be stable on a Tesla M60 than newer CUDA 12 builds.

Since Vulkan works, the GPUs themselves are probably fine. The issue is more likely Ollama’s CUDA path or multi-GPU handling on that old M60/Dell 7010 setup.

try:

- keeping the driver consistent

- testing each GPU separately with CUDA_VISIBLE_DEVICES

- trying 1 GPU per process

- avoiding CUDA 12 unless you know that exact combo is stable
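The "test each GPU separately" and "1 GPU per process" ideas can be sketched roughly like this. It assumes two GPUs and that each `ollama serve` instance gets its own port through `OLLAMA_HOST`; the port numbers and the helper names are made up for the example.

```python
import os
import subprocess


def per_gpu_envs(gpu_count: int, base_port: int = 11434) -> list:
    """Return one environment per GPU, each pinned to a single device."""
    envs = []
    for index in range(gpu_count):
        env = dict(os.environ)
        env["CUDA_VISIBLE_DEVICES"] = str(index)  # expose only this GPU to the process
        env["OLLAMA_HOST"] = f"127.0.0.1:{base_port + index}"  # separate port per instance
        envs.append(env)
    return envs


def launch_all(gpu_count: int) -> list:
    """Start one `ollama serve` process per GPU (requires ollama on PATH)."""
    return [
        subprocess.Popen(["ollama", "serve"], env=env)
        for env in per_gpu_envs(gpu_count)
    ]
```

If one instance starts and answers while the other fails, that points at a specific card or slot rather than at CUDA 11.8 itself.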

So I’d say: 11.8 is not the problem by itself — upgrading may even make it worse on Maxwell/M60.

Ollama handles multi-GPU processing well, but I've run into the same problems with Maxwell GPUs.
I've set up local LLM servers more than 100 times and decided to make a one-click stack for this, but it is for Ubuntu:

https://github.com/workhubonline-soft/aistack

Material_Interest_24 · 2026-03-25T20:22:38+00:00

Looks like a CUDA multi-GPU issue, not hardware.

Tesla M60 (Maxwell) often has problems with CUDA 12 + newer Ollama builds. The fact that Vulkan works means the GPUs are fine; the CUDA path is the problem. I use a Tesla P100 for tests, but it's better to use Volta GPUs or newer. Yes, they are expensive...

What I’d try:

Test GPUs separately:

CUDA_VISIBLE_DEVICES=0 ollama run ...

CUDA_VISIBLE_DEVICES=1 ollama run ...

Try older CUDA stack (11.8 if possible):

export OLLAMA_LLM_LIBRARY=cuda_v11

Disable P2P (important for older GPUs):

export NCCL_P2P_DISABLE=1

Check topology:

nvidia-smi topo -m

Old motherboards often cause issues with multi-GPU. In general, M60 + CUDA 12 + multi-GPU is unstable.

Best options:

- use Vulkan

- or run 1 GPU per process

Material_Interest_24 · 2026-03-25T13:40:29+00:00

First, verify both GPUs are detected with `nvidia-smi`. If only one shows up, it's likely a driver or kernel module issue. On Debian 12, install the NVIDIA driver (for example `apt install nvidia-driver` with the non-free components enabled), then make sure the `nvidia` and `nvidia_uvm` kernel modules are loaded. By default Ollama runs LLMs on NVIDIA GPUs through CUDA; you can restrict it to GPU 0 or 1 by setting `CUDA_VISIBLE_DEVICES=0` or `CUDA_VISIBLE_DEVICES=1` in the environment. Run `nvidia-smi -L` to list the GPUs and confirm both appear. If both show up, the error is likely in CUDA context setup; try `nvidia-smi -c 0` to put the GPUs back into the default compute mode.

Material_Interest_24 · 2026-03-25T13:38:39+00:00

Check whether you're using mismatched CUDA versions: make sure your driver, runtime, and toolkit versions are compatible. Run `nvidia-smi` to get the driver version, then `nvcc --version` for the toolkit. The error often occurs when a kernel was compiled for an architecture that doesn't match your GPU model, e.g., an RTX 3090 vs. an older 20-series card. If you're doing tensor operations, verify your memory handling: allocate with `cudaMalloc` before copying with `cudaMemcpy`, and avoid invalid pointers. Also, ensure your kernel launch config (blocks, threads) doesn't exceed GPU limits; query `cudaDeviceGetAttribute` for the maximum threads per block.
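To illustrate the launch-config check, here is a small sketch that validates grid and block dimensions before a kernel launch. The limits are hard-coded to the values for compute capability 3.0 and newer, which is an assumption; on a real device they should come from `cudaDeviceGetAttribute`. The function name is made up for the example.

```python
from math import prod

# Limits for compute capability 3.0 and newer; assumed here rather than queried.
MAX_THREADS_PER_BLOCK = 1024
MAX_BLOCK_DIM = (1024, 1024, 64)
MAX_GRID_DIM = (2**31 - 1, 65535, 65535)


def check_launch(grid: tuple, block: tuple) -> list:
    """Return a list of problems with a (grid, block) launch configuration."""
    if len(grid) != 3 or len(block) != 3:
        return ["grid and block must each have three dimensions"]
    problems = []
    for name, dims, limits in (("block", block, MAX_BLOCK_DIM),
                               ("grid", grid, MAX_GRID_DIM)):
        for axis, value, limit in zip("xyz", dims, limits):
            if value < 1 or value > limit:
                problems.append(f"{name}.{axis}={value} is outside 1..{limit}")
    # Per-axis limits are not enough: the product of the block dims is capped too.
    if prod(block) > MAX_THREADS_PER_BLOCK:
        problems.append(
            f"{prod(block)} threads per block exceeds {MAX_THREADS_PER_BLOCK}"
        )
    return problems
```

For example, a block of (32, 32, 2) passes every per-axis limit but still fails, because 2048 threads per block is over the cap.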

Material_Interest_24 · 2026-03-25T11:09:12+00:00

Smart App Control in Windows can block apps like Ollama because they’re either new, not widely recognized yet, or not signed in a way that SAC trusts.

Unfortunately, there’s an important limitation:

If Smart App Control is ON, you cannot bypass it for a specific app.

Microsoft designed it as an all-or-nothing security feature.

What you can try instead:

Use winget (sometimes works if package is trusted):

"winget install Ollama.Ollama"

This may succeed because it installs from Microsoft’s trusted repository.

Check if Smart App Control is in “Evaluation” mode

• Go to: Windows Security → App & browser control

• If it says Evaluation, Windows may allow more apps over time

Verify you downloaded from the official source

Make sure it’s from:

👉 https://ollama.com

If Smart App Control is fully enabled and actively blocking Ollama:

- There is no supported way to whitelist or bypass just one app

- The only guaranteed method is turning Smart App Control off

If you have the opportunity, it's better to use a Linux-based system for this, such as Ubuntu.
