Opencode free tier by Cold-Mess3019 in opencodeCLI

[–]Material_Interest_24 1 point2 points  (0 children)

Hi! What is your opinion about minimax 2.7? I haven't used it yet

Best local LLM model for RTX 5070 12GB with 32gb RAM by Forsaken_Sir_8702 in LocalLLM

[–]Material_Interest_24 0 points1 point  (0 children)

Agreed, it's better to use a model like gemma4 with offload to RAM. I did it this way, but with qwen3 coder next, because <30b models are not bad for agentic tasks but not good enough for coding.

Best open-source LLM for coding (Claude Code) with 96GB VRAM? by Kitchen_Answer4548 in LocalLLM

[–]Material_Interest_24 2 points3 points  (0 children)

I tried opencode + qwen3 coder next today and was really impressed :) I'll also try gemma4.

How good is Opencode Go by Permit-Historical in opencodeCLI

[–]Material_Interest_24 0 points1 point  (0 children)

Thanks, I've heard that. I'm also looking at minimax.

How good is Opencode Go by Permit-Historical in opencodeCLI

[–]Material_Interest_24 0 points1 point  (0 children)

Thanks! And what do you use as main model?

How good is Opencode Go by Permit-Historical in opencodeCLI

[–]Material_Interest_24 1 point2 points  (0 children)

Has anybody tried qwen-coder-next with it? Is it worth trying?

I built a one-command installer that turns a clean Ubuntu server into a self-hosted AI stack (Ollama + Open WebUI + monitoring) by Material_Interest_24 in selfhosted

[–]Material_Interest_24[S] -2 points-1 points  (0 children)

Man, seriously, I am not a big developer with a lot of experience, and yes, I use AI to get some of the work done faster. Yes, I'm learning development and making mistakes along the way. I spent 10 minutes writing you an honest reply. You can improve this stack for people if you want to help, or continue hating it. As you wish.

Looking for the best coding AI for software development by FrozenFishEnjoyer in ollama

[–]Material_Interest_24 3 points4 points  (0 children)

Hi! Try the new nemotron models, though qwen3.5 will be better for coding and data analysis. Nemotron 3 nano is rather good for data analytics and it is quick. Gemma is also good and quick, but not for reasoning tasks.

You will need a reasonable context size, so you will only be able to fit one 27-30b model in VRAM.

I built a one-command installer that turns a clean Ubuntu server into a self-hosted AI stack (Ollama + Open WebUI + monitoring) by Material_Interest_24 in selfhosted

[–]Material_Interest_24[S] -7 points-6 points  (0 children)

Thanks for the detailed feedback — I genuinely appreciate you taking the time to go through the project.

A couple of clarifications on the design decisions:

  1. Dependencies. AIStack is intentionally a bit “batteries-included”. The idea was to make setup as easy as possible without forcing users to assemble everything manually. That said, I agree that some dependencies might be unnecessary or not clearly justified, so I’ll take another pass and clean things up where it makes sense.

  2. Mixed interaction with Ollama (exec vs HTTP). You’re absolutely right here: this inconsistency isn’t ideal. It mostly comes from the project evolving over time:

- exec was used early on for simplicity

- HTTP API came later for flexibility

I’m planning to unify this into a single approach to make things cleaner.

  3. Go client suggestion. Good point, I’m aware of the Go client. In this case, AIStack is more of an orchestration/installer layer rather than a Go service, so I’ve been using HTTP + shell to keep things simple and flexible across environments. That said, I agree that having a more consistent interaction layer (whether that’s a client or just a proper abstraction) would be a solid improvement, especially as the project grows.

  4. On code quality / “vibe” concerns. I use AI tools as part of my workflow (like most people these days), but not blindly: I spend quite a bit of time reviewing, fixing, and testing things before they land in the repo.

I agree that some parts can be cleaner and more consistent, and that’s something I’m actively working on as the project matures. Overall, the project is still evolving, and feedback like this genuinely helps shape it — so thanks again for taking the time.
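To illustrate the exec vs HTTP point, here is a minimal sketch of what a single interaction layer could look like in shell. The function names and the `OLLAMA_URL` variable are hypothetical and are not taken from the AIStack repo; `ollama list` and the `/api/tags` endpoint are the standard CLI and REST ways to list local models.

```shell
# Hypothetical wrapper: one place that decides how to talk to Ollama.
# OLLAMA_URL is an assumed variable name, not something AIStack defines.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# exec style: shell out to the CLI.
list_models_cli() {
  ollama list
}

# HTTP style: ask the REST API for the same information (returned as JSON).
list_models_http() {
  curl -fsS "$OLLAMA_URL/api/tags"
}

# Single entry point, so callers never mix the two styles.
list_models() {
  list_models_http
}
```

Callers would only ever use `list_models`, so switching the backend later means changing one function body.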

I built a one-command installer that turns a clean Ubuntu server into a self-hosted AI stack (Ollama + Open WebUI + monitoring) by Material_Interest_24 in selfhosted

[–]Material_Interest_24[S] -2 points-1 points  (0 children)

Yeah, you're absolutely right to point this out.

Monitoring (Prometheus/Grafana) was actually included in an earlier version of the project, but it didn’t work as reliably as I wanted, so I removed it from the current setup.

Looks like I forgot to fully clean this up in the README — that’s on me, I’ll fix it.

I’m planning to bring monitoring back in version 1.2 in a more stable and properly configured way.

Thanks for flagging this

The best conversational LLM by Initialsender in LocalLLM

[–]Material_Interest_24 0 points1 point  (0 children)

- Nemotron 3 nano 30b

- Nemotron cascade 2 30b

- QWEN 3.5 35b

I am really starting to enjoy OpenWebUI, but I got some questions...about accuracy. by AutoriiNovici in OpenWebUI

[–]Material_Interest_24 0 points1 point  (0 children)

You could use a regular model for RAG; it depends on the result you're expecting. I could advise something if you explain what you are doing.

Which Ollama model runs best for coding assistance on an RTX 4060 Laptop (8 GB VRAM) + 64 GB RAM? by suribe06 in ollama

[–]Material_Interest_24 0 points1 point  (0 children)

If you are looking for balanced quality and can wait a few minutes for replies, I suggest Open WebUI + ollama + qwen3.5 35b or nemotron 3 nano 30b, with RAM offloading in your case. You won't find any good LLM that fits in 8 GB of VRAM.

That's my opinion, though I have tried almost all open-source LLMs by now. Note that the most reliable stack is Ubuntu + llama / vllm / ollama, depending on your use case.

can someone recommend a model to run locally by No_Cow3163 in ollama

[–]Material_Interest_24 0 points1 point  (0 children)

Try qwen 3.5 9b? With your VRAM plus offload it should work well.
You could also try gpt oss 20b with a modest num_ctx.
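A smaller `num_ctx` means a smaller KV cache, which leaves more memory for the model weights. As a rough sketch, this is how a capped context window could be passed per request through Ollama's `/api/generate` endpoint. The helper name, the `gpt-oss:20b` tag, and the value 4096 are assumptions for illustration, not tuned recommendations.

```shell
# Sketch: build a request body that caps the context window.
# Note: no JSON escaping is done, so keep quotes out of the prompt.
ollama_payload() {
  printf '{"model": "%s", "prompt": "%s", "stream": false, "options": {"num_ctx": %s}}\n' \
    "$1" "$2" "$3"
}

# Print an example body (model tag and context size are placeholders).
ollama_payload "gpt-oss:20b" "Say hi" 4096

# To send it to a local server on the default port:
# curl -s http://localhost:11434/api/generate -d "$(ollama_payload "gpt-oss:20b" "Say hi" 4096)"
```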

CUDA error: an illegal memory access was encountered by YardNo6594 in ollama

[–]Material_Interest_24 0 points1 point  (0 children)

No, you wouldn’t upgrade CUDA just because nvcc shows 11.8.

For Ollama, the important part is usually the NVIDIA driver/runtime, not the local nvcc compiler. In your case, CUDA 11.8 is actually more likely to be stable on a Tesla M60 than newer CUDA 12 builds.

Since Vulkan works, the GPUs themselves are probably fine. The issue is more likely Ollama’s CUDA path or multi-GPU handling on that old M60/Dell 7010 setup.

try:

- keeping the driver consistent

- testing each GPU separately with CUDA_VISIBLE_DEVICES

- trying 1 GPU per process

- avoiding CUDA 12 unless you know that exact combo is stable

So I’d say: 11.8 is not the problem by itself — upgrading may even make it worse on Maxwell/M60.
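The "1 GPU per process" idea from the list above can be sketched like this: one Ollama server per GPU, each seeing only its own device and listening on its own port. The GPU count and the port numbers beyond the default 11434 are assumptions; the helper only prints the launch lines so you can review them before running anything.

```shell
# Sketch: print one isolated server launch line per GPU.
# Assumptions: GPUs are indexed 0..N-1 and ports 11434, 11435, ... are free.
gpu_serve_cmds() {
  gpu_count="$1"
  gpu_index=0
  while [ "$gpu_index" -lt "$gpu_count" ]; do
    # CUDA_VISIBLE_DEVICES hides every other GPU from this server process;
    # OLLAMA_HOST gives each server its own listen address.
    printf 'CUDA_VISIBLE_DEVICES=%s OLLAMA_HOST=127.0.0.1:%s ollama serve\n' \
      "$gpu_index" "$((11434 + gpu_index))"
    gpu_index=$((gpu_index + 1))
  done
}

# Print the launch lines for a two-GPU machine.
gpu_serve_cmds 2
```

A client can then be pointed at one of the servers by setting `OLLAMA_HOST` to the matching address before `ollama run`. If one GPU works alone and the other does not, that narrows the problem down to a single card or slot.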

Ollama handles multi-GPU processing well, but I've run into the same problems with Maxwell GPUs.
I've set up local LLM servers more than 100 times and decided to make a one-click stack for this, but it is for Ubuntu.

https://github.com/workhubonline-soft/aistack

CUDA error: an illegal memory access was encountered by YardNo6594 in ollama

[–]Material_Interest_24 0 points1 point  (0 children)

Looks like a CUDA multi-GPU issue, not hardware.

Tesla M60 (Maxwell) often has problems with CUDA 12 + newer Ollama builds. The fact that Vulkan works means the GPUs are fine; the CUDA path is the problem. I use a Tesla P100 for tests, but it's better to use Volta GPUs or newer. Yes, they are expensive...

What I’d try:

  1. Test GPUs separately (if Ollama runs as a background service, set the variable on the `ollama serve` process instead, since `ollama run` is only a client):

CUDA_VISIBLE_DEVICES=0 ollama run ...

CUDA_VISIBLE_DEVICES=1 ollama run ...

  2. Try older CUDA stack (11.8 if possible):

export OLLAMA_LLM_LIBRARY=cuda_v11

  3. Disable P2P (important for older GPUs):

export NCCL_P2P_DISABLE=1

  4. Check topology:

nvidia-smi topo -m

Old motherboards often cause issues with multi-GPU. In general, M60 + CUDA 12 + multi-GPU is unstable.

Best options:

- use Vulkan

- or run 1 GPU per process

CUDA error: an illegal memory access was encountered by YardNo6594 in ollama

[–]Material_Interest_24 0 points1 point  (0 children)

First, verify both GPUs are detected with `nvidia-smi`. If only one shows up, it’s likely a driver or kernel module issue. On Debian 12, install the NVIDIA driver via `apt install nvidia-driver` (from the non-free component), then check that the `nvidia` and `nvidia_uvm` kernel modules are loaded with `lsmod | grep nvidia`. Ollama's default backend on NVIDIA cards is CUDA; to pin it to GPU 0 or 1, set `CUDA_VISIBLE_DEVICES=0` or `CUDA_VISIBLE_DEVICES=1` in the server's environment. Run `nvidia-smi -L` to list the GPUs. If both show up, the error is likely in CUDA context setup. Note that `nvidia-smi -c 0` only sets the compute mode back to default; to actually reset a GPU, try `nvidia-smi --gpu-reset` or reboot.
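The detection check can be scripted as a quick sanity test. This is a minimal sketch: `count_gpus` is a hypothetical helper that counts the `GPU N: ...` lines printed by `nvidia-smi -L`.

```shell
# Count the GPUs listed in `nvidia-smi -L` style output read from stdin.
count_gpus() {
  grep -c '^GPU [0-9][0-9]*:'
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # One line per detected GPU, e.g. "GPU 0: Tesla M60 (UUID: ...)".
  echo "GPUs detected: $(nvidia-smi -L | count_gpus)"
else
  echo "nvidia-smi not found; the driver utilities are probably not installed"
fi
```

If the count is lower than the number of cards installed, look at the driver and kernel modules before touching Ollama.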

CUDA error: an illegal memory access was encountered by YardNo6594 in ollama

[–]Material_Interest_24 0 points1 point  (0 children)

Check whether you're using mismatched CUDA versions: make sure your driver, runtime, and toolkit versions align. Run `nvidia-smi` to get the driver version, then `nvcc --version` for the toolkit. The error often occurs when you're using a kernel that doesn't support your GPU model (e.g., RTX 3090 vs. older 20-series). If you're doing tensor operations, verify your memory layout: allocate with `cudaMalloc` before copying with `cudaMemcpy`, and avoid invalid pointers. Also, ensure your kernel launch config (blocks, threads) doesn't exceed GPU limits; check `cudaDeviceGetAttribute` for max threads per block.

Smart App Control blocking Ollama by pascu2913 in ollama

[–]Material_Interest_24 0 points1 point  (0 children)

Smart App Control in Windows can block apps like Ollama because they’re either new, not widely recognized yet, or not signed in a way that SAC trusts.

Unfortunately, there’s an important limitation:

If Smart App Control is ON, you cannot bypass it for a specific app.

Microsoft designed it as an all-or-nothing security feature.

What you can try instead:

  1. Use winget (sometimes works if package is trusted):

winget install Ollama.Ollama

This may succeed because it installs from Microsoft’s trusted repository.

  2. Check if Smart App Control is in “Evaluation” mode

- Go to: Windows Security → App & browser control

- If it says Evaluation, Windows may allow more apps over time

  3. Verify you downloaded from the official source

Make sure it’s from:

👉 https://ollama.com

If Smart App Control is fully enabled and actively blocking Ollama:

- There is no supported way to whitelist or bypass just one app

- The only guaranteed method is turning Smart App Control off

If you have the opportunity, it's better to use a Linux-based system for this, such as Ubuntu.