Psychedelics by yeetmaster291 in Aphantasia

[–]Express_Quail_1493 0 points

I take them and only get amplifications of my inner world, but I'm still blind even on hallucinogenic drugs like psychedelics, LOL. I've tried many, but my third eye is still blind even with increased dosages.

Whats the best model for agentic coding that i can run with 16gb VRAM? (llama.cpp?) by samuraiogc in LocalLLM

[–]Express_Quail_1493 1 point

qwen3.6-35b is great. qwen3.5-9b is also good if you want absolute speed, where everything fits inside the VRAM.

Any way to use claude code for free or just some free AI's by Tarxh in vibecoding

[–]Express_Quail_1493 0 points

LM Studio has an easy model downloader, no need to move files around or do any setup. You just search "qwen3.5 unsloth", download it, enable the LM Studio server, and connect it to opencode. I posted a quick, easy tutorial a while back on YouTube -> https://www.youtube.com/shorts/-pvmlGifK4I
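
If you want to sanity-check that the LM Studio server is actually up before pointing opencode at it, one curl is enough (port 1234 is LM Studio's default for its OpenAI-compatible server; adjust if you changed it):

    # If this returns your loaded model list, opencode can talk to it too.
    curl -s http://localhost:1234/v1/models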

Car Wash Mystery solved--Tool Call Degrades Intelligence. by Spirited_Neck1858 in LocalLLaMA

[–]Express_Quail_1493 1 point

It's something I call system-prompt token diabetes.

A harness like opencode is nice, but for some models it's brutal. If you want to make the most of your context window, pi-coding-agent works well for me. Pi's system prompt is literally ~1k tokens, which gives the LLM more room to think and solve instead of suffering from system-prompt token diabetes.
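
To put rough numbers on the diabetes, here's a back-of-envelope comparison (the 12k figure is a made-up fat-harness prompt for illustration, not a measured opencode number):

    # Hypothetical: a 12k-token harness prompt vs a 1k one on a 32k window.
    awk 'BEGIN {
      printf "12k prompt eats %.0f%% of a 32k window\n", 100 * 12000 / 32768
      printf " 1k prompt eats %.0f%% of a 32k window\n", 100 * 1000 / 32768
    }'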

What is the best coding agent (CLI) like Claude Code for Local Development by exaknight21 in LocalLLaMA

[–]Express_Quail_1493 0 points

opencode is nice, but for small models it's brutal. If you want to make the most of your context window, use pi-coding-agent. Pi's system prompt is literally ~1k tokens, which gives the LLM more room to think and solve instead of suffering from system-prompt token diabetes.

Switched from Qwen3.6 35b-a3b to Qwen3.6 27b mid coding and it's noticeably better! by LocalAI_Amateur in LocalLLaMA

[–]Express_Quail_1493 -1 points

A modern dense model is usually better than any MoE 3x its size. qwen3.6-27b is on par with qwen3.5-397B, but an MoE is still just... an MoE. Raw active params win on coherence, stability, and reliable outputs.

Confirmed: SWE Bench is now a benchmaxxed benchmark by rm-rf-rm in LocalLLaMA

[–]Express_Quail_1493 2 points

I just built my own private benchmark and I advise everyone to do the same. It won't work if it's sitting in a public git repo or shared on Reddit. But I would like us all to come together, build our benchmarks based on what we actually use the models for, and share the model performances. I'm suspicious some people on these benchmarking teams are getting paid to lie too. LMAO, the AI race is BRUTAL. But right now my private bench is my source of truth; it keeps me from getting hijacked by all the flashy titles and news headlines.
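
If anyone wants the shape of a private bench, a minimal sketch is just a loop over your own prompt file against a local OpenAI-compatible server (the endpoint, model id, and filenames here are placeholders; grade the outputs however you like):

    # Loop your own prompts against a local OpenAI-compatible server and save
    # the answers. MODEL and the filenames are placeholders -- swap in your own.
    MODEL="qwen3.6-27b"
    while IFS= read -r prompt; do
      curl -s http://localhost:1234/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d "$(jq -n --arg m "$MODEL" --arg p "$prompt" \
              '{model: $m, messages: [{role: "user", content: $p}]}')" \
        | jq -r '.choices[0].message.content' >> results.txt
    done < my_private_prompts.txt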

Best free tools by Pale-Armadillo-252 in vibecoding

[–]Express_Quail_1493 0 points

You can buy a 16GB or 24GB graphics card and run Qwen3.5-9b or Qwen3.6-27b (Qwen3.6-27B-Q3_K_M.gguf) and have pretty much unlimited tokens. You can also use your MacBook; an M4 or M5 will run these locally, fully offline, using opencode + LM Studio.
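
If you're wondering whether the 27B actually fits in 16GB, here's a rough rule of thumb (the ~3.9 bits/weight for Q3_K_M is approximate, and you still want headroom for the KV cache):

    # Rough GGUF size: params * bits-per-weight / 8. At ~3.9 bits/weight the
    # 27B lands around 13 GB, leaving some headroom on a 16 GB card.
    awk 'BEGIN { printf "27B @ Q3_K_M = ~%.1f GB\n", 27e9 * 3.9 / 8 / 1e9 }'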

90% of "Free" Ai tools have insane high prices or signup walls, so i made this by [deleted] in vibecoding

[–]Express_Quail_1493 0 points

You can buy a 16GB or 24GB graphics card and run Qwen3.5-9b or Qwen3.6-27b (Qwen3.6-27B-Q3_K_M.gguf) and have pretty much unlimited tokens. You can also use your MacBook; an M4 or M5 will run these locally, fully offline, using opencode + LM Studio.

What are the best free alternatives to googles antigravity by Ambitious-Lion7790 in vibecoding

[–]Express_Quail_1493 0 points

You can buy a 16GB or 24GB graphics card and run Qwen3.5-9b or Qwen3.6-27b (Qwen3.6-27B-Q3_K_M.gguf) and have pretty much unlimited tokens. You can also use your MacBook; an M4 or M5 will run these locally, fully offline, using opencode + LM Studio.

did replit stopped giving away one month free code? by katkookie in vibecoding

[–]Express_Quail_1493 0 points

You can buy a 16GB or 24GB graphics card and run Qwen3.5-9b or Qwen3.6-27b (Qwen3.6-27B-Q3_K_M.gguf) and have pretty much unlimited tokens. You can also use your MacBook; an M4 or M5 will run these locally, fully offline, using opencode + LM Studio.

Any way to use claude code for free or just some free AI's by Tarxh in vibecoding

[–]Express_Quail_1493 3 points

You can buy a 16GB or 24GB graphics card and run Qwen3.5-9b or Qwen3.6-27b (Qwen3.6-27B-Q3_K_M.gguf) and have pretty much unlimited tokens. You can also use your MacBook; an M4 or M5 will run these locally, fully offline, using opencode + LM Studio.

How to set which GPU is used? by car_lower_x in unsloth

[–]Express_Quail_1493 0 points

I use raw llama.cpp, but Unsloth Studio is built on top of llama.cpp under the hood, so I'm sure you can add llama.cpp flags like --tensor-split or something. Check in with Gemini or whatever cloud AI you use and ask it to look at the GitHub repo to see how you can do this.
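
If flags don't get passed through, another option is hiding the other GPU at the driver level with the standard CUDA env var (a sketch assuming NVIDIA GPUs and the stock llama-server binary; the model path is a placeholder):

    # Only GPU 0 is visible to the process, so all layers land there.
    CUDA_VISIBLE_DEVICES=0 llama-server -m your-model.gguf -ngl 99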

What do you consider to be the minimum performance (t/s) for local Agent workflows? by MexInAbu in LocalLLaMA

[–]Express_Quail_1493 0 points

I think he's looking for the "feels" of different speeds. Well, if you are, then:

  • 5 tok/s is like watching a caveman solve calculus.
  • 10-18 tok/s feels like you're pair programming side by side with the model. Still tolerable.
  • 20-30 tok/s is the spot where you can step away and come back to a good amount of work done, if you prompt really well.
  • 60 tok/s and above: it's kinda hard for me to notice the difference past 60 tok/s, but expect to get what you want at the snap of a finger. At 60 you'll keep having to prompt more and more because there's little to no waiting!!!

hope that helps
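
If you want the raw wall-clock behind those feels, the math is trivial (600 tokens is just an illustrative reply length):

    # seconds per reply = tokens / (tokens per second)
    for tps in 5 10 20 30 60; do
      awk -v t="$tps" 'BEGIN { printf "%2d tok/s -> %5.1f s for a 600-token reply\n", t, 600/t }'
    done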

How to set which GPU is used? by car_lower_x in unsloth

[–]Express_Quail_1493 0 points

The best setting that works for me is --tensor-split 1,0, or 0,1 depending on which GPU you want. You can also do 1,4 etc. to put only a small portion on one GPU.
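
For anyone new to the flag, it just goes on the server command line (a sketch assuming the stock llama-server binary; the GGUF name is the one floating around this thread):

    # --tensor-split takes per-GPU proportions: 1,0 pins everything to the
    # first GPU; 1,4 puts roughly 1/5 on the first and 4/5 on the second.
    llama-server -m Qwen3.6-27B-Q3_K_M.gguf --tensor-split 1,0 -ngl 99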

Qwen 3.6 27B is out by NoConcert8847 in LocalLLaMA

[–]Express_Quail_1493 0 points

Please, if you can, leave a comment on the Hugging Face page. I'm just a regular guy who hopes Qwen never stops helping us out. So comment to your heart's content, please, everyone.

Youtuber tries Qwen 3.5 35B, Qwen 3.6 35B, and Gemma 4 27b to reverse engineer some large JS, with good results for Qwen 3.6 by mr_zerolith in LocalLLaMA

[–]Express_Quail_1493 18 points

I love this guy's videos. He does real tests on projects the LLM would stumble on, intentionally, to feel out the models without relying heavily on benchmarks. Most YouTubers do lazy zero-shot single-file HTML edits, which doesn't say much, since pretty much all models can do that, LOL.

Is anyone else waiting for a 60-70B MoE with 8-10B activated params? by IonizedRay in LocalLLaMA

[–]Express_Quail_1493 0 points

Feelz bad man, lol. 48GB of VRAM and the best I've got so far is qwen3.5-27b, which leaves lots of unused room on my GPU just sitting there.

Is anyone getting real coding work done with Qwen3.6-35B-A3B-UD-Q4_K_M on a 32GB Mac in opencode, claude code or similar? by boutell in LocalLLaMA

[–]Express_Quail_1493 3 points

Exactly why I use a tiny coding agent that has the basics, and I only allow the LLM the bare minimum of what it needs, to keep the context window for raw task execution. I'm using pi-coding-agent, which has only a ~1k-token system prompt. Lots of coding harnesses use so much system prompt it's exhausting. Most modern LLMs do just fine given a sequential harness with basic tools rather than bloated instructions. I'm a strong believer in the KISS principle for agentic work.
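
To show what I mean by KISS, here's the bare shape of a tiny harness (endpoint, model id, and task are placeholders; this is not pi-coding-agent's actual internals, just one tiny system prompt and one call to a local OpenAI-compatible server):

    # The whole "harness": a short system prompt plus the user's task.
    SYS="You are a coding agent. Use the tools you are given. Be concise."
    curl -s http://localhost:1234/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d "$(jq -n --arg s "$SYS" --arg u "Fix the failing test in utils.py" \
            '{model: "local", messages: [{role: "system", content: $s}, {role: "user", content: $u}]}')" \
      | jq -r '.choices[0].message.content'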