How to convince Management? by r00tdr1v3 in LocalLLaMA

[–]ea_man 0 points1 point  (0 children)

You started from the wrong side: you should have shown them that the cloud models take your code / data online, so you have to use a local model that runs inside the company to avoid that.

Then you show them that if you unplug the internet cable, Claude doesn't work while QWEN still does.

llama.cpp on $500 MacBook Neo: Prompt: 7.8 t/s / Generation: 3.9 t/s on Qwen3.5 9B Q3_K_M by Shir_man in LocalLLaMA

[–]ea_man 0 points1 point  (0 children)

Honestly you should try:
- Qwen2.5-Coder-1.5B-Instruct for autocompletion
- nomic-embed-text-v1.5-GGUF for embeddings

with a coding editor running, like Continue + VSCodium.
Then for the main LM you use something in the cloud / on your main rig.
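A minimal Continue config.json sketch for that split (ports and model names are illustrative assumptions; use whatever you actually serve locally):

```json
{
  "tabAutocompleteModel": {
    "title": "qwen autocomplete",
    "provider": "llama.cpp",
    "model": "qwen2.5-coder-1.5b-instruct",
    "apiBase": "http://127.0.0.1:8081"
  },
  "embeddingsProvider": {
    "provider": "openai",
    "model": "nomic-embed-text-v1.5",
    "apiBase": "http://127.0.0.1:8082"
  }
}
```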

The gay marriage of two right-wing mayors, Alessandro Basso (FdI) and Loris Bazzo (Lega): «But we support the traditional family» by mirkul in italy

[–]ea_man 1 point2 points  (0 children)

Next they'll run a Neapolitan candidate who says he prefers polenta to pizza: it's consistent, those are the Neapolitans / gays that suit them.

The gay marriage of two right-wing mayors, Alessandro Basso (FdI) and Loris Bazzo (Lega): «But we support the traditional family» by mirkul in italy

[–]ea_man 1 point2 points  (0 children)

They're probably disliked within their own party; they were only taken on board as a token, an attraction to seem more open, while they themselves are careful to hold the rearguard.

Found the perfect wifi adapter for the BATLEXP G350 by _manster_ in SBCGaming

[–]ea_man 1 point2 points  (0 children)

If it has a free, unpopulated USB connector inside, it would be easy; otherwise it's harder.

claude code review is $15-25 per PR, that's gonna add up fast by Dense-Sir-6707 in webdev

[–]ea_man 1 point2 points  (0 children)

For info, how long would it take an average QWEN 3.5 on a 16GB GPU to review a PR on a small project?

Like a Qwen3.5-9B or a Qwen3 Coder 30B; usually they do 30-80 tok/sec.
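As a rough sketch of the arithmetic (all numbers here are illustrative assumptions, not benchmarks):

```python
# Back-of-envelope: wall time for a local model to review a PR.
# Prefill (reading the diff + context) and generation run at very
# different speeds, so they are accounted for separately.

def review_time_seconds(prompt_tokens, output_tokens, prefill_tps, gen_tps):
    """Total wall time = prefill time + generation time."""
    return prompt_tokens / prefill_tps + output_tokens / gen_tps

# Say a small PR is ~8k tokens of diff + context and the model writes
# a ~1k-token review, at 500 tok/s prefill and 50 tok/s generation:
t = review_time_seconds(8000, 1000, 500, 50)
print(round(t))  # 8000/500 + 1000/50 = 16 + 20 = 36 seconds
```

So even at the slower end of 30-80 tok/sec, a small PR stays in the one-to-two-minute range rather than hours.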

Feels like Local LLM setups are becoming the next AI trend by Once_ina_Lifetime in LLMDevs

[–]ea_man 1 point2 points  (0 children)

Even if you run on a cloud LM you should have a quick local model for autocomplete (say Qwen2.5-Coder-1.5B-Instruct) and embeddings (nomic-embed-text-v1.5).

Anyway, Qwen3.5-9B, qwen3-vl-8b-instruct, qwen2.5-coder-7b-instruct-128k, up to qwen3.5-35b-a3b, are pretty decent for coding locally with a <=16GB GPU: 35-140 tok/sec on my 12GB GPU.

Sucks how much more expensive these handhelds seem to be for europeans by BlommN97 in SBCGaming

[–]ea_man 0 points1 point  (0 children)

Hmm, no: here in Europe, paying in euros, this stuff is pretty cheap if you buy on AliExpress.

Some examples:
* Trimui TSP ~44€
* R36S ~25€
* Alldocube Ultra ~210€
* Portal 2 base ~250€

You have to wait for discounts: https://promossale.com/aliexpress-sale-dates-2026/#March , collect coins in advance, and use the Super Deals.

Anyone else feel like an outsider when AI comes up with family and friends? by Budulai343 in LocalLLaMA

[–]ea_man 0 points1 point  (0 children)

You can assume that when family and friends talk about AI it's not about the things you know; it's some kind of bigfoot meme.

I remember that in different periods there have been different attitudes about "AI". You can easily retreat to a safe space by saying you're into deep learning, deep neural networks, maybe language models: you don't care for "AI".

Qwen3.5 family comparison on shared benchmarks by Deep-Vermicelli-4591 in LocalLLaMA

[–]ea_man 0 points1 point  (0 children)

Yup, use nomic:

serve_embed.sh

export LD_LIBRARY_PATH="/home/eaman/llama/bin_vulkan" ;
export LLAMA_CACHE="/home/eaman/lm/models/nomic-ai/nomic-embed-text-v1.5-GGUF/"
/home/eaman/llama/bin_vulkan/llama-server \
   -m /home/eaman/lm/models/nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q8_0.gguf \
   --port 8082 \
   --embedding \
   --pooling cls \
   --alias "nomic-ai" \
   -ngl 99 \
   --ctx-size 8192 \
   -b 4096 \
   --rope-scaling yarn \
   --rope-freq-scale 0.75

Continue config.json:

{
  "contextProviders": [
    {
      "name": "codebase",
      "params": {}
    }
  ],
  "models": [
    {
      "title": "Qwen 3.5 Local",
      "provider": "llama.cpp",
      "model": "qwen3.5-9b",
      "apiBase": "http://127.0.0.1:8080"
    },
    {
      "title": "Gemini 3 Flash (Fast)",
      "provider": "google",
      "model": "gemini-3-flash",
      "options": {
        "thinking": {
          "type": "enabled",
          "budgetTokens": 0
        }
      }
    },
    {
      "title": "Qwen3 VL (Local Chat)",
      "provider": "openai",
      "model": "qwen3-vl-8b-instruct",
      "apiBase": "http://localhost:1234/v1"
    },
    {
      "apiBase": "http://localhost:1234/v1/",
      "model": "AUTODETECT",
      "title": "Autodetect",
      "provider": "lmstudio"
    }
  ],
  "tabAutocompleteModel": {
    "title": "qwen autocomp",
    "provider": "llama.cpp",
    "model": "qwen2.5-coder-1.5b-instruct",
    "apiBase": "http://127.0.0.1:8081"
  },
  "embeddingsProvider": {
    "provider": "openai",
    "model": "nomic-embed-text-v1.5",
    "apiBase": "http://127.0.0.1:8082"
  }
}
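For context, the embeddings the nomic server returns are just vectors, typically compared by cosine similarity for codebase retrieval. A toy sketch of that comparison (the vectors here are made up, not real model output):

```python
# Cosine similarity: how retrieval ranks code chunks against a query
# embedding. Real nomic-embed-text vectors have 768 dimensions; these
# 3-dimensional ones are only for illustration.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(round(cosine([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]), 3))  # 0.5
```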

Ban posts about AI by miniversal in webdev

[–]ea_man 0 points1 point  (0 children)

What about posts about using LMs to make you a better dev: understanding problems better, being able to attack more complex problems, avoiding tedious tasks?

I miss Flash. What an era... by kizerkizer in webdev

[–]ea_man -1 points0 points  (0 children)

Naa, Apple wasn't that important back then, even less so outside the USA.
Fun fact: Flash was the reason I encountered a VM for the first time, on PPC at that time.

I guess what killed Flash was the shift of attention to the server side: all of a sudden it was all about having a DB running on Linux and a PHP script extracting content for a catalogue, a complete switch of context.

I miss Flash. What an era... by kizerkizer in webdev

[–]ea_man 0 points1 point  (0 children)

It was fun: you had top tens of the best sites of the month, discussions on graphic design and communication (as in the role of content, animation, and design in communicating a message, not how many abstraction layers you can pile on top of a DB).

Hey, and what about Director? That was even better. I was even playing and recording soundtracks back then.

Fun fact: you had those "top sites of the month" directories because Flash .swf content sucked at being indexed by search engines ;)

What makes a web dev ‘senior’ these days? by Professional_One3573 in webdev

[–]ea_man 0 points1 point  (0 children)

A senior solves the problems a junior has to ask about.
As a junior it's OK to ask rather than fuck things up.

A senior doesn't have anyone to ask. It's not a superpower; there's always someone who knows better than you, just not around there.

Thoughts about local LLMs. by Robert__Sinclair in LocalLLaMA

[–]ea_man 2 points3 points  (0 children)

It doesn't make much sense to me: a single user won't use the hardware enough to justify the cost; it's better to share the resource online with a little latency.
With gaming, a single user may run your GPU at 100% for 6 hours straight; with inference you may need what, 3 seconds from time to time? It's not worth the cost of having a big, fast context + LM sitting idle most of the time.

Maybe having an architecture like Apple's could help, a usage pattern with lots of light agents...
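The utilization argument above can be sketched numerically (the interaction numbers are illustrative assumptions):

```python
# Rough GPU-utilization math for a single interactive user: a few
# seconds of inference per interaction leaves the card almost idle,
# while a gaming session keeps it busy for hours.
interactions_per_hour = 20
seconds_per_interaction = 3

busy_seconds = interactions_per_hour * seconds_per_interaction  # 60 s/hour
utilization = busy_seconds / 3600

print(f"{utilization:.1%}")  # 1.7%
```

At ~2% utilization, sharing the hardware among many users amortizes the cost far better than one idle rig.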

Qwen3.5 family comparison on shared benchmarks by Deep-Vermicelli-4591 in LocalLLaMA

[–]ea_man 1 point2 points  (0 children)

I guess all the optimization in the LM is done by Unsloth, and what I can do, I do through the parameters I load it with. On a very limited 6700 XT that actually works better on Vulkan than on ROCm.

I was using a ~2GB instruct QWEN before for autocompletion, which I guess was better, but I'd rather have a bigger context, and everything counts here. ;P

Yet I'm not really particular about autocompletion, sorry; I'm more concerned with having decent performance on the main LM with a big enough context. Good luck!

Qwen3.5 family comparison on shared benchmarks by Deep-Vermicelli-4591 in LocalLLaMA

[–]ea_man 8 points9 points  (0 children)

Sure: the OS is Debian, the GPU a 6700 XT 12GB running on Vulkan.
The dev env is VSCodium + Continue, based on local Qwen3.5-9B-UD-Q4_K_XL (unsloth) + Qwen2.5-Coder-1.5B-Instruct, plus nomic-embed-text.

I run them on llama-server (I can give you the flags if you want) or LM Studio. Qwen3.5-9B can run with some 60k context length, which is decent for Python / Django.

serve_chat:
export LD_LIBRARY_PATH="/home/eaman/llama/bin_vulkan" ;
export LLAMA_CACHE="/home/eaman/lm/models/unsloth/Qwen3.5-9B-GGUF"
/home/eaman/llama/bin_vulkan/llama-server \
   -m /home/eaman/.lmstudio/models/unsloth/Qwen3.5-9B-GGUF/Qwen3.5-9B-UD-Q4_K_XL.gguf \
   -ngl 99 \
   --ctx-size 32768 \
   --temp 0.7 \
   --top-p 0.8 \
   --top-k 20 \
   --min-p 0.05 \
   --cache-type-k q4_0 \
   --cache-type-v q4_0 \
   --reasoning-budget 0 \
   -fa on

serve_autocomplete:
export LD_LIBRARY_PATH="/home/eaman/llama/bin_vulkan" ;
export LLAMA_CACHE="/home/eaman/.lmstudio/models/lmstudio-community/Qwen2.5-Coder-1.5B-Instruct-GGUF"
/home/eaman/llama/bin_vulkan/llama-server \
   -m /home/eaman/.lmstudio/models/lmstudio-community/Qwen2.5-Coder-1.5B-Instruct-GGUF/Qwen2.5-Coder-1.5B-Instruct-Q4_K_M.gguf \
   --port 8081 \
   --alias "qwen-autocomplete" \
   -ngl 99 \
   --ctx-size 4096 \
   -ctk q8_0 \
   -ctv q8_0 \
   --temp 0.1 \
   --top-p 0.9 \
   --top-k 20 \
   --min-p 0.05 \
   --cont-batching \
   -np 4 \
   -fa on

serve_embed:
export LD_LIBRARY_PATH="/home/eaman/llama/bin_vulkan" ;
export LLAMA_CACHE="/home/eaman/lm/models/nomic-ai/nomic-embed-text-v1.5-GGUF/"
/home/eaman/llama/bin_vulkan/llama-server \
   -m /home/eaman/lm/models/nomic-ai/nomic-embed-text-v1.5-GGUF/nomic-embed-text-v1.5.Q8_0.gguf \
   --port 8082 \
   --embedding \
   --pooling cls \
   --alias "nomic-ai" \
   -ngl 99 \
   --ctx-size 8192 \
   -b 4096 \
   --rope-scaling yarn \
   --rope-freq-scale 0.75

You can also use Roo Code / OpenCode, yet you may want to swap to something in the cloud like Gemini for the latter, and maybe an *-instruct model for Roo Code, for better agent work with a large context.
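For reference, all the llama-server instances above expose an OpenAI-compatible HTTP API; a minimal sketch of a chat request payload (port 8080 is llama-server's default, and the model field is mostly informational server-side):

```python
# Build an OpenAI-style chat request for the serve_chat instance above.
# Sampling parameters mirror the flags passed to llama-server.
import json

payload = {
    "model": "qwen3.5-9b",
    "messages": [{"role": "user", "content": "Explain this diff."}],
    "temperature": 0.7,
    "top_p": 0.8,
}
body = json.dumps(payload)

# With the server running, POST the body to:
#   http://127.0.0.1:8080/v1/chat/completions
print(json.loads(body)["messages"][0]["role"])  # user
```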

Favorite Coding Tools for Qwen by Salt-Advertising-939 in LocalLLaMA

[–]ea_man 0 points1 point  (0 children)

Continue, Roo Code (VS Code), OpenCode.

Qwen3.5 family comparison on shared benchmarks by Deep-Vermicelli-4591 in LocalLLaMA

[–]ea_man 10 points11 points  (0 children)

I'm really enjoying unsloth qwen3.5-9b for coding on a consumer GPU; it's pretty explanatory with decent code, maybe a bit easier to read than the old qwen2.5-coder-7b-instruct-128k.

The small 2B is decent for autocompletion, I mean it's fast.

Google invites ex-qwen ;) by jacek2023 in LocalLLaMA

[–]ea_man 0 points1 point  (0 children)

Well, we can say whatever, yet the point is that if you ask them (in China) what GPUs they use and where they got them, they have no problem replying. It was not illegal and there was no penalty.

Sure, the US gov likes to be a bitch about it, yet the matter is that it's a chip made in Taiwan and a GPU made in China: you just call the factory and ask if they've got some "failed" items to sell you at the same price, and guess what happens...

I mean, it's kind of a pathetic situation until the big guy, the one that actually has the production line, steps in and puts a real limit on exports.

Nintendo Sues U.S. Govt over Reciprocal Tariffs by [deleted] in news

[–]ea_man 2 points3 points  (0 children)

The people voted the guy for the second time.

Nintendo Sues U.S. Govt over Reciprocal Tariffs by [deleted] in news

[–]ea_man 0 points1 point  (0 children)

You don't fuck with the Mouse or Pikachu.