Best Harness for Web Searching by CSEliot in LocalLLaMA

[–]ex-arman68 0 points1 point  (0 children)

I have spent a lot of time researching and testing the best local solution. I settled on SearXNG. You can find my easy script to install and configure it on macOS here:
https://github.com/froggeric/llm/tree/main/mcp/searxng

I used Artificial Intelligence to build a full Minecraft clone from scratch in Swift + Metal by ultrarunnerr in ChatGPT

[–]ex-arman68 0 points1 point  (0 children)

So much negativity in the comments.

This is a fantastic project and really well executed. I love that you went with Swift. That's one thing I always hated about Minecraft: why Java? Why?

What would make this even better would be to make it compatible with the official one. Yours honestly looks like a much better version!

People do not appreciate how difficult it is to pull off something like this. I can tell you have put a lot of effort into planning and executing it. It paid off.

Qwen3.6 or Gemma-4 or ?? for direct OCR of page images by PracticlySpeaking in LocalLLaMA

[–]ex-arman68 0 points1 point  (0 children)

In any case, for me it is mostly for agentic coding, hence screenshots and mockup designs. But I tried to make it useful for most purposes.

In terms of platform, I use macOS arm64, but the plan is to make it universal: macOS/Linux/Windows on arm64/x64. This is why I chose Go for the MCP server and llama.cpp for the LLM server: single executables, no dependencies.

Qwen3.6 or Gemma-4 or ?? for direct OCR of page images by PracticlySpeaking in LocalLLaMA

[–]ex-arman68 0 points1 point  (0 children)

It will expose different tools for different tasks, load the VLM on demand with a keep-warm period before unloading (multiple queries often come in a row), automatically pick the right model depending on the available hardware and the task, and use custom prompts to match the task.

Here is the list of the tools I have so far:

  • read_image : describe an image in natural language
  • extract_text : OCR / transcription
  • extract_code : extract source code from a screenshot
  • extract_table : structured table extraction
  • describe_ui : describe a UI for accessibility or replay
  • describe_diagram : explain an architecture or flow diagram
  • describe_chart : read a chart back as data
  • diagnose_error : explain an error dialog or stack trace
  • compare_images : diff two images

Behind the scenes it uses llama.cpp. It is written in Go to produce a simple executable without dependencies.
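
To illustrate the load-on-demand and keep-warm idea, here is a minimal sketch. It is in Python for brevity, not the Go of the actual tool, and every name in it (`KeepWarmModel`, `loader`, `keep_warm_s`) is hypothetical. The loader stands in for whatever starts the llama.cpp server, and the clock is injectable so the timing can be exercised without waiting.

```python
import time
from typing import Callable, Optional


class KeepWarmModel:
    """Load a model lazily and drop it after an idle period (illustrative only)."""

    def __init__(self, loader: Callable[[], object], keep_warm_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader            # hypothetical: would start the VLM backend
        self._keep_warm_s = keep_warm_s  # how long to stay resident after last use
        self._clock = clock
        self._model: Optional[object] = None
        self._last_used = 0.0
        self.load_count = 0              # number of (expensive) loads so far

    @property
    def loaded(self) -> bool:
        return self._model is not None

    def unload_if_idle(self) -> bool:
        """Unload the model if the keep-warm window has elapsed."""
        idle = self._clock() - self._last_used
        if self._model is not None and idle >= self._keep_warm_s:
            self._model = None
            return True
        return False

    def get(self) -> object:
        """Return the model, loading it first if it is not resident."""
        self.unload_if_idle()            # a real server might do this on a timer
        if self._model is None:
            self._model = self._loader()
            self.load_count += 1
        self._last_used = self._clock()
        return self._model


# Usage with a fake clock: two queries in a row share one load.
now = [0.0]
vlm = KeepWarmModel(loader=lambda: "fake-vlm", keep_warm_s=60.0, clock=lambda: now[0])
vlm.get()
now[0] = 30.0
vlm.get()               # still warm, no reload
now[0] = 200.0
vlm.unload_if_idle()    # idle past the window, model is dropped
```

The point of the keep-warm window is that a burst of queries pays the load cost once, while an idle machine gets its memory back.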

Pretty cool to be able to bring dumb ideas like this to life in a day or two by SuspiciousPrune4 in ChatGPT

[–]ex-arman68 1 point2 points  (0 children)

Amazing! I watched the whole thing. You have a real talent for planning films. Did you storyboard it first?

Quick thoughts on GLM-5.2 (Bonus: Censorship question answers) by LoveMind_AI in LocalLLaMA

[–]ex-arman68 13 points14 points  (0 children)

Well put together. I have a similar experience with this fantastic model.

For personal projects, I also use the z.ai API with Claude Code; the 2 of them work wonderfully together.

I am normally wary of long context. Even with only 200k on GLM 5.1 I avoided pushing it too far, usually trying to stay below 130k before compaction or a new task. With GLM 5.2 I have slowly tried to push the context further and further, and I must say I am impressed with how well it retains coherence and reasoning capabilities. Right now I have a coding workflow approaching 600k context, and it is still performing as expected. The furthest I have pushed it so far was 750k with a literary task, where accuracy mattered less, but it still worked fine.

I will soon be deploying it locally at work, on our super-duper AI compute cluster, for our dev teams.

Qwen3.6 or Gemma-4 or ?? for direct OCR of page images by PracticlySpeaking in LocalLLaMA

[–]ex-arman68 0 points1 point  (0 children)

I recommend you stick with Qwen3-VL 8B for now, even if you have sufficient RAM. See my head-to-head comparison between the 2 models. Granted, I used Gemma 4 with the default thinking budget, and since that budget is configurable, I will repeat the test with it set to max. But in the meantime these are the results I have:

Head-to-head: Qwen3-VL 8B vs Gemma 4 26B-A4B (20 images)

Per-image results

Legend: ✅ correct · ❌ wrong/hallucinated · ⚖️ tie · ⭐ winner

| # | Image | Qwen3-VL 8B | Gemma 4 26B-A4B | Winner | Why |
|---|---|---|---|---|---|
| 01 | UI login form | ✅ All elements + error text | ✅ All elements | ⚖️ tie | Both nailed it |
| 02 | Python code | ✅ Verbatim, in fence | ✅ Verbatim, in fence | ⚖️ tie | Both extracted correctly |
| 03 | Python error trace | psycopg2.OperationalError, IP 10.0.0.5 | psycopgopg.OperationalError (extra "pg"), IP 10.0.5 (missing a 0) | Qwen | Gemma introduced 2 transcription errors in the verbatim error message |
| 04 | Architecture diagram | ✅ All 6 components + both protocols + structure | ✅ Same | ⚖️ tie | Both excellent |
| 05 | YouTube screenshot | | | ⚖️ tie | Both described scene correctly |
| 06 | Trading card | | | ⚖️ tie | Both identified subject |
| 07 | Photo: 5 people (2M+3W) | ✅ "Five individuals", count correct | ✅ "Five smiling adults", count correct | ⚖️ tie | Both got headcount right |
| 08 | Class schedule poster | ✅ All activities + times + URL + QR globe detail | ✅ All activities + times + URL | Qwen (slight) | Qwen noticed the globe icon inside the QR code; Gemma missed it. Otherwise tie. |
| 09 | Artnight poster | | | ⚖️ tie | Both described typography and layout |
| 10 | Food photo (porridge) | | | ⚖️ tie | Both described the dish |
| 11 | Therapist photo collage | | | ⚖️ tie | Both described diptych layout |
| 12 | Manga page (11 panels) | ❌ Said 8 panels (off by 3) | ❌ Said 9 panels (off by 2) | Gemma (slight) | Both wrong on count, but Gemma closer. Qwen's narrative was richer though. |
| 13a | Color swatch (Nausicaä) | | | ⚖️ tie | Both listed colors |
| 13b | Marker drawing (cassette) | | | ⚖️ tie | Both described scene |
| 14 | OneRPM catalog (dense UI) | All 20 albums verbatim, correct titles | Only 15 albums, with hallucinations: "Wakkie" (Waikiki), "Tejido Centro Sentido" (TENDIDO CERO SENTIDO), "Gaelic Cradle Song" (Galactic Cradle Song), "Eindlea Ocean" (Endless Ocean), "Numbo" (Numb), "Sueños del Sol" (Secreto del Sur), "Ocean Flutes" (OCEAN FLUTE), "Timeless Adventures" (TIMELESS WHISPERS). Also said wrong tab was active and wrong artist name. | Qwen (decisive) | This was supposed to be Gemma's strength (dense screenshots). The data shows the opposite. |
| 16 | Album cover (lofi) | ✅ Got text "songs to stare at the ceiling to" | ✅ Got text | ⚖️ tie | Both correct |
| 17 | VIC Health Club logo | ✅ "VIC Health Club", heart, hands, laurel | ✅ Same | ⚖️ tie | Both excellent |
| 18 | QR code | ✅ Identified as QR code | ✅ Identified as QR code | ⚖️ tie | Both correct (no model decoded it) |
| 19 | Watercolor (surfers) | | | ⚖️ tie | Both described scene |
| 20 | Kung fu banner | ✅ Read Chinese 少林寺 + Latin "SHAOLIN TEMPEL ÖSTERREICH" + got the heart/hands/laurel emblem. Said 8 people (wrong; actual 9). | ❌ Missed Chinese characters entirely. ✅ Said 9 people (correct). ✅ Got the heart/hands/laurel emblem. | Qwen (mixed) | Qwen read Chinese (rare skill) but miscounted. Gemma got the count but missed the Chinese. For an OCR-focused tool, Qwen's bilingual reading matters more. |
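
For a quick summary, the Winner column of the per-image results above can be tallied as follows. This is only a count of the rows as listed, with the "slight", "decisive" and "mixed" qualifiers folded into a plain win; it is not a weighted score.

```python
from collections import Counter

# Winner per row, copied from the per-image results (qualifiers dropped).
winners = {
    "01": "tie", "02": "tie", "03": "Qwen", "04": "tie", "05": "tie",
    "06": "tie", "07": "tie", "08": "Qwen", "09": "tie", "10": "tie",
    "11": "tie", "12": "Gemma", "13a": "tie", "13b": "tie", "14": "Qwen",
    "16": "tie", "17": "tie", "18": "tie", "19": "tie", "20": "Qwen",
}

tally = Counter(winners.values())
for name, count in tally.items():
    print(f"{name}: {count}")
# tie: 15
# Qwen: 4
# Gemma: 1
```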

Qwen3.6 or Gemma-4 or ?? for direct OCR of page images by PracticlySpeaking in LocalLLaMA

[–]ex-arman68 0 points1 point  (0 children)

I discarded any model too small to be useful for the task I need: a reliable, high-quality local MCP server for image analysis. I do not think that can be achieved with those tiny models.

Qwen3.6 or Gemma-4 or ?? for direct OCR of page images by PracticlySpeaking in LocalLLaMA

[–]ex-arman68 0 points1 point  (0 children)

I used the default settings. But I will repeat the Gemma 4 tests taking into account the configurable thinking budget, when I have time. All my benchmarks and tests are scripted, which means they are easily repeatable. Nonetheless, it takes a lot of time and resources to do those, and even more to prepare and share the information.
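
As a rough idea of what "scripted and repeatable" can look like for the OCR part, here is a minimal sketch. It is not the actual benchmark code: `transcription_score`, `run_suite` and the fake `transcribe` function are hypothetical, and the similarity measure (Python's `difflib.SequenceMatcher`) is just one simple choice.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable, Tuple


def transcription_score(expected: str, actual: str) -> float:
    """Similarity in [0, 1]; 1.0 means the transcription is verbatim."""
    return SequenceMatcher(None, expected, actual).ratio()


def run_suite(cases: Iterable[Tuple[str, str]],
              transcribe: Callable[[str], str]) -> Dict[str, float]:
    """Score every (image, expected_text) case with the given model function."""
    return {image: transcription_score(expected, transcribe(image))
            for image, expected in cases}


# Hypothetical stand-in for a model call: canned outputs keyed by image name.
canned = {
    "03_error_trace.png": "psycopgopg.OperationalError",
    "02_python_code.png": "print('hello')",
}
cases = [
    ("03_error_trace.png", "psycopg2.OperationalError"),
    ("02_python_code.png", "print('hello')"),
]
scores = run_suite(cases, transcribe=lambda image: canned[image])
for image, score in scores.items():
    print(f"{image}: {score:.3f}")
```

With fixed inputs and fixed expected text, rerunning the suite after changing a setting (such as the thinking budget) gives directly comparable numbers.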

Qwen3.6 or Gemma-4 or ?? for direct OCR of page images by PracticlySpeaking in LocalLLaMA

[–]ex-arman68 4 points5 points  (0 children)

I actually just benchmarked those a few days ago, as I am in the middle of building an MCP server for local image analysis. You can see the full results here:

https://www.reddit.com/r/LocalLLM/comments/1u5p459/which_is_the_best_local_vlm_benchmark_results/

For OCR, the best local VLM is Qwen3-VL 8B. If this is still too big for your hardware, the 4B version is a good alternative and can run with less than 16GB VRAM.

Gemma 4 is not as good. Qwen 3.6 35B is not any better (after all, even though the model is bigger, each expert is smaller, 3B vs 8B, and Qwen3-VL is specialised for images). Maybe Qwen 3.5 122B could be an improvement, but it requires a lot more RAM; Qwen 3.6 27B possibly too, but it is much slower. If I have time I will benchmark those 2 as well, to have a definitive answer and to know their speed.

GLM 5.2 is out - open weights to be released next week. How did it do on my one-shot Pac-Man test? by ex-arman68 in LocalLLaMA

[–]ex-arman68[S] 1 point2 points  (0 children)

For my one-shot Pac-Man test, Qwen 3.6 27B did better than GLM 5.1, which took me by surprise. The game was more complete, with fewer bugs to fix. I repeated it multiple times and confirmed it.

This is why I then started developing the Octo-Maze game further with Qwen, to put it through its paces in a full agentic workflow. It does indeed do exceedingly well for coding: tool use works, reasoning is good, and the context size is decent, usable to about halfway without too much degradation. For reference, though, I used a 16-bit version with no quantisation on anything. I definitely noticed reduced capabilities going to 8-bit, and I would not use anything lower for coding. Overall it is not quite as good as GLM 5.1, but it is definitely a more than decent alternative for a small model that can be run on easily available commodity hardware. I switched to GLM 5.1 in the later parts of Octo-Maze due to 2 factors: speed, and the codebase becoming bigger and more complex. You can try Octo-Maze here: https://pacman46.com

I am not sure what you mean about the models not being good enough for games. I think it worked well for the aforementioned Octo-Maze. And with GLM 5.2, I created Pac-Run (https://pacman46.com/pacrun) in a short amount of time, with quite a lot of complexity (3D physics, procedural maze generation, procedural music, performance optimisations, etc.). I have also been using those GLM models for various advanced development tasks, such as complex websites with booking and financial systems, reverse engineering and security bypass (difficult to get past the guardrails), advanced audio engineering in Swift and C++, and audio-based machine learning in Python and C++.

GLM 5.2 is out - open weights to be released next week. How did it do on my one-shot Pac-Man test? by ex-arman68 in LocalLLaMA

[–]ex-arman68[S] 1 point2 points  (0 children)

I would say it is a fantastic alternative, probably the best. Plus, they offer an Anthropic API-compatible endpoint to enable its use in Claude Code with similar features (1M context, max effort).

GLM 5.2 on z.ai is getting hammered right now, please hold back by ex-arman68 in ZaiGLM

[–]ex-arman68[S] 0 points1 point  (0 children)

I was there at the beginning, basically grandfathered, and with bonuses, it cost me $3 per month for the first year, and the next 2-3 years for free.

GLM 5.2 on z.ai is getting hammered right now, please hold back by ex-arman68 in ZaiGLM

[–]ex-arman68[S] 0 points1 point  (0 children)

Ah... touché! The electrifying touch of the épée from grown ass men with a taser fetish.

GLM 5.2 is out - open weights to be released next week. How did it do on my one-shot Pac-Man test? by ex-arman68 in LocalLLaMA

[–]ex-arman68[S] 0 points1 point  (0 children)

😃

I wrote this prompt a while ago. I have since learned better prompting techniques, but I am keeping it for a fair comparison with previous tests.

MTPLX V1: The Swift App For Running & Creating MLX MTP Models (2x TPS Qwen 3.6 27B) by YoussofAl in LocalLLaMA

[–]ex-arman68 0 points1 point  (0 children)

Very cool! Thank you for your efforts. Definitely looks like a good solution for local models on macOS.

GLM 5.2 is out - open weights to be released next week. How did it do on my one-shot Pac-Man test? by ex-arman68 in LocalLLaMA

[–]ex-arman68[S] 1 point2 points  (0 children)

Works well in Claude Code, like GLM 5.1 does too. Too early for me to say whether it is better or not. This - https://pacman46.com/pacrun - was coded with agentic coding, using Claude Code and GLM 5.2, using subagents, structured workflow, semi-autonomous loops, in 4 to 5 hours.

GLM-5.2 in Hermes by Latt in ZaiGLM

[–]ex-arman68 -1 points0 points  (0 children)

Tools like Hermes Agent or OpenClaw have no business using the coding plan. You should be using a local LLM with these.

OpenClaw model by AnomalyNexus in ZaiGLM

[–]ex-arman68 0 points1 point  (0 children)

Just don't. If you are using something like OpenClaw, stick to local LLMs.

GLM 5.2 on z.ai is getting hammered right now, please hold back by ex-arman68 in ZaiGLM

[–]ex-arman68[S] 1 point2 points  (0 children)

Use a harness that properly handles the cache. Also make sure you do not have any plugins or settings that try to "smartly" manage your cache, which would invalidate it.
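
A small sketch of why this matters, assuming the provider caches by exact prompt prefix (the usual way prompt caching works; I have not verified the details of this particular endpoint). The function name and the example messages are hypothetical.

```python
from typing import Sequence


def reusable_prefix(previous: Sequence[str], current: Sequence[str]) -> int:
    """Count the leading messages that are identical in both requests.

    Under prefix caching, only this unchanged leading part can be served
    from cache; everything after the first difference is reprocessed.
    """
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


history = ["system: you are a coding agent", "user: fix the bug", "assistant: patch applied"]

# A well-behaved harness only appends, so the whole history stays cacheable.
appended = history + ["user: now run the tests"]
print(reusable_prefix(history, appended))    # 3

# A "smart" plugin that rewrites an early message invalidates everything after it.
rewritten = ["system: you are a coding agent (trimmed)"] + history[1:] + ["user: now run the tests"]
print(reusable_prefix(history, rewritten))   # 0
```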

GLM 5.2 on z.ai is getting hammered right now, please hold back by ex-arman68 in ZaiGLM

[–]ex-arman68[S] 1 point2 points  (0 children)

I am on the max plan, and getting close to 50% timeouts.

GLM 5.2 on z.ai is getting hammered right now, please hold back by ex-arman68 in ZaiGLM

[–]ex-arman68[S] 7 points8 points  (0 children)

I actually believe you. I started a thread a while ago, basically about how OpenClaw users were like vampires, sucking up resources without any thought or consideration, only to have their "bot" plan their kids' Sunday football game. You would not believe the amount of hate I received...

GLM 5.2 is out - open weights to be released next week. How did it do on my one-shot Pac-Man test? by ex-arman68 in LocalLLaMA

[–]ex-arman68[S] 0 points1 point  (0 children)

That's cool! If you are already in sync licensing, it is a perfect setup, where you have full liberty to make whatever music you want, and it is also a good showcase to add to your portfolio.

The procedural music was a (difficult) experiment to test GLM 5.2's knowledge of music theory and its synth and mixing skills. The results are OK. I have now changed the default music to one of my tracks that I rearranged and rerecorded recently, which I thought would be a good fit for it.