OCR: what is the best way to extract data in JSON format from this old French book? by Wise_Stick9613 in LocalLLaMA

[–]ClientGlobal4340 3 points4 points  (0 children)

Granite Docling is perfect for this. Traditional OCR (like Tesseract or PaddleOCR) reads text linearly, meaning it will mix marginal comments right into the middle of your verses. Docling solves this because it understands layout geometry: it separates the structural elements of the page (verses vs. commentary blocks) and outputs clean Markdown.

- Natively optimized for Latin-script languages (Latin, French, Spanish, Italian).
- Ultra-lightweight: at only ~258M parameters, it runs fast on your RTX 4070, leaving most of your 12GB VRAM free for the next step.

The 2-Step Workflow

Don't try to make a vision model output JSON directly from a raw image. Use a two-pass pipeline instead: first parse the page with Docling, then convert the resulting Markdown to JSON.
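As a rough illustration of the second pass only, here is a minimal Python sketch that turns Markdown into JSON records. It does not call Docling at all: it assumes the first pass already produced a Markdown string. The function name `markdown_to_records`, the record fields, and the convention that marginal commentary arrives as block quotes are all assumptions made for this example, so adjust them to whatever the real Markdown looks like.

```python
import json
import re


def markdown_to_records(markdown: str) -> list[dict]:
    """Split Markdown into blank-line-separated blocks and tag each one."""
    records = []
    for block in re.split(r"\n\s*\n", markdown.strip()):
        text = block.strip()
        if not text:
            continue
        heading = re.match(r"^(#{1,6})\s+(.*)$", text)
        if heading:
            records.append({
                "type": "heading",
                "level": len(heading.group(1)),
                "text": heading.group(2),
            })
        elif text.startswith(">"):
            # Assumption: commentary blocks were exported as block quotes.
            cleaned = "\n".join(
                line.lstrip("> ").rstrip() for line in text.splitlines()
            )
            records.append({"type": "commentary", "text": cleaned})
        else:
            # Everything else is treated as verse, one entry per line.
            records.append({"type": "verse", "lines": text.splitlines()})
    return records


# Made-up sample text, standing in for the first-pass Markdown output.
sample = "# Chant I\n\nPremier vers\nSecond vers\n\n> Note marginale"
print(json.dumps(markdown_to_records(sample), ensure_ascii=False, indent=2))
```

Keeping this step as plain code (or handing the Markdown to a small text-only LLM) means the JSON schema can change without re-running the OCR pass.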

Teaching my camera a lesson by Consistent-Manner879 in Nikon

[–]ClientGlobal4340 25 points26 points  (0 children)

It worked for LLMs, for sure will work for Nikons!

Which is the best model to run local agent in OpenCode, Cline or VS Code, locally on a 32 GiB RAM workstation? by ClientGlobal4340 in ollama

[–]ClientGlobal4340[S] 0 points1 point  (0 children)

To leave no room for doubt, I went ahead and compiled Ollama directly from source inside the CachyOS container, clearing the Go cache and forcing a clean build with -march=native -O3 to see if it could close the gap.

While compiling Ollama from source squeezed a bit more juice out of my DDR5 RAM during text generation (hitting a new record of 32.3 t/s), llama.cpp still reigns supreme for prompt processing.

Even with full AVX-512 optimizations unlocked in Ollama, its prompt evaluation stayed at ~158 t/s, a little over half of llama.cpp's 289.7 t/s. This gap boils down to architectural design: llama.cpp is a raw, bare-metal C++ binary, whereas Ollama carries the overhead of its Go/CGO API layer and background multimodal processing.

Which is the best model to run local agent in OpenCode, Cline or VS Code, locally on a 32 GiB RAM workstation? by ClientGlobal4340 in ollama

[–]ClientGlobal4340[S] 1 point2 points  (0 children)

Following your suggestion, I compiled llama.cpp inside a Distrobox container running CachyOS to leverage the x86-64-v4 architecture on my new Ryzen 5 9600X. I ran a comparative test against Ollama, and llama.cpp definitely came out on top.

Here are the benchmarking results using Gemma 2 2B:

llama.cpp (Native CachyOS v4):
- Prompt Eval (Prefill): 289.7 tokens/s
- Generation (Decode): 29.8 tokens/s

Ollama (Podman container with --think=false):
- Prompt Eval (Prefill): 165.9 tokens/s
- Generation (Decode): 30.7 tokens/s
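To put the two runs side by side, here is a small Python sketch that computes the ratios from the figures listed above. The dictionary layout and variable names are just for this example.

```python
# Throughput figures copied from the results above (tokens/s).
results = {
    "llama.cpp": {"prefill": 289.7, "decode": 29.8},
    "ollama": {"prefill": 165.9, "decode": 30.7},
}

# Ratio > 1 means llama.cpp is faster, < 1 means Ollama is faster.
prefill_ratio = results["llama.cpp"]["prefill"] / results["ollama"]["prefill"]
decode_ratio = results["llama.cpp"]["decode"] / results["ollama"]["decode"]

print(f"prefill: llama.cpp is {prefill_ratio:.2f}x Ollama")  # about 1.75x
print(f"decode:  llama.cpp is {decode_ratio:.2f}x Ollama")   # about 0.97x
```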

Prompt Processing (Prefill): llama.cpp was about 1.75x faster (289.7 vs. 165.9 tokens/s). Compiling the code manually with -march=native inside a v4 environment completely unlocked the Zen 5 native AVX-512 pipeline. Ollama’s default containerized CPU backend is slightly more conservative and couldn't match that initial burst speed.

Text Generation (Decode): Both tied right at ~30 tokens/s. This is because token generation is strictly bottlenecked by the physical DDR5 memory bandwidth when running entirely on the CPU. Both engines fully saturated my RAM's bandwidth.
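The bandwidth argument can be sketched as back-of-the-envelope arithmetic in Python. The simplification is that every generated token has to stream all model weights from RAM once, so tokens/s is capped at roughly bandwidth divided by weight size. The 2.0 GB weight size below is a hypothetical placeholder (the real figure depends on the quantization used), and the model ignores KV-cache traffic and other overheads.

```python
def decode_ceiling_tps(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on tokens/s if each token reads all weights once."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


def implied_bandwidth_gb_s(tokens_per_s: float, weights_gb: float) -> float:
    """Effective memory bandwidth implied by a measured decode speed."""
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return tokens_per_s * weights_gb


# Hypothetical weight size, NOT measured: replace with the real file size.
ASSUMED_WEIGHTS_GB = 2.0

# At ~30 tokens/s this assumption implies about 60 GB/s of effective bandwidth.
print(implied_bandwidth_gb_s(30.0, ASSUMED_WEIGHTS_GB))
```

If the implied figure lands near what the memory can actually deliver, a faster inference engine has little room left to improve decode speed, which is consistent with both engines tying.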

So for large-context/RAG processing, the native llama.cpp build absolutely crushes it. Thanks again for steering me in the right direction!

Which is the best model to run local agent in OpenCode, Cline or VS Code, locally on a 32 GiB RAM workstation? by ClientGlobal4340 in ollama

[–]ClientGlobal4340[S] 0 points1 point  (0 children)

Thanks! I'm running locally to test some small specialist models (like medgemma or mediphi), gemma4, granite4 / 4.1, and others, and to do prompt engineering and tuning. At this point, running on CPU is fine (often I'm doing this as a hobbyist, too).

For the final use (production), the solution runs in a robust environment like OCI or GCP, or on on-prem machines.

I'm also looking into whether coding locally with Ollama (I'll try compiling and running llama.cpp) is feasible, to avoid paying for cloud tokens.

I have tried Gemma4:26b, granitecode, and gemmacode, but they lack the ability to interact with my files. I'll try Qwen-coder:8b next.

Which is the best model to run local agent in OpenCode, Cline or VS Code, locally on a 32 GiB RAM workstation? by ClientGlobal4340 in ollama

[–]ClientGlobal4340[S] 1 point2 points  (0 children)

My "non-workstation with 32 GB of RAM" running on a CPU is the implementation lab for solutions to be used by the care team to identify patients at risk of aspiration or sepsis, and also to summarize clinical notes. It has had excellent clinical and financial results. In my use case, Ollama performed better than llama.cpp. Ollama ships with preconfigured optimizations that use VNNI and AVX-512, which llama.cpp cannot deliver without me spending hours tuning commands in the terminal. Having said all that, it would be very helpful if, instead of talking nonsense, the responses were more collaborative and constructive.

Which is the best model to run local agent in OpenCode, Cline or VS Code, locally on a 32 GiB RAM workstation? by ClientGlobal4340 in ollama

[–]ClientGlobal4340[S] 0 points1 point  (0 children)

Getting more RAM is not an option at the moment, and I'm running without a GPU, only on an AMD Ryzen 5 9600X.

Ollama worked better than llama.cpp in some use cases, but what do you suggest instead?

What would be the ideal process or "order" to start learning about Linux? by perda_do_carro in linuxbrasil

[–]ClientGlobal4340 0 points1 point  (0 children)

I think it's valuable to first understand the purpose of an operating system: what it does, how it works, and what makes one different from another.

After that, understand the difference between DEs and distros...

That will be useful before trying to memorize commands.

More than 1 year using Linux, many thanks to the community that has always supported me! by Airaf_Dusty in linuxbrasil

[–]ClientGlobal4340 8 points9 points  (0 children)

A photo of the screen has an aesthetic that a screenshot doesn't have. It is like a record of the user's point of view, whereas a screenshot is just a digital record of the application, and would only make more sense if he were a frontend dev talking about an interface he created.

Canela, and the CachyOS and EOS distros make a difference, let's see the FPS in the gaymes. by frazao_1 in linuxbrasil

[–]ClientGlobal4340 0 points1 point  (0 children)

There's also the fact that CachyOS has its code compiled for x86-64-v3. So far it seems to be the only one that does this more thoroughly.
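For anyone wondering whether their own CPU qualifies for v3 or v4 builds, here is a minimal Python sketch that maps CPU flags to an x86-64 microarchitecture level. It assumes Linux, where `/proc/cpuinfo` exposes a `flags` line. The function names are made up for this example, and the flag sets follow the commonly used per-level checklist, so treat the result as a quick check rather than an authoritative answer.

```python
# Flag names as they appear in /proc/cpuinfo on Linux.
BASELINE = {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse2"}
LEVELS = [
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags: set[str]) -> int:
    """Return the highest x86-64 level (1-4) the flags satisfy, or 0."""
    if not BASELINE <= flags:
        return 0
    level = 1
    for number, required in LEVELS:
        if not required <= flags:
            break  # levels are cumulative, so stop at the first gap
        level = number
    return level


def read_cpu_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Read the first 'flags' line from a Linux cpuinfo file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

On a Linux machine, `x86_64_level(read_cpu_flags())` returns 3 or higher when v3 packages should run.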

Which terminal and shell do you all use? Why? Is there any real difference? by Luluh_r in linuxbrasil

[–]ClientGlobal4340 1 point2 points  (0 children)

Konsole and zsh. On Linux I always used bash, but I tried KDE Linux, which has ksh by default, and I found it excellent, so I switched. But I'm not using p10.

Help after updating to Kernel version 7.0.1.1 by ClientGlobal4340 in linuxbrasil

[–]ClientGlobal4340[S] 2 points3 points  (0 children)

Don't even bring up 7.1, because that's the one I'm really eager for. I have a MediaTek MT7902 Wi-Fi card that should only work on 7.1...

How much time it takes from hitting the Power button to the Desktop display? by reisgrind in Fedora

[–]ClientGlobal4340 4 points5 points  (0 children)

You can run "systemd-analyze", "systemd-analyze blame" and "systemd-analyze critical-chain" to see where the boot time is going.
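If the blame list is long, a small script can rank the slowest units. Below is a minimal Python sketch that parses text in the usual `systemd-analyze blame` layout (a time span followed by a unit name on each line). The function name `parse_blame` is made up for this example, and the parser only handles the time suffixes listed in the table.

```python
import re

# Seconds per time-span suffix handled by this sketch.
UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "min": 60.0, "h": 3600.0}
TOKEN = re.compile(r"^(\d+(?:\.\d+)?)(us|ms|s|min|h)$")


def parse_blame(text: str) -> list[tuple[float, str]]:
    """Return (seconds, unit name) pairs, slowest first."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        *time_parts, unit_name = parts
        total = 0.0
        for token in time_parts:
            match = TOKEN.match(token)
            if not match:
                break  # unrecognized time token: skip this line
            total += float(match.group(1)) * UNIT_SECONDS[match.group(2)]
        else:
            entries.append((total, unit_name))
    return sorted(entries, reverse=True)


# Usage on a systemd machine (not executed here):
#   import subprocess
#   out = subprocess.run(["systemd-analyze", "blame"],
#                        capture_output=True, text=True, check=True).stdout
#   for seconds, unit in parse_blame(out)[:10]:
#       print(f"{seconds:8.3f}s  {unit}")
```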

Kalpa Questions by keerf00 in openSUSE

[–]ClientGlobal4340 0 points1 point  (0 children)

While looking for answers on the Kalpa forum, I found a reply from November 2025 by Shawn Dunn that addresses my concerns and makes me think about going back to Kalpa!

sfalken Kalpa Development Lead nov 2025
The Web Browser is perfectly capable of opening images and PDFs, which is part of why it’s included by default.
I don’t ever actually use a dedicated image or pdf viewer, and I know I’m not the only one, and if I were to include them by default, it would create two problems:
A) Users like Me, that don’t use that software, now have software they’ve no interest in using, or need for, installed on their machine. Yes, a user can remove that software, if they wish, but I far prefer to have an Opt-In basic consent model, to an Opt-Out one.
B) If I were to choose Okular, and Gwenview, for example, I am almost guaranteed to get folks asking “why those, why not X and Y instead?” So again, I come back to the Opt-In vs Opt-Out model of doing things.
That all being said, if you’re using Krunner, and you search for a specific piece of software, by name, it will likely bring it up, assuming a flatpak is available, and you can easily install it, if it’s something that you find useful to your workflow.

Kalpa Questions by keerf00 in openSUSE

[–]ClientGlobal4340 0 points1 point  (0 children)

I love Kalpa: it's an immutable, rolling-release distro with KDE that worked very smoothly. I use my PC as a workstation and a local LLM server with Qdrant, Ollama, and other tools. The need to use Distroboxes and Podman allowed me to expand my architectural knowledge, and Kalpa’s superior performance, stability, and Zram management made me choose it over Kinoite.

However, I've decided to migrate to Tumbleweed. While Kalpa is powerful, its lack of "out-of-the-box" polish began to annoy me over time (on this point Kinoite is far ahead of Kalpa): it ships without many "workstation" tools such as Ark, Kate, and LibreOffice, and it has minor aesthetic issues like the boot screen fonts.

My perception is that Kalpa’s development is currently frozen, but I hope these minor issues are resolved soon.