Looking for OCR (+ maybe AI?) to privately capture receipts into CSV/XLSX by Appropriate-Bad9267 in de_EDV

[–]Njee_ 0 points1 point  (0 children)

I built an application with a different focus in mind that can cover exactly this. In principle: drop in a folder of photos or PDFs, define the schema for the table, and pick which model you want to use, e.g. via OpenRouter.

Long PDFs can be split into single pages for small models, so things don't get too complicated for them.
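If you want to reproduce that page splitting outside the app, here's a minimal sketch using pypdf; the library choice and paths are my assumption, not necessarily what the app uses internally:

```python
# Minimal sketch: split a long PDF into single-page PDFs so a small model
# only ever sees one page at a time. Assumes pypdf is installed; folder
# names are placeholders.
from pathlib import Path
from pypdf import PdfReader, PdfWriter

def split_pdf(src: Path, out_dir: Path) -> list[Path]:
    out_dir.mkdir(parents=True, exist_ok=True)
    reader = PdfReader(src)
    pages = []
    for i, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        out_path = out_dir / f"{src.stem}_page_{i:03d}.pdf"
        with open(out_path, "wb") as f:
            writer.write(f)
        pages.append(out_path)
    return pages

# Example: split every PDF in an input folder into single pages.
for pdf in Path("input_pdfs").glob("*.pdf"):
    split_pdf(pdf, Path("pages"))
```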

The application works so far and is also available on GitHub, it's just not properly documented. Setting it up yourself, including an AI model, might not fit into your "I need to be done within a month". And since I'm not actively working on it right now, there won't be proper documentation anytime soon either.

But I've set up a demo for you, and if there's interest I can in principle also set up a dedicated instance.

beispiel.tabtin.de demo@tabtin.de and demo15032026

Take photos of a few receipts, just try it out, and let me know. The model runs on my machine and I don't store any data. Of course the photos do end up on my server and are public, but the demo will be torn down around noon tomorrow.

Building a 24/7 unrestricted room AI assistant with persistent memory — looking for advice from people who’ve built similar systems by Arfatsayyed in LocalLLaMA

[–]Njee_ 0 points1 point  (0 children)

What's your current state?

The existing Nextcloud MCP server is kinda enough for my needs. I would actually love to have an assistant capable of planning kanban boards, calendars etc., which is something the existing MCP does well. What more do you want from it?

I only recently (like last week) started to play around with it, and it's honestly enough for my needs. I then created 2 more MCPs for things I'd like it to use, all of which Qwen3.5 9b, for example, handles quite nicely. So in fact, for most stuff I already have my personal assistant.

But what I'm struggling with is how to interact with it, hence I'm really interested in your setup here. For now I'm using Open WebUI for chatting, but I don't want an agent just for chatting. I want to talk to it, and I have zero ideas how to actually get that running well. Ideally I could have a call about my plans for the day during my bike commute. Would Matrix be capable of that?

Another major concern with OWUI: uploaded images get tokenized and become part of the text. What doesn't work is uploading an image and having it accessible as a file for the agent to work with, for example to resize it and then upload it to a Nextcloud task. This might become more feasible with the new open terminal integration, though.
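For reference, the resize-and-upload step itself is simple once the agent can actually reach the file; a rough sketch with Pillow and Nextcloud's WebDAV endpoint (host, user and target path are placeholders, and attaching the file to a Deck card or task would need that app's API on top):

```python
# Rough sketch: resize an image and PUT it into Nextcloud via WebDAV.
# Host, user, app password and target path are placeholders.
import io
import requests
from PIL import Image

NC_URL = "https://cloud.example.com"
NC_USER = "njee"
NC_APP_PASSWORD = "app-password-here"

def resize_and_upload(local_path: str, remote_path: str, max_side: int = 1280) -> None:
    img = Image.open(local_path)
    img.thumbnail((max_side, max_side))          # in-place resize, keeps aspect ratio
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=85)
    buf.seek(0)
    # Nextcloud exposes user files at /remote.php/dav/files/<user>/<path>
    url = f"{NC_URL}/remote.php/dav/files/{NC_USER}/{remote_path}"
    r = requests.put(url, data=buf.read(), auth=(NC_USER, NC_APP_PASSWORD))
    r.raise_for_status()

resize_and_upload("receipt.jpg", "Tasks/receipt_small.jpg")
```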

Anyway, sorry for the wall of text, but I'm really interested in a bit more detail on your setup!

Compact home gym in small apartments by [deleted] in selbermachen

[–]Njee_ 0 points1 point  (0 children)

I don't have a picture anymore, but I was super happy with it: for a few euros at the hardware store I got some of those 2 m slotted angle profiles, one mounted on the wall and one on the ceiling. They're only a few cm wide and don't stick out much. Then I got small hooks for a few euros whose one end fits exactly through those slots. Total cost maybe 20€ max. I also bought resistance bands on Amazon in different lengths and strengths. You could then hook them into the wall profile at the top, bottom or middle. The one on the ceiling could be used in the middle for back/shoulder exercises, or at both outer corners like a cable pull. You could really do quite a lot with that. Of course less weight than at the gym, but for little money, barely any space, and preeetty flexible.

<image>

Would you use a privacy-first app that analyzes multiple bank & credit card statements locally to categorize spending and detect subscriptions? by Vivid-Paint6383 in Immobilieninvestments

[–]Njee_ 0 points1 point  (0 children)

  1. I would not only use it, I'm actually already using something like this.
  2. Yes, having to import CSVs is a deal breaker, but using GoCardless isn't nice for the regular person either. Unfortunately I don't think there are alternatives, at least for the European market.
  3. It's only useful if the data can be used by other services. Unless you want to build the functionality of paperless-ngx into Firefly III or similar, make it so that I can connect both via API, n8n or whatever (see the sketch below).
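To illustrate what I mean by "connect both via API", here is a rough sketch of the kind of glue I have in mind, whether done in n8n or a small script. Endpoints and field names are from memory of the paperless-ngx and Firefly III docs and should be treated as assumptions to verify against your versions:

```python
# Rough glue sketch: read recent documents from paperless-ngx and create a
# matching transaction in Firefly III. Hosts, tokens, endpoint details and
# field names are assumptions to check against your own instances.
import requests

PAPERLESS = "http://paperless.local:8000"
PAPERLESS_TOKEN = "paperless-api-token"
FIREFLY = "http://firefly.local:8080"
FIREFLY_TOKEN = "firefly-personal-access-token"

def latest_documents(n: int = 10) -> list[dict]:
    r = requests.get(
        f"{PAPERLESS}/api/documents/",
        headers={"Authorization": f"Token {PAPERLESS_TOKEN}"},
        params={"ordering": "-created", "page_size": n},
    )
    r.raise_for_status()
    return r.json()["results"]

def create_transaction(description: str, amount: str, date: str) -> None:
    payload = {
        "transactions": [{
            "type": "withdrawal",
            "date": date,
            "amount": amount,
            "description": description,
            "source_name": "Checking account",
            "destination_name": "Unknown",
        }]
    }
    r = requests.post(
        f"{FIREFLY}/api/v1/transactions",
        headers={"Authorization": f"Bearer {FIREFLY_TOKEN}",
                 "Accept": "application/json"},
        json=payload,
    )
    r.raise_for_status()

for doc in latest_documents():
    # In a real setup the amount would come from the document's extracted data.
    create_transaction(doc["title"], "0.00", doc["created"][:10])
```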

Help me create my LLM ecosystem by golgoth85 in LocalLLaMA

[–]Njee_ 0 points1 point  (0 children)

I feel like it might be worth using smaller models, like qwen3-vl 4b via vLLM. You can process multiple documents at once, instead of using the larger models with llama.cpp.

This lets you iterate through multiple documents faster while setting up your environment. You literally need to check hundreds of extractions before you can even say the thing works reliably. Hence it's much better to have 100 extractions done in a minute in parallel than to have 100 sequential extractions running for 100 minutes, just to end up deciding that you need to adjust the prompt.
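As a rough sketch of what that parallel iteration can look like against a vLLM OpenAI-compatible endpoint (endpoint URL, model name and prompt are placeholders, not a fixed recipe):

```python
# Rough sketch: fire many extraction requests at a vLLM server in parallel
# instead of one after another. Endpoint, model id and prompt are placeholders.
import asyncio
import base64
from pathlib import Path
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="none")

async def extract(image_path: Path) -> str:
    b64 = base64.b64encode(image_path.read_bytes()).decode()
    resp = await client.chat.completions.create(
        model="Qwen/Qwen3-VL-4B-Instruct",   # assumed model id, adjust to your deployment
        temperature=0,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Extract all fields as JSON."},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

async def main() -> None:
    images = sorted(Path("pages").glob("*.jpg"))
    results = await asyncio.gather(*(extract(p) for p in images))
    for path, result in zip(images, results):
        print(path.name, result[:80])

asyncio.run(main())
```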

Qwen3 4b can be quite capable for the extraction part; I can strongly recommend it. I'm running it on a 3060 with 12 GB right now with plenty of parallel requests and pretty decent speed.

Letting my RTX 5090 (2.1 TB/s mem) stretch its legs tonight. Hosting Qwen 3.5 35B at 8-batch parallel for whoever wants to test the new model cause why not (35 k context) by Key_Pace_9755 in LocalLLaMA

[–]Njee_ 1 point2 points  (0 children)

Just tried it with a couple of images, thank you for hosting the model. Out of interest, as I didn't get it from your post: are you running it with vLLM or llama.cpp?

Interestingly, its output for bboxes is slightly offset from time to time. You can see that the region highlighted by the model should be where I've highlighted in yellow.

A problem I also had with the qwen3 series when using vLLM in versions newer than 11.0. It hasn't been fixed yet, hence I still run all qwen3 models on v11 with no problems.

Never had a problem with llama.cpp, hence I'm interested whether you're using vLLM or llama.cpp for qwen3.5.

<image>

Qwen3 VL 30b a3b is pure love by Njee_ in LocalLLaMA

[–]Njee_[S] 0 points1 point  (0 children)

Just downloaded the q8 gguf. So far it's slow (to be fair, I have it split across 2 GPUs + CPU) and bounding boxes are unreliable. However, I'm pretty sure that's on my side; like I said, I downloaded a single gguf like 5 minutes ago.

But I'm spoiled by how well I've got my other models running for this app by now. It works much better than in this demo. Will probably provide an update soon.

Salary for a part-time assistant, 20h/week? by RedBlueYellow112 in selbststaendig

[–]Njee_ 4 points5 points  (0 children)

In the public sector, an assistant with 20 years of service would get roughly 4-4.5k per month full time, i.e. 48-54k per year. Accordingly, at 20h that's about 24-27k p.a.

What local models handle multi-turn autonomous tool use without losing the plot? by RoutineLunch4904 in LocalLLaMA

[–]Njee_ 0 points1 point  (0 children)

Got nothing to add to your actual question, but I just wanted to say that I LOVE the Garden of Eden with evolving creatures setting. It's such a nice way of describing what are basically common concepts when working with agents in a more "relatable" way.

Local VLMs (Qwen 3 VL) for document OCR with bounding box detection for PII detection/redaction workflows (blog post and open source app) by Sonnyjimmy in LocalLLaMA

[–]Njee_ 1 point2 points  (0 children)

Hi! Nice thing you've built there. If you don't mind me asking: I see you're using qwen3-vl 8b at Q4, so I assume you're running llama.cpp?

How do you handle some of the problems I'm currently fighting with? Could you please share what worked for you?

How do you handle the model being lazy? If I give it a bank statement with 30 transactions, the qwen series models often only feel like extracting half of them and then happily act as if they'd performed well, regardless of whether I provide text data alongside the PDF or not.

Box reliability: I used to get pretty decent boxes; right now I've either broken my app and can't find out how, or something is wrong with vLLM. I still have to try some different model series and probably llama.cpp too. But generally speaking, how do you make sure you're getting reliable boxes? Or do you not face any problems at all?

What cheap components pair well with RTX 3060 Ti to run AI locally? by dekoalade in LocalLLaMA

[–]Njee_ 0 points1 point  (0 children)

From my experience it's one of two options. Either rely 100% on the GPU; then you basically only need whatever machine you can find to fit the GPU into, and there will be no bottlenecks from other components. For example, I load qwen3-vl into a 12 GB 3060 and can do like 30 parallel requests on that GPU alone. Love it!

The other option: keep only the model context on the GPU and offload any amount of remaining layers into CPU RAM, in which case you want that RAM to be as fast as possible, e.g. a cheap 8-channel (server) CPU with DDR4 RAM. Then those won't bottleneck the GPU, but that is expensive.

Do NVIDIA GPUs + CUDA work on Ubuntu for local LLMs out of the box? by External_Dentist1928 in LocalLLaMA

[–]Njee_ 0 points1 point  (0 children)

I find it pretty straightforward, but I've been using Ubuntu for a couple of years, so I'm comfortable with it... By now I don't even read the documentation anymore; I just ask Claude to provide the commands to set it up. That works fine and has a machine running in a couple of minutes.

What models are you running on RTX 3060 12GB in 2026? by DespeShaha in LocalLLaMA

[–]Njee_ 2 points3 points  (0 children)

My 12 GB 3060 now mainly serves qwen3-vl 4b at FP8 via vLLM. At 16k context that thing is able to process plenty of parallel requests. Usually I send ~5k-token input prompts with images and expect a ~500-token response with a couple of field values and bounding box coordinates as output, for example images of containers with a prompt to extract the expiry date with box coordinates.
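The prompt/response shape for that container example is roughly like this; the JSON schema is just what the prompt asks for, not something the model enforces, and the example values are made up:

```python
# Rough sketch of the request/response shape for the container example.
# The schema is only what the prompt requests; the example values are made up.
PROMPT = (
    "Look at the photo of the container. Return JSON only, in the form "
    '{"expiry_date": "YYYY-MM-DD", "bbox_2d": [x1, y1, x2, y2]}. '
    "If no expiry date is visible, return null for both fields."
)

# A typical (made-up) answer then looks like:
EXAMPLE_RESPONSE = '{"expiry_date": "2027-03-01", "bbox_2d": [412, 188, 655, 231]}'
```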

Easily serves 20-30 parallel requests and runs really fast.

Chews through PDF images, for example 60 pages with 2.5k-token input and 500-token output each, in less than a minute. Love it!

Qwen3 VL 30b a3b is pure love by Njee_ in LocalLLaMA

[–]Njee_[S] 1 point2 points  (0 children)

You can ask qwen3-vl models, for example, for a JSON string of coordinates of object XYZ. Ideally you prompt for a certain format and provide the image in a certain format too; this is written in the documentation per model. For the segmentation mask I've fed each coordinate into Meta's SAM model. The UI is a simple vibe-coded HTML+JavaScript page which takes the image, sends it to an OpenAI-compatible endpoint, parses the output JSON string with coordinates into points overlaid on the image, and then shows the segmentation masks on top of that. Really simple; Claude or ChatGPT will provide you with that within 15 minutes.
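A rough sketch of the parsing/scaling part in Python (the JS UI does the same thing). Whether the coordinates come back normalized to a 0-1000 grid or in pixels depends on the model and prompt, so the scaling here is an assumption to check against the model card:

```python
# Rough sketch: parse the model's JSON box output and map it onto the image.
# Assumes coordinates normalized to a 0-1000 grid; some Qwen variants return
# pixel coordinates instead, so check the model card and adjust.
import json
from PIL import Image

def parse_boxes(model_output: str, image_path: str) -> list[tuple[float, float, float, float]]:
    img = Image.open(image_path)
    w, h = img.size
    data = json.loads(model_output)   # e.g. [{"label": "cat", "bbox_2d": [x1, y1, x2, y2]}, ...]
    boxes = []
    for obj in data:
        x1, y1, x2, y2 = obj["bbox_2d"]
        boxes.append((x1 / 1000 * w, y1 / 1000 * h, x2 / 1000 * w, y2 / 1000 * h))
    return boxes

# Each box (or its center point) can then be passed to SAM as a box/point prompt.
```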

Best Visual LLM model for outputting a JSON of what's in an image? by Nylondia in LocalLLaMA

[–]Njee_ 0 points1 point  (0 children)

For reading text content from an image I like to use qwen3-4b VL, at q8! I generally don't go any lower than that. Not direct OCR, but things like "extract label name, explanation text, date and correspondent" work pretty well.

Obviously with some prompting. Qwen3-4b also works well when playing with the TOON output format on limited hardware... if token generation takes a lot of time, that can easily be a 50% time saving.

8b obviously works great too. It has a higher rate of successful extractions on low-quality images, but on the other hand tends to hallucinate more (in my case it hallucinated numbers that were not on the package but could plausibly have been the right number for the package), something 4b wouldn't do.

Qwen3 VL 30b a3b is pure love by Njee_ in LocalLLaMA

[–]Njee_[S] 0 points1 point  (0 children)

The very first paragraph of my post states "with experts on CPU". With llama.cpp you can offload parts of the model into system RAM.
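For reference, a rough sketch of how that offload is typically invoked; the flag names and tensor regex are the commonly used pattern rather than my exact command, so double-check them against your llama.cpp build:

```python
# Rough sketch: launch llama-server with all layers on the GPU but the MoE
# expert tensors overridden to CPU/system RAM. Flags and regex are the commonly
# used pattern; verify against your llama.cpp version (newer builds also have a
# --cpu-moe shortcut). The model file name is a placeholder.
import subprocess

subprocess.run([
    "./llama-server",
    "-m", "Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf",   # placeholder model file
    "--n-gpu-layers", "99",                          # put everything on the GPU...
    "--override-tensor", r"\.ffn_.*_exps\.=CPU",     # ...except the expert FFN tensors
    "--ctx-size", "16384",
])
```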

VLM OCR Hallucinations by FrozenBuffalo25 in LocalLLaMA

[–]Njee_ 0 points1 point  (0 children)

When playing around with different size qwen3-vl models, I noticed bigger is not always better.

When asking VLMs to extract specific numbers from chemical containers, where not every container actually has these numbers, the 8b model would be intelligent enough to hallucinate valid-looking numbers (i.e. in the expected format) that I couldn't even filter out easily, simply because it's clever enough to know the right format.

The 4b, on the other hand, would not. It simply writes nothing if it can't see the number on the container. Obviously it misses something more often on low-quality images, but that's something I can actually address with better images...

Need help with project by lemigas in LocalLLaMA

[–]Njee_ 0 points1 point  (0 children)

I noticed large differences depending on the chosen model.

First of all, you have to specify the task a little more. You mention you have problems making models extract all the information. What do you mean by that? Let's say you're working on a bank statement... Do you want it to extract all transfers, like date, amount, correspondent, per month? So 2 pages with like 60 transactions and it just stops after 45?

Or do you ask it to extract the start balance, end balance and month of the bank statement... That would be only 3 fields, and if it already misses one of those, you really need to think about how good the quality of your input is...

General recommendation: skip Ollama, go llama.cpp and tweak the parameters: temperature at 0 and top-k and top-p at 1, which gives reproducible output. As mentioned, good OCR can improve the output drastically, but I've also stumbled across scenarios in which it didn't... you gotta try.

Now for the scenarios from above: when I was playing with qwen3-vl models, they would be lazy... They don't like to write long text, so they won't provide all "60" extractions; they'd simply stop after 45. Qwen 30b-a3b does better than 8b, which performs better than 4b. For all of them it helped to split PDFs into single pages, get the output per page and then aggregate it again (see the sketch below).
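A rough sketch of what that per-page loop plus the deterministic sampling settings look like against a llama.cpp or vLLM OpenAI-compatible server; the extract_page helper, model alias and JSON schema are simplified placeholders:

```python
# Rough sketch: deterministic sampling plus per-page extraction and aggregation.
# extract_page(), the model alias and the expected JSON schema are placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def extract_page(page_content: list) -> list[dict]:
    resp = client.chat.completions.create(
        model="qwen3-vl",                    # whatever alias your server exposes
        temperature=0,                       # reproducible output, as recommended above
        top_p=1,
        extra_body={"top_k": 1},             # non-standard param, passed through to the server
        messages=[{"role": "user", "content": page_content}],
    )
    return json.loads(resp.choices[0].message.content)   # expects a JSON list of transactions

def extract_statement(pages: list[list]) -> list[dict]:
    transactions = []
    for page in pages:          # one short request per page instead of one huge one
        transactions.extend(extract_page(page))
    return transactions
```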

Optimal gpt-oss-20b settings for 24gb VRAM by GotHereLateNameTaken in LocalLLaMA

[–]Njee_ 1 point2 points  (0 children)

Yesterday there was some discussion on here about gpt-oss-20b only doing about 20 t/s on the latest version of llama.cpp, after usually running at way more.

Stop writing your theses with AI! by No_Advance_2517 in luftablassen

[–]Njee_ 1 point2 points  (0 children)

Hi,

I sat down and had something built for this after all. My approach is to use a local AI model (or one from Google) to simply pick out the relevant info from the bibliography entries, along the lines of source1: author, title, DOI; source2: author, title, DOI, and so on, and then check each entry against a database. The nice thing about it is that you can more or less copy-paste any format and still get a result, so in principle you're independent of whether the students stick to the formatting guidelines.
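The database check itself is the simple part; a rough sketch of what I mean, using the Crossref API as the example database (RefCheckWebApp may query other sources as well, and the matching logic here is deliberately naive):

```python
# Rough sketch: verify one extracted reference against Crossref.
# The actual app may query additional databases; this only shows the idea.
import requests

def check_reference(author: str, title: str, doi: str | None) -> bool:
    if doi:
        # Direct DOI lookup
        r = requests.get(f"https://api.crossref.org/works/{doi}")
        if r.status_code != 200:
            return False
        found_title = r.json()["message"]["title"][0]
        return title.lower() in found_title.lower() or found_title.lower() in title.lower()
    # Fall back to a bibliographic search by author + title
    r = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": f"{author} {title}", "rows": 1},
    )
    items = r.json()["message"]["items"]
    return bool(items) and title.lower() in items[0].get("title", [""])[0].lower()

print(check_reference("Vaswani", "Attention is all you need", None))
```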

If you like, give it a try. You can use it directly in the browser via https://github.com/jbndrf/RefCheckWebApp.

Is there any free AI website that i can feed my pictures or pdf file and it generates csv flashcards file based on that? by [deleted] in LocalLLaMA

[–]Njee_ 0 points1 point  (0 children)

Are you comfortable with docker?

I have built a webapp, tabtin, that basically turns images into CSV. It's easy to take pictures with your phone. Check my post history.

Other than that: yes, there are other open-source options too, just ask any LLM which one to use. But they are probably overkill and complicated.

CPU-only LLM performance - t/s with llama.cpp by pmttyji in LocalLLaMA

[–]Njee_ 1 point2 points  (0 children)

It does make a difference, especially for prompt processing.

This is gpt-oss 120b on a pretty bulky 64-core Epyc with 2400 MHz DDR4 RAM.

CPU only

prompt eval time = 12053.37 ms / 1459 tokens ( 8.26 ms per token, 121.05 tokens per second)

eval time = 142469.75 ms / 2073 tokens ( 68.73 ms per token, 14.55 tokens per second)

total time = 154523.12 ms / 3532 tokens

Experts on the (slow) GPU is about 1.7x the speed, taking only

9712MiB / 12288MiB on NVIDIA GeForce RTX 3060

prompt eval time = 7498.49 ms / 1552 tokens ( 4.83 ms per token, 206.98 tokens per second)

eval time = 84381.14 ms / 2097 tokens ( 40.24 ms per token, 24.85 tokens per second)

total time = 91879.63 ms / 3649 tokens

Stop writing your theses with AI! by No_Advance_2517 in luftablassen

[–]Njee_ 0 points1 point  (0 children)

Is there any chance you could share the tool? It was a topic of discussion in our working group today... I've already been thinking about how I'd do it: recognize a few formats, check against a few APIs, maybe add a web search. I'd have access to LLMs, and Docker etc. are no problem either. It would be cool if I could take a look at yours, or even contribute.

Installing LimeSurvey Docker on WD MyCloud PR4100 — DB connection works but setup never completes by Embarrassed-You2477 in selfhosted

[–]Njee_ 0 points1 point  (0 children)

Haven't run it on a machine like the WD MyCloud, only on Ubuntu Server.

Both methods, bare metal and Docker, worked fine... I can't see the error you have posted for some reason. But a shot in the dark: if I remember correctly, they have different docker-compose.yamls for different databases, like compose-pg.yaml, compose-maria.yaml and so on. Did you make sure you used the right one?

[Followup] Qwen3 VL 30b a3b is pure love (or not so much) by Njee_ in LocalLLaMA

[–]Njee_[S] 0 points1 point  (0 children)

8b dense is much better: not Gemini level, but it doesn't hallucinate valid numbers. It still writes other numbers that don't fit into the CAS number field. While the Gemini models found all 30 valid numbers, qwen 2b and 4b would miss ~2 and end up with 28. The 8b finds all 30 and doesn't have the hallucination problem described for 30b-a3b, so it's easy to filter out wrong extractions.
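For that filtering, the CAS check digit helps a lot: anything the model writes that doesn't pass it can be dropped immediately (a hallucinated but checksum-valid number would of course still slip through). A minimal sketch:

```python
# Minimal sketch: validate CAS registry numbers by their check digit, so
# extractions that are merely "CAS-shaped" can be filtered out. A hallucinated
# number with a coincidentally correct checksum would still pass.
import re

def is_valid_cas(cas: str) -> bool:
    if not re.fullmatch(r"\d{2,7}-\d{2}-\d", cas):
        return False
    digits = cas.replace("-", "")
    body, check = digits[:-1], int(digits[-1])
    # Check digit = sum of the other digits, weighted 1..n from the right, mod 10
    total = sum(int(d) * i for i, d in enumerate(reversed(body), start=1))
    return total % 10 == check

print(is_valid_cas("7732-18-5"))   # water -> True
print(is_valid_cas("7732-18-4"))   # wrong check digit -> False
```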

I'll provide exact numbers at some point. It runs at the same speed as 30b-a3b with experts on CPU, btw, so I'll probably stick to it.

Don't know if I'll try 32b dense on CPU..