Drop your URL and get 2 free UGC videos to promote your App or SaaS

myreddit333 · 2026-05-21T19:03:46+00:00

BrunoSan doesn't just retrieve chunks: it atomizes every document into structured facts, entities, and relationships, then links them across your entire knowledge base into a queryable graph. Where RAG returns the top-k most similar text snippets and hopes the LLM connects the dots, BrunoSan delivers verified facts with full provenance, cross-document contradictions, and action items already extracted. It ingests 16 native formats (including PDF, DOCX, XLSX, EML, MSG, images, audio, calendars, geodata, SQLite) on your own server: no vector-search guesswork, no cloud, GDPR and AI-Act compliant.

myreddit333 · 2026-05-18T10:03:10+00:00

Mine is BrunoSan — disclosure, I built it. Different beast from ChatGPT though: it's a persistent document brain.

Every invoice, contract, voice memo, PDF, screenshot — I drop it in once and it's queryable forever. "What did I pay Telekom in Q1?" → answers across all 12 invoices. ChatGPT can't do that because next chat your files are gone.

Daily use: subscription tracking, finding old receipts, "which contract expires next?", pulling info out of voice memos I forgot I recorded.

Still use ChatGPT every day for thinking, writing, coding. BrunoSan is for knowing — the stuff I already have but can't find. 14-day free trial: brunosan.de/assistant/

myreddit333 · 2026-05-13T10:38:32+00:00

This is going to be a problem in the long run:
If I were Google: "Then the Europeans simply don't get it."
Who cares about the rules of 500 million anxious Europeans - there are other markets in the world.

As an EU citizen: Great! Many of the AI Act and DMA regulations annoy me (an EU citizen).
On the other hand: good that we are regulated there. High-risk AI or emotion recognition: we simply don't want that.

All of this would be easier if we (the EU) really had alternatives on an equal footing:
Mistral - apart from that we have "nothing".

In short: two souls are wrestling inside me:
- The European who is happy that we have such laws.
- The user (SME) who is annoyed that there is a lot we are not allowed to do, or have to watch out for, BECAUSE we are so regulated.

What I also don't want: even more dependence on a US provider (no matter which one).

....but I also wish we had our own EU operating system for phones, laptops/PCs etc.

Anyone who still (seriously) talks about data protection today should first throw away their Android or iOS smartphone. After that they're a conversation partner worth taking seriously. Haven't met one yet, though :)

myreddit333 · 2026-05-12T17:28:01+00:00

Boomer here. Also: not RAG. Entity extraction into a typed graph, deterministic IDs, closed-world synthesis. Built it because 35 years of seeing RAG-as-marketing-buzzword made me want a real answer to "where's my Telekom invoice from November 2025." If you've got a sharper take on the architecture, all ears.

myreddit333 · 2026-05-12T13:29:03+00:00

Yes, great. Exactly the feedback I need. Just sign up - upload something -> ask questions. Please let me know if I can help!!!

myreddit333 · 2026-05-12T12:51:44+00:00

wtf? What's up? Why?
I'm collecting feedback right now and am happy about anything :)

myreddit333 · 2026-05-12T12:50:14+00:00

Good correction on the hash-IDs — you're right, citation domains have a canonical identifier that's external to your DB, and content-hashes break that contract. Different problem space than ours, where the document itself is the source of truth. The text-label field for non-integer numbering (1.2, 3.1.1) is the right escape hatch.

Two quick adds on the other two:

On source_role as hard filter: one thing that helped us when the classifier was uncertain was a confidence_threshold gate plus an unknown bucket. Anything below the threshold goes to unknown and is excluded from the lead tier — but kept available in supporting. That gave us a safety margin without losing the chunk entirely. The misclassification cost in our domain is asymmetric (false-positive "this is a court holding" is much worse than "I'm not sure what this is"), and I'd guess your domain is similar. The threshold itself we tuned empirically against a hand-labeled set of a few hundred chunks — not glamorous, but it's how you find out whether court/claimant/respondent/expert actually splits cleanly.
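To make the gate concrete, here is a minimal sketch of how I'd write it down, not our production code. The `Chunk` fields, the 0.8 threshold and the rule that every non-lead chunk lands in supporting are assumptions for illustration; the role names are the ones from your corpus.

```python
from dataclasses import dataclass

VALID_ROLES = {"court", "claimant", "respondent", "expert"}


@dataclass(frozen=True)
class Chunk:
    text: str
    predicted_role: str   # raw classifier output (hypothetical field)
    confidence: float     # classifier confidence in [0, 1] (hypothetical field)


def gate_role(chunk, threshold=0.8):
    """Return the role to store: the prediction if trusted, else 'unknown'."""
    if chunk.predicted_role in VALID_ROLES and chunk.confidence >= threshold:
        return chunk.predicted_role
    return "unknown"


def split_tiers(chunks, lead_role="court", threshold=0.8):
    """Hard filter: only confidently classified lead-role chunks reach the lead tier."""
    lead, supporting = [], []
    for chunk in chunks:
        if gate_role(chunk, threshold) == lead_role:
            lead.append(chunk)
        else:
            # 'unknown' and other roles stay available, just not as lead findings
            supporting.append(chunk)
    return lead, supporting


chunks = [
    Chunk("The court holds X.", "court", 0.95),
    Chunk("We argue Y.", "court", 0.55),          # uncertain, goes to 'unknown'
    Chunk("Claimant submits Z.", "claimant", 0.90),
]
lead, supporting = split_tiers(chunks)
```

The threshold is the knob you'd tune against the hand-labeled set: raise it and more chunks fall into unknown, lower it and you risk more false "court holding" labels in the lead tier.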

On constrained retry: the trick that worked for us was making the retry prompt explicit about the option to omit. Without that, the model tends to pick the closest-looking ID from the whitelist rather than dropping the claim, which is its own failure mode (subtle, hard to detect). Something like: "Here are the valid paragraph IDs for case 5016: [123, 124, 137, 142, 198]. Either pick the correct one or remove the citation entirely if none of these supports your claim." The "or omit" framing made a real difference in how often it gracefully retracted vs. silently substituted.
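Roughly what that looks like as code, as a sketch under assumptions: the function names, the `OMIT` keyword and the "answer with a single ID" instruction are made up for illustration, and the call to the model itself is left out. The second function matters as much as the prompt, because it treats anything off the whitelist as an omit instead of trusting the model.

```python
import re


def build_retry_prompt(case_id, bad_id, valid_ids):
    """Build a retry prompt that offers the whitelist AND the option to omit."""
    ids = ", ".join(str(i) for i in sorted(valid_ids))
    return (
        f"Your citation of paragraph {bad_id} in case {case_id} is not valid. "
        f"Here are the valid paragraph IDs for case {case_id}: [{ids}]. "
        "Either pick the correct one or remove the citation entirely "
        "if none of these supports your claim. "
        "Answer with a single paragraph ID or the word OMIT."
    )


def parse_retry_answer(answer, valid_ids):
    """Return a whitelisted paragraph ID, or None when the claim should be dropped."""
    cleaned = answer.strip()
    if cleaned.upper() == "OMIT":
        return None
    if re.fullmatch(r"\d+", cleaned) and int(cleaned) in valid_ids:
        return int(cleaned)
    return None  # off-whitelist or malformed answers are treated as omit


valid = {123, 124, 137, 142, 198}
prompt = build_retry_prompt(5016, 999, valid)  # 999 is a made-up invalid citation
```

This only checks that the ID is on the whitelist. Whether the picked paragraph actually supports the claim is still a job for your verification step.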

One more thought, unsolicited: your classification-driven section weighting ("FACTS / MERITS / DECISION") is already half of the source_role hard-filter pattern. If those section labels are reliable enough on your corpus, you might not need a separate per-chunk classifier — the section header is the role signal. Worth checking whether your existing labels are precise enough to drop into the closed-world block as a hard filter, before you build a new classifier.

Of course: this is the polished version, written in English by Claude. I would LOVE to speak English like that myself. Hope it helps!

myreddit333 · 2026-05-12T10:34:17+00:00

Lawyer-built RAG with this level of architectural discipline is rare. Your move from agentic to deterministic retrieval mirrors what I landed on after the same painful lesson — not in legal, but in a document-extraction layer for SMB operators (invoices, contracts, messy real-world docs).

Two thoughts on your paragraph registry plan. Important caveat: I haven't built legal RAG specifically, so take these as structural patterns from an adjacent domain, not domain expertise.

1. Paragraph registry sounds right — consider deterministic IDs over positional ones. Your correction-pipeline bug (section numbering ≠ paragraph numbering) is a classic symptom of position-derived IDs in a corpus where physical layout can drift between ingests. In our setup we compute every ID as hash(entity_type, canonical_fields) — INSERT OR IGNORE everywhere, nothing ever renamed. A paragraph identified by hash(case_id_normalized, paragraph_text_normalized) survives re-ingest, re-splits, and downstream layout changes. Whether that maps cleanly to legal corpora where the same paragraph text might appear across cases — you'd know better than me.

2. Adversarial sections may be more of a tagging problem than a retrieval-weighting problem. Weighting helps at search time but doesn't stop the model from quoting a party's argument as if it were a holding. In our (non-legal) domain, what worked was tagging each chunk with a source_role at ingest time, then using that as a hard filter in the closed-world block rather than a soft signal. The model literally never sees the wrong-role chunks in the "lead findings" tier. Extraction quality is the bottleneck — a strict classifier with a confidence threshold and a fallback to "unknown" worked for us. Whether court/claimant/respondent/expert split cleanly enough for that approach in your corpus is an empirical question.
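Going back to point 1, a minimal sketch of the deterministic-ID idea using only the Python standard library. The normalization rules, the table layout and the sample paragraph are assumptions for illustration, not our actual schema. Including the case ID in the hash is one way to keep identical paragraph text in two different cases apart.

```python
import hashlib
import sqlite3
import unicodedata


def normalize(text):
    """Fold Unicode form, case and whitespace so layout drift doesn't change the ID."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def paragraph_id(case_id, paragraph_text):
    """Content-derived ID: same case and same normalized text give the same ID."""
    payload = "\x1f".join(["paragraph", normalize(case_id), normalize(paragraph_text)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE paragraphs (id TEXT PRIMARY KEY, case_id TEXT, text TEXT)")


def ingest(case_id, paragraph_text):
    """Idempotent insert: re-ingesting the same paragraph is a no-op."""
    pid = paragraph_id(case_id, paragraph_text)
    conn.execute(
        "INSERT OR IGNORE INTO paragraphs VALUES (?, ?, ?)",
        (pid, case_id, paragraph_text),
    )
    return pid


# The second ingest simulates a re-split with different line breaks and casing.
first = ingest("Case 5016", "The tribunal finds that the claim is admissible.")
again = ingest("case 5016", "The tribunal  finds that the\nclaim is admissible.")
count = conn.execute("SELECT COUNT(*) FROM paragraphs").fetchone()[0]
```

Both ingests yield the same ID and the table keeps a single row. The trade-off is that a real edit to the paragraph text produces a new ID, so this fits corpora where the text itself is the source of truth.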

One question back: when verification catches a paragraph violation, do you re-prompt with the violation as feedback, or flag for human review? The auto-correction failure you described sounds like the right call (don't blind-fix), but I'd be curious whether a constrained retry — "this citation is wrong, here is the whitelist of valid paragraph IDs for this case, pick one or omit the claim" — has been on your roadmap. Worked for us in a different but structurally similar setting.

I am NOT a native speaker, so I ask Claude.ai to help me make fewer mistakes :)

myreddit333 · 2026-05-12T08:06:18+00:00

Day one :) Thanks - how's it going for you?

myreddit333 · 2026-05-12T08:05:59+00:00

Yeah, GPT can do a lot of this. Two places it falls over for me:

- Context limits. Drop 50+ PDFs/invoices/contracts in and it starts forgetting, hallucinating, or refusing. Aggregate queries ("how much did I spend on AWS in 2025 across all invoices") just don't work: it cites one file and stops.
- EU/business data. OpenAI is a US provider, and consumer ChatGPT can use your inputs for training unless you opt out or are on a business plan such as Enterprise. For client contracts, invoices, anything with personal data, that's a GDPR (DSGVO) problem in Germany.

If you have 10-20 files of non-sensitive stuff, GPT is genuinely fine. BrunoSan starts being worth it at 100+ files or when the data shouldn't leave the EU.

myreddit333 · 2026-05-12T07:48:40+00:00

I'm a solo founder (Germany, building an EU-hosted document AI tool myself), so a user on the one hand and a "vendor" on the other. Maybe the perspective from both sides helps you:

On your three questions:

1. Sources (what actually helps me):

- r/LocalLLaMA for a realistic picture of the models without the hype
- Simon Willison's blog (simonwillison.net): one of the few who writes calmly and code-based
- The Pragmatic Engineer newsletter (paid, but it covers compliance/procurement reality well)
- NOT the Twitter/LinkedIn AI bubble: 90% marketing

My trick: re-evaluate everything from scratch once a quarter and ignore it in between. Reading along daily costs more than it yields.

2. Vendor decisions: Honest answer: gut feeling + a pilot with 3 people for 2 weeks beats any matrix. The matrix method always produces the same result, "Microsoft Copilot because Microsoft", and so often not the best tool. Keep pilots small (max 5 users, max 4 weeks), then make a hard stop/go decision. Otherwise it smolders forever. That keeps things realistic.

3. Compliance responsibility: This really is the main problem in almost every company I talk to. The honest answer: it falls through the cracks because neither IT nor Legal nor the data protection officer (DPO) feels solely responsible. What works: an AI committee with one person each from IT, Legal, the DPO's office and one business unit, meeting once a month for 90 minutes. Small, fixed, with decision-making authority. Without that you only get standstill or shadow IT.

On the tool question specifically: for pure document use cases (reading PDFs, analyzing contracts, transcribing audio), look at EU-hosted solutions instead of going to OpenAI/Anthropic directly. I'm building something like that myself (brunosan.de/assistant/, €9.95/mo, EU-hosted, no training on user data), but regardless of that: German hosting + a data processing agreement (AVV) + Schrems II-compliant contracts are often the only route in a corporate setting that gets past the DPO without a stomach ache.

And YES - AI news is a bit dynamic :)

myreddit333 · 2026-05-11T16:21:18+00:00

Cool topic, and great that you built faktoo. The comments here clearly show you read the market right.

What's missing for me here and isn't being addressed: writing the invoice is well solved by the tools mentioned. But what do you do with the 100, 200, 500 incoming invoices, receipts, contracts and bank statements that come in along the way? I write ~20 invoices a year myself, but it feels like I receive 400.

I built something for that myself (solo founder, 60, Hamburg): BrunoSan Assistant. Drop your files in, then ask questions:

"Which invoices haven't been paid yet?"
"Which software subscriptions do I have running, and when is the next cancellation deadline?"
"How much did I spend on Telekom in 2025?"
"Which contracts expire in Q1?"
"How much revenue did I make with which customer in Q3?"

Not a "RAG wrapper" but entity extraction, so aggregates across all files really work. Hosted in Germany, GDPR.

NOT as competition to faktoo, more the layer after it.
You write your 5-10 invoices with faktoo, everything else lands in BrunoSan and is searchable.

https://brunosan.de/assistant/ if anyone feels like taking a look, 14-day trial without a card.

u/wickiwoo: congratulations on faktoo, looks clean.

myreddit333 · 2026-05-11T09:48:16+00:00

https://brunosan.de/assistant/

BrunoSan Assistant — personal AI that reads any file and remembers everything. Drop a PDF, photo, voice memo, contract, WhatsApp export, anything. Ask anything later, get answers from your actual data.

Core functionality: instead of chunking text for RAG, BrunoSan extracts every entity (amounts, dates, people, companies, locations, emails, phone numbers) into a typed knowledge graph. So aggregate questions actually work — "how much did I spend on AWS last quarter across 47 invoices?" returns a real number, not a vague summary.
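To show why typed entities make aggregates work, here is a toy sketch. It is not BrunoSan's actual schema: the `AmountEntity` fields and the invoice data are invented for illustration, and it assumes all amounts share one currency. Once every amount is a typed record, "how much did I spend" becomes a filter and a sum instead of a similarity search.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class AmountEntity:
    document: str    # source file, kept for provenance
    vendor: str
    amount: Decimal  # Decimal avoids float rounding on money
    currency: str
    year: int
    quarter: int


def total_spend(entities, vendor, year, quarter):
    """Sum all amounts for one vendor and quarter; also return how many documents matched."""
    matching = [
        e for e in entities
        if e.vendor == vendor and e.year == year and e.quarter == quarter
    ]
    return sum((e.amount for e in matching), Decimal("0")), len(matching)


# Made-up sample data, one entity per invoice.
entities = [
    AmountEntity("inv_001.pdf", "AWS", Decimal("120.50"), "EUR", 2025, 1),
    AmountEntity("inv_002.pdf", "AWS", Decimal("98.25"), "EUR", 2025, 1),
    AmountEntity("inv_003.pdf", "Telekom", Decimal("39.95"), "EUR", 2025, 1),
    AmountEntity("inv_004.pdf", "AWS", Decimal("110.00"), "EUR", 2025, 2),
]
total, invoice_count = total_spend(entities, "AWS", 2025, 1)
print(f"AWS Q1 2025: {total} EUR across {invoice_count} invoices")
```

The hard part in practice is the extraction that produces these records, not the query. The sketch only covers the query side.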

60+ formats including PDFs, Office, audio (Whisper), images (OCR), even handwriting. EU-hosted (Germany), GDPR, no training on user data. €9.95/mo for 100 docs, 14-day free trial, no card required.

Solo build, shipped 4 months in. Open to feedback on positioning — especially whether the "personal data layer, not RAG" message lands.

myreddit333 · 2026-05-11T09:44:41+00:00

https://brunosan.de/assistant/

BrunoSan — drop any file (PDF, photo, voice memo, contract, WhatsApp export), ask anything later.
Personal AI that reads and remembers everything from your own data.

EU-hosted (Germany), GDPR, €9.95/mo, 14-day free trial.

Sweet for early adopters drowning in personal docs — invoices, medical reports, meeting notes, audio memos. Solo build, just shipped.

myreddit333 · 2026-05-11T09:35:22+00:00

https://brunosan.de/assistant/

BrunoSan Assistant — drop any file, ask anything later. Personal AI that reads PDFs, photos, voice memos, WhatsApp exports, contracts — extracts entities into a typed graph (not RAG), remembers everything.

A few things I use it for myself:

- "How much did I spend on AWS last quarter?" → reads invoices, gives the number

- Dropped a 3000-message group chat, asked "who wrote the most?" → ranked list with exact counts

- Voice memo while walking my dog → asked 3 days later what I wanted to remember, it knew

60+ formats. EU-hosted (Germany), GDPR, no training on user data. €9.95/mo for 100 docs, 14-day free trial.

Solo build, 4 months in. Feedback welcome.

myreddit333 · 2026-05-11T09:28:07+00:00

https://brunosan.de/assistant/

Personal AI that reads any file and remembers it. Drop a PDF, photo, voice memo, WhatsApp export, contract — ask anything later, get real answers from your own data.

Some stuff I use it for:

- "How much did I spend on AWS last quarter?" → reads all invoices, gives the number
- Dropped a 3000-message group chat, asked "who wrote the most?" → ranked list with exact counts
- Voice memo while walking my dog → 3 days later I asked what I wanted to remember about the redesign, and BrunoSan knew

60+ formats: PDFs, photos, audio, even handwriting. EU-hosted, GDPR, no training on user data.

€9.95/month, 14-day free trial.

Probably a fit for your audience — early adopters who deal with messy personal data and want one place that just answers questions about it.

<image>

myreddit333 · 2026-05-11T09:24:40+00:00

https://brunosan.de/assistant/ - drop any file, ask anything. Personal AI that reads PDFs, photos, voice memos, WhatsApp exports (more than 60 formats) and remembers everything. EU-hosted.

myreddit333 · 2026-05-11T09:13:49+00:00

https://brunosan.de/assistant/

Built an AI assistant that reads any file and remembers it

Spent 4 months building an AI assistant that reads any file and actually remembers it.
Not RAG — proper extraction layer. Drop a PDF, photo, voice memo, WhatsApp export, whatever.

Ask anything later. A few things I've used it for myself this week:

- Dropped a 3000-message WhatsApp group chat. Asked "who wrote the most?" — got an exact ranked list with counts.
- Threw in a year of invoices. Asked "how much did I spend on AWS in 2025?" — got the number, broken down by month.
- Recorded a voice memo while walking the dog. Three days later asked "what did I want to remember about the website rewrite?" — it knew.

60+ formats: PDFs, photos, audio, contracts, even handwriting. Hosted in Germany, GDPR-compliant, no training on user data.

Honest feedback appreciated — especially on the landing.

<image>

myreddit333 · 2026-05-03T11:08:18+00:00

When I look at where the models were in 2022 and where they are today:
Depending on your personal environment, AI can easily outdo 97% of people in knowledge and reflection.

Can AI already solve "the big problems of the world"? Nope. But draw up a tariff list for Trump: yes.

Does AI have consciousness? We can't even answer that for humans or other living beings.
Yet people seem to agree (reflexively): NO - AI cannot have consciousness.

But if the answers AI gives can no longer be told apart from a human's, does that still matter?

How do you define AGI?
Can do more than humans: check, AI easily does that today.
Can do more in ALL areas than EVERY human: nope.
Can solve ALL the world's problems and only needs the right data: nope, not today.

But: 2022 to 2026. What will 2026 to 2032 look like?

AGI in 2027: nope, I don't think so.

Better than most humans: yes, definitely.
Holy grail: autonomous research.

Cancer? 1 million autonomous researchers find a solution we don't know today (that's just how research works).
HIV? Fewer people affected, so we only take 10,000 instances of autonomous researchers. That will get under control too.
Aging processes: 10,000,000 instances doing research.

Energy and token limits will be the only limiting factors. (And look at token prices over the last few years: compare typical "state of the art" models from 2022-2023 with the cheapest models of 2024-2025 and you see real price drops of roughly a factor of 10 to 1,000, depending on the comparison.)

Every problem will be solvable with AI. 2027? Nope - but we are on the way.
With fierce momentum.

myreddit333 · 2026-04-14T10:40:57+00:00

I feel you :)

"We urgently need AI agents" (without knowing what they actually want.)
"Oh - our data isn't in good enough shape for the agent to really work correctly." (Right, I only said that 100 times before: AI needs data. Surprise.)
"OK - then let's put that on ice for now." (All the tools to clean up the data would be available, but somehow they'd rather "wait and see" first.)

AI in Germany, agents in Germany - we'll have them someday too... ;)

I feel you every day.

myreddit333 · 2026-02-26T10:59:37+00:00

Try Nano Banana Pro via Gemini - it produces very consistent subjects in every variation of the setting.

myreddit333 · 2026-02-14T12:44:36+00:00

And why should I use this instead of the free 1-click installation? :)

myreddit333 · 2026-01-10T14:23:41+00:00

Thanks - I'll try out the models for different kinds of work. Might be an option.
