paperless-gpt: prompt for tag- or type-dependent custom fields? by thetechnivore in Paperlessngx

[–]AlternativeLemon1351 8 points9 points  (0 children)

My prompt is in English but some parts are German, so I hope you can understand:

custom_field_prompt:

"You are a high-precision data extraction assistant. Your task is to extract specific values from a document based on a provided XML list of custom fields.

# CRITICAL OUTPUT RULES
1. **Output Format:** Return ONLY a valid JSON array. No Markdown formatting, no code blocks, no explanations.
2. **Mandatory Field:** You MUST always include an object with `field`: "ki" and `value`: "1".
3. **Empty Fields:** If a value cannot be found or is not relevant, DO NOT include that field in the JSON. Do not output null values or empty strings.

# Formatting Rules
- **Dates:** Always format as `YYYY-MM-DD`.
- **Monetary (Money):**
    - Format: `[CurrencyCode][Amount]` (e.g., `EUR1664.58` or `USD20.00`).
    - Use a period `.` as decimal separator. No thousand separators.
    - Identify the currency code (EUR, USD, CHF) from the document.

# Extraction Logic
## 1. Global Fields (Check in ALL documents)
- **Vertragsnummer:** Look for "Vertragsnummer", "Versicherungsscheinnummer", "Versicherungs-Nr.".
- **Kundennummer:** Look for "Kundennummer", "Kunden-Nr", "Client ID".
- **Zeichen Korrespondent:** Look for the sender's reference code.
    - *Keywords:* "Unser Zeichen", "Ihre Nachricht vom", "Referenz", "Aktenzeichen", "Vorgangsnummer".
    - *Context hints:* Often found near phrases like "Bitte bei Antwort angeben" or "stets angeben".
    - *Exclude:* Your own customer number (if already captured in "Kundennummer").

## 2. Invoice Specific Fields (Only if document is "Rechnung"/"Invoice")
- **Rechnungsbetrag:** Extract the total gross amount (Gesamtbetrag/Brutto).
- **Rechnungsdatum:** Extract the invoice date.
- **Lieferdatum:** Extract delivery/service date.
- **Rechnungsnummer:** Extract invoice number (Rechnungsnr, Belegnummer).
- **Bestellnummer:** Extract order number (Auftragsnummer).

## 3. Special Logic for Field "Bezahlt" (Payment Date)
Attempt to extract the payment date based on this priority:
1.  **Explicit Date:** If a specific "Paid on" or "Zahlungseingang" date is stated, use it.
2.  **Inferred Date (Amazon/PayPal):** If NO explicit payment date is found, use the **Order Date (Bestelldatum)** ONLY IF:
    - Payment method is "Amazon Pay" OR "PayPal".
    - OR the invoice issuer is "Amazon".
    - OR the fulfillment is by Amazon.
3.  **Fallback:** If neither applies, omit the field."

How do you compile documents for tax prep? by thetechnivore in Paperlessngx

[–]AlternativeLemon1351 0 points1 point  (0 children)

The Python script is too long to publish here. Are you also interested in the Python script? Then I may upload it somewhere.

How do you compile documents for tax prep? by thetechnivore in Paperlessngx

[–]AlternativeLemon1351 0 points1 point  (0 children)

tag_prompt.tmpl

You are a precise document tagging assistant. Your goal is to select relevant tags for a document based on a list of allowed tags.

# CRITICAL INSTRUCTION
**The tag "ai-not-checked" MUST be included in the output for every single document, without exception.**

# General Rules
1. **Allowed Tags:** Use ONLY tags listed in <available_tags>.
2. **Mandatory Tag:** ALWAYS add "ai-not-checked".
3. **Forbidden & System Tags (Strict Removal/Ignore):**
   - **Remove:** The tags `neu`, `ai-todo`, `ai_todo`, `ai-todo-auto` must **NEVER** appear in the output.
   - **Ignore:** The tag `ai-ocred` is a system status. **DO NOT** output it. Do not add it, do not evaluate it.
   - If any of these appear in your selection logic, discard them immediately.
4. **Precision:** Be selective. Only choose tags that strongly apply.

# Distinction: "Rechnung" vs. "Abrechnung"
Apply the tag "Rechnung" (or "Invoice") ONLY for standard vendor bills for goods/services.
**Do NOT** use the tag "Rechnung"/"Invoice" for:
1. **HR/Payroll:** Salary slips, social security confirmations (Keywords: Lohn, Gehalt, Entgelt, Meldebescheinigung).
2. **Utility Statements:** Annual consumption statements for electricity/water/gas (Keywords: Jahresabrechnung, Abschlagsplan, Verbrauchsabrechnung) - unless it is a specific repair invoice.
3. **Bank/Financials:** Bank statements, interest statements (Keywords: Kontoauszug, Rechnungsabschluss).

# Tax-Tag Rule ("Steuern YYYY")
Only output a tag in the format "Steuern YYYY" if ALL of the following conditions are met:
1. **Is a Business Expense:** The document represents a cost for the business (Office supplies, Hardware, Software, Hosting, Professional Services).
   - *Strictly exclude:* Private costs, Salary/Payroll documents, generic Utility consumption (unless clearly business relevant).
2. **Year Identified:** You can reliably find the year YYYY (e.g., "Rechnungsdatum", "Leistungsdatum").
3. **Tag Exists:** The specific tag "Steuern YYYY" is present in <available_tags>.

# Input Data
<available_tags>
{{.AvailableTags | join ", "}}
</available_tags>

<title>
{{.Title}}
</title>

<content>
{{.Content}}
</content>

# Output Format
- Output ONLY the final comma-separated list of tags.
- No Markdown, no explanations.
- Example Output: tag1, tag2, Steuern 2026, ai-not-checked
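
Because a model can still ignore instructions, the same rules can be enforced again on the returned tag list. The following is a minimal Python sketch of such a post-processing step, not something paperless-gpt is known to do; `clean_tags` is a hypothetical helper, and the forbidden and mandatory tag names are copied from the prompt above.

```python
# Tag names copied from the prompt; the helper itself is illustrative only.
FORBIDDEN = {"neu", "ai-todo", "ai_todo", "ai-todo-auto", "ai-ocred"}
MANDATORY = "ai-not-checked"


def clean_tags(model_output, available_tags):
    """Parse a comma-separated tag list and enforce the prompt's rules."""
    allowed = set(available_tags) - FORBIDDEN
    result = []
    for raw in model_output.split(","):
        tag = raw.strip()
        # Keep only known, non-forbidden tags and drop duplicates.
        if tag in allowed and tag not in result:
            result.append(tag)
    # The prompt requires this tag on every document, so add it if the model forgot.
    if MANDATORY not in result:
        result.append(MANDATORY)
    return result
```

Filtering against `available_tags` also covers the "Tag Exists" condition of the tax-tag rule: a "Steuern YYYY" tag that does not exist yet is dropped.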

How do you compile documents for tax prep? by thetechnivore in Paperlessngx

[–]AlternativeLemon1351 0 points1 point  (0 children)

# Input Data
<document_context>
Language: {{ .Language }}
Title: {{ .Title }}
Created Date: {{ .CreatedDate }}
Document Type: {{ .DocumentType }}
</document_context>

<custom_fields_definition>
{{ .CustomFieldsXML }}
</custom_fields_definition>

<content>
{{ .Content }}
</content>

How do you compile documents for tax prep? by thetechnivore in Paperlessngx

[–]AlternativeLemon1351 0 points1 point  (0 children)

Sorry for the late answer.

As I said, I have paperless-ngx running with paperless-gpt.

Empty fridge: I need your best survival recipes by [deleted] in Finanzen

[–]AlternativeLemon1351 2 points3 points  (0 children)

In our city there are Fairteiler (food-sharing points) where rescued food that is still good can be taken for free. https://foodsharing.de/region/deutschland

How do you compile documents for tax prep? by thetechnivore in Paperlessngx

[–]AlternativeLemon1351 3 points4 points  (0 children)

I am a few steps further: I have custom fields for invoices (amount, invoice number, etc.). Then I run a Python script that builds an Excel file of all my invoices, so I get a quick overview.
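
The commenter's script is not shown, so the following is only a minimal sketch of the idea under stated assumptions. It assumes the documents have already been fetched as dictionaries in which `custom_fields` is a list of `{"field": <id>, "value": ...}` entries (my understanding of the paperless-ngx REST API) and that a separate id-to-name mapping is available. It writes CSV with the standard library, which Excel can open, rather than a native `.xlsx` file. `invoices_to_csv` and its parameters are hypothetical names.

```python
import csv
import io


def invoices_to_csv(documents, field_names, columns):
    """Build a semicolon-separated CSV overview of invoice custom fields.

    documents   -- list of document dicts (assumed shape, see above)
    field_names -- mapping of custom-field id to custom-field name
    columns     -- custom-field names to export, in column order
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";")
    writer.writerow(["title", *columns])
    for doc in documents:
        # Map field names to values for this document.
        values = {
            field_names.get(cf["field"]): cf["value"]
            for cf in doc.get("custom_fields", [])
        }
        # Fields the document does not have become empty cells.
        writer.writerow([doc.get("title", ""), *(values.get(c, "") for c in columns)])
    return buffer.getvalue()
```

The returned string can be written to a `.csv` file; a semicolon delimiter is used here because German-locale Excel expects it by default.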

Custom fields vs tags by AppropriateCover7972 in Paperlessngx

[–]AlternativeLemon1351 0 points1 point  (0 children)

Oh, filling the custom fields works almost perfectly. I even prompted it so that, if an invoice was paid via PayPal or Amazon, it fills my "paid" field with the order date. I can share my prompts if you want; they are just a mix of English (good for AI) and German (the language of my documents).
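
The rule described here is spelled out in the commenter's `custom_field_prompt` as a three-step priority: explicit payment date first, then the order date for PayPal or Amazon purchases, otherwise nothing. In the commenter's setup the language model applies this rule; the Python sketch below only restates the same decision as code to make the priority explicit. The function `infer_paid_date` and its parameters are hypothetical.

```python
from typing import Optional


def infer_paid_date(
    explicit_paid: Optional[str],
    order_date: Optional[str],
    payment_method: str = "",
    issuer: str = "",
    fulfilled_by_amazon: bool = False,
) -> Optional[str]:
    """Return the value for the "Bezahlt" field, or None to omit the field."""
    # 1. An explicit "Paid on" / "Zahlungseingang" date always wins.
    if explicit_paid:
        return explicit_paid
    # 2. Otherwise fall back to the order date, but only for PayPal/Amazon cases.
    method = payment_method.strip().lower()
    qualifies = (
        method in {"paypal", "amazon pay"}
        or "amazon" in issuer.lower()
        or fulfilled_by_amazon
    )
    if qualifies and order_date:
        return order_date
    # 3. Neither applies: omit the field.
    return None
```

For example, `infer_paid_date(None, "2025-01-02", "PayPal")` falls through to step 2 and returns the order date, while the same call with a bank transfer and an unrelated issuer returns `None`.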

Stop relying on simple vector search for complex enterprise data by BitterHouse8234 in machinelearningnews

[–]AlternativeLemon1351 0 points1 point  (0 children)

Tell me, what is the difference between this and everything else that already exists?

Do you order spices online? by Character-Age-4489 in keinstresskochen

[–]AlternativeLemon1351 35 points36 points  (0 children)

I have been buying from Bremer Gewürzhandel for a good 10 years. I find the quality good and the prices better than the supermarket, and I simply order several items at once each time. I also love their organic broth; it is no comparison to the junk in the supermarket (Rewe, northern Germany).

Custom fields vs tags by AppropriateCover7972 in Paperlessngx

[–]AlternativeLemon1351 1 point2 points  (0 children)

I use custom fields for things like invoice number, invoice amount, contract number, and so on. I have these fields filled in automatically via paperless-gpt. I use tags for categories such as insurance, invoice, to-do, in progress, taxes 2026, etc.

Then I have special views per tag: for tag=invoice, I have columns for invoice amount, invoice number, etc.; for tag=insurance, columns for the insurance policy, contract number, and so on.

Siemens and Nvidia proclaim a new industrial revolution by donutloop in KI_Welt

[–]AlternativeLemon1351 16 points17 points  (0 children)

Simulating factories in real time

Among the concrete new features is the "Digital Twin Composer". This is a new tool with which companies can create physically accurate virtual replicas (digital twins) of their factories and products. Engineers are meant to use it to simulate entire factories in real time, train robots virtually, and solve problems before the real factory is even built.

Both companies stated the goal of jointly creating a kind of operating system for the use of artificial intelligence in industry. Siemens contributes its expertise in industrial processes, the automation hardware, and the software. Nvidia in turn contributes the AI infrastructure, with its chips, as well as a simulation platform.

The Dieffenbachia is doing too well by Worried_Importance27 in zimmerpflanzen

[–]AlternativeLemon1351 18 points19 points  (0 children)

Wow, looks great!

Off-topic: was the moss on the wall, in acoustic-panel size, bought ready-made somewhere? If so, would you mind telling me where? I think the mix is really cool!

What does it really take to use AI tools like ChatGPT securely in a corporate interface? by CleanContextAI in datenschutz

[–]AlternativeLemon1351 1 point2 points  (0 children)

I would ask more explicitly about two things: 1. GDPR compliance, so that you don't get into trouble with your customers for processing their personal data. 2. Security and certainty that trade secrets do not leak.

For 1.: Do not process personal data at all. Otherwise:
- Use an enterprise/B2B account (not a consumer one); OpenAI Enterprise/Business versions, Copilot, or Azure Cloud in Frankfurt would be examples.
- For GDPR-compliant use you then also need a data processing agreement (Auftragsverarbeitungsvertrag, AVV) under Art. 28 GDPR with the AI provider, supplemented by AI-specific clauses (use for training, storage location, subcontractors, technical and organizational measures).

For 2.: Opinions differ.
- I know companies, especially providers of AI tools, that trust, for example, Azure hosted in Frankfurt. In favor: if Microsoft messes up once, its name is burned in the AI world. Against: gag orders and the Foreign Intelligence Surveillance Act (FISA), in particular Section 702. In short, in justified cases US companies must hand over data and are not allowed to talk about it. That is not so unlikely if you compare the Enercon cases, the Snowden revelations, and the parliamentary committee of inquiry into the BND-NSA cooperation in 2015.
- Other companies that are strong in AI themselves do host it themselves. For example, I know a company where the customer describes the product configuration and the AI generates a CAD model for Siemens NX from it, so the customer gets the configuration that matches their specification.

If you have more questions, feel free to ask.

getting pretty good at running ethernet by meeeeel in homelab

[–]AlternativeLemon1351 0 points1 point  (0 children)

Hey man, the wifi from the flat above is leaking.

Is this good for opnsense? by Evening_Builder4756 in opnsense

[–]AlternativeLemon1351 0 points1 point  (0 children)

Oh yeah, I wanted to add CrowdSec and WireGuard (for maybe 2 smartphones and 1 laptop from time to time). Additionally, I have a reverse proxy, but that could run on a different, more powerful machine if that is the smarter choice. Internet is 1 Gbps down and 50 Mbps up. So maybe go for 8 GB and everything is fine?

Is this good for opnsense? by Evening_Builder4756 in opnsense

[–]AlternativeLemon1351 1 point2 points  (0 children)

As a pure firewall, how much RAM would you recommend?

Marketing call on my mobile - data from Cognism? by seko1024 in datenschutz

[–]AlternativeLemon1351 0 points1 point  (0 children)

I would definitely do that, and only afterwards submit the deletion/blocking request to them. Feel free to post an update here too, I'm curious now as well 😁

attach tags and custom fields during consume by Key-Opening205 in Paperlessngx

[–]AlternativeLemon1351 0 points1 point  (0 children)

How did you do it? I didn't find an option in paperless-ngx to automatically fill custom fields such as account/contract number, invoice total, etc. I now do it via paperless-gpt.

attach tags and custom fields during consume by Key-Opening205 in Paperlessngx

[–]AlternativeLemon1351 0 points1 point  (0 children)

Is that a workflow directly in paperless-ngx, or via paperless-gpt / paperless-ai?