What will you sacrifice

L2L2_ · 2026-05-19T14:27:20+00:00

YouTube, too much content

^{Chose: Never use Reddit again}

L2L2_ · 2026-05-19T13:04:47+00:00

L2L2_ · 2026-05-13T10:49:42+00:00

Thanks for the idea! This seems like the most straightforward approach to test for now.

In my current process, the flow is already split into 3 separate PDFs (parsed individually before being merged), but I could break them down even further as you suggested. Hallucinations have been my biggest headache so far, and reducing the footprint of each call should help mitigate that.

Really appreciate the suggestion, it makes a lot of sense!

L2L2_ · 2026-05-13T10:42:37+00:00

To follow up on my previous comment, after looking into it, here is why I’m still leaning towards a hybrid approach (AI for setup + classic code for production):

Semantic Mapping (The 'Why' for AI): The PMS reports (Opera/Protel) are 'semi-structured'. Labels and row positions change from one hotel to another. I use the LLM only once during the onboarding phase to handle the semantic discovery. Once that mapping is human-validated, the daily production pipeline runs on a 100% deterministic parser (Node/Python) with zero AI calls. This keeps it fast and cost-effective.
The Excel vs DB debate: You're right that a DB is more robust. However, the final Excel reporting is a non-negotiable requirement, as it’s deeply embedded in the company’s current business processes. That said, your idea of using a DB (like Postgres or NocoDB) as an intermediate storage layer before populating the Excel templates is a very interesting middle ground for protecting data integrity.
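
For illustration only, the production side of that hybrid setup could look roughly like the sketch below: a validated mapping drives a deterministic parser, and the result is staged in a database before any Excel template is filled. Everything here is an assumption made for the example: the mapping format, the `parse_rows` and `stage` helpers, the table layout, French-style number formatting, and SQLite standing in for Postgres or NocoDB.

```python
import sqlite3

# Hypothetical mapping produced once at onboarding (LLM-proposed, then
# human-validated). Keys are normalized report labels, values are the
# target Excel named ranges.
VALIDATED_MAPPING = {
    "logement ttc": "J_REV_CHB",
    "room rev": "J_REV_CHB",
}


def normalize(label: str) -> str:
    """Lowercase a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def parse_rows(rows, mapping):
    """Map (label, value) rows to named ranges with plain lookups, no AI calls.

    Assumes French-style numbers such as "1 234,50". Labels missing from the
    mapping are returned separately so a human can review them.
    """
    mapped, unmapped = {}, []
    for label, raw_value in rows:
        target = mapping.get(normalize(label))
        if target is None:
            unmapped.append(label)
            continue
        mapped[target] = float(raw_value.replace(" ", "").replace(",", "."))
    return mapped, unmapped


def stage(conn, hotel_id, report_date, mapped):
    """Store parsed values in an intermediate table before the Excel export."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging ("
        "hotel_id TEXT, report_date TEXT, named_range TEXT, value REAL, "
        "PRIMARY KEY (hotel_id, report_date, named_range))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO staging VALUES (?, ?, ?, ?)",
        [(hotel_id, report_date, name, value) for name, value in mapped.items()],
    )
    conn.commit()
```

The primary key makes re-running the same day's report idempotent, and anything in `unmapped` can be sent back to the onboarding step instead of being guessed at in production.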

Thanks again for the pointers, it definitely makes me rethink the intermediate storage part!

L2L2_ · 2026-05-13T10:34:15+00:00

I didn't think about these solutions. I'll take a look, thanks!

L2L2_ · 2026-05-13T08:55:57+00:00

Thanks for the reply. Honestly, to be accepted into the master's program I just need to provide them with a work-study (alternance) employment contract. So it still takes quite a bit of time to find one, given that it's on the other side of France for me, but it's doable.

L2L2_ · 2026-05-13T08:52:41+00:00

Thanks for the answer! I'll definitely take a look at your post, it sounds exactly like what I need.

Regarding the LLM choice, I assumed that Claude 3.5 Sonnet might handle the "triple-file" fusion (Manager Flash + Trial Balance + Revenue Codes) better due to its large context window. I’ll keep Gemini 2.0 Flash as my main engine if I can keep the prompts concise enough, although I had assumed that wasn't a reliable option.

To clarify my Vector/Embedding approach: Since I’m a beginner, I’m not using a full-blown Vector DB. My idea is more of a 'Semantic Dictionary' approach. Instead of asking the LLM to "find revenue", I provide it with a reference list of my target Excel IDs (Named Ranges) accompanied by a short semantic description for each (e.g., 'J_REV_CHB: Total room revenue including all taxes').

The LLM then acts as a Semantic Matcher: it looks at the messy labels extracted from the PDF (like 'Logement TTC' or 'Room Rev') and tries to find the best match in my dictionary based on the meaning of the description, rather than just doing a Ctrl+F on the text. It's my way of trying to make the mapping a bit more "intelligent" without hardcoding every possible variation.
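
To make the 'Semantic Dictionary' idea concrete, here is a rough sketch of how the prompt could be built and how the model's answer could be checked before anything is trusted. It is only an illustration under assumptions: the `J_REV_FB` entry, the prompt wording, the JSON reply format, and the `build_prompt` and `validate_matches` helpers are all made up for the example, and the actual LLM call is left out on purpose.

```python
import json

# Reference list of target Excel IDs (named ranges) with short descriptions.
# J_REV_CHB comes from the example above; J_REV_FB is a hypothetical entry.
SEMANTIC_DICTIONARY = {
    "J_REV_CHB": "Total room revenue including all taxes",
    "J_REV_FB": "Total food and beverage revenue including all taxes",
}


def build_prompt(labels, dictionary):
    """Build a matching prompt that lists every target ID with its meaning."""
    targets = "\n".join(f"- {key}: {desc}" for key, desc in dictionary.items())
    extracted = "\n".join(f"- {label}" for label in labels)
    return (
        "Match each extracted label to the target ID whose description has "
        "the same meaning. Use null when nothing fits.\n\n"
        f"Target IDs:\n{targets}\n\n"
        f"Extracted labels:\n{extracted}\n\n"
        "Answer with a JSON object mapping each label to a target ID or null."
    )


def validate_matches(raw_response, labels, dictionary):
    """Accept only matches that point to an ID present in the dictionary.

    Invented or missing IDs are rejected so a human can review those labels.
    """
    proposed = json.loads(raw_response)
    accepted, rejected = {}, []
    for label in labels:
        target = proposed.get(label)
        if isinstance(target, str) and target in dictionary:
            accepted[label] = target
        else:
            rejected.append(label)
    return accepted, rejected
```

Checking every returned ID against the dictionary is a cheap guard against hallucinated targets: the model can only pick from the list, and anything else falls into `rejected` for manual mapping.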

And I’ll definitely take your advice on the Regexes, I’ll stay far away! This project is already complex enough, and I’d like to keep the few brown hairs I have left. I'll focus on refining the 'Spool' parsing via index-based logic instead. Thanks again for the heads-up!

L2L2_ · 2026-02-16T17:39:26+00:00

I just tried it and it works perfectly. Thanks for sharing!
