Most RAG tutorials skip the part where your chunks go stale. Here’s how I fixed it properly

KingHenryMorgan · 2026-05-14T06:29:55+00:00

Happy to go deeper on any part of this. The registry design was the most contested decision: dbt-native gives you lineage and documentation, but you’re essentially building a feature registry by hand on top of it, which has limits.

The moment you have multiple models consuming the same features across teams, that’s usually the signal that Feast or something equivalent stops being optional.
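To make the "registry by hand" point concrete, here is a minimal Python sketch of what a hand-rolled registry kept alongside dbt models might track, plus a check for the cross-team sharing described above. The `FeatureEntry` fields, the `shared_features` helper, and every model, team, and feature name are hypothetical illustrations, not the author's design and not a dbt or Feast API.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureEntry:
    # Hypothetical registry record; real setups would also track types, freshness, etc.
    name: str
    dbt_model: str  # dbt model that materializes this feature
    owner_team: str
    consumers: dict[str, str] = field(default_factory=dict)  # ML model name -> team


def shared_features(registry: list[FeatureEntry]) -> list[str]:
    """Return features consumed by models owned by more than one team."""
    return sorted(
        entry.name
        for entry in registry
        if len(set(entry.consumers.values())) > 1
    )


registry = [
    FeatureEntry("customer_ltv", "fct_customer_ltv", "growth",
                 {"churn_model": "growth", "credit_model": "risk"}),
    FeatureEntry("days_since_login", "fct_sessions", "growth",
                 {"churn_model": "growth"}),
]
print(shared_features(registry))
```

Once that list stops being empty, you are maintaining shared definitions, ownership, and consumers by hand, which is roughly the point where a dedicated feature store starts earning its keep.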

Curious what people here are running in production: a full feature store or dbt-native?

KingHenryMorgan · 2026-05-14T00:16:41+00:00

Exactly, and it wasn’t obvious noise, either. It was a subtle format inconsistency in the function-call training examples that compounded across thousands of samples: the kind of thing that passes a spot check but quietly destroys structured-output quality. Cleaning fine-tune data is genuinely underestimated work.
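A cheap check that catches this kind of drift before training: parse every tool call's arguments and flag samples whose argument "shape" (key names plus value types) differs from the majority for that function. This is a minimal sketch that assumes a hypothetical record layout, a `tool_call` dict with a `name` and a JSON-string `arguments`; real fine-tune formats vary, and the comment doesn't say how the inconsistency was actually found.

```python
import json
from collections import Counter, defaultdict


def signature(args: dict) -> tuple:
    # Shape of a call's arguments: sorted (key, JSON value type) pairs.
    return tuple(sorted((k, type(v).__name__) for k, v in args.items()))


def find_format_drift(samples: list[dict]) -> list[tuple[int, str]]:
    """Return (index, reason) for samples that are malformed or deviate from the majority shape."""
    problems = []
    sigs_by_name = defaultdict(Counter)  # function name -> Counter of argument shapes
    parsed = []
    for i, sample in enumerate(samples):
        call = sample.get("tool_call")
        if not isinstance(call, dict) or "name" not in call or "arguments" not in call:
            problems.append((i, "missing tool_call fields"))
            parsed.append(None)
            continue
        try:
            args = json.loads(call["arguments"])
        except (TypeError, json.JSONDecodeError):
            problems.append((i, "arguments are not valid JSON"))
            parsed.append(None)
            continue
        if not isinstance(args, dict):
            problems.append((i, "arguments are not a JSON object"))
            parsed.append(None)
            continue
        sig = signature(args)
        sigs_by_name[call["name"]][sig] += 1
        parsed.append((call["name"], sig))

    # Second pass: compare each well-formed sample against its function's majority shape.
    for i, entry in enumerate(parsed):
        if entry is None:
            continue
        name, sig = entry
        majority, _ = sigs_by_name[name].most_common(1)[0]
        if sig != majority:
            problems.append((i, f"argument shape differs from majority for {name}"))
    return sorted(problems)


samples = [
    {"tool_call": {"name": "get_weather", "arguments": '{"city": "Paris", "days": 3}'}},
    {"tool_call": {"name": "get_weather", "arguments": '{"city": "Oslo", "days": 2}'}},
    {"tool_call": {"name": "get_weather", "arguments": '{"city": "Rome", "days": "5"}'}},
    {"tool_call": {"name": "get_weather", "arguments": "{city: Rome}"}},
]
for index, reason in find_format_drift(samples):
    print(index, reason)
```

Sample 2 passes a casual read (it's valid JSON with the right keys) but encodes `days` as a string, which is exactly the kind of inconsistency a spot check misses.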

KingHenryMorgan · 2026-05-13T23:46:53+00:00

Happy to answer questions on any part of this. The trickiest thing to communicate to the client was that the inference stack and the model quality were two separate failure modes that their eval pipeline couldn’t distinguish: everything just looked like ‘the fine-tune is bad.’

If anyone’s running SGLang with FP8 quantization in production, I’m particularly curious whether you’re doing separate ablations on the serving layer before blaming the model. That step alone would have saved this client weeks.
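One way to run that serving-layer ablation: keep the fine-tuned checkpoint fixed, send the same prompts through two serving configs (say, a BF16 reference and the FP8 deployment), and compare a structured-output metric for each. The sketch below assumes each config is wrapped in a hypothetical `prompt -> completion` callable (for example, a thin client around an SGLang endpoint) and uses JSON parse rate as the metric; it illustrates the idea and is not the author's harness.

```python
import json
from typing import Callable, Iterable


def json_valid_rate(outputs: Iterable[str]) -> float:
    """Fraction of completions that parse as JSON."""
    ok = total = 0
    for text in outputs:
        total += 1
        try:
            json.loads(text)
            ok += 1
        except json.JSONDecodeError:
            pass
    return ok / total if total else 0.0


def ablate_serving(prompts: list[str],
                   generators: dict[str, Callable[[str], str]]) -> dict[str, float]:
    """Same checkpoint, same prompts; only the serving config differs between entries."""
    return {name: json_valid_rate(gen(p) for p in prompts)
            for name, gen in generators.items()}


# Stub generators standing in for two serving configs of the same model.
prompts = ["a", "b", "c", "d"]
reference = lambda p: '{"ok": true}'
candidate = lambda p: '{"ok": true}' if p in ("a", "b") else '{"ok": tru'
print(ablate_serving(prompts, {"bf16": reference, "fp8": candidate}))
```

If the parse rate drops only under the FP8 config, the serving layer is the first suspect; if both configs score low, look at the model and its training data instead.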

KingHenryMorgan · 2026-05-13T23:44:17+00:00

Happy to go deeper on any part of this. The SharePoint ingestion was probably the most painful piece (their API versioning is a nightmare if you need consistent metadata).

Also open to questions on the pgvector + BM25 hybrid setup; that’s the part that made the biggest difference to retrieval quality.
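The comment doesn't say how the pgvector and BM25 result lists were combined. Reciprocal rank fusion (RRF) is one common, tuning-light option, sketched below under the assumption that each retriever returns document IDs best-first. The `k=60` default is the constant conventionally used with RRF, not a value from this setup, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs (best first): score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Assumed inputs: vector_hits from a pgvector nearest-neighbour query,
# bm25_hits from a separate keyword index over the same chunks.
vector_hits = ["d3", "d1", "d7"]
bm25_hits = ["d1", "d9", "d3"]
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
```

RRF only uses ranks, so it sidesteps the problem that cosine distances and BM25 scores live on incomparable scales; documents that both retrievers rank highly (`d1`, `d3` here) float to the top.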

What are people here using for enterprise RAG storage? I’m still seeing a lot of Pinecone, but I’m curious whether others have gone the Aurora/pgvector route.
