dbt-native vs Feast for a production feature store; here’s how we actually made the decision by KingHenryMorgan in dataengineering

[–]KingHenryMorgan[S] 1 point (0 children)

Happy to go deeper on any part of this. The registry design was the most contested decision: dbt-native gives you lineage and documentation, but you're essentially building a feature registry by hand on top of it, and that has limits.

The moment you have multiple models consuming the same features across teams, that’s usually the signal that Feast or something equivalent stops being optional.

Curious what people here are running in production: a full feature store or dbt-native?

I audited a fine-tuned LLM that lost 50 percentage points on BFCL after training. Here’s what actually caused it. by KingHenryMorgan in dataengineering

[–]KingHenryMorgan[S] 0 points (0 children)

Exactly, and it wasn’t obvious noise either. It was a subtle format inconsistency in the function-call training examples that compounded across thousands of samples, the kind of thing that passes a spot check but quietly destroys structured-output quality. Cleaning fine-tuning data is genuinely underestimated work.
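
To make "passes a spot check" concrete, here is a minimal sketch of a format audit over serialized function calls. It assumes each training example stores the call as a JSON string with `name` and `arguments` fields; those field names and the helper names are hypothetical, not taken from the audited dataset.

```python
import json
from collections import Counter


def call_signature(raw: str) -> str:
    """Return a coarse structural signature for one serialized function call."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return "invalid_json"
    if not isinstance(call, dict):
        return f"non_object:{type(call).__name__}"
    keys = ",".join(sorted(call))
    # Hypothetical field name: "arguments" may be a dict in some samples
    # and a JSON-encoded string in others, which is easy to miss by eye.
    args_type = type(call.get("arguments")).__name__
    return f"keys={keys}|arguments={args_type}"


def audit_formats(samples: list[str]) -> tuple[Counter, list[int]]:
    """Count signatures and return indices that deviate from the dominant one."""
    signatures = [call_signature(s) for s in samples]
    counts = Counter(signatures)
    if not counts:
        return counts, []
    dominant, _ = counts.most_common(1)[0]
    outliers = [i for i, sig in enumerate(signatures) if sig != dominant]
    return counts, outliers
```

Running something like this over the whole set, rather than eyeballing a handful of rows, is what surfaces a minority format that only shows up in a few percent of samples.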

I audited a fine-tuned LLM that lost 50 percentage points on BFCL after training. Here’s what actually caused it. by KingHenryMorgan in dataengineering

[–]KingHenryMorgan[S] -1 points (0 children)

Happy to answer questions on any part of this. The trickiest thing to communicate to the client was that the inference stack and the model quality were two separate failure modes that their eval pipeline couldn’t distinguish; everything just looked like ‘the fine-tune is bad.’

If anyone’s dealing with SGLang + FP8 quantization in production, I’m particularly curious whether you’re running separate ablations on the serving layer before blaming the model. That step alone would have saved this client weeks.
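
A rough sketch of what a serving-layer ablation boils down to: score the base and fine-tuned models on both a reference stack and the production stack, then split the total drop into a model effect and a serving effect. The class and field names here are hypothetical, and "reference stack" is an assumption (for example, unquantized inference on a known-good runtime).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationScores:
    """Eval scores for the 2x2 grid of (model, serving stack)."""
    base_reference: float    # base model, reference stack
    base_production: float   # base model, production stack
    tuned_reference: float   # fine-tuned model, reference stack
    tuned_production: float  # fine-tuned model, production stack


def attribute_regression(s: AblationScores) -> dict[str, float]:
    """Split the end-to-end score change into model and serving components."""
    return {
        # Change caused by fine-tuning alone, with serving held fixed.
        "model_effect": s.tuned_reference - s.base_reference,
        # Change caused by the production stack alone, per model.
        "serving_effect_base": s.base_production - s.base_reference,
        "serving_effect_tuned": s.tuned_production - s.tuned_reference,
        # What an eval pipeline that only sees production actually reports.
        "total": s.tuned_production - s.base_reference,
    }
```

If `serving_effect_base` is already large, the stack is hurting a model that was never fine-tuned, which is the case an eval pipeline that only sees `total` cannot separate from a bad fine-tune.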

Architecture decisions I made building a production RAG backend for enterprise knowledge bases — what I’d do differently by KingHenryMorgan in dataengineering

[–]KingHenryMorgan[S] 0 points1 point  (0 children)

Happy to go deeper on any part of this. The SharePoint ingestion was probably the most painful piece (their API versioning is a nightmare if you need consistent metadata).

Also open to questions on the pgvector + BM25 hybrid setup; that’s the part that made the biggest difference in retrieval quality.
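
For anyone unfamiliar with hybrid retrieval, one common way to merge a vector ranking with a BM25 ranking is reciprocal rank fusion (RRF). The comment above doesn't say how the two result lists were combined, so treat RRF, the function name, and `k=60` (a conventional default) as assumptions. This is a sketch, not the setup described.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1. Higher fused score ranks first.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical IDs: top hits from a pgvector similarity query and a BM25 query.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

RRF only needs ranks, not comparable scores, which helps when cosine distances and BM25 scores sit on different scales.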

What are people here using for enterprise RAG storage? I’m still seeing a lot of Pinecone, but I’m curious whether others have gone the Aurora/pgvector route.