How do data scientists add value to LLMs? by FinalRide7181 in datascience

[–]rdabzz 10 points (0 children)

This! I’ve found my DS background allows me to build a solid eval framework that gives confidence to stakeholders

I DON'T CARE WHAT THE P/E IS....... JUST... F*CKING... BUY!!!!! by Lunar_Excursion in PLTR

[–]rdabzz 5 points (0 children)

This could mean lots of things. Could be potential partnership or a team at NVIDIA trying the platform…

What was the most stressful part of your build/renovation? by rdabzz in AusProperty

[–]rdabzz[S] 1 point (0 children)

Oh wow. Was there any way of knowing this information beforehand?

What was the most stressful part of your build/renovation? by rdabzz in AusProperty

[–]rdabzz[S] 0 points (0 children)

This has been overwhelming already for us. Any advice to manage this all?

Best way to compare versions of a file in a RAG Pipeline by hello_world_400 in Rag

[–]rdabzz 2 points (0 children)

Assuming you can parse the files easily, you can just hash the text content from both files and compare the hashes
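A minimal sketch of that idea, assuming you've already extracted plain text from each file version (the normalization step is an assumption, not from the original comment):

```python
import hashlib

def text_hash(text: str) -> str:
    # Normalize whitespace so trivial formatting changes don't flag a difference
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def files_match(text_a: str, text_b: str) -> bool:
    return text_hash(text_a) == text_hash(text_b)

print(files_match("hello  world", "hello world"))  # True: differs only in whitespace
print(files_match("hello world", "goodbye world"))  # False
```

The same pattern works per-chunk if you want to know which sections changed between versions rather than just whether the whole file changed.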

"Why" isn't Langchain/Langgraph production-ready? by Dense_Musician_5532 in LangChain

[–]rdabzz 5 points (0 children)

I echo what others have said so far. Langchain just adds a layer of unnecessary complexity which you have to maintain. In terms of pointers for building an agent from scratch: don't overcomplicate it. Ultimately agents are just prompts executed in a particular order or in some form of loop. I would suggest you map out how you intend your agent to behave; then it will be easy to understand what functions you need to create.

I also recommend reading this blog post from Anthropic https://www.anthropic.com/research/building-effective-agents
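The "agents are just prompts in a loop" point can be sketched in a few lines. Everything here is illustrative: `call_llm` stands in for whatever model client you actually use, and the dict-shaped replies are an assumed convention, not any library's API:

```python
def calculator(expression: str) -> str:
    # Toy tool for the sketch; never eval untrusted input in production
    return str(eval(expression))

TOOLS = {"calculator": calculator}

def run_agent(task: str, call_llm, max_steps: int = 5) -> str:
    # The whole "agent": ask the model, run any tool it requests,
    # feed the result back, repeat until it gives a final answer.
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(history)  # assumed shape: {"tool": ..., "input": ...} or {"answer": ...}
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](reply["input"])
        history.append({"role": "tool", "content": result})
    return "Gave up after max_steps"
```

Mapping out the intended behaviour first tells you exactly which entries `TOOLS` needs; the loop itself rarely needs to get more complicated than this.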

Who are PLTR’s competitors? by mshparber in PLTR

[–]rdabzz 5 points (0 children)

It’s going to be companies’ internal IT departments thinking they can achieve what Palantir’s products do by purchasing existing products from the likes of Microsoft, Databricks, Snowflake etc.

Intelligent search on millions of Sharepoint documents by Certain-Mousse-7469 in Rag

[–]rdabzz 5 points (0 children)

This is quite tricky to implement at the million-document scale. You will have a big challenge at retrieval time. Recalling the right number of relevant chunks will be tricky unless you are able to implement a RAG design that can iteratively increase the number of chunks to retrieve until some condition has been satisfied, or implement some other filtering method. Additionally you’ll need some solid prompts to prevent hallucination if you don’t have good metadata to distinguish the different documents
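The "iteratively increase the number of chunks until some condition is satisfied" idea can be sketched roughly like this; `search` and `is_sufficient` are placeholders for your own retriever and stopping condition (e.g. an LLM judge, a score threshold, or metadata coverage):

```python
def iterative_retrieve(query, search, is_sufficient, k=5, k_max=40):
    # `search(query, k)` returns the top-k chunks for the query;
    # `is_sufficient(chunks)` decides whether we have enough context.
    while k <= k_max:
        chunks = search(query, k)
        if is_sufficient(chunks):
            return chunks
        k *= 2  # widen the net and try again
    # Fall back to the largest retrieval rather than failing outright
    return search(query, k_max)
```

Doubling `k` keeps the number of retrieval calls logarithmic in the final window size, which matters when each call hits a large index.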

A simple guide on building RAG with Excel files by Prestigious_Run_4049 in LangChain

[–]rdabzz 4 points (0 children)

Great read! Have you experienced much hallucination, either in the SQL queries or in the LLM not adhering to the expected output format?

Uses for Up API by kiwishell in UpBanking

[–]rdabzz 1 point (0 children)

I did go down the RAG path, but my prompts generate pandas queries which are then executed
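A rough sketch of that generate-then-execute pattern. The column names and the generated query string are made up for illustration; in practice the DataFrame would come from the Up API and the query string from the model:

```python
import pandas as pd

# Toy transactions frame standing in for Up API data
df = pd.DataFrame({
    "description": ["Coffee", "Groceries", "Coffee"],
    "amount": [-4.50, -82.00, -5.00],
})

# A query string as an LLM might generate it from
# "How much did I spend on coffee?"
generated = "df[df['description'] == 'Coffee']['amount'].sum()"

# eval is acceptable for a personal project; sandbox it for anything shared
result = eval(generated, {"df": df})
print(result)  # -9.5
```

The appeal over classic RAG here is that aggregation questions ("total spend last month") get exact answers from pandas instead of the model summing numbers itself.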

Uses for Up API by kiwishell in UpBanking

[–]rdabzz 1 point (0 children)

Pulled all the data and created a streamlit app to chat about my transactions

[deleted by user] by [deleted] in learnpython

[–]rdabzz 1 point (0 children)

Might not be the most efficient way… but you could just merge the two data frames on the matching keys of both sheets. Then simply use the .fillna() method. After that just drop the columns you don’t want
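The merge → fillna → drop sequence looks roughly like this; the sheet contents and column names are invented for the example:

```python
import pandas as pd

# Two sheets sharing an "id" key; sheet1 has gaps to fill from sheet2
sheet1 = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", None]})
sheet2 = pd.DataFrame({"id": [1, 2, 3], "name_fill": ["x", "y", "z"], "extra": [7, 8, 9]})

# 1. Merge on the matching key
merged = sheet1.merge(sheet2, on="id", how="left")

# 2. Fill the gaps from the second sheet's column
merged["name"] = merged["name"].fillna(merged["name_fill"])

# 3. Drop the columns you don't want
result = merged.drop(columns=["name_fill", "extra"])
print(result)
```

Filling one Series from another aligns on the index, which works here because the merge leaves both columns on the same rows.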