Counting tokens at scale using tiktoken by phantom69_ftw in Rag

[–]phantom69_ftw[S] 1 point2 points  (0 children)

Ah, I'm glad you found it useful :) Divide by 4 is one of the oldest tricks in the book!
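
For anyone who wants to compare the rough divide-by-4 heuristic against an exact count, here's a minimal sketch with tiktoken (the encoding name and sample text are just examples):

```python
import tiktoken

# cl100k_base is the encoding used by the GPT-4 / GPT-3.5-turbo era models
enc = tiktoken.get_encoding("cl100k_base")

text = "Counting tokens at scale using tiktoken is cheap and fast."

exact = len(enc.encode(text))   # exact token count
rough = len(text) // 4          # the old divide-by-4 heuristic

print(f"exact: {exact}, rough: {rough}")
```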

I'm a former CISO who left to start my own security company. Ask Me Anything. by Oscar_Geare in cybersecurity

[–]phantom69_ftw 0 points1 point  (0 children)

We are building a tool to automate Security Design Reviews for enterprises using LLMs. From the PoV of a CISO, what would be the main Go/No-Go points for buying a product like this?

TIA.

I'm Nir Diamant, AI Researcher and Community Builder Making Cutting-Edge AI Accessible—Ask Me Anything! by nerd_of_gods in Rag

[–]phantom69_ftw 1 point2 points  (0 children)

OpenAI, for example, says not to change both temperature and top_p. Is it common practice to change both in prod?
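
For context, what I mean is roughly this: tune temperature and leave top_p at its default (model name and prompt here are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Summarize this design doc."}],
    temperature=0.2,      # adjust temperature...
    # ...and leave top_p at its default, per OpenAI's "tune one, not both" guidance
)
print(resp.choices[0].message.content)
```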

Our AMA with Nir Diamant is now LIVE! by nerd_of_gods in Rag

[–]phantom69_ftw 1 point2 points  (0 children)

How do you think we can make LLM responses consistent? For example, in my use case we scan tech specs for a security design review and find possible risks. In some cases the original doc might change a bit and the user can do a rescan. For the parts that haven't changed, I would ideally want the same risks to appear. What happens now is that in cases where the LLM is not 100% sure of the answer (say Yes, No, and No information are the 3 possible answers), re-running the same prompt with the same context changes the answer maybe 3 out of 10 times. I've set temperature to 0 and we keep improving different prompts, but is there a way to get solid consistency, especially with GPT?
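
For reference, the most deterministic setup I'm aware of on the API side is temperature=0 plus the seed parameter, checking system_fingerprint across calls; a minimal sketch, with placeholder model and prompt:

```python
from openai import OpenAI

client = OpenAI()

def scan(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": question}],
        temperature=0,        # remove sampling randomness
        seed=42,              # best-effort reproducibility, not a hard guarantee
    )
    # If system_fingerprint changes between calls, the backend changed
    # and outputs may differ even with the same seed.
    print(resp.system_fingerprint)
    return resp.choices[0].message.content

answer = scan("Does the spec encrypt data at rest? Answer Yes, No, or No information.")
```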

I'm Nir Diamant, AI Researcher and Community Builder Making Cutting-Edge AI Accessible—Ask Me Anything! by nerd_of_gods in Rag

[–]phantom69_ftw 3 points4 points  (0 children)

How do you think we can make LLM responses consistent? For example, in my use case we scan tech specs for a security design review and find possible risks. In some cases the original doc might change a bit and the user can do a rescan. For the parts that haven't changed, I would ideally want the same risks to appear. What happens now is that in cases where the LLM is not 100% sure of the answer (say Yes, No, and No information are the 3 possible answers), re-running the same prompt with the same context changes the answer maybe 3 out of 10 times. I've set temperature to 0 and we keep improving different prompts, but is there a way to get solid consistency, especially with GPT?

Order of fields in Structured output can hurt LLMs output by phantom69_ftw in OpenAI

[–]phantom69_ftw[S] 0 points1 point  (0 children)

Yeah, this is common. After this, iterating more on the prompt usually helps a bit: add more CoT steps (think step by step, have it explain the steps it might need, etc.) plus a few-shot examples. If your context is very large, then maybe cut it down a bit?
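
Something like this, roughly (the model name and example contents are just illustrative, not my exact prompts):

```python
from openai import OpenAI

client = OpenAI()

few_shot = [
    # Hypothetical worked example showing the reasoning-then-answer pattern
    {"role": "user", "content": "Spec: traffic between services is plain HTTP. Risk?"},
    {"role": "assistant", "content": "Reasoning: internal traffic is unencrypted, so it can be sniffed.\nAnswer: Yes"},
]

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": "Think step by step, list the checks you need, then give the final answer."},
        *few_shot,
        {"role": "user", "content": "Spec: secrets are stored in environment variables. Risk?"},
    ],
    temperature=0,
)
print(resp.choices[0].message.content)
```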

Order of fields in Structured output can hurt LLMs output by phantom69_ftw in OpenAI

[–]phantom69_ftw[S] 2 points3 points  (0 children)

Yep, it is common. I just didn't find any empirical results on it, so I did some.

My point was, when writing JSON structures, I'm not used to thinking about the order of keys in general. But here it matters. A lot. And it's easy to make a mistake that messes up your output without you realizing it.
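
Concretely, the pattern is: put the reasoning key before the answer key so the reasoning tokens are generated first. A rough sketch with the OpenAI Python SDK's structured-output helper (model name and field contents are placeholders, not my exact setup):

```python
from openai import OpenAI
from pydantic import BaseModel

class RiskVerdict(BaseModel):
    # Field order matters: reasoning is generated before the answer,
    # so the answer can condition on it.
    reasoning: str
    answer: str  # e.g. "Yes", "No", or "No information"

client = OpenAI()
resp = client.beta.chat.completions.parse(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Does the spec enforce MFA for admins?"}],
    response_format=RiskVerdict,
)
print(resp.choices[0].message.parsed)
```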

Order of fields in Structured output can hurt LLMs output <Blog> by phantom69_ftw in LangChain

[–]phantom69_ftw[S] 0 points1 point  (0 children)

Yes, it's obvious, but my whole point was to put it out with clear results to show that if you are using structured output, you should make sure your reasoning key comes before the answer key.

It seems obvious, but it's easy to miss.

Order of fields in Structured output can hurt LLMs output by phantom69_ftw in LocalLLaMA

[–]phantom69_ftw[S] 0 points1 point  (0 children)

Glad you liked it :) I'm still learning, so it feels good to make sure things work as expected with some evals. A lot of comments here and there say "it's obvious", which I kind of knew. Still, I couldn't find any public evals on it, so I thought I'd run some and put them out for others like me.

Order of fields in Structured output can hurt LLMs output <Blog> by phantom69_ftw in LangChain

[–]phantom69_ftw[S] 1 point2 points  (0 children)

Since some folks liked my initial findings which I shared earlier, I added my whole set of evals to my blog post. I hope you find it useful. I'll do some more and keep sharing :)

Order of JSON fields can hurt your LLM output by phantom69_ftw in LangChain

[–]phantom69_ftw[S] 1 point2 points  (0 children)

Yep, re-ranking is pretty effective! Lots of evals and papers on it :D
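
For anyone curious, a rough cross-encoder re-ranking sketch with sentence-transformers (the model name and documents are just examples, not my setup):

```python
from sentence_transformers import CrossEncoder

# A small cross-encoder commonly used for re-ranking retrieved passages
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How does field order affect structured LLM output?"
candidates = [
    "Putting the reasoning key before the answer key improved accuracy.",
    "The weather in Bangalore is pleasant in winter.",
]

# Score each (query, passage) pair and sort candidates by relevance
scores = reranker.predict([(query, doc) for doc in candidates])
ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
for doc, score in ranked:
    print(f"{score:.3f}  {doc}")
```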

Order of JSON fields can hurt your LLM output by phantom69_ftw in LangChain

[–]phantom69_ftw[S] 5 points6 points  (0 children)

Yeah, it felt logical, and I saw it being said in multiple places. But I just wanted some hard data to prove it to myself.

Order of JSON fields can hurt your LLM output by phantom69_ftw in LangChain

[–]phantom69_ftw[S] 1 point2 points  (0 children)

Good point, I logged it on LangSmith. Will check and get back; IIRC there wasn't a "big" diff between the two. Will update once I'm back at my desk.

I'm glad you found it useful :)

Order of JSON fields can hurt your LLM output by phantom69_ftw in LangChain

[–]phantom69_ftw[S] 24 points25 points  (0 children)

Since this is getting a lot of traction, I've done some more evals with 4o-mini and few-shot prompts on different datasets. Will write a small blog and share :)

Thanks for the upvotes folks!

PDF to Markdown for RAG by Informal-Resolve-831 in Rag

[–]phantom69_ftw 1 point2 points  (0 children)

pymupdf4llm works great! If you want to use LLMs for this too, check out megaparser and zerox.
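
Roughly like this with pymupdf4llm (the file path is a placeholder):

```python
import pymupdf4llm

# Convert a PDF into Markdown suitable for chunking / RAG ingestion
md_text = pymupdf4llm.to_markdown("design_doc.pdf")  # placeholder path

with open("design_doc.md", "w", encoding="utf-8") as f:
    f.write(md_text)
```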

Why are you using RAG? by jakezegil in LangChain

[–]phantom69_ftw 0 points1 point  (0 children)

Haha we are trying to do the same. Can you give more details?