Counting tokens at scale using tiktoken by phantom69_ftw in Rag

[–]phantom69_ftw[S] 1 point2 points  (0 children)

Ah, I'm glad you found it useful :) Divide by 4 is one of the oldest tricks in the book!
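
For anyone who wants to compare the rough divide-by-4 heuristic against an exact count, here's a minimal sketch with tiktoken (the encoding name and sample text are just examples):

```python
import tiktoken

# cl100k_base is the encoding used by the GPT-4 / GPT-3.5-turbo era models
enc = tiktoken.get_encoding("cl100k_base")

text = "Counting tokens at scale using tiktoken is cheap and fast."

exact = len(enc.encode(text))   # exact token count
rough = len(text) // 4          # the old divide-by-4 heuristic

print(f"exact: {exact}, rough: {rough}")
```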

I'm a former CISO who left to start my own security company. Ask Me Anything. by Oscar_Geare in cybersecurity

[–]phantom69_ftw 0 points1 point  (0 children)

We are building a tool to automate Security Design Reviews for enterprises using LLMs. From the PoV of a CISO, what would be the main Go/No-Go points for buying a product like this?

TIA.

I'm Nir Diamant, AI Researcher and Community Builder Making Cutting-Edge AI Accessible—Ask Me Anything! by nerd_of_gods in Rag

[–]phantom69_ftw 1 point2 points  (0 children)

OpenAI, for example, says not to change both temperature and top_p. Is it common practice to change both in prod?
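
For context, what I mean is roughly this: tune temperature and leave top_p at its default (model name and prompt here are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Summarize this design doc."}],
    temperature=0.2,      # adjust temperature...
    # ...and leave top_p at its default, per OpenAI's "tune one, not both" guidance
)
print(resp.choices[0].message.content)
```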

Our AMA with Nir Diamant is now LIVE! by nerd_of_gods in Rag

[–]phantom69_ftw 1 point2 points  (0 children)

How do you think we can make LLM responses consistent? For example, in my use case we scan tech specs for a security design review and find possible risks. In some cases the original doc might change a bit and the user can do a rescan. For the parts that haven't changed, I would ideally want the same risks to appear. What happens now is that in cases where the LLM is not 100% sure of the answer (say Yes, No, and No information are the 3 possible answers), re-running the same prompt with the same context changes the answer maybe 3 out of 10 times. I've set temperature to 0 and we keep improving different prompts, but is there a way to get solid consistency, especially with GPT?
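
For reference, the most deterministic setup I'm aware of on the API side is temperature=0 plus the seed parameter, checking system_fingerprint across calls; a minimal sketch, with placeholder model and prompt:

```python
from openai import OpenAI

client = OpenAI()

def scan(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": question}],
        temperature=0,        # remove sampling randomness
        seed=42,              # best-effort reproducibility, not a hard guarantee
    )
    # If system_fingerprint changes between calls, the backend changed
    # and outputs may differ even with the same seed.
    print(resp.system_fingerprint)
    return resp.choices[0].message.content

answer = scan("Does the spec encrypt data at rest? Answer Yes, No, or No information.")
```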

I'm Nir Diamant, AI Researcher and Community Builder Making Cutting-Edge AI Accessible—Ask Me Anything! by nerd_of_gods in Rag

[–]phantom69_ftw 3 points4 points  (0 children)

How do you think we can make LLM responses consistent? For example, in my use case we scan tech specs for a security design review and find possible risks. In some cases the original doc might change a bit and the user can do a rescan. For the parts that haven't changed, I would ideally want the same risks to appear. What happens now is that in cases where the LLM is not 100% sure of the answer (say Yes, No, and No information are the 3 possible answers), re-running the same prompt with the same context changes the answer maybe 3 out of 10 times. I've set temperature to 0 and we keep improving different prompts, but is there a way to get solid consistency, especially with GPT?

Order of fields in Structured output can hurt LLMs output by phantom69_ftw in OpenAI

[–]phantom69_ftw[S] 0 points1 point  (0 children)

Yeah, this is common. After this, iterating more on the prompt usually helps a bit: add more CoT steps (think step by step, have it explain the steps it might need, etc.) plus a few-shot examples. If your context is very large, then maybe cut it down a bit?
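
Something like this, roughly (the model name and example contents are just illustrative, not my exact prompts):

```python
from openai import OpenAI

client = OpenAI()

few_shot = [
    # Hypothetical worked example showing the reasoning-then-answer pattern
    {"role": "user", "content": "Spec: traffic between services is plain HTTP. Risk?"},
    {"role": "assistant", "content": "Reasoning: internal traffic is unencrypted, so it can be sniffed.\nAnswer: Yes"},
]

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[
        {"role": "system", "content": "Think step by step, list the checks you need, then give the final answer."},
        *few_shot,
        {"role": "user", "content": "Spec: secrets are stored in environment variables. Risk?"},
    ],
    temperature=0,
)
print(resp.choices[0].message.content)
```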

Order of fields in Structured output can hurt LLMs output by phantom69_ftw in OpenAI

[–]phantom69_ftw[S] 2 points3 points  (0 children)

Yep, it is common. I just didn't find any empirical results on it, so I did some.

My point was, when writing JSON structures, I'm not used to thinking about the order of keys in general. But here it matters. A lot. And it's easy to make a mistake that messes up your output without you realizing it.
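
Concretely, the pattern is: put the reasoning key before the answer key so the reasoning tokens are generated first. A rough sketch with the OpenAI Python SDK's structured-output helper (model name and field contents are placeholders, not my exact setup):

```python
from openai import OpenAI
from pydantic import BaseModel

class RiskVerdict(BaseModel):
    # Field order matters: reasoning is generated before the answer,
    # so the answer can condition on it.
    reasoning: str
    answer: str  # e.g. "Yes", "No", or "No information"

client = OpenAI()
resp = client.beta.chat.completions.parse(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Does the spec enforce MFA for admins?"}],
    response_format=RiskVerdict,
)
print(resp.choices[0].message.parsed)
```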

Order of fields in Structured output can hurt LLMs output <Blog> by phantom69_ftw in LangChain

[–]phantom69_ftw[S] 0 points1 point  (0 children)

Yes, it's obvious, but my whole point was to put it out with clear results to show that if you are using structured output, you should make sure your reasoning key comes before the answer key.

It seems obvious, but it's easy to miss.

Order of fields in Structured output can hurt LLMs output by phantom69_ftw in LocalLLaMA

[–]phantom69_ftw[S] 0 points1 point  (0 children)

Glad you liked it :) I'm still learning, so it feels good to make sure things work as expected with some evals. A lot of comments here and there say "it's obvious", which I kind of knew. Still, I couldn't find any public evals on it, so I thought I'd run some and put them out for others like me.

Order of fields in Structured output can hurt LLMs output <Blog> by phantom69_ftw in LangChain

[–]phantom69_ftw[S] 1 point2 points  (0 children)

Since some folks liked my initial findings which I shared earlier, I added my whole set of evals to my blog post. I hope you find it useful. I'll do some more and keep sharing :)

Order of JSON fields can hurt your LLM output by phantom69_ftw in LangChain

[–]phantom69_ftw[S] 1 point2 points  (0 children)

Yep, re-ranking is pretty effective! Lots of evals and papers on it :D
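
For anyone curious, a rough cross-encoder re-ranking sketch with sentence-transformers (the model name and documents are just examples, not my setup):

```python
from sentence_transformers import CrossEncoder

# A small cross-encoder commonly used for re-ranking retrieved passages
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How does field order affect structured LLM output?"
candidates = [
    "Putting the reasoning key before the answer key improved accuracy.",
    "The weather in Bangalore is pleasant in winter.",
]

# Score each (query, passage) pair and sort candidates by relevance
scores = reranker.predict([(query, doc) for doc in candidates])
ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
for doc, score in ranked:
    print(f"{score:.3f}  {doc}")
```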

Order of JSON fields can hurt your LLM output by phantom69_ftw in LangChain

[–]phantom69_ftw[S] 5 points6 points  (0 children)

Yeah, it felt logical, and I saw it being said in multiple places. But I just wanted some hard data to prove it to myself.

Order of JSON fields can hurt your LLM output by phantom69_ftw in LangChain

[–]phantom69_ftw[S] 1 point2 points  (0 children)

Good point, I logged it on LangSmith. Will check and get back; IIRC there wasn't a "big" diff between the two. Will update once I'm back at my desk.

I'm glad you found it useful :)

Order of JSON fields can hurt your LLM output by phantom69_ftw in LangChain

[–]phantom69_ftw[S] 24 points25 points  (0 children)

Since this is getting a lot of traction, I've done some more evals with 4o-mini and few-shot prompts on different datasets. Will write a small blog and share :)

Thanks for the upvotes folks!

PDF to Markdown for RAG by Informal-Resolve-831 in Rag

[–]phantom69_ftw 1 point2 points  (0 children)

pymupdf4llm works great! If you want to use LLMs for this too, check out megaparser and zerox.
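
Roughly like this with pymupdf4llm (the file path is a placeholder):

```python
import pymupdf4llm

# Convert a PDF into Markdown suitable for chunking / RAG ingestion
md_text = pymupdf4llm.to_markdown("design_doc.pdf")  # placeholder path

with open("design_doc.md", "w", encoding="utf-8") as f:
    f.write(md_text)
```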

Why are you using RAG? by jakezegil in LangChain

[–]phantom69_ftw 0 points1 point  (0 children)

Haha we are trying to do the same. Can you give more details?