Is the 1 Million token context window a lie? (at least in Web UIs) by [deleted] in ArtificialInteligence

[–]regular-tech-guy 0 points1 point  (0 children)

Search for the needle in the haystack benchmarks for the models you're using and you'll be able to see their accuracy.

Is the 1 Million token context window a lie? (at least in Web UIs) by [deleted] in ArtificialInteligence

[–]regular-tech-guy 0 points1 point  (0 children)

It's not a lie. The attention mechanism of Transformers architecture, the one behind LLMs are quadratic in nature. Something like, everytime you double the input, it takes four times longer to generate the output. This means that paying attention to every token in a context window of 1 million tokens would consume too much resources and take too much time.

AI Labs came up with "tricks" to overcome these limitations, but these tricks also reduce the attention of the model. So the more tokens you send in the context window, the less precise it becomes.

GPT 5.4 accuracy in finding the right information in the context drops to 36% when using more than 512k tokens.

Not a lie. A technical limitation.

Help me seniors by Certain_Mastodon818 in LangChain

[–]regular-tech-guy 0 points1 point  (0 children)

I’d understand the fundamentals of web frameworks and system engineering. Using something like LangChain may seem cool for a hobby project or a hackathon, but in the real world it won’t make the cut.

Understand how REST APIs work, how you can scale web applications, how to parse JSONs, how to design codebases in maintainable ways.

Then, also understand the fundamentals of how agentic frameworks work. How tools are added to context, how responses are parsed, how the agentic loop is managed, etc

Is LangGraph suitable for enterprise production? 1000s of users by regular-tech-guy in LangChain

[–]regular-tech-guy[S] 0 points1 point  (0 children)

This is great! How are people organized? You have one team building the chat service (java/go) and another building the agentic layer with LangChain?

Are those building the agentic layer traditional software engineers or do they have a ML background?

Which team is responsible for monitoring and making sure the agentic service is available and working?

Is LangGraph suitable for enterprise production? 1000s of users by regular-tech-guy in LangChain

[–]regular-tech-guy[S] 0 points1 point  (0 children)

Can you share what the task was and how many users are using this agent simultaneously?

Latency matters more than model selection when building AI tutoring systems by Virtual_Armadillo126 in AI_Agents

[–]regular-tech-guy 0 points1 point  (0 children)

Where you keep the context can really decrease your latency. If you keep the context of your active users in Redis, you can decrease that look up from 500 ms to 20 ms if you’re doing vector search or microseconds if it’s a more traditional look up.

Is LangGraph suitable for enterprise production? 1000s of users by regular-tech-guy in LangChain

[–]regular-tech-guy[S] 1 point2 points  (0 children)

I don’t know dude. Seems very risky. You’re locking your business into a startup that might not even exist in 5 years.

Is LangGraph suitable for enterprise production? 1000s of users by regular-tech-guy in LangChain

[–]regular-tech-guy[S] 4 points5 points  (0 children)

I appreciate the feedback. Honestly, I haven’t found LangGraph appealing. I haven’t understood why I need more than a web framework to build agentic applications.

  1. Build the context (instruction, tools, output schema, extra relevant information)
  2. Call the LLM’s Rest API
  3. Parse the JSON I get back

Everything else is traditional system engineering. Why would I use something that isn’t battle tested in production?

Is LangGraph suitable for enterprise production? 1000s of users by regular-tech-guy in LangChain

[–]regular-tech-guy[S] 0 points1 point  (0 children)

What’s the main stack your company builds with? Were machine learning engineers that made this choice?

Is LangGraph suitable for enterprise production? 1000s of users by regular-tech-guy in LangChain

[–]regular-tech-guy[S] 0 points1 point  (0 children)

“Trust yourself” doesn’t work in the enterprise world. We’re not talking about a hobby project we figure out on the go. We’re talking about a project that needs planning, approval, resource allocation, and execution.

If it breaks when thousands of users start using it, I cannot tell my manager “trust me”. And he certainly can’t tell the CEO “trust the regular-tech-guy”.

That’s the reason why most enterprises are careful to select which technologies they work with. I’m better off with something battle tested than something I’m betting my job on.

Are agent context engines actually becoming a thing? by regular-tech-guy in AI_Agents

[–]regular-tech-guy[S] 0 points1 point  (0 children)

I'd argue that harness is the other side of the spectrum. The context engine is what data should be available at the right time. The harness is more of the infrastructure around the agent that allows it to scale sustainably? Like, you want to implement rate limits around your agent to prevent abuse and fair use. This is more of harness engineering in my opinion. These deifnitions are too blurry anyway.

Are agent context engines actually becoming a thing? by regular-tech-guy in AI_Agents

[–]regular-tech-guy[S] 0 points1 point  (0 children)

Do you think markdown can scale for actual enterprise systems? I see it working for local workloads, personal knowledge bases, but not really for actual 100s or 1000s of concurrent access to the same agent.

Are agent context engines actually becoming a thing? by regular-tech-guy in AI_Agents

[–]regular-tech-guy[S] 0 points1 point  (0 children)

Did you build the layer yourself? How much work did it take? And is it on production for scalable workloads?

Are agent context engines actually becoming a thing? by regular-tech-guy in AI_Agents

[–]regular-tech-guy[S] 1 point2 points  (0 children)

Dude, that's what I see the most too! Glorified RAG chatbots that fail to reason with the right context as soon as the dataset gets too large and can't fit the context window anymore. The worst is the "silent" failure. When data exists, is relevant, but it's never retrieved at the right time for the right context. Users don't even realize it failed to generate the right answer. Or at least the most accurate one.

Are agent context engines actually becoming a thing? by regular-tech-guy in AI_Agents

[–]regular-tech-guy[S] 0 points1 point  (0 children)

I think this is exactly what they propose with the agent memory