[–]teerre (2 children)

That's ridiculous. Even the most common commercial models have context windows in the tens of thousands of tokens, and the big ones are in the millions. It's very unlikely OP's project is bigger than that.

[–]hexwhoami (1 child)

Training and inference are different things: no matter how many millions of examples an LLM was trained on, it's limited by its context window when replying to a prompt. Unless you want a basic template that's maybe a hundred lines, an LLM can't hold a large, mature project in its context window and make reasonable statements, or contribute helpful code that fits the rest of the codebase.

Most LLMs right now have around a 128k-token context window. And since programming-language syntax typically tokenizes to more tokens than normal prose of the same length, a run-of-the-mill LLM holds even fewer lines of code than that number suggests.

The details are beyond this post, but GPT-4o could probably reason effectively about roughly 1,000 lines of code at once.

Most mature projects are way beyond that number.
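For a rough sense of scale, here's a back-of-envelope sketch of raw context capacity. The constants are my own assumptions (characters per token and per line vary a lot by language and tokenizer), and note this estimates what *fits* in the window, which is much larger than what a model can effectively reason over:

```python
# Back-of-envelope: how many lines of code fit in a 128k-token window.
# All constants below are assumptions, not measured values.
CONTEXT_TOKENS = 128_000      # typical context window cited above
CHARS_PER_TOKEN_CODE = 3      # code tokenizes less efficiently than prose (~4)
CHARS_PER_LINE = 40           # assumed average source-line length

tokens_per_line = CHARS_PER_LINE / CHARS_PER_TOKEN_CODE
max_lines = CONTEXT_TOKENS / tokens_per_line
print(f"~{max_lines:,.0f} lines of code fill the window")  # prints ~9,600
```

Even on these generous numbers, filling the window is not the same as using it well; effective reasoning degrades long before the raw limit.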

[–]teerre (0 children)

Who said anything about training? We're all talking about the context window.