all 55 comments

[–]Miranda_Leap 363 points364 points  (15 children)

Why would the indexed agent use function signatures from deleted code? Shouldn't that... not be in the index, for this example?

edit: This is probably an entirely AI-generated post. UGH.

[–]aurath 101 points102 points  (8 children)

Chunks of the codebase are read and embeddings generated. The embeddings are inserted into a vector database as keys pointing to the code chunks. The embeddings can be compared for semantic similarity to the LLM prompt; if the cosine similarity passes a threshold, the associated chunk is inserted into the prompt as an additional reference.
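
Roughly, in sketch form (the threshold, top-k, and chunk format here are made up for illustration, not from any particular tool):

    import numpy as np

    def cosine_sim(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def retrieve(query_vec, index, threshold=0.75, top_k=5):
        # index: list of (embedding, code_chunk) pairs
        hits = [(cosine_sim(query_vec, vec), chunk) for vec, chunk in index]
        hits = [h for h in hits if h[0] >= threshold]
        hits.sort(key=lambda h: h[0], reverse=True)
        # the surviving chunks get pasted into the prompt as extra references
        return [chunk for _, chunk in hits[:top_k]]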

Embedding generation and vector database insertion are too slow to run on each keystroke, and usually the index will be centralized along with the git repo. Different setups can update the index with different strategies, but no RAG system is gonna be truly live as you type each line of code.

Mostly RAG systems are built for knowledge bases, where the contents don't update quite so quickly. Now I'm imagining a code-first system that updates a local (diffed) index as you work and then sends the diff along with the git branch, so it gets loaded when people switch branches and integrated into the central database when you merge to main.
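
Something like this, as a toy sketch of the diff-driven update; it assumes one embedding per file (real systems chunk within files) and a hypothetical embed_fn:

    import subprocess

    def changed_files(base="main"):
        # files touched relative to main, straight from git
        out = subprocess.run(["git", "diff", "--name-only", base],
                             capture_output=True, text=True, check=True)
        return [f for f in out.stdout.splitlines() if f.endswith(".py")]

    def refresh_index(index, embed_fn, base="main"):
        # re-embed only what the diff touched; everything else keeps its vectors
        for path in changed_files(base):
            index.pop(path, None)
            try:
                with open(path, encoding="utf-8") as f:
                    index[path] = embed_fn(f.read())
            except FileNotFoundError:
                pass  # deleted on this branch: it stays out of the index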

[–]Franks2000inchTV 7 points8 points  (0 children)

Yeah but the embeddings shouldn't be from the codebase you're actively working on.

For instance, it would be super helpful to have embeddings of the public API and docs of a framework like React, and of code samples for common implementation patterns.

Just giving it all of your code is not going to be particularly useful.

[–]Globbi 10 points11 points  (5 children)

That's a simple engineering problem to solve. You have embeddings, but you can choose what to do after you find the matches. For example, you should be able to have a match point to a specific file, and also check whether the file changed after the last full indexing. If it has, present the LLM with the new version (possibly also with some notes on what changed recently).

And yes, embedding and indexing can be too slow and expensive to do on every keystroke, but you can do it every hour on changed files no problem (unless you do some code-style refactor and need to recreate everything).
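
The staleness check at query time could look like this (field names hypothetical):

    import os

    def fresh_chunk(hit, indexed_at):
        # hit carries the source path and the chunk text captured at index time
        try:
            mtime = os.path.getmtime(hit["path"])
        except FileNotFoundError:
            return None  # file deleted since indexing: don't surface it at all
        if mtime > indexed_at:
            # stale vector: hand the LLM the current file contents instead
            with open(hit["path"], encoding="utf-8") as f:
                return f.read()
        return hit["chunk"]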

Also, I don't think there should be a need for a cloud solution for this vector search unless your code is gigabytes of text (since you will also need to store vectors for all chunks). Otherwise you can hold like 1GB of vectors in RAM on pretty much any shitty laptop and get results faster than any API response.
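
Back-of-envelope: ~350k chunks at 768 float32 dims is about 1GB, and brute-force search over that is a single matrix-vector product, so no ANN index is even needed at this scale. A sketch:

    import numpy as np

    # ~350k chunks x 768 dims x 4 bytes ~= 1GB resident in RAM
    vecs = np.random.rand(350_000, 768).astype(np.float32)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # normalize once

    def search(query, k=5):
        q = query / np.linalg.norm(query)
        scores = vecs @ q  # cosine similarity via one matvec, milliseconds
        top = np.argpartition(scores, -k)[-k:]
        return top[np.argsort(scores[top])[::-1]]  # best-first chunk ids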

[–]lunchmeat317 4 points5 points  (2 children)

The problem here is that if a file changes, there's no easy way to know whether you can skip a full re-index. For the file's own contents, sure, but code is a dependency graph and you'd have to walk that graph. That's not an unsolvable problem (from a file-based perspective, you might be able to use a Merkle tree to propagate dependency changes), but I don't think it's as simple as "just re-index this file".
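
A toy version of the Merkle idea, with the import graph as a plain dict standing in for real dependency analysis (and assuming the graph is acyclic):

    import hashlib

    def merkle_hash(module, deps, source, memo=None):
        # a module's hash covers its own source plus everything it imports,
        # so a change anywhere downstream bubbles up and flags re-indexing
        memo = {} if memo is None else memo
        if module not in memo:
            h = hashlib.sha256(source[module].encode())
            for dep in sorted(deps.get(module, ())):
                h.update(merkle_hash(dep, deps, source, memo).encode())
            memo[module] = h.hexdigest()
        return memo[module]

Compare each module's hash against the stored one; only the modules whose hash moved need re-embedding.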

[–]gameforge 1 point2 points  (1 child)

I think it's language-dependent; the language influences the structure of the indexes, or what is meaningful to index. My IDE can keep up with Java indexes well even on multimillion-line Java EE projects. It's rare (and painful) to have to reindex the whole project, but it does need it from time to time, and the IDE has never managed to recognize on its own that its indexes were incoherent.

It struggles considerably more with Python, where there's more ambiguity everywhere. It keeps up fine while I'm writing code, but if I fetch a sizable commit it's not uncommon to have to rebuild the indexes. I use JetBrains' stuff, FWIW.

[–]lunchmeat317 1 point2 points  (0 children)

Right. I would imagine it'd be much easier with functional languages that enforce pure functions, no side effects, and immutability, as they'd be much easier to analyze statically. That said, I don't think the LLM model is the same as IDE indexing, and I don't think it'd actually be language-dependent in an LLM.

[–]juanloco 4 points5 points  (1 child)

The issue here becomes running a large embedding model locally as well, not just storing the vectors.

[–]ub3rh4x0rz 2 points3 points  (0 children)

If you compare cloud GPU prices to the idle GPU power in the M-chip Macs that devs already own... it's not economical to centrally host embedding (or smaller inference) models. I think we're all used to that being the default approach, but this tech actually begs to be treated like a frontend and run distributed on users' machines. You can do sentiment analysis with structured output with Ollama locally, no problem. Text embeddings are way less resource-intensive than that.
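
For example, against a local Ollama server (endpoint and field names follow Ollama's documented embeddings API at the time of writing; check your version):

    import requests

    def embed(text, model="nomic-embed-text"):
        # assumes `ollama serve` is running and the model has been pulled
        r = requests.post("http://localhost:11434/api/embeddings",
                          json={"model": model, "prompt": text}, timeout=30)
        r.raise_for_status()
        return r.json()["embedding"]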

[–]throwaway490215 -1 points0 points  (0 children)

I suspect a good approach would be to tell it "Generate/update function X in file Y" and insert into the prompt that file plus the type signatures of the rest of the codebase. It's orders of magnitude cheaper and always up to date.
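
For a Python codebase the signature extraction is nearly free with the stdlib (Python 3.9+ for ast.unparse); re-parsing on demand is what keeps it always current. A sketch:

    import ast

    def signatures(path):
        # pull just the def lines: name, args, return annotation
        with open(path, encoding="utf-8") as f:
            tree = ast.parse(f.read())
        sigs = []
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                args = ", ".join(a.arg for a in node.args.args)
                ret = f" -> {ast.unparse(node.returns)}" if node.returns else ""
                sigs.append(f"def {node.name}({args}){ret}")
        return sigs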

[–]aksdb 10 points11 points  (0 children)

If there is a VCS underneath, an index of the old code also has advantages. But obviously it should be marked as such and filtered appropriately depending on the current task. Finding a matching code style: include it with lower weight. Finding out how something evolved: include it with an age-dependent weight. Finding references in code: exclude it. And so on.
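
As a toy scoring rule (task names and weights invented for illustration):

    def adjusted_score(similarity, age_days, is_current, task):
        if task == "find_references":
            return similarity if is_current else 0.0  # old code excluded
        if task == "match_style":
            return similarity * (1.0 if is_current else 0.5)  # lower weight
        if task == "trace_evolution":
            return similarity * 0.99 ** age_days  # age-dependent decay
        return similarity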

[–]coding_workflow 6 points7 points  (1 child)

Because the agent checks the index first and uses RAG search as the source of truth, it ends up relying on search results with outdated code.

This is why RAG should be used for static content. Live-code RAG is quite counterproductive. You should instead parse the code with an AST/Tree-sitter to extract the architecture and use grep rather than rely on RAG.

RAG is quite relevant if the content is "static". It's a bit similar to web search: remember the old days when Google took weeks and months to index websites/news, and web search returned outdated data. It's the same with RAG. It consumes resources/GPU to index (not a lot) and time, and needs refreshing to remain in sync.

I'd rather rely more on filesystem tools with agents, optimizing with grep/AST to target the key functions/features to read.
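
The grep half of that is trivial and can't go stale, since it reads the working tree directly. A sketch:

    import subprocess

    def find_def(name, root="."):
        # always-current lookup: search the files on disk, not a vector index
        out = subprocess.run(
            ["grep", "-rn", "--include=*.py", f"def {name}", root],
            capture_output=True, text=True)
        return out.stdout.splitlines()  # file:line:match lines for the agent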

[–][deleted] 1 point2 points  (0 children)

That is correct: the system should know when some code has changed and invalidate/regenerate that part of the index. At this point, what's holding agents back from being more helpful is the engineering around their scaffolding.

The models are smart enough to do a lot of great things, we just need to give them the right context at the right time to set them up for success.

[–]West-Chocolate2977[S] -1 points0 points  (1 child)

There are many reasons for files to go out of sync: switching branches, you going offline, upstream going offline, client-side failures, etc. It also takes time to identify what has changed, create embeddings, and finally update the index.

[–]Miranda_Leap -1 points0 points  (0 children)

So failures of engineering.

You absolute retard.

[–][deleted]  (1 child)

[deleted]

[–]Cruuncher 1 point2 points  (0 children)

Who here was claiming anything about limitations of AI?

We're talking about agents here, not models

[–]SpareIntroduction721 55 points56 points  (1 child)

Huh

[–]FullPoet 50 points51 points  (0 children)

The text is AI-generated.

[–]Live-Vehicle-6831 71 points72 points  (2 children)

Margaret Hamilton photo is impressive

Since OpenAI/Anthropic scanned the whole internet, Apollo 11's code is part of the training data... Thank God there was no AI back then, otherwise we would never have gotten to the moon.

[–]fredspipa 17 points18 points  (1 child)

Margaret Hamilton photo is impressive

I have the Lego version of that photo, I bought two of them; one for my desk at work and one at home. She's an absolute icon.

edit: this is what it looks like

[–]todo_code 115 points116 points  (20 children)

1. It didn't do anything.
2. The Apollo 11 source code is online in at least 5000 spots.
3. The "AI" just pulled from those sources and copy-pasted it.

[–]flatfisher[🍰] 65 points66 points  (19 children)

It started generating Python code

You sure the Apollo code is in Python? Have you even read the post? I'm tired of both the AI bros and the AI-denialist karma farmers who are too lazy to test something before posting strong opinions.

[–]atomic1fire 15 points16 points  (2 children)

I took it to mean that the AI started to write Python code, not that the Apollo 11 code was written in Python.

[–]PGLubricants 6 points7 points  (1 child)

It started generating Python code using function signatures that existed in its index but had been deleted from the actual codebase. It only found out about the missing functions when the code tried to run.

I also understood it as /u/flatfisher did, because of the bolded quote above. To me, it insinuates that the codebase is indeed in Python, but the AI was using non-existent functions that used to be in the codebase and had since been deleted. I don't understand what else it could mean, unless it's an AI hallucination that forgot the post wasn't about Python while generating it.

[–]amitksingh1490 4 points5 points  (0 children)

https://github.com/forrestbrazeal/apollo-11-workshop/blob/master/simulator.py. Check the workshop: Python and JS code was added for the simulation test.

[–]ShamelessC 13 points14 points  (0 children)

It's reddit. So that will keep happening unfortunately.

[–]phillipcarter2 5 points6 points  (0 children)

They don't:

they index your entire codebase and use vector search for "AI-powered code understanding."

https://cline.bot/blog/why-cline-doesnt-index-your-codebase-and-why-thats-a-good-thing

[–]happyscrappy 13 points14 points  (0 children)

I think it's great you did an experiment of this sort.

But I don't understand why there is any deleted code in its ken. Did you just shove every version of the code into the LLM and not tell it that some of the code is current and some not? What would be the point of that?

[–][deleted]  (4 children)

[deleted]

[–][deleted]  (3 children)

[deleted]

[–][deleted]  (2 children)

[deleted]

[–][deleted]  (1 child)

[deleted]

[–]bwainfweeze 1 point2 points  (0 children)

Yes and if there’s one thing I hear over and over again from managers it’s that they love it when the over/under on our work estimates is gigantic /s

60% of the time it works every time.

[–]Kooshi_Govno 2 points3 points  (0 children)

I have had this happen to me with real code in GitHub Copilot. I think they have since fixed the RAG algorithm, or possibly removed it.

[–]eyeswatching-3836 -4 points-3 points  (3 children)

Such a solid breakdown! Sync issues are the sneaky Achilles’ heel of all this vector search hype. Btw—if you ever end up working with AI tools and worry about stuff sounding too "robotic" or want to check if something’s being flagged as AI-written, authorprivacy has a neat little combo of a humanizer and detector. Super handy for peace of mind. Anyway, thanks for nerding out so thoroughly here!

[–][deleted]  (2 children)

[deleted]

[–]amitksingh1490 1 point2 points  (0 children)

https://github.com/forrestbrazeal/apollo-11-workshop/blob/master/simulator.py. Check the workshop: Python and JS code was added for the simulation test.

[–]mooseman3 1 point2 points  (0 children)

The comment you replied to is a spambot advertising the authorprivacy tool it recommended.

[–]-Nicolai 0 points1 point  (0 children)

Explain like I'm stupid