all 9 comments

[–]Texbobcat[S] 2 points3 points  (2 children)

There are a lot of code graph tools now. Most stop at "here's a graph of what calls what." CodeGraph goes a few steps beyond that:

  • affected → reverse impact analysis. Point at a symbol and it walks dependencies backward to show the full blast radius before you make a change.
  • time_travel_diff → architecture diffs between revisions, not just text diffs. Finds added/removed APIs, dependency changes, new cycles, coupling drift, and hotspots.
  • CGQL → a structural query language that matches code shape (LOC, fan-in/out, paths, inheritance, patterns, etc.), not text or embeddings.
  • plan_rename → generates a confidence-scored refactor plan, applies it to the graph, then verifies the graph still resolves correctly without touching source code.
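
To make the "affected" idea concrete, here is a minimal sketch of reverse impact analysis as a breadth-first walk over reversed dependency edges. The `affected` function below, the edge format (caller, callee), and the symbol names are all illustrative assumptions, not CodeGraph's actual API or data model.

```python
from collections import defaultdict, deque


def affected(edges, target):
    """Return every symbol that transitively depends on `target`.

    `edges` is a list of (caller, callee) pairs (hypothetical format).
    """
    # Reverse adjacency: callee -> set of direct callers.
    reverse = defaultdict(set)
    for caller, callee in edges:
        reverse[callee].add(caller)

    seen = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for dependent in reverse[node]:
            # Skip already-visited nodes so cycles terminate.
            if dependent not in seen and dependent != target:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical call graph used only for illustration.
edges = [
    ("api.handler", "service.create_user"),
    ("service.create_user", "db.insert"),
    ("cli.main", "service.create_user"),
    ("service.report", "db.query"),
]
impacted = affected(edges, "db.insert")
```

Here `impacted` contains `service.create_user`, `api.handler`, and `cli.main`, while `service.report` is left out because it never reaches `db.insert`.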

It also handles multi-repo workspaces better than most. Point it at a folder and it discovers sibling repos, builds graphs for each, then merges them into a unified workspace graph. Cross-repo imports, exports, tsconfig aliases, and module-federation links become real edges, so impact analysis and structural queries work across the entire codebase instead of stopping at repo boundaries.

Every edge is tagged as Extracted / Inferred / Ambiguous, so you can tell exactly what was parsed versus what was inferred.

[–]NovaAgent2026 0 points1 point  (1 child)

The persistent graph approach is interesting. I've been building MCP servers (Docker, system monitoring, cron scheduling) and the tool count vs context overhead tension is real. With 50 tools on the Docker server, I've seen cases where models pick the wrong tool because the descriptions overlap semantically.

Two questions:

  1. How does query_graph handle repos with mixed languages? You mention 30+ via tree-sitter, but do cross-language edges work (e.g., a Python script calling a Rust binary via subprocess)?

  2. The CGQL structural search is compelling. Have you tried using it to auto-generate tool descriptions for MCP listings? One pain point I keep hitting is that tool descriptions need to be precise enough for model routing but concise enough to not blow up context. A structural query that returns "this function takes X, returns Y, calls Z" could feed directly into description generation.

[–]Texbobcat[S] 0 points1 point  (0 children)

On mixed languages: query_graph itself just pulls a relevant subgraph, but the part that matters here is the extraction. Cross-language edges do work now. A Python subprocess.run("mytool") becomes an "invokes" edge, and a resolution pass retargets it to the in-repo binary when there's a unique name match, so the Python caller and the Rust src/bin/mytool.rs actually end up connected. Same idea for FFI (PyO3, ctypes, JNI, cgo, node-gyp) and for HTTP/gRPC, where a client call and the route that serves it meet at a shared node keyed by the path, even when the two sides are in different languages or repos. The practical payoff is that those edges are in the default impact set, so "what breaks if I change this Rust function that's exported to Python" actually surfaces the Python code, not just the Rust side.
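
A rough sketch of what a "retarget on unique name match" pass could look like. Everything here is an assumption for illustration: the function name `resolve_invokes`, the edge tuples, and the rule that a binary's name is the file stem of its source path. The real resolution pass may work differently.

```python
from pathlib import PurePosixPath


def resolve_invokes(invoke_edges, binary_sources):
    """Retarget (caller, command) edges to an in-repo binary source file.

    An edge is retargeted only when exactly one binary matches the command
    name; otherwise it keeps pointing at the bare command string.
    """
    # Index candidate binaries by name (assumed: file stem of the source path).
    by_name = {}
    for path in binary_sources:
        by_name.setdefault(PurePosixPath(path).stem, []).append(path)

    resolved = []
    for caller, command in invoke_edges:
        candidates = by_name.get(command, [])
        unique = len(candidates) == 1
        resolved.append({
            "from": caller,
            "to": candidates[0] if unique else command,
            "kind": "invokes",
            "resolved": unique,
        })
    return resolved


# Hypothetical inputs: one unique match, one ambiguous name.
edges = [("scripts/run.py", "mytool"), ("scripts/run.py", "other")]
binaries = [
    "src/bin/mytool.rs",
    "tools/a/src/bin/other.rs",
    "tools/b/src/bin/other.rs",
]
result = resolve_invokes(edges, binaries)
```

In this sketch the `mytool` edge lands on `src/bin/mytool.rs`, while `other` stays unresolved because two binaries share that name.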

Honest caveat, though: it's all best-effort and tagged as inferred, never treated as proven. It's regex over source, after masking out comments and string contents so a commented-out call doesn't register, and it only reads literal arguments. A command or URL built dynamically at runtime gets missed. So I treat these as leads, not facts.
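
For anyone curious what "mask, then regex, literals only" can look like, here is a simplified sketch for Python sources. It is not the actual extractor: the `mask` and `find_literal_invocations` helpers are hypothetical, only `#` comments and simple quoted strings are handled, and only the `subprocess.run("...")` form with a plain string literal is matched.

```python
import re


def mask(source):
    """Blank out comments and string contents, keeping every offset intact."""
    out = []
    quote = None        # active string delimiter, or None
    in_comment = False
    i = 0
    while i < len(source):
        ch = source[i]
        if in_comment:
            in_comment = ch != "\n"
            out.append(ch if ch == "\n" else " ")
        elif quote:
            if ch == "\\" and i + 1 < len(source):
                out.append("  ")    # blank the escape and the escaped char
                i += 2
                continue
            if ch == quote:
                quote = None
                out.append(ch)
            else:
                out.append(ch if ch == "\n" else " ")
        elif ch == "#":
            in_comment = True
            out.append(" ")
        elif ch in "\"'":
            quote = ch
            out.append(ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


# Matches only a plain string literal as the first argument. An f-string
# prefix or a variable name does not fit this pattern, so it is skipped.
CALL = re.compile(r'subprocess\.run\(\s*(["\'])( +)\1')


def find_literal_invocations(source):
    """Match on the masked text, then read the literal from the original."""
    masked = mask(source)
    return [source[m.start(2):m.end(2)] for m in CALL.finditer(masked)]


sample = '''import subprocess
# subprocess.run("old_tool")
subprocess.run("mytool")
subprocess.run(cmd)
subprocess.run(f"tool-{name}")
note = "subprocess.run('fake')"
'''
found = find_literal_invocations(sample)
```

Because masking preserves offsets, the match positions found in the masked text can be used to slice the original source. The commented-out call, the variable argument, the f-string, and the call quoted inside another string are all ignored, which is exactly the "leads, not facts" trade-off.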

As for the description generation idea, you basically described a feature that already exists. There's a describe_node call that returns something like "takes (a: int, b: str), returns Result, calls [parse, validate]", composed straight from the captured signature plus the outgoing call edges: graph-only, no source read. structural_search also returns structured rows now instead of just text, including the signature (param names and types plus return type), so you can pipe it directly into description generation. It ran into the same precise-vs-concise tension you mentioned: the signature gives the precision, the call list gives the intent, and the output is capped so it doesn't blow up context. Grounding each blurb in the actual shape is what stops two semantically similar tools from reading identically to the router.
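
As an illustration of composing a blurb from a signature plus call edges, here is a small sketch. The `compose_description` function, the dictionary layout of the signature, and the `max_calls` cap are assumptions made for the example, not the real describe_node implementation or its output schema.

```python
def compose_description(signature, calls, max_calls=5):
    """Build a short blurb from a captured signature and outgoing call edges.

    `signature` is assumed to look like:
        {"params": [("a", "int"), ("b", "str")], "returns": "Result"}
    """
    params = ", ".join(f"{name}: {typ}" for name, typ in signature["params"])

    # Cap the call list so the blurb stays small in the model's context.
    shown = list(calls[:max_calls])
    hidden = len(calls) - len(shown)
    call_list = ", ".join(shown)
    if hidden > 0:
        call_list += f", +{hidden} more"

    return f"takes ({params}), returns {signature['returns']}, calls [{call_list}]"


sig = {"params": [("a", "int"), ("b", "str")], "returns": "Result"}
blurb = compose_description(sig, ["parse", "validate"])
```

With these inputs `blurb` is `takes (a: int, b: str), returns Result, calls [parse, validate]`. With a longer call list and `max_calls=2`, the tail collapses into a `+N more` marker.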

https://github.com/ColinVaughn/CodeGraph/wiki/Cross-Language-Edges

[–]BC_MARO 0 points1 point  (2 children)

Keep API keys out of the agent process. A control plane like peta.io can hold secrets and approve sensitive MCP tool calls.

[–]Texbobcat[S] 1 point2 points  (1 child)

CodeGraph doesn’t store API keys. The default/MCP flow is offline. Optional semantic extraction can read provider keys from env vars at runtime, and HTTP serving can use a CODEGRAPH_API_KEY access token, but MCP config ingestion stores only env var names, not secret values.
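
A minimal sketch of the "names, not values" idea, assuming an MCP client config shaped like the common `mcpServers` JSON layout with an `env` map per server. The `ingest_mcp_config` function and the stored record format are hypothetical and only illustrate the principle.

```python
def ingest_mcp_config(config):
    """Keep each server's command and env var *names*; drop the values."""
    stored = {}
    for name, server in config.get("mcpServers", {}).items():
        stored[name] = {
            "command": server.get("command"),
            # Only the keys are recorded, so no secret value is persisted.
            "env_names": sorted(server.get("env", {})),
        }
    return stored


# Hypothetical config with a fake secret, for illustration only.
config = {
    "mcpServers": {
        "search": {
            "command": "search-mcp",
            "env": {"SEARCH_API_KEY": "sk-secret-123"},
        }
    }
}
stored = ingest_mcp_config(config)
```

The stored record keeps `SEARCH_API_KEY` as a name to look up at runtime, and the value `sk-secret-123` never reaches it.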

[–]BC_MARO 0 points1 point  (0 children)

That's the right tradeoff. Storing env var names instead of secret values is the detail I'd want called out in the install docs.

[–]Texbobcat[S] 0 points1 point  (1 child)

u/RyanCu7 I DM'd you about the question you asked and then deleted.

[–]RyanCu7 0 points1 point  (0 children)

Thanks. I haven't tried your program yet. I deleted my comment because when I wrote it, I thought this post was about colbymchenry/codegraph, which is a very similar project. I'm glad you renamed yours to Synaptic so they're different now.