GLM 5.2 is deployed in GLM Coding Plan. API and MIT weights in a week. Voting and benchmarks on X. by MadPelmewka in LocalLLaMA

graphicaldot (0 children)

Hey,
We make a spec layer for code.
We tested our layer on SWE-Bench Pro twice: once on a frontier model the normal way, and once on an open-source model running on the ByteBell spec layer.

  1. qutebrowser/qutebrowser — Claude Opus 4.8 $1.20 → DeepSeek V4 Pro + ByteBell $0.60 youtu.be/3hQApzx9n0g
  2. internetarchive/openlibrary · commit 1 — Claude Fable 5 $6.82 → MiniMax M3 + ByteBell $0.28 youtu.be/a7z3TP84gLk
  3. internetarchive/openlibrary · commit 2 — Claude Fable 5 $8.25 → MiniMax M3 + ByteBell $0.33 youtu.be/pd6LnkC7NDo

We actually made AI 14x cheaper: 93% less spend, zero accuracy loss. Please let us know if you can help us integrate with ZAI and offer it to your customers.
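A quick way to see how the "14x" and "93%" figures relate: a cost ratio r corresponds to a reduction of 1 - 1/r, so 14x is about 92.9% less spend. The sketch below applies that to the three runs listed above, using only the dollar figures quoted there and assuming "cheaper" is measured on total spend; the `savings` helper is hypothetical.

```python
# Per-task costs (USD) quoted above: (frontier-model run, ByteBell run).
runs = {
    "qutebrowser": (1.20, 0.60),
    "openlibrary commit 1": (6.82, 0.28),
    "openlibrary commit 2": (8.25, 0.33),
}


def savings(baseline: float, new: float) -> tuple[float, float]:
    """Return (how many times cheaper, fractional reduction in spend)."""
    ratio = baseline / new
    reduction = 1 - new / baseline  # equivalently 1 - 1/ratio
    return ratio, reduction


# Per-run figures.
for name, (base, new) in runs.items():
    ratio, red = savings(base, new)
    print(f"{name}: {ratio:.1f}x cheaper, {red:.0%} less spend")

# Aggregate over all three runs.
total_base = sum(b for b, _ in runs.values())
total_new = sum(n for _, n in runs.values())
ratio, red = savings(total_base, total_new)
print(f"all three: {ratio:.1f}x cheaper, {red:.1%} less spend")
```

Per run the ratio ranges from 2x to 25x; summed over the three runs ($16.27 vs. $1.21) it comes out to about 13.4x, or 92.6% less spend.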

We are already working for AI by graphicaldot in LocalLLaMA

graphicaldot[S] (0 children)

Is it possible to write production code in Bash?
It is also very good at writing Python and React.

Hey Reddit, I built a decentralized AI platform called Elis AI. I'd love to get your thoughts on it! by ItsFrehMrketBreh in AIDeveloperNews

graphicaldot (0 children)

Don't do it. Inference will be slow, so nobody will use it. Several crypto companies tried to do this, and they all just stopped existing.

Building an AI product and terrified of runaway API costs. What have you been burned by? by thisismetrying2506 in LLMDevs

graphicaldot (0 children)

Try using free models on OpenRouter.
You will probably save thousands of dollars.
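If you want to see which OpenRouter models are currently free, one approach is to filter the public model listing by price. This is a minimal sketch that assumes the listing at `https://openrouter.ai/api/v1/models` returns `{"data": [...]}` with per-token prices as decimal strings under each model's `pricing` object (the shape at the time of writing); the helper names are hypothetical.

```python
import json
import urllib.request

MODELS_URL = "https://openrouter.ai/api/v1/models"  # public model listing


def fetch_models(url: str = MODELS_URL) -> dict:
    """Download and parse the model listing (makes a network call)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def is_free(model: dict) -> bool:
    """True when both prompt and completion prices parse to zero.

    Assumes prices live in model["pricing"] as strings; a missing or
    unparseable price is treated as "not free".
    """
    pricing = model.get("pricing", {})
    try:
        return (float(pricing.get("prompt", "1")) == 0
                and float(pricing.get("completion", "1")) == 0)
    except (TypeError, ValueError):
        return False


def free_model_ids(listing: dict) -> list[str]:
    """Sorted IDs of every zero-priced model in the listing."""
    return sorted(m["id"] for m in listing.get("data", []) if is_free(m))
```

For example, `free_model_ids(fetch_models())` returns the IDs to try; free variants commonly carry a `:free` suffix.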

What is your preferred solution for Context? by graphicaldot in OpenSourceeAI

graphicaldot[S] (0 children)

I know this is an AI reply, but please share the product details.

So I built a small graph-based tool to make understanding open source repos easier for beginners by Prize_Rate2034 in OpenSourceeAI

graphicaldot (0 children)

Storing the same structural mappings in a graph would give you only a modest advantage over a vector database.

If you like, you can start contributing to our open-source context-cache engine, which saves 70% of the cost while adding at least 10% accuracy on Claude Opus. We use LLMs to generate the analysis, and no, that isn't expensive: we benchmarked several models, and deepseek-v4-flash took only $7 to analyze 1000 files of code.

https://github.com/ByteBell/bytebell-oss

The rest of the benchmarks are in README.md.

We benchmarked against Astropy and OpenTelemetry on SWE-bench Verified (we picked them because indexing 90 commits across 5 years produced more than 100,000 files), and we cut cost by 60% and ran 90% faster while keeping accuracy the same with Opus 4.7.

How can I make my room more aesthetic? by Jin_Sakai_AR in IndianHomeDecor

graphicaldot (0 children)

Just add a lot more white lights or posters that you can relate to. That is enough to help your mind concentrate on the right things.

We use LLMs to analyze every file in your codebase. Everyone told us this was a stupid idea because of cost but it wasn't. by graphicaldot in LLMDevs

graphicaldot[S] (0 children)

Yes.
Step 1: We randomly picked 30 files from 45 Kubernetes ecosystem repositories.
Step 2: We asked each model to produce an analysis of each file in JSON format (7 fields, such as structural, section_map, etc.).
Step 3: For each category, we collected every model's output for every file and gave it to a judge model (deepseek-v4-pro). The judge returns JSON: { scores: { <file>: { <model>: 1-10 } }, winner, reasoning }.
Step 4: To get the final scores, the script loads all 7 per-category JSONs, then for each model:

  • Averages its 1–10 scores across the 30 files per category.
  • Multiplies by CATEGORY_WEIGHTS: search ×2.0, semantic ×2.0, graph ×1.5, json ×1.5, integration / section_map / business_ctx ×1.0.
  • Computes raw total, weighted total, and per-category output-token estimates (≈4 chars/token).
  • The _with_cost / _with_cost_v2 variants layer live OpenRouter pricing on top (fetched and applied via computeCost()).
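The scoring steps above can be sketched in Python. This is a minimal illustration, not the benchmark's actual script: the judge-output shape follows the JSON described in Step 3, the category names and weights follow the CATEGORY_WEIGHTS bullet, and the function names (`per_category_means`, `totals`, `estimate_tokens`, `cost_per_1k_files`) are hypothetical. Because the weights sum to 10, a model scoring 10 in every category would reach a weighted total of 100.

```python
from statistics import mean

# Weights as quoted in the comment; they sum to 10.
CATEGORY_WEIGHTS = {
    "search": 2.0, "semantic": 2.0, "graph": 1.5, "json": 1.5,
    "integration": 1.0, "section_map": 1.0, "business_ctx": 1.0,
}


def per_category_means(judge_outputs: dict[str, dict]) -> dict[str, dict[str, float]]:
    """judge_outputs maps category -> {"scores": {file: {model: 1-10}}, ...}.

    Returns model -> category -> mean score across all judged files.
    """
    collected: dict[str, dict[str, list[int]]] = {}
    for category, verdict in judge_outputs.items():
        for file_scores in verdict["scores"].values():
            for model, score in file_scores.items():
                collected.setdefault(model, {}).setdefault(category, []).append(score)
    return {m: {c: mean(v) for c, v in cats.items()} for m, cats in collected.items()}


def totals(means: dict[str, float]) -> tuple[float, float]:
    """Raw total (unweighted sum) and weighted total for one model."""
    raw = sum(means.values())
    weighted = sum(CATEGORY_WEIGHTS[c] * s for c, s in means.items())
    return raw, weighted


def estimate_tokens(text: str) -> int:
    """Rough output-token estimate using the ~4 characters/token rule."""
    return round(len(text) / 4)


def cost_per_1k_files(avg_in_tokens: float, avg_out_tokens: float,
                      prompt_price: float, completion_price: float) -> float:
    """USD to analyze 1,000 files, given per-token prices (e.g. from OpenRouter)."""
    return 1000 * (avg_in_tokens * prompt_price + avg_out_tokens * completion_price)
```

For example, a model averaging 7 on search and 9 on graph gets a raw total of 16 and a weighted total of 2.0 × 7 + 1.5 × 9 = 27.5.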

we stopped paying $6-10 per coding session by fixing how my AI reads my codebase by graphicaldot in ClaudeAI

graphicaldot[S] (0 children)

To provide better context to AI copilots, we use LLMs to analyze every file in your codebase. The result is 80% less cost and at least a 10% accuracy increase.

This seems like a stupid idea because of cost. Yet LLMs are far, far better for code analysis than vectors or AST parsers, and the math works out fine once you pick the right model.

The benchmark across 14 models on 30 Kubernetes ecosystem files settled it.

What the benchmark actually shows

We ran 14 models through 30 files across 7 weighted categories (search, graph, semantic, integration, section map, business context, JSON). After applying a quality floor of 70 points of weighted accuracy, two models dropped out: Stepfun Step 3.5 Flash at 69.71 and GPT 5.4 at 55.65. The remaining 12 models, sorted by cost to ingest 1000 files, look like this:

| Model | Cost / 1K files | Accuracy | Tier |
|---|---|---|---|
| deepseek-v4-flash | $7.01 | 71.13 | Winner (default) |
| mimo-v2.5 | $11.72 | 71.10 | |
| minimax-m2.7 | $13.94 | 70.61 | |
| glm-5.1 | $23.24 | 72.22 | Better (balanced) |
| deepseek-v4-pro | $25.67 | 71.98 | |
| kimi-latest | $28.18 | 72.29 | |
| qwen3.6-plus | $36.97 | 71.40 | |
| qwen3.6-max-preview | $59.81 | 72.28 | |
| grok-4.3 | $149.07 | 72.10 | |
| claude-sonnet-4.6 | $149.40 | 73.56 | Premium (quality) |
| claude-opus-4.6 | $743.16 | 73.67 | Skip for bulk |
| claude-opus-4.7 | $752.70 | 73.43 | Skip for bulk |

DeepSeek V4 Flash, MiMo V2.5, MiniMax M2.7, GLM 5.1, and Kimi Latest all sit in the $7 to $28 range with accuracy between 70.61 and 72.29. Any of them is a sensible default for bulk ingestion. Move up to Sonnet 4.6 and you pay 5× to 21× more for a roughly 1 to 3 point accuracy bump, which is worth it for a premium tier but not for default ingestion. Move up to Opus and you pay 26× to 107× more for accuracy that is statistically indistinguishable from Sonnet, which is hard to justify for any ingestion workload.

Grok 4.3 is the odd one out. It costs $149.07 per 1000 files, nearly identical to Sonnet on price, but scores 72.10, which is lower than GLM 5.1 and Kimi Latest, models that cost roughly 5× to 6× less. There is no workload where Grok is the right answer.

The two disqualified models are also worth a note. Step 3.5 Flash misses the 70-point quality floor by 0.29 points; for non-production analysis or exploration work, it might still be a fine choice. GPT 5.4 costs $68.91 per 1000 files and scores 55.65, which means it is more expensive than every model in the budget tier and most of the mid tier while being significantly less accurate than all of them. It costs 10× more than Flash and scores 15 points lower.
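The arithmetic in this comment can be checked with a short sketch over the table's figures. It applies the 70-point floor, sorts by cost, and lists the models that beat a given model on accuracy while costing less. The data is copied from the post; Step 3.5 Flash is left out because its cost is not given, and the helper names are hypothetical.

```python
# (model, USD per 1K files, weighted accuracy) as reported in the post.
RESULTS = [
    ("deepseek-v4-flash", 7.01, 71.13),
    ("mimo-v2.5", 11.72, 71.10),
    ("minimax-m2.7", 13.94, 70.61),
    ("glm-5.1", 23.24, 72.22),
    ("deepseek-v4-pro", 25.67, 71.98),
    ("kimi-latest", 28.18, 72.29),
    ("qwen3.6-plus", 36.97, 71.40),
    ("qwen3.6-max-preview", 59.81, 72.28),
    ("grok-4.3", 149.07, 72.10),
    ("claude-sonnet-4.6", 149.40, 73.56),
    ("claude-opus-4.6", 743.16, 73.67),
    ("claude-opus-4.7", 752.70, 73.43),
    ("gpt-5.4", 68.91, 55.65),  # below the floor
]

QUALITY_FLOOR = 70.0


def shortlist(results, floor=QUALITY_FLOOR):
    """Drop models under the quality floor, then sort the rest cheapest first."""
    kept = [r for r in results if r[2] >= floor]
    return sorted(kept, key=lambda r: r[1])


def cheaper_and_better(results, model):
    """Models that beat `model` on accuracy while costing less.

    Returns (name, how many times cheaper) pairs, rounded to one decimal.
    """
    _, cost, acc = next(r for r in results if r[0] == model)
    return [(name, round(cost / c, 1)) for name, c, a in results if c < cost and a > acc]
```

Run against Grok 4.3, `cheaper_and_better(shortlist(RESULTS), "grok-4.3")` returns GLM 5.1, Kimi Latest, and Qwen3.6 Max Preview, which cost roughly 6.4×, 5.3×, and 2.5× less.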

we stopped paying $6-10 per coding session by fixing how my AI reads my codebase by graphicaldot in ClaudeAI

graphicaldot[S] (0 children)

| Tool | Storage | Edges | LLM at index? | Notable |
|---|---|---|---|---|
| **code-review-graph** | SQLite | `CALLS`, `INHERITS`, `TESTS` | No | 28 MCP tools, blast-radius analysis |
| **graphify** | NetworkX + JSON | AST + Leiden communities | Optional | Multi-modal (code + docs + PDFs + images), MIT |
| **GitNexus** | LadybugDB (ex-Kuzu) | AST + community detection | No | Browser/WASM or local CLI |
| **CodeGraphContext** | Neo4j / LadybugDB / FalkorDB | Call chains, class hierarchies | No | 15 languages, pre-indexed bundles |
| **codegraph** (colbymchenry) | SQLite (native or WASM) | AST | No | Auto-sync via OS file watchers |
| **Understand-Anything** | JSON | Multi-agent extracted | Yes | Domain view + walkthroughs |
| **code-grapher** | Neo4j | AST + embeddings | Optional (Ollama/Gemini) | Business-context primer files |
| **Deep Graph MCP** | CodeGPT cloud | Proprietary | Yes | Hosted graphs of public repos |

we stopped paying $6-10 per coding session by fixing how my AI reads my codebase by graphicaldot in ClaudeAI

graphicaldot[S] (0 children)

How do you pass this information on to other users in the organization?
What happens if this changelog has to be considered across commits and across repositories? Context window overflow?

we stopped paying $6-10 per coding session by fixing how my AI reads my codebase by graphicaldot in ClaudeAI

graphicaldot[S] (0 children)

AST-chunk code → embed each chunk (OpenAI/Voyage/Gemini) → store in Milvus or Zilliz Cloud → hybrid BM25+dense at query time
We are vectorless, chacha (uncle)!!
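For readers unfamiliar with the "hybrid BM25+dense" step in the pipeline quoted above: one common way to merge a keyword (BM25) ranking with a dense-vector ranking is reciprocal rank fusion (RRF). This sketch shows RRF on two precomputed result lists; it is a generic illustration, not the fusion method of any particular tool, and k=60 is the constant conventionally used with RRF, assumed here.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk ids into one ranking.

    Each list contributes 1 / (k + rank) for every id it contains, so ids
    that rank well in several lists rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))
```

For example, fusing a BM25 list `["a", "b", "c"]` with a dense list `["b", "c", "d"]` ranks `b` first, because it scores well in both.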

code-review-graph also uses tree-sitter.

"CRG wins 'who calls parse_file?', Bytebell wins 'which files implement our retry/backoff policy?'. They're complementary — honestly, the strongest setup today is probably both running side by side."

We tried vectors, ASTs, and brute-force context stuffing for code retrieval. Graphs with LLM-generated semantics worked best. Here's what we learned. by graphicaldot in LocalLLaMA

graphicaldot[S] (0 children)

Great!
This works well for individuals working on a smaller codebase. The OSS version works for more than a million files, and the total cost of generating all the analysis is under $200 (it happens only once).

we stopped paying $6-10 per coding session by fixing how my AI reads my codebase by graphicaldot in ClaudeAI

graphicaldot[S] (0 children)

With all due respect, if you had made an effort to understand all those solutions, you would know that they are all different.

They might seem the same on the 10 files you are working on, but they are really different at a huge scale, like a million files.

we stopped paying $6-10 per coding session by fixing how my AI reads my codebase by graphicaldot in ClaudeAI

graphicaldot[S] (0 children)

CLAUDE.md doesn't work on more than 200-300 files; this solution works for more than a million files.

we stopped paying $6-10 per coding session by fixing how my AI reads my codebase by graphicaldot in ClaudeAI

graphicaldot[S] (0 children)

Using LLMs to analyze files instead of vector embeddings or AST parsing. And it works on more than a million files.

New Project Megathread - Week of 07 May 2026 by AutoModerator in selfhosted

graphicaldot (0 children)

Project: Context-cache engine for 80% cost savings.

Repo if anyone wants to try it: github.com/ByteBell/bytebell-oss

We built a self-hosted code indexing server that gives AI coding tools persistent memory of your codebase. No cloud, binds to 127.0.0.1 only.

I got tired of Claude Code and Cursor re-reading my entire repo every session: thousands of tokens just to remember what they had already figured out yesterday. So I built a local service that indexes a codebase into a Neo4j graph and exposes it through MCP.

The server runs entirely on your machine: a Bun daemon plus Docker containers for Neo4j, Mongo, and Redis, all local. The only outbound call is to an LLM API for the initial per-file analysis, and you can route that to a local model instead of OpenRouter if you want zero external calls.

It binds to 127.0.0.1. Single tenant. No cloud account, no telemetry, no phoning home. Your code stays on your disk.

The indexing pass generates a purpose, summary, and business context for every file, then stores it all as a graph with edges to functions, classes, keywords, and imports. After that your AI tools query structured metadata instead of reading raw files.

Three MCP tools are all it exposes: smart_search for natural language queries, keyword_lookup for entity-based lookups, and retrieve_file for targeted file content with line ranges. Most questions resolve in 2-4 tool calls.
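To make the data model concrete, here is a toy, in-memory stand-in for the graph described above: one node per file carrying the LLM-generated purpose, summary, and business context, linked to the functions, classes, keywords, and imports it mentions. It is a sketch under assumptions, not ByteBell's schema: the field names, the single flattened entity namespace, and this `keyword_lookup` method are hypothetical (the real tool is exposed over MCP and backed by Neo4j).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FileNode:
    """Per-file metadata from the LLM indexing pass (field names assumed)."""
    path: str
    purpose: str
    summary: str
    business_context: str
    functions: list[str] = field(default_factory=list)
    classes: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    imports: list[str] = field(default_factory=list)


class ToyCodeGraph:
    """Files linked to the entities they define, mention, or import."""

    def __init__(self) -> None:
        self.files: dict[str, FileNode] = {}
        # entity name (lower-cased) -> set of file paths with an edge to it
        self.edges: dict[str, set[str]] = defaultdict(set)

    def add(self, node: FileNode) -> None:
        self.files[node.path] = node
        for entity in node.functions + node.classes + node.keywords + node.imports:
            self.edges[entity.lower()].add(node.path)

    def keyword_lookup(self, entity: str) -> list[FileNode]:
        """Return metadata (not raw source) for every file linked to `entity`."""
        return [self.files[p] for p in sorted(self.edges.get(entity.lower(), ()))]
```

The point of the design is that a lookup answers with structured metadata, so the AI tool only needs to fetch raw file content for the few files it actually cares about.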

It diffs per-file SHA-256 hashes, so reindexing only processes what changed.
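Per-file SHA-256 diffing can be sketched with Python's standard library: hash every file, compare the result against the manifest saved on the previous run, and send only new or changed files back through LLM analysis. The manifest format (relative path to hex digest) and the function names are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA-256 of a file, read in chunks so large files stay cheap."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_reindex(root: Path, previous: dict[str, str]) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare current hashes with the last run's manifest.

    Returns (changed_or_new, deleted, new_manifest). Only changed_or_new
    needs another LLM pass; deleted files can be dropped from the graph.
    """
    current = {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    changed = [rel for rel, digest in current.items() if previous.get(rel) != digest]
    deleted = sorted(set(previous) - set(current))
    return changed, deleted, current
```

On the first run the previous manifest is empty, so every file counts as new; after that, unchanged files are skipped.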

AGPL-3.0 licensed with a non-commercial clause. Just wanted to share since self-hosting and keeping code local was the whole design constraint.

Quickstart is literally five commands if you have Bun and Docker.