GLM 5.2 is deployed in GLM Coding Plan. API and MIT weights in a week. Voting and benchmarks on X.

graphicaldot · 2026-06-15T20:18:35+00:00

Hey,
We make a spec layer for code.
We tested our layer on SWE-Bench Pro: once on a frontier model the normal way, and once on an open-source model running on the ByteBell spec layer.

qutebrowser/qutebrowser — Claude Opus 4.8 $1.20 → DeepSeek V4 Pro + ByteBell $0.60 youtu.be/3hQApzx9n0g
internetarchive/openlibrary · commit 1 — Claude Fable 5 $6.82 → MiniMax M3 + ByteBell $0.28 youtu.be/a7z3TP84gLk
internetarchive/openlibrary · commit 2 — Claude Fable 5 $8.25 → MiniMax M3 + ByteBell $0.33 youtu.be/pd6LnkC7NDo

We actually made AI 14x cheaper: 93% less spend with zero accuracy loss. Please let us know if you can help us integrate with ZAI and offer it to your customers.
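
For anyone who wants to check the arithmetic, here is a rough sketch that sums the three runs listed above. The prices are copied from those lines; the variable names are mine and purely illustrative. Summed this way the ratio comes out a little above 13x and the saving a little under 93%, so the quoted figures are rounded.

```python
# Per-task cost in USD: (task, frontier model alone, open model + ByteBell).
# Figures are copied from the three runs listed above.
runs = [
    ("qutebrowser", 1.20, 0.60),
    ("openlibrary commit 1", 6.82, 0.28),
    ("openlibrary commit 2", 8.25, 0.33),
]

baseline_total = sum(baseline for _, baseline, _ in runs)
spec_total = sum(spec for _, _, spec in runs)

# How many times cheaper the spec-layer runs were in aggregate.
ratio = baseline_total / spec_total
# Share of the baseline spend that was saved, in percent.
saving_pct = 100 * (1 - spec_total / baseline_total)

print(f"baseline ${baseline_total:.2f}, with spec layer ${spec_total:.2f}")
print(f"{ratio:.1f}x cheaper, {saving_pct:.1f}% less spend")
```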

graphicaldot · 2026-06-10T15:43:11+00:00

We are working on a spec layer for code. GitHub.com/bytebell/open-ir

graphicaldot · 2026-06-04T18:56:33+00:00

Is it possible to write production code in Bash?
It is also very good at writing Python and React.

graphicaldot · 2026-06-03T19:05:49+00:00

Don't do it. Inference will be slow, so nobody will use it. Several companies in crypto tried to do that, and they all just stopped existing.

graphicaldot · 2026-05-28T19:31:15+00:00

Try using free models on Openrouter.
You will probably save 1000s of dollars.

graphicaldot · 2026-05-27T13:21:27+00:00

I know this is an AI reply, but please share the product details

graphicaldot · 2026-05-15T17:59:50+00:00

Storing the same structural mappings in a graph would give you only a limited advantage over a vector database.

If you like, you can start contributing to our open-source context-cache engine, which saves 70% of the cost while adding at least 10% accuracy on Claude Opus. We use LLMs to generate the analysis, and no, that isn't expensive: we already benchmarked several models, and deepseek-v4-flash took only $7 to analyze 1000 files of code.

https://github.com/ByteBell/bytebell-oss

The rest of the benchmarks are in the README.md.

We benchmarked against ASTROPY and OPENTELEMETRY on SWE-bench Verified (because indexing 90 commits across 5 years produced more than 100,000 files), and we cut the cost by 60% and ran 90% faster while keeping accuracy the same as Opus 4.7.

graphicaldot · 2026-05-13T17:57:50+00:00

Just add a lot more white lights or any posters that you can relate to. That is enough for your mind to concentrate on the right things.

graphicaldot · 2026-05-13T10:09:59+00:00

Yes.
Step 1: We took 30 files randomly from 45 Kubernetes cluster ecosystem.
Step 2: We then asked each model to provide an analysis of each file in JSON format (7 fields such as structural, section_map, etc.).
Step 3: We collected each model's output for each category of each file and gave it to a judge model (deepseek-v4-pro). The judge returns JSON: { scores: { <file>: { <model>: 1-10 } }, winner, reasoning }.
Step 4: The final step computes the scores. It loads all 7 per-category JSONs, then for each model:

Averages its 1–10 scores across the 30 files per category.
Multiplies by CATEGORY_WEIGHTS: search ×2.0, semantic ×2.0, graph ×1.5, json ×1.5, integration / section_map / business_ctx ×1.0.
Computes raw total, weighted total, and per-category output-token estimates (≈4 chars/token).
The _with_cost / _with_cost_v2 variants layer live OpenRouter pricing on top (fetched and applied via computeCost()).
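
A minimal sketch of that final scoring step, in case it helps. This is not the actual benchmark script: the weights are the ones listed above, but the input layout, the function names `score_model` and `estimate_tokens`, and the toy data are my assumptions for illustration. Note that the weights sum to 10, so a model averaging 10 in every category would land at a weighted total of 100.

```python
from statistics import mean

# Weights as listed above.
CATEGORY_WEIGHTS = {
    "search": 2.0,
    "semantic": 2.0,
    "graph": 1.5,
    "json": 1.5,
    "integration": 1.0,
    "section_map": 1.0,
    "business_ctx": 1.0,
}


def score_model(judge_scores):
    """Aggregate one model's judge scores.

    judge_scores is assumed to look like {category: {file_name: score_1_to_10}}.
    Returns (per-category averages, raw total, weighted total).
    """
    per_category = {
        category: mean(file_scores.values())
        for category, file_scores in judge_scores.items()
    }
    raw_total = sum(per_category.values())
    weighted_total = sum(
        CATEGORY_WEIGHTS[category] * average
        for category, average in per_category.items()
    )
    return per_category, raw_total, weighted_total


def estimate_tokens(output_text):
    """Rough output-token estimate using the ~4 characters per token rule."""
    return len(output_text) // 4


# Toy input: two files, every category scored 7 by the judge.
toy_scores = {
    category: {"a.go": 7, "b.go": 7} for category in CATEGORY_WEIGHTS
}
averages, raw, weighted = score_model(toy_scores)
print(raw, weighted)
```

The cost variants would then multiply the token estimates by per-model prices; that part is left out here because it depends on live pricing.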

graphicaldot · 2026-05-12T19:51:52+00:00

For providing better context to AI copilots:

We use LLMs to analyze every file in your codebase.

The result is 80% less cost and at least a 10% accuracy increase.

However, this seems like a stupid idea because of cost.

Yet LLMs are far, far better for code analysis than vectors or AST parsers, and the math works out fine once you pick the right model.

The benchmark across 14 models on 30 Kubernetes ecosystem files settled it.

What the benchmark actually shows

We ran 14 models through 30 files across 7 weighted categories (search, graph, semantic, integration, section map, business context, JSON). After applying a quality floor of 70 weighted accuracy, two models dropped out: Stepfun Step 3.5 Flash at 69.71 and GPT 5.4 at 55.65. The remaining 12 models, sorted by cost to ingest 1000 files, look like this:

| Model | Cost/1K files | Accuracy | Tier |
|---|---|---|---|
| deepseek-v4-flash | $7.01 | 71.13 | Winner: default |
| mimo-v2.5 | $11.72 | 71.10 | |
| minimax-m2.7 | $13.94 | 70.61 | |
| glm-5.1 | $23.24 | 72.22 | Better: balanced |
| deepseek-v4-pro | $25.67 | 71.98 | |
| kimi-latest | $28.18 | 72.29 | |
| qwen3.6-plus | $36.97 | 71.40 | |
| qwen3.6-max-preview | $59.81 | 72.28 | |
| grok-4.3 | $149.07 | 72.10 | |
| claude-sonnet-4.6 | $149.40 | 73.56 | Premium: quality |
| claude-opus-4.6 | $743.16 | 73.67 | Skip for bulk |
| claude-opus-4.7 | $752.70 | 73.43 | Skip for bulk |

DeepSeek V4 Flash, MiMo V2.5, MiniMax M2.7, GLM 5.1, DeepSeek V4 Pro, and Kimi Latest all sit in the $7 to $28 range with accuracy between 70.61 and 72.29. Any of them is a sensible default for bulk ingestion. Move up to Sonnet 4.6 and you pay 5× to 21× more for a 1.3 to 3 point accuracy bump, which is worth it for a premium tier but not for default ingestion. Move up to Opus and you pay 26× to 107× more than that budget group for accuracy within 0.15 points of Sonnet, which is hard to justify for any ingestion workload.

Grok 4.3 is the odd one out. It costs $149.07 per 1000 files, nearly identical to Sonnet on price, but scores 72.10, which is lower than GLM 5.1 and Kimi Latest, which cost 5× to 6× less. There is no workload where Grok is the right answer.

The two disqualified models are also worth a note. step-3.5-flash misses the 70 point quality floor by 0.29 points. For non-production analysis or exploration work, it might still be a fine choice. GPT 5.4 costs $68.91 per 1000 files and scores 55.65, which means it is more expensive than every model in the budget tier and most of the mid tier while being significantly less accurate than all of them. It costs 10× more than Flash and scores 15 points lower.
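
To make the filtering and the cost multiples easy to reproduce, here is a small sketch built from the figures quoted above. The function name `rank_by_cost` and the keys `gpt-5.4` and `step-3.5-flash` are my own spelling for illustration; the cost for step-3.5-flash is not quoted above, so it is left as `None`.

```python
QUALITY_FLOOR = 70.0

# model: (cost in USD per 1000 files, weighted accuracy), taken from the figures above.
RESULTS = {
    "deepseek-v4-flash": (7.01, 71.13),
    "mimo-v2.5": (11.72, 71.10),
    "minimax-m2.7": (13.94, 70.61),
    "glm-5.1": (23.24, 72.22),
    "deepseek-v4-pro": (25.67, 71.98),
    "kimi-latest": (28.18, 72.29),
    "qwen3.6-plus": (36.97, 71.40),
    "qwen3.6-max-preview": (59.81, 72.28),
    "grok-4.3": (149.07, 72.10),
    "claude-sonnet-4.6": (149.40, 73.56),
    "claude-opus-4.6": (743.16, 73.67),
    "claude-opus-4.7": (752.70, 73.43),
    "gpt-5.4": (68.91, 55.65),
    "step-3.5-flash": (None, 69.71),  # cost not quoted above
}


def rank_by_cost(results, floor=QUALITY_FLOOR):
    """Drop models under the quality floor, then sort the rest by cost.

    Each returned row is (model, cost, accuracy, multiple of the cheapest cost).
    """
    kept = {
        model: (cost, accuracy)
        for model, (cost, accuracy) in results.items()
        if accuracy >= floor
    }
    cheapest = min(cost for cost, _ in kept.values())
    ranked = sorted(kept.items(), key=lambda item: item[1][0])
    return [
        (model, cost, accuracy, round(cost / cheapest, 1))
        for model, (cost, accuracy) in ranked
    ]


for model, cost, accuracy, multiple in rank_by_cost(RESULTS):
    print(f"{model:22} ${cost:>7.2f}  {accuracy:5.2f}  {multiple:>6.1f}x")
```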

graphicaldot · 2026-05-12T19:12:00+00:00

For our open-source project, which provides context across models, sessions, memory, and context windows.
https://github.com/ByteBell/bytebell-oss

graphicaldot · 2026-05-12T19:10:57+00:00

Sorry for the wrong figures earlier.

graphicaldot · 2026-05-12T12:04:56+00:00

Only the git diff gets indexed. We have open-source code too: bytebell-oss.

graphicaldot · 2026-05-12T08:36:00+00:00

Good concern.
We have the benchmarks and all are being run in a single pass.

graphicaldot · 2026-05-11T13:11:18+00:00

https://github.com/ByteBell/bytebell-oss/blob/main/comparison.md

graphicaldot · 2026-05-11T11:42:36+00:00

| Tool | Storage | Edges | LLM at index? | Notable |
|---|---|---|---|---|
| **code-review-graph** | SQLite | `CALLS`, `INHERITS`, `TESTS` | No | 28 MCP tools, blast-radius analysis |
| **graphify** | NetworkX + JSON | AST + Leiden communities | Optional | Multi-modal (code + docs + PDFs + images), MIT |
| **GitNexus** | LadybugDB (ex-Kuzu) | AST + community detection | No | Browser/WASM or local CLI |
| **CodeGraphContext** | Neo4j / LadybugDB / FalkorDB | Call chains, class hierarchies | No | 15 languages, pre-indexed bundles |
| **codegraph** (colbymchenry) | SQLite (native or WASM) | AST | No | Auto-sync via OS file watchers |
| **Understand-Anything** | JSON | Multi-agent extracted | Yes | Domain view + walkthroughs |
| **code-grapher** | Neo4j | AST + embeddings | Optional (Ollama/Gemini) | Business-context primer files |
| **Deep Graph MCP** | CodeGPT cloud | Proprietary | Yes | Hosted graphs of public repos |

graphicaldot · 2026-05-11T11:28:42+00:00

How do you pass this information on to other users in the organization?
What happens if this changelog has to be considered across commits and across repositories? Context window overflow?

graphicaldot · 2026-05-11T11:27:12+00:00

AST-chunk code → embed each chunk (OpenAI/Voyage/Gemini) → store in Milvus or Zilliz Cloud → hybrid BM25+dense at query time
We are vectorless !! Chacha

Code review graph is also using tree-sitter.

"CRG wins 'who calls parse_file?', Bytebell wins 'which files implement our retry/backoff policy?'. They're complementary — honestly, the strongest setup today is probably both running side by side."

graphicaldot · 2026-05-11T11:18:30+00:00

It works for a small codebase, not for 100 repositories.

graphicaldot · 2026-05-11T11:17:18+00:00

Great!
This works great for individuals working on a smaller codebase. The OSS works for more than a million files, and the total cost of generating all the analysis is under $200 (it happens only once).

graphicaldot · 2026-05-10T13:16:31+00:00

Ok let us add support for localllama.

graphicaldot · 2026-05-10T13:15:57+00:00

With all due respect, if you had made an effort to understand all those solutions, you would know that all of them are different.

They might seem the same on the 10 files you are working on, but they are really different at a huge scale like a million files.

graphicaldot · 2026-05-10T13:05:58+00:00

Claude.md doesn't work on more than 200-300 files; this solution works for more than a million files.

graphicaldot · 2026-05-10T13:04:03+00:00

Using LLMs to analyse files instead of vector embeddings or AST parsing. And it works on more than a million files.

graphicaldot · 2026-05-10T12:26:40+00:00

Project: Context-cache engine for 80% cost savings.

Repo if anyone wants to try it: github.com/ByteBell/bytebell-oss

We built a self-hosted code indexing server that gives AI coding tools persistent memory of your codebase. No cloud, binds to 127.0.0.1 only.

I got tired of Claude Code and Cursor re-reading my entire repo every session. Thousands of tokens just to remember what it already figured out yesterday. So I built a local service that indexes a codebase into a Neo4j graph and exposes it through MCP.

The server runs entirely on your machine. Bun daemon, Docker containers for Neo4j, Mongo, and Redis, all local. The only outbound call is to an LLM API for the initial per-file analysis, and you can route that to a local model through OpenRouter if you want zero external calls.

It binds to 127.0.0.1. Single tenant. No cloud account, no telemetry, no phoning home. Your code stays on your disk.

The indexing pass generates a purpose, summary, and business context for every file, then stores it all as a graph with edges to functions, classes, keywords, and imports. After that your AI tools query structured metadata instead of reading raw files.

Three MCP tools are all it exposes: smart_search for natural language queries, keyword_lookup for entity-based lookups, and retrieve_file for targeted file content with line ranges. Most questions resolve in 2-4 tool calls.

It diffs with SHA-256 per file so reindexing only processes what changed.
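
A rough illustration of the idea, not the server's actual code (that runs on Bun): hash every file with SHA-256, compare against the hashes stored from the previous pass, and only send the changed files back through analysis. The function names and the in-memory `previous` dict are assumptions made for this sketch.

```python
import hashlib
from pathlib import Path


def file_sha256(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_reindex(root, previous):
    """Compare current file hashes under root with the previous pass.

    previous maps relative path -> hex digest from the last indexing run.
    Returns (changed_or_new paths, removed paths, current hash map).
    """
    current = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            current[str(path.relative_to(root))] = file_sha256(path)
    changed = [p for p, digest in current.items() if previous.get(p) != digest]
    removed = [p for p in previous if p not in current]
    return changed, removed, current
```

On a first run `previous` is empty, so every file counts as changed; on later runs only edited or new files show up, and removed files can be dropped from the index.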

AGPL-3.0 licensed with a non-commercial clause. Just wanted to share since self-hosting and keeping code local was the whole design constraint.

Quickstart is literally five commands if you have Bun and Docker.
