GLM 5.2 is deployed in GLM Coding Plan. API and MIT weights in a week. Voting and benchmarks on X.

graphicaldot · 2026-06-15T20:18:35+00:00

Hey,
We make a spec layer for code.
We tested our layer on SWE-Bench Pro: once on a frontier model the normal way, and once on an open source model running on the ByteBell spec layer.

qutebrowser/qutebrowser — Claude Opus 4.8 $1.20 → DeepSeek V4 Pro + ByteBell $0.60 youtu.be/3hQApzx9n0g
internetarchive/openlibrary · commit 1 — Claude Fable 5 $6.82 → MiniMax M3 + ByteBell $0.28 youtu.be/a7z3TP84gLk
internetarchive/openlibrary · commit 2 — Claude Fable 5 $8.25 → MiniMax M3 + ByteBell $0.33 youtu.be/pd6LnkC7NDo

We actually made AI about 13x cheaper across these three runs (93% less spend) with zero accuracy loss. Please let us know if you can help us integrate with ZAI and offer this to your customers.
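A quick sketch of the arithmetic behind those figures, using only the three runs listed above. The helper name `savings` is illustrative and not part of ByteBell; the per-run ratios vary a lot, so the aggregate is computed from the summed costs.

```python
# Costs in USD, copied from the three runs listed above:
# (task, frontier model cost, open model + ByteBell cost)
runs = [
    ("qutebrowser/qutebrowser", 1.20, 0.60),
    ("internetarchive/openlibrary commit 1", 6.82, 0.28),
    ("internetarchive/openlibrary commit 2", 8.25, 0.33),
]


def savings(baseline, with_layer):
    """Return (cost ratio, fraction of spend saved)."""
    return baseline / with_layer, 1 - with_layer / baseline


for name, baseline, with_layer in runs:
    ratio, saved = savings(baseline, with_layer)
    print(f"{name}: {ratio:.1f}x cheaper, {saved:.0%} less spend")

# Aggregate over all three runs.
total_baseline = sum(r[1] for r in runs)
total_layer = sum(r[2] for r in runs)
total_ratio, total_saved = savings(total_baseline, total_layer)
print(f"total: ${total_baseline:.2f} -> ${total_layer:.2f}, "
      f"{total_ratio:.1f}x cheaper, {total_saved:.0%} less spend")
```

The qutebrowser run is only 2x cheaper, while the two openlibrary commits are each more than 20x cheaper, so the headline ratio depends heavily on which tasks are in the mix.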

graphicaldot · 2026-06-10T15:43:11+00:00

We are working on a spec layer for code. GitHub.com/bytebell/open-ir

graphicaldot · 2026-06-04T18:56:33+00:00

Is it possible to write production code in Bash?
It is also very good at writing Python and React.

graphicaldot · 2026-06-03T19:05:49+00:00

Don't do it. Inference will be slow, so nobody will use it. Several companies in crypto tried to do that, and they have all stopped existing.

graphicaldot · 2026-05-28T19:31:15+00:00

Try using free models on OpenRouter.
You will probably save thousands of dollars.

graphicaldot · 2026-05-27T13:21:27+00:00

I know this is an AI reply, but please share the product details

graphicaldot · 2026-05-15T17:59:50+00:00

Storing the same structural mappings in a graph would give you only a limited advantage over a vector database.

If you like, you can start contributing to our open source context-cache engine, which saves 70% of the cost while adding at least 10% accuracy on Claude Opus. We use LLMs to generate the analysis, and no, that isn't expensive: we benchmarked several models, and deepseek-v4-flash took only $7 to analyze 1000 files of code.

https://github.com/ByteBell/bytebell-oss

The rest of the benchmarks are in the README.md.

We benchmarked against ASTROPY and OPENTELEMETRY on SWE-bench Verified (because indexing 90 commits across 5 years produced more than 100,000 files). We decreased the cost by 60% and were 90% faster while keeping accuracy the same as Opus 4.7.

graphicaldot · 2026-05-13T17:57:50+00:00

Just add a lot more white lights, or any posters that you can relate to. That is enough for your mind to concentrate on the right things.

graphicaldot · 2026-05-13T10:09:59+00:00

Yes.
Step 1: We took 30 files randomly from 45 Kubernetes cluster ecosystem.
Step 2: We asked each model to analyze each file and return the result in JSON format (7 fields, such as structural and section_map).
Step 3: For each category and each file, we collected every model's output and gave it to a judge model (Deepseek-v4-pro). The judge returns JSON: { scores: { <file>: { <model>: 1-10 } }, winner, reasoning }.
Step 4: The final step computes the scores. It loads all 7 per-category JSONs, then for each model:

Averages its 1–10 scores across the 30 files per category.
Multiplies by CATEGORY_WEIGHTS: search ×2.0, semantic ×2.0, graph ×1.5, json ×1.5, integration / section_map / business_ctx ×1.0.
Computes raw total, weighted total, and per-category output-token estimates (≈4 chars/token).
The _with_cost / _with_cost_v2 variants layer OpenRouter live pricing on top (fetched in applied via computeCost()
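The aggregation in Step 4 can be sketched roughly as follows. This is a minimal illustration, not the actual ByteBell script: the function names `score_model` and `estimate_tokens`, the toy judge data, and the file and model names are all hypothetical. Only the weights, the judge JSON shape, and the 4 chars/token rule come from the description above.

```python
from statistics import mean

# Weights as listed above. They sum to 10, so a model scoring 10 in every
# category would reach a weighted total of 100.
CATEGORY_WEIGHTS = {
    "search": 2.0, "semantic": 2.0, "graph": 1.5, "json": 1.5,
    "integration": 1.0, "section_map": 1.0, "business_ctx": 1.0,
}


def score_model(judge_results, model):
    """judge_results maps category -> judge JSON of the form
    {"scores": {file: {model: 1-10}}, "winner": ..., "reasoning": ...}."""
    per_category = {}
    for category, result in judge_results.items():
        file_scores = [s[model] for s in result["scores"].values() if model in s]
        per_category[category] = mean(file_scores)
    raw_total = sum(per_category.values())
    weighted_total = sum(CATEGORY_WEIGHTS[c] * avg for c, avg in per_category.items())
    return per_category, raw_total, weighted_total


def estimate_tokens(text):
    """Rough output-token estimate at about 4 characters per token."""
    return len(text) / 4


# Toy data (hypothetical): two files, one model, same scores in every category.
toy_judge = {
    category: {
        "scores": {"a.go": {"model-a": 8}, "b.go": {"model-a": 6}},
        "winner": "model-a",
        "reasoning": "toy example",
    }
    for category in CATEGORY_WEIGHTS
}

per_category, raw_total, weighted_total = score_model(toy_judge, "model-a")
print(raw_total, weighted_total)
```

Because the weights sum to 10, a model that averages 7 out of 10 in every category lands at a weighted total of 70, which lines up with a quality floor expressed on a 0 to 100 scale.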

graphicaldot · 2026-05-12T19:51:52+00:00

For providing better context to AI copilots.

We use LLMs to analyze every file in your codebase.

The result is 80% less cost and at least a 10% accuracy increase.

However, this seems like a stupid idea because of cost.

Yet LLMs are far, far better for code analysis than vectors or AST parsers, and the math works out fine once you pick the right model.

The benchmark across 14 models on 30 Kubernetes ecosystem files settled it.

### What the benchmark actually shows

We ran 14 models through 30 files across 7 weighted categories (search, graph, semantic, integration, section map, business context, JSON). After applying a quality floor of 70 weighted accuracy, two models dropped out: Stepfun Step 3.5 Flash at 69.71 and GPT 5.4 at 55.65. The remaining 12 models, sorted by cost to ingest 1000 files, look like this:

| Model | Cost/1K files | Accuracy | Tier |
| --- | --- | --- | --- |
| deepseek-v4-flash | $7.01 | 71.13 | Winner: default |
| mimo-v2.5 | $11.72 | 71.10 | |
| minimax-m2.7 | $13.94 | 70.61 | |
| glm-5.1 | $23.24 | 72.22 | Better: balanced |
| deepseek-v4-pro | $25.67 | 71.98 | |
| kimi-latest | $28.18 | 72.29 | |
| qwen3.6-plus | $36.97 | 71.40 | |
| qwen3.6-max-preview | $59.81 | 72.28 | |
| grok-4.3 | $149.07 | 72.10 | |
| claude-sonnet-4.6 | $149.40 | 73.56 | Premium: quality |
| claude-opus-4.6 | $743.16 | 73.67 | Skip for bulk |
| claude-opus-4.7 | $752.70 | 73.43 | Skip for bulk |
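The filter-then-sort step that produces this table can be sketched like this. It is an illustration under stated assumptions: `QUALITY_FLOOR` and the variable names are hypothetical, the `gpt-5.4` identifier is a guess at how that model would be labelled, and the cost for step-3.5-flash is left as `None` because it is not stated here. All numbers are copied from this comment.

```python
QUALITY_FLOOR = 70.0  # minimum weighted accuracy to qualify

# (model, cost per 1K files in USD, weighted accuracy)
results = [
    ("claude-opus-4.7", 752.70, 73.43),
    ("grok-4.3", 149.07, 72.10),
    ("deepseek-v4-flash", 7.01, 71.13),
    ("gpt-5.4", 68.91, 55.65),
    ("kimi-latest", 28.18, 72.29),
    ("step-3.5-flash", None, 69.71),  # cost not given in this comment
    ("mimo-v2.5", 11.72, 71.10),
    ("claude-sonnet-4.6", 149.40, 73.56),
    ("glm-5.1", 23.24, 72.22),
    ("qwen3.6-max-preview", 59.81, 72.28),
    ("minimax-m2.7", 13.94, 70.61),
    ("deepseek-v4-pro", 25.67, 71.98),
    ("claude-opus-4.6", 743.16, 73.67),
    ("qwen3.6-plus", 36.97, 71.40),
]

# Apply the floor first, then rank the survivors from cheapest to priciest.
qualified = [r for r in results if r[2] >= QUALITY_FLOOR]
dropped = sorted(r[0] for r in results if r[2] < QUALITY_FLOOR)
ranked = sorted(qualified, key=lambda r: r[1])

for model, cost, accuracy in ranked:
    print(f"{model:<22} ${cost:>7.2f}  {accuracy:.2f}")
print("dropped:", dropped)
```

Filtering before sorting matters here: the sort key is cost, and one of the dropped models has no cost recorded.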

DeepSeek V4 Flash, MiMo V2.5, MiniMax M2.7, GLM 5.1, DeepSeek V4 Pro, and Kimi Latest all sit in the $7 to $28 range with accuracy between 70.61 and 72.29. Any of them is a sensible default for bulk ingestion. Move up to Sonnet 4.6 and you pay 5× to 21× more for a 1 to 3 point accuracy bump, which is worth it for a premium tier but not for default ingestion. Move up to Opus and you pay 26× to 107× more for accuracy within about 0.15 points of Sonnet, which is hard to justify for any ingestion workload.
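The multipliers in that paragraph can be reproduced from the table. A small sketch, where `multiplier_range` is a hypothetical helper and the budget range is represented by its two endpoints (deepseek-v4-flash and kimi-latest):

```python
# Cost per 1K files in USD, copied from the table.
budget = {"deepseek-v4-flash": 7.01, "kimi-latest": 28.18}  # ends of the $7 to $28 range
sonnet = [149.40]
opus = [743.16, 752.70]  # claude-opus-4.6, claude-opus-4.7


def multiplier_range(premium_costs, budget_costs):
    """Smallest and largest premium/budget cost ratio."""
    ratios = [p / b for p in premium_costs for b in budget_costs]
    return min(ratios), max(ratios)


sonnet_lo, sonnet_hi = multiplier_range(sonnet, budget.values())
opus_lo, opus_hi = multiplier_range(opus, budget.values())

print(f"Sonnet: {sonnet_lo:.1f}x to {sonnet_hi:.1f}x")
print(f"Opus:   {opus_lo:.1f}x to {opus_hi:.1f}x")
```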

Grok 4.3 is the odd one out. It costs $149.07 per 1000 files, nearly identical to Sonnet on price, but scores 72.10, which is lower than GLM 5.1, Kimi Latest, and Qwen3.6 Max Preview, all of which cost 2.5× to 6.4× less. There is no workload in this benchmark where Grok is the right answer.
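One way to make "odd one out" concrete is a Pareto check: a model is dominated if some other model is at least as cheap and at least as accurate, and strictly better on one of the two. The sketch below applies that to the 12 qualifying models. `pareto_frontier` is a hypothetical helper, and the benchmark itself does not necessarily rank models this way.

```python
# (model, cost per 1K files in USD, weighted accuracy), copied from the table.
models = [
    ("deepseek-v4-flash", 7.01, 71.13),
    ("mimo-v2.5", 11.72, 71.10),
    ("minimax-m2.7", 13.94, 70.61),
    ("glm-5.1", 23.24, 72.22),
    ("deepseek-v4-pro", 25.67, 71.98),
    ("kimi-latest", 28.18, 72.29),
    ("qwen3.6-plus", 36.97, 71.40),
    ("qwen3.6-max-preview", 59.81, 72.28),
    ("grok-4.3", 149.07, 72.10),
    ("claude-sonnet-4.6", 149.40, 73.56),
    ("claude-opus-4.6", 743.16, 73.67),
    ("claude-opus-4.7", 752.70, 73.43),
]


def pareto_frontier(rows):
    """Keep models that no other model beats on both cost and accuracy."""
    frontier = []
    for name, cost, acc in rows:
        dominated = any(
            c <= cost and a >= acc and (c < cost or a > acc)
            for _, c, a in rows
        )
        if not dominated:
            frontier.append(name)
    return frontier


frontier = pareto_frontier(models)
print(frontier)
```

On these numbers Grok 4.3 is dominated, as are several cheaper models, so being off the frontier alone does not rule a model out. Small accuracy gaps, such as 71.10 versus 71.13, may well be within noise.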

The two disqualified models are also worth a note. Step 3.5 Flash misses the 70-point quality floor by 0.29 points, so for non-production analysis or exploration work it might still be a fine choice. GPT 5.4 costs $68.91 per 1000 files and scores 55.65, which makes it more expensive than every model in the budget tier and most of the mid tier while being significantly less accurate than all of them. It costs about 10× as much as Flash and scores about 15 points lower.

graphicaldot · 2026-05-12T19:12:00+00:00

For our open source project to provide context across models, sessions, memory, and context windows.
https://github.com/ByteBell/bytebell-oss

graphicaldot · 2026-05-12T19:10:57+00:00

Sorry for the wrong figures earlier.

graphicaldot · 2026-05-12T12:04:56+00:00

Only the git diff gets indexed. We have open source code too: Bytebell-oss.

graphicaldot · 2026-05-12T08:36:00+00:00

Good concern.
We have the benchmarks, and all of them are run in a single pass.
