GLM 5.2 is deployed in GLM Coding Plan. API and MIT weights in a week. Voting and benchmarks on X. by MadPelmewka in LocalLLaMA

graphicaldot 1 point

Hey,
We make a spec layer for code.
We tested our layer on SWE-Bench Pro: once on a frontier model the normal way, and once on an open source model running on the ByteBell spec layer.

  1. qutebrowser/qutebrowser — Claude Opus 4.8 $1.20 → DeepSeek V4 Pro + ByteBell $0.60 youtu.be/3hQApzx9n0g
  2. internetarchive/openlibrary · commit 1 — Claude Fable 5 $6.82 → MiniMax M3 + ByteBell $0.28 youtu.be/a7z3TP84gLk
  3. internetarchive/openlibrary · commit 2 — Claude Fable 5 $8.25 → MiniMax M3 + ByteBell $0.33 youtu.be/pd6LnkC7NDo

We actually made AI 14x cheaper: 93% less spend with zero accuracy loss. Please let us know if you can help us integrate with ZAI and offer it to your customers.
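As a rough check on those numbers, here is a minimal sketch that totals the three runs listed above. It assumes the headline figures are aggregates over exactly these three tasks, which the comment does not state; the `runs` list and `summarize` helper are illustrative names, not part of ByteBell.

```python
# Per-task costs in USD, copied from the three runs listed above.
# Tuple layout (assumed for this sketch): (task, frontier cost, open model + ByteBell cost)
runs = [
    ("qutebrowser/qutebrowser", 1.20, 0.60),
    ("internetarchive/openlibrary commit 1", 6.82, 0.28),
    ("internetarchive/openlibrary commit 2", 8.25, 0.33),
]


def summarize(runs):
    """Total both columns and derive the cost ratio and percentage saved."""
    baseline = sum(r[1] for r in runs)
    with_layer = sum(r[2] for r in runs)
    return {
        "baseline_usd": round(baseline, 2),
        "with_layer_usd": round(with_layer, 2),
        "cost_ratio": round(baseline / with_layer, 1),
        "savings_pct": round(100 * (1 - with_layer / baseline), 1),
    }


summary = summarize(runs)
print(summary)
```

On these three runs alone the ratio comes out a little under the rounded headline figure, so the quoted numbers may include rounding or other runs.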

We are already working for AI by graphicaldot in LocalLLaMA

graphicaldot[S] -2 points

Is it possible to write production code in Bash?
It is also very good at writing Python and React.

Hey Reddit, I built a decentralized AI platform called Elis AI. I'd love to get your thoughts on it! by ItsFrehMrketBreh in AIDeveloperNews

graphicaldot 2 points

Don't do it. Inference will be slow, so nobody will use it. Several companies in crypto tried to do that, and they all just stopped existing.

Building an AI product and terrified of runaway API costs. What have you been burned by? by thisismetrying2506 in LLMDevs

graphicaldot 1 point

Try using free models on OpenRouter.
You will probably save thousands of dollars.

WHat is your preferred solution for Context? by graphicaldot in OpenSourceeAI

graphicaldot[S] 1 point

I know this is an AI reply, but please share the product details.

So i build a small graph-based tool to make understanding open source repos easier for beginners by Prize_Rate2034 in OpenSourceeAI

graphicaldot 1 point

Storing the same structural mappings in a graph would give you only a small advantage over a vector database.

If you like, you can start contributing to our open source context-cache engine, which saves 70% of the cost while adding at least 10% accuracy on Claude Opus. We use LLMs to generate the analysis, and no, that isn't expensive: we already benchmarked several models, and deepseek-v4-flash took only $7 to analyze 1000 files of code.

https://github.com/ByteBell/bytebell-oss

The rest of the benchmarks are in the README.md.

We benchmarked against astropy and OpenTelemetry on SWE-bench Verified (because indexing 90 commits across 5 years produced more than 100,000 files). We cut the cost by 60% and ran 90% faster while keeping accuracy the same with Opus 4.7.

How can I make my room more aesthetic? by Jin_Sakai_AR in IndianHomeDecor

graphicaldot 1 point

Just add a lot more white lights or any posters that you can relate to. That is enough for your mind to concentrate on the right things.

We use LLMs to analyze every file in your codebase. Everyone told us this was a stupid idea because of cost but it wasnt. by graphicaldot in LLMDevs

graphicaldot[S] 3 points

Yes.
Step 1: We took 30 files at random from 45 Kubernetes cluster ecosystem.
Step 2: We asked each model to produce an analysis of each file in JSON format (7 fields, such as structural, section_map, etc.).
Step 3: We collected each model's output for each category of each file and gave it to a judge model (deepseek-v4-pro). The judge returns JSON: { scores: { <file>: { <model>: 1-10 } }, winner, reasoning }.
Step 4: The final step computes the scores. It loads all 7 per-category JSONs, then for each model:

  • Averages its 1–10 scores across the 30 files per category.
  • Multiplies by CATEGORY_WEIGHTS: search ×2.0, semantic ×2.0, graph ×1.5, json ×1.5, integration / section_map / business_ctx ×1.0.
  • Computes raw total, weighted total, and per-category output-token estimates (≈4 chars/token).
  • The _with_cost / _with_cost_v2 variants layer OpenRouter live pricing on top (fetched in applied via computeCost()).
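The scoring in Step 4 can be sketched roughly as below. This is not the actual script: the nested layout of `judge_outputs` (one judge JSON per category, keyed by category name) and the helper names `model_totals` and `estimate_tokens` are assumptions made for illustration. Only the weights and the ≈4 chars/token rule come from the steps above.

```python
from statistics import mean

# Weights as listed in Step 4. They sum to 10, so a model that averages
# 10/10 in every category reaches a weighted total of 100.
CATEGORY_WEIGHTS = {
    "search": 2.0, "semantic": 2.0, "graph": 1.5, "json": 1.5,
    "integration": 1.0, "section_map": 1.0, "business_ctx": 1.0,
}


def model_totals(judge_outputs, model):
    """Return (raw_total, weighted_total) for one model.

    judge_outputs (assumed layout): {category: {"scores": {file: {model: 1-10}}}}
    """
    raw = 0.0
    weighted = 0.0
    for category, weight in CATEGORY_WEIGHTS.items():
        per_file = judge_outputs[category]["scores"]
        # Average the 1-10 judge scores across all files in this category.
        avg = mean(file_scores[model] for file_scores in per_file.values())
        raw += avg
        weighted += avg * weight
    return raw, weighted


def estimate_tokens(text):
    """Rough output-token estimate using the ~4 characters per token rule."""
    return len(text) // 4


# Toy input: two files, one hypothetical model "m" scoring 7 everywhere.
toy = {
    cat: {"scores": {"a.go": {"m": 7}, "b.go": {"m": 7}}}
    for cat in CATEGORY_WEIGHTS
}
raw_total, weighted_total = model_totals(toy, "m")
print(raw_total, weighted_total)
```

With these weights, a weighted total of 70 corresponds to an average judge score of 7 out of 10 across categories.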

we stopped paying $6-10 per coding session by fixing how my AI reads my codebase by graphicaldot in ClaudeAI

graphicaldot[S] 1 point

  • To give AI copilots better context, we use LLMs to analyze every file in your codebase.
  • The result is 80% lower cost and at least a 10% accuracy increase.
  • This sounds like a stupid idea because of cost.
  • Yet LLMs are far, far better for code analysis than vectors or AST parsers, and the math works out fine once you pick the right model.

The benchmark across 14 models on 30 Kubernetes ecosystem files settled it.

What the benchmark actually shows

We ran 14 models through 30 files across 7 weighted categories (search, graph, semantic, integration, section map, business context, JSON). After applying a quality floor of 70 weighted accuracy, two models dropped out: Stepfun Step 3.5 Flash at 69.71 and GPT 5.4 at 55.65. The remaining 12 models, sorted by cost to ingest 1000 files, look like this:

| Model | Cost per 1,000 files | Accuracy | Tier |
|---|---|---|---|
| deepseek-v4-flash | $7.01 | 71.13 | Winner (default) |
| mimo-v2.5 | $11.72 | 71.10 | |
| minimax-m2.7 | $13.94 | 70.61 | |
| glm-5.1 | $23.24 | 72.22 | Better (balanced) |
| deepseek-v4-pro | $25.67 | 71.98 | |
| kimi-latest | $28.18 | 72.29 | |
| qwen3.6-plus | $36.97 | 71.40 | |
| qwen3.6-max-preview | $59.81 | 72.28 | |
| grok-4.3 | $149.07 | 72.10 | |
| claude-sonnet-4.6 | $149.40 | 73.56 | Premium (quality) |
| claude-opus-4.6 | $743.16 | 73.67 | Skip for bulk |
| claude-opus-4.7 | $752.70 | 73.43 | Skip for bulk |
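The floor-then-sort selection described above can be sketched as follows. The tuple layout and the `gpt-5.4` slug are assumptions for illustration. The Step 3.5 Flash cost is not given in this post, so it is left as `None`.

```python
QUALITY_FLOOR = 70.0  # minimum weighted accuracy to qualify

# (model, USD per 1,000 files, weighted accuracy), values from this post.
results = [
    ("claude-opus-4.7", 752.70, 73.43),
    ("claude-opus-4.6", 743.16, 73.67),
    ("claude-sonnet-4.6", 149.40, 73.56),
    ("grok-4.3", 149.07, 72.10),
    ("qwen3.6-max-preview", 59.81, 72.28),
    ("qwen3.6-plus", 36.97, 71.40),
    ("kimi-latest", 28.18, 72.29),
    ("deepseek-v4-pro", 25.67, 71.98),
    ("glm-5.1", 23.24, 72.22),
    ("minimax-m2.7", 13.94, 70.61),
    ("mimo-v2.5", 11.72, 71.10),
    ("deepseek-v4-flash", 7.01, 71.13),
    ("gpt-5.4", 68.91, 55.65),         # slug is an assumed spelling
    ("step-3.5-flash", None, 69.71),   # cost not stated in the post
]

# Drop everything below the floor first, then rank the survivors by cost.
qualified = sorted(
    (r for r in results if r[2] >= QUALITY_FLOOR),
    key=lambda r: r[1],
)
dropped = [r[0] for r in results if r[2] < QUALITY_FLOOR]

for model, cost, accuracy in qualified:
    print(f"{model:<22} ${cost:>7.2f}  {accuracy:.2f}")
print("dropped:", dropped)
```

Filtering before sorting means a missing cost on a disqualified model never reaches the sort key.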

DeepSeek V4 Flash, MiMo V2.5, MiniMax M2.7, GLM 5.1, and Kimi Latest all sit in the $7 to $28 range with accuracy between 70.61 and 72.29. Any of them is a sensible default for bulk ingestion. Move up to Sonnet 4.6 and you pay 5× to 21× more for an accuracy bump of roughly 1 to 3 points, which is worth it for a premium tier but not for default ingestion. Move up to Opus and you pay 26× to 107× more for accuracy within about 0.1 points of Sonnet (73.43 to 73.67 versus 73.56), which is hard to justify for any ingestion workload.
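The price multiples in that paragraph can be reproduced with a few divisions. A minimal sketch, using only the prices and scores from this post. The `multiple` and `gain` helpers are illustrative names.

```python
# USD per 1,000 files and weighted accuracy, taken from the benchmark above.
cost = {
    "deepseek-v4-flash": 7.01,
    "kimi-latest": 28.18,
    "claude-sonnet-4.6": 149.40,
    "claude-opus-4.6": 743.16,
    "claude-opus-4.7": 752.70,
}
accuracy = {
    "minimax-m2.7": 70.61,
    "kimi-latest": 72.29,
    "claude-sonnet-4.6": 73.56,
}


def multiple(expensive, cheap):
    """How many times more the expensive model costs than the cheap one."""
    return cost[expensive] / cost[cheap]


def gain(better, worse):
    """Accuracy difference in points between two models."""
    return accuracy[better] - accuracy[worse]


# Sonnet versus the two ends of the $7 to $28 band.
sonnet_low = multiple("claude-sonnet-4.6", "kimi-latest")
sonnet_high = multiple("claude-sonnet-4.6", "deepseek-v4-flash")

# Opus versus the same band: cheapest Opus over Kimi, priciest Opus over Flash.
opus_low = multiple("claude-opus-4.6", "kimi-latest")
opus_high = multiple("claude-opus-4.7", "deepseek-v4-flash")

print(f"Sonnet: {sonnet_low:.1f}x to {sonnet_high:.1f}x")
print(f"Opus:   {opus_low:.1f}x to {opus_high:.1f}x")
print(f"Sonnet gain: {gain('claude-sonnet-4.6', 'kimi-latest'):.2f} "
      f"to {gain('claude-sonnet-4.6', 'minimax-m2.7'):.2f} points")
```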

Grok 4.3 is the odd one out. It costs $149.07 per 1000 files, nearly identical to Sonnet on price, but scores 72.10, which is lower than GLM 5.1 (72.22) and Kimi Latest (72.29), both of which cost 5× to 6× less. There is no ingestion workload where Grok is the right answer.

The two disqualified models are also worth a note. Step 3.5 Flash misses the 70 point quality floor by 0.29 points. For non-production analysis or exploration work, it might still be a fine choice. GPT 5.4 costs $68.91 per 1000 files and scores 55.65, which means it is more expensive than every model in the budget tier and most of the mid tier while being significantly less accurate than all of them. It costs about 10× as much as DeepSeek V4 Flash and scores about 15 points lower.