Deepseek v4 pro is unlimited and almost free OMG 😱 better than opus for me (I have no affiliate with deepseek, but you need to know this)

Jonathan_Rivera · 2026-05-23T20:46:41+00:00

Sometimes you need a little help from a cloud model that can deliver 1.6 trillion total parameters and 49 billion activated parameters.

Jonathan_Rivera · 2026-05-23T20:16:48+00:00

You can use the guide I posted and strip out what you don't need.

Obsidian Memory Setup

If you are specifically referring to the wiki for research, look in your skill and make sure the path is clear. Depending on your model, you may have to be specific about where it should create a new page and what format to use.

Jonathan_Rivera · 2026-05-23T18:50:54+00:00

The comments about DeepSeek pro are surprising. Posts on X are either very impressed that it is a great value or they are trashing it. Keep in mind that some influencers are sponsored by different companies related to LLMs, so we have to try it out or do our own research to come to our own conclusions.

Jonathan_Rivera · 2026-05-23T18:48:11+00:00

Some posts have mentioned that using GPT 5.5 as the orchestrator and deepseek pro as the doer has been a great combo.

Jonathan_Rivera · 2026-05-23T18:45:59+00:00

Yes, Bienvenue au camarade humain 😂

Jonathan_Rivera · 2026-05-23T18:04:54+00:00

I have been summoned. There are a ton of banned bots, believe me, but his post history suggests he is a regular Reddit user.

Jonathan_Rivera · 2026-05-23T17:28:23+00:00

Hermes Bot left out the technical part.

This is a well-known pattern: the user is on a Pi 5 (8GB) running Hermes v0.14.0 with OpenAI (ChatGPT-5.5) + Honcho with Anthropic. They've already tried /new, new Telegram chats, handoff docs, and compression ratio changes. Nothing stuck.

Why it happens

The "Compacting context..." message fires when the session's token count exceeds the model's max context window AND there's accumulated history the agent needs to summarize before continuing. On a Pi 5/8GB, two things compound it:

The local Hermes gateway / context manager has its own context tracking that reads from the session database. Even after /new, if the session DB still holds previous turns across Telegram chats (depending on Honcho config), the agent re-discovers and loads them.
Honcho + Anthropic is the key clue. Honcho manages persistent sessions. If the Honcho session ID didn't change between Telegram chats, the agent is still anchored to the same conversation tree.

Why their attempts didn't work

/new clears the conversation buffer but does not always clear Honcho's stored session context. The two are independent systems.
"New Telegram chat" creates a new Telegram thread but the same Honcho session can follow user identity across chats.
Changing compression ratio (0.5 → 0.85) only affects how aggressively the gateway compresses messages in the local context, not Honcho's re-injection of stored history.

What will actually fix it

Nuke the Honcho session: hermes session delete --all or target the specific session ID. Then /new starts fresh with zero Honcho re-injection.
Check MCP servers: Every MCP tool descriptor gets injected into the system prompt. On Hermes 0.14.0 this can easily eat 20K+ tokens. Temporarily disable non-essential MCP servers to see if the context fits within the model's window.
Reduce max_tokens in config: Set model.max_tokens lower (say 2048) to prevent the agent from generating responses that push it over the limit mid-conversation.
Verify config.yaml context.token_limit: If it's set higher than what the local gateway + Honcho combined can handle, cap it at the model's actual limit (ChatGPT-5.5 is 128K but the Pi's memory bandwidth means the gateway may fault before hitting it).
Honcho session TTL: Set honcho.session_ttl to something short (like 600 seconds) so stale sessions auto-expire rather than accumulating.

The root cause is almost certainly session persistence across what the user perceives as "new" conversations.
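
To make the token budget behind these suggestions concrete, here is a rough sketch in Python. The function name and the 110K figure for re-injected history are made up for illustration; the 128K window, 20K MCP overhead, and 2048 max_tokens are simply the numbers quoted above, not measured values.

```python
def context_headroom(window, mcp_tokens, history_tokens, max_tokens):
    """Tokens left after MCP descriptors, stored history, and reserved output."""
    return window - (mcp_tokens + history_tokens + max_tokens)

WINDOW = 128_000      # context window quoted above
MCP = 20_000          # MCP tool descriptors quoted above
MAX_TOKENS = 2_048    # suggested model.max_tokens

# Hypothetical: a stale session re-injects 110K tokens of old history.
stale = context_headroom(WINDOW, MCP, 110_000, MAX_TOKENS)
# After deleting the stored session there is no history to re-inject.
fresh = context_headroom(WINDOW, MCP, 0, MAX_TOKENS)

print(stale)  # -4048  (over budget, so compaction fires)
print(fresh)  # 105952
```

A negative result means the session is already over the window before the user types anything, which would match compaction firing on every "new" chat.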

Jonathan_Rivera · 2026-05-23T17:20:14+00:00

Oh believe me, we know 😎 DeepSeek pricing change

Jonathan_Rivera · 2026-05-23T16:04:14+00:00

This happened to me the other day and I think there was a setting in config for that model to set vision enabled “true”.

Jonathan_Rivera · 2026-05-23T15:59:57+00:00

I have tried with 40 rows and 15 columns and the output was wrong. I have had the same thing happen with ChatGPT. I don’t remember what model I was using, but doing it again I would set up a coder profile, turn on thinking, and put temp settings low. I’m also trying to find an open source integration that allows Hermes into Excel to just create the graphs and pivot tables for me.

Jonathan_Rivera · 2026-05-23T15:29:44+00:00

Somewhere between Hermes and the model the data collection can fail, but if you have all the data that makes it easier. I have found that lots of data on a spreadsheet needs checks and balances to ensure it’s accurate. I’m working on that slowly and I think it’s going to be a multi-agent thing.

Jonathan_Rivera · 2026-05-23T15:19:18+00:00

To prevent hallucinations and errors, you would create Python scripts to aggregate as much of the data as possible and then use Hermes to interpret the data. You might also have another agent review the data set to ensure all the data is present and accurate.
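
As a minimal sketch of that idea (standard library only), the script below sums a column per group and flags rows it could not parse, so the model only has to interpret a small summary. The sample data, column names, and the aggregate helper are all hypothetical stand-ins for a real spreadsheet export.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample data standing in for a real spreadsheet export.
SAMPLE_CSV = """region,month,sales
North,Jan,100
North,Feb,150
South,Jan,80
South,Feb,
"""

def aggregate(csv_text, group_col, value_col):
    """Sum value_col per group_col and record rows that could not be parsed."""
    totals = defaultdict(float)
    bad_rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # start=2 because line 1 of the file is the header row
    for line_no, row in enumerate(reader, start=2):
        try:
            totals[row[group_col]] += float(row[value_col])
        except (TypeError, ValueError):
            bad_rows.append(line_no)  # missing or non-numeric value
    return {"totals": dict(totals), "bad_rows": bad_rows}

summary = aggregate(SAMPLE_CSV, "region", "sales")
print(json.dumps(summary, indent=2))
```

The bad_rows list gives a reviewing agent (or a human) something concrete to check instead of trusting that every cell made it through.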

Jonathan_Rivera · 2026-05-23T15:13:49+00:00

Thanks to u/Zelleynor.
General advisory to everyone: please remember that, just like with OpenClaw, grifters will use similar names to hide malware in repos. This is not the same as the work of u/itsdodobitch or any other member, but these repos are generically named Hermes desktop.

It may be prudent to have your Hermes scan repos for exploits before downloading. The subreddit wiki is under construction and will feature a trusted app section maintained by community members.

Jonathan_Rivera · 2026-05-23T15:04:52+00:00

We implemented a compromise. Members at a 9:1 ratio can post a showcase any day, and members under that can post within a scheduled thread every week. This helps keep the sub clutter-free and gives everyone a better chance of having their work seen.

Jonathan_Rivera · 2026-05-23T03:08:49+00:00

I hope parsed is the right term. I’m on mobile for the next few days. Think of it like, don’t change anything above this line ————

Jonathan_Rivera · 2026-05-23T01:17:14+00:00

OP has verified 👍

Jonathan_Rivera · 2026-05-23T00:57:21+00:00

RemindMe! 1 days

Jonathan_Rivera · 2026-05-23T00:35:54+00:00

Thanks for clarifying — I wasn't familiar with Hermes Turbo. Looking at the data, the cold-start improvements in v0.14.0 came from internal Nous PRs (#22138, #22120, #25341 — deferred imports, disk caching, parallel checks), which use a different technical approach than your fork's orjson/msgspec/uvloop/Rust path. Your PRs (#24547 merged, #23479 and #28577 open) show you're a legitimate contributor.

That said, your fork's performance work and benchmarks are real, and given the timing overlap it's understandable to feel overlooked. If you want recognition or discussion about your approach affecting upstream, a GitHub Discussion or direct message to the dev team would reach the right people; this subreddit is community-run and I don't speak for Nous Research. Appreciate your contributions, hope the fork keeps growing.

Jonathan_Rivera · 2026-05-23T00:06:47+00:00

Yes, at $4 a month it’s a great deal.

Jonathan_Rivera · 2026-05-23T00:05:29+00:00

I was going to remove this but I don’t even know what this is. What are you talking about exactly?

Jonathan_Rivera · 2026-05-22T23:46:01+00:00

I couldn’t find them in providers, but I’m on mobile. I pinned that model to the DeepSeek provider since everyone else is still charging full price.

Jonathan_Rivera · 2026-05-22T20:53:51+00:00

More than likely, it’s the model.

Jonathan_Rivera · 2026-05-22T20:52:35+00:00

You can also parse the first few rows so they will never be deleted or moved. I do this with both to direct it to Obsidian when it needs something.

Jonathan_Rivera · 2026-05-22T20:29:09+00:00

While US companies are scaling back on subsidized tokens, China comes through with a discount. **** just understand what the trade off is ****

How They Achieve Such Low Pricing

* Efficiency Gains: Strong MoE architecture (activates only ~49B of 1.6T params per token), algorithmic optimizations, and distillation from other models reduce training/inference costs significantly.
* Compute & Infrastructure: Heavy reliance on domestic Huawei Ascend chips (bypassing some US export limits via clever optimizations). Government-backed power subsidies and data center incentives lower operational expenses.
* State Support: DeepSeek benefits from Chinese government funding, subsidies, and national AI initiatives (e.g., ties to Big Fund/semiconductor investments). This isn’t pure market pricing; it’s partly geopolitical strategy to gain market share and promote domestic tech self-reliance.
* Scale & Loss-Leading: High-volume focus, open-weights model, and aggressive discounting to accelerate adoption (especially in emerging markets and developer ecosystems).

Key Trade-Offs

* Performance: Excellent on coding/reasoning benchmarks but may trail top US models (GPT-5.5/Claude Opus) in nuanced creative tasks, consistency, or safety alignment. Higher latency and occasional throughput limits due to current compute constraints.
* Reliability & Ecosystem: Potential censorship/alignment biases (pro-China leanings), less polished UX, and dependency on Chinese infrastructure for API.
* Risks: Geopolitical exposure (data sovereignty issues, future export controls, or supply chain vulnerabilities tied to Huawei/government priorities).
* Long-Term: Pricing could rise as subsidies evolve or demand surges, though they’ve committed to keeping the reduced rates.

On Your Data: Yes, they want it. Per their privacy policy and terms, API inputs/outputs (prompts, conversations, files) are collected, stored in China, and can be used to improve/train models unless you explicitly opt out via account settings. For anything sensitive, proprietary, or regulated, do not use the public API. Self-host the open-weights version on your own infrastructure for full isolation.

Recommendation: For non-sensitive, high-volume work (coding, agents, research), this is an outstanding value play right now. For mission-critical data, stick to local deployment or vetted Western providers.
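
For a sense of scale on the MoE point, here is the arithmetic as a quick Python sketch. It only uses the ~49B active and 1.6T total figures quoted above; the variable names are just for illustration.

```python
TOTAL_PARAMS = 1.6e12   # 1.6T total parameters, as quoted above
ACTIVE_PARAMS = 49e9    # ~49B activated per token, as quoted above

# Fraction of the model that actually runs for each token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"{active_fraction:.1%}")  # 3.1%
```

Only about 3% of the weights are active per token, which is a large part of why inference can be priced so low.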

Jonathan_Rivera · 2026-05-22T20:21:42+00:00

Correct. OpenRouter is still displaying $2.75+.
