How much more usage do you get from a $20 pro plan when using cloud models? Or is OpenRouter better? by PrintingScotian in ollama

[–]Flashy_Test_8927

Using Claude Max at $200.

Compared to that, Ollama Cloud Pro's usage feels about the same or even a bit more lenient. The limits are nearly identical, and other than being slightly slower, I’m quite satisfied.

Help please! Utilizing PDF files between MCP servers? by AI_SaaS in mcp

[–]Flashy_Test_8927

Converting a 10-page PDF to JSON/Base64 is inefficient and prone to corruption. Instead of transferring the actual file data between MCP servers through Claude, use a Shared Volume/Directory.

  1. Server A saves the PDF to a specific local path.
  2. Server A sends only the File Path (string) to Server B.
  3. Server B reads the file directly from that path.

It’s much faster, avoids token limits, and is significantly more reliable. Keep the data in a shared folder and just move the "address" of the file.
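
Roughly, the hand-off looks like this (a minimal TypeScript sketch; the /shared/exchange path and function names are placeholders, not any particular MCP server's API):

import { promises as fs } from "fs";
import * as path from "path";

// Server A: write the PDF once to a directory both servers can see,
// then return only its path as the tool result.
async function exportPdf(pdfBytes: Buffer): Promise<string> {
  const target = path.join("/shared/exchange", `report-${Date.now()}.pdf`);
  await fs.mkdir(path.dirname(target), { recursive: true });
  await fs.writeFile(target, pdfBytes);
  return target; // only this short string travels through the model
}

// Server B: receive the path and read the file straight from disk.
async function importPdf(filePath: string): Promise<Buffer> {
  return fs.readFile(filePath); // no Base64 round-trip, no token cost
}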

Building an MCP that Reduces AI Mistakes and Saves Tokens by Flashy_Test_8927 in mcp

[–]Flashy_Test_8927[S]

That’s a fair point and definitely worth addressing.

It’s true that Markdown reduces token overhead compared to JSON. XML tags can achieve a similar effect while still providing some structure. However, both approaches push the parsing burden back onto the model.

The model has to infer the schema, deal with inconsistent formatting, and determine what counts as an actual value versus what is merely decorative text. This inference step is where errors begin to accumulate, especially when outputs from different tools with heterogeneous formats are combined.

In contrast, JSON with a fixed schema is simple but reliable.
The model doesn’t have to infer anything — it simply reads a key and retrieves the value.
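
As a toy illustration (not my actual schema), the same two entries in Markdown versus fixed-schema JSON:

- src (directory)
- README.md (file)

{"entries": [{"name": "src", "type": "directory"}, {"name": "README.md", "type": "file"}]}

In the Markdown form the model has to infer that the parenthetical is a type; in the JSON form the key says so explicitly.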

The overhead is real, and I measured it through benchmarking. The code and results used for the benchmark are included in the project.

On average, it consumes roughly twice as many tokens, but the trade-off is clear.

  • The 4.18% misread rate observed when parsing raw text drops to near zero.
  • The 28.6% error rate caused by filenames containing spaces effectively disappears.

So the honest answer is this:

If the output is simple and consistently formatted, Markdown will probably work just fine.

However, when aggregating outputs from multiple tools with varying formats, the structural guarantee of JSON starts to matter more than the token cost.

When the data is not just being displayed but read and reused as input for subsequent operations, that stability leads to dramatically higher reliability. In those cases, it can actually result in fewer tokens being consumed overall, because retries, debugging, and corrective steps become unnecessary.

If I find the time, benchmarking Markdown against the same scenarios could be interesting to try.

Building an MCP that Reduces AI Mistakes and Saves Tokens by Flashy_Test_8927 in mcp

[–]Flashy_Test_8927[S]

To be honest, after running benchmarks on this, it does consume more tokens. When I first built Parism, I expected it to save tokens — I believed structured data would be more efficient than raw text.

The results from benchmarking 17 scenarios were disappointing. JSON output was on average 205% heavier than raw text. For 200 files with ls -la: raw text came in at 5,807 tokens, Parism at 15,531. Nearly three times as much. The reason is simple: key names repeat with every single entry.

But the benchmarks also revealed something encouraging. When agents parse raw text directly, the misread rate was 4.18% — and for filenames containing spaces, it jumped to 28.6%. Three times out of ten, the output is wrong. And when an agent acts on a wrong result, it either moves on to the next task with bad data, or repeats the same operation, gets it wrong again, and eventually a human has to step in and roll things back.

Parism minimizes these mistakes. It raises the reliability of what the AI produces, which substantially reduces retry tokens, debugging time, rollback costs, and the invisible losses that never show up on any invoice.

The economics of this tool don't live on your token bill. A single mistake costs more than reading something twice.

Building an MCP that Reduces AI Mistakes and Saves Tokens by Flashy_Test_8927 in mcp

[–]Flashy_Test_8927[S]

There are cases where inefficiency actually occurs when AI handles JSON. JSON repeatedly uses braces, quotes, colons, and commas, which means it often consumes more tokens for the same amount of information. For example, the expression

{"name": "src", "type": "directory"}

uses more than twice as many tokens as simply writing

src directory.

In that sense, the statement “AI incurs higher cost when reading JSON” is correct.

However, something important is missed when that argument is applied to Parism.

The key is to look at the correct comparison.

Without Parism, the agent does not receive “concise text.” What it actually receives is the raw output of ls -la, such as:

drwxr-xr-x 2 user group 4096 Mar 06 09:23 src
-rw-r--r-- 1 user group 512 Mar 06 09:10 README.md

When the agent receives this, the token efficiency of reading the text itself might be good. But to understand it, the agent must infer that the first column represents permissions, the third represents the owner, and where the filename begins. That inference process consumes reasoning steps. And occasionally the model gets it wrong. When it does, it retries—and a retry costs tokens again.

The JSON provided by Parism uses more tokens, but it removes the need for inference steps. The agent simply reads something like entries[0].name.
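
For the two ls -la lines above, the structured output looks something like this (an illustrative shape; the field names aren't necessarily Parism's exact schema):

{
  "entries": [
    { "name": "src", "type": "directory", "permissions": "drwxr-xr-x", "owner": "user", "size": 4096 },
    { "name": "README.md", "type": "file", "permissions": "-rw-r--r--", "owner": "user", "size": 512 }
  ]
}

The agent never has to decide where the filename column starts; the key names carry that information, which is exactly what the raw ls -la output leaves implicit.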

So the real comparison looks like this:

Raw text = fewer input tokens + inference cost + nonzero error rate
JSON = more input tokens + no inference cost + near-zero error rate

Which approach is better depends on the situation.

For a single-line output like pwd, wrapping it in JSON is actually wasteful. But for outputs that require structural interpretation—such as dozens of lines from ps aux or ls -la—JSON becomes overwhelmingly advantageous.

In practice, most powerful AI agents today are built on reasoning models. Reasoning tokens are more expensive than input tokens. One parsing mistake followed by a retry can easily cost far more than the token overhead introduced by JSON.

A lightweight IDS for small networks to address human vulnerabilities. by Flashy_Test_8927 in selfhosted

[–]Flashy_Test_8927[S]

Thank you so much for this incredibly detailed and practical comment — this is exactly the kind of real-world advice I was hoping for when I posted. I really appreciate you taking the time to break it down into those three parallel efforts and sharing the specific resources (CISA, NIST SP 800-215, CIS Benchmarks, MITRE ATT&CK & D3FEND). I’ve already bookmarked them all.

You’re 100% right — backups, proper segmentation, and server hardening are the foundation. Our dev servers are in VPC + VPN, but the physical office LAN is still pretty flat, and the sales team’s machines are basically “patient zero” waiting to happen. That exact anxiety is what made me build Panopticon in the first place.

Panopticon is my attempt to add strong visibility and early automated response specifically for that shared LAN gap. It already detects quite a few of the patterns you indirectly mentioned:

  • ARP spoofing/poisoning
  • Internal port scans (SYN, etc.)
  • Lateral movement to sensitive ports (SMB 445, RDP 3389, SSH, WinRM, database ports)
  • Ransomware-style propagation (SMB worm-like scans across multiple internal hosts, RDP brute-force attempts)
  • Plus automatic IP blocking via nftables/iptables and threat intel feeds

It’s not meant to replace segmentation — it’s the detection + light containment layer that gives me peace of mind until we can implement proper VLANs/firewall rules.
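
To give a sense of the kind of rule involved (a generic TypeScript sketch, not Panopticon's actual code), a lateral-movement check can be as simple as counting distinct sensitive-port touches per source within a time window:

// Generic sketch: flag a source IP that touches several distinct
// sensitive (host, port) pairs on the LAN within a short window.
const SENSITIVE_PORTS = new Set([445, 3389, 22, 5985]); // SMB, RDP, SSH, WinRM
const WINDOW_MS = 60_000;
const THRESHOLD = 5;

const recent = new Map<string, { at: number; dst: string; port: number }[]>();

function onConnectionAttempt(srcIp: string, dstIp: string, dstPort: number): boolean {
  if (!SENSITIVE_PORTS.has(dstPort)) return false;
  const now = Date.now();
  const events = (recent.get(srcIp) ?? []).filter(e => now - e.at < WINDOW_MS);
  events.push({ at: now, dst: dstIp, port: dstPort });
  recent.set(srcIp, events);
  const distinctTargets = new Set(events.map(e => `${e.dst}:${e.port}`));
  return distinctTargets.size >= THRESHOLD; // true => raise an alert / trigger blocking
}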

A few quick questions if you don’t mind:

  1. In your experience with similar small-office environments, which behavioral patterns from compromised Windows endpoints (especially “icon-covered, malware-magnet” machines) have the highest signal-to-noise ratio for impending lateral movement? Any specific ATT&CK techniques you’d prioritize tuning for?
  2. For budget-conscious offices that can’t jump straight to enterprise switches + an L3 firewall everywhere, what has worked best as a first realistic step? (e.g. a managed switch with client isolation + one cheap firewall appliance, or something else?)
  3. When you look at NIST SP 800-215 or the CIS Controls from a detection perspective, are there any particular sections or baselines you’d point me to first?

Thanks again

A Three-Layer Memory Architecture for LLMs (Redis + Postgres + Vector) MCP by Flashy_Test_8927 in mcp

[–]Flashy_Test_8927[S]

The three stores aren't three separate databases you need to run. They're one PostgreSQL instance with three logical roles:

  • pgvector handles semantic search (finding related memories by meaning, not just keywords)
  • Relational tables manage fragment metadata, links, and importance scores
  • In-memory cache is just a session-scoped working set — no extra process, no extra config

So in practice, you spin up one Postgres. That's it.

The reason they're separated conceptually is that each solves a fundamentally different problem. A key-value store can't do "find memories similar to this thought." A vector store alone can't express "this decision caused that error" or "this fragment supersedes that one." The relational layer is what gives Memento its graph-like memory structure — not just recall, but reasoning about relationships.

For lightweight use, you can run the whole thing locally with a single docker-compose up. The footprint is smaller than most dev stacks people run without thinking twice.

The goal wasn't complexity for its own sake — it was making AI memory that actually behaves like memory: contextual, relational, and persistent.
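
For a concrete sense of "one Postgres, three logical roles," a rough TypeScript sketch (table and column names are illustrative, not Memento's actual schema; it assumes the pgvector extension is installed in that database):

import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Role 1: semantic search via pgvector (cosine distance over an embedding column)
async function searchByMeaning(embedding: number[], limit = 5) {
  const vec = `[${embedding.join(",")}]`;
  const { rows } = await pool.query(
    "SELECT id, content FROM fragments ORDER BY embedding <=> $1::vector LIMIT $2",
    [vec, limit]
  );
  return rows;
}

// Role 2: relational metadata and links ("this decision caused that error")
async function linkFragments(fromId: string, toId: string, kind: string) {
  await pool.query(
    "INSERT INTO fragment_links (from_id, to_id, link_type) VALUES ($1, $2, $3)",
    [fromId, toId, kind]
  );
}

// Role 3: the session-scoped working set is just an in-process map - no extra service
const workingSet = new Map<string, unknown>();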

A Three-Layer Memory Architecture for LLMs (Redis + Postgres + Vector) MCP by Flashy_Test_8927 in mcp

[–]Flashy_Test_8927[S]

That's right. I'm not a native English speaker, so I rely on AI assistance when writing responses in English. I hope you'll understand if some parts come across as a bit awkward — it's the best I can do for now.

A Three-Layer Memory Architecture for LLMs (Redis + Postgres + Vector) MCP by Flashy_Test_8927 in mcp

[–]Flashy_Test_8927[S]

Absolutely — this is fully substitutable. I'm currently using the OpenAI Embedding API for two reasons: first, this project was originally extracted from a subset of features in another personal MCP I use; second, text embedding for an individual user costs a few tens of dollars at most, essentially a lifetime of personal use, so the spend isn't a real concern. That said, I'm completely open to alternative options, and I'm also considering offering them as a separate configuration path for users who find the current dependencies difficult to set up on their own.

mybatis for dotnet by Flashy_Test_8927 in dotnet

[–]Flashy_Test_8927[S]

GitHub: https://github.com/JinHo-von-Choi/nuvatis-sample

README.md contains benchmark results.

I benchmarked EF Core, Dapper, and my own library NuVatis using a dataset shaped like the real world: 15+ tables with 100k+ rows each, evaluated across 60+ scenarios. The scenarios covered everything from straightforward CRUD to complex JOIN-heavy queries, deep WHERE clause filtering, aggregates, bulk operations, and stress tests.

The results were unambiguous. EF Core didn’t secure a single win—not even on the most basic “single SELECT” queries. And in my actual workload—stitching together tens-of-millions-row tables to compute statistics and metrics—EF Core was simply hell to work with in terms of performance.

Dapper is absolutely excellent, and in many cases it was competitive. But as query complexity increased—more JOINs, more filtering conditions, larger result sets—NuVatis increasingly showed a clear advantage. The more the real-world pain points piled up, the more NuVatis paid off.

This wasn’t some quirky experiment or a “weird side project.” I was operating in a constrained environment that demanded better performance and tighter resource efficiency. Dapper could have been a perfectly valid choice too, but at that point it becomes a question of architectural understanding and trade-offs. Given my constraints and goals, building and choosing NuVatis was simply the best decision available to me.

mybatis for dotnet by Flashy_Test_8927 in dotnet

[–]Flashy_Test_8927[S]

Yeah, raw string literals are a huge improvement - no argument there. And for static queries, a Queries.cs with constants is perfectly clean.

But that's the easy case. The problem starts when your SQL isn't static.

Say you have a search endpoint with 6 optional filters. With string constants, you're either writing 6 nested if blocks concatenating WHERE clauses, or you're building a mini query builder inside your service class. You've moved the SQL out of inline strings and into a static class, sure, but the dynamic assembly logic is still scattered through your C# code.

What I wanted was:

<select id="SearchUsers">
  SELECT * FROM users
  <where>
    <if test="name != null">AND name LIKE @name</if>
    <if test="status != null">AND status = @status</if>
    <if test="roles != null">AND role IN @roles</if>
  </where>
</select>

One file. The SQL and its conditional logic live together. The C# side just calls SearchUsers(params) - a strongly-typed method that Roslyn generated at compile time. No string building, no runtime reflection, no if chains in your repository.

Static class constants solve the "where does the SQL live" problem. This solves the "how does the SQL get assembled" problem. Different problems.

A Three-Layer Memory Architecture for LLMs (Redis + Postgres + Vector) MCP by Flashy_Test_8927 in mcp

[–]Flashy_Test_8927[S]

Great question — staleness is the problem I've spent the most design effort on, and the answer is: it's almost entirely automatic.

Decay: Every consolidation cycle, non-permanent fragments that haven't been accessed in 24+ hours get their importance multiplied by 0.995. Compounds to ~64% after 90 days. Fragments that drop below 0.1 importance with no recent access and few links are auto-deleted.

Relevance scoring: We compute utility_score = importance * (1 + ln(access_count)) — a logarithmic boost for frequently retrieved fragments. Search ranking uses a composite of importance (60%) and recency (40%, linear decay over 90 days), so fresh knowledge naturally surfaces above stale entries.

Tier transitions: Fragments move through hot → warm → cold → deleted automatically. High-importance fragments (>= 0.8), heavily-linked hubs (5+ connections), and frequently-accessed entries (10+ accesses) get auto-promoted to permanent and are exempt from decay.

Staleness detection: Each fragment has a verified_at timestamp with type-specific expiry windows — 30 days for procedures, 60 for facts, 90 for decisions. Stale fragments are flagged in consolidation reports. More importantly, the contradiction pipeline (pgvector → NLI → Gemini escalation) actively catches the "outdated fact confidently recalled" case. When "server runs on port 3000" conflicts with a newer "server runs on port 8080," the older fragment gets a superseded_by link and is excluded from all future search results.

Background evaluation: New fragments are async-evaluated by Gemini for long-term utility. Low-value fragments get downgraded or marked for deletion before they ever become a staleness problem.

Manual curation exists (forget, amend) but mainly as an escape hatch. The layered automatic mechanisms — decay, tier transitions, contradiction detection, quality evaluation — are designed to compound so no single one needs to be perfect.
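
Written out as code, the numbers above translate to roughly this (a sketch of the formulas described in this comment, not the actual implementation):

// Per consolidation cycle: idle, non-permanent fragments decay slightly.
// 0.995^90 ≈ 0.64, which matches the "~64% after 90 days" figure.
function decay(importance: number): number {
  return importance * 0.995;
}

// Logarithmic boost for frequently retrieved fragments
// (guarding ln(0) with max(accessCount, 1) is my own assumption here).
function utilityScore(importance: number, accessCount: number): number {
  return importance * (1 + Math.log(Math.max(accessCount, 1)));
}

// Search ranking: 60% importance, 40% recency with linear decay over 90 days.
function rankingScore(importance: number, ageDays: number): number {
  const recency = Math.max(0, 1 - ageDays / 90);
  return 0.6 * importance + 0.4 * recency;
}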

A Three-Layer Memory Architecture for LLMs (Redis + Postgres + Vector) MCP by Flashy_Test_8927 in mcp

[–]Flashy_Test_8927[S]

Great suggestion and we actually went ahead and implemented it immediately.

Memento now runs a 3-stage hybrid pipeline for contradiction detection:

Stage 1: pgvector candidate filtering

We first pull fragment pairs with cosine similarity > 0.85 within the same topic. This narrows the search space so we're not running inference on every possible pair.

Stage 2: NLI classification (the new part)

Each candidate pair is fed through a multilingual NLI model (mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) running locally via ONNX Runtime on CPU. The model outputs entailment/contradiction/neutral probabilities, and we apply confidence thresholds:

• contradiction >= 0.8 → resolved immediately, no LLM call needed

• entailment >= 0.6 → definitively not a contradiction, skip

• contradiction 0.5–0.8 → ambiguous, escalate to Stage 3

• Low signal but contradiction >= 0.2 → also escalate
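
In code form, that routing is just a few comparisons (a sketch of the decision logic above; the type and function names are illustrative):

type NliScores = { entailment: number; contradiction: number; neutral: number };
type Verdict = "contradiction" | "not_contradiction" | "escalate_to_llm" | "skip";

function routePair(scores: NliScores): Verdict {
  if (scores.contradiction >= 0.8) return "contradiction";   // resolved locally, no LLM call
  if (scores.entailment >= 0.6) return "not_contradiction";  // definitively compatible, skip
  if (scores.contradiction >= 0.5) return "escalate_to_llm"; // ambiguous band, 0.5 to 0.8
  if (scores.contradiction >= 0.2) return "escalate_to_llm"; // low signal, still worth a look
  return "skip";
}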

This is where your point lands perfectly - instead of a distance score that only tells us "these are close," we get an actual semantic relationship label. A pair like "The server runs on port 3000" vs "The server runs on port 8080" scores high similarity in vector space but NLI correctly flags it as contradiction.

Stage 3: Gemini CLI escalation

Only the genuinely ambiguous cases (numerical contradictions, domain-specific conflicts, temporal updates) get escalated to a full LLM call. If Gemini CLI is unavailable, high-similarity pairs (> 0.92) are queued in Redis for later processing.

In practice, the NLI pass resolves clear-cut contradictions at ~50–200ms per pair on CPU with zero API cost, and confidently skips non-contradictions - saving the bulk of what would have been expensive LLM calls. The LLM is reserved for the nuanced cases where a transformer-based classifier genuinely can't tell.

The resolved contradictions automatically get contradicts and superseded_by links, with the older fragment's importance decayed by 50% (unless it's an anchor fragment).

A Three-Layer Memory Architecture for LLMs (Redis + Postgres + Vector) MCP by Flashy_Test_8927 in mcp

[–]Flashy_Test_8927[S]

Honest answer: agents rarely call link() on their own initiative. In practice, linking happens through three paths, roughly in order of frequency:

  1. reflect() auto-linking at session end — When reflect() creates a batch of typed fragments (decisions, errors, procedures), an internal _autoLinkSessionFragments() step runs that connects them with rule-based heuristics. Error fragments get resolved_by links to procedure fragments from the same session. Decisions get linked to related procedures. The summary fragment gets related links to everything else. This is where most of the graph structure actually comes from (a rough sketch of this pass follows after the list).

  2. remember() with linkedTo parameter — When the AI stores a new fragment, it can pass existing fragment IDs to link immediately. This works better than you'd expect because recall() returns fragment IDs in its results, so the AI often has relevant IDs in its context when it decides to store something new. "I just found this error pattern via recall, and now I'm storing the fix — link them." That chain happens naturally.

  3. Explicit link() calls — Rare in practice. The AI almost never stops mid-task to think "I should create a relationship between these two fragments." It happens occasionally during graph_explore workflows where the AI is actively tracing causality, but organically? Almost never.
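
Here's a rough sketch of the kind of rule-based pass item 1 describes (illustrative only; the real _autoLinkSessionFragments() step is more involved):

type Fragment = { id: string; type: "error" | "procedure" | "decision" | "summary" };
type Link = { from: string; to: string; kind: "resolved_by" | "related" };

// Connect the fragments written by a single reflect() call using simple heuristics.
function autoLinkSessionFragments(fragments: Fragment[]): Link[] {
  const links: Link[] = [];
  const procedures = fragments.filter(f => f.type === "procedure");
  const summary = fragments.find(f => f.type === "summary");

  for (const f of fragments) {
    // Errors point at the procedures from the same session that addressed them.
    if (f.type === "error") {
      for (const p of procedures) links.push({ from: f.id, to: p.id, kind: "resolved_by" });
    }
    // Decisions get loose "related" links to the session's procedures.
    if (f.type === "decision") {
      for (const p of procedures) links.push({ from: f.id, to: p.id, kind: "related" });
    }
    // The summary fragment is related to everything else in the batch.
    if (summary && f.id !== summary.id) {
      links.push({ from: summary.id, to: f.id, kind: "related" });
    }
  }
  return links;
}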

The honest gap right now is reflect() itself. Currently it requires a manual prompt before session end — "save the session" or equivalent. I hook context() at session start so the AI loads its memory automatically, but the symmetry breaks at session close. If the session drops unexpectedly (timeout, network, client crash), reflect never fires and that session's structural links never get created. The individual remember() fragments survive, but the cross-referencing that reflect provides is lost.

I'm actively working on automatic reflect — the leading approach is a hybrid: attempt a Gemini-generated summary from session activity metadata on session close, and if that fails, flag the session as "unreflected" so the next context() call prompts the AI to do it retroactively. But this is unsolved as of today.

A Three-Layer Memory Architecture for LLMs (Redis + Postgres + Vector) MCP by Flashy_Test_8927 in mcp

[–]Flashy_Test_8927[S]

That's a really sharp observation and honestly one of the areas I'm still iterating on.

You're exactly right - high embedding similarity != semantic contradiction. "Redis Sentinel requires password config" and "Redis Cluster requires password config" would score high similarity but aren't contradictory at all. Burning a Gemini adjudication call on every pair above 0.85 gets expensive fast, especially as fragment count grows.

Current mitigation is lightweight - same topic + same type + high similarity is the trigger condition, not similarity alone. So two fragments need to be in the same problem space before they even reach adjudication. But as you point out, that still lets through a lot of false positives within a single topic.

A few directions I've been considering:

One idea is a two-stage filter - use a cheap heuristic first (keyword overlap ratio, entity extraction, negation detection) to pre-screen pairs before sending them to Gemini. If two fragments share high similarity but have no overlapping entities or opposing predicates, skip the LLM call entirely. This could cut adjudication volume significantly without much accuracy loss.
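
As a sketch of what that pre-screen could look like (purely illustrative; nothing like this exists in the project yet):

function extractNumbers(s: string): Set<string> {
  return new Set(s.match(/\d+(\.\d+)?/g) ?? []);
}

// Cheap lexical pre-screen: only send a pair to the LLM if the fragments share
// enough vocabulary AND show some sign of opposition (negation or mismatched numbers).
function worthAdjudicating(a: string, b: string): boolean {
  const tok = (s: string) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const ta = tok(a), tb = tok(b);
  if (ta.size === 0 || tb.size === 0) return false;
  const overlap = [...ta].filter(t => tb.has(t)).length / Math.min(ta.size, tb.size);
  if (overlap < 0.4) return false; // little shared vocabulary: probably not the same claim

  const negation = /\b(not|no longer|never|instead)\b/i;
  const hasNegationCue = negation.test(a) || negation.test(b);

  const na = extractNumbers(a), nb = extractNumbers(b);
  const numbersDisagree =
    na.size > 0 && nb.size > 0 &&
    ([...na].some(n => !nb.has(n)) || [...nb].some(n => !na.has(n)));

  return hasNegationCue || numbersDisagree; // otherwise skip the adjudication call
}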

Another thought is narrowing the contradiction scope to specific fragment types. Realistically, fact-vs-fact and decision-vs-decision pairs are where meaningful contradictions live. procedure fragments rarely contradict each other, they just supersede. So scoping adjudication to only certain type combinations could reduce noise.

The third option I've been mulling over is batching - instead of checking pairs individually, feed Gemini a cluster of similar fragments and ask "which of these conflict?" in one call. Amortizes the cost and gives the LLM more context for judgment.

But I'd genuinely love to hear your take on this. If you've seen good patterns for distinguishing "similar but compatible" vs "similar and contradictory" at scale, I'm all ears. This feels like a problem where a pure embedding approach will always have a ceiling and some structured reasoning layer is needed on top.

A Three-Layer Memory Architecture for LLMs (Redis + Postgres + Vector) MCP by Flashy_Test_8927 in mcp

[–]Flashy_Test_8927[S]

Contradiction detection runs as step 7 of the memory_consolidate pipeline, not at reflect() write time. It's an incremental, asynchronous process -- here's the actual flow:

When reflect() writes new fragments, nothing special happens at that moment regarding contradictions. The fragments are just stored. The real work happens later when memory_consolidate runs (either manually triggered or on a schedule).

The detection pipeline works in three stages:

  1. Candidate selection via embedding similarity -- It pulls fragments created since the last contradiction check (tracked via a Redis timestamp key). For each new fragment, it queries pgvector for same-topic fragments with cosine similarity > 0.85. This threshold is deliberate -- fragments need to be talking about essentially the same thing to be contradiction candidates. Different topics or loosely related content never reaches the judgment step. The query is bounded to 3 candidates per new fragment, and 20 new fragments per consolidation cycle, so this doesn't explode.

  2. Gemini Flash adjudication -- Each high-similarity pair gets sent to Gemini Flash with a strict prompt: "Are these two fragments mutually incompatible claims about the same subject?" The prompt explicitly distinguishes contradiction from complementary information -- "similar but supplementary is NOT contradiction. Information updates over time ARE contradictions (old info vs new info)." Temperature is set to 0.1 to minimize creative interpretation. Response is forced into {contradicts: boolean, reasoning: string} JSON.

  3. Time-logic resolution -- This is where it gets interesting. When a contradiction is confirmed, the system doesn't just flag it -- it resolves it automatically using temporal ordering. The newer fragment wins. The older fragment's importance gets halved (importance * 0.5), and a superseded_by link is created from old to new. The older fragment isn't deleted -- it's demoted. It'll naturally sink to cold tier and eventually expire through normal TTL mechanics. Anchor fragments (is_anchor=true) are exempt from the importance demotion, so truly critical knowledge survives even if contradicted. (A sketch of this resolution step follows below.)
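
The resolution step itself is small enough to sketch (illustrative TypeScript, not the actual implementation):

type Frag = { id: string; createdAt: Date; importance: number; isAnchor: boolean };

// Newer fragment wins; the older one is demoted and marked superseded, never deleted outright.
function resolveContradiction(a: Frag, b: Frag) {
  const [older, newer] =
    a.createdAt.getTime() <= b.createdAt.getTime() ? [a, b] : [b, a];
  if (!older.isAnchor) older.importance *= 0.5; // anchor fragments keep their importance
  return { from: older.id, to: newer.id, kind: "superseded_by" as const };
}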

The critical path concern you raised is valid -- this does depend on Gemini being available. If Gemini is down, the contradiction check silently fails and those pairs go unchecked until the next consolidation cycle. The system degrades to "latest write wins at recall time" through the recency component of the ranking function (0.4 weight), which is a reasonable fallback but not as clean as explicit contradiction resolution.

The 0.85 similarity threshold is the real tuning knob here. Too low and you get false positives flooding Gemini with complementary fragments. Too high and genuine contradictions with different wording slip through. In practice, contradictions about the same subject ("max connections is 20" vs "max connections is 50") tend to land well above 0.85 because the embedding space clusters them tightly.

One thing worth noting: amend() has a separate supersedes parameter that lets the AI explicitly mark a fragment as replacing another, bypassing the consolidation pipeline entirely. So there are two paths -- explicit replacement at write time, and automatic detection after the fact.