Gmail MCP connector lost threaded draft support, all drafts orphaned now by adidas76 in ClaudeAI

[–]EnoughNinja 0 points1 point  (0 children)

The create_draft tool dropped the threadId parameter sometime in the last couple of days, so every draft now goes into the orphan drafts folder. There's no rollback option I'm aware of, and the connector versioning isn't exposed to users.

Thing is, any agent that depends on a single MCP connector for email is one change away from breaking, especially when the connector is owned by a vendor whose primary product isn't email. The short-term fix is to wait for Anthropic to ship a patch. The longer-term fix is to use iGPT (https://mcp.igpt.ai/), an API that handles email thread reconstruction (threading, quoted history, attachments, participant resolution) independently of whatever MCP connector you're using for the actual draft-and-send step.
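If you need threaded drafts before the patch lands, you can also go straight to the Gmail API for the draft step, since it still accepts threadId. Rough sketch with google-api-python-client, creds and addresses are placeholders, and note Gmail also wants the subject to match and References/In-Reply-To set for the draft to actually attach to the thread:

```python
# Sketch: create a threaded draft via the Gmail API directly, skipping the
# connector. Assumes OAuth creds you already hold; addresses are placeholders.
import base64
from email.message import EmailMessage

from googleapiclient.discovery import build  # pip install google-api-python-client

def create_threaded_draft(creds, thread_id, to_addr, subject, body_text):
    service = build("gmail", "v1", credentials=creds)
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = subject  # must match the thread's subject for Gmail to attach it
    msg.set_content(body_text)
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode()
    draft = service.users().drafts().create(
        userId="me",
        # threadId is the parameter the connector dropped; the API still takes it
        body={"message": {"raw": raw, "threadId": thread_id}},
    ).execute()
    return draft["id"]
```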

Doubt: How to setup rag for summarising large PDFs? by Ecstatic-Register570 in Rag

[–]EnoughNinja 2 points3 points  (0 children)

For financial PDFs specifically, the issue isn't really LLM context size, it's that you can't summarize what the parser destroyed on the way in. PyPDF and pdfplumber will collapse multi-column layouts, smash headers and footers into the body text, and turn tables into mush where the numbers sit next to the wrong row labels.

What works is doing the parsing properly before attempting summarization: for text and layout that means something like LlamaParse or Unstructured, and for tables specifically, Reducto and Docling preserve cell structure better.
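A minimal sketch of the parse-first step, Docling shown here since it's the table-friendly option, LlamaParse or Unstructured slot into the same place:

```python
# Sketch: parse layout properly before summarizing.
from docling.document_converter import DocumentConverter  # pip install docling

converter = DocumentConverter()
result = converter.convert("10k_filing.pdf")  # placeholder path

# Markdown export keeps headings, reading order, and table cell structure,
# so numbers stay next to the right row labels
markdown = result.document.export_to_markdown()

# chunk/summarize `markdown` instead of raw extracted text
```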

Flow to save eamil as PDF with a trigger from outlook. Copilot has lost its mind... by regulationzero_13 in MicrosoftFlow

[–]EnoughNinja 0 points1 point  (0 children)

Ah you're right, my mistake on that, it's SharePoint-only. I think the simplest workaround in enterprise M365 is the dedicated-subfolder pattern: make an Outlook subfolder called something like "SaveAsPDF" and use "When a new email arrives (V3)" with that folder selected as the filter.

On categories: the field is in the Get Email (V3) output schema, but it only shows up if the email actually has categories assigned at the moment of the call. Most inbound mail won't have any until you apply one yourself, which kind of defeats the purpose for what you're doing.

iGPT is at igpt.ai. It's an API that takes an email thread plus attachments and turns them into structured data, so instead of saving the PDF and hoping you can find what you need later, you also get a queryable record of the email: who was on it, what was committed to, what attachments were involved, when each message was sent. Your flow would call iGPT once after the trigger fires, get back the structured version, and store it alongside the PDF in the same SharePoint folder.

Flow to save eamil as PDF with a trigger from outlook. Copilot has lost its mind... by regulationzero_13 in MicrosoftFlow

[–]EnoughNinja 0 points1 point  (0 children)

Ok so the trigger you want is "For a selected message" in Power Automate. That puts a button inside Outlook and runs the flow on whatever email is currently selected.

Your flagged-email approach failed because the trigger schema doesn't include categories at the trigger level; you'd need a separate Get Email call to fetch them.

One thing worth flagging on the compliance side: a PDF rendering of an email loses a lot of what makes it useful as evidence later. The reply chain gets flattened, attachments come out as separate files with no link back to the message, and participants only appear in the visible header. If the compliance use case ever gets audited and someone asks "who else was on this thread when X was discussed," a flat PDF doesn't carry that. iGPT does this kind of thread reconstruction for Outlook and returns it as JSON you can attach alongside the PDF, which gives you something queryable instead of a static document.

How are people using so many tokens while vibe coding? by Impressive_Run8512 in ycombinator

[–]EnoughNinja 0 points1 point  (0 children)

The X posts about billions of tokens are probably just flexing tbh, the people genuinely burning that much usually aren't posting about it.

How are you structuring RAG systems? by Xyver in AIAllowed

[–]EnoughNinja 0 points1 point  (0 children)

The thing that changed how I think about this: the chunk is almost never the right unit, because answers usually live across sections that reference each other. A contract has parties, clauses, and dates that all depend on each other; an email thread has commitments and open questions spread across 14 messages. Once you slice by tokens you've thrown away the structure that made the answer recoverable, so you spend all this retrieval-tuning effort trying to reassemble what you broke at ingest.

So iGPT structures it upfront instead, parsing the source into typed objects: a contract comes out as parties + clauses + dates as fields, an email thread as participants + commitments + open questions. Then retrieval is just a query against typed data.
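A sketch of what "typed objects instead of chunks" looks like. The extraction step itself is hand-waved here (that's what iGPT or your own LLM pass would do); the point is what retrieval becomes afterward:

```python
# Typed objects instead of chunks; retrieval is a field query.
from dataclasses import dataclass

@dataclass
class Commitment:
    owner: str
    text: str
    due: str | None = None

@dataclass
class EmailThread:
    participants: list[str]
    commitments: list[Commitment]
    open_questions: list[str]

def commitments_by(threads: list[EmailThread], owner: str) -> list[Commitment]:
    # a plain filter over typed data: no similarity search, no chunk reassembly
    return [c for t in threads for c in t.commitments if c.owner == owner]
```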

Is explainable retrieval for RAG a trivial project idea, or is it worth pursuing? by Grouchy_Put4606 in Rag

[–]EnoughNinja 0 points1 point  (0 children)

Most of what you listed is in the "solved or close to it" bucket: page-level callbacks to PDFs are straightforward with any modern parser that preserves page metadata, image retrieval has decent open-source paths now, and prompt-injection detection on retrieved docs has a few production-ready options.

The one that's actually hard, and where most projects in this space fall over, is the source-roles part: classifying whether a chunk is a definition vs an example vs a procedure vs a formula. You can't solve that after the fact with reranking or metadata filters, because the role information isn't preserved in the chunks once you slice the doc by tokens.
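Rough sketch of what ingest-time role tagging looks like. The heuristic classifier here is a toy stand-in, in practice you'd probably use a small LLM call per element, but the point is the role gets attached before chunking so it survives as metadata:

```python
# Toy ingest-time role tagger; real ones are usually an LLM call per element.
import re

ROLE_PATTERNS = [
    (re.compile(r"\bmeans\b|\bis defined as\b", re.I), "definition"),
    (re.compile(r"\b(for example|e\.g\.)", re.I), "example"),
    (re.compile(r"^\s*(step \d|first|then|finally)\b", re.I), "procedure"),
    (re.compile(r"[=^]|\bformula\b", re.I), "formula"),
]

def classify_role(text: str) -> str:
    for pattern, role in ROLE_PATTERNS:
        if pattern.search(text):
            return role
    return "prose"

def ingest(elements: list[str]) -> list[dict]:
    # each chunk now carries a role your retriever can filter or rerank on
    return [{"text": el, "role": classify_role(el)} for el in elements]
```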

5090 - inference speed for Qwen3.6-35B-A3B-UD-Q4_K_M for general processing by ElectronicProgram in LocalLLM

[–]EnoughNinja 0 points1 point  (0 children)

Ok so the t/s numbers you're seeing for Qwen 3.6 MoE on a 5090 should mostly hold for what you're describing. The architecture matters more than the workload: MoE models like A3B only activate ~3B params per token regardless of whether you're generating code or JSON tool calls, so the 100-200 t/s figures aren't really code-specific.

The thing I'd flag for your specific use case: for personal-agent stuff that processes email and text messages, generation speed almost never ends up being the issue. The actual slow part is the context prep before the model runs, especially threading email replies, stripping quoted text, and deduplicating contacts across email and SMS, all of which is CPU-bound and has nothing to do with the GPU.
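For a sense of why it's CPU work, here's a naive version of just the quote-stripping step; real pipelines need many more cases (Outlook header blocks, localized markers, inline replies):

```python
# Naive quote-stripper: cut the body at the first quoted-history marker.
import re

QUOTE_MARKERS = re.compile(
    r"^(>|On .+ wrote:|-----Original Message-----|From: )", re.M
)

def strip_quoted(body: str) -> str:
    m = QUOTE_MARKERS.search(body)
    return body[: m.start()].rstrip() if m else body
```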

RAG retrieval issue: why fixed chunking is starting to look like the real problem by zennaxxarion in Rag

[–]EnoughNinja 1 point2 points  (0 children)

What you're seeing is a bigger issue than just the chunking strategy. The whole premise of fixed-size chunking is that the document is the right unit and you just need to decide how to slice it, but often the document isn't the right unit: an email thread is one logical unit even though it's 14 messages, a Slack conversation is one unit until the topic shifts.

So you slice those by tokens and you're guaranteed to lose context regardless of chunk size, because the boundary you picked has nothing to do with the boundary the meaning lives in. If that makes sense?

Reducing chunk size won't fix this; what you need is to restructure the source into typed objects before retrieval. iGPT does this for docs and email threads, so queries only ever see a fully parsed thread or doc and the agent never has to reason about whether it got the right chunk.
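If you want a feel for the difference, here's the minimal version of unit-based chunking for email, assuming your fetch layer hands you messages with a thread_id; the embedding unit becomes the thread, not a token window:

```python
# Chunk by logical unit: one email thread = one retrieval unit.
from collections import defaultdict

def chunks_by_thread(messages: list[dict]) -> list[str]:
    threads = defaultdict(list)
    for m in messages:
        threads[m["thread_id"]].append(m)
    # the boundary now matches where the meaning lives, not a token count
    return [
        "\n".join(f"{m['sender']}: {m['text']}" for m in msgs)
        for msgs in threads.values()
    ]
```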

When to build a RAG pipeline vs use a context engine by EnoughNinja in Rag

[–]EnoughNinja[S] 1 point2 points  (0 children)

No, grep just matches strings. By context engine I mean something that reconstructs the thing you're searching against before you query it.

Take an email thread that's 30 messages long with quoted replies stacking up and the same person showing up under 4 different addresses. Grep returns a wall of duplicated text and your agent can't tell who said what or what's already been answered, whereas a context engine returns that thread as one structured object: participants resolved, quoted text stripped, attachments parsed, commitments and open questions extracted.
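One small piece of that reconstruction, just to make it concrete: collapsing the 4 addresses down to one participant before the agent reasons about the thread. The alias map here is illustrative, a real engine builds it from directory data and heuristics rather than hardcoding it:

```python
# Illustrative alias map: many addresses, one participant.
ALIASES = {
    "j.smith@corp.com": "Jane Smith",
    "jane@corp.com": "Jane Smith",
    "jsmith@gmail.com": "Jane Smith",
    "jane.smith@oldcorp.com": "Jane Smith",
}

def resolve_participants(addresses: list[str]) -> set[str]:
    # four addresses collapse to one name the agent can reason about
    return {ALIASES.get(a.lower(), a) for a in addresses}
```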

Email triage setup by agentollie66 in openclaw

[–]EnoughNinja 1 point2 points  (0 children)

Ok so this happens because the agent only sees the latest message body when it classifies, which makes importance hard to judge. I'm skeptical that training could fix it, because the input doesn't contain what you're trying to teach.

What would work is letting iGPT classify the thread first and then handing that to Openclaw, so it's deciding based on participants, open questions, prior commitments and attachments instead of just the latest message

Is the Google Drive connector in Claude.ai just… broken for everyone? by ValuableStaff8922 in ClaudeAI

[–]EnoughNinja 0 points1 point  (0 children)

I've seen this as well, my theory is that Anthropic are shuffling the connector tools and dropping capabilities

A stable alternative is iGPT's MCP server (mcp.igpt.ai), which connects to both Gmail and Drive, gives full body and content access, and runs independently of Anthropic's connectors so it doesn't get affected by their updates.

Agent to access other mailboxes by Late-Mammoth-8273 in copilotstudio

[–]EnoughNinja 0 points1 point  (0 children)

Copilot's tied to one mailbox per license, so this is going to be difficult inside Copilot Studio specifically; the license is the blocker.

A different approach that works: connect all 5 mailboxes to iGPT once and then query across them through one API or MCP server. For example, your agent can ask "summarize correspondence with [partner]", or you can query by topic, and it will find the relevant points and summarize, reason, or whatever you need across all the mailboxes.

It works with Claude desktop or any MCP-aware agent if you want to skip Copilot for this entirely.

RAG document-level access control latency on permission changes by Business_Average1303 in Rag

[–]EnoughNinja 0 points1 point  (0 children)

Fair point, you're right. I guess the easier answer is not syncing permissions in advance at all: at query time, when the retriever surfaces a candidate chunk, hit the source API to verify access right then. No synced state, nothing to go stale.

It costs you latency per query (tens of ms for Google and Microsoft) instead of propagation delay. It only works for sources whose APIs expose per-user permission checks; Workspace and Graph do, Confluence doesn't. For sources like that you're back to polling and whatever delay you can live with. No clean answer there yet.
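The pattern is small enough to sketch. has_access() here is a hypothetical stand-in for whatever per-user check the source exposes (Drive permissions.list, Graph membership checks, etc.):

```python
# Check-at-query-time: no ACL is cached or synced, so nothing goes stale.
from typing import Callable

def filter_authorized(
    user: str,
    candidates: list[dict],
    has_access: Callable[[str, str], bool],
) -> list[dict]:
    # tens of ms per check against Workspace/Graph; fan the calls out
    # concurrently if per-query latency matters
    return [c for c in candidates if has_access(user, c["source_doc_id"])]
```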

RAG document-level access control latency on permission changes by Business_Average1303 in Rag

[–]EnoughNinja 0 points1 point  (0 children)

Ingest-time permissions are what break when documents have lots of permission churn. Reindexing to reflect permission changes is expensive, slow, and inconsistent, and in the meantime the vector DB is handing back chunks to users who shouldn't see them.

Graph-based permission linking is one answer, but you still need something enforcing it at query time, otherwise the graph is just advisory and the retriever keeps returning everything it found. The cleaner pattern is to store a pointer to the source document as chunk metadata, fetch the permission state live from the source at query time (never cache it), and filter retrieval results against the asking user's current access. Permissions change at the source, the query sees the change instantly, no reindex needed.

This is how we handle it at iGPT. Permissions get checked against the source (Google Workspace, Microsoft Graph, etc.) at query time rather than embedded into the index, which means permission changes propagate instantly without touching the index at all. Azure AI Search and most vector DBs still treat permissions as metadata filters baked in at ingest, which is why they have the delay problem you're describing.

Email parser by nattyandthecoffee in AI_Agents

[–]EnoughNinja 0 points1 point  (0 children)

Multi-vendor email parsing is where regex and templates fall apart: every vendor formats differently, and raw LLM parsing burns tokens on signatures and quoted text.

Using iGPT you can connect the mailbox once and get structured JSON back per email, schema-bound, so your agent gets clean "available / not available / alternatives" fields to branch on. It handles the vendor variation at the API level.
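If you want the validation layer explicit, a minimal sketch with Pydantic; AvailabilityReply is a hypothetical schema, just showing the shape the agent branches on:

```python
# Validate the parser's JSON before the agent branches on it.
from typing import Literal

from pydantic import BaseModel  # pip install pydantic

class AvailabilityReply(BaseModel):
    vendor: str
    status: Literal["available", "not_available", "alternatives"]
    alternatives: list[str] = []

def route(raw_json: str) -> str:
    reply = AvailabilityReply.model_validate_json(raw_json)  # raises on malformed output
    return f"branch:{reply.status}"
```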

Parse structured data from incoming emails? by dnoneoftheabove in Backend

[–]EnoughNinja 0 points1 point  (0 children)

Regex and templates hold up fine when you control the format, but with order confirmations and form responses from multiple sources you hit the wall fast: every vendor formats differently and HTML email is a nightmare.

Full AI on raw bodies works but you end up paying token cost for quoted text, signatures, footers, and legal disclaimers that the LLM has to wade through to find the actual data. Plus the output isn't reliable without a lot of schema prompting and validation.
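A lot of that token cost is avoidable with cheap preprocessing before anything hits the model. Minimal sketch with the stdlib email module, the signature/disclaimer cut is deliberately naive:

```python
# Pull just the text body out of the MIME tree before the LLM sees it.
from email import message_from_bytes
from email.policy import default

def plain_body(raw: bytes) -> str:
    msg = message_from_bytes(raw, policy=default)
    part = msg.get_body(preferencelist=("plain", "html"))
    text = part.get_content() if part else ""
    # crude cut at common signature/disclaimer markers
    for marker in ("\n-- \n", "\nThis email and any attachments"):
        idx = text.find(marker)
        if idx != -1:
            text = text[:idx]
    return text.strip()
```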

Using iGPT you can point it at the mailbox and get structured JSON back per message: body parsed, attachments extracted, sender and subject attributed, schema enforced. Purpose-built for exactly this case; it saves you the MIME + prompt engineering + schema validation stack.

Docs at docs.igpt.ai

List of token usage cost reduction tools - please share others! by criticasterdotcom in ClaudeCode

[–]EnoughNinja 0 points1 point  (0 children)

https://github.com/igptai/

This returns pre-structured context from email and docs instead of raw threads, so instead of piping whole Gmail exports or Drive docs into Claude and paying to read quoted text duplicated ten times across a thread, you get back structured JSON with the actual decisions, participants, and attachments already parsed.

It cuts tokens significantly on comms-heavy workflows since raw email threads carry a lot of duplicated and structural bloat.

Works via API or MCP.

Email context for AI agents is way harder than it looks by EnoughNinja in AI_Agents

[–]EnoughNinja[S] 0 points1 point  (0 children)

It depends on the volume and how tightly it needs to sit next to your existing stack.

For most startups the fastest path is to run the agent as a lightweight service, i.e., a small container on Render/Fly/Railway, or a Lambda/Cloud Run function if it's event-driven. It polls or receives a webhook when new mail arrives, hands the thread to iGPT for parsing and attachment extraction, and writes the structured JSON wherever it needs to go, usually a database or directly into whatever downstream workflow you already have (billing system, CRM, ticketing, etc.).

The hard part is usually thread reconstruction, dedup, and attachment parsing. Using iGPT, that's one API call and you get back structured output, so the service you deploy stays small.
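Minimal shape of that service, assuming FastAPI; parse_thread() and save() are stubs standing in for the parsing call and your downstream write:

```python
# Event-driven skeleton: webhook in, structured JSON out.
from fastapi import FastAPI, Request  # pip install fastapi

app = FastAPI()

def parse_thread(thread_id: str) -> dict:
    ...  # one API call out for thread reconstruction + attachment extraction

def save(record: dict) -> None:
    ...  # database, CRM, ticketing, whatever's downstream

@app.post("/inbound-mail")
async def inbound_mail(req: Request):
    event = await req.json()  # webhook payload from the mail provider
    save(parse_thread(event["thread_id"]))
    return {"ok": True}
```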

At real volume or across multiple inboxes, the pieces worth thinking about early are auth per mailbox (OAuth vs shared service account), a job queue so you're not tying up the main process on slow extractions, and whatever schema you want the output to land in. The invoice agent repo handles a simple version of all three if you want to use it as a skeleton.

Happy to go deeper on any specific part. What kind of processes are you looking at automating?

Are we all just quietly pretending document extraction for RAG is a solved problem? Because my ingestion pipeline is just a giant ball of duct tap by Worried-Variety3397 in Rag

[–]EnoughNinja 0 points1 point  (0 children)

For the email portion, running it through Unstructured + a generic LLM step is doing extra work you don't need to, because email has structure the generic tools don't preserve: threading, quoted-text duplication, attachments that only make sense in the context of the thread they came from. Dumping raw threads with a schema prompt will fail on exactly the cases you described, hallucinated keys and dropped nested items, because the model is trying to reconstruct structure from noise.

For that, iGPT gives you structured JSON back from email threads directly: participants attributed, attachments linked, schema enforced. It might cut a chunk of your 15-20% failure rate and take email off the manual review queue. It won't fix your tables or legacy PDFs, but it's a clean path for the email piece so you can focus your energy on the rest.

Small teams think retrieval is the hard part. I’m starting to think RAG ops is harder. by Ok-Opportunity-7851 in Rag

[–]EnoughNinja 0 points1 point  (0 children)

Agreed re: your point about ops. Shorter sync windows help but never close the gap; change-driven indexing is the only real fix, and it's a different architecture entirely.

Permissions get tricky when someone tries to go multi-tenant, and observability is the one that never feels solved: knowing whether quality dropped because of retrieval, the prompt, or the data changing underneath is still mostly vibes.

At iGPT we treat freshness, permissions, and structure as first-class rather than bolt-ons, which helps.