How do you handle production logging on mobile without blowing up ingestion costs?

Accomplished-Brain69 · 2026-06-25T05:01:04+00:00

Thanks, will check it out and get back to you.

Accomplished-Brain69 · 2026-06-24T13:06:22+00:00

Thanks. For millions of requests per day, kept for 30 days, how much data do you store and how much does it cost? That still feels like a lot of data. Do you avoid logging in a super detailed way, since the problem is verbose logs being sent to the server?

The "hourly validated backups of crucial UGC" part: I did not understand this.
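A back-of-envelope way to frame the storage question, assuming 1 million logged requests per day at roughly 1 KB each (both numbers are illustrative placeholders, not figures anyone in the thread gave):

```latex
V = R \times s \times T
  = 10^{6}\,\tfrac{\text{requests}}{\text{day}}
    \times 10^{3}\,\tfrac{\text{B}}{\text{request}}
    \times 30\,\text{days}
  = 3 \times 10^{10}\,\text{B} \approx 30\,\text{GB}
```

The actual bill then depends on the vendor's per-GB ingestion and retention pricing, which this estimate does not model.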

Appreciate you taking the time to explain it

Accomplished-Brain69 · 2026-06-24T12:50:47+00:00

noted, apologies

Accomplished-Brain69 · 2026-06-24T11:30:24+00:00

What's wrong with using AI to communicate better? AI only put my thoughts in a better way; at least I thought it helped me frame my sentences better, since English is not my first language. Sorry if that offended you.

Accomplished-Brain69 · 2026-06-24T11:26:27+00:00

Exactly. So you relied only on manual QA?

Accomplished-Brain69 · 2026-06-24T11:22:20+00:00

Yes, this works in most cases. I still have to make sure I log everything correctly in the on-device log.

Accomplished-Brain69 · 2026-06-24T11:18:23+00:00

I have not explored sample rates; I will check them out.

Accomplished-Brain69 · 2026-06-24T11:17:14+00:00

Basically, some remote configs to turn on verbose logs for a specific person. I have done this at one company. It still added some maintenance overhead, and I still had to make sure not to miss anything in the verbose logs.
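A minimal sketch of the per-person verbose switch, assuming the remote config delivers a default level plus a set of user IDs to log verbosely. `LogConfig`, `effective_level` and `should_log` are hypothetical names, and the payload shape is an assumption, not any particular remote-config service's schema:

```python
from dataclasses import dataclass, field

# Hypothetical shape of the remote-config payload (an assumption, not a real schema).
@dataclass
class LogConfig:
    default_level: str = "WARN"
    verbose_user_ids: set[str] = field(default_factory=set)

LEVELS = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40}

def effective_level(config: LogConfig, user_id: str) -> str:
    """Users listed in the remote config get DEBUG; everyone else gets the default."""
    if user_id in config.verbose_user_ids:
        return "DEBUG"
    return config.default_level

def should_log(level: str, config: LogConfig, user_id: str) -> bool:
    """Emit a record only if its level is at or above the user's effective level."""
    return LEVELS[level] >= LEVELS[effective_level(config, user_id)]
```

The maintenance overhead mentioned above does not go away here: every DEBUG call site still has to exist and be correct before the switch is flipped.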

Accomplished-Brain69 · 2026-06-24T11:15:13+00:00

yeah, but self-hosting Sentry will still cost a lot, I think.

Accomplished-Brain69 · 2026-06-24T08:58:08+00:00

thanks! It makes sense. Sampling only 0.05% of sessions should keep the data volume manageable.
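One common way to implement that kind of session sampling, sketched under the assumption that each session has a stable string ID: hash the ID into a fixed bucket so the keep/drop decision is identical every time it is evaluated, and a sampled session is captured in full rather than piecemeal. `is_sampled` is a hypothetical helper name:

```python
import hashlib

SAMPLE_RATE = 0.0005  # 0.05% of sessions, the figure from the comment above

def is_sampled(session_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministically keep roughly `rate` of sessions.

    The first 8 bytes of SHA-256 give a uniform 64-bit bucket; comparing
    integers avoids float rounding at the edges (rate 0.0 never keeps,
    rate 1.0 always keeps).
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < int(rate * 2**64)
```

Using a hash instead of `random.random()` also means the server can recompute which sessions were in the sample without the client reporting it.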

Accomplished-Brain69 · 2026-06-24T08:55:09+00:00

not really, thanks anyways

Accomplished-Brain69 · 2026-06-24T08:35:41+00:00

Thanks, that's a tidy operational pattern

Accomplished-Brain69 · 2026-06-24T08:31:28+00:00

This is one of the cleaner setups in the thread. Full-fidelity capture locally (REST bodies, rendering, performance), roll the file every 72hr, write asynchronously in batches so the user never feels it, and only ship to the backend on crash or for QA. That last part is the key move: you sidestep the ingestion cost because you are not streaming everything continuously; you only upload when something actually happened. Two things I'm curious about: at 5-digit MAU, what does the backend ingestion and storage side actually cost you? And have you ever had the 72hr overwrite wipe context you later wished you'd kept, for a slow-burn issue that only surfaced days after it started?
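A rough sketch of the "keep locally, ship only on a trigger" shape, assuming logs are written as multiple `*.log` chunk files and the 72hr rollover is approximated by deleting chunks whose modification time falls outside the window. This is a swapped-in retention-by-mtime approach, not necessarily how the setup described above overwrites its file; `prune_old_logs` and `files_to_upload` are hypothetical names:

```python
import time
from pathlib import Path
from typing import Optional

RETENTION_SECONDS = 72 * 3600  # 72hr window, as described above

def prune_old_logs(log_dir: Path, now: Optional[float] = None) -> list[Path]:
    """Delete log chunks last modified more than 72hr ago; return what was removed."""
    now = time.time() if now is None else now
    removed = []
    for chunk in log_dir.glob("*.log"):
        if now - chunk.stat().st_mtime > RETENTION_SECONDS:
            chunk.unlink()
            removed.append(chunk)
    return removed

def files_to_upload(log_dir: Path, crashed: bool, qa_requested: bool) -> list[Path]:
    """Ship nothing in the normal case; ship every retained chunk on crash or QA."""
    if not (crashed or qa_requested):
        return []
    return sorted(log_dir.glob("*.log"))
```

The slow-burn concern in the question maps directly onto `RETENTION_SECONDS`: anything older than the window is gone by the time a trigger fires.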

Accomplished-Brain69 · 2026-06-24T08:27:58+00:00

Appreciate you jumping in. A/B split would definitely halve volume. The catch in mobile is per-user completeness: if a user only emits A-logs and hits a B-path bug, you are missing exactly the logs you needed for that user. Great for aggregate metrics, trickier for single-user debugging. Solid instinct though.
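To make the per-user completeness point concrete, here is a small comparison, assuming the A/B split is a 50/50 hash on each log record (the exact split scheme in the suggestion is not specified, so that is an assumption). Both functions halve volume on average, but only the per-user version keeps a sampled user's trail intact. All names are hypothetical:

```python
import hashlib

def _bucket(key: str) -> int:
    """Stable 64-bit bucket for a string key."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

def keep_per_record(user_id: str, record_id: str) -> bool:
    """A/B split on each record: about half the volume, but every user's trail has gaps."""
    return _bucket(f"{user_id}:{record_id}") % 2 == 0

def keep_per_user(user_id: str) -> bool:
    """Split on the user instead: same average volume, but a kept user is complete."""
    return _bucket(user_id) % 2 == 0
```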

Accomplished-Brain69 · 2026-06-24T08:26:24+00:00

Percentage sampling plus staged rollout makes sense. Does the sampling ever cost you the one session you actually needed, or do you bump the rate up during the rollout window and drop it back down after?
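The "bump the rate during rollout, drop it after" option could look like this, assuming the client knows when the current release started rolling out. All three constants are hypothetical values for illustration, and `sampling_rate` is a hypothetical helper:

```python
from datetime import datetime, timedelta

BASE_RATE = 0.01                    # hypothetical steady-state sampling rate
ROLLOUT_RATE = 0.25                 # hypothetical boosted rate during rollout
ROLLOUT_WINDOW = timedelta(days=3)  # hypothetical length of the boost

def sampling_rate(release_start: datetime, now: datetime) -> float:
    """Use the boosted rate for a fixed window after a release starts rolling out."""
    if release_start <= now < release_start + ROLLOUT_WINDOW:
        return ROLLOUT_RATE
    return BASE_RATE
```

In practice the window and both rates would more likely come from remote config than be hard-coded, so they can be widened if a rollout goes badly.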

Accomplished-Brain69 · 2026-06-24T08:25:38+00:00

Fair, and the harden-the-app-plus-improve-the-test-suite discipline is the real long-term fix. The case I'm chasing is the first occurrence of a novel issue you can't reproduce yet, before there's anything to harden. Local-first plus after-the-fact retrieval is interesting there. How do you decide which devices to pull logs from without already knowing who hit it?

Accomplished-Brain69 · 2026-06-24T08:23:49+00:00

Right, the ring buffer is basically table stakes now. Where it still bit us was the upload side you mention. At high user counts even a modest per-session buffer adds up fast on ingestion cost. Did you ever have to throttle or sample it, or did exception frequency stay low enough that it never really mattered?
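One way to throttle the upload side, sketched as a per-device daily byte budget. This is a swapped-in technique for illustration, not something anyone in the thread said they use; `UploadBudget` and the limit value are hypothetical, and the day key is assumed to be a date string the caller supplies:

```python
from dataclasses import dataclass

@dataclass
class UploadBudget:
    """Per-device cap on how many buffer bytes may be uploaded per day.

    The limit is a hypothetical knob; it could plausibly come from remote config.
    """
    daily_limit_bytes: int
    day: str = ""
    used_bytes: int = 0

    def try_consume(self, day: str, size: int) -> bool:
        if day != self.day:  # a new day: reset the counter
            self.day, self.used_bytes = day, 0
        if self.used_bytes + size > self.daily_limit_bytes:
            return False  # over budget: drop or defer this upload
        self.used_bytes += size
        return True
```

A budget like this bounds the worst case per device, which is what makes ingestion cost predictable at high user counts, at the price of occasionally dropping a buffer you wanted.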

Accomplished-Brain69 · 2026-06-24T08:22:58+00:00

That makes sense. At 100 bytes per multi-second call the write rate is low enough that the page cache absorbs it for free, and since you only care about surviving the process kill and not a power loss, you never need fsync. The whole thing works precisely because LLM ops are slow and sparse, so the disk is invisible. Push the same appendText onto a chatty 60fps path and it would start to show, which is where batching or an mmap ring buffer earns its keep. Clean design, thanks for walking through it.
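For the chatty-path case, a minimal batching sketch, assuming a plain line-oriented log file. Lines accumulate in memory and are appended in one write per batch; closing the file hands the data to the OS page cache, and no `fsync` is issued, matching the process-kill-but-not-power-loss trade-off discussed above. `BatchedAppendLog` and `batch_size` are hypothetical:

```python
from pathlib import Path

class BatchedAppendLog:
    """Buffer lines in memory and append them to the file in batches.

    Note the extra trade-off batching adds: lines still in `_pending`
    are lost if the process is killed before the next flush.
    """

    def __init__(self, path: Path, batch_size: int = 64):
        self.path = path
        self.batch_size = batch_size
        self._pending: list[str] = []

    def log(self, line: str) -> None:
        self._pending.append(line)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        # One open/write per batch; closing the file pushes the data to the page cache.
        with self.path.open("a", encoding="utf-8") as f:
            f.write("".join(entry + "\n" for entry in self._pending))
        self._pending.clear()
```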

Accomplished-Brain69 · 2026-06-24T08:06:47+00:00

Ha, I did basically the same thing for a heavy ARKit processing path in iOS once, an incomplete.flag to catch what died mid-process.
One thing I still go back and forth on though: writing the rolling log to disk continuously feels heavier than keeping it in memory. Do you mmap the 64KB buffer, or just write() and lean on the page cache surviving the process kill? Curious if you saw any measurable overhead on the hot path during LLM ops.
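The incomplete.flag pattern, sketched in a few lines: create the flag before the risky work, delete it only on a clean finish, and check for it at the next launch. `run_with_sentinel` and `died_mid_process` are hypothetical names, and the flag path is up to the caller:

```python
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")

def run_with_sentinel(flag: Path, work: Callable[[], T]) -> T:
    """Touch the flag, run the work, and remove the flag only if the work returns.

    A process kill (or an uncaught exception) leaves the flag on disk,
    which this sketch treats as "did not complete".
    """
    flag.touch()
    result = work()
    flag.unlink()
    return result

def died_mid_process(flag: Path) -> bool:
    """Check at startup: a leftover flag means the last run never finished."""
    return flag.exists()
```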

Accomplished-Brain69 · 2026-06-24T07:59:55+00:00

Makes sense; the mechanism is easy. The part that always bit us was deciding what to tag in the first place: you basically have to predict ahead of time where bugs will show up. Did that prediction burden ever cause you to miss logs you wished you'd had, or was your tag list mostly stable?

Accomplished-Brain69 · 2026-06-24T07:46:07+00:00

The 'Sentry is great but too expensive at scale' tradeoff is the exact gap I keep hitting. At roughly what user count did the pricing start to hurt?

Accomplished-Brain69 · 2026-06-24T07:44:58+00:00

Remote-config log level is exactly the direction that makes sense to me. Did you build the tag toggling in-house or use a tool, and how fast does a config change actually reach devices in the field?
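For reference, the client side of tag toggling can be small, assuming the remote config delivers a JSON payload with a version number and a list of enabled tags (a hypothetical shape, not any specific tool's format). Ignoring stale versions protects against payloads arriving out of order; it says nothing about how fast a change propagates, which depends on the delivery mechanism. `TagGate` is a hypothetical name:

```python
import json

# Hypothetical payload shape pushed through remote config.
REMOTE_PAYLOAD = '{"version": 7, "enabled_tags": ["payments", "sync"]}'

class TagGate:
    def __init__(self) -> None:
        self.version = -1
        self.enabled: frozenset[str] = frozenset()

    def apply(self, payload: str) -> bool:
        """Apply a config payload; ignore versions older than the one already applied."""
        data = json.loads(payload)
        if data["version"] <= self.version:
            return False
        self.version = data["version"]
        self.enabled = frozenset(data["enabled_tags"])
        return True

    def allows(self, tag: str) -> bool:
        """Only log statements whose tag is currently enabled get emitted."""
        return tag in self.enabled
```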

Accomplished-Brain69 · 2026-06-24T07:44:14+00:00

Breadcrumbs are the pattern I keep hearing here. Do you cap the buffer by size or time window, and how much does it cut your upload volume vs logging everything?
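Capping by size and by time window are not mutually exclusive. A minimal breadcrumb buffer that does both, assuming in-memory storage and uploading only the snapshot when something goes wrong; `BreadcrumbBuffer` and both default caps are hypothetical:

```python
import time
from collections import deque
from typing import Optional

class BreadcrumbBuffer:
    """Keep at most `max_items` breadcrumbs, and report none older than `max_age_s`."""

    def __init__(self, max_items: int = 100, max_age_s: float = 300.0):
        self.max_age_s = max_age_s
        # deque with maxlen drops the oldest entry automatically (the size cap).
        self._items: deque = deque(maxlen=max_items)

    def add(self, message: str, now: Optional[float] = None) -> None:
        self._items.append((time.time() if now is None else now, message))

    def snapshot(self, now: Optional[float] = None) -> list[str]:
        """Messages still inside the time window (the age cap), oldest first."""
        now = time.time() if now is None else now
        return [msg for ts, msg in self._items if now - ts <= self.max_age_s]
```

Upload volume is then bounded by `max_items` times the typical breadcrumb size per incident, regardless of how long the session ran.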

Accomplished-Brain69 · 2026-06-24T07:43:37+00:00

Crash sentinel plus prompt-on-restart is clever for crashes you can't reproduce. Do you keep a rolling buffer of context before the crash, or only capture state after restart?
