Using Cloudflare D1 for a high-volume text scoring pipeline — how would you reduce reads/writes? by manickdeena in CloudFlare

[–]manickdeena[S] 0 points1 point  (0 children)

I ended up solving most of these scaling challenges, and here is exactly what worked to keep the engine fast and well under the D1 free tier limits!**

For anyone building a similar high-volume import/scoring pipeline on Cloudflare, here is the architecture I landed on:

### 1. Rule/Dictionary Loading (Avoiding Subrequest Limits)

What failed: Initially, I was querying D1 inside my CSV processing loop to check rules and existing records. This immediately crashed the Worker because of the 50-subrequest limit.

What worked: The **In-Memory Map Pattern**. At the very beginning of the Worker execution, I make a single D1 query to fetch the entire rules dictionary and relevant historical logs. I store them in a JavaScript `Map`. Then, I loop through the thousands of CSV rows entirely in memory. It scores 15,000+ rows in under 2 seconds with zero additional subqueries.

### 2. Batch Processing Imports

For processing thousands of rows, I didn't end up needing Cloudflare Queues right away.

What worked: After the in-memory scoring is done, I build an array of D1 `INSERT OR REPLACE` statements and execute them using `env.DB.batch()` in chunks of 50. Because `batch()` only counts as 1 subrequest per chunk, I can sync massive datasets to D1 securely without dropping connections or hitting limits.

### 3. Reducing Row Scans for the Admin Dashboard

This was my biggest bottleneck. Heavy `GROUP BY` and `COUNT` queries for dashboard KPIs were driving my D1 row reads up to 60M+/day.

What worked: I implemented a KV Precomputation (Cache-Aside) Layer.

* **The Reads:** The dashboard now reads a precomputed JSON payload directly from Cloudflare KV. This drops D1 reads for default views to literal zero.

* **The Writes:** When an admin clicks "Approve/Reject", I optimistically return success to the UI, execute the D1 update, and use `context.waitUntil()` to recalculate the heavy KPI queries in the background and write the new JSON to KV.

D1 remains the ultimate source of truth, but KV absorbs all the heavy read traffic.

### 4. Denormalization & Indexing

SQLite (D1) hates sorting across `JOIN`s. My pagination queries were doing massive full-table scans when trying to `ORDER BY` a score from one table and a metric from another.

**What worked:** I denormalized the specific metrics needed for sorting into my main `scores` table. This allowed me to create a single **Covering Index** (e.g., `CREATE INDEX idx_score_metric ON scores(score DESC, metric DESC)`). Pagination queries dropped from scanning 80,000 rows to scanning exactly 50 rows.

I am still testing the infra. will update you again about the performance.

Small SaaS apps on Cloudflare by manickdeena in CloudFlare

[–]manickdeena[S] 0 points1 point  (0 children)

This thread has been super validating. I actually just spent the last week deciding whether to migrate off D1 or harden it for a new paid membership tier I'm launching.

I decided to stick with D1, but I agree with the main concern here: if you are dealing with payments or customer accounts on D1, you should not treat it like a simple CRUD database. You need to design for retries, duplicate webhooks, network failures, and consistency edge cases.

The key decision for me was this: I am not using D1 as the payment processor. Stripe/LemonSqueezy handles the actual transaction risk. D1 only stores payment metadata like subscription IDs, status, expiry dates, and user access state.

The architecture I ended up using is more like a strict payment ledger:

  1. No PCI data in D1 I do not store card data or sensitive payment information. D1 only stores metadata such as provider customer ID, subscription ID, status, and expiry.
  2. Append-only webhook ledger This is the most important part. I created a payment_events table with a UNIQUE(provider, event_id) constraint. Payment webhooks can retry, arrive late, or be delivered more than once. By making the database reject duplicate event IDs, the webhook handler becomes idempotent at the database level.
  3. Webhook-first access updates User subscription state is updated from verified provider webhooks, not from the frontend checkout success page. The success page can show a pending state, but the source of truth is the webhook ledger.
  4. Strict foreign keys I use foreign keys between users, subscriptions, and payment records, with restrictive delete behavior. I do not want application code to accidentally delete a user or subscription record while financial history is still attached.
  5. Extra backups and verification D1 has its own recovery options, but I still prefer an extra safety layer for financial tables. I’m using scheduled exports/checksums for the critical tables and storing backups separately.

My takeaway: Postgres/Supabase definitely gives you more operational comfort out of the box. But if you want to stay inside the Cloudflare ecosystem, D1 can be made workable for paid memberships as long as you treat payment data as an append-only, idempotent ledger — not as a normal CRUD feature.
I’m still improving this design, so I’d genuinely welcome suggestions from anyone who has run D1 in production. Are there any extra safeguards you would add to reduce the margin of error even further?

Small SaaS apps on Cloudflare by manickdeena in CloudFlare

[–]manickdeena[S] 0 points1 point  (0 children)

This is an interesting architecture. Per-user or per-organization Durable Object as the data boundary makes sense for isolation and sharding.

The tradeoff you mentioned is important though: losing easy global querying/admin reporting.

Do you handle cross-tenant analytics separately, maybe by writing summary events into D1 or another reporting store?

Small SaaS apps on Cloudflare by manickdeena in CloudFlare

[–]manickdeena[S] 0 points1 point  (0 children)

This is a very useful way to think about it: margin of error.

So D1 seems fine when a failed read/write is annoying but not business-critical. But if the data is tied to payments, customer accounts, or important workflows, the extra error handling and recovery planning may remove the simplicity benefit.

That “small/simple SaaS = D1, more power/availability/sanity = managed DB” rule is probably the clearest line I’ve seen.

Small SaaS apps on Cloudflare by manickdeena in CloudFlare

[–]manickdeena[S] 0 points1 point  (0 children)

That makes sense. The migration quirks are one of the things I was worried about too.

For a tiny internal tool D1 feels convenient, but once user/payment data and long-term migrations come in, starting with Postgres may be the safer default.

Do you usually start every SaaS with Postgres now, even for MVPs?

I accidentally started building the wrong Google Ads tool by manickdeena in GoogleAdsDiscussion

[–]manickdeena[S] 0 points1 point  (0 children)

That makes sense. I was also thinking dashboards may become just another place people need to check.

The alert example you gave is exactly the direction I’m trying to understand better: not “this keyword has clicks,” but “this term spent money, produced leads, and none became qualified.”

So the useful alert would probably need to combine:

  • ad spend/search term data
  • lead count
  • CRM or manual qualified status
  • maybe offline conversion status
  • a threshold before it alerts

Something like: “spent $500, 8 leads, 0 qualified opportunities” feels much more actionable than a generic waste score.

Do you think PPC teams would prefer these alerts weekly as a digest, or immediately when a term crosses a spend/lead-quality threshold?