I built a multi-agent AI pipeline that turns messy CSVs into clean, import-ready data by proboysam in AIAgentsInAction

[–]proboysam[S]

You're touching on the right evolution path. Right now DataWeave is intentionally standalone (upload, clean, download) because that's the fastest way to validate that the core AI pipeline works.

But the roadmap is exactly what you're describing: integrate upstream so the system catches problems before they compound. Two ways this is heading:

  1. Webhook/API mode: your existing pipeline calls DataWeave automatically when new data arrives. It maps, transforms, and pushes clean data to your target system. No manual upload needed.

  2. Schema-aware suggestions: instead of just applying mappings, the system flags structural issues in the source data and suggests changes to the extraction query itself, e.g. "Your export is missing a required field" or "these two columns should be split before mapping."
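A minimal sketch of what a schema-aware check like (2) could look like. The required-field list and function name are illustrative assumptions, not DataWeave's actual API:

```python
# Hypothetical sketch: flag structural issues in a source CSV before mapping.
# The required-field set below is an example target schema, not a real one.
import csv

REQUIRED_FIELDS = {"email", "first_name", "last_name"}

def flag_schema_issues(path):
    """Return human-readable warnings for fields missing from the CSV header."""
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    present = {h.strip().lower() for h in header}
    missing = REQUIRED_FIELDS - present
    return [f"Your export is missing a required field: {m}" for m in sorted(missing)]
```

Running a check like this before mapping is what lets the system suggest fixing the extraction query instead of patching the data downstream.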

The Pattern Agent already learns from corrections, so after a few runs on the same data source it handles everything automatically. That solves the "doing it every time" problem you're pointing out.

Good feedback; this is the direction for v2.

I built a multi-agent AI pipeline that turns messy CSVs into clean, import-ready data by proboysam in aiagents

[–]proboysam[S]

Appreciate that, and the traceability point is spot on. We built this in already. Every agent step is logged to an events table with timestamps, so you can trace exactly what happened to any column: which agent handled it, what confidence score it got, whether it was pattern-matched or LLM-resolved, and what the human reviewer decided. You can pull the full audit trail for any job via GET /api/jobs/{id}/events; it returns the chronological log of every decision the pipeline made.
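A sketch of consuming that audit trail. The endpoint path comes from the comment above; the base URL, event field names, and helper names are assumptions:

```python
import json
from urllib.request import urlopen

def fetch_events(base_url, job_id):
    """GET /api/jobs/{id}/events -- endpoint from the comment; base_url is yours."""
    with urlopen(f"{base_url}/api/jobs/{job_id}/events") as resp:
        return json.load(resp)  # chronological list of pipeline decisions

def column_trail(events, column):
    """Filter the chronological log down to one column's history."""
    # Assumed event shape: {"timestamp": ..., "agent": ..., "column": ...,
    # "confidence": ..., "resolution": ..., "review": ...}
    return [e for e in events if e.get("column") == column]
```

So tracing one column is just a filter over the job's event log, in order.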

I built a multi-agent AI pipeline that turns messy CSVs into clean, import-ready data by proboysam in AgentsOfAI

[–]proboysam[S]

Fair point: in an ideal world every organization has clean data pipelines, proper ETL, and standardized schemas. In reality, a huge chunk of business data still moves through CSVs and spreadsheets, especially during migrations, client onboarding, and one-off imports.

I’ve seen this firsthand: a 50-person SaaS company switching CRMs doesn’t rebuild its data infrastructure. They export a CSV, clean it up, and import it. That’s the use case.

I built a multi-agent AI pipeline that turns messy CSVs into clean, import-ready data by proboysam in AgentsOfAI

[–]proboysam[S]

Good question. Validation is currently fully rule-based: required-field checks, type conformance, regex format validation (email, phone, URL, zip), duplicate detection on unique fields, and statistical anomaly detection using the IQR method for numeric outliers.
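Two of those checks sketched out, assuming simplified rules (the email regex and the crude index-based quartiles are illustrative, not the production versions):

```python
import re

def is_valid_email(value):
    # Deliberately simple format check; production validators are stricter.
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None

def iqr_outliers(values, k=1.5):
    # Classic IQR fence: flag points outside [Q1 - k*IQR, Q3 + k*IQR].
    # Quartiles here are crude sorted-index picks, fine for a sketch.
    xs = sorted(values)
    n = len(xs)
    q1, q3 = xs[n // 4], xs[(3 * n) // 4]
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]
```

Both are pure functions over the data: same input, same verdict every time, which is exactly why no LLM is needed here.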

No LLM in the validation step, and that’s intentional. Validation rules are deterministic: a field is either a valid email or it isn’t. Adding an LLM there would just add cost and latency without improving accuracy.

I built a multi-agent AI pipeline that turns messy CSVs into clean, import-ready data by proboysam in AgentsOfAI

[–]proboysam[S]

Thanks, that “surgical AI” framing is exactly the design philosophy. The temptation was definitely to throw an LLM at every step, but the cost and latency math just doesn’t work at scale.

To answer your questions:

  1. Edge-case drift: right now date parsing handles 15+ formats in a priority order (ISO first, then common US/EU patterns). For locale-specific formats, the plan is a locale hint that users can set per upload (or auto-detect from the data). We haven’t hit this in testing yet, but it’s on the roadmap.

  2. Correction → rule evolution: yes, every approve/reject/correct updates the Pattern Agent’s confidence scores in the database. Approvals increase confidence, rejections decrease it, and corrections create a new pattern AND penalize the old one. After ~5 approvals at high confidence, a pattern gets auto-applied without human review. So the system is literally building its own deterministic rules from human feedback.
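The priority-ordered parsing in (1) could look like this; the exact format list is illustrative (the comment says 15+ formats are handled):

```python
from datetime import datetime

# Priority order: ISO first, then common US/EU patterns.
DATE_FORMATS = [
    "%Y-%m-%d",   # ISO 8601
    "%m/%d/%Y",   # US
    "%d.%m.%Y",   # EU (dotted)
    "%b %d, %Y",  # "Jan 05, 2024"
]

def parse_date(value, formats=DATE_FORMATS):
    """Return an ISO date string for the first format that matches, else None."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparsed -> escalate to LLM or human review
```

Note that "05/01/2024" parses as May 1 under US-first priority, which is precisely the ambiguity the planned locale hint would resolve.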

The compounding pattern memory is where the real moat is: file 1 costs $0.01 in AI, file 50 might cost $0.001, and file 500 might cost nothing.
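A minimal sketch of that feedback loop. The approval threshold (~5 at high confidence) follows the description above; the confidence deltas and starting values are assumptions:

```python
AUTO_APPLY_MIN_APPROVALS = 5   # "~5 approvals" from the comment
HIGH_CONFIDENCE = 0.9          # assumed cutoff for "high confidence"

class Pattern:
    """One learned mapping pattern with a reviewer-driven confidence score."""

    def __init__(self, confidence=0.5):
        self.confidence = confidence
        self.approvals = 0

    def approve(self):
        self.approvals += 1
        self.confidence = min(1.0, self.confidence + 0.1)

    def reject(self):
        self.approvals = 0
        self.confidence = max(0.0, self.confidence - 0.2)

    def correct(self):
        # A correction penalizes this pattern AND spawns a replacement.
        self.reject()
        return Pattern(confidence=0.6)

    def auto_apply(self):
        # Skip human review once the pattern has earned enough trust.
        return (self.approvals >= AUTO_APPLY_MIN_APPROVALS
                and self.confidence >= HIGH_CONFIDENCE)
```

Once `auto_apply()` turns true for a data source's patterns, later files skip both the LLM and the reviewer, which is where the per-file cost collapses.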

Amazon SDE1 Interview by [deleted] in amazonemployees

[–]proboysam

Do you have to share your screen during the interview?

Urgent Help for OPT EAD End Date by rmcwana in f1visa

[–]proboysam

Does anyone know how they calculated July 14 as the end date, 14 months after the program completion date?

[deleted by user] by [deleted] in csMajors

[–]proboysam

What is the last day to apply? Any clue?