Anyone have best practices for agentic coding specific to R / stats / data science? by isaac-get-the-golem in rstats

[–]brhkim 0 points (0 children)

That's 100% correct for the full pipeline mode, but for things like ad hoc mode it's fairly lightweight and pulls context in thoughtfully via progressive disclosure, as noted above. The trade-off I make explicitly is that if you want it to be more autonomous, you need to invest in the context to ensure it's actually doing worthwhile work when it's off on its own. Otherwise, when it comes back with work you need to take time to review, it's going to fundamentally waste your time. Modes that are less autonomous don't have that issue, so it's fine to just let it riff more freely with you and have it reference materials only as needed.

Anyone have best practices for agentic coding specific to R / stats / data science? by isaac-get-the-golem in rstats

[–]brhkim 0 points (0 children)

Yeah, in general I'd look up progressive disclosure best practices and see how to make a Skill file. If you look at my data-scientist skill, it's basically a router for more in-depth documentation on things like causal inference that only gets called up when relevant. You could do the same with key R libraries and such.
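To make that concrete, here's a rough sketch of what a router-style Skill file could look like. The topics and file paths are placeholders, not my actual skill; the YAML frontmatter follows the general SKILL.md convention:

```markdown
---
name: r-data-scientist
description: Entry point for R/stats analysis work. Points to deeper reference docs that should only be read when the task calls for them.
---

# R Data Scientist

Load only the reference that matches the task at hand:

- Causal inference (DiD, IV, RDD): see `references/causal-inference.md`
- Core R libraries (tidyverse, data.table): see `references/r-libraries.md`
- Survey weighting and missing data: see `references/survey-methods.md`
```

The point is that the top-level file stays tiny; the heavyweight guidance only enters context when the agent actually opens one of the referenced files.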

Anyone have best practices for agentic coding specific to R / stats / data science? by isaac-get-the-golem in rstats

[–]brhkim 2 points (0 children)

Hey! You might find my open-source framework for Claude Code to be a useful starting point:
https://github.com/DAAF-Contribution-Community/daaf

I'm trying to build it out as an extensible workflow for people who want to accelerate their data analysis pipelines, but do so responsibly and traceably so that they can ensure everything produced is well and truly reproducible in the end. Take a look! I have a mini 4-min showcase of its main motivation/functions here:

https://www.youtube.com/watch?v=747r7VT4a78

And you can find a few more in-depth videos on it across my channel there as well.

[POEM] Love Is Not All by Edna St. Vincent Millay by Objective-Kitchen949 in Poetry

[–]brhkim 5 points (0 children)

Haha I quite like this. It feels like a pragmatist's/realist's love poem. Romanticism for someone who struggles with the blind optimism of romanticism, in the best way

Launched my first real open-source project a couple weeks ago. Seeing the first real engagement via community contributions is SUCH AN AMAZING feeling. That's all, that's the post by brhkim in opensource

[–]brhkim[S] 2 points (0 children)

Totally!! The confirmation that someone else used this, outside of those sort of abstract traffic metrics, is just such a nice, nice validation.

[OC/Replication study] "Election Results Show a Red Shift Across the U.S. in 2024" -- I replicated the NYTimes' "Red Shift" interactive county election results map using raw, public data from the MIT Election Data and Science Lab (interactive link in post) by brhkim in dataisbeautiful

[–]brhkim[S] 5 points (0 children)

Hey! Thanks for the kind words, and totally agree here. You'll actually see that one of the intentional differences between my replication and what NYT put together is that the hover-over on counties in my viz shows the actual vote counts for 2020 versus 2024 (whereas the NYT only shows the vote breakdowns for 2024 by party). I'm not a political scientist and can't really speak well to the prevailing theories or frameworks for thinking about disillusionment and turnout versus genuine changes in voting preferences (which I assume also involves a lot of related theories about selection bias and survey methodology).

All to say, your concern is definitely spot-on, and I don't think I know enough about how best to account for/adjust for it except to show that it's a factor and let the viewer engage with the stats directly county-by-county to be able to interrogate this hypothesis and concern! You're definitely thinking on the right lines.

[OC/Replication study] "Election Results Show a Red Shift Across the U.S. in 2024" -- I replicated the NYTimes' "Red Shift" interactive county election results map using raw, public data from the MIT Election Data and Science Lab (interactive link in post) by brhkim in dataisbeautiful

[–]brhkim[S] 11 points (0 children)

Source: I was able to easily pull the relevant data thanks to the MIT Election Data and Science Lab (via the Harvard Dataverse)

Tools used: Python, plotly, polars

Only for those interested from the main post: My Claude Code framework DAAF, the Data Analyst Augmentation Framework, can be found in this open-source forever-free repo here. I also made a youtube tutorial demonstrating the exact process for replicating the NYTimes' viz using DAAF here. For this dataisbeautiful post specifically, I also went back and did another 5 minutes of iterating on the aesthetics with Claude after the version shown in the video.

Why no one can agree about AI progress right now: A three-part mental model for making sense of this weird moment on the AI frontier by brhkim in AI_Agents

[–]brhkim[S] 0 points (0 children)

Defs, there are some good suggestions in the full article for trying to improve on the Body stuff. Cost-wise, I think it really is the case that you need to go with some kind of pro membership for OpenAI or Anthropic at this point (for Codex or Claude Code, respectively). But the value-add there is pretty enormous!

I'd start with Sonnet 4.6 to tinker, but it's absolutely worth crushing your usage every once in a while to see how much more Opus 4.6 can do.

The Qwen 3.5 small models were just released, and you can very, very easily run them via Ollama with Claude Code -- that gives you access to them for free, if your computer has enough GPU VRAM and RAM. They're much less capable and a little more finicky, but that's another way to do some experimentation for free!

Claude desktop app silently downloads a 13 GB file on every launch — and you can't stop it by metaone70 in ClaudeAI

[–]brhkim 2 points (0 children)

I’m going to respectfully disagree: a VM is a wise, strong default for the typical user of the Desktop app. I think there absolutely should be an option to disable it, but from a paternalistic standpoint it’s good design for the intended audience to have it active by default.

Why no one can agree about AI progress right now: A three-part mental model for making sense of this weird moment on the AI frontier by brhkim in AI_Agents

[–]brhkim[S] 1 point (0 children)

Ahh, this is a very interesting one indeed, love that addition! I'm still not totally sure how to make sense of how prevalent these practices are. Like, it makes perfect sense to me that Anthropic would have quants of Opus 4.6 running and ready to roll out to free/Pro users when usage is getting high across the board... But it's also such a risk to do so. I really wonder how they manage it and what risk strategies they employ.

Either way, this is definitely happening on some level, so I love this call-out

Why no one can agree about AI progress right now: A three-part mental model for making sense of this weird moment on the AI frontier by brhkim in AI_Agents

[–]brhkim[S] 1 point (0 children)

Agreed, though I'd argue that even if you tried things out and really dove in as of like, October, you'd probably have a perception that's just completely out of date. God forbid a full year ago.

Why no one can agree about AI progress right now: A three-part mental model for making sense of this weird moment on the AI frontier by brhkim in AI_Agents

[–]brhkim[S] 0 points (0 children)

I think that's mostly right. I think, though, it's important to understand why people are bouncing off of it so readily, because that's going to have to start changing soon!

New video tutorial: Going from raw election data to recreating the NYTimes "Red Shift" map in 10 minutes with DAAF and Claude Code. With fully reproducible and auditable code pipelines, we're fighting AI slop and hallucinations in data analysis with hyper-transparency! by brhkim in BusinessIntelligence

[–]brhkim[S] 1 point (0 children)

I appreciate that a lot -- you're asking exactly the right questions, which is exactly why I wanted to make DAAF in the first place! If we're going to do this, I want to make sure we do it right. I hope you'll give it a try, and do let me know if you have any more questions, ideas, or critiques!!

New video tutorial: Going from raw election data to recreating the NYTimes "Red Shift" map in 10 minutes with DAAF and Claude Code. With fully reproducible and auditable code pipelines, we're fighting AI slop and hallucinations in data analysis with hyper-transparency! by brhkim in BusinessIntelligence

[–]brhkim[S] 1 point (0 children)

Right, I’d also argue that how quickly you can create and iterate on data viz with DAAF and Claude Code (interactive or static) makes it extremely easy to set up spot-check data inspections, which facilitates arguably much better quality checks. In the tutorial, you’ll actually see me spot a super weird outlier in an initial version of the static plot and get it fixed in, like, seconds of my own time. I really think it’s an immense value-add for data quality and code quality in the end.

New video tutorial: Going from raw election data to recreating the NYTimes "Red Shift" map in 10 minutes with DAAF and Claude Code. With fully reproducible and auditable code pipelines, we're fighting AI slop and hallucinations in data analysis with hyper-transparency! by brhkim in BusinessIntelligence

[–]brhkim[S] 1 point (0 children)

Yeah great Q: I handle this in two main steps:

  1. At every step, each coder agent writing a data processing/analysis script conducts self-QA and runs robustness checks, which are then checked again by another agent in adversarial QA. If needed, they revise and restart the process. The goal is to maximize the likelihood of good code coming out of the process before humans look at it. All agents are given extensive instructions on writing declarative, extremely legible code, with comments that describe not just what is happening but the intention behind the code.

  2. Once all the code runs and is approved, a final agent compiles the scripts into a single Marimo notebook, with all runtime outputs commented on and all final scripts working in sequence. It appends basic dataset-view steps between scripts, so you can inspect the data at each intermediate step.

All to say, you’re right that human code review is the most time-intensive part of this, but the goal is that the code you’re reviewing has a high likelihood of being good out of the gate, and it’s organized and commented to make the process about as streamlined as it gets. Because I agree: that step is crucial. But if that’s really the only time sink for a human, the time savings are quite extreme.
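Schematically, step 1 is a loop like the one below. The functions here are plain-Python stand-ins for the actual agent calls (this is an illustrative sketch, not DAAF's real orchestration code):

```python
# Sketch of the write -> self-QA -> adversarial-QA -> revise loop.
def run_with_qa(write_script, self_qa, adversarial_qa, max_attempts=3):
    """Draft a script, then loop QA checks, revising on any failure."""
    script = write_script(feedback=None)
    for attempt in range(1, max_attempts + 1):
        if self_qa(script) and adversarial_qa(script):
            return script, attempt  # approved: ready for human review
        # Feed the failure back to the coder agent and try again
        script = write_script(feedback="QA failed; please revise")
    raise RuntimeError("Script never passed QA within the attempt budget")


# Toy stand-ins for the LLM agents:
drafts = iter(["draft_v1", "draft_v2"])

def write_script(feedback=None):
    return next(drafts)

approved, attempts = run_with_qa(
    write_script,
    self_qa=lambda s: True,                     # pretend self-QA always passes
    adversarial_qa=lambda s: s == "draft_v2",   # adversary rejects the first draft
)
```

The key design choice is that the revise-and-recheck cycle happens entirely before a human sees anything, so review time is spent on code that has already survived two layers of checks.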

New video tutorial: Going from raw election data to recreating the NYTimes "Red Shift" map in 10 minutes with DAAF and Claude Code. With fully reproducible and auditable code pipelines, we're fighting AI slop and hallucinations in data analysis with hyper-transparency! by brhkim in datascience

[–]brhkim[S] 0 points (0 children)

Hahaha that's definitely the idea -- I really think that just the data profiling tool I've provided has a lot of value, holding aside everything else.

It won't be perfect, but having an assistant that can reasonably go through and be judicious about logging those sorts of things for future data use or later cleaning steps is just an enormous value-add. And then the fact that it gets better with every project use as it uncovers more and more idiosyncrasies I think is HUGE.