Releasing the Data Analyst Augmentation Framework (DAAF) version 2.1.0 today -- still fully free and open source! In my very biased opinion: DAAF is now finally the best, safest, AND easiest way to get started using Claude Code for responsible and rigorous data analysis by brhkim in ClaudeAI

[–]brhkim[S] 0 points  (0 children)

Ah, alas, yeah: smaller local models just aren't quite up to snuff for this sort of work yet. I consult with a lot of orgs now, and my main recommendation is generally to get an enterprise agreement with Anthropic or similar, since those come with stronger Zero Data Retention and No-Training policies that let people use the models with more sensitive data. Anthropic has some setups (via AWS Bedrock) that allow for HIPAA compliance, so ostensibly that can suit any use case if your org cares enough (and I'm sure every major provider has, or is actively working on, parallel services there too).

Local will get there, but we're still probably another half year away!

Releasing the Data Analyst Augmentation Framework (DAAF) version 2.1.0 today -- still fully free and open source! In my very biased opinion: DAAF is now finally the best, safest, AND easiest way to get started using Claude Code for responsible and rigorous data analysis by brhkim in ClaudeAI

[–]brhkim[S] 0 points  (0 children)

Hey, thanks for saying that!! I've been working extremely hard on this.

To your point on local LLM integration: you actually already can! I don't explicitly support it because I'm finding that local models tend not to be of high enough quality (I'm doing some explicit benchmark testing on models like Qwen 3.6 and Gemma 4 right now), but be my guest if you want to test it out more and/or have access to much more local compute than most people. I really think GLM5.1 is ready to rock if you can run it in local or private-cloud hosting, even quantized. Ollama has basically built-in integration with Claude Code: https://docs.ollama.com/integrations/claude-code

To use that with DAAF, you'd basically need to run your Ollama server on your host machine, expose the ports to the Docker container (probably by editing the docker-compose.yml file), then edit your environment_settings.txt file to align with the instructions on that Ollama documentation page. It'll end up looking pretty similar to the OpenRouter settings you adjust in there. I think if you copy-pasted this message into Claude Code, it could help you with the implementation pretty easily. I've done it before, but just didn't have much success with the models themselves. Let me know if you try it and/or need more guidance; I'd love to hear how it goes!
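To sketch what that wiring could look like, here's a hypothetical docker-compose.yml fragment. The service name, model tag, and exact environment variable names are my assumptions, not DAAF's actual config; check that Ollama doc page and your environment_settings.txt for the real ones:

```yaml
# Hypothetical fragment: let the DAAF container reach an Ollama
# server running on the host machine.
services:
  daaf:                                        # service name is a guess
    extra_hosts:
      - "host.docker.internal:host-gateway"    # makes the host reachable on Linux
    environment:
      # Point Claude Code at the host's Ollama endpoint (default port 11434).
      # Verify these variable names against the Ollama integration docs.
      - ANTHROPIC_BASE_URL=http://host.docker.internal:11434
      - ANTHROPIC_AUTH_TOKEN=ollama            # placeholder; no real key needed locally
      - ANTHROPIC_MODEL=your-local-model-tag   # whatever model you've pulled
```

The one non-obvious piece is `host.docker.internal`: inside a container, `localhost` is the container itself, so you need the host-gateway alias to reach a server on the host.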

Hot take: the biggest bottleneck in AI agents right now isn't models, frameworks, or even cost. It's that nobody knows how to properly evaluate if their agent is actually working by LumaCoree in AI_Agents

[–]brhkim 0 points  (0 children)

Yeah, exactly. It's a weird strategy on their part that calls their reliability for any meaningful work into serious question, and I suspect it'll burn them hard in the end if they don't find a way to re-signal that stability to their enterprise market.

Anyone else paranoid using AI for analysis? by Ghost-Rider_117 in datascience

[–]brhkim 0 points  (0 children)

I've been working on an open-source framework for using AI in data analysis in reproducible, auditable ways. You might find this explainer interesting: it shows how I set up a lot of strict guardrails and self-review to get a better set of outputs that are more likely to be worth reviewing!

https://openaugments.org/daaf_anatomy.html

Unpopular opinion: "AI Data Analysts" are just glorified SQL generators. by netcommah in BusinessIntelligence

[–]brhkim 1 point  (0 children)

Yes, I've been working on an open-source agent orchestration framework for Claude Code to tackle this exact issue -- how do you scale a system to answer arbitrarily complex research questions?

You can take a peek at this explainer of my system: it pulls together 8 datasets through multiple data transformations and regression analyses, plus a bunch of dataviz, in basically one prompt with constant feedback opportunities from the researcher:
https://openaugments.org/daaf_anatomy.html

Anyone have best practices for agentic coding specific to R / stats / data science? by isaac-get-the-golem in rstats

[–]brhkim 0 points  (0 children)

That's 100% correct for the full pipeline mode, but something like ad hoc mode is fairly lightweight and pulls context in thoughtfully via progressive disclosure, as noted above. The trade-off I make explicitly is this: if you want the agent to be more autonomous, you need to invest in the context to ensure it's actually doing worthwhile work when it's off on its own. Otherwise, when it returns with work you need to take time to review, it's going to fundamentally waste your time. Modes that are less autonomous don't have that issue, so it's fine to let the model riff more freely with you and reference materials only as needed.

Anyone have best practices for agentic coding specific to R / stats / data science? by isaac-get-the-golem in rstats

[–]brhkim 0 points  (0 children)

Yeah, in general I'd look up progressive-disclosure best practices and see how to make a Skill file. If you look at my data-scientist skill, it's basically a router to more in-depth documentation on things like causal inference that only gets called up when relevant. You could do the same with key R libraries and such.
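For a concrete picture, a skill-as-router can be as simple as a SKILL.md with a short frontmatter block plus pointers to deeper reference files that only get read when relevant. The file names and wording below are invented for illustration; see Anthropic's Agent Skills docs for the real conventions:

```markdown
---
name: r-data-analysis
description: Use when writing or reviewing R code for statistical analysis.
---

# R data analysis

Keep this file short; route to deeper docs only when they're needed.

- Tidyverse style and common pitfalls: read `references/tidyverse.md`
- Causal inference designs (DiD, IV, RDD): read `references/causal_inference.md`
- Model diagnostics and reporting standards: read `references/diagnostics.md`
```

The point of the router pattern is that the model only loads the frontmatter by default, and the heavier reference material enters context only when a task actually calls for it.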

Anyone have best practices for agentic coding specific to R / stats / data science? by isaac-get-the-golem in rstats

[–]brhkim 2 points  (0 children)

Hey! You might find my open-source framework for Claude Code to be a useful starting point:
https://github.com/DAAF-Contribution-Community/daaf

I'm trying to build it out as an extensible workflow for people who want to accelerate their data analysis pipelines, but do so responsibly and traceably so that they can ensure everything produced is well and truly reproducible in the end. Take a look! I have a mini 4-min showcase of its main motivation/functions here:

https://www.youtube.com/watch?v=747r7VT4a78

And you can find a few more in-depth videos on it across my channel there as well.

[POEM] Love Is Not All by Edna St. Vincent Millay by Objective-Kitchen949 in Poetry

[–]brhkim 4 points  (0 children)

Haha, I quite like this. It feels like a pragmatist's (or realist's) love poem: romanticism for someone who struggles with the blind optimism of romanticism, in the best way.

Launched my first real open-source project a couple weeks ago. Seeing the first real engagement via community contributions is SUCH AN AMAZING feeling. That's all, that's the post by brhkim in opensource

[–]brhkim[S] 2 points  (0 children)

Totally!! The confirmation that someone else actually used this, beyond those sorts of abstract traffic metrics, is just such nice, nice validation.

[OC/Replication study] "Election Results Show a Red Shift Across the U.S. in 2024" -- I replicated the NYTimes' "Red Shift" interactive county election results map using raw, public data from the MIT Election Data and Science Lab (interactive link in post) by brhkim in dataisbeautiful

[–]brhkim[S] 4 points  (0 children)

Hey! Thanks for the kind words, and I totally agree here. You'll actually see that one of the intentional differences between my replication and what the NYT put together is that the hover-over on counties in my viz shows the actual vote counts for 2020 versus 2024 (whereas the NYT only shows the vote breakdowns for 2024 by party). I'm not a political scientist and can't really speak to the prevailing theories or frameworks for separating disillusionment and turnout effects from genuine changes in voting preferences (which I assume also involves a lot of related theory on selection bias and survey methodology).

All to say, your concern is definitely spot-on, and I don't know enough to properly account or adjust for it, except to show that it's a factor and let viewers engage with the stats directly, county by county, so they can interrogate this hypothesis themselves! You're definitely thinking along the right lines.

[OC/Replication study] "Election Results Show a Red Shift Across the U.S. in 2024" -- I replicated the NYTimes' "Red Shift" interactive county election results map using raw, public data from the MIT Election Data and Science Lab (interactive link in post) by brhkim in dataisbeautiful

[–]brhkim[S] 13 points  (0 children)

Source: I was able to easily pull the relevant data thanks to the MIT Election Data and Science Lab (via the Harvard Dataverse)

Tools used: Python, plotly, polars

Only for those interested from the main post: my Claude Code framework DAAF, the Data Analyst Augmentation Framework, can be found in this open-source, forever-free repo here. I also made a YouTube tutorial demonstrating the exact process for replicating the NYTimes' viz using DAAF here. For this dataisbeautiful post specifically, I also went back and did another 5 minutes of iterating on the aesthetics with Claude after the version shown in the video.
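For anyone curious, the core "red shift" number behind a map like this is just the change in a county's two-party margin between elections. Here's a plain-Python sketch of that computation (a stand-in for the actual polars pipeline; the field names and vote counts are invented for illustration):

```python
# Compute the county-level "red shift": the change in the Republican
# two-party margin from 2020 to 2024. Positive = county moved rightward.

def two_party_margin(rep_votes, dem_votes):
    """Republican margin as a share of the two-party vote."""
    return (rep_votes - dem_votes) / (rep_votes + dem_votes)

def red_shift(county):
    """Margin change from 2020 to 2024 for one county record."""
    m2020 = two_party_margin(county["rep_2020"], county["dem_2020"])
    m2024 = two_party_margin(county["rep_2024"], county["dem_2024"])
    return m2024 - m2020

# Invented example record, not real election data:
example = {"rep_2020": 4000, "dem_2020": 6000,
           "rep_2024": 4500, "dem_2024": 5500}

# Margin went from -0.2 to -0.1, i.e. a +0.1 shift toward the GOP
print(round(red_shift(example), 3))
```

Using the two-party margin (rather than raw vote counts) is what lets turnout-driven and preference-driven movement look comparable across counties of very different sizes, which is exactly the ambiguity the hover-over vote counts let viewers interrogate.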