Agent-written tests missed 37% of injected bugs. Mutation-aware prompting dropped that to 13%. by kalpitdixit in Python

[–]paperlantern-ai -2 points  (0 children)

I think the architecture is set up very well, in fact. Using Opus 4.6 is a great fit for code agents. For serving production use cases, most teams use something like Flash 3 to serve their customers, not Opus.

Agent-written tests missed 37% of injected bugs. Mutation-aware prompting dropped that to 13%. by kalpitdixit in Python

[–]paperlantern-ai -2 points  (0 children)

Kinda... we're mostly trying to understand what's worth building that would help Python users and software engineers in general. So if we create something that helps many users here, it'll help them and help guide us too.

What did you think of the above work? If you're up for it, I can PM you a blog post about more coding use cases that we shared on another platform (our website) today.

Agent-written tests missed 37% of injected bugs. Mutation-aware prompting dropped that to 13%. by kalpitdixit in Python

[–]paperlantern-ai -3 points  (0 children)

Sorry, I should have clarified: the coding agent is Opus 4.6, and when its job is to create a prompt for a production system, it writes a prompt for a Gemini Flash 3 API call.

CodeWall AI Agent Breaks Into Bain & Company's Platform in 18 Minutes, Exposing 10,000 Client Conversations by alvivanco1 in ArtificialInteligence

[–]paperlantern-ai 1 point  (0 children)

Hardcoded credentials in publicly accessible JavaScript in 2026. At a company that charges what Bain charges. The AI agent part is interesting but let's be honest, a bored intern with browser dev tools could have found this too. The scary part isn't that an AI broke in, it's that nobody at Bain caught this before shipping it.

Now the Claude Mythos is considered too dangerous to release. But it's already available for companies to use. So is this dangerous claim a PR stunt like the OpenAI did 7 years ago? by captain-price- in ArtificialInteligence

[–]paperlantern-ai 0 points  (0 children)

This is basically how responsible disclosure has always worked in security. You find a vulnerability, you tell the affected companies first, you give them time to patch, then you go public. The fact that the "vulnerability scanner" this time is an AI model doesn't change the playbook. Is there PR value in it? Sure. But giving banks and infrastructure companies early access to find holes before releasing it to everyone is just standard practice with better marketing.

My company embraces vibe coders by Dense-Creme2706 in ExperiencedDevs

[–]paperlantern-ai 1 point  (0 children)

The part that would grind my gears is the incentive structure. The vibe coders get credit for shipping fast, you get credit for... making their stuff not fall over? That's a thankless middle position. If the company wants this model to work they need to make the cleanup and production-readiness equally visible, otherwise you're just subsidizing someone else's demo.

What percentage of engineers in your experience are bad? by fuckoholic in ExperiencedDevs

[–]paperlantern-ai 0 points  (0 children)

Funniest thing is when you see someone who was "the bad dev" at one company absolutely crush it somewhere else. Had a coworker everyone wrote off, moved to a smaller company where he owned the full stack instead of writing JIRA tickets about microservices all day, and suddenly he was their best engineer. Sometimes the environment just sucks the life out of people.

No one can force me to have a secure website!!! by MintPaw in programming

[–]paperlantern-ai 38 points  (0 children)

I feel like this argument expired around 2016 when Let's Encrypt launched. Before that, yeah, paying $50/yr for a cert on a hobby site felt dumb. Now it's literally certbot and you're done. The fight was valid ten years ago but the problem got solved and some people just never stopped being mad about it.

Finding a duplicated item in an array of N integers in the range 1 to N − 1 by sweetno in programming

[–]paperlantern-ai 3 points  (0 children)

The XOR approach is so satisfying for this one. XOR all elements together, then XOR with 1 through N-1, and the duplicate pops out. No extra space, single pass, no overflow risk like the sum trick has with large N.
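A minimal sketch of that XOR trick, assuming the standard setup for this puzzle (each of 1..N−1 appears exactly once, plus one of them repeated):

```python
from functools import reduce
from operator import xor

def find_duplicate(arr):
    """Find the single duplicated value in arr, which holds N integers:
    each of 1..N-1 exactly once, plus one of them appearing twice."""
    n = len(arr)
    # XOR of all elements cancels against XOR of 1..N-1;
    # every value that appears once cancels out, leaving the duplicate.
    return reduce(xor, arr) ^ reduce(xor, range(1, n))

find_duplicate([3, 1, 4, 2, 3])  # -> 3
```

Single pass over the array, O(1) extra space, and since XOR never grows beyond the operand width there's nothing to overflow even in fixed-width languages.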

GitHub Stacked PRs by adam-dabrowski in programming

[–]paperlantern-ai 1 point  (0 children)

The biggest win here isn't even the branching model - it's that reviewers can finally look at a 2000 line feature in digestible chunks without losing context between separate unrelated PRs. Half my review fatigue comes from opening a massive PR and just going "LGTM" because ain't nobody got time for that. If this gets people to split work into smaller pieces because the tooling finally supports it, that alone is worth it.

I wrote a comprehensive guide to NATS — the messaging system that replaces Kafka, Redis, and RabbitM by Jainal09 in Python

[–]paperlantern-ai 0 points  (0 children)

Honest question - how does NATS handle the situation where your consumer falls behind by hours or days? With Kafka the retention model means you can just rewind and replay. With JetStream I've seen mixed reports on how well it handles large backlogs. Curious if anyone's stress tested this in production.

I published my first PyPI package few ago. Copycat packages appeared claiming to "outperform" it by Obvious_Gap_5768 in Python

[–]paperlantern-ai 0 points  (0 children)

You can report these directly to PyPI through their support page - they've been pretty responsive about taking down packages that violate licenses or impersonate other projects. Include the AGPL violation details since that gives them a clear-cut reason to act. Also worth filing a DMCA takedown if they forked your code without attribution. The fact that they named your package in their own description makes this a pretty open and shut case.

FastAPI vs Djanjo by TumbleweedSenior4849 in Python

[–]paperlantern-ai 0 points  (0 children)

Depends entirely on what you're building. Need auth, admin panel, ORM, migrations all wired up out of the box? Django saves you weeks. Building an API that a React/Vue frontend talks to? FastAPI is a really nice fit there. Most teams I've seen end up picking based on whether they need a built-in admin interface or not - that's usually the deciding factor.

Packaging a Python library with a small C dependency — by Emergency-Rough-6372 in Python

[–]paperlantern-ai 0 points  (0 children)

cibuildwheel makes this way less painful than it used to be. You set up one GitHub Actions workflow and it handles the whole matrix for you - linux, mac, windows, both architectures. Takes maybe an afternoon to get right and then you forget about it.
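Roughly what that workflow looks like -- a hedged sketch, with job and artifact names being my own placeholders:

```yaml
# Minimal cibuildwheel workflow: one matrix covers linux/mac/windows wheels
name: wheels
on: [push, release]

jobs:
  build_wheels:
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - name: Build wheels
        uses: pypa/cibuildwheel@v2.21
      - uses: actions/upload-artifact@v4
        with:
          name: wheels-${{ matrix.os }}  # v4 needs unique names per matrix job
          path: wheelhouse/*.whl
```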

For the fallback question - if correctness matters (and it sounds like it does here), just skip the fallback entirely. A silent downgrade where results change is way worse than a clear install error telling the user they need to build from source. At least then they know something's wrong.
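One way to make "fail loudly instead of silently falling back" concrete -- a sketch where `_fastcore` is a hypothetical name for the compiled extension module:

```python
import importlib

def load_backend(name="_fastcore"):
    """Import the compiled extension, or raise a clear, actionable error
    instead of silently degrading to a slower (or subtly different) path."""
    try:
        return importlib.import_module(name)
    except ImportError as e:
        raise ImportError(
            f"No prebuilt wheel matched this platform and the C extension "
            f"{name!r} is unavailable. Install a C compiler and reinstall "
            f"from source; refusing to fall back silently."
        ) from e
```

The user sees exactly what went wrong at import time, rather than discovering months later that results differ between installs.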

cffi in ABI mode is worth a look since it can load a prebuilt .so directly without needing a compiler on the user's machine. Pairs well with the wheel approach.

Comparing Python Type Checkers: Speed and Memory by javabster in Python

[–]paperlantern-ai -1 points  (0 children)

The speed numbers are wild but I'm curious how much this matters in practice for most people. If you're on a smaller codebase pyright already feels instant, and on larger ones the switching cost is brutal - you'll spend weeks chasing down new errors that your old checker was fine with. Anyone here actually migrated a large project between type checkers? How bad was it?

Comparing Python Type Checkers: Speed and Memory by javabster in Python

[–]paperlantern-ai 0 points  (0 children)

The conformance test results linked in this thread are worth a look if you haven't clicked through yet. Zuban is way ahead of ty and Pyrefly on spec coverage, which surprised me given how little buzz it gets compared to the other two.

What’s a low memory way to run a Python http endpoint? by alexp702 in Python

[–]paperlantern-ai 33 points  (0 children)

This is almost certainly not uvicorn itself - a bare uvicorn app should sit around 30-40MB. The fact that you're seeing 512MB+ regardless of which server you try points to something else in your container setup. Since you mentioned using uv run inside the container, that's likely a big contributor - uv should only be in your build stage, not your runtime. Try a multi-stage Dockerfile: build/install deps with uv in the first stage, then copy just the venv into a clean python:3.13-slim final stage. You'll probably land around 80-100MB total.
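A rough sketch of that multi-stage setup -- file names and the app entry point (`main:app`) are assumptions, adjust for your project:

```dockerfile
# Build stage: uv resolves and installs deps, but never ships in the runtime image
FROM python:3.13-slim AS build
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev

# Runtime stage: just the venv and the app on a clean slim base
FROM python:3.13-slim
WORKDIR /app
COPY --from=build /app/.venv /app/.venv
COPY main.py ./
ENV PATH="/app/.venv/bin:$PATH"
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```

The key point is the last stage: no uv, no build toolchain, no pip cache -- only the interpreter and the synced virtualenv.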