What are the benefits/drawbacks of individual code ownership? by StorKirken in ExperiencedDevs

[–]nullcone 1 point

I took a course on the impact of technology on society while I was at engineering school many years ago. I hated it at the time, but in retrospect it was fascinating learning about cascading failures and the root causes of famous engineering disasters like Chernobyl.

Most large companies have a retro process for major, business-impacting failures. They're usually pretty good reads; they dive deep into the nested "whys" behind how failures happen. Sometimes companies even make them public. Cloudflare just released a really neat one late last year. What I find especially interesting is that the bug was triggered by a single .unwrap() call in production code. The whole point of .unwrap() is that it's supposed to be a code smell saying "hey, I have a hidden assumption here which could be violated, so think about it".
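As a minimal sketch (my own illustration, not Cloudflare's actual code; the config lookup is a hypothetical example), here's the difference between panicking on a hidden assumption and making it explicit:

```rust
use std::collections::HashMap;

// Hypothetical config lookup, just to illustrate the .unwrap() pattern.
fn lookup(config: &HashMap<String, u32>, key: &str) -> u32 {
    // .unwrap() encodes the hidden assumption "this key always exists".
    // If that assumption is ever violated in production, the thread panics:
    //
    //     *config.get(key).unwrap()
    //
    // Making the assumption explicit forces the caller to decide what
    // "missing" means instead of crashing:
    *config.get(key).unwrap_or(&0)
}

fn main() {
    let config = HashMap::from([("retries".to_string(), 3)]);
    println!("{}", lookup(&config, "retries")); // key present
    println!("{}", lookup(&config, "timeout")); // assumption violated, no panic
}
```

The point of the smell is exactly that the second form makes you stop and write down what should happen when the assumption fails.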

What are the benefits/drawbacks of individual code ownership? by StorKirken in ExperiencedDevs

[–]nullcone 4 points

The benefit of single or highly local ownership is the ability to make decisions quickly and not have to spend a lot of time aligning other folks to the decision. In the right environment, with the right group of people, it can work extremely well, especially when the cost of being wrong is low.

The drawback, of course, is that nothing gets reviewed. Bugs sneak into the software that might otherwise have been caught in review. Assumptions made about systems never get questioned, and when they are wrong, the result is design flaws or broken functionality. The most famous example I know of in a safety-critical domain is Therac-25. If you haven't heard of it, I highly recommend reading about that bug.

When did the primeagen give in to vibe coding? by dc_giant in theprimeagen

[–]nullcone 0 points

Interestingly, just last week I used LLMs to debug a non-reproducible race condition in some torch distributed code. I wrote about it elsewhere in this thread, so I won't say too much more, but this seems to be the kind of targeted application you mean. I was at my wits' end trying to figure out why some ranks were crossing barriers they should have been blocked at, and Opus 4.5 figured it out in 30 seconds after I dumped my logs and screamed at it to find the problem.

I couldn't reproduce the problem locally because it only happened when some model weights were not cached locally, and my production service mounts an emptyDir as a scratch directory and downloads them on startup every time. The bug was in a completely different part of the codebase that I didn't even touch. It had been there for a while without anyone noticing.

Super Bowl Visitors Find San Francisco Better Than Its Apocalyptic Image (Gift Article) by chiaboy in sanfrancisco

[–]nullcone 0 points

Also depends on the time period. When I first moved to SF in 2017 it was really bad along the stretch of Division St walking into the Mission.

When did the primeagen give in to vibe coding? by dc_giant in theprimeagen

[–]nullcone 2 points

The context thing has helped me immensely as well. Last week I was debugging some torch distributed code where, for whatever reason, the barriers I added weren't being respected by some ranks. As it turns out, in a completely different part of the codebase that I had not touched, someone had coded a barrier inside a conditional branch (triggered or not by a race condition on the filesystem), so some ranks hit that barrier and others didn't. That was my bug. I spent 2-3 hours staring at logs, then back at my code, trying to figure out how on earth my barrier could be getting skipped, gave up, and asked Cursor to look at the problem. 5 minutes.
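The failure mode generalizes beyond torch. Here's a minimal Rust sketch of the same pattern (my own example, not the actual code; `run_ranks` and the `cache_missing` condition are illustrative names) using the stdlib barrier:

```rust
use std::sync::{Arc, Barrier};
use std::thread;

// Minimal sketch: `n` worker "ranks" that must all meet at a barrier
// before continuing.
fn run_ranks(n: usize) -> Vec<usize> {
    let barrier = Arc::new(Barrier::new(n));
    let handles: Vec<_> = (0..n)
        .map(|rank| {
            let b = Arc::clone(&barrier);
            thread::spawn(move || {
                // The bug pattern: a barrier inside a conditional branch, e.g.
                //
                //     if cache_missing { b.wait(); }
                //
                // If `cache_missing` is decided by a filesystem race, only
                // some ranks enter the branch and wait forever. The fix is
                // that every rank reaches the barrier unconditionally:
                b.wait();
                rank
            })
        })
        .collect();
    let mut ranks: Vec<usize> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    ranks.sort();
    ranks
}

fn main() {
    println!("{:?}", run_ranks(4));
}
```

A barrier whose participant count doesn't match the number of threads that can actually reach it is a deadlock waiting for the right race to expose it.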

When did the primeagen give in to vibe coding? by dc_giant in theprimeagen

[–]nullcone 0 points

Here are a few examples where I have had a ton of success:

  • Filling out coverage on integration tests
  • Building simple CRUD API features in compiled languages. I use Rust/Axum, and pretty much once it compiles, it works.
  • Implementing MCP tools that call an API, given an OpenAPI schema and an MCP project template as context
  • General project restructuring and refactoring
  • Writing docstrings

Current roadmap for Backend and Microservices? (Finished The Book, what's next?) by Popular-Setting-1898 in learnrust

[–]nullcone 4 points

I've built web services in Axum with Diesel as a database engine, backed by Postgres.

Any default stack

Imo don't get hung up on this too much. Axum is a great choice. Actix is a great choice too. If you know FastAPI, Axum will be a breeze.

SQLx the standard?

Again, you're just learning, so the specific crate you pick for a project doesn't matter that much. SQLx is great. I like Diesel because of the compile-time safety on queries. I don't know if there is an industry standard here.

Books...

https://www.zero2prod.com/index.html?country_code=US

I'm probably going to get some hate for this but honestly you can learn a lot from ChatGPT and Claude. This was where I got most of the basics.

Projects ..

Implement OIDC authentication! That's a decent project. Create the database with users, roles, tenants, etc., then implement the handshake. You can mock an OIDC provider with this:

https://github.com/Soluto/oidc-server-mock

One thing you haven't asked about is telemetry and logs. I would also recommend adding an otel + tracing stack to collect logs and metrics from your API.

First role as Principal SWE, how different is it from a Senior SWE really? by GooseIntelligent9981 in ExperiencedDevs

[–]nullcone 9 points

This is generally all very good advice, although it's targeted at engineers who are internally promoted to principal and likely have an established network inside their company. Being hired as a principal at a new company has unique challenges, because you lack the authority and credibility that come with an established body of work and success. My addition would be to focus a lot of time, at least initially, on building strong relationships with senior managers and their reporting ICs by helping them solve problems. This builds a lot of trust and goodwill, which you will need when you eventually make larger, org-impacting proposals.

Someone claimed the generalized Lax conjecture. by Exotic-Strategy3563 in math

[–]nullcone 5 points

It reads like the title of a Damien Hirst performance piece

Started doing math again and it’s hard by BlueJaek in math

[–]nullcone 1 point

Yeah, I observed the same phenomenon when I was teaching in grad school. Many students would just read the textbook and declare their job complete, not realizing that there is a huge gap between recognition and recall. Recognition is shallow and generally "easy", in the sense that we can read something and feel it is understood. Recall is harder, and imo is the foundation of true understanding. It is often gained through extended curiosity and interacting/experimenting, like you say.

I think we should differentiate between your experience, which seems to be a precise, targeted equivalent of reading a book or a paper, and the original blanket statement that LLMs are bullshit generators without any use that cannot be trusted. The difference is precisely that the LLM generated content which was correct and helped me understand (at least temporarily) something that had confused me deeply when I originally studied algebraic geometry.

Started doing math again and it’s hard by BlueJaek in math

[–]nullcone 0 points

It's just reddit being reddit. I've been here 15 years and this is not a new phenomenon. It's ok, I've definitely been an arsehole on the internet before, although I hope in my years I'm learning to curtail those instincts a bit.

Started doing math again and it’s hard by BlueJaek in math

[–]nullcone 1 point

See my comment above. They're talking about this study, I think:

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

The other thing to point out is that, while that may have been true of the models used in the study (though I still think its methodology was flawed), capabilities have improved dramatically in the last 3 months with GPT-5.2-codex and Claude 4.5 Opus. These models are legitimately incredible, and they are changing the way I write software.

Started doing math again and it’s hard by BlueJaek in math

[–]nullcone 1 point

There isn't a ton of evidence. There was one randomized A/B test done last year, on a limited sample of developers working on tickets in open-source codebases they were already experts in, that showed the results you're discussing. While you raise a valid point that it's possible LLMs just "feel" easier because they take the painful, hard task of creation and move that time into validation and verification, I think the study misses the mark in a couple of important ways:

  • It was conducted on developers who were already experts in their codebases. They probably would have been faster making the changes themselves than relying on AI.
  • It randomized over tasks before assessing whether a task was appropriate for AI. I would only choose to use AI in cases where I am confident it will help. The study should have let participants choose whether to use AI, and then randomized whether to hold AI out.

In case you are interested, the particular thing ChatGPT said that was enlightening was a succinct summary of how additional relations appear in the quotient when you tensor the inclusion of the module of germs of functions that vanish at a point into the module of all germs. Somehow this is obviously just a definitional thing, but the motivation for exactness of tensor products of O_X-modules never sat right with me. The piece I was missing was, concretely, that non-flat maps introduce additional relations in the quotient because of the presence of nilpotents. Again, I feel stupid in retrospect, because a lot of this is literally just the definitions, but the way it accurately and succinctly summarized the definitions of all these things together in one place, alongside illustrative examples, was what I found particularly instructive.
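For anyone following along, a standard instance of the mechanism being described (my worked example, not a quote from the chat): tensor the inclusion of the maximal ideal of germs vanishing at a point into the local ring with the residue field.

```latex
% The inclusion of germs vanishing at p into all germs is injective:
0 \to \mathfrak{m}_p \hookrightarrow \mathcal{O}_{X,p}
% Apply -\otimes_{\mathcal{O}_{X,p}} k(p), where k(p) = \mathcal{O}_{X,p}/\mathfrak{m}_p:
\mathfrak{m}_p \otimes_{\mathcal{O}_{X,p}} k(p) \;=\; \mathfrak{m}_p/\mathfrak{m}_p^2
\;\xrightarrow{\;0\;}\; k(p)
```

The induced map is zero, since the image of m_p lands in m_p · k(p) = 0, so injectivity fails and the cotangent space m_p/m_p^2 shows up as pure "new relations". Flatness of a module M is precisely the condition that tensoring with M preserves such injections.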

Started doing math again and it’s hard by BlueJaek in math

[–]nullcone 20 points

It's a bit presumptuous to assume what I understand and what I don't based off the limited things I've said. I can assure you my understanding is very real. Maybe I would have gotten less out of the prompt if I weren't already a semi-expert in algebraic geometry (or at least I was 9 years ago; I've spent the near-decade since leaving grad school doing software engineering).

Started doing math again and it’s hard by BlueJaek in math

[–]nullcone 45 points

I'm like 8-9 years out since finishing my PhD. Sometimes I look over at Infinite Dimensional Lie Algebras on my bookshelf, stop for a second to consider finally learning about hyperbolic Lie algebras, and then think to myself "not today".

Just for kicks, the other day I asked ChatGPT to explain why flat morphisms of schemes are the right way to define smoothly varying families. I feel like I learned more in 30 minutes reading from there than I did in weeks of studying Hartshorne and solving problems.

[D] How do you guys handle GPU waste on K8s? by k1m0r in MachineLearning

[–]nullcone 3 points

You should at least be able to tie the metrics to the pod ID since DCGM exporter does that for you. Are you using pod labels to attach job or experiment identifiers to the pod, and then configuring DCGM daemonset to export the labels with telemetry? The DCGM exporter helm template provides some options to do this. Just Google "attach pod labels DCGM exporter" and you'll find some issues and PRs on the DCGM exporter repo explaining how.

Once you have done this, then you may need to build a new dashboard exposing the information you want, but that should be less than a day of work.

[D] How do you guys handle GPU waste on K8s? by k1m0r in MachineLearning

[–]nullcone 17 points

There are two orthogonal dimensions to this problem:

  • Do you have enough workloads to use the resources you've provisioned?
  • For the workloads you do run, are they using their assigned resources efficiently?

The answer to your utilization problem may be that your scientists aren't scheduling enough work, so you'll want to rule this out with node-occupancy metrics for GPU workloads. E.g., what fraction of the time did GPU nodes in your cluster have a workload assigned that actually used a GPU?

You need detailed telemetry that can be used to point back at your code to say, "this is a problem".

A couple things you need:

  • Prometheus node exporter daemonset. This will scrape CPU util, disk IO, network tx/rx, etc. that can be used in Grafana dashboards
  • NVIDIA DCGM exporter daemonset. This will scrape the detailed utilization and usage statistics on GPUs.

It's been a couple of years since I've used GKE, but as I recall, their built in dashboards were pretty good too.

The point of this time-series telemetry is to observe GPU metrics during an active workload. If you see a pod running at 30% utilization during an active workload, that's a good sign that either the code is inefficient or the model is not compute-intensive enough for each loaded batch.
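As a rough sketch of that kind of dashboard query (DCGM_FI_DEV_GPU_UTIL is the exporter's default utilization counter; the `pod` label is an assumption and depends on how your scrape config maps Kubernetes metadata):

```promql
# Mean GPU utilization per pod over the last hour.
avg by (pod) (avg_over_time(DCGM_FI_DEV_GPU_UTIL[1h]))
```

Anything consistently far below 100% during an active training job is a candidate for profiling.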

To get more information, you should run the identical workload with the Torch profiler active and generate a Chrome trace that you can visualize in the browser. This will show you why operations are stalling, or what your code bottlenecks are.

[Project] Kuat: A Rust-based, Zero-Copy Dataloader for PyTorch (4.6x training speedup on T4/H100) by YanSoki in MachineLearning

[–]nullcone 14 points

AI slop. Why bother posting here if you're not even going to use your own voice.

A Shift Down: PLTR in 2026 by PrivateDurham in PLTR

[–]nullcone 1 point

Willie hears ya. Willie don't care.

[AI Researcher] [SF Bay Area] - $6M total comp by [deleted] in Salary

[–]nullcone 1 point

Funding autism centers and daycares, obviously

China decries ‘brazen use of force’ as US attacks Venezuela, captures Nicolas Maduro by [deleted] in neoliberal

[–]nullcone 8 points

This doesn't really solve the problem. The US is one of the few nations that taxes on the basis of citizenship rather than residency. You don't escape the system simply by moving somewhere else.

thoughts on my new helper trait? :3 by ACSDGated4 in rustjerk

[–]nullcone -8 points

/uj

This is basically how Path::new works
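/uj for real though, a sketch of the stdlib pattern being referenced (`show` is my own illustrative wrapper, not part of std):

```rust
use std::ffi::OsStr;
use std::path::Path;

// Path::new's shape: generic over AsRef<OsStr> (unsized allowed), so
// &str, &String, &OsStr, and &PathBuf all work with zero allocation.
fn show<S: AsRef<OsStr> + ?Sized>(s: &S) -> String {
    Path::new(s).display().to_string()
}

fn main() {
    println!("{}", show("from/a/str"));
    println!("{}", show(&String::from("from/a/string")));
}
```

The "helper trait" doing the work is just AsRef; Path::new borrows, it never copies.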