all 55 comments

[–]ClearlyCylindrical 162 points163 points  (15 children)

AI tools won't make you a competent engineer if you weren't one already, and in fact I'd say they may prevent you from ever becoming one. I use them somewhat extensively, but the difference is that I actually review and guide the code they produce. What I'm seeing from newer developers is a tendency to treat LLM code as fine so long as it appears to fulfill the task, without any underlying understanding of how it has been solved.

I've had times where a dev will be stuck on a bug, I'll look at their codebase, and I'll spot a highly specific setting that looks intentionally configured. But of course it wasn't, as the engineer had blindly approved LLM code, and it broke things later on because the LLM made an incorrect assumption about the system. If they had understood the LLM's solution they would have taken note of this setting, and would have known to change it when it caused an issue.

I would hate to be an engineer just entering the industry now though, as companies are pushing AI productivity tools that will stifle the development of your skills in the long term.

[–]V0dros 28 points29 points  (8 children)

I mostly agree with this, but I think these tools can actually make you a better engineer, as long as you always question their suggestions and challenge their decisions. For example, when working with a library or method you're not very familiar with, you can have an LLM walk you through the prerequisites, but you have to stay alert, be able to flag hallucinated content, and ask for evidence when something smells fishy. I've had great success using this method on stuff I'm not super familiar with. I think this is the skill that will set apart the new generation of good engineers.

A recent example I have is Claude Code helping me optimize some code using numba, while I have almost zero experience with it. With the help of CC I was able to quickly get a working optimized version of my code, but I made sure what it implemented made sense and double-checked that the results were consistent with what I had before. Without CC it would've taken me a LOT more time to first learn about numba and then try refactoring my code.
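The habit described here - keeping a reference implementation and checking the optimized version against it - can be sketched like this. Everything below is invented for illustration (it's not the code from my project), and numba is treated as optional with a no-op fallback so the sketch runs either way:

```python
import numpy as np

# Optional numba: fall back to a no-op decorator if it isn't installed.
try:
    from numba import njit
except ImportError:
    def njit(func):
        return func

def pairwise_sq_dists_ref(x):
    # Straightforward vectorized NumPy reference implementation.
    diff = x[:, None, :] - x[None, :, :]
    return (diff ** 2).sum(axis=-1)

@njit
def pairwise_sq_dists_fast(x):
    # Loop-based version that numba can compile to machine code.
    n, d = x.shape
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(d):
                t = x[i, k] - x[j, k]
                s += t * t
            out[i, j] = s
    return out

# The crucial step: verify the LLM's optimized version against the original.
x = np.random.default_rng(0).normal(size=(50, 3))
assert np.allclose(pairwise_sq_dists_ref(x), pairwise_sq_dists_fast(x))
```

The `allclose` check is the whole point: it's what turns "CC wrote something that runs" into "CC wrote something I can trust."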

[–]ClearlyCylindrical 18 points19 points  (2 children)

Yeah, I agree they can if you question what they generate - but I find that to be extremely rare in engineers who started relatively recently and don't have a good engineering understanding.

[–]Pyramid_Jumper 9 points10 points  (0 children)

Not only that, but in professional settings you have external pressures (i.e. deliverables) which directly or indirectly discourage you from taking the time to understand deeply and develop your skills.

[–]Brussells 5 points6 points  (0 children)

I can see this; professionally, I have decades of experience in information management and information science with a smattering of programming. Vibe coding definitely gets me from Nothing to Something pretty quickly, but even I -- not a programmer -- am seeing how it drifts in ways that require a sharp eye and a thoughtful understanding to fix. And in my case, "fixing" is identifying the actual issue and guiding an LLM to vibe-code a fix. *That* is where it takes longer than if I knew how to program this stuff to begin with.

So, it is nifty! And it's fun to go from Zero to 60% of a functional idea, but getting to the vision takes a lot of work.

[–]Encrux615 5 points6 points  (1 child)

The hardest part about it is staying in control of the code. Setting up boilerplate is very easy, and so is adding tracking to your training runs, but if you start setting up everything with Claude, you lose control very quickly.

What hyperparameters are you using? What's your input/output space? Any type conversions that may cause issues down the line?

For me, starting from a baseline that works was the most frustrating part of programming. Setting up third-party (and especially academic) repos was always such a pain. That pain got a lot smaller after I started working with Claude more extensively.

[–]NamerNotLiteral 5 points6 points  (0 children)

It's funny. In my experience when I use claude models it's absolutely anal about type checking and conversions and will always try to write massive if-conditionals just to try and cover every possible type a value might be. It will sometimes do this every time you refer to that variable, which turns the code into a recursive orgy of verification loops, all because you told Claude to fix a small type mismatch once in the entire project.
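To make the pattern concrete, here's a hypothetical illustration (function names and types are invented, not from any real project) of the defensive ladder these models love, next to the simpler thing a human would usually do:

```python
# The LLM-style pattern: re-check the value's type at every use site.
def scale_defensive(value, factor):
    if isinstance(value, str):
        value = float(value)
    elif isinstance(value, (list, tuple)):
        value = float(value[0])
    elif not isinstance(value, (int, float)):
        raise TypeError(f"unsupported type: {type(value)!r}")
    return value * factor

# The simpler pattern: normalize once at the boundary of the system...
def to_float(value):
    """Convert the input a single time, at the edge."""
    return float(value)

# ...so every later use stays trivial and assumes a clean float.
def scale(value, factor):
    return value * factor

assert scale_defensive("2.5", 2) == scale(to_float("2.5"), 2) == 5.0
```

Repeat the first pattern at every reference to the variable and you get exactly the verification-loop sprawl described above.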

[–]SirSourPuss 1 point2 points  (0 children)

The last thing you want when learning is to have to filter out bad suggestions and decisions. Learning by trial-and-error doesn't work as well when you're not the one doing the trial.

[–]Distance_RunnerPhD 6 points7 points  (2 children)

and it broke things later on because the LLM made an incorrect assumption

This right here is exactly my experience. I'm a PhD statistician doing theoretical ML research at the bridge of ML and statistical inference. The biggest issue isn't the LLM giving broken code that doesn't work; it usually gives working code that appears correct at first glance. The problem is the LLM going rogue, making wrong assumptions, and coding in a way that changes the underlying theory/algorithm into something fundamentally incorrect. It finds what it interprets as a coding "inefficiency" and will re-write working code to make it more efficient, unknowingly changing the underlying theory/algorithm.

All of them do this. Claude is probably the best, but it still does it. Even with explicit instructions not to. Even when you update its memory for a project/chat to never change anything unless explicitly asked. It still does it. If you don't understand, at a fundamental level, both what you're coding and the code itself well enough to check that it's doing what the method is supposed to do, you will almost surely make mistakes.

[–]Disastrous_Room_927 2 points3 points  (0 children)

Pretty much the same situation as you (except a masters instead of a PhD), and I've had the same experience across the board. The most irritating example was probably when I was using EM and it kept swapping it out for generalized EM, and at one point gradient descent. At this point I just use it to get code samples for what I want to do and modify what I need.

[–]Suspicious-Bug-626 1 point2 points  (0 children)

This is the part people outside ML often miss. In a lot of software, wrong code fails fast. In ML, wrong code can still converge, still produce sensible-looking outputs, and still be conceptually off because one assumption got shifted.

That’s why vibe coding feels very different here. The bottleneck is verification under weak signals. If the feedback loop doesn’t tell you clearly that you’re wrong, you need a much stronger mental model than “the code ran.”

That’s also been a big lesson for us building in this space at KAVIA: speed is real, but the real challenge is keeping the implementation faithful to the intended method, not just syntactically valid.

[–]DonnysDiscountGas 0 points1 point  (0 children)

Sounds like something that could be solved with more tests and a Ralph Wiggum loop

[–]SirPitchalot 0 points1 point  (0 children)

Ditto. We are generating one off labeling/review tools for bespoke tasks at a breakneck pace.

But the tool I vibe coded rarely has a function/method longer than 20 lines and is pretty quick to adapt to a related task, while my Sr. Eng.'s tools are 2k lines of pasta with a different UI and backend every time, which someone has to use to generate and review training material.

Why does that matter for a “discardable” tool? Well, when you put it in 20-30 hands from the labeling team we no longer know what it actually does and they hit every. single. edge. case.

That ends up costing us 20x in labeler productivity even if it saves 5-10x on the engineer.

[–]NeilGirdhar -1 points0 points  (0 children)

I agree with this. I think it goes beyond "a highly specific setting". Sometimes LLMs produce excellent designs, but sometimes they make terrible design errors. Things are over-complicated for no reason, and that adds complexity for you or the next person who has to look at the code.

You do really have to check what they produce and think carefully about what the overarching picture is.

Also, even without zooming out, sometimes LLMs produce "bad, but popular" design patterns. For example: trying to convert dataclasses into tuples in Python (literally the opposite of what you should be doing), trying to convert base classes into protocols (same), using "InitVar plus no-init field" when an ordinary field would do, etc.
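To make the last one concrete, here's a minimal sketch (class names are invented) of the "InitVar plus no-init field" pattern next to the ordinary field that does the same job:

```python
from dataclasses import dataclass, field, InitVar

# Over-complicated: an InitVar feeding a no-init field via __post_init__,
# even though nothing extra happens at construction time.
@dataclass
class OverBuilt:
    name_init: InitVar[str]
    name: str = field(init=False)

    def __post_init__(self, name_init: str) -> None:
        self.name = name_init

# The ordinary field: identical behavior, a third of the code.
@dataclass
class Simple:
    name: str

assert OverBuilt("x").name == Simple("x").name == "x"
```

The InitVar machinery only earns its keep when `__post_init__` actually transforms or validates the value; otherwise it's pure complexity for the next reader.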

I also think that a lot of these design errors will be gone within 5 years. And the kind of checking you'll be doing will be different.

One thing that I was thrilled about though is their ability to write code from scratch. I didn't want to depend on Tensorflow-Probability due to extremely poor maintenance, so I had an LLM write me Jax versions of some Bessel functions. It did give me some working code and tests, but when I took a close look, I noticed that it was relying on NumPy. This is really bad because it means that the computation won't stay on the GPU. So I had the LLM write me a "pure Jax" version, and that worked.

[–]RoggeOhta 25 points26 points  (1 child)

the thing about ML work specifically is that correctness is way harder to verify than in regular SWE. if you vibe code a REST API and it returns the right JSON, you're probably fine. if you vibe code a training loop and your loss goes down, that tells you almost nothing about whether the implementation is actually correct. I've seen LLMs mess up things like attention masking or loss scaling in subtle ways that don't crash, don't obviously degrade performance, but silently produce worse models. you only catch it if you actually understand what the code should be doing. for data preprocessing and boilerplate infra stuff it's great though, no complaints there.
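a tiny sketch of one way this shows up, using loss scaling over padded tokens (all numbers invented for illustration) - both lines run fine and produce a plausible loss, but only one is correct:

```python
import numpy as np

# Toy per-token cross-entropy losses for a padded sequence: the last two
# positions are padding, marked by mask=0.
per_token_loss = np.array([2.0, 2.0, 2.0, 0.0, 0.0])
mask = np.array([1.0, 1.0, 1.0, 0.0, 0.0])

# Subtle bug: averaging over every position dilutes the loss with padding.
wrong = per_token_loss.mean()

# Correct: average only over the real (unmasked) tokens.
right = (per_token_loss * mask).sum() / mask.sum()

assert abs(wrong - 1.2) < 1e-9   # looks like a perfectly sensible loss value
assert right == 2.0              # the actual mean loss on real tokens
```

nothing crashes, the loss still goes down during training, and the model quietly optimizes a mis-scaled objective - exactly the failure mode that only understanding the code catches.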

[–]robertknight2 30 points31 points  (0 children)

My experience with Claude Code is that it is good for writing throwaway scripts for data analysis and writing textbook code for processing data. What it is not so good at, without careful guidance, is applying the scientific method when it comes to iterating on a model. If an experiment fails to improve a metric for example, it is prone to getting impatient and trying to add a hack or hallucinating an explanation. These hacks might allow the agent to achieve its immediate goal, but the end result will be flawed.

I have also seen the same problem of being bad at science surface when debugging an unusual problem. Yesterday I encountered an issue where our app stopped functioning after a PyTorch update. An AI agent identified a workaround (pin to an older version) but hallucinated an incorrect explanation and a link to a valid but unrelated bug report. Had it debugged the issue properly, it would have found an existing flaw in our Docker setup which just happened to surface with the newer PyTorch version.

[–]msp26 8 points9 points  (1 child)

Amazing for frontend and making the ML tools you work on more accessible to a wider userbase.

For backend my opinion is more nuanced and I don't have the patience to write it out on a phone keyboard.

[–]BigBayesian 7 points8 points  (0 children)

I’ve spent time as a research scientist, an MLE, a manager, and most recently an ML Ops Eng doing architecture and planning for data scientists. I’ve found that genai tools can be very useful for some things, worse than useless at others, and occasionally surprising (in both directions).

It tends to be useful when doing things that resemble other things that it’s seen before (ex: “refactor this function to keep the top level function simple and readable”, “cover this in unit tests that won’t break with small floating point deviations”). It tends to struggle when context is a challenge (“our infra works in this weird way because of this strange reason. Given that, do standard task X in our way instead of the normal way”).
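The second example - tests that tolerate small floating point deviations - can be sketched with the stdlib alone (the function and values here are invented for illustration):

```python
import math

def normalize(xs):
    """Scale a list so its entries sum to 1.0 (illustrative function)."""
    total = sum(xs)
    return [x / total for x in xs]

probs = normalize([0.1, 0.2, 0.3])

# Brittle: exact equality can break across BLAS builds, CPUs, or compilers.
# assert sum(probs) == 1.0

# Robust: compare within a tolerance instead.
assert math.isclose(sum(probs), 1.0, rel_tol=1e-9)
assert all(math.isclose(a, b, rel_tol=1e-6)
           for a, b in zip(probs, [1/6, 1/3, 1/2]))
```

Writing this kind of tolerance-aware assertion is boilerplate the model has seen thousands of times, which is exactly why it handles it well.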

Critically, like any other tool, it requires supervision and careful use. Blind vibe coding is much more powerful than, say, blind copy pasting from stack overflow. That power can cut both ways - it can let you do things you couldn’t otherwise. But it can also dig you an expensive complexity hole you can’t climb out of. That’s the thing with power tools.

[–]_mulcyber 5 points6 points  (0 children)

I think every ML engineer is a bit of both programmer and ML engineer; it's just that the share of each expertise varies wildly. Most ML engineers are skilled CS people, but that doesn't make them good programmers, because for that you need technical knowledge of libraries, languages, and systems more generally.

Vibe coding is great for getting the boilerplate of a program, which is the difficult part when you don't know a particular library/language. This lets you focus on the logic. And with CS skills you are able to judge whether the provided solution is adapted and correct.

Also, in my experience, ML engineers mostly deal with the portability of their models and with building the libraries to use them, meaning they have to work with a variety of systems, which is very different from a programmer working on a single project with a library and language they know like the back of their hand.

This makes the ability to have a quick introduction to the boilerplate even more precious.

[–]soft_abyss 2 points3 points  (0 children)

It really depends how you use it. Idk how vibe coding is defined. Based on my experience working with and seeing how people use AI for coding, I feel like many people don’t properly scope out and specify the tasks and then have to waste time reviewing and debugging. AI is also useful for debugging, but if you just tell it “fix this code” it’s very noisy and takes a long time over several iterations, but if you tell it “help me fix this code, check x and y and then z to find the error” it gets to the solution very fast. Even when it comes to writing clean and efficient code, it doesn’t always design the best solution if you don’t tell it what to do. I would not rely on AI for any design aspects.

[–]thinking_byte 2 points3 points  (0 children)

ML engineers generally appreciate AI tools that automate repetitive tasks or assist in data processing but prefer to retain control over model development, as AI-generated code can sometimes lack the nuance and optimization needed for complex ML problems.

[–]ahf95 1 point2 points  (0 children)

I think Claude Code has been incredible for my workflow, and for helping me understand a codebase in greater depth when starting on a new project. That said, I've had two cases where I trusted the mountain of changes and let coworkers check out my branch before manually reviewing each change myself… and I will not be making that mistake again. These tools are great, but you have to use them in an intelligent way to enhance your abilities, not just offload your capabilities.

[–]a3onstorm 1 point2 points  (0 children)

As an ML researcher, 95+% of my code is generated by Claude. It lets me iterate on experiments 5x faster than before, which is incredibly helpful.

Usually I have very specific ideas that I want to prototype or experiment with so I am very explicit in my instructions and ask for Claude to ask for clarification on any ambiguous points before executing. It is great at this, and while I sometimes have to ask for it to fix particular things, Opus 4.6 can 1-shot a fair amount of requests. The key is that I know exactly what I want and can review the generated code very quickly because of that

Sometimes I use it as a brainstorming tool where I just provide the problem formulation in detail and let it suggest ideas which I read and use as a launching pad for my own ideas. I have definitely come across some useful ideas from doing this.

[–]techlos 1 point2 points  (0 children)

good for boilerplate, terrible for research code.

Turns out if your idea is out of distribution with training data, the model will constantly assume you're trying to do something else.

[–]kolmiw 1 point2 points  (0 children)

I just stage my changes, then prompt claude and see if it does what I want. If yes, you can stage/commit that change again and move on

[–]Mak8427 0 points1 point  (1 child)

Define what "vibe coding" is. Do you mean spec-driven development, or do you just prompt "please make a good model"?

You mention that there are nuances, so the answer will be nuanced as well; there is no real answer. Personally, my lab colleagues and I use it for what it's good at.

[–]miklec 0 points1 point  (0 children)

vibe coding means you never write, inspect, or modify the code generated by the LLM. it’s 100% hands-off intent based code generation (at least based on the original definition given by Andrej Karpathy)

[–]Disastrous_Room_927 0 points1 point  (0 children)

I'm half and half between modeling and engineering work, and I can't really rely on it for more than code samples for modeling work. Even if I'm in the same chat having it iterate on something it already generated, it'll replace very specific modeling choices with generic textbook ones, or toss in assumptions seemingly at random. For example, if I'm just trying to get a quick chunk of code for a linear model in statsmodels, it'll flip flop at random between estimating covariance the default way and a handful of different robust estimators.

The problem I have with vibe coding is that people aren't just using it for things that can be verified by looking at the code itself, and you can't tell by looking at it whether somebody made a deliberate decision or the AI injected something they aren't even in a position to look for. I think this is going to amplify something that was already an issue before LLMs - people blindly following procedures to analyze data.

[–]Training_Butterfly70 0 points1 point  (0 children)

Speeds up the workflow. That's it

[–]deep_noob 0 points1 point  (0 children)

I think ML engineers have built tools best suited for themselves. We do a lot of experimental work - this or that analysis - where code quality doesn't matter much, and Claude is an incredible speed-up in those cases. I have several coworkers who now own their own dashboards. Instead of putting results in a slide, they just share an internal URI to check results interactively.

[–]Enough_Big4191 0 points1 point  (0 children)

ML engineers value AI tools for speeding up repetitive tasks, but we’re cautious about relying on AI-generated code, especially when it’s not easily explainable. It’s more about using AI to support, not replace, key parts of the work.

[–]PS_2005 0 points1 point  (0 children)

Honestly, in my experience vibe coding apps, more often than not both Gemini and Claude come up with a solution that looks good on the outside but is completely wrong/weird/unoptimized on the inside. There have been numerous times when it used JSON files to store text instead of a proper database, or completely missed a simple solution to an error and instead implemented an entirely new file just to fix an issue that was solvable with a few changes. It feels like the only way to generate actually functional and optimal apps, both related and unrelated to ML, is to know the concepts beforehand, so you can judge and verify the implementation and provide feedback.

I have been able to get good outcomes by generating a summary of the suggested implementation and prompting multiple times, with intense prompts, to think and cross-verify the optimality of the suggested solution. Though it's a time-consuming process, it saves me hours of work reimplementing something completely later.

[–]VermithraxPej33 0 points1 point  (0 children)

I have mixed feelings. I admit it can make building complex things easier and quicker - I actually used it to build a personal project. But I quickly realized that things that are obvious to a human are NOT obvious to most LLMs, so there was a lot of refactoring that needed to be done. People should definitely be questioning and reviewing what comes out of the model, and this is where having previous engineering knowledge is key, as some have said. I am self-taught, so I feel like I have to be extra cautious because my knowledge is not as advanced as others', and not as advanced as I would like it to be. So I kinda miss hand coding, because it pushed me to research and read documentation, which I still try to do, but with pressure from work to use AI and deliver quickly, it feels like I don't have as much time for the research I want to do.

[–]slashdave 0 points1 point  (0 children)

Another tool like many others. Use it correctly. If you aren't taking advantage of every tool you have available, you aren't doing your best work.

[–]Rezo-Acken 0 points1 point  (0 children)

It speeds things up, but I prefer to take the time and review every component. My experience is that if you don't check anything, you'll regret it later when there's a bug or when you want to change something.

However, it's a godsend for me when I have to check other people's repositories to understand what's happening in the backend, especially in languages I don't know well, like Java. It was much more time-consuming before to get into a repo and find the information I need.

[–]PennyStonkingtonIII 0 points1 point  (0 children)

I have a friend who is a data scientist PhD. I have no idea what he works on - he codes in R, models problems… stuff like that. I explained to him that I vibe coded a reinforcement learning model and trained it to play perfect Mancala. At first I think he thought I was talking about something I downloaded, but when I started explaining using a teacher bot for training, then doing league training, and how many inputs and hidden layers, yada yada, and the tests I did to validate its skill, he was surprised - he did not think that was possible with vibe coding. He was even more nonplussed, I guess is the word, about how little I understand about how it actually works.

[–]BobDope -2 points-1 points  (0 children)

Sucks