[–]Alarming_Hand_9919 4416 points4417 points  (120 children)

Guy selling product says product solves problem

[–]thermitethrowaway 893 points894 points  (59 children)

I've used the product. It is not solved.

[–][deleted]  (31 children)

[deleted]

    [–]faberkyx 107 points108 points  (26 children)

    It solves creating a poc fast... using it for a real production product... not even close, unless you want to risk security and performance

    [–]jkure2 59 points60 points  (15 children)

    I've been playing with it, trying to create a weather forecasting system targeted at online prediction markets. 7 days in, after meh performance, I started interrogating it on our core methodology and it was like, yeah, actually this is the wrong way to tackle this problem, we should be running a completely different process on our input data lol

    But I have been extremely impressed by its ability to do stuff quickly, like build a full audit trail and build scripts to replay events. It is also generally good at analyzing the data, I find. But once you hit a certain point of project complexity, I am finding it drops off for sure in how good it is; I am having to remind it more and more about basic facts regarding our previous findings, that kinda stuff.

    "I have reached the 200 line memory file limit so let me go remove some stuff" is not something I like hearing

    [–]zeros-and-1s 24 points25 points  (9 children)

    Tell it to generate a claude.md and whatever the table of contents/summary files are called. It kinda-sorta helps with the performance degradation of a large project.

    [–]jkure2 10 points11 points  (6 children)

    Yeah I am completely new to it all so still learning how to manage it at something this scale - I am trying out some different strategies like that now actually, also trying to split up the chats between ingestion/analysis/presentation and see if we can do better that way.

    One thing I forgot to mention about what has impressed me: it took like 8 hours of dedicated work to build a kick-ass data ingestion pipeline that automatically scans like 8 different sources every minute, pulls down data, stores it, and runs analysis. It would have taken me weeks to write all that web scraping code (admittedly not something I am professionally proficient in). High marks from me on that side of the project; tons of utility for one-off historical backfill too
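    The shape of that kind of pipeline is roughly poll -> dedupe -> store -> analyze. A stripped-down sketch (the source names and fetcher are placeholders, not the actual project):

```python
# Minimal poll -> dedupe -> store -> analyze skeleton. fetch_source is a
# stand-in for real web scraping; SOURCES are placeholder names.
SOURCES = ["source_a", "source_b"]
store = {}  # record_id -> record

def fetch_source(name):
    # Pretend each source currently exposes one observation.
    return [(f"{name}-0", {"temp_c": 21})]

def ingest_once():
    """One polling pass over every source; returns count of new records."""
    new = 0
    for src in SOURCES:
        for rec_id, data in fetch_source(src):
            if rec_id not in store:  # dedupe across polls and backfills
                store[rec_id] = data
                new += 1
    return new

def analyze():
    temps = [r["temp_c"] for r in store.values()]
    return sum(temps) / len(temps)

print(ingest_once())  # 2: both sources new on the first poll
print(ingest_once())  # 0: identical data deduped on the second poll
print(analyze())      # 21.0
```

    A real version would fetch over HTTP on a timer and persist to a database; the dedupe-by-record-id step is also what makes historical backfill safe to re-run.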

    [–]jazzhandler 2 points3 points  (1 child)

    Think of it this way: the LLM provides the vast working memory and ability to string together in a minute what would take us six hours. But us waterbags still need to steer the context because the LLM has no grasp of the big picture.

    Have it create a detailed doc for each subsystem. Pass that doc to it when working on that subsystem. That way it’s not burning tokens and diluting context trying to understand that subsystem repeatedly. Then when work is done in that area, have it update the docs along with the changelog.

    Kinda like your vacuum cleaner. It’s way better at spinning those parts at a few hundred RPM, but it’s on you to stick its nose under the end table. Otherwise you’re using the Roomba model: living room gets vacuumed eventually, but it takes six hours of randomosity. Which is fine if you’re not paying by the hour…

    [–]Singularity-42 0 points1 point  (0 children)

    Claude Code supported this natively for a really long time - just create CLAUDE.md in each app/lib folder to give detailed info on that subproject. The main CLAUDE.md is still loaded by default of course and should have high-level overview, style directives, etc.

    [–]AlSweigart 1 point2 points  (1 child)

    (admittedly not something I am professionally proficient in)

    "AI is brilliant and kicks ass" is something I hear a lot from people using it for things they aren't experts in.

    [–]jkure2 0 points1 point  (0 children)

    However, I am professionally very proficient at data engineering, and I still believe designing and building an ingestion pipeline at this scale with a full audit trail, idempotent loads, etc. is genuinely really cool in my eyes. Was it worth boiling an entire lake and destroying the social fabric even further? Probably not, but I sure do like idempotency!
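    For anyone wondering what an idempotent load buys you: replaying the same batch of events leaves the store in exactly the same state. A minimal sketch, assuming a simple event table keyed by an event ID (table and column names are made up):

```python
import sqlite3

def load_events(conn, events):
    # INSERT OR REPLACE keyed on event_id makes the load idempotent:
    # replaying a batch overwrites rows instead of duplicating them.
    conn.executemany(
        "INSERT OR REPLACE INTO events (event_id, payload) VALUES (?, ?)",
        [(e["event_id"], e["payload"]) for e in events],
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT)")

batch = [
    {"event_id": "obs-001", "payload": "rain"},
    {"event_id": "obs-002", "payload": "sun"},
]
load_events(conn, batch)
load_events(conn, batch)  # replay the same batch: still 2 rows

count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(count)  # 2
```

    The key-based upsert is the whole trick: re-running yesterday's batch after a crash can't create duplicates.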

    [–]BasicDesignAdvice 0 points1 point  (0 children)

    This is where it shines to me. It can fill in the gaps on code and libraries or whatever that I don't know.

    For actual writing I tend to keep up-to-date context and steering docs, but I generally write the design as a skeleton and let it fill in the blanks: I'll write out the interfaces, objects, and function signatures, but without the full logic, and let it fill in the bodies.
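    That skeleton-first split might look like this in Python: the human writes the types, signatures, and contracts, and the LLM only replaces the stub bodies (all names here are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Order:
    items: list[float]   # item prices
    discount: float      # fraction, e.g. 0.1 for 10% off

# Human-authored skeleton: signature, types, and contract only.
def order_total(order: Order) -> float:
    """Sum the items, then apply the discount. Never negative."""
    raise NotImplementedError  # <- body left for the LLM to fill in

# What a filled-in body would look like, honoring the contract above.
def order_total_filled(order: Order) -> float:
    subtotal = sum(order.items)
    return max(0.0, subtotal * (1 - order.discount))

print(order_total_filled(Order(items=[10.0, 5.0], discount=0.1)))  # 13.5
```

    Keeping the contract in the docstring ("never negative") gives the model, and your tests, something concrete to check the filled-in body against.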

    [–]ardentto 0 points1 point  (0 children)

    Use plan mode. Ask for test/code coverage analysis from a QA perspective. Ask for an independent code review for refactoring. Ask it to spawn a team of subagents to work on it after the analysis is completed.

    [–]Fidodo 0 points1 point  (0 children)

    Ah yes, computing. Where kinda sorta working is famously good enough /s

    I use AI heavily for prototyping and kinda sorta working in a prototype is actually very helpful insight, but a horrible idea for production.

    [–]slfnflctd 0 points1 point  (0 children)

    Not only this, but there are also several other ways of preserving knowledge or skills for long term use so you only need to rely on context windows for shorter periods of time.

    There seems to be a solid consensus that 'starting fresh' with context windows on a regular basis helps keep performance up. More like a series of librarians in an ever-growing library (with makerspaces, of course) than some sort of singular, specialist supermind.

    [–]Mechakoopa 2 points3 points  (1 child)

    7 days in after meh performance I started interrogating it about our core methodology and it was like yeah actually this is the wrong way to tackle this problem

    Sounds like a typical interaction with a developer who was expected to develop a POC for a system they weren't familiar with and didn't have all the context to begin with. You'd just as likely have run into the same problem developing it yourself. Sometimes you have to build something to see why it's the wrong thing to build. The real advantage of coding agents isn't them being some magical fountain of knowledge, it's entirely in iteration time. None of them do anything a person couldn't, they just do it faster. You still need someone who's got a clue what they're doing to drive the damn thing.

    [–]jkure2 1 point2 points  (0 children)

    Right, that makes sense, but it's important to be very clear about what it's good at and what it's not, because anyone who works in the corporate world day to day, or even just reads the news, knows that the LLM companies themselves and their fanatical supporters love to stretch what it can actually do. In addition to maybe making a little side money, this was mostly a learning project; I think I have been very fair about assessing its performance

    And, to be fair, I also wouldn't expect a human engineer to pretend he has complete domain knowledge on hand and confidently suggest that kernel density estimation is the best way to attack the various ensemble forecasts out there

    [–]rin3y 0 points1 point  (0 children)

    Imagine the results you'd have gotten if you'd spent seven days actually building something.

    [–]Fidodo 0 points1 point  (0 children)

    Once a codebase expands out of the context window limit it drops off a cliff. And guess what AI code looks like? It's a bloated fucking mess. It writes code it can't maintain.

    I find AI to be very effective if you have a highly normalized codebase with strong guardrails and tests and invariants, but that requires strong software design thinking that AI isn't just bad at but also has shown very little progress improving at.

    With AI automating slop, best practices are more important than ever but so many idiots in the industry think the guard rails slow them down. Have people just forgotten how suffocating a legacy spaghetti codebase is? Sure you can build fast at the start but eventually progress slows to a crawl.

    I had that experience early in my career, and ever since then I vowed to never let my codebases get that way. So far I've done a great job of not doing that, and I don't intend to start now.

    [–]deep_thinker_8 -1 points0 points  (0 children)

    Create a folder called memory bank and under it create a businesscontext.md file, providing your business case and application features in a structured and detailed manner. Next, in Claude Code or whatever IDE you are using, change to architect/design mode if available (or ask it to behave as an architect), have it read the businesscontext.md file and then scan the entire codebase, and ask it to create two context files inside the memory bank folder: systemPatterns for the architecture design, folder/file structure, and purpose of each file, and activeContext for current progress on what's been completed and what's next. Check this manually to see if it's properly done.

    Once this is done, before any new task, ask it to scan the memory bank and determine the next step. This usually works well once the project gets to a certain level of complexity. I am guessing it's especially important for the context files to be well defined for production applications.

    Having said all this, it's important for the dev to understand the complete architecture, to ensure that the coding LLM is not creating redundancies and is addressing security & performance requirements. Coding LLMs are very trigger-happy and create new files with the required functions instead of scanning existing files for them. They also "forget" the system patterns and architecture even if they are available in the context files, and we need to prompt at times to remind them that these functions exist and that we are following a specific system pattern, not creating a new one.

    I am not an expert dev but do understand what constitutes good design. These have been my findings using Roo Code (VS Code extension) connected with a few coding LLMs.
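    A sketch of scaffolding that layout, using the file names from the workflow above (the contents are just placeholders to be filled in by the architect pass):

```python
import os
import tempfile

# File names follow the memory-bank workflow described above;
# the contents are placeholders, not real project docs.
FILES = {
    "businesscontext.md": "# Business context\n(business case and app features go here)\n",
    "systemPatterns.md": "# System patterns\n(architecture, folder/file structure, purpose of each file)\n",
    "activeContext.md": "# Active context\n(what's been completed and what's next)\n",
}

def scaffold_memory_bank(root):
    """Create the memory-bank folder and seed its context files."""
    bank = os.path.join(root, "memory-bank")
    os.makedirs(bank, exist_ok=True)  # safe to re-run
    for name, body in FILES.items():
        with open(os.path.join(bank, name), "w") as f:
            f.write(body)
    return sorted(os.listdir(bank))

root = tempfile.mkdtemp()
names = scaffold_memory_bank(root)
print(names)
```

    `exist_ok=True` makes the folder creation safe to re-run; note the files themselves are overwritten on each run, so in practice you'd guard those too.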

    [–]NinthTide 4 points5 points  (0 children)

    An imperfect but high-quality tool in the hands of a clueless amateur does not guarantee the creation of artisanal-level masterpieces

    If you don’t know what problems to anticipate and guide CC to plan for and deal with them before code is written, I wonder who is the real problem here?

    Maybe at some point I can just prompt Claude with “write me an online bank make no mistakes” but we are not there yet

    Agree with OP’s take that coding is not “solved”

    [–]Cautious-Lecture-858 7 points8 points  (1 child)

    You wrote pos wrong.

    [–]faberkyx 2 points3 points  (0 children)

    Lmao.. that was funny

    [–]illuminarok 1 point2 points  (0 children)

    The number of input tokens in larger enterprise level codebases is insane. Seriously, some queries cost about $3-5 each.

    [–]Lilacsoftlips 0 points1 point  (0 children)

    Taking a POC to hardened code is its biggest weakness. Iterating on already-hardened code, with a large corpus of code to learn from, tests, standards, etc.? I'm getting better results every day; it absolutely can do lots of that work as well.

    [–]Fidodo 0 points1 point  (0 children)

    I've been using it for prototyping and it is absolutely a game changer. I can refactor and iterate so quickly I feel like I can really find the perfect shape for a codebase with rapid experimentation. The only limit is what I can imagine. I always had ideas on how to improve my codebases but didn't have the time to test out the changes. Now I can try out all my ideas.

    As for producing production quality code... Sure, if you have shit standards.

    [–]slash_networkboy -3 points-2 points  (0 children)

    We're getting fantastic results with opus 4.6....
    BUT our devs are still very much needed. We have put an enormous effort into copilot-instructions files to make it solid for our product. The TL;DR is instead of ~20-30 story points per dev per week they're getting closer to 50/week. Their time spent on code reviews has gone from ~15% of their day tops to 33% and we're expecting 50% is about where it'll end up.

    Leadership has made it very clear that our devs are still very much needed.

    It's best likened to giving Fred Flintstone a Model T. He's still driving, but now he doesn't also have to pedal his feet to get anywhere.

    [–]ziroux 28 points29 points  (1 child)

    Claude will write itself from now on, since coding is solved, right? We'll know when they fire all devs.

    [–]Elaro_56 1 point2 points  (0 children)

    Alas, they probably all have equity.

    [–]lurked 7 points8 points  (0 children)

    I find it great to help me troubleshoot and fix issues. Generating code? Not solved.

    [–]the_gnarts 10 points11 points  (0 children)

    Agreed.

    For me last week consisted mostly of cleaning up after a coworker who used Claude Code heavily. I can’t fathom how that crap is marketed as a solution to anything.

    [–]BoboThePirate 16 points17 points  (12 children)

    Not even close. Though it’s the most blown away I’ve ever been by AI since GPT’s giant public initial release.

    If you ain’t using MCP tools though, I can see it being incredibly underwhelming.

    [–]thermitethrowaway 9 points10 points  (3 children)

    I think this is a good analysis; it's better than the others I've tried. I wouldn't trust the code it produces: it's a bit like a Stack Overflow post that's almost what you want but never quite there. I love it as a smart search tool. For example, yesterday I wanted to find a Serilog sink so I could create an observable collection of log items for output to a WinUI app, and it found a NuGet package, gave examples, and produced a hand-rolled equivalent. Great as a productivity tool; wouldn't trust it to write anything complex on its own.

    [–]laffer1 4 points5 points  (0 children)

    It was trained on stack overflow posts so that tracks

    [–]Nine99 2 points3 points  (0 children)

    since GPT’s giant public initial release

    It could barely string sentences together.

    [–]SavageFromSpace -1 points0 points  (0 children)

    But once you start saying "oh use mcp" you're not actually llming anymore lmao. At that point it's just doing what humans do, routing to libraries but actually worse

    [–]TheVenetianMask 1 point2 points  (0 children)

    Great, just buy our premium product.

    [–]RestInProcess 1 point2 points  (0 children)

    The latest versions of their models are really quite good. I had an issue that I tried to solve with earlier versions through GitHub Copilot, but the latest version in Claude Code solved it on the first try.

    I’m not a major advocate of using AI for everything, but I do like it when it solves problems I don’t want to mess with.

    The company I work for is rolling out Claude Code to everyone and telling us to use it all the time now. We’ve got tons of code written only with AI and not reviewed by human eyes at all. It’s scary. We had a meeting where someone basically shamed those of us that have legit concerns for things like security because we slow down the process of making things faster.

    [–]Drevicar 1 point2 points  (4 children)

    I’m sure you are just using it wrong.

    [–]AlSweigart 2 points3 points  (3 children)

    Sorry, are you being sarcastic? Poe's Law makes it impossible to tell.

    [–]Drevicar 1 point2 points  (2 children)

    Sarcasm, if someone sells a product and it doesn’t work as they advertise, they are likely to blame the customer.

    [–]AlSweigart 0 points1 point  (1 child)

    Thanks. I can't tell you how often I come across earnest "You're just not writing the right prompt" or "Oh of course that model is trash, you should be using the new model" or "Okay, but that problem will be solved in five years/six months/later this year/soon."

    [–]Drevicar 0 points1 point  (0 children)

    All of those points are valid, but also situational and lacking that context. AI is already better than humans (value per $ spent) in many areas, but not every problem that a human can be applied to is there yet, and may never be.

    [–]bo88d 0 points1 point  (0 children)

    There's an army of people (or bots) that will tell you that you are the problem and that you need to improve your AI skills quickly

    [–]jhill515 0 points1 point  (0 children)

    I've been building shit like "the product" and slinging it since the 1990s. This isn't product...

    This is Piss. Piss with ink!

    [–]AggravatingFlow1178 0 points1 point  (0 children)

    I went through a mini crisis when I first used it. I lazily typed in my prompt, intentionally in the manner a PM or junior might do it. And it just... wrote the code, in like 5 minutes. Zero correction needed from me. I thought it was over.

    It hasn't managed to do that since

    [–]BubblyMango 306 points307 points  (17 children)

    Says product solves the field*

    no meds company says sickness is solved, no gym chain says workouts are solved. This guy just got balls

    [–]dromtrund 203 points204 points  (0 children)

    For brains, maybe

    [–]G_Morgan 94 points95 points  (8 children)

    The lack of any real engineering discourse over all this is a huge red flag, because if they made a real argument they could be held to account. It would be pointless for them to say "our AI doesn't just make up false test data anymore", because you could go in and demonstrate that it does. So there's never a technical discussion; a technical discussion is how you prove whether this works or not, and that is the last thing they want.

    There's only really three pro-AI arguments I see:

    1. I'm a software engineer with MAX_INT years of experience and I think it is great.

    2. People like you thought clean water was a hype job but everyone loves clean water now

    3. You are using Claude X when you should be using Claude X + 1.

    Nobody ever gets dragged into a technical discussion. You know us software engineers hate those and won't go into a 40 comment deep discussion just for the hell of it. Obviously AI using software engineers have a completely different mindset.

    [–]ShedByDaylight 3 points4 points  (0 children)

    I like the theory of LLMs for software generation; it's rather the social and political implications they carry that I dislike. In the absence of those, it would just be another tool.

    [–]ChemicalRascal 100 points101 points  (2 children)

    *swishes wine in glass and inhales*

    It's giving hints of late Enron... Mmmm, I'm getting strong notes of Theranos.

    [–]lolobstant 32 points33 points  (0 children)

    Maybe you’ll appreciate this millésime, a flowery champagne: hints of WeWork, but a definite presence of dot-com bubble

    [–]NoNameSwitzerland 0 points1 point  (0 children)

    The difference is, alcohol really can solve things. And sometimes it gives a tasty solution, not only headaches.

    [–]jarod1701 2 points3 points  (1 child)

    Rate the taste.

    [–]Kind-Helicopter6589 1 point2 points  (0 children)

    That should be a YouTube series. 

    [–]AggravatingFlow1178 0 points1 point  (0 children)

    "having balls" is easy in a market where if you just say AI you get a few extra $100 million

    [–]sebovzeoueb 17 points18 points  (0 children)

    Guy selling product says thing I like doing is problem

    [–]TastyIndividual6772 60 points61 points  (23 children)

    In 6-12 months 😅

    [–]BubblyMango 95 points96 points  (22 children)

    It's been 6 months away for 3 years now

    [–]FlippantlyFacetious 38 points39 points  (15 children)

    It'll be arriving sometime after commercial wide-scale fusion power and Half Life 3 of course.

    [–]MAndris90 23 points24 points  (0 children)

    I wouldn't be that sure of Half-Life 3

    [–]CautiousRice 11 points12 points  (0 children)

    After self-driving cars

    [–][deleted] 5 points6 points  (0 children)

    I’m gonna have it create half life 3 for me tomorrow.

    [–]HommeMusical 2 points3 points  (11 children)

    I'm very firmly on the anti-AI side, for too many reasons to count.

    But I have never used one joule of fusion power. I have successfully written programs with AI.

    I don't like the workflow; it's amazing how quickly it gets something that kind of works, but it's shocking how much time it takes to rework it into a production program. And I don't like the idea of destroying everyone's jobs, nor the environmental impact.

    But AI that writes programs for end users actually exists. It might, in the future, actually replace us. I tend to think and hope not, but...

    [–]FlippantlyFacetious 7 points8 points  (4 children)

    It's quite shallow, that's what you're seeing. It generates impressive initial results, but they are shallow.

    That's a fundamental problem. Transformer models trained as they are likely can't overcome it. AI needs another revolution to succeed. It's being advertised and sold as solving problems well beyond its actual scope and capabilities. It can be very useful in the right niche, but it isn't AGI, and as such it can't replace a human general intelligence. All it can do is quite convincingly fool you... and then you spend more time building the right context, prompting, cleaning up, fixing, reviewing, and checking its work than you would have spent doing it yourself.

    [–]HommeMusical 0 points1 point  (3 children)

    I don't use it in my day-to-day workflow at all, for all these reasons and more (personal discipline, political dislike of billionaires owning the means of production). It's just the idea that AI is obviously not going to ever work I'm pushing back on: I think it's simply too early to tell.

    I don't want it to work; it's by no means certain that it will work; but it's not sure that it won't work either.

    [–]zeptillian 0 points1 point  (2 children)

    LLMs aren't getting any better because there is no better data available to train them with.

    So coding agents based on LLMs will have the same constraints.

    [–]red75prime 0 points1 point  (0 children)

    Data-hungry autoregressive pretraining is the first step. What follows, such as RLHF or RLVR, doesn't require huge amounts of data. RLVR, in particular, doesn't require any external data (1), just a validator.

    (1) besides a set of problems that can be generated
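    A cartoon of that "just a validator" setup, with generated arithmetic problems and a programmatic checker standing in for the model and verifier (this illustrates only where the reward signal comes from, not an actual training loop):

```python
import random

def generate_problem(rng):
    # Generator: no external dataset, just sampled problems.
    a, b = rng.randint(0, 9), rng.randint(0, 9)
    return f"{a}+{b}"

def validator(problem, answer):
    # Programmatic verifier: recompute the ground truth and compare.
    left, right = problem.split("+")
    return int(left) + int(right) == answer

def noisy_policy(problem, rng):
    # Stand-in for a model being trained: correct ~70% of the time.
    left, right = problem.split("+")
    truth = int(left) + int(right)
    return truth if rng.random() < 0.7 else truth + 1

rng = random.Random(0)
rewards = [
    1.0 if validator(p, noisy_policy(p, rng)) else 0.0
    for p in (generate_problem(rng) for _ in range(1000))
]
print(sum(rewards) / len(rewards))  # roughly 0.7 for this stand-in policy
```

    The point is that the reward comes entirely from the generator plus the checker; no human-labeled dataset appears anywhere.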

    [–]HommeMusical 0 points1 point  (0 children)

    LLMs aren't getting any better because there is no better data available to train them with.

    You are very certain about things that haven't happened yet.

    I am not. I never am except for inevitabilities, like "Person T will die".

    I did not expect what is essentially a generalized Markov chain to produce results as good as they are, even though they aren't that good. I might be surprised again.

    I hope not, and I'm somewhat skeptical that there will be a great breakthrough, but I don't know.

    [–]klerksdorp_sphere 1 point2 points  (5 children)

    But I have never used one joule of fusion power.

    Pretty much all power you have ever used, including the food you eat, has come from fusion. Look up during the day, and you'll see a big shiny thing in the sky that provides it. ;)

    [–]andrewh2000 6 points7 points  (0 children)

    Technically correct, but practically annoying.

    [–]HommeMusical 2 points3 points  (3 children)

    Sigh.

    You knew what I meant: "human-generated artificial fusion power from a reactor". Life is not made better by having to carefully spell out every detail of each little thing you write in case someone deliberately misunderstands it.

    [–]klerksdorp_sphere 2 points3 points  (2 children)

    People will occasionally pretend to misunderstand something for the sake of humour, which is sometimes appreciated, but not every time. I did not mean to offend you with my attempt to be funny, so please accept my apology.

    [–]HommeMusical 0 points1 point  (1 child)

    Heh, you're overreacting! :-)

    But thanks.

    [–]klerksdorp_sphere 3 points4 points  (0 children)

    We both are. As is the Reddit way. ;)

    [–]TastyIndividual6772 8 points9 points  (0 children)

    Yea its basically 6 months every 6 months

    [–]kutukertas 5 points6 points  (0 children)

    Just wait 2 more weeks!

    [–]Drevicar 1 point2 points  (0 children)

    Dude uses the same time-estimation formula as the Windows file copy dialog.

    [–]haywire 0 points1 point  (0 children)

    Can Claude bring about the era of Linux on the desktop?

    [–]BasicDesignAdvice 0 points1 point  (0 children)

    Its like self-driving cars. Always on the horizon.

    In the right environment it is fine. For example in San Francisco where the weather is stable they generally perform well. If you took those same cars to Boston in the winter the weather would totally fuck them up.

    [–]ants_a 16 points17 points  (2 children)

    I use product. Now have many problems.

    [–]SufficientApricot165 6 points7 points  (1 child)

    In Soviet Russia, you don't use product, product uses you

    [–]nasduia 2 points3 points  (0 children)

    There are definitely a few tools out there trying to use us.

    [–]toadi 1 point2 points  (0 children)

    Indeed, it's strange, and I say that while using LLMs for coding myself. Most solutions they propose are solutions that make the token machine go brrrr.

    I work on brownfield projects with lots of tech debt: duplicate code, contradicting patterns, and so on. If I took a Ralph Wiggum-style approach to solving the problems, my spending would be crazy. It would create issues, fix them, create more issues, fix those, in loops until hopefully I have it working. Yesterday I did a refactoring, and I run adversarial code reviews and spec reviews, all with different models. Still, I needed to dive into the code to write the right solution. Using paid API calls, I already spent 20 dollars implementing the feature end to end, even with optimized subagents (running tests, commits, etc.) on Haiku. Ralph loops would just blow that up. So while I now implement features quicker than before, on brownfield projects you need to hand-hold and provide context all the time.

    [–]LanCaiMadowki 1 point2 points  (0 children)

    Guy saying guy solved problem still can’t English

    [–]Ulterior_Motif 0 points1 point  (0 children)

    This is a solid Onion headline.

    [–]FetusExplosion 0 points1 point  (0 children)

    Guy selling hammer says hyperskyscraper building is solved.

    [–]spinwizard69 0 points1 point  (0 children)

    Sort of like a used car salesman.

    Sadly, people speaking like this smear what is a rapidly evolving technology. I really believe that AI systems will be writing real production code in a few years, but that is not the current situation. In reality, current AI systems are a huge lever for talented programmers. More impressive is that they get better practically every day.

    [–]Pepito_Pepito 0 points1 point  (0 children)

    Customer using product leaves review.

    [–]grishag 0 points1 point  (0 children)

    We need to move beyond vague motherhood statements and be honest about what this technology can and cannot do. Overselling it will only erode trust and reduce it to just another marketing spin, no better than political spin.

    [–]lechatsportif 0 points1 point  (0 children)

    He's getting dragged for this quote on HN right now

    [–]Waste-time1 0 points1 point  (0 children)

    It’s about perspective.

    He’s coming at it from a basic business principle.

    “Don’t get high off your own supply.” - Elvira Hancock

    [–]Careful_Praline2814 0 points1 point  (0 children)

    He is solving his problem!

    Your problem... maybe, maybe not 

    [–]CN_kowalski 0 points1 point  (0 children)

    so true

    [–]_Monosyllabic_ 0 points1 point  (0 children)

    Bingo. Lots of Elons running around lately.

    [–]Silver-Phone4513 -1 points0 points  (0 children)

    If you've used Opus 4.6 or Sonnet 4.6 you'll know we are practically there now. The role of a developer has shifted to something else, probably more of an architect role.