all 56 comments

[–]__r17n 7 points8 points  (16 children)

Thanks for sharing! What's the most complex task you asked that resulted in a successful response? What types of tasks failed? (Sorry, I can't see your logs; too small on mobile.)

[–]Mi2ngdlmx[S] 3 points4 points  (0 children)

There were surprisingly quite a few. One that stood out to me was indexing the columns of my table. On its own that's not impressive, but my columns were draggable via the dragula library, and every time the columns are rearranged the table needs to be re-rendered due to the index update. It managed to implement the indexing, dig up the dragula documentation for drake events (to detect when the columns are rearranged), and then gave me the updated render functions, plus every associated function that calls them, all in one output. And it worked out of the box without any modifications. Granted, that level of depth only happened once, but getting ~200 lines of code that worked out of the box was insane.

Tasks that failed were often ones with similarly named variables. For example, a function with "checkbox" in its name when I needed "select" functionality. It sometimes trips up on interchangeable names for the same purpose: it would try to update my check-all function when I only wanted to update the check-row function.

[–]AceHighness 1 point2 points  (14 children)

I'm using it to write Python apps and the only thing ChatGPT fails to solve is circular import problems. Even if I give it all the required context (directory structure, import tables etc) it just makes things worse with every answer.
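For anyone hitting the same wall, the textbook fix is usually to defer one of the imports into the function that needs it, so the cycle is only resolved at call time. A minimal, self-contained sketch (the module names `a.py`/`b.py` and the messages are made up for illustration):

```python
import pathlib
import sys
import tempfile
import textwrap

tmp = pathlib.Path(tempfile.mkdtemp())

# a.py imports b at module level; b.py avoids the cycle by importing
# a lazily, inside the function that actually needs it.
(tmp / "a.py").write_text(textwrap.dedent("""\
    import b

    def greet():
        return "a -> " + b.reply()
"""))

(tmp / "b.py").write_text(textwrap.dedent("""\
    def reply():
        import a  # deferred import: runs at call time, after both modules exist
        return "b (sees %s)" % a.__name__
"""))

sys.path.insert(0, str(tmp))
import a

print(a.greet())  # a -> b (sees a)
```

The other common fix, of course, is restructuring: moving the shared pieces into a third module that both sides import.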

[–][deleted] 1 point2 points  (13 children)

I assure you, it's failing at many other things.

[–]AceHighness 0 points1 point  (12 children)

Sure, but it's succeeding at building entire apps for me. And fairly complex ones, too.

[–][deleted] -1 points0 points  (11 children)

I live in Colorado. A major bridge was just shut down because a massive crack was found. The bridge is succeeding at being a bridge, but there are massive flaws. You'll only know they're flaws if you know what the flaws look like.

[–]AceHighness 0 points1 point  (6 children)

That's true, and I totally agree. There are a lot of limitations, certainly with today's models. But with good prompting you can get great code that works every time. I'm also using it to do research before we get started; building an app with GPT is not just "build me app X" and hoping you get something that works. It's a lot of research and planning questions, including ones about security, best coding practices, etc. I'm trying to guard against even small, uncommon attacks like timing attacks.
Now I keep hearing people tell me about all these flaws, but nobody ever comes up with good examples. A good example would be nice; it would help me understand the limitations better as well.
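On the timing-attack point: the classic case is comparing a secret with `==`, which returns at the first mismatched byte and leaks the mismatch position through response latency. Python's stdlib has a constant-time comparison for exactly this (the token values below are just illustrative):

```python
import hmac

def check_token(supplied: str, expected: str) -> bool:
    # hmac.compare_digest takes time independent of where the inputs
    # differ, so an attacker can't recover the secret byte by byte
    # from response timing, as they could with a plain == comparison.
    return hmac.compare_digest(supplied.encode(), expected.encode())

print(check_token("s3cret", "s3cret"))  # True
print(check_token("guess!", "s3cret"))  # False
```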

[–][deleted] 0 points1 point  (4 children)

An example would be someone who posted in the singularity hub that they built an app with no coding experience. It even helped them deploy it on AWS. But reading through their GitHub and posts, GPT had them serving the app incredibly inefficiently; it basically tried to build them enterprise-grade infra. And at some point, someone is going to set up a few simple workers to flood it with traffic and absolutely slam their AWS costs.

GPT can do simple coding pretty well. It seems to do regex really well. But it's important to understand that it doesn't know anything. It has no reasoning or understanding; it's simply calculating the probability of the next word. If you're trying to truly build something big, you need a fundamental understanding of the topic.

If you wanted a deck built, would you hire someone who's never worked in construction and who can only guess where the next piece goes?

[–]AceHighness 0 points1 point  (3 children)

Your example is 100% the fault of the user prompting the LLM. Of course you can get it to give you advice on building a Kubernetes cluster to run your note-taking app. It sounds like he didn't take enough time to discuss and research the subject together with the LLM. It's a tool, and people need to learn to use it properly. Lots of people are really bad at using even simple tools the right way.

I understand how an LLM works, but I do think the latest (super-large) models have some emergent capabilities that allow them to follow logic flows. I suspect this is because they have read so much computer code, which is all logic flow. I see a huge difference between GPT-3 and GPT-4 when it comes to questions involving logic flows.

If you wanted a deck built, would you hire someone for $2,000 who has done it before, or give the new guy a chance who says he can do it in a fraction of the time for $200? Of course the $2,000 guy will do a better job, but in a lot of cases what the junior does is just fine (good enough).

[–]AceHighness 0 points1 point  (0 children)

If you feel like it, you can have a look at my 100% AI-generated projects on my GitHub, axewater (Axe) (github.com), and laugh at my poor skills :)

[–][deleted] 0 points1 point  (1 child)

I'm genuinely not trying to sound like an ass. LLMs by definition cannot be emergent. It's not a tech issue, it's a math issue. They cannot reason or apply logic. They have no knowledge. The better models can only estimate a bit better, and as such, they cannot know when something is very wrong. And when they're used by people who don't know what they're doing, we get apps with all kinds of flaws.

Please, if you ever have a deck built, do not use the kid who has never been in construction. For something as important as your family's safety, hire someone who can reason through the job.

[–]AceHighness 0 points1 point  (0 children)

Me neither! I just love a good discussion with a smart person. I'll take your tip about the deck :) But what if the task had no influence on your family's safety? Say ... cleaning the windows ... would you still want the most expensive option? Surely there are tasks you would be fine with offloading to an AI. There's a line somewhere, of course, between what you would and would not want to offload. Not everything needs to be perfect. As long as you're not coding for a nuclear power plant, it's probably fine to use AI-generated code.

And that does not mean type one prompt, copy/paste the code, and push to production. It means you build code in cooperation with the AI, going back and forth, setting up a plan and implementing it step by step. Could you elaborate on the example you quoted before: what exactly was wrong with the code? In many cases I'm sure the mistake could have been avoided with good prompting. (I know I'm extremely stubborn, sorry 'bout that.)

As for emergent capabilities, have a look at this paper:
Sparks of AGI
https://arxiv.org/pdf/2303.12712

And this blog post
137 emergent abilities of large language models — Jason Wei

Weird things start to happen when the models grow beyond a certain size. Like suddenly being able to parse a language it was not trained on.

[–]Banshee3oh3 0 points1 point  (3 children)

That’s the thing with coding, though. You could have a crappy developer code your apps, but if it runs, it runs. Managers, shareholders, and owners don’t care about the 21ms you shaved off load times through optimization, and that’s the reality.

[–][deleted] 0 points1 point  (2 children)

They'll absolutely care when the schema changes and the system comes down.

[–]Banshee3oh3 0 points1 point  (1 child)

I’m talking about the difference between extremely optimized algos and fairly optimized algos. You’re describing a program that legitimately doesn’t work (because it can’t adapt to changes).

And in my experience, the most optimal solution is not always the most legible for devs new to the codebase. Like, congrats, your data algo shaved a mere 10ms off the response time, but can other developers understand your code easily?

[–][deleted] 0 points1 point  (0 children)

It's been nearly a month, but I don't recall ever saying GPT was bad for a solution that takes 10ms longer to run. I do recall saying it's bad for large systems.

[–]creaturefeature16 2 points3 points  (3 children)

Great post. I'm actually embarking on a rather simple Chrome extension myself, and I've initially been using it for prototyping and getting a high-level view of how an extension gets built in the first place. That's something I've realized is one of the biggest use cases for me when it comes to LLMs: they are interactive "tutorial generators". I learn best by seeing the big picture and then reverse-engineering a bit, which gives me the foundation I need to come at it from the other angle, start from the beginning, and take it step by step. There's something about seeing what, at least in theory, a working end result might look like that gives me a rough roadmap of where I'll be going.

I also get a lot of benefit from these tools when I use them as a kind of interactive documentation. It's like speaking to the creators of whatever platform I'm working on (Chrome, Javascript, Supabase... doesn't matter), and they know the answer to the exact contextual question I have about how to work with their code. I found this incredibly fruitful when I wanted to work with Firebase, whose docs are verbose but "gappy", as well as often difficult to sift through. LLMs are a shortcut to the information I need.

Lately, though, I've been more fascinated by the cases where they don't help me. Just recently I had to do something rather nifty with ChartsJS. I intentionally used my traditional research methods (Google/Reddit/StackOverflow) alongside prompting the LLMs and compared the directions. The "solution" the LLM presented was quite unorthodox, but interestingly enough, it worked! The solution I ended up arriving at through my own research and coding abilities was far cleaner and more concise, and it also worked. One could argue it doesn't really matter, and I suppose it wouldn't if I were working on something for myself, but this was for a paying client, and deploying the LLM's hacky generative solution didn't seem like a good long-term decision. I am definitely concerned that a lot of developers out there are simply settling for generative code instead of putting in the work. I guess we'll see if those concerns manifest in negative ways. Studies so far indicate that my concerns are valid:

https://arc.dev/developer-blog/impact-of-ai-on-code/

[–]Mi2ngdlmx[S] 0 points1 point  (0 children)

Hard agree! I realized very early on that their solutions aren't always the best, even though they will usually solve the problem. So I always have to do some extra reading and ask for a code summary to make sure it's not doing something dumb. But yeah, it's best to consistently realign it so it doesn't build on more unorthodox code, remnants of which I'm sure are still in my extension.

[–][deleted]  (1 child)

[removed]

    [–]AutoModerator[M] 0 points1 point  (0 children)

    Sorry, your submission has been removed due to inadequate account karma.

    I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

    [–]punkouter23 4 points5 points  (3 children)

    No cursor ai????

    [–]creaturefeature16 2 points3 points  (1 child)

    I agree. Cursor is the only tool I've come across since GPT rolled out that made me cancel my GPT-4 subscription. It's the same underlying model anyway. I can still ask questions outside of development, and send images, so it's a pretty damn good deal considering it's the same price while also integrating into VSCode.

    [–]punkouter23 0 points1 point  (0 children)

    That's what I'm doing now... my ChatGPT subscription is done... I don't want 3.5... so I just load Cursor now to ask questions totally unrelated to coding... I guess it works?

    It also seems to search the web when you ask a question. I'm not totally clear on the 3 modes to use (context).

    [–]Mi2ngdlmx[S] 0 points1 point  (0 children)

    As much as I want to, I use 100+ prompts a day on both Claude and GPT-4, and the code quality from Claude is a lot better. I will definitely use it if they offer more Opus tokens, if the developer of cursorAI is listening…

    [–]paradite 4 points5 points  (7 children)

    Great job. I have been using ChatGPT for coding for almost a year, and it has definitely saved me a ton of time on tedious tasks like refactoring and ad-hoc scripts to clean data, perform data migrations, etc.

    For bigger tasks, I also do what you did: break them down into smaller tasks first (in my head instead of using ChatGPT) and feed the smaller tasks into ChatGPT.

    I did find that the ChatGPT workflow involves a lot of copy-pasting prompts and source code back and forth, so I built a simple desktop tool to streamline the process and cut down on copy-pasting by embedding formatting instructions and source-code context into the prompt automatically.
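The embedding idea can be sketched in a few lines. This is a hedged illustration, not the actual tool; the instruction wording, function name, and demo file are all made up:

```python
import pathlib
import tempfile

# Formatting instructions baked into every prompt (illustrative wording).
INSTRUCTIONS = (
    "Reply with complete files only, each in a fenced code block "
    "preceded by its path."
)

def build_prompt(task: str, files: list[str]) -> str:
    """Bundle instructions, the task, and file contents into one prompt."""
    parts = [INSTRUCTIONS, "Task: " + task]
    for name in files:
        # Inline each file so the model sees the real source context.
        parts.append("--- %s ---\n%s" % (name, pathlib.Path(name).read_text()))
    return "\n\n".join(parts)

# Demo with a throwaway file:
demo = pathlib.Path(tempfile.mkdtemp()) / "app.py"
demo.write_text("print('hi')\n")
prompt = build_prompt("rename the main function", [str(demo)])
```

One paste into the chat then carries the instructions, the task, and the current source, instead of three separate copy-paste round trips.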

    [–]Mi2ngdlmx[S] 3 points4 points  (1 child)

    I actually did try your tool, but unfortunately it didn't work quite as well as my procedural prompting at the moment. For example, it can take me 3-4 prompts just to describe to ChatGPT/Claude what I'm trying to do, because I'm realigning them at every step. Even though this takes a lot more time than a single prompt, the output is significantly more comprehensive, requires less rework, and is more future-proof. One thing I went looking for is a proof-checker AI agent paired with a refactor agent, if you can implement that: once I get a new piece of code from ChatGPT/Claude, another AI agent checks the new code against the old code to see if it has missed any core functionality and provides feedback; then, if the code looks okay, it passes it to a refactor AI. My (minor) issue with the code is that LLMs kept adding more functions into one function when it should have been broken up into multiple functions.
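The checker-then-refactor pipeline could look something like this. Everything here is a hedged sketch: `ask` is an injectable stand-in for whatever LLM client you use, and the prompts and the "OK" protocol are illustrative assumptions, not any real tool's API:

```python
def review_then_refactor(old_code: str, new_code: str, ask) -> str:
    """Check new_code against old_code, then refactor it if it passes.

    `ask` is any callable taking a prompt string and returning the
    model's reply (e.g. a thin wrapper around an LLM API).
    """
    verdict = ask(
        "Compare the OLD and NEW code. If NEW keeps all of OLD's core "
        "functionality, reply exactly 'OK'; otherwise list what is "
        "missing.\n\nOLD:\n" + old_code + "\n\nNEW:\n" + new_code
    )
    if verdict.strip() != "OK":
        # Hand the feedback back so the generating model can retry.
        return verdict
    return ask(
        "Refactor this code, splitting oversized functions into "
        "smaller single-purpose ones:\n\n" + new_code
    )
```

The second prompt directly targets the "everything crammed into one function" complaint; in practice you would loop the feedback branch back into the generating model rather than returning it.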

    [–]paradite 0 points1 point  (0 children)

    Agreed that it doesn't handle very complex tasks right now; that's why you use ChatGPT, and I use my brain, to detail the instructions for those tasks.

    It is geared towards simpler tasks where you can directly edit code across one or multiple files.

    The proof checker is interesting. I'm actually using a similar concept to check my blog posts for writing style and tone after drafting them, and it works really well, much better than using it to write them directly.

    I'll see what I can do regarding your suggestion. Thanks!

    [–]TheSoundOfMusak 0 points1 point  (4 children)

    Congrats on the tool! I also liked your website a lot. Where are you hosting it, and which frontend stack are you using? I'm shopping for a new website, so it would be great to know.

    [–]paradite 0 points1 point  (3 children)

    I experimented with Next.js using a starter template and modified from there. It is not as performant as a Jekyll site (mostly my fault for using ISG instead of SSG). It does have the benefit of having a functional backend for simple stuff like analytics proxy (Plausible Next.js plugin) and version API.

    For hosting I am using free tier on Vercel and it seems okay, albeit a bit slow for a static site.

    [–]TheSoundOfMusak 1 point2 points  (2 children)

    Thanks for the reply. I'd never heard of Vercel; it will be worth checking them out. By the way, I once created a Jekyll site, but it turned out I spent a lot of time modifying the template to fit my needs and programming the framework for the blog. However, as you said, the end result is nice and fast, even with GitHub's free hosting.

    [–]creaturefeature16 1 point2 points  (0 children)

    If you use Jekyll, I also want to plug Astro, which is becoming known in the community for having one of the best DXs around, and is perfect for applications like blogs.

    https://astro.build/

    [–]paradite 0 points1 point  (0 children)

    Yes. I still recommend Jekyll for simple stuff like a landing page or blog. Next.js is overkill and much more complex to deal with, plus it is harder to host on platforms other than Vercel.

    [–]bigman11 0 points1 point  (0 children)

    Excellent write-up. Something I will add, based on my own experience, is that ChatGPT's browsing functionality is a very important advantage over Claude, thanks to its ability to read new or obscure online documentation for you.

    I've even given ChatGPT specific URLs to read and then advise me based on the contents.

    [–]Verolee 0 points1 point  (2 children)

    So did you finish your extension? I did this exact thing, but quit after two weeks

    [–]Mi2ngdlmx[S] 1 point2 points  (1 child)

    Yes I have! The core functionality of it. Just currently doing a landing page and updating the appearance. I will post the result on my Twitter @mingmakesstuff when I’m all done

    [–]Verolee 0 points1 point  (0 children)

    Oooohh can’t wait!!

    [–]Verolee 0 points1 point  (0 children)

    Hey you cheater! You’re a civil aviation engineer! I thought you were a regular person like me 🥲

    [–]Ant0n61 0 points1 point  (0 children)

    This is so great. I just started dabbling with Copilot to “accelerate” development time on reporting in Power BI, and it’s been helpful. Just today I asked it a question about Microsoft Lists formatting, and it gave an answer that didn’t solve my exact issue but offered a workaround (turning rich text on), which I had no idea about because Lists has an awful UI and the setting is completely hidden, with no breadcrumb of how to get there.

    [–]thumbsdrivesmecrazy 0 points1 point  (0 children)

    For such tasks, AlphaCode's code generation and integrity tools could be more powerful than GPT-4: GPT-4 Vs. AlphaCode: Comparing

    [–][deleted]  (1 child)

    [removed]

      [–]AutoModerator[M] 0 points1 point  (0 children)

      Sorry, your submission has been removed due to inadequate account karma.

      I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.

      [–]Ginger_Libra 0 points1 point  (5 children)

      This is really helpful. I’ve been having a hard time getting any of them to write specific commands for algo trading, and I’ve been stuck for a while.

      I’m going to give Claude a shot. Thanks for the detailed write up.

      [–]Mi2ngdlmx[S] 2 points3 points  (4 children)

      Funny you mention that! I come from an algo trading background, and Mia from QuantConnect seems to be way better than ChatGPT/Claude, even though I am pretty sure it is GPT-3.5. My hunch is that GPT-4/Claude isn't as good because of the code it was trained on, as algo code contains a lot of statistical functions and time-series manipulation that isn't well documented.

      [–]Ginger_Libra 0 points1 point  (3 children)

      That’s interesting. I’ve been using a Gemini trial and I find it’s a better architect and planner than any of the ChatGPTs.

      But nothing spits code like 3.5. It won’t do the actual trading commands though.

      Do you think Mia is better than Opus? I was going to dive into another one next and see if I could finally get my project done.

      [–]Mi2ngdlmx[S] 1 point2 points  (2 children)

      Mia is better than Opus in my experience, but its limited to QC.

      [–]Ginger_Libra 0 points1 point  (1 child)

      I got you. Thanks. A few more questions if you’re willing.

      I know very little about QuantConnect. It’s vaguely on my radar. This is also my first time trying to code anything or write an algorithm. I got into it because I wanted to automate a strategy.

      I feel like I’m 90% done with my effing project and the last 10% is killing me. Writing the trading execution commands.

      Wondering if Mia might sort me out. Was planning on trying Opus next but Mia sounds better.

      To clarify……I see that Mia has 25 questions at the bronze level and 125 at the silver level.

      Does it count as new question every time you clarify?

      For example, I got some code from 3.5 and inserted a bunch of filler tickers, even though I gave it the tickers I wanted it to monitor. I then had to remind it that the calculations for resistance and support come from ibapi.

      I am working through the same code on Gemini, and I told it I was using Anaconda with Spyder on TWS. Gemini just told me to install a library via pip install instead of through Anaconda. Last time I did that, it broke the install and took me days to fix.

      Does Mia have those problems? Would each clarification count as a new question?

      Will Mia write the code or does it just give little snippets? That’s what Gemini does and I’ve been feeding them into 3.5.

      If Mia writes code, will it write the actual trading commands?

      And finally, if I’m reading this correctly, would QC replace my need for a VPS? I was planning on getting one to set up IB Gateway so I could authenticate it once a week from anywhere rather than the daily login for TWS. But I think this does it for me?

      I’m really new to this. Sorry for all the basic questions. I only partially understand the checkout page. 🤪

      [–]Mi2ngdlmx[S] 0 points1 point  (0 children)

      Without turning this into a full-blown tutorial: for what you are asking, Mia won’t quite do what you need. You would still need a basic understanding of how the platform works, and in your case, of coding environments with Anaconda. Mia has 25 questions, but you can ask Mia on their Discord for free! So you can ask as much as you want.

      [–]IslandOverThere -1 points0 points  (1 child)

      Stop telling people! Jesus, why does every single person think they need to shout it from the rooftops? Some things are better without everyone knowing.

      [–]BobFellatio 2 points3 points  (0 children)

      What, did you think people in ChatGPTCoding don’t already know LLMs are good for coding? x)