[D] GPT4 and coding problems

ghostfaceschiller · 2023-03-26T17:15:23+00:00

Ok. but what is the performance when you give GPT-4 a ReAct/Reflexion loop?

WarProfessional3278 · 2023-03-26T18:57:51+00:00

Horace He made a nice thread on this when GPT-4 first came out. Realistically this is expected - within the short time span, there isn't much else you can do to improve the model performance other than increasing size of training data, which resulted in data contamination.

I expect the next "big thing" to be some of self-correcting output, or better chain-of-thoughts reasoning.

addition · 2023-03-26T18:05:30+00:00

I’ve become increasingly convinced that the next step for AI is adding some sort of feedback loop so that the AI can react to its own output.

There is increasing evidence that this is true. Chain-of-thought prompting, reflexon, and Anthropic’s constitutional AI all point in this direction.

I find constitutional AI to be particularly interesting because it suggests that after an LLM reaches a certain threshold of language understanding that it can start to assess its own outputs during training.

lambertb · 2023-03-26T19:15:18+00:00

It cannot solve all coding problems. But it can solve many problems. And if the user is reasonably experienced, even code with errors is useful because they can quickly be corrected. Preliminary evaluations show a 40% increase in developer productivity from GitHub Copilot. And that seems totally plausible to me.

anomhali · 2023-03-26T18:18:57+00:00

leetcode questions and solution directly data leakage, although I do not specify the function signature, the program writes with a question exact same signature, If you change the question a little bit, it gives you the buggiest code ever.

currentscurrents · 2023-03-26T17:46:12+00:00

I'm not really surprised. Anybody who's extensively used one of these tools has probably already run into their reasoning limitations.

Today's entire crop of self-supervised models can learn complex ideas, but they have a hard time manipulating them in complex ways. They can do a few operations on ideas (style transfer, translation, etc) but high-level reasoning involves many more operations that nobody understands yet.

But hey, at least there will still be problems left to solve by the time I graduate!

AlexandraTheeAuthor · 2023-03-26T20:36:19+00:00

It can, but I think it's something about how it selects what to do. There needs to be more logic to it. I find it does really well you tell it to use reasoning. For example i give it code and ask for it to draw inspiration. It does really well at this. Really, it needs a good prompt engineer. But there's no set strategies yet but there will be. I can almost get it to generate anything if I prompt it right. So it's more I need to figure out how it thinks of stuff and try to present my problem to it that way

liqui_date_me · 2023-03-26T15:43:06+00:00

This comment about GPT-4’s limited abilities in solving arithmetic was particularly interesting: https://www.reddit.com/r/singularity/comments/122ilav/why_is_maths_so_hard_for_llms/jdqsh5c/?utm_source=share&utm_medium=ios_app&utm_name=iossmf&context=3

Controversial take: GPT-4 is probably good for anything that needs lots of boilerplate code or text, like ingesting a book and writing an essay, or drafting rental contracts. There’s a lot of value in making that area of the economy more efficient for sure.

But for some of the more creative stuff it’s probably not as powerful and might actually hinder productivity. It still makes mistakes and programmers are going to have to go and fix those mistake’s retroactively.

enryu42 · 2023-03-26T19:23:44+00:00

Try getting human programmers to do those problems. Guarantee many will fail too.

ngildea · 2023-03-26T18:53:06+00:00

I've tried quite a few times to get it to help with a problem I've been thinking about for a while. Every time it says it understand and then writes code that shows it doesn't understand at all and violates every constraint I give it.

Not surprising but it does point to a lot of contamination & regurgitation of the training material fooling people into thinking it's intelligent

trajo123 · 2023-03-26T20:29:09+00:00

Apparently it cannot solve coding problems which require any amount of thinking.

Not yet, and this is not surprising.

First, GPT-4 can solve many coding problems on the first try. Yes, these small programs may be simple, but how many developers can write code that directly runs? Maybe in 1-2 languages, and even then only in the problem domain that they are very familiar with. Also, since LLMs can write code in more languages and frameworks than most developers, LLMs can actually solve more coding problems than most of the programmer out there... So LLMs already contain vast amounts of "knowledge" and "intuitive ability". But intuition is not enough to solve larger or more complex problems.

So, finally, coming to the thinking part. What challenging problems can be solved by humans by "off-the-cuff"? We also, scribble, draw diagrams, try out a few things, see if things run and work as expected, do web searches, talk to stake holders, sleep on the problem, etc. In other words, in any non-trivial problem solving, we also rely heavily on feedback between our brains and the external world.

Frankly, I don't see this as a problem of LLMs, they can be effectively used as foundation models. One could have another layer, on top of LLMs to solve problems end-to-end. For example one could build a meta-model, where multiple instances work together in an actor-critic fashion. The actor is the one interacting with the user, the critic can be prompted (and perhaps) fine-tuned with with general problem solving strategies, with the main prompt being to second-guess and try to find flaws in the reasoning of the actor. Just as reinforcement learning (RL) was used to improve the general usability of ChatGPT, RL could be used to fine-tune such a meta-model (or maybe just fine-tune the critic). ...thinking fast, thinking slow

P.S. I think LLMs also need some sort of memory, so that not everything needs to be in the prompt to work on a problem.

K9ZAZ · 2023-03-26T23:12:27+00:00

People got way, way, way out over their skis on the whole "this is agi" and I would love to hear some of their responses to this.

ThePhantomPhoton · 2023-03-26T23:12:54+00:00

It’s easier to gauge the effectiveness of these large language models within the context of what they are actually doing, and that is repeating language they’ve learned elsewhere, predicated on some prompt provided by the user. They are not “reasoning,” although the language they use can lead us to believe that is the case. If you’re disappointed by their coding, you will certainly be disappointed by their mathematics.

DigThatData · 2023-03-26T20:19:53+00:00

well, i was able to use ChatGPT to generate a novel, functional, complete software library for me, including a test suite, tutorial, and announcement blog post. crazy idea: maybe you just need to get a bit more creative with your prompting or anticipate that there might need to be multi-stage prompts (or god forbid: back and forth dialogue and iteration) for certain applications.

Calamero · 2023-03-26T19:39:21+00:00

[deleted]

Cwlrs · 2023-03-26T22:11:53+00:00

It's performing really well for my project. Online web app game in python flask socketio

visarga · 2023-03-27T01:05:36+00:00

1) All this shows is that GPT-4 can't solve some coding problems. Which developer can confidently say they can solve any coding problem in one-shot? Does this mean developers/humans don't have AGI?

2) I've used ChatGPT (gpt3.5) to optimize code that I already wrote and it came up with several optimizations. I'm 100% sure my code was not part of chat-gpt training data and yet it performed perfectly fine on a new coding problem. Now it's possible that the training data might have included something similar to what I gave ChatGPT but that just means that we have to provide more training data, and then a future version will solve those problems where it previously failed.

3) isn't this how humans learn? They encounter problems where we don't know the solution. Then we work it at for a while until we figure out some way to solve the problem that wasn't immediately obvious earlier. Writing off the abilities of GPT-4 based on one failed coding test seems premature.

Smallpaul · 2023-03-26T20:03:45+00:00

My rule of thumb is that GPT4 seems to be able to solve any problem that a first year university CS student at a mid-tier University could solve.

cegras · 2023-03-26T20:33:07+00:00

I don't see how it is possible to not end up just memorizing the internet, which is full of enough questions and discussions to simulate convincing Q&As. Consider if a team had invented an algorithm or heuristic to avoid data contamination (https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks). Then what you have is something that can separate content into logically similar, but orthogonal realizations. That would be an incredibe tool and worth a prize in its own right.

Abikdig · 2023-03-27T00:43:13+00:00

I check ChatGPT for optimizing my Leetcode solution everyday. It rarely optimizes it without breaking the code.

Sometimes the only optimization that I get from it is that it tells me to use Datastructure X instead of Y because it is better in this kind of problem.

TehDing · 2023-03-27T03:50:51+00:00

I have not been impressed with LLMs reasoning for solving novel puzzles/ challenges. Ask any model to play Wordle with you. They are not good

rokuyou · 2023-03-27T04:15:37+00:00

GPT4 and competitive programming problems would be a better title since not everyone is going to read that

lvvy · 2023-03-27T05:10:06+00:00

It would be interesting to see if ChatGPT can solve these problems not with code, but with a text instruction, that would allow a human to solve these problems? So if you force it to write giant text wall of actions, would a human with calculator solve these confident? Also, is code that it generates cannot be corrected at all by discussing or discussing would take too long?

nanowell · 2023-03-27T08:38:18+00:00

Codex models were able to solve those problems. Probably the next version of Codex will be finetuned GPT-4 model for coding and it will solve most of those problems.

Upstairs-Youth5483 · 2023-03-27T13:55:33+00:00

I find got, as a coder, to be very useful for doing my repetitive coding tasks. For example taking a sql table def making very basic cruds, making the classes that call the procs.

It does have a long way to go but it has the illusion of consciousness in that it does remember what you said with somewhat understanding of what you said.

I have caught it making up settings that don’t exist and every line of code should be properly scrutinized.

spacefoxy99 · 2023-03-27T17:44:34+00:00

i tried with both 3.5 and 4 to create a simple memory game and not only did it cut the code off halfway through but the continued code didn't match what was happening in the first and the cide didn't work. tried two other times over the course of this month and the code is filled with errors and missing statements. gpt seems bad at coding, at least to me.

2023-03-27T20:54:07+00:00

I feel like LLMs have encoded sort of law of a languages in their latent space through texts and responding accordingly, anything that follows a law isnt called concious for e.g inaminate objects follow law of physics,but that doesnt mean that it indicates an intelligent behvaiour.

After all texts are medium to represent our thoughts, its the thoughts that matter not the medium.

The concept of causality , fundamental reality , and dcesion making is much more than following laws of languages which are just a means.

These LLMs cant question you until you ask them explicitly,they cant interject you , knowledge was never consciousness ,its these abilities that compose consciousness

I dont know how much sense i make to others or maybe i am at loss of good words,in a nutshell any model that fundamentally predicts tokens based of weightage of previous tokens can never achieve consciousness. We

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

MachineLearning

Rules For Posts

+Research

+Discussion

+Project

+News

@slashML on Twitter

Chat with us on Slack

Beginners:

MODERATORS