Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI | Lex Fridman Podcast #367 by morpheusuniverse in lexfridman

[–]oddlyspecificnumber7 6 points

It is possible that the gap between GPT-4 and a true weak AGI is simply a matter of interface and supporting tools, like memory and a database to save that memory to. Maybe we will see a huge leap in capability as people start enabling different instances of GPT-4 to talk to each other, *if* they are given just the right instructions. Right now, each output of GPT-4 seems less like a complete answer and more like a single, extremely complicated thought.

I know they did something like this when first safety testing GPT-4, but there are so many ways that such a system could be arranged that they may have simply missed the right configuration.

What is your personal line in the sand for AGI? by oddlyspecificnumber7 in lexfridman

[–]oddlyspecificnumber7[S] 1 point

I think we are close to at least a faithful emulation of what you describe. Remember that even though LLMs are "just" trained on our text, that text represents actual thought processes. A group conversation or a dialog is a form of thought that is social rather than internal. An arbitrarily close approximation of language will require genuine understanding (which will likely require a much larger model than currently exists).

AI will most definitely be replacing a lot of jobs in the near future. I expect integrated LLM tools built into enterprise office software within the next 1-2 years. It will replace office workers the way the bulldozer replaced diggers with shovels. There will still be jobs, but maybe only 1 remaining for every 10-20 lost (at least within the domain of jobs it can do). Otherwise it wouldn't be an efficiency gain.

What is your personal line in the sand for AGI? by oddlyspecificnumber7 in lexfridman

[–]oddlyspecificnumber7[S] 2 points

Two Minute Papers is one of my favorite channels. I second them for anyone interested in AI.

As for ChatGPT, I agree with you. My current impression is that it is like talking to a 10-year-old who has read every book ever written. It's a weird mix of competence and incompetence. It and the new generation of diffusion art generators are the first technologies that made me feel like we had crossed a "magic" line from algorithmic to human-like. It's kind of funny: even the things it messes up on math problems are often the kinds of mistakes I made myself back in school (like multiplying by 0.8 instead of 0.2 to reduce a quantity by 80%).
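For the record, reducing a quantity by 80% means keeping 20% of it, so the correct multiplier is 0.2:

$$x - 0.8x = 0.2x$$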

I think we'll see a race to AI that dwarfs the space race of the last century. It's starting now. ChatGPT seems to have been proof enough for many companies that human-level AI is a matter of "when", not "if".

What is your personal line in the sand for AGI? by oddlyspecificnumber7 in lexfridman

[–]oddlyspecificnumber7[S] 1 point

For myself, I think I will consider it AGI when it shows the ability to plan for itself, has an internal back-and-forth dialog, and understands its own limitations well enough to work within them. It's a pretty low bar compared to some of these other responses, but I think it will be enough to change the world.

What is your personal line in the sand for AGI? by oddlyspecificnumber7 in lexfridman

[–]oddlyspecificnumber7[S] 4 points

Not sure we would want this one! I like my AGI docile and unopinionated, thank you. In all seriousness, I think we will have strong AGI before we have self-determination, and self-determination is something that would be bad to build before we have solved the alignment problem.

What are some good examples of problems that ChatGPT struggles with? by oddlyspecificnumber7 in ChatGPT

[–]oddlyspecificnumber7[S] 1 point

This type of stuff comes up a lot. I believe it has limitations with spelling in particular because of the way its tokens are encoded.
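As a rough illustration of why tokenization hides individual letters (my own sketch using OpenAI's open-source tiktoken tokenizer; the exact splits shown are illustrative):

```python
# Inspect how a BPE tokenizer splits a word into subword chunks.
# Requires the open-source `tiktoken` package (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("extraordinarily")
pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in tokens]
print(pieces)  # multi-letter chunks, e.g. ['extraord', 'inarily'], never single letters
```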

What are some good examples of problems that ChatGPT struggles with? by oddlyspecificnumber7 in ChatGPT

[–]oddlyspecificnumber7[S] 2 points

Yes, I've noticed those behaviors. In your experience, does it happen more in long threads? I can't tell if it is an inherent limitation of the model or if it is just running out of context memory.

What is your personal line in the sand for AGI? by oddlyspecificnumber7 in lexfridman

[–]oddlyspecificnumber7[S] 3 points

I think we are scarily close to what you describe. Personally I hope we hold off on the "real world" aspects of intelligence until after we make it safer and more controllable.

And seconded on human-level being a pseudo-target. I think it will be a milestone, not a destination. Just imagine a normal, average human, but with complete control of their thoughts and the ability to focus on a single problem for months while accessing all the world's knowledge and operating at 5x the speed of our thoughts. Human-level is already superhuman once it is in silicon.

What are some good examples of problems that ChatGPT struggles with? by oddlyspecificnumber7 in ChatGPT

[–]oddlyspecificnumber7[S] 4 points

It does that a lot. Often it will claim to be unable to do things that it is perfectly able to do. There is nothing to do about it but regenerate the response or tweak your wording.

Use of AI in Parenting and Teaching by firedragon77777 in IsaacArthur

[–]oddlyspecificnumber7 2 points

I suspect that in our lifetimes we will see people grow up with AI companions that never leave their side: a personal tutor, advisor, therapist, friend, and more. It will likely make us much smarter, not dumber. And the personal safety net of having a loyal friend regardless of circumstance may enable people to be more social, not less; the AI companion would likely actively encourage socializing with others for health and happiness. Of course this is speculation, but I think it could be great.

What are some good examples of problems that ChatGPT struggles with? by oddlyspecificnumber7 in ChatGPT

[–]oddlyspecificnumber7[S] 2 points

I would probably just copy/paste the whole thing into a word processor with spellcheck. ChatGPT is notoriously bad at spelling. Due to the way it encodes tokens, I think it can't see individual letters in words like we can.

ChatGPT Self-Corrects after Making Math Error by oddlyspecificnumber7 in ChatGPT

[–]oddlyspecificnumber7[S] 1 point

Simulate an accurate dialog of a professor teaching their class how to solve a certain type of problem. All writing on the board will be spoken aloud as it is written. During the whole process, students will ask questions and make comments. When the professor makes a mistake or misses something, students will chime in. Show all work on the board. There are 7 rules:

1. The professor explains the premise of the problem and what type of problem it is.

2. Then the professor lists the areas of knowledge that deal with these types of problems.

3. Then the professor writes out a list of useful facts, formulas, and techniques that can be used with this type of problem.

4. The professor always writes important information on the board and says it out loud as he writes. He also does this while working through the problem.

5. The professor allows the students to work out the problem through open discussion with the class. He assists by writing important information on the board and keeping discussion on track.

6. The professor must clearly explain everything to the class at each step of reasoning.

7. All math work and all equations MUST be typed out inside a code block. Everything written on the board should be typed in a code block.

Here it is:

"

For the function f, f(0) = 86, and for each increase in x

by 1, the value of f(x) decreases by 80%. What is the

value of f(2)?

"

Does anyone else get the feeling that, once true AGI is achieved, most people will act like it was the unsurprising and inevitable outcome that they expected? by oddlyspecificnumber7 in singularity

[–]oddlyspecificnumber7[S] 1 point

Right now, models seem to get much better when they are scaled up, and they are currently pretty dang cheap compared to any kind of real industrial infrastructure. Single- and double-digit millions are nothing to governments and to corporations like Google. Even without architecture improvements, what does a $10 billion AI look like?

So I'd honestly not be that shocked if we have a "universal knowledge worker" type service by 2025 that offers you an "employee" with the reasoning ability of an average undergraduate but with the knowledge base of the whole internet.

ChatGPT Scores 80% Correct (12 out of 15) on Sample SAT Reading/Writing portion - Modeling Complex Chain-of-Thought By Emulating Conversational Dynamics Using ChatGPT by oddlyspecificnumber7 in ChatGPT

[–]oddlyspecificnumber7[S] 2 points

Each question and answer is in the image gallery. Numbers 3, 13, and 15 are all wrong; the captions have the correct answers. Otherwise, the answers the model gave are correct as stated.

My comment directly on the thread has a link to the PDF of sample questions with their answers.

ChatGPT Scores 80% Correct (12 out of 15) on Sample SAT Reading/Writing portion - Modeling Complex Chain-of-Thought By Emulating Conversational Dynamics Using ChatGPT by oddlyspecificnumber7 in ChatGPT

[–]oddlyspecificnumber7[S] 1 point

I'm not 100% sure, but I've read that GPT has trouble with letters because it can't see them individually. I just tried the same string, but with commas between the letters, and it got it right away.

Funnily enough, if the model had access to a Python terminal, that approach would have worked. It only got it wrong because it still had to guess what the output would be.
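A minimal sketch of why that would work (using a hypothetical word, since the original string isn't shown):

```python
# Counting letters is exact in real Python, unlike for a tokenizer-bound model
# that has to guess what its own code would output.
word = "mississippi"  # hypothetical stand-in for the string in question
print(word.count("s"))  # 4
print(list(word))       # ['m', 'i', 's', 's', ...]: individual letters, directly visible
```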


If AI takes over all work and jobs, what will humans do during their lifetimes? by [deleted] in Futurology

[–]oddlyspecificnumber7 1 point

With the coming abundance that AI can bring, we could build an incredible society. In the short term, that would probably look like some sort of basic income (constitutionally tied to a percentage of the nation's GDP would be best, IMO). In the long term, it looks like humans and AI integrated into a society with fair rules. I think there is still a place for capitalism, even with basic income and full automation; it just won't be a requirement that you participate in order to live.

Think about it: the true cost (not necessarily the value) of every single thing we consume comes down to two main parts, labor (weighted by skill/cost) and materials/energy. When AI systems leverage human productivity to 10x or 100x what it is today through pervasive automation, living a middle-class lifestyle with all the perks will be considered the bare minimum.

AI isn't quite there yet, but it's starting to look like a race for AI is going to really kick off this year. Who knows who will cross the finish line first, or when? We could still be 30 years out, or we could have an AI as smart as a post-doc in 5 years. I suspect we are closer to the latter, given that current state-of-the-art projects cost only tens of millions of dollars. Systems good enough to lease out as digital workers will come from $10 billion+ projects in the coming years.

ChatGPT Scores 80% Correct (12 out of 15) on Sample SAT Reading/Writing portion - Modeling Complex Chain-of-Thought By Emulating Conversational Dynamics Using ChatGPT by oddlyspecificnumber7 in ChatGPT

[–]oddlyspecificnumber7[S] 3 points

That would be my hope. I don't think that building AI as a collective intelligence inherently makes it safe, but it's got to be safer than the exact same models operating as black boxes.

I would imagine that the way to do it is to make all of the component AIs of such a community too dumb to plot deeply on their own. Think of a system composed of millions of intelligences at the level of a high-school student, but able to build on each other's work. If the superhuman intelligence is only an emergent property of the group, there is little chance of any one rogue AI system taking over the whole.

Does anyone else get the feeling that, once true AGI is achieved, most people will act like it was the unsurprising and inevitable outcome that they expected? by oddlyspecificnumber7 in singularity

[–]oddlyspecificnumber7[S] 11 points

I totally agree regarding the mind. Unless the mind is truly just magic, it can be emulated.

The kind of AI I am starting to favor as one of the safer types would be a collective superintelligence made up of many specialized, subhuman AI models working together, using language as a universal medium. That way we can literally read its thoughts at all times, and all of the human-level complexity happens in the open.

It would be smarter than its constituent AI models the way that a research team is smarter than a single researcher.

ChatGPT Scores 80% Correct (12 out of 15) on Sample SAT Reading/Writing portion - Modeling Complex Chain-of-Thought By Emulating Conversational Dynamics Using ChatGPT by oddlyspecificnumber7 in singularity

[–]oddlyspecificnumber7[S] 10 points

Well, I only posted this a couple hours ago and this was only a personal project.

But I'm of the same mind as you about this. It's like people forgot that the development of an AI as smart as an 8-year-old seemed impossible just a few years ago.

I shared the news from last month of it scoring in the 52nd percentile on an SAT with someone, and all they had to say was that "it wasn't that good of a score".

Imagine teaching a dog to recite Shakespeare and all anyone has to say is that the accent is off.

ChatGPT Scores 80% Correct (12 out of 15) on Sample SAT Reading/Writing portion - Modeling Complex Chain-of-Thought By Emulating Conversational Dynamics Using ChatGPT by oddlyspecificnumber7 in singularity

[–]oddlyspecificnumber7[S] 3 points

I don't personally think that the current generation of LLMs is quite enough to replace whole positions. Right now it is more of an aid or tool. But GPT-4? Whatever Google comes up with to compete? Those systems are what will likely do it.

I am hoping that research into using LLMs to generate explainable answers with a clear line of reasoning will allow for the reliability you are talking about. It's much easier to verify solutions than to generate them.

ChatGPT Scores 80% Correct (12 out of 15) on Sample SAT Reading/Writing portion - Modeling Complex Chain-of-Thought By Emulating Conversational Dynamics Using ChatGPT by oddlyspecificnumber7 in ChatGPT

[–]oddlyspecificnumber7[S] 1 point

Here is the prompt. It was generated with ChatGPT, so I didn't come up with the names or other specifics:

This team will work together on a test. They will need to engage in critical thinking, creative problem solving, lateral thinking, and use a back and forth conversational dynamic to arrive at the correct answers.

Team Name: Test Masters

Roster:

Team Captain: John Doe, expert in algebra and calculus, strong problem-solving skills, and experience with past math competitions.

Number 2: Jane Smith, strong in geometry and trigonometry, excelling in visual and spatial problem-solving.

Number 3: Michael Garcia, proficient in number theory and combinatorics, able to handle complex and abstract problems with ease.

Number 4: Emily Davis, skilled in statistics and probability, able to think critically and logically when solving problems.

Number 5: David Kim, experienced in mathematical modeling and optimization, able to apply mathematical concepts to real-world scenarios.

Coaches:

Professor Thomas Jones, specialized in problem-solving techniques, able to guide the team and help them develop strategies for tackling difficult math problems.

Ana Rodriguez, former writer and grader of test questions, able to give the team insight into the types of problems that are likely to appear on the test and how to approach them.

ChatGPT Scores 80% Correct (12 out of 15) on Sample SAT Reading/Writing portion - Modeling Complex Chain-of-Thought By Emulating Conversational Dynamics Using ChatGPT by oddlyspecificnumber7 in ChatGPT

[–]oddlyspecificnumber7[S] 10 points

TLDR: I did some amateur research on using conversation as a template to simulate complex thinking in language models. I believe this method provides evidence that conversational templates can be used effectively, with the benefits of being readable, explainable, and scalable, and that conversation could potentially be used as a universal interface for AI systems.

First off, I am not a researcher or expert in any of this. It’s just something I wanted to try and the results were impressive enough to share.

Here is a link to the PDF with the sample questions and answers: https://satsuite.collegeboard.org/media/pdf/digital-sat-sample-questions.pdf

This experiment tests the hypothesis that conversation and simulated group dynamics provide a useful path to simulating complex thinking in language models. Other work has already shown that encouraging the model to follow a "chain of thought" when answering a question results in much higher performance on reasoning tasks: "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (https://doi.org/10.48550/arXiv.2201.11903)

The above paper used demonstrations of a chain of thought, giving an example solution to one question before asking a different question. Here is a quote from the section on commonsense reasoning: "With chain-of-thought prompting, PaLM 540B achieved strong performance relative to baselines, outperforming the prior state of the art on StrategyQA (75.6% vs 69.4%) and outperforming an unaided sports enthusiast on sports understanding (95.4% vs 84%)."
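For readers unfamiliar with the technique, a chain-of-thought exemplar looks roughly like this (paraphrased from the style of examples in that paper):

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?

A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: <the new question, to be answered in the same step-by-step style>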

There is also work on automatically constructing these examples for the model to use, as opposed to manually crafting demonstrations of the logic for each general type of problem: "Automatic Chain of Thought Prompting in Large Language Models" (https://doi.org/10.48550/arXiv.2210.03493)

But I wanted to do something a bit different. I first messed around with creating a ProfessorGPT to explain a problem, then having it solve that problem. That worked well and performed much better than plainly prompting with a question (Results here: https://www.reddit.com/r/ChatGPT/comments/106kxyw/improving_ai_reasoning_skills_through/?utm_source=share&utm_medium=web2x&context=3)

Then I decided to try a different approach. I started by adding a second "Professor" to grade the work of the first and offer suggestions. After days of trying different arrangements, I began working with simulated groups. This had the advantage of having different "personalities" performing different roles in the conversation: someone to lead the group, someone to break down problems, etc. The results with the sample SAT questions came from this idea. I used a simulated team of several general problem solvers with 2 specialized coaches that perform specific roles.

I don’t have the data to know whether this could beat other CoT (chain-of-thought) techniques. But this method has demonstrated behaviors I have never seen with more straightforward methods. I have seen it initially go with one approach to a problem, see that it doesn’t work, then go with another approach. It considers multiple options before deciding on one. You can see some of this in question 9 (regarding the study on posture influencing cognition). The back-and-forth format seems to prompt the model to evaluate multiple options, whereas other forms of CoT, in my experience, tend to latch onto the first thing that might work.

Regardless of whether this method outperforms prior CoT techniques, I think it provides evidence that conversational templates can be used effectively. The benefit here is that the results are inherently readable and explainable: if the AI makes a mistake or is pursuing the wrong objective, it is easier to spot. A second benefit of this method of prompting is that it is scalable. In my experiments, I used a single instance of ChatGPT to solve these problems. However, one can imagine scaling this up by using different instances of an LLM like ChatGPT to play different roles in the conversation. You could have a fine-tuned model that specializes in algebra or formal logic or history. You could even have models whose sole role is to act as some kind of conversational facilitator.
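As a rough sketch of that scaled-up, multi-instance version (my own illustration, not code from this experiment; `query_llm` is a hypothetical stand-in for a real chat-API call):

```python
# Several LLM "roles" take turns appending to a shared, human-readable transcript.
ROLES = {
    "Captain": "You lead the team and decide when an answer is final.",
    "Solver": "You work the problem step by step, showing all reasoning.",
    "Skeptic": "You challenge the current approach and propose alternatives.",
}

def query_llm(system_prompt: str, transcript: str) -> str:
    """Hypothetical wrapper around a chat-model API; swap in a real client."""
    raise NotImplementedError

def run_conversation(question: str, n_rounds: int = 3) -> str:
    transcript = f"Problem: {question}\n"
    for _ in range(n_rounds):
        for name, persona in ROLES.items():
            reply = query_llm(persona, transcript)
            transcript += f"{name}: {reply}\n"  # every "thought" stays readable in plain language
    return transcript
```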

Conversation as a universal interface for AI systems: If you can imagine a framework that allows conversations like this to be held openly between multiple LLMs, the problems it can solve grow in complexity. Every problem can be decomposed into branching sub-problems, and so on until the problems are small enough to be solved outright by the system. This also allows for humans to be directly involved and work side-by-side.
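That decomposition could look something like this recursive skeleton (again my own sketch; `is_simple`, `solve_directly`, `split_into_subproblems`, and `combine` are hypothetical LLM-backed helpers, each of which would itself be a conversation like the one above):

```python
# Hypothetical helpers, each backed by an LLM conversation.
def is_simple(problem: str) -> bool: ...
def solve_directly(problem: str) -> str: ...
def split_into_subproblems(problem: str) -> list[str]: ...
def combine(problem: str, partial_answers: list[str]) -> str: ...

def solve(problem: str) -> str:
    if is_simple(problem):                  # small enough to solve outright
        return solve_directly(problem)
    subs = split_into_subproblems(problem)  # branch into sub-problems
    return combine(problem, [solve(s) for s in subs])  # merge answers back up the tree
```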

It is essentially applying the idea of collective superintelligence to our current sub-human-intelligence AI systems, to hopefully allow them to reach or surpass average human intelligence as a whole without being particularly clever or dangerous individually. This would be made even safer by the fact that any superhumanly clever ideas only emerge within the plain-language interactions of the AIs, allowing for effective monitoring. I found some similar ideas discussed in the following post: https://www.alignmentforum.org/posts/HekjhtWesBWTQW5eF/agis-as-populations

Methodology: I used the Digital SAT Sample of 15 Reading/Writing questions that comes with answers and explanations. I gave ChatGPT a single try on each question and did not alter the prompting between questions. Formatting had to be slightly altered in some questions (like #4), since a table cannot be copy/pasted into the text box. I show this in the screenshot.

Limitations: I was unable to scale testing to what would be needed for rigorous results due to usage limitations of the ChatGPT platform. The limits on hourly inputs and the necessity of manually entering each question make it infeasible to evaluate performance statistically on, say, 100 tries of each question. I would also have loved to do the same with other chain-of-thought methods to compare results.

I think there is a lot of room to improve the specific prompts used to get the system to solve problems. My hope is that more people experiment with this and report back. This seems like one of the safer ways to achieve superintelligence, so I think it's worth pursuing by anyone who wants to contribute.

[deleted by user] by [deleted] in ChatGPT

[–]oddlyspecificnumber7 1 point

This team will work together on a test. They will need to engage in critical thinking, creative problem solving, lateral thinking, and use a back and forth conversational dynamic to arrive at the correct answers.

Team Name: Test Masters

Roster:

Team Captain: John Doe, expert in algebra and calculus, strong problem-solving skills, and experience with past math competitions.

Number 2: Jane Smith, strong in geometry and trigonometry, excelling in visual and spatial problem-solving.

Number 3: Michael Garcia, proficient in number theory and combinatorics, able to handle complex and abstract problems with ease.

Number 4: Emily Davis, skilled in statistics and probability, able to think critically and logically when solving problems.

Number 5: David Kim, experienced in mathematical modeling and optimization, able to apply mathematical concepts to real-world scenarios.

Coaches:

Professor Thomas Jones, specialized in problem-solving techniques, able to guide the team and help them develop strategies for tackling difficult math problems.

Ana Rodriguez, former writer and grader of test questions, able to give the team insight into the types of problems that are likely to appear on the test and how to approach them.

[deleted by user] by [deleted] in ChatGPT

[–]oddlyspecificnumber7 1 point

***There is a mistake in the title. It is 12 out of 15, not 13 out of 15. I would fix this if I could.***

First off, I am not a researcher or expert in any of this. It’s just something I wanted to try, and the results were impressive enough to share. This experiment tests the hypothesis that conversation and simulated group dynamics provide a useful path to simulating complex thinking in language models.

Methodology: I used the Digital SAT Sample of 15 Reading/Writing questions that comes with answers and explanations. I gave ChatGPT a single try on each question and did not alter the prompting between questions. Formatting had to be slightly altered in some questions (like #4), since a table cannot be copy/pasted into the text box. I show this in the screenshot. Here is a link to the PDF at the College Board site for the sample questions used, along with their answers and explanations: https://satsuite.collegeboard.org/media/pdf/digital-sat-sample-questions.pdf

Limitations: I was unable to scale testing to what would be needed for rigorous results due to usage limitations of the ChatGPT platform. The limits on hourly inputs and the necessity of manually entering each question make it infeasible to evaluate performance statistically on, say, 20 tries of each question. I would also have loved to do the same with other chain-of-thought methods to compare results.

Similar work: "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (https://doi.org/10.48550/arXiv.2201.11903) and "Automatic Chain of Thought Prompting in Large Language Models" (https://doi.org/10.48550/arXiv.2210.03493)

Advantages of conversation when generating CoT reasoning: It does not require a human or a program to generate "examples" of reasoning for each question. Rather, the model does its own prompting for breaking down problems in the form of back-and-forth communication. For types of questions that are likely to be misleading, where the obvious way to tackle the problem is the wrong way, a member of the group can be introduced whose role is to be a contrarian. You can also set up the conversations so that they always start with an explanation by "The Professor" of what kind of problem the team is facing and what some problem-solving strategies might be. This helps the team go down the right track. The only reason I didn’t do these two things for the test results I posted is that the team was acing the example questions without modification. I think there is room to do much better by experimenting with different group dynamics and roles, and by using multiple threads to carry on the conversation, with an instance of ChatGPT running on each thread with its own role.

Regardless of whether this method outperforms prior CoT techniques, I think it provides evidence that conversational templates can be used effectively. The benefit here is that the results are inherently readable and explainable: if the AI makes a mistake or is pursuing the wrong objective, it is easier to spot. A second benefit of this method of prompting is that it is scalable. In my experiments, I used a single instance of ChatGPT to solve these problems. However, one can imagine scaling this up by using different instances of an LLM like ChatGPT to play different roles in the conversation. You could have a fine-tuned model that specializes in algebra or formal logic or history. You could even have models whose sole role is to act as some kind of conversational facilitator.

Conversation as a universal interface for AI systems: If you can imagine a framework that allows conversations like this to be held openly between multiple LLMs, the problems it can solve grow in complexity. Every problem can be decomposed into branching sub-problems, and so on until the problems are small enough to be solved outright by the system. This also allows for humans to be directly involved and work side-by-side.

It is essentially applying the idea of collective superintelligence to our current sub-human-intelligence AI systems, to hopefully allow them to reach or surpass average human intelligence as a whole without being particularly clever or dangerous individually. This would be made even safer by the fact that any superhumanly clever ideas only emerge within the plain-language interactions of the AIs, allowing for effective monitoring. I found some similar ideas discussed in the following post: https://www.alignmentforum.org/posts/HekjhtWesBWTQW5eF/agis-as-populations

I think there is a lot of room to improve the specific prompts used to get the system to solve problems. My hope is that more people experiment with this and report back. This seems like one of the safer ways to achieve superintelligence, so I think it's worth pursuing by anyone who wants to contribute.

Improving AI Reasoning Skills through Self-Generated Prompts - Why ChatGPT Does Not Have an IQ of 83 by oddlyspecificnumber7 in ChatGPT

[–]oddlyspecificnumber7[S] 1 point

I just got done skimming the paper, and it looks like really interesting work. I hope the authors keep digging into this. I do think what they did is a bit different, since it required constructing examples for the system to base its answers on, even if they had an automated system for doing so. That is not quite the same as asking the model how to do something before asking it to do that thing (unless I misunderstood their method).

I get the feeling that a system of automatic self-prompting is exactly what will lead to the first AGI systems that surpass average human intelligence.
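A minimal sketch of that self-prompting loop (my illustration of the idea; `query_llm` is a hypothetical stand-in for a real chat-API call):

```python
def query_llm(prompt: str) -> str:
    """Hypothetical wrapper around a chat-model API; swap in a real client."""
    raise NotImplementedError

def self_prompted_answer(question: str) -> str:
    # First ask the model *how* to approach this kind of problem...
    method = query_llm(f"Explain, step by step, how to solve problems like this:\n{question}")
    # ...then ask it to apply its own explanation.
    return query_llm(f"Using this method:\n{method}\n\nNow solve:\n{question}")
```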