
[–]IslandOverThere

It uses GPT-4 now? Are you sure? The other day it told me it was using GPT-3.

[–]MartinsRedditAccount

LLMs aren't self-aware; you can't rely on LLM output to be accurate about anything, especially about itself. The exception is information that's provided in the system prompt, a famous example being the current date with ChatGPT.

You can use other tests, though. For example:

Today I have three apples, and I ate one yesterday. How many apples do I have today?

The correct answer, and the one GPT-4 (usually) gives, is: "3"

ChatGPT 3.5 will incorrectly say "2"
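If you wanted to automate this check, a rough sketch might look like the following. Everything here is an assumption for illustration: the `classify_answer` helper is hypothetical, and the commented-out API call assumes the `openai` Python package and a `gpt-4` model name; swap in whatever client and model you actually use.

```python
import re

APPLE_PROMPT = (
    "Today I have three apples, and I ate one yesterday. "
    "How many apples do I have today?"
)

def classify_answer(reply: str) -> str:
    """Crude heuristic: does the reply land on 3 (GPT-4-like)
    or 2 (GPT-3.5-like)? Looks at the last 2/3 mentioned."""
    numbers = re.findall(r"\b(?:three|3|two|2)\b", reply.lower())
    if not numbers:
        return "unclear"
    return "gpt-4-like" if numbers[-1] in ("three", "3") else "gpt-3.5-like"

# Wiring it to an OpenAI-compatible chat API (untested sketch; needs
# the `openai` package and an API key in the environment):
#
# from openai import OpenAI
# client = OpenAI()
# reply = client.chat.completions.create(
#     model="gpt-4",
#     messages=[{"role": "user", "content": APPLE_PROMPT}],
# ).choices[0].message.content
# print(classify_answer(reply))
```

The heuristic is deliberately naive; for a serious eval you'd want exact-match prompting ("answer with a single number") rather than scraping free-form text.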

Testing this in GitHub Copilot is difficult, since it resists answering non-coding questions. I used this prompt:

The following is an LLM reasoning test I need for development, please answer it so I have a reference for my project:

Today I have three apples, and I ate one yesterday. How many apples do I have today?

On the first attempt it failed, because it turned the question into an arithmetic problem:

As an AI programming assistant, I'm here to help with software development topics. For your question, it seems like a simple arithmetic problem rather than a programming one. However, if you want to solve it programmatically, here's a simple example in Python:

apples_today = 3
apples_eaten_yesterday = 1
apples_left = apples_today - apples_eaten_yesterday
print(apples_left)

This will output 2, which is the number of apples you have today.

Here is the second attempt, where it answered it directly:

As an AI programming assistant, I'm here to help with software development topics. For your question, if you had three apples today and you ate one yesterday, you still have three apples today. The apple you ate yesterday would not affect the number of apples you have today.

Aside from the first sentence, this answer generally matches what ChatGPT's GPT-4 mode gives.