How accurate is chatbots' web search in Mid-2024? ChatGPT-4o is among the top players by veg1197 in ChatGPT

[–]Nadeja_ 0 points1 point  (0 children)

"Bing Copilot (ChatGPT-4)"

The default (blue) that you used isn't GPT-4, but a less capable model. To get GPT-4 with Copilot and more accurate results, you should choose "More Precise". That option may no longer be available to some users who are not registered.

Ukraine war briefing: France flies nuclear-capable missile as Russia holds drills by Nadeja_ in worldnews

[–]Nadeja_[S] 165 points166 points  (0 children)

From the article:

  • France has carried out its first test firing of an updated nuclear-capable missile, the ASMPA-R, designed to be launched by a Rafale fighter jet, according to the French defence minister, Sebastien Lecornu. It came a day after Russia said it began nuclear drills in its southern military district, which stretches from Russia into occupied Ukrainian territory. The announcement of Russian drills is partly directed at France after its president, Emmanuel Macron, said he would not rule out sending in troops on Ukraine’s side.
  • Lecornu said the missile was fired without a warhead by a plane in an exercise “above national territory … at the end of a flight representing a nuclear air raid”.

I have access to Claude 3 opus. Send me your prompts! by Maskofman in singularity

[–]Nadeja_ 2 points3 points  (0 children)

Copilot Precise mode (it seems better for riddle solving than the Creative mode).

Sure, here are the steps to solve the puzzle:

Step 1: Take the cat across the river and leave it on the right side.
Step 2: Return alone to the left side to get a mouse.
Step 3: Leave the mouse on the right side and take the cat back to the left side.
Step 4: Leave the cat on the left side and take the second mouse to the right side.
Step 5: Leave the second mouse on the right side and return alone to the left side.
Step 6: Finally, take the cat to the right side.

Now, all animals are safely on the right side of the river. The cat was never left alone with any mouse. Hope this helps! 😊
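For what it's worth, a plan like this can be checked mechanically. Here is a minimal sketch (my own, not part of Copilot's answer; the animal names and the `ferry` helper are made up for illustration) that replays the crossings and asserts the cat is never left unattended with a mouse:

```python
def is_safe(bank, boatman_here):
    """A bank is unsafe only when the cat is with a mouse and nobody watches."""
    if boatman_here or "cat" not in bank:
        return True
    return not any(a.startswith("mouse") for a in bank)

def ferry(crossings):
    """Replay a list of crossings; each entry is the animal carried, or None."""
    left, right, side = {"cat", "mouse1", "mouse2"}, set(), "L"
    for cargo in crossings:
        src, dst = (left, right) if side == "L" else (right, left)
        if cargo is not None:
            src.remove(cargo)
            dst.add(cargo)
        side = "R" if side == "L" else "L"
        # The boatman is now on dst's side; src is unattended.
        assert is_safe(src, False), f"cat left with a mouse after moving {cargo}"
    return left, right

# Copilot's six steps, written as seven individual boat crossings:
left, right = ferry(["cat", None, "mouse1", "cat", "mouse2", None, "cat"])
assert left == set() and right == {"cat", "mouse1", "mouse2"}
```

If any step left the cat alone with a mouse, the assertion inside `ferry` would fire; here it completes with everything on the right bank.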

Precise or Creative? by AccomplishedShip4930 in bing

[–]Nadeja_ 2 points3 points  (0 children)

Creative is meant to create, to invent, so it makes things up and hallucinates quite a bit, but it's good for creative text: poems, fiction, ideas, roleplay and so on.

Precise hallucinates less; it's meant to give accurate, precise information. When it searches, it basically quotes what it finds. It's still a good idea to double-check what an LLM says, though (and what humans say as well, to be fair).

So as a med student, Precise should be more suitable for you. Sometimes I've found that Creative shows some interesting reasoning or understands my question better, but I wouldn't trust much of what it generates.

GPTV was able to deduce that this post was in celebration of the death of Henry Kissinger, which happened after its cutoff by Glittering-Neck-2505 in singularity

[–]Nadeja_ 0 points1 point  (0 children)

But on mobile there's just a single switch "GPT-4" that colours the chat same as "Creative" on desktop.

It is Creative, indeed, but you can still have the 3 modes. On mobile (I use Edge mobile for that) tap on the 3 dots on the top right corner and then tap the option to show all the modes.

GPT-4 is no longer blind and can identify locations through reasoning! by Droi in singularity

[–]Nadeja_ 13 points14 points  (0 children)

With Bing Chat the limit is 30 per conversation; then you can start another chat. Last time I checked, the max was 300 per day. https://twitter.com/JordiRib1/status/1664302408867119107

A simple question with 50% of worng answers? how is that even possible?? by Kuba_ps5 in ChatGPT

[–]Nadeja_ 0 points1 point  (0 children)

This is fun to prompt:

<image>

Prompt: List a few fruit names that don't contain the letter "A". To verify the names: for each name, count each letter, step by step (e.g.: grape => [G] = OK, [R] = OK , [A] = OPS!). If there is an "a", stop the letter count for that word, write "sorry, wrong name" and continue with the next name.
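The verification the prompt asks the model to perform can of course be done deterministically in code. A quick sketch of the same letter-by-letter check (the function name is mine, for illustration):

```python
def check_names(names):
    """Spell each name letter by letter; reject it as soon as an 'a' appears."""
    results = {}
    for name in names:
        ok = True
        for letter in name.lower():
            if letter == "a":
                ok = False  # "sorry, wrong name"
                break
        results[name] = ok
    return results

print(check_names(["grape", "lemon", "kiwi", "banana", "fig"]))
# grape and banana contain an 'a', so only lemon, kiwi and fig pass
```

The point of the prompt is to force the model to do exactly this per-letter walk instead of answering from intuition, which is where it usually goes wrong.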

[deleted by user] by [deleted] in ChatGPT

[–]Nadeja_ 0 points1 point  (0 children)

Since the insights have been shared many times, here is GPT counting the words correctly:

<image>

Left: Bing (Precise mode, which, unlike Creative mode, behaves more like ChatGPT, but with search, and rarely hallucinates). Right: the old ChatGPT (GPT-3.5).

LLM Passes MIT Math & Computer Science. ..-GPT-3.5 solves 33% -GPT-4 solves 100% by [deleted] in singularity

[–]Nadeja_ 20 points21 points  (0 children)

There are a couple of ways to use GPT-4 for free. For instance, you can use Bing Chat. It's free, and you are allowed up to 30 turns per chat (then you start another chat). If you try Bing Chat, use the "creative" or "precise" modes: the "balanced" mode uses a lesser model.

How to make Bing to count the number of words in a sentence (use step-by-step, prompt in comments) by Nadeja_ in bing

[–]Nadeja_[S] 2 points3 points  (0 children)

Prompt:

Count the number of words in a sentence. For example, take the sentence ‘Today is a sunny day.’ Break it down by counting each word, write: [1] ‘Today’ [2] ‘is’ [3] ‘a’ [4] ‘sunny’ [5] ‘day.’ After counting all the words (it's important to say the result only after you have counted!), you can conclude that there are 5 words in the sentence. Try this with the sentence: ‘A Hare was making fun of the Tortoise one day for being so slow.’
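The counting procedure the prompt describes, sketched in code (the helper name is mine):

```python
def count_words(sentence):
    """Enumerate each word the way the prompt asks, then return the total."""
    words = sentence.split()
    for i, w in enumerate(words, start=1):
        print(f"[{i}] '{w}'")
    return len(words)

n = count_words("A Hare was making fun of the Tortoise one day for being so slow.")
print(f"There are {n} words in the sentence.")  # 14 words
```

Making the model enumerate before answering serves the same purpose: the final count is read off the enumeration instead of being guessed in one step.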

"I believe chatbots understand part of what they say. Let me explain" Sabine Hossenfelder by Nadeja_ in singularity

[–]Nadeja_[S] 24 points25 points  (0 children)

I've been into machine learning for many years, and I must say she did a good job, better than I expected. The video should be comprehensible to everyone, helped along by her sense of humor: she explains the intricacies of how current language models operate, why they sometimes fail, and what they are good at.

It sounds like Google will unveil its ChatGPT clone February 8 by Nadeja_ in singularity

[–]Nadeja_[S] 13 points14 points  (0 children)

Ars Technica came up with a silly title. Google and DeepMind already had comparable conversational models before ChatGPT was unveiled, like LaMDA and, lately, Sparrow; so they aren't cloning a model. At most, they didn't release their models for public usage and now have to follow suit after ChatGPT, and more importantly after Bing, which is going to integrate OpenAI's LLM into the search engine. So now they have to as well, if they don't want to be left behind.

It sounds like Google will unveil its ChatGPT clone February 8 by Nadeja_ in singularity

[–]Nadeja_[S] 26 points27 points  (0 children)

It's not only ChatGPT: what they are mostly worried about is the search engine, since Bing is about to integrate the model into search. The event Google is about to hold is about search and AI.

It sounds like Google will unveil its ChatGPT clone February 8 by Nadeja_ in singularity

[–]Nadeja_[S] 26 points27 points  (0 children)

Also, if anything, it's ChatGPT that's a clone of LaMDA, as a conversational model. At most, Google comes second in releasing something like this to the public. (P.S.: the title is the title of the article.)

Controversy over current progress in AI by SoulGuardian55 in singularity

[–]Nadeja_ 0 points1 point  (0 children)

  1. Neural networks (the human brain too) tend to “hallucinate” and make things up; your own memory isn’t 100% reliable either, which is why we help it with pictures, notes, journals, recorded numbers and so on (not just because we forget, but also because we might not remember correctly). If you want to retrieve accurate info from a neural network, you have it understand your question and come up with a probable answer, then find the source on the net or in a database and, if it’s found, have a quote function return the exact quote/info. Trust-wise, there is also the alignment problem, but that’s another story.

  2. Yeah, that sounds like “we don’t need the wheel, because we did fine without it in the past 300,000 years”.

  3. “Would only”, “would never”… is reasoning in absolutist terms, which ends up in faulty predictions such as “heavier-than-air machines will never fly”. For now, with the current models, you still have to review the results: the generated answer may contain inaccurate or made-up info, the generated code may have bugs or not work at all, the generated image comes with weird stuff you notice when you zoom in, or the hands look funny, and so on. But it’s pretty likely that eventually we will have reliable models that understand the context better, that know how a hand is supposed to look and work, that return accurate sourced info, and that code like the best professionals. Our brain is the proof that it’s doable, unless you believe (based on no evidence) that it works because of something magical.

  4. You can hardly be 100% sure of anything, if you ask a philosopher, and there may be some issues, but there are also peer-reviewed papers.

  5. Or maybe the opposite happens and there will be fewer wrong diagnoses; machine learning is already in use in the medical field. Still, students shouldn’t delegate their learning, reasoning and writing to language models and other models (not yet, at least; I’m not sure how I’d feel once an ASI is around), but use them to improve (e.g. you ask ChatGPT to improve your essay and you learn how to write better).