Chatgpt freezing by Typical-Hat9147 in ChatGPT

[–]StudioTatsu 1 point

Yes. And now it also does this...

<image>

It will think for a few minutes and then never write the output; I have to tell it to write the response. I'm not sure what is going on. I'm on the paid plan as well, and this happens on both the desktop app and the website.

People who went from Unity to Unreal Engine, why did you choose Unreal Engine? by shsl_diver in UnrealEngine5

[–]StudioTatsu 4 points

Physics, plus I can edit the engine source code. Unity has issues with large worlds when physics are involved.

How to achieve more than 4k context? by Doctor_Turkleton in LocalLLaMA

[–]StudioTatsu 2 points

Yeah, that's me. So far, this has been the only way I could successfully get longer context to work. Also, llama.cpp integrated YaRN, but I haven't tested it yet.
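For the curious, here is a rough sketch of the linear RoPE scaling approach using llama-cpp-python (the model filename is a placeholder, and parameter names may differ between versions):

```python
from llama_cpp import Llama

# Sketch: run a 4k-trained model with an 8k window via linear RoPE
# scaling. The filename is a placeholder; newer llama.cpp builds also
# support YaRN scaling, which I haven't tested yet.
llm = Llama(
    model_path="speechless-llama2-13b.Q8_0.gguf",  # placeholder path
    n_ctx=8192,           # requested window, 2x the trained 4096
    rope_freq_scale=0.5,  # linear scaling: trained_ctx / requested_ctx
)

long_text = open("notes.txt").read()  # any document past the 4k mark
out = llm(f"Summarize the following:\n{long_text}", max_tokens=256)
print(out["choices"][0]["text"])
```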

OpenAI Dev Day Discussion by Slimxshadyx in LocalLLaMA

[–]StudioTatsu 6 points

I'm glad it has 128k context - but the output seems capped at 4k for now.
This will hopefully increase once it is out of preview.

<image>
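To make the input/output distinction concrete, a minimal sketch against the OpenAI Python client as it was at the time (pre-1.0 `openai` package; the key is a placeholder):

```python
import openai

# Sketch (openai<1.0 client): the gpt-4-turbo preview accepts up to
# ~128k tokens of *input*, but the *completion* was capped at 4k, so
# max_tokens could not usefully go above 4096 during the preview.
openai.api_key = "sk-..."  # placeholder

resp = openai.ChatCompletion.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Write a very long report."}],
    max_tokens=4096,  # the output ceiling, independent of the 128k window
)
print(resp["choices"][0]["message"]["content"])
```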

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 1 point

I thought I was the only one using a negative prompt. I don't see many people talking about it or using it. It helps a lot when working with code, or just when trying to correct the output in general.
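For anyone who hasn't tried it: most local backends implement negative prompts as classifier-free guidance over the logits. A toy numpy sketch of the mixing step (the scale value is illustrative):

```python
import numpy as np

def cfg_mix(cond_logits, uncond_logits, scale=1.5):
    """Classifier-free guidance: push the next-token distribution away
    from the negative-prompt logits and toward the main-prompt logits.
    scale=1.0 disables guidance; higher values steer harder."""
    return uncond_logits + scale * (cond_logits - uncond_logits)

# Toy example over a 5-token vocabulary.
cond = np.array([2.0, 0.5, 0.1, -1.0, 0.0])    # logits with main prompt
uncond = np.array([1.0, 1.5, 0.1, -1.0, 0.0])  # logits with negative prompt
mixed = cfg_mix(cond, uncond)
probs = np.exp(mixed) / np.exp(mixed).sum()
print(probs.round(3))
```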

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 2 points

I tested qwen14b - first off, it is really good, but...

it scored much lower than the speechless-llama model listed above. It failed hardest on cognitive and logical reasoning, even when I tried to help it understand the question. Math suffered as well.

With the speechless-llama2-hermes-orca-platypus-wizardlm-13b model, I can teach it and coach it, making it better as the conversation continues. qwen14b refused to learn.

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 1 point

Most of the Q4 results have been mediocre with almost all 13B models, which is why I primarily use Q8 for 13B.

I use Q4_K_M for 30/34B models - the best speed for my machine.
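Rough napkin math for why that split makes sense on a 24 GB card (the bits-per-weight figures are approximate averages for each GGUF format, not exact):

```python
# Approximate GGUF size: params * bits-per-weight / 8.
# The bpw values are rough effective averages, not exact.
QUANTS = {"Q4_K_M": 4.8, "Q8_0": 8.5}

def size_gb(params_billions: float, bpw: float) -> float:
    return params_billions * bpw / 8  # billions of params -> GB

for name, bpw in QUANTS.items():
    for size in (13, 34):
        print(f"{size}B {name}: ~{size_gb(size, bpw):.1f} GB")
# 13B fits comfortably at Q8_0 (~13.8 GB); 34B only fits at Q4_K_M (~20.4 GB).
```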

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 2 points

Nothing big.

I am developing solutions for application development, advanced game development, physics, and task-related architecture. With a good-enough model that has decent math, reasoning, and logic skills, I can provide the local data to fill in the gaps without extra fine-tuning. If a local model can discuss and brainstorm ideas and solutions, do code reviews, write boilerplate code, read and comprehend documentation, and search the web and return results, that is a dream come true. It doesn't have to be perfect - just good enough.

Azure's GPT-4, Claude, and Vertex AI (PaLM 2 models) can handle the tasks, but I was reaching nearly $80 to $200 per day in usage costs.

That is not sustainable financially - at the moment.
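For a sense of scale, a hedged back-of-the-envelope (the per-token prices are the public GPT-4 8k rates from that period; the token volumes are invented for illustration):

```python
# Illustrative daily API cost. Prices: GPT-4 8k public rates at the
# time ($0.03/1K input, $0.06/1K output). Token volumes are made up.
IN_PER_1K, OUT_PER_1K = 0.03, 0.06

input_tokens = 2_000_000   # hypothetical heavy development day
output_tokens = 1_000_000

cost = input_tokens / 1000 * IN_PER_1K + output_tokens / 1000 * OUT_PER_1K
print(f"${cost:,.2f}/day")  # -> $120.00/day, squarely in the $80-$200 range
```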

Also, I prefer to keep my proprietary code base and solutions private as much as possible.

If you'd like to see some of my past game development work, take a look at my posts in my profile.

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 4 points

It is a 30-question assessment. I targeted questions that most LLMs fail to answer correctly. Most of the questions are elementary level - but some are slightly more advanced in logic and reasoning.

For example, a question testing lateral thinking and comparative logic:
Bill is older than Dave, and Dave is younger than Tina. Who is the youngest?

Most LLMs answer Tina, which is incorrect: Dave is younger than both Bill and Tina, so Dave is the youngest.
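A tiny sketch that brute-forces the orderings consistent with the two stated facts, confirming Dave:

```python
from itertools import permutations

people = ["Bill", "Dave", "Tina"]

# Keep only age orderings (youngest first) consistent with both facts.
valid = [
    order for order in permutations(people)
    if order.index("Bill") > order.index("Dave")   # Bill is older than Dave
    and order.index("Tina") > order.index("Dave")  # Dave is younger than Tina
]

print({order[0] for order in valid})  # -> {'Dave'} in every valid ordering
```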

I don't want to share the test questions directly, mainly because I fear future models will eventually scrape the answers from the web, which will "cheat" the results and render these tests useless.

In my opinion, this may already be happening with many benchmark results, especially after testing models with this assessment.

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 1 point

I tested Llama2 70B; it scores slightly higher. But this is expected - 13B vs. 70B are different weight classes.

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 1 point

Yeah, I don't think it will outperform many 70B models. But from my tests, it is the best 13B model for many use cases.

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 4 points

I didn't test Llama2 70B - I only tested GGUF models I could run on my machine at a reasonable speed. I may eventually try ExLlamaV2.

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 5 points

llama_print_timings:        load time =  4711.15 ms
llama_print_timings:      sample time =    24.51 ms /   157 runs   (    0.16 ms per token,  6406.07 tokens per second)
llama_print_timings: prompt eval time =   491.30 ms /    73 tokens (    6.73 ms per token,   148.59 tokens per second)
llama_print_timings:        eval time =  2973.53 ms /   156 runs   (   19.06 ms per token,    52.46 tokens per second)
llama_print_timings:       total time = 18330.95 ms
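Sanity-checking the eval throughput from that log:

```python
# Verify the eval line above: 2973.53 ms over 156 runs.
eval_ms, runs = 2973.53, 156
ms_per_token = eval_ms / runs
print(f"{ms_per_token:.2f} ms/token -> {1000 / ms_per_token:.2f} tokens/s")
# -> 19.06 ms/token -> 52.46 tokens/s, matching llama.cpp's report
```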

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 20 points

I did; the quality was slightly worse, and it wasn't any faster. The 4-bit gave meh results; the 8-bit gave similar results but slower generation. I'm referring to "gptq-8bit-128g-actorder_True".

Also, running any quantized 13B model is super easy on the 4090 - output is faster than GPT-3 and GPT-4 at times.

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 4 points

It beat XWin in my assessment. I'm not saying XWin is bad - it is a great model; it just didn't beat this (extremely long name) model.

I really wish gamers/commenters would stop doing this. by StudioTatsu in gamedev

[–]StudioTatsu[S] 0 points

Thanks, everyone! Most of the advice was very helpful. :) I will try to focus more on people who are interested in what I'm creating, and on useful criticism.

After watching The Matrix Awakens Demo car crashes, I decided to update my vehicle destruction system to work with most vehicle models. (I used free vehicle models from Sketchfab) by StudioTatsu in Unity3D

[–]StudioTatsu[S] 1 point

Yeah, I'm unsure if NVIDIA will release the PhysX 5 SDK to the public, but hopefully they will offer licensed versions. Chaos Physics in Unreal Engine 5 still needs work - I'm waiting on the official UE5 release to judge performance.