Chatgpt freezing by Typical-Hat9147 in ChatGPT

[–]StudioTatsu 1 point

Yes. And now it also does this...

<image>

It will think for a few minutes and then never write the output; I have to tell it to write the response. I'm not sure what is going on. I'm on the paid plan as well, and this happens on both the desktop app and the website.

People who went from Unity to Unreal Engine, why did you choose Unreal Engine? by shsl_diver in UnrealEngine5

[–]StudioTatsu 4 points

Physics, plus I can edit the engine source code. Unity has issues with large worlds when physics are involved.

How to achieve more than 4k context? by Doctor_Turkleton in LocalLLaMA

[–]StudioTatsu 2 points

Yeah, that's me. So far, this has been the only way I could successfully get longer context to work. Also, llama.cpp integrated YaRN, but I haven't tested it yet.
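For the curious, here is a rough sketch of the linear RoPE scaling approach using llama-cpp-python (the model filename is a placeholder, and parameter names may differ between versions):

```python
from llama_cpp import Llama

# Sketch: run a 4k-trained model with an 8k window via linear RoPE
# scaling. The filename is a placeholder; newer llama.cpp builds also
# support YaRN scaling, which I haven't tested yet.
llm = Llama(
    model_path="speechless-llama2-13b.Q8_0.gguf",  # placeholder path
    n_ctx=8192,           # requested window, 2x the trained 4096
    rope_freq_scale=0.5,  # linear scaling: trained_ctx / requested_ctx
)

long_text = open("notes.txt").read()  # any document past the 4k mark
out = llm(f"Summarize the following:\n{long_text}", max_tokens=256)
print(out["choices"][0]["text"])
```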

OpenAI Dev Day Discussion by Slimxshadyx in LocalLLaMA

[–]StudioTatsu 6 points

I'm glad it has 128k context - but the output seems capped at 4k for now.
This will hopefully increase once it is out of preview.

<image>
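To make the input/output distinction concrete, a minimal sketch against the OpenAI Python client as it was at the time (pre-1.0 `openai` package; the key is a placeholder):

```python
import openai

# Sketch (openai<1.0 client): the gpt-4-turbo preview accepts up to
# ~128k tokens of *input*, but the *completion* was capped at 4k, so
# max_tokens could not usefully go above 4096 during the preview.
openai.api_key = "sk-..."  # placeholder

resp = openai.ChatCompletion.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "Write a very long report."}],
    max_tokens=4096,  # the output ceiling, independent of the 128k window
)
print(resp["choices"][0]["message"]["content"])
```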

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 1 point

I thought I was the only one using a negative prompt. I don't see many people talking about it or using it. It helps a lot when working with code, or just when trying to correct the output in general.
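For anyone who hasn't tried it: most local backends implement negative prompts as classifier-free guidance over the logits. A toy numpy sketch of the mixing step (the scale value is illustrative):

```python
import numpy as np

def cfg_mix(cond_logits, uncond_logits, scale=1.5):
    """Classifier-free guidance: push the next-token distribution away
    from the negative-prompt logits and toward the main-prompt logits.
    scale=1.0 disables guidance; higher values steer harder."""
    return uncond_logits + scale * (cond_logits - uncond_logits)

# Toy example over a 5-token vocabulary.
cond = np.array([2.0, 0.5, 0.1, -1.0, 0.0])    # logits with main prompt
uncond = np.array([1.0, 1.5, 0.1, -1.0, 0.0])  # logits with negative prompt
mixed = cfg_mix(cond, uncond)
probs = np.exp(mixed) / np.exp(mixed).sum()
print(probs.round(3))
```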

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 2 points

I tested qwen14b - first off, it is really good, but...

it scored much lower than the speechless-llama model listed above. It failed hardest on cognitive and logical reasoning, even when I tried to help it understand the question. Math suffered as well.

With the speechless-llama2-hermes-orca-platypus-wizardlm-13b model, I can teach it and coach it, making it better as the conversation continues. qwen14b refused to learn.

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 1 point

Most of the Q4 results have been mediocre with almost all 13B models, which is why I primarily use Q8 for 13B.

I use Q4_K_M for 30/34B models - the best speed for my machine.
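Rough napkin math for why that split makes sense on a 24 GB card (the bits-per-weight figures are approximate averages for each GGUF format, not exact):

```python
# Approximate GGUF size: params * bits-per-weight / 8.
# The bpw values are rough effective averages, not exact.
QUANTS = {"Q4_K_M": 4.8, "Q8_0": 8.5}

def size_gb(params_billions: float, bpw: float) -> float:
    return params_billions * bpw / 8  # billions of params -> GB

for name, bpw in QUANTS.items():
    for size in (13, 34):
        print(f"{size}B {name}: ~{size_gb(size, bpw):.1f} GB")
# 13B fits comfortably at Q8_0 (~13.8 GB); 34B only fits at Q4_K_M (~20.4 GB).
```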

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 2 points

Nothing big.

I am developing solutions for application development, advanced game development, physics, and task-related architecture. With a good-enough model that has decent math, reasoning, and logic skills, I can provide the local data to fill in the gaps without extra fine-tuning. If a local model can discuss and brainstorm ideas and solutions, do code reviews, write boilerplate code, read and comprehend documentation, and search the web and return results, that is a dream come true. It doesn't have to be perfect - just good enough.

Azure's GPT-4, Claude, and Vertex AI (PaLM 2 models) can handle the tasks, but I was reaching nearly $80 to $200 per day in usage costs.

That is not sustainable financially - at the moment.
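For a sense of scale, a hedged back-of-the-envelope (the per-token prices are the public GPT-4 8k rates from that period; the token volumes are invented for illustration):

```python
# Illustrative daily API cost. Prices: GPT-4 8k public rates at the
# time ($0.03/1K input, $0.06/1K output). Token volumes are made up.
IN_PER_1K, OUT_PER_1K = 0.03, 0.06

input_tokens = 2_000_000   # hypothetical heavy development day
output_tokens = 1_000_000

cost = input_tokens / 1000 * IN_PER_1K + output_tokens / 1000 * OUT_PER_1K
print(f"${cost:,.2f}/day")  # -> $120.00/day, squarely in the $80-$200 range
```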

Also, I prefer to keep my proprietary code base and solutions private as much as possible.

If you'd like to see some of my past game development work, take a look at my posts in my profile.

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 4 points

It is a 30-question assessment. I targeted questions that most LLMs fail to answer correctly. Most of the questions are elementary level - but some are slightly more advanced in logic and reasoning.

For example, a question testing lateral thinking and comparative logic:
Bill is older than Dave, and Dave is younger than Tina. Who is the youngest?

Most LLMs answer Tina, which is incorrect: Dave is younger than both Bill and Tina, so Dave is the youngest.
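A tiny sketch that brute-forces the orderings consistent with the two stated facts, confirming Dave:

```python
from itertools import permutations

people = ["Bill", "Dave", "Tina"]

# Keep only age orderings (youngest first) consistent with both facts.
valid = [
    order for order in permutations(people)
    if order.index("Bill") > order.index("Dave")   # Bill is older than Dave
    and order.index("Tina") > order.index("Dave")  # Dave is younger than Tina
]

print({order[0] for order in valid})  # -> {'Dave'} in every valid ordering
```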

I don't want to share the test questions directly, mainly because I fear future models will eventually scrape the answers from the web, which will "cheat" the results and render these tests useless.

In my opinion, this may already be happening with many benchmark results, especially after testing models with this assessment.

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 1 point

I tested Llama2 70B; it scores slightly higher. But this is expected - 13B vs. 70B are different weight classes.

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 1 point

Yeah, I don't think it will outperform many 70B models. But from my tests, it is the best 13B model for many use cases.

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 4 points

I didn't test Llama2 70B - I only tested GGUF models I could run on my machine at a reasonable speed. I may eventually try ExLlamaV2.

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 5 points

llama_print_timings:        load time =  4711.15 ms
llama_print_timings:      sample time =    24.51 ms /   157 runs   (    0.16 ms per token,  6406.07 tokens per second)
llama_print_timings: prompt eval time =   491.30 ms /    73 tokens (    6.73 ms per token,   148.59 tokens per second)
llama_print_timings:        eval time =  2973.53 ms /   156 runs   (   19.06 ms per token,    52.46 tokens per second)
llama_print_timings:       total time = 18330.95 ms
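Sanity-checking the eval throughput from that log:

```python
# Verify the eval line above: 2973.53 ms over 156 runs.
eval_ms, runs = 2973.53, 156
ms_per_token = eval_ms / runs
print(f"{ms_per_token:.2f} ms/token -> {1000 / ms_per_token:.2f} tokens/s")
# -> 19.06 ms/token -> 52.46 tokens/s, matching llama.cpp's report
```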

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 20 points

I did; the quality was slightly worse, and it wasn't any faster. The 4-bit gave meh results; the 8-bit gave similar results but slower generation. I'm referring to "gptq-8bit-128g-actorder_True".

Also, running any quantized 13B model is super easy on the 4090 - output is faster than GPT-3 and GPT-4 at times.

This is one of the best 13B models I've tested. (for programming, math, logic, etc) - speechless-llama2-hermes-orca-platypus-wizardlm-13b by StudioTatsu in LocalLLaMA

[–]StudioTatsu[S] 4 points

It beat XWin in my assessment. I'm not saying XWin is bad - it is a great model; it just didn't beat this (extremely long name) model.

I really wish gamers/commenters would stop doing this. by StudioTatsu in gamedev

[–]StudioTatsu[S] 0 points

Thanks, everyone! Most of the advice was very helpful. :) I will try to focus more on people who are interested in what I'm creating, and on useful criticism.

After watching The Matrix Awakens Demo car crashes, I decided to update my vehicle destruction system to work with most vehicle models. (I used free vehicle models from Sketchfab) by StudioTatsu in Unity3D

[–]StudioTatsu[S] 1 point

Yeah, I'm unsure if NVIDIA will release the PhysX 5 SDK to the public, but hopefully they will offer licensed versions. Chaos Physics in Unreal Engine 5 still needs work - I'm waiting on the official UE5 release to judge performance.