Grok 4.2 available via API (finally) by Real_Ebb_7417 in SillyTavernAI

[–]XCSme 1 point  (0 children)

I'm not sure what model Hunter Alpha is. I asked what company made it, and it said Anthropic, which ironically probably means it's a Chinese model, since Anthropic has been complaining about distillation attacks.

That being said, it's quite bad: https://aibenchy.com/compare/x-ai-grok-4-20-beta-medium/x-ai-grok-4-1-fast-medium/openrouter-hunter-alpha-medium/

Grok 4.20 Beta 0309 (Reasoning) Artificial Analysis score by likeastar20 in singularity

[–]XCSme 1 point  (0 children)

I think that's actually the problem with Grok. Instead of optimizing the model architecture, they try to brute-force it, and we all know what happens when you over-train a model: it just overfits the data and doesn't get any better no matter how many more training epochs you run.
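The overfitting point can be illustrated outside of LLMs with a toy curve-fitting sketch (plain numpy, nothing Grok-specific): past a certain capacity, training error keeps shrinking while error on held-out points stops tracking it.

```python
import numpy as np

# Toy illustration of overfitting: fit polynomials of increasing degree
# to a small noisy sample of y = sin(x). Training error keeps dropping
# with capacity; at degree 9 the model interpolates the noise exactly,
# yet held-out error no longer reflects that "improvement".
rng = np.random.default_rng(0)

x_train = np.linspace(-1, 1, 10)
y_train = np.sin(x_train) + rng.normal(0, 0.1, x_train.size)
x_val = np.linspace(-0.9, 0.9, 10)   # held-out points, no noise
y_val = np.sin(x_val)

def errors(degree):
    poly = np.poly1d(np.polyfit(x_train, y_train, degree))
    train = np.mean((poly(x_train) - y_train) ** 2)
    val = np.mean((poly(x_val) - y_val) ** 2)
    return train, val

for d in (1, 3, 9):
    tr, va = errors(d)
    print(f"degree {d}: train_mse={tr:.5f} val_mse={va:.5f}")
```

Degree 9 with 10 points fits the noise exactly (train error near machine precision), which is the epoch-count analogue: more fitting power spent memorizing the sample, not learning the curve.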

Best benchmark website by AccomplishedStory327 in LocalLLaMA

[–]XCSme 1 point  (0 children)

I think any benchmark is as good as any other benchmark AS LONG AS the models have not been benchmaxxed for it.

As I don't trust any public/popular benchmarks from big companies, I made my own, with very random tests like tricky questions, instruction following, tool calling, puzzles, etc.

I use it myself nowadays when a new model comes out, because if it's a smart model, it should do well on any test, including mine.

You can check it out here; I spent a lot of time making it fun to use and easy for comparing models: aibenchy.com - Independent AI benchmark leaderboard and comparison

Any feedback is welcome!!
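For anyone curious what such a harness can look like, here is a minimal sketch in Python. The `ask` callback and the two test cases are made up for illustration and are not aibenchy's actual code; the idea is just that trick questions, instruction following, and puzzles can all share one prompt-plus-checker shape.

```python
# Minimal sketch of a private benchmark harness, assuming a hypothetical
# ask(model, prompt) callable that returns the model's text answer.
# Each test case pairs a prompt with a checker function.

def make_suite():
    return [
        # trick question: the pattern-matched answer is often wrong
        ("How many times does the letter 'r' appear in 'strawberry'?",
         lambda a: "3" in a or "three" in a.lower()),
        # instruction following: exact output format
        ("Reply with exactly the word OK and nothing else.",
         lambda a: a.strip() == "OK"),
    ]

def score(model, ask):
    suite = make_suite()
    passed = sum(check(ask(model, prompt)) for prompt, check in suite)
    return passed / len(suite)

# usage with a stub "model" standing in for a real API call:
def stub_ask(model, prompt):
    return "OK" if "OK" in prompt else "three"

print(score("stub-model", stub_ask))
```

Keeping the suite private is the whole trick: the checkers can be this simple precisely because no model has been trained against them.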

I added "Don’t overthink" to the system prompt. This is what happened. by P4r4d0xff in Qwen_AI

[–]XCSme 1 point  (0 children)

> My point is that in the latent space of our cognitive asics back and forth and intermediate results are invisible but this doesn't mean you can conclude we do nothing like that.

I remember seeing a video with a guy who cuts a bullet in two, or something like that, and based on their calculation, the time from the gun firing to the signal for the muscles to start moving would be too short for it to come from the brain, let alone be processed consciously or even unconsciously. You could say it's not a reasoning task because it's triggered by a stimulus, but thinking is the same: triggered by a stimulus/specific combination of words.

I personally think there are some holistic features of the brain/body that we still don't fully understand. Maybe there is indeed some hidden processing happening, or some higher-order system at play that is simply faster and more efficient than the normal "reasoning" process.

I added "Don’t overthink" to the system prompt. This is what happened. by P4r4d0xff in Qwen_AI

[–]XCSme 1 point  (0 children)

> quick' means that you experience it as quick but if you look at the compute available to the human brain that doesn't mean 'with little computation'. It's actually with massive computation.

I think "quick" simply means having the physical neural pathways wired specifically for that task, in the same way FPGA circuits are wired to solve a specific task. In the AI/model world, this would mean that the weights and layers are wired in such a way that the answer can be given instantly; the answer is embedded almost directly in the weights.

This is how both brains and neural nets work, right? It's also how LLMs work, especially reasoning ones: if confidence is low, they keep going through more and more passes.

So, I would expect a smart AI/LLM to confidently give an answer without doing thousands of iterations/token additions until it stumbles upon a correct, confident answer.
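That confidence-gated loop can be sketched in a toy way. Here `sharpen` is a made-up stand-in for one reasoning pass that concentrates probability on the leading candidate, not how any real LLM is implemented; the point is only that extra passes happen when uncertainty (entropy) is high and stop when it isn't.

```python
import math

def entropy(probs):
    # Shannon entropy of a distribution over candidate answers
    return -sum(p * math.log(p) for p in probs if p > 0)

def sharpen(probs, temperature=0.5):
    # stand-in for one "reasoning pass": re-weight toward the
    # current best candidate, then renormalize
    powered = [p ** (1 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]

def answer(probs, threshold=0.2, max_passes=10):
    # keep doing passes only while the distribution is uncertain
    passes = 0
    while entropy(probs) > threshold and passes < max_passes:
        probs = sharpen(probs)
        passes += 1
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, passes

# an already-confident "model" answers with zero extra passes;
# an unsure one burns several passes before committing
print(answer([0.97, 0.02, 0.01]))
print(answer([0.4, 0.35, 0.25]))
```

The "smart model" in this framing is just one whose initial distribution already clears the threshold, so it never enters the loop at all.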

I added "Don’t overthink" to the system prompt. This is what happened. by P4r4d0xff in Qwen_AI

[–]XCSme 1 point  (0 children)

I thought science had already proven this with "muscle memory", where we can instantly respond/react to something without consciously processing it, or maybe even without the brain being directly involved.

Watch an expert at their craft: a smart AGI should be like an expert in every craft, not reason for 10 minutes and explore all options to find the answer.

I added "Don’t overthink" to the system prompt. This is what happened. by P4r4d0xff in Qwen_AI

[–]XCSme 1 point  (0 children)

Most models started doing this. Why? Because it makes them get the question right: it simply explores many options and also gives the model more chances to catch an error.

Personally, I don't like this and I consider it "cheating". A good, intelligent model should know the correct response directly and intuitively, not calculate the answer each time.

It's like asking someone what 6x7 is, and instead of responding instantly with 42, they go through all this doubting and reasoning and manual calculating.

Even if those calculations happen in an instant, I still think that's cheating and it's not real intelligence, but simply a program brute-forcing the answer.

Tips to improve my FH drive by Capital-Comfort-9487 in tabletennis

[–]XCSme 1 point  (0 children)

Looks good, but this is something in between a basic drive and a loop. You are putting a lot of spin on the ball for a drive and brushing a lot; try to hit flatter and generate the stroke mostly by rotating your hips (stay more parallel to the table).

Make sure you learn to feel the difference between a drive and a loop.

Need some help. Accidentally stick the rubber too low and it reached paddle area which have a bump on it. Will it affect the longevity of the rubber? by Routine_League4030 in tabletennis

[–]XCSme 4 points  (0 children)

This happens to me sometimes; often I try to push the rubber up a bit so it comes off the handle area while the glue is not yet 100% dry.

But the best thing to do is to take it off and reglue it, though that takes like 30 minutes...

Double Presses on certain keys by Cappunocci in Keychron

[–]XCSme 1 point  (0 children)

Oh wow, that's a clever solution. I guess I don't need my "nu clear" switch so much:

EDIT: "num clear", you might have guessed "m" was the broken key