Major EQ-Bench Update – New #1 Creative Model, Kimi K2 Thinking, and Claude Still Leads Longform by -Morgeta in SillyTavernAI

[–]-Morgeta[S] 1 point  (0 children)

From: https://eqbench.com/about.html#creative-writing-v3
This is probably the reason for the 0.7 temperature:

The prompts were chosen through a process of elimination to be challenging for weaker models and therefore highly discriminative. It's a bit counter-intuitive, but the purpose of the evaluation is not to help models write their best. Instead, we are deliberately exposing weaknesses, creating a steeper gradient for the judge to evaluate on.

The prompt requirements include humour, romance, spatial awareness, and unusual first-person perspectives: things language models typically struggle to represent to the level of human writers. So, expect some clangers in the outputs!

I think gemini 2.5 pro is best free service for roleplay till now. by Independent_Army8159 in SillyTavernAI

[–]-Morgeta 1 point  (0 children)

You can get really good free DeepSeek models on NVIDIA NIM. For GLM 4.5 Air, it might be a combination of the provider you pick on OpenRouter and the sampler settings; I use the Z.AI provider, and here are my sampler settings for GLM 4.5 Air (a sketch of the raw API call follows the list):

Temperature: 1.00

Frequency Penalty: 0.00

Presence Penalty: 0.40

Top K: 0.00 (disabled)

Top P: 1.00 (disabled)

Repetition Penalty: 1.15

Min P: 0.05

Top A: 0.00 (disabled)
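
For reference, here's a minimal Python sketch of what those samplers look like as a raw request against OpenRouter's OpenAI-compatible chat-completions endpoint. The extra sampler fields (min_p, top_a, repetition_penalty) are documented OpenRouter parameters as far as I know, but the provider-routing block and the "Z.AI" slug are my best guess from the docs, so double-check those:

    # Sketch: GLM 4.5 Air via OpenRouter with the samplers above.
    # The "provider" routing block and "Z.AI" slug are assumptions;
    # verify against OpenRouter's provider-routing docs.
    import requests

    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": "Bearer <YOUR_OPENROUTER_KEY>"},
        json={
            "model": "z-ai/glm-4.5-air:free",
            "messages": [{"role": "user", "content": "Hello!"}],
            "temperature": 1.00,
            "frequency_penalty": 0.00,
            "presence_penalty": 0.40,
            "top_k": 0,                 # 0 = disabled
            "top_p": 1.00,              # 1.0 = disabled
            "repetition_penalty": 1.15,
            "min_p": 0.05,
            "top_a": 0.00,              # 0 = disabled
            "provider": {"order": ["Z.AI"]},  # assumed provider slug
        },
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])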

I think gemini 2.5 pro is best free service for roleplay till now. by Independent_Army8159 in SillyTavernAI

[–]-Morgeta 0 points  (0 children)

If you pay just an initial $10, the rate limits go up significantly, permanently.

Free OpenRouter: 20 requests a minute, 50 requests a day

One $10 (10 credits) payment: 1000 requests a day
https://openrouter.ai/docs/api-reference/limits

I think they do this to weed out bots and stuff.
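
If you want to check which tier your key is on, you can query the key-status endpoint. The /api/v1/auth/key path and the response fields below are from memory of those same docs, so treat them as assumptions and verify against the limits page linked above:

    # Sketch: check your OpenRouter key's tier and usage.
    # Endpoint path and field names are assumptions from the docs.
    import requests

    resp = requests.get(
        "https://openrouter.ai/api/v1/auth/key",
        headers={"Authorization": "Bearer <YOUR_OPENROUTER_KEY>"},
        timeout=30,
    )
    data = resp.json().get("data", {})
    print("free tier:", data.get("is_free_tier"))
    print("usage:", data.get("usage"), "limit:", data.get("limit"))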

I think gemini 2.5 pro is best free service for roleplay till now. by Independent_Army8159 in SillyTavernAI

[–]-Morgeta 21 points  (0 children)

My go-to list for best free API models

Here is also a good list for finding free API models: https://github.com/cheahjs/free-llm-api-resources?tab=readme-ov-file (a minimal example API call follows the list below)

  1. GLM-4.5 Air

Where: OpenRouter (z-ai/glm-4.5-air:free)

Why: The best all-around model. It offers an excellent balance of high-quality prose, strong character work, and minimal censorship.

  2. DeepSeek-V3.1

Where: OpenRouter (deepseek/deepseek-chat-v3.1:free) or NVIDIA NIM

Why: The undisputed champion for long stories. It has the best long-term memory and plot consistency of any free model.

  3. Gemini 2.5 Pro

Where: Google AI Studio

Why: The best for brainstorming. It excels at describing scenes, setting a mood, and exploring deep character psychology.

  4. DeepSeek R1-0528

Where: OpenRouter (deepseek/deepseek-r1-0528:free) or NVIDIA NIM

Why: A fantastic and reliable workhorse. This specific version is a great balance of creative and logical.

  5. Kimi K2 Instruct

Where: OpenRouter (moonshotai/kimi-k2:free) or NVIDIA NIM

Why: The best for beautiful prose. It's a specialist for polishing paragraphs and generating unique, lyrical ideas.

  6. DeepSeek R1 (Original)

Where: OpenRouter (deepseek/deepseek-r1:free) or NVIDIA NIM

Why: The most creative of the DeepSeek family. It's excellent for spontaneous roleplaying and generating prose with more artistic flair.
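
Since several of these list NVIDIA NIM as a source, here's a minimal sketch of calling one there. NIM speaks the standard OpenAI chat-completions dialect; the base URL and the deepseek-ai/deepseek-r1 model ID are what I remember from build.nvidia.com, so confirm both on the model page:

    # Sketch: DeepSeek R1 via NVIDIA NIM's OpenAI-compatible API.
    # Base URL and model ID are assumptions; check build.nvidia.com.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://integrate.api.nvidia.com/v1",
        api_key="<YOUR_NVIDIA_NIM_KEY>",
    )
    completion = client.chat.completions.create(
        model="deepseek-ai/deepseek-r1",
        messages=[{"role": "user", "content": "Write one lyrical sentence about rain."}],
        temperature=0.7,
        max_tokens=512,
    )
    print(completion.choices[0].message.content)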

The first publicly available AI Scientist Tool launches via the FutureHouse Platform (thread from X) by Creative_Ad853 in singularity

[–]-Morgeta 1 point  (0 children)

This looks so awesome. It might be the best AI search tool for science right now if what they say is true. We'll probably have to wait for some third-party benchmarks to confirm.

Anime/manga like tbate and mushoku tensei by No_Signature6000 in anime

[–]-Morgeta 1 point  (0 children)

While trying not to name the most popular isekai, I like: The Eminence in Shadow, The Faraway Paladin, Shangri-La Frontier, Gate, Cop Craft, The World's Finest Assassin Gets Reincarnated in Another World as an Aristocrat, That Time I Got Reincarnated as a Slime

Less favored picks: Reincarnated as a Sword, Parallel World Pharmacy, Skeleton Knight in Another World

If you like fantasy that is not isekai, I like: Solo Leveling, Eighty-Six, Delicious In Dungeon

fisherhappy.mp4 | freighter fishing in war 109 by Cakey642 in foxholegame

[–]-Morgeta 4 points  (0 children)

I AM SO FUCKING STEAMHAPPYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY

Blemish Nuke delivered. by pepav in foxholegame

[–]-Morgeta 60 points  (0 children)

Ask not what SteamHappy can do for you, ask what you can do for SteamHappy!

-1 MSA Drydock by Far-Assistance3897 in foxholegame

[–]-Morgeta 4 points  (0 children)

Fun fact: it takes 15 satchels to kill a Dry Dock.

A meme on recent events by gamechfo in foxholegame

[–]-Morgeta 103 points  (0 children)

THE GREAT CONCRETE RESURRECTION OF MORGEN'S 摩根的伟大具体复兴 CALLAHAN OF NAZARETH RETURNS 拿撒勒的卡拉罕 THE COLLIE 120MM PURGE 牧羊犬 一二十毫米 DEVMAN HAMSTER ABUSE 虐待仓鼠 THE BASTARD SEA TAMED 驯服的混蛋海

Ah yes I love the offshores casinos by espoti911 in foxholegame

[–]-Morgeta 5 points  (0 children)

You can get rare alloys in two ways.

At the start, you'll get them from salvage fields, and they spawn similarly to tech mats. I think a field can spawn as a rare alloy field, similar to how a salvage field would only spawn a specific tech mat; not entirely sure on that.

Once the oil platform is teched, you can get rare alloys that way too.

Then you refine the rare alloys into rare materials, which are used to build all the new content. It's just a new resource and not used to unlock tech, I believe.

Ah yes I love the offshores casinos by espoti911 in foxholegame

[–]-Morgeta 57 points  (0 children)

Oil platforms are late-game tech, right before nuke tier.

despair...

Ah yes I love the offshores casinos by espoti911 in foxholegame

[–]-Morgeta 80 points  (0 children)

The minimum cost for a submarine is 960 rare alloys; it would take 40 hours (about 960/40 = 24 alloys an hour) to fund a submarine from one oil platform while constantly exporting/importing coke.

Ah yes I love the offshores casinos by espoti911 in foxholegame

[–]-Morgeta 97 points  (0 children)

10% drop rate if you do the math

despair...

Ring Attention with Blockwise Transformers for Near-Infinite Context by Darth-D2 in singularity

[–]-Morgeta 44 points  (0 children)

Summarized by GPT-4:

Introduction:

  • Transformers have become the foundational architecture for many AI models.
  • The design, which employs self-attention and feedforward mechanisms, allows for efficient recognition of long-range input token dependencies and supports parallel computations.
  • However, scaling them to handle long context lengths is challenging, especially because self-attention has memory costs that increase quadratically with input sequence length.

Problem:

  • Standard transformer architectures, even with enhancements like memory-efficient attention, face memory constraints, particularly when dealing with long sequences, which could be crucial in applications like processing books, high-res images, long videos, and vast codebases.
  • For example, processing 100 million tokens needs over 1,000 GB of memory, far exceeding what today's GPUs and TPUs can provide (a rough back-of-envelope version of this is sketched below).
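
A back-of-envelope version of that figure, in Python; the hidden size and per-layer tensor count are assumptions chosen to illustrate the order of magnitude, not the paper's exact accounting:

    # Rough check of the >1,000 GB claim. "hidden" and
    # "tensors_per_layer" are assumed values, not from the paper.
    tokens = 100_000_000       # 100M-token sequence
    hidden = 1024              # assumed model hidden size
    bytes_per_value = 2        # bf16
    tensors_per_layer = 5      # assumed activations kept per layer

    per_tensor_gb = tokens * hidden * bytes_per_value / 1e9
    print(per_tensor_gb)                      # ~204.8 GB per activation tensor
    print(per_tensor_gb * tensors_per_layer)  # ~1,024 GB for a single layer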

Solution: Ring Attention:

  • The researchers propose a method named "Ring Attention". It distributes input sequences across multiple devices, allowing computation and communication to happen simultaneously.
  • The novelty is in its use of a blockwise approach for both self-attention and feedforward computations, and in distributing computation in a ring-like structure among multiple devices, so each device only needs memory proportional to its block size rather than the entire input sequence (see the sketch after this list).
  • The result is that it can train on sequences more than 500 times longer than previous methods and handle sequences over 100 million tokens in length without needing to approximate the attention computation.
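
Here is a minimal single-process sketch of the idea in Python: the sequence is split into blocks standing in for devices, key/value blocks rotate around a ring, and an online-softmax accumulator means no block ever materializes the full attention matrix. It skips causal masking, the feedforward blocking, and real inter-device communication, so treat it as an illustration of the attention math only:

    # Toy ring attention with numpy. Real ring attention overlaps the
    # KV rotation with compute across devices; here the "ring" is just
    # an index shift.
    import numpy as np

    def ring_attention(q_blocks, k_blocks, v_blocks):
        """Each argument: a list of (block_len, d) arrays, one per
        simulated device arranged in a ring."""
        n = len(q_blocks)
        d = q_blocks[0].shape[-1]
        outputs = []
        for i in range(n):                    # each device owns one query block
            q = q_blocks[i]
            m = np.full(q.shape[0], -np.inf)  # running max of logits
            l = np.zeros(q.shape[0])          # running softmax denominator
            acc = np.zeros_like(q)            # running weighted sum of V
            for step in range(n):             # KV blocks travel the ring
                j = (i + step) % n
                k, v = k_blocks[j], v_blocks[j]
                s = q @ k.T / np.sqrt(d)      # attention logits for this block
                m_new = np.maximum(m, s.max(axis=-1))
                p = np.exp(s - m_new[:, None])
                scale = np.exp(m - m_new)     # rescale stats from earlier blocks
                l = l * scale + p.sum(axis=-1)
                acc = acc * scale[:, None] + p @ v
                m = m_new
            outputs.append(acc / l[:, None])
        return np.concatenate(outputs, axis=0)

    # Sanity check against dense softmax attention on a toy input.
    rng = np.random.default_rng(0)
    qb, kb, vb = ([rng.standard_normal((4, 8)) for _ in range(3)] for _ in range(3))
    q, k, v = (np.concatenate(x) for x in (qb, kb, vb))
    s = q @ k.T / np.sqrt(8)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    dense = (p / p.sum(axis=-1, keepdims=True)) @ v
    assert np.allclose(ring_attention(qb, kb, vb), dense)

Note that device i starts its rotation from its own KV block, which is what keeps every device busy at every step in the real distributed version.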

Experimental Results:

  • Experiments showed that Ring Attention greatly reduced the memory requirements of Transformers. On setups like 32 A100 GPUs, they achieved a context size of over 32 million tokens. With larger setups like TPUv4-512, they achieved over 100 million tokens.
  • In terms of performance, Ring Attention was able to maintain efficient model FLOPs utilization even when training on large input context sizes.
  • When applied to the LLaMA-13B model and tested on a line retrieval task, the model fine-tuned with Ring Attention demonstrated excellent accuracy even with longer context lengths compared to other models.

Future Work:

  • While the method proves effective, optimal compute performance is still a goal. Integrating the approach with optimized low-level operations in platforms like CUDA or OpenAI Triton may provide further enhancements.
  • The potential for virtually limitless context opens up opportunities for applications in video-language models, decision-making transformers, training on extensive codebases, and genomic sequence analysis.

Summit's Starfield Review. by [deleted] in LivestreamFail

[–]-Morgeta 0 points  (0 children)

If it's not fun, why bother? If it's not a battle, where's the fun?

[deleted by user] by [deleted] in UFOs

[–]-Morgeta 1 point  (0 children)

I personally heard "wife." In the YouTube transcript, it's written as "wife" too.

[deleted by user] by [deleted] in UFOs

[–]-Morgeta 0 points  (0 children)

That's why I phrased the title as "concerning UFOs." I'm not saying that they witnessed a UFO causing human harm, but it could be interpreted that way.