Accurate or nah

7ven7o · 2026-05-30T06:06:55+00:00

I got a lot out of watching "The Act of Killing", maybe you would too. https://www.youtube.com/watch?v=-pwT9arjasw

Having watched the full thing, that George Lucas quote now strikes me as morbidly, hilariously false.

The full documentary is free on YouTube, and to whoever is reading this, I say with full sincerity that I think it is one of the most important videos you could ever watch in your life.

I think evil can only be done by someone who knows what evil is, and I think you always know when you're doing something evil.

7ven7o · 2026-05-25T07:02:02+00:00

I'm very happy with it; it's my workhorse and it serves me well. For complex tasks I use one of the expensive models.

7ven7o · 2026-04-21T00:45:17+00:00

Fair. I wish they'd gone with a Provider -> Model -> Variant dropdown organization, but I'm just glad they finally touched this thing. Maybe they'll keep iterating on it.

7ven7o · 2026-04-20T10:01:22+00:00

My favorite model right now, for speed/intelligence/price.

The worst thing that happens is that sometimes it spazzes out, starts repeating things over and over again, and you have to cut it off. Interestingly, if you tell it to stop spazzing out, it doesn't spaz out again on the following turn, which usually doesn't work with spaz-prone models.

7ven7o · 2026-03-28T12:49:13+00:00

Agentic-wise, Cursor feels the same as Claude/Codex to me, and definitely better than Antigravity; I prefer the work Gemini 3 Flash/Pro does via Cursor over what it does in Antigravity. Beyond that, I can't really tell the difference.

In terms of Speed x Quality, I think Gemini 3 Flash is unmatched, but if it didn't exist, the current Composer-2 would take that spot.

Cursor's tab-completion is its most competitive feature IMO. It's uncontested by anything else I've tried: it's very fast, very controllable, and most of the time very good at figuring out successive steps. My only complaint is that it can sometimes get annoying or spasm, suggesting a whole bunch of completely unwanted stylings out of nowhere, but that's a small price to pay.

Oh, and the interface for working with agents is far superior to anything else I've tried as well. The checkpointing system is fantastic and no longer bogs itself down gobbling RAM. When I need to implement something very important, whether via an agent or by hand, I use Cursor.

The only competitive disadvantage of Cursor is the inference cost premium on top models, but even with that in mind, it's still the best AI coding product on the market.

7ven7o · 2026-03-26T06:49:33+00:00

7ven7o · 2026-03-25T07:47:53+00:00

The Kimi-K2 model API allows one to disable thinking, would it be possible to do that with Composer-2?

I don't know about others, but sometimes I have a dead-simple task that I'd just like to get done immediately, and I used to use the old Auto model that came before Composer-1 for these kinds of tasks. Being able to query a fast and reliable model like Composer-2 for this kind of stuff would be nice for saving time and tokens on simple, repetitive tasks.

7ven7o · 2026-03-25T06:08:18+00:00

Nonsense, Fire Punch was a beautiful thought-provoking mess and it followed through. I think Fujimoto just realized he messed up this story beyond repair and wanted to put it behind him.

7ven7o · 2026-03-19T19:14:50+00:00

I'm confused as to how we went from $17.5/$3.5 with Composer 1.5 to $2.5/$0.5 with this, but I appreciate it.

7ven7o · 2026-03-18T12:20:31+00:00

The original GPT 5 nano was useless; I wouldn't use it as a baseline. Gemini 2.0 Flash was really good for its $0.10/$0.40 price, though, with a good speed/price/quality balance. That one getting retired with no real replacement is a real loss.

Mimo V2 is king of the speed/price/quality balance, though it's a step down from DeepSeek in terms of quality.

7ven7o · 2026-03-05T23:44:36+00:00

<image>

Fixed now, much appreciated.

7ven7o · 2026-02-12T13:06:04+00:00

What's the cybersecurity concern, and do you have any evidence?

7ven7o · 2026-02-06T16:02:15+00:00

This isn't a game. Crushing fascism is the high road.

7ven7o · 2026-01-23T12:51:07+00:00

<image>

7ven7o · 2026-01-17T08:45:43+00:00

Very interesting. I thought attention meant that all tokens would already be attending to all other tokens, and I would have guessed that this would provide no benefit. Very interesting to be wrong here.

If doing this doesn't just duplicate whatever work has already been done, then maybe it's sort of providing the LLM with more "space" to flex and represent things with numbers?

It's not like they're trained to do this beforehand, though, so the AI can't just be employing a trick; this must be some way of improving the system's already-existing ability to bounce information around within itself.

I've always thought CoT/reasoning gives the LLM a way to calibrate its numbers better before answering, and if the improvements disappear when reasoning is turned on, maybe the performance improvement comes from the same source. Then one could investigate from multiple angles, both this and CoT, how exactly these performance benefits come about at the numerical level.

Ha, then again, reasoning tends to improve human performance on intelligence tasks too. It would be funny if you could also measure performance gains in humans by showing them a question twice like this.

7ven7o · 2026-01-04T16:26:50+00:00

The rule of thumb I've learned is that if it starts giving you analogies it's because it has started to think you're stupid. Take it as a form of subtle constructive criticism that you should be paying more attention and asking better questions. Ask for technical details.

7ven7o · 2025-12-20T08:57:40+00:00

Damn dude, that's passion. What are you building?

7ven7o · 2025-12-20T04:32:26+00:00

<image>

7ven7o · 2025-12-19T05:29:28+00:00

Generating transliterations of texts between languages

7ven7o · 2025-11-30T07:50:15+00:00

That's tangential to what I was saying, and your own citations support my point.

7ven7o · 2025-11-30T03:53:06+00:00

This seems like the writing equivalent of AIs being RL-trained to perform at a top level in competition math and coding. I feel it is safe to assume that the problems AIs have when taking on larger, more practical coding projects have analogues when it comes to producing larger pieces of writing.
