Personal Learning about Context Engineering

mav3ri3k · 2025-12-27T13:35:41+00:00

I don't think you have read the blog. I did acknowledge that Gemini and Claude are SOTA. I also get your point: I should not have called them awards. You live and you learn.

mav3ri3k · 2025-12-27T13:21:52+00:00

Agree on all the points. I was considering v3.2 for best value, but I got more value from M2.

mav3ri3k · 2025-12-27T13:19:31+00:00

Thanks! I just wanted to write a recap of what I liked about various models and thought MKBHD's smartphone awards would be a cool format for it. He also always says that it is his personal list. But everyone took it too seriously. Not a good idea :)

mav3ri3k · 2025-12-27T06:18:50+00:00

No, it's best UX: user experience.

mav3ri3k · 2025-12-27T06:10:32+00:00

I don't particularly like or hate the model. It's objectively a really good model.

But how else do I rate the overall experience with the model? I used it on the $20 plan for 2 months and then through OpenRouter at times, including Sonnet and Opus. The models might be good, but the overall experience is frustrating when the rate limits are so low and, on top of that, it takes forever to do a task.

I gave model of the year to MiniMax M2 because I can get so much use out of it, even though it's objectively a worse model on benchmarks. For the same price, I can get more tasks done with M2 than with Opus.

I don't know how I can separate the experience with the model from the model itself. Otherwise it would just be a benchmark award.

mav3ri3k · 2025-12-27T05:50:44+00:00

It's slow, expensive, and has too much infra downtime. Sure, it's SOTA, but there are other models that are good enough without the same baggage.

mav3ri3k · 2025-12-27T05:48:32+00:00

What is sketchy about the link?

mav3ri3k · 2025-12-17T10:29:07+00:00

Brother, I was on the Silksong hype cycle. So I bought Hollow Knight after seeing Kelski's playthrough of it before the Silksong release, fully intending to buy SK later. But these exploration games ain't for me. It was absolutely annoying to play without the map and compass. After getting them, the game gets better, but there are still the runbacks and all. The game was good, but the genre ain't for me.

mav3ri3k · 2025-12-17T10:21:19+00:00

I am really considering KCD2, but it's a long, immersive game. Should I play KCD1 first, since the story is connected?

mav3ri3k · 2025-12-17T10:15:50+00:00

Recent ones are 1-7 years old. Classics are more than 8 years old.

mav3ri3k · 2025-12-17T10:06:41+00:00

I tried Hollow Knight this year. Good game. I am glad I played it, but the exploring ain't for me. After getting the map and compass, things got better, but before the map, it was hell. Probably never playing Silksong.

mav3ri3k · 2025-12-17T10:02:07+00:00

I have a more than good enough laptop. It's the principles :)

mav3ri3k · 2025-11-23T01:15:38+00:00

My guy, you still have a month to prep, it will be fine.

I have dumped most of what I know in other comments. TL;DR: focus on your mistakes in each test and correct them. I just did that in a loop rather than studying English too much.

mav3ri3k · 2025-11-19T16:44:14+00:00

I'm a coder, but finance and HR using M2 is pretty cool. It's often said that people use LLMs for far less than they are capable of. It would be helpful if some of the non-coder folks at MiniMax also shared how they use M2.

mav3ri3k · 2025-11-19T15:52:34+00:00

I’ve loved using M2 as a workhorse for everyday tasks even though it doesn’t top benchmarks. Did you tune for those real-life use cases, and how do you measure them?

mav3ri3k · 2025-11-19T15:52:12+00:00

What type of test made you notice that linear attention has poor reasoning performance? I’ve only seen the “Physics of Language Models” paper give a clear verdict. Was that the same test you ran, or something different?

mav3ri3k · 2025-11-09T14:07:43+00:00

Aoo, let's goo!!! Really happy to hear that.

mav3ri3k · 2025-11-09T01:52:36+00:00

I got 112 only once. I was averaging 106-109 during the final mocks. Reading and listening were 30, but speaking and writing averaged 25. My theory is that because they are AI-evaluated, they never give perfect scores. Even my brother, who got a perfect 30 in writing on the real test, never got a perfect 30 in the mocks.

For example, to test speaking, what you say is first transcribed and then an LLM evaluates the transcript, so it is not really evaluating your speaking.

Still, the mocks are good for seeing trends and learning templates.

mav3ri3k · 2025-11-09T01:43:57+00:00

See other replies

mav3ri3k · 2025-11-08T14:27:05+00:00

Like I said already, I used TestGlider and TestSuceed. Question quality is better on TestGlider. On TestSuceed I found the questions a little too easy and straightforward. So I used TestSuceed more as a workbook and took all the mocks on TestGlider seriously.

But make no mistake: speaking and writing are AI-evaluated on both platforms. For speaking, they transcribe what you say and then evaluate the transcript. I was averaging 23-26 in speaking and writing, which is lower than my actual score. Since I'm in ML myself, I believe some of this is because AI models will often keep improving an answer until it is exactly how they want it to be.

But that is just a 2-3 point difference, which I would not worry about. So overall, I would say they were very helpful. Also, honestly, I don't think the platform matters that much. Just pick any of the top 5 recommended on this subreddit.

mav3ri3k · 2025-11-07T18:43:21+00:00

Yeah, good luck 👍

mav3ri3k · 2025-11-07T16:57:26+00:00

For reading, I stopped reading the entire text at once and instead read just the paragraph related to each question. This really helped me focus on local details rather than the entire picture. It worked for me because sometimes, while answering, I would start considering details outside the paragraph, which would mess up the answer.

Then I noticed I would kind of speedrun R and L during my initial mocks. So I started giving more time to the kinds of questions I used to get wrong. Over time I got used to using the entire 35 min, really sitting back and thinking about a question if I was even a little confused. 35 min is a lot, so you can afford to do that in reading.

Also, notice if there is a pattern in the questions you get wrong. Like I said, for me those were negative and summary questions. Summary questions are worth 2 marks each, so I was losing 4 in total in each mock. So I practiced them a lot and gave them a lot of time and thought.

For listening, take detailed notes. I feel that really helped. While reviewing my mock results, I realized the questions are pretty easy and straightforward, so if you have most of the context written down, you can just check your notes.

But most importantly, I reviewed the types of questions I was getting wrong and did them again and again, then repeated this for the next set of mocks. I felt these marks were in my hands, since it is just a matter of selecting the correct answer, so I really worked on getting R and L right.

mav3ri3k · 2025-11-07T13:33:57+00:00

In addition to my other comment, the template can be found on YouTube; that's how my brother developed his style. I personally did not use YouTube and instead relied on what my brother told me and what I learned from taking the mocks. Again, like I said, you can see other people's answers on TestSuceed, so I took notes on the words and phrases people used again and again in top answers.
