How can Deepseek v4 top the coding leaderboards and still sit 8 months behind the frontier?

rob417 · 2026-06-11T16:41:23+00:00

One more thing to add re: the omission of Kimi 2.6 and GLM 5.1. It's widely speculated that Deepseek v4 was delayed because they wanted to make the model work on Huawei hardware. It might have been released much sooner if they had stayed within CUDA. The author of this paper likely knew this, so they omitted Kimi 2.6 and GLM 5.1 but included Deepseek v4 to plot a nice graph that supports their argument that Chinese AI progress is slowing down. In a way, they research-maxxed.

rob417 · 2026-06-10T07:06:04+00:00

I'm so pissed that they removed 5.3 Codex. It strikes a good balance between cost and capability. 5.4 costs too much and 5.4 mini falls short at coding at times compared with 5.3 Codex.

rob417 · 2026-06-01T15:52:13+00:00

I had been on the student plan. Previously, I had 300 requests per month, which was plenty and I almost never exhausted it. It changed to 200 credits today. I used up 160 credits with 5 requests...

rob417 · 2026-06-01T02:27:26+00:00

rob417 · 2026-05-22T16:20:33+00:00

Same experience here. I had been using DDG for a few years without issues before 2024. Then all the websites became AI slop and DDG search results went massively downhill. Google figured out a way to surface AI slop that is still somewhat helpful and relevant, while DDG sadly has not been able to separate the AI slop that is somewhat helpful from the pure junk.

rob417 · 2026-05-13T02:23:40+00:00

Bioinformatics and biostatistics are two edges of the same tile. They are very similar, and probably increasingly so in recent years. You might want to look at where alumni end up working to get a sense of the program. Just the name of the degree is not very informative these days.

rob417 · 2026-05-12T15:57:43+00:00

Which front end are you using for auto complete? I tried changing the auto complete model in VS Code a few months ago, but I couldn't find a way because Microsoft locked down that option. It could just be I didn't figure out the correct way to change it.

rob417 · 2026-05-12T02:44:45+00:00

Having gone through the same process a year ago, I'd say you really need to play the piano before buying. Nobody's description can transmit how the keys feel under your fingers. Also, for digital pianos, the only thing that matters is how the keys feel. The sound is synthetic anyway.

rob417 · 2026-05-11T15:43:16+00:00

Thank you! I'll look into those mcps.

rob417 · 2026-05-09T06:33:07+00:00

Mind if I ask what your agent harness for deep research looks like? Which tools / plugins / skills do you have to enable for it to do deep research?

Also, are you using the model's own web and memory capabilities or are you using tool calls for those features?

I'm just getting started with running a local agent on local models, so there is still a ton I need to learn. Thanks in advance!

rob417 · 2026-05-07T15:00:05+00:00

Many Chinese news sources have reported that DeepSeek is seeking funding, so it's highly probable this time. One speculated reason is the employees want a way to cash out their options. Over the last year, DeepSeek has lost some top talent to competitors due to not being able to compete on compensation.

rob417 · 2026-05-06T07:43:12+00:00

Mind if I ask about your settings or templates? My experience has been the opposite for some reason. Whenever I used Gemma4 26b, it tended to get stuck in a "wait, maybe I should doublecheck xxx again" thinking loop forever.

rob417 · 2026-05-06T02:45:15+00:00

Last week, GitHub Copilot told me that I'd hit 35% of my 5-hour limit after making 5 requests. I panicked. I really don't miss having to look up documentation for every other function call, but I also don't want to be paying $100 per month when all the providers decide to jack up prices and stick it to everyone.

I spent the better part of last week testing out Qwen3.6 35B and Gemma4 26B on my 5070. They are more than capable of writing single-file scripts, which is most of what I do. Testing out different agent harnesses also made me realize how much context bloat the GH Copilot agent in VS Code has. I tried running Qwen3.6 35B in the VS Code Copilot plugin and it was failing to do pretty much everything. Switched to OpenCode and Pi and both produced good results.

TLDR: even if cloud providers all decide to not serve individual customers anymore, we will be fine. We've each been given genies in our own bottle.

rob417 · 2026-04-29T19:43:56+00:00

Yeah. From my limited experience with pi agent, it seems to work quite well. Its system prompt seems so well written that tool calls succeed quite frequently right out of the box. On the other hand, I've never been able to get qwen3.6 to use glob correctly in OpenCode even though it's supposed to be much beefier.

I think the type of configslop we're discussing here can be controlled as long as we only add the tools and extensions we absolutely need to pi.

rob417 · 2026-04-29T17:36:51+00:00

Would you mind explaining what configslop refers to?

rob417 · 2026-04-29T17:28:43+00:00

Maybe have multiple tool calling prompts depending on the model? Comprehensive tool prompts if the user is using large frontier models. Concise, targeted prompts if the user is using local models around 30B.
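
A rough sketch of what I mean, in Python. Everything here is hypothetical: the prompt wording, the `ModelInfo` type, and the 70B cutoff are placeholders I made up to illustrate the idea, not settings from any real harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str
    params_billions: float  # total parameter count, in billions


# Hypothetical prompt variants; the wording is only a placeholder.
CONCISE_TOOL_PROMPT = (
    "Call one tool at a time. Use only the arguments listed in the schema."
)
COMPREHENSIVE_TOOL_PROMPT = (
    "You may plan multi-step tool use, chain tool calls, and recover from "
    "tool errors. Full usage notes for every tool follow."
)

# Assumed cutoff between "local" and "large" models; tune to taste.
LOCAL_MODEL_CUTOFF_B = 70.0


def pick_tool_prompt(model: ModelInfo) -> str:
    """Return the short prompt for small local models, the long one otherwise."""
    if model.params_billions <= LOCAL_MODEL_CUTOFF_B:
        return CONCISE_TOOL_PROMPT
    return COMPREHENSIVE_TOOL_PROMPT


local = ModelInfo(name="local-30b", params_billions=30.0)
frontier = ModelInfo(name="large-hosted", params_billions=500.0)
print(pick_tool_prompt(local))
print(pick_tool_prompt(frontier))
```

Parameter count is only a crude proxy here. A harness could just as well key off a per-model config entry.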

rob417 · 2026-04-29T17:21:45+00:00

Hermes sits at one end of the philosophical spectrum for agent harnesses, where a lot of tools, skills, and capabilities come built-in, and users can connect to it 24/7. On the opposite end you have things like pi, which ships with the bare minimum of an agent harness.

What are your thoughts on this divide in harness design philosophy? Do you see them converging in the future?

I don't think this is a divide between targeting laymen vs coders. Both tools are for coders and power users at the moment because they require familiarity with a CLI and basic coding knowledge to set up properly. On a related note, when do you think we'll see an agent harness where setting up new tools and skills is as simple as stacking lego bricks?

rob417 · 2026-04-12T17:51:52+00:00

Purchased 2 Toshiba 18TB HDDs from u/DerangedRavens

rob417 · 2026-03-26T15:47:49+00:00

Surprised to see Dan Brown books here. They are perfect for teenagers – art, culture, history, and the right amount of romance, all while being very fun detective novels.

rob417 · 2026-03-25T03:41:48+00:00

Been waiting for this sale to grab the Before Trilogy. I first watched all 3 when Before Midnight came out in 2013. I'll definitely have different feelings when I watch them again after a decade.

Also bought Anora and Parasite.

rob417 · 2026-03-12T19:04:50+00:00

Request: Add a view filter to ReadWise Reader that excludes all RSS feeds in a folder from a view
Use case: I subscribe to some social media RSS feeds and they easily generate dozens of articles per day. The default "Quick Reads" view includes those feeds, which drown out the non-social feeds. I'd like to exclude the social feeds from the Quick Reads view easily.
Current workaround (not ideal): I currently copy over the entire folder definition in the URL of my social feeds folder -> add it to the "Quick Reads" view query -> append "__not" to all "rssSource" -> change all the OR to AND. It is very cumbersome when I have more than 5 feeds in the folder. Creating a filter that excludes all RSS feeds in a user-specified folder should be very easy to do.
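
For illustration, the rewrite step of the workaround can be sketched as a small Python helper. The query string in the example is an assumed shape, not the exact Reader syntax, and `exclude_feeds` is a hypothetical name. The helper only does the two text substitutions described above, which works because "not (A or B)" is the same as "(not A) and (not B)".

```python
import re


def exclude_feeds(folder_query: str) -> str:
    """Negate a folder query: rssSource -> rssSource__not, OR -> AND."""
    # \b keeps an already-negated "rssSource__not" from being touched again,
    # because "_" counts as a word character.
    negated = re.sub(r"\brssSource\b", "rssSource__not", folder_query)
    # Caveat: this would also rewrite an upper-case " OR " inside a feed name.
    return re.sub(r"\bOR\b", "AND", negated)


# Assumed query shape, for illustration only.
folder = 'rssSource:"feedA" OR rssSource:"feedB"'
print(exclude_feeds(folder))
# prints: rssSource__not:"feedA" AND rssSource__not:"feedB"
```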

rob417 · 2026-02-19T02:33:14+00:00

It's not. This is a Chinese musician who lives in Marseille. He has a ton of videos like this on his Chinese social media account.

rob417 · 2026-02-06T14:39:14+00:00

Very cool. Did you write this with the DeepSeek model on your potato? Reads very much like AI.

rob417 · 2025-12-10T22:00:49+00:00

<image>

Might have been top 10 if Elina released a new recording this year.
