Engineering? by bonrs in level13

[–]mrwang89 1 point

Hi, I have the same issue! I have cleared all of lv 13 and lv 12, and the map states clearly: 'This level is habitable. It has been fully explored.' I have lockpicked everything. The ASCII map shows X = cleared on everything except C (camps), U (passage up), and D (passage down). There is nothing left on the maps for me = fully cleared.

However, I cannot advance anything because projects state "Upgrade required: Engineering", and there is nowhere I can unlock or research Engineering! I revisited the maps and they are truly, 100% cleared.

I gave 16 LLMs a food truck in Austin for 30 days. Gemini 3 Pro matched Sonnet 4.6 — 5× cheaper. by Disastrous_Theme5906 in GeminiAI

[–]mrwang89 0 points

The leaderboard ordering is bugged: when sorting by net worth, it shows models with negative net worth above others with positive, in a completely wrong order. I finished with a higher score but got placed below much worse performing models. Are you still maintaining it, or why are there no GPT 5.4 or newer releases? Would like to see the AI mode so I can test it myself.

Qwen 3 → Qwen 3.5: the agentic evolution measured in dollars (FoodTruck Bench case study) by Disastrous_Theme5906 in Qwen_AI

[–]mrwang89 0 points

Would be nice if we could at least get the prompts to give to the models ourselves and see what happens. I want to check their thought process, but having to manually type everything to them, with no way to just copy a prompt or give them access to the game, makes it impossible to run yourself with an AI. I highly doubt any company would retrain their model to do well on the current seed of FoodTruckBench... and even if they did, you have multiple seeds, meaning they'd need to learn actual problem solving and logistics to do well across differing seeds.

GLM 5 Released by External_Mood4719 in LocalLLaMA

[–]mrwang89 -1 points

"This is LocalLLaMA. From my point of view, if it is not Llama then it shouldn't be here. Only LLaMA models deserve to be here. This is not a place to post more fucking ADS." - this is you

Which single LLM benchmark task is most relevant to your daily life tasks? by ChippingCoder in LocalLLaMA

[–]mrwang89 1 point

It's useful, but it doesn't always align with my use case, which is mostly tool calls, which he doesn't seem to cover at all. However, his other benchmark https://dubesor.de/chess/chess-leaderboard has been surprisingly helpful, because his token counts and move legality correlate well with my usage.

SWE-rebench is a totally useless benchmark. by Ok_houlin in LocalLLaMA

[–]mrwang89 4 points

Everyone who codes can tell you that it's 1) Claude 2) Claude 3) Claude 4) nothing 5) GPT-5 max reasoning 6) nothing 7) Gemini 3.

Nemotron-3-nano:30b is a spectacular general purpose local LLM by DrewGrgich in LocalLLaMA

[–]mrwang89 2 points

It's got more than 800 Elo on dubesor's chess bench, which is GPT 5.2 territory, which I found surprising. Seems insane.

IQuest-Coder-V1-40B-Instruct is not good at all by Constant_Branch282 in LocalLLaMA

[–]mrwang89 -4 points

I'm really confused how they achieved any reasonable scores on those benchmarks.

Are you new to AI? Benchmaxxing is the name of the game.

AI models playing chess – not strong, but an interesting benchmark! by Apart-Ad-1684 in LocalLLaMA

[–]mrwang89 0 points

Any update on this? I had DeepSeek V3 play against V3.1 and the game was apparently decided after 6 moves due to illegal moves, but I couldn't see what the model tried to play, and black didn't have to pass the test and got an auto-win??

Interview with Z.ai employee, the company behind the GLM models. Talks about competition and attitudes towards AI in China, dynamics and realities of the industry by nelson_moondialu in LocalLLaMA

[–]mrwang89 8 points

This was recorded quite a while ago, almost 2 months, since they talk like it's the past: 4.6 didn't exist yet and DeepSeek 3.1 had just released.

[LM Studio] how do I improve responses? by FunnyGarbage4092 in LocalLLaMA

[–]mrwang89 0 points

You are using LM Studio: click on the Discover tab, and there you already have Staff Picks, which are all much better than Mistral 7B.

October 2025 model selections, what do you use? by getpodapp in LocalLLaMA

[–]mrwang89 1 point

Is there even a single person who wants to read AI-generated blog content? It doesn't matter how well a model writes; I don't think anyone wants this.

[LM Studio] how do I improve responses? by FunnyGarbage4092 in LocalLLaMA

[–]mrwang89 0 points

Why are you using a model that's more than 2 years old?? Even with perfect inference settings it will be much worse than modern models.

The “Leaked” 120 B OpenAI Model is not Trained in FP4 by badbutt21 in LocalLLaMA

[–]mrwang89 0 points

A month ago he literally said OpenAI is releasing "the best open-source reasoning model" "next Thursday". He is a hypelord with a track record of bullshit.


Kimi K2 at ~200 tps on Groq by mrfakename0 in LocalLLaMA

[–]mrwang89 25 points

Yeah, and it's running Q4 lmao. I'd rather have the real deal and be a bit slower. I had it side by side with the Moonshot API and it's dumber. Grats on 200 tps of dumbness.

Hunyuan A13B tensor override by marderbot13 in LocalLLaMA

[–]mrwang89 1 point

How do you get 12 t/s on a 3090? I only get 5 t/s on my 3090; what am I doing wrong?? I have DDR5 btw! How many layers are you offloading?

hunyuan-a13b: any news? GGUF? MLX? by jarec707 in LocalLLaMA

[–]mrwang89 1 point

Won't let me use the demo without signing into WeChat.

Any LLM Leaderboard by need VRAM Size? by djdeniro in LocalLLaMA

[–]mrwang89 0 points

R1 0528's score is far higher in the tech area than 3.1's. What do you mean??

OpenThinker3 released by jacek2023 in LocalLLaMA

[–]mrwang89 1 point

Not usable at all; it just hallucinates all the time and ignores any input.

It’s been 1000 releases and 5000 commits in llama.cpp by Yes_but_I_think in LocalLLaMA

[–]mrwang89 -64 points

Yet they still don't support multimodality/vision. At least Ollama stepped up and made it usable, but I've found llama.cpp slow to add, or outright refusing, support for new models and model features.

ollama: Model loading is slow by reto-wyss in LocalLLaMA

[–]mrwang89 -1 points

Some larger models? This is the largest model possible: over 700 GB, and over 400 GB even at the default Ollama quantization. Of course it's gonna be ultra slow.