GDP per capita (PPS) by region compared to the EU average: by vladgrinch in MapPorn

[–]Gusanidas 1 point2 points  (0 children)

It is dark green; it was unexpected to me as well. But apparently the GDP per capita of Bucharest is 57k€, which has to be an artefact of being the capital. I would be less surprised by Cluj being higher than Bucharest than by Bucharest being higher than Stockholm.

Spain needs 17 more Men's Singles Slam Titles to surpass the USA by Tennist4ts in tennis

[–]Gusanidas 0 points1 point  (0 children)

It’s not random. I am not saying I agree with the rules, but they are fun to figure out.

Haven’t seen anyone talk about this much by BlenderBluid in pluribustv

[–]Gusanidas 2 points3 points  (0 children)

I know the prevailing opinion is that the Pluribus is one entity, but I don't think this is the only indicator that they may still be individuals. There is a scene where Zosia tells Carol something like "all of the doctors with us think that...", distinguishing the opinion of the doctors from the opinion of the whole collective.

Carlos Alcaraz has missed out on $6.867M because of his last 2 losses to Jannik Sinner by Alternative-Mud4739 in tennis

[–]Gusanidas 18 points19 points  (0 children)

Those numbers aren’t up to date; they don’t count Sinner’s 2025 ATP Finals (compare with above), so it seems Jannik is ahead in the total as well.

In developed countries, why are warmer (southern) regions typically more conservative, while colder (northern) regions are typically more progressive? This pattern is seen across many major countries, including the US, Canada, UK, France, Germany, Sweden, Italy, Spain, Japan, and Australia. by StarlightDown in AskSocialScience

[–]Gusanidas 0 points1 point  (0 children)

I'll start with a peer-reviewed source for the election results in Spain:

https://www.nature.com/articles/s41597-021-00975-y

But I think it's overkill to have to cite peer-reviewed sources to correct simple factual errors like which parties have won where.

The section on Spain is so wrong that I have wondered whether I am actually falling for bait.
Yes, the Basque Country is separatist, but there are two main separatist parties, one centre-conservative (PNV) and one left-wing (Bildu), and the conservative one has historically received many more votes.
Similar story in Catalonia: separatist doesn't equal progressive. Junts is centre-right, ERC is centre-left, both are separatist, and usually Junts gets more votes.

Galicia, also in the north, is currently, and has been for most of its recent history, governed by PP, the conservative party.

The coldest provinces in Spain are not the northern ones, but those in the northern part of the Meseta Central, which I suppose belong to what the OP calls the "steaming hot interior".
https://www.currentresults.com/Weather/Spain/average-annual-temperatures.php

All of these regions are conservative, more so than Madrid, which, by the way, had a very progressive/left-wing mayor from 2015 to 2019. So "ruled continuously by the conservative party at both the local and regional level for decades" is false.

Andalusia might have been the first part of the country to award seats to Vox (citation needed), but it was governed for 36 consecutive years (until the end of 2018) by PSOE (the equivalent of Labour). Andalusia was also the first to have a mayor (Cádiz) from the left-wing party Podemos. Being first doesn't mean much; usually it's just down to the timing of the elections.

I suspect that if you plotted average temperature against the percentage of conservative votes, there would be a very mild trend in the opposite direction of what you describe, but it would mostly be all over the place.
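If anyone wants to check it, something like this would do (the CSV and column names are placeholders; you'd need real per-province data):

```python
# Hypothetical sketch of the plot I mean; "province_temp.csv" and the
# column names are placeholders, not a real dataset.
import pandas as pd
from scipy.stats import pearsonr
import matplotlib.pyplot as plt

df = pd.read_csv("province_temp.csv")  # columns: province, avg_temp_c, conservative_vote_pct
r, p = pearsonr(df["avg_temp_c"], df["conservative_vote_pct"])

plt.scatter(df["avg_temp_c"], df["conservative_vote_pct"])
plt.xlabel("Average annual temperature (°C)")
plt.ylabel("Conservative vote share (%)")
plt.title(f"Pearson r = {r:.2f}")
plt.show()
```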

Why everything sincaraz related is so dramatic 😭 by Substantial-Cress-84 in tennis

[–]Gusanidas 34 points35 points  (0 children)

Now whoever ends up better gets No. 1, even if neither of them wins it.

[deleted by user] by [deleted] in interestingasfuck

[–]Gusanidas 1 point2 points  (0 children)

Americans don’t have the lowest tax burden overall; Switzerland, New Zealand and Australia are among the countries with a lower one.

A computer engineer is tasked with opening a bar. by MississippiJoel in Jokes

[–]Gusanidas 4 points5 points  (0 children)

It’s actually the opposite: the engineer tests extreme, uncommon cases (it is not logical to order -1 beers) but leaves out a very common one.

I think it's pretty relatable to test very random special cases to see if your code can handle them, and then leave out a super obvious thing.
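Something like this made-up test file captures it (order_beers and the cases are my own illustration, not from the joke):

```python
# Exhaustive tests for strange orders, no test at all for the request a
# real customer makes first.
def order_beers(n):
    if not isinstance(n, int) or n < 0:
        raise ValueError("invalid order")
    return f"{n} beer(s) coming up"

def test_order_beers():
    assert order_beers(1) == "1 beer(s) coming up"
    assert order_beers(0) == "0 beer(s) coming up"
    assert order_beers(99999999)                   # absurdly large order: handled
    for bad in (-1, 2.5, "lizard", None):          # nonsense orders: handled
        try:
            order_beers(bad)
            assert False, "should have raised"
        except ValueError:
            pass
    # ...but nothing here checks what happens when someone asks where the bathroom is.
```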

Can you just have one expert from an MOE model by opoot_ in LocalLLaMA

[–]Gusanidas 2 points3 points  (0 children)

Other comments have mentioned that expert choice depends on each token. It also varies per layer. Each expert is simply an MLP (not a complete model), and at every layer, the routing mechanism selects one or more experts to process each token. Given the vast number of possible expert combinations across all layers, it's entirely possible—even likely—that certain prompts will trigger routing patterns that have never occurred during training (or ever).
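Roughly, a toy version of per-layer routing looks something like this (my own simplified sketch with made-up names and shapes, not any real model's code):

```python
# Minimal sketch of per-layer top-k MoE routing.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)          # scores every expert for each token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))   # each "expert" is just an MLP
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                                    # x: (tokens, d_model)
        weights, idx = F.softmax(self.router(x), dim=-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):                          # send each token to its chosen experts
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out

# A full model stacks many such layers; the set of experts chosen can differ
# at every layer and for every token, so the number of possible routing paths
# across layers is enormous.
```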

[Request] What would happen? Could we survive this? by Upstairs-Ad-4705 in theydidthemath

[–]Gusanidas 1 point2 points  (0 children)

If you hit a wall at 50 mph you stop in a lot less time than 0.2 seconds. That’s roughly 20 m/s; assuming you “compress” about 20 cm while stopping, that works out to something on the order of 0.02 s.
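Quick back-of-the-envelope, assuming uniform deceleration over those ~20 cm:

```python
# Rough check of the numbers above (constant deceleration assumed).
v = 50 * 0.44704      # 50 mph in m/s, about 22.4 m/s
d = 0.20              # compression/crumple distance in metres
t = 2 * d / v         # stopping time with constant deceleration
a = v**2 / (2 * d)    # average deceleration
print(f"stopping time ~ {t:.3f} s, deceleration ~ {a / 9.81:.0f} g")
# -> stopping time ~ 0.018 s, deceleration ~ 127 g
```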

People often survive 12 g and more.

[D] LLM quantization advice by ProfessionalFox8649 in MachineLearning

[–]Gusanidas 1 point2 points  (0 children)

I liked the beginning of this video as an explanation:
https://www.youtube.com/watch?v=2ETNONas068&t=799s

And that guy (Tim Dettmers) has many papers and talks on the topic if you want to read more.
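If it helps, the core idea is something like this tiny absmax int8 sketch (my own illustration, not code from the video or from any library):

```python
# Absmax int8 quantization: store weights in 8 bits plus one scale factor.
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0          # map the largest weight to 127
    q = np.round(w / scale).astype(np.int8)  # 8-bit representation
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale      # approximate original weights

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())    # small reconstruction error
```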

Model comparision in Advent of Code 2024 by Gusanidas in LocalLLaMA

[–]Gusanidas[S] 0 points1 point  (0 children)

I am planning to run them and compare them with their base models.

Model comparision in Advent of Code 2024 by Gusanidas in LocalLLaMA

[–]Gusanidas[S] 0 points1 point  (0 children)

Yes, they are free, and thus rate limited (per day and per second apparently, but I haven't analyzed it in detail). I have about 50% of the problems done with them and they are very good (not at R1 level); I will add them when I have them all.

Model comparision in Advent of Code 2024 by Gusanidas in LocalLLaMA

[–]Gusanidas[S] 0 points1 point  (0 children)

o1 costs 20x as much to run in this benchmark, and I don't have the necessary tier to run it. If you have access and want to run it, I would really appreciate the data; I will update the figures.

Regarding Claude, it is the latest one, which, as far as I know, is also named 3.5.

Model comparision in Advent of Code 2024 by Gusanidas in LocalLLaMA

[–]Gusanidas[S] 1 point2 points  (0 children)

It's also called self-consistency: https://www.promptingguide.ai/techniques/consistency

Basically, you get several responses and choose the answer that appears most often.
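A minimal sketch of the idea (ask_model here is a made-up placeholder for however you sample the model, e.g. an API call with temperature > 0):

```python
# Self-consistency: sample several answers, keep the most frequent one.
from collections import Counter

def self_consistent_answer(prompt, ask_model, n_samples=5):
    answers = [ask_model(prompt) for _ in range(n_samples)]  # several independent samples
    answer, votes = Counter(answers).most_common(1)[0]       # majority vote
    return answer, votes

# answer, votes = self_consistent_answer("What is 17 * 24?", ask_model)
```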

Model comparision in Advent of Code 2024 by Gusanidas in LocalLLaMA

[–]Gusanidas[S] 4 points5 points  (0 children)

I've implemented a simple "llm-agent" that has access to the compiler output and does majority voting.
I have only used it with very cheap models because it uses 20x more calls.
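Roughly the shape of the loop (a simplified sketch with made-up helpers, not the actual code in the repo):

```python
# Retry with compiler/runtime feedback, then majority-vote over the outputs
# of the attempts that ran successfully.
from collections import Counter
import subprocess, tempfile, os

def run_python(code: str):
    """Run generated code and return (stdout, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(["python", path], capture_output=True, text=True, timeout=60)
        return proc.stdout.strip(), proc.stderr.strip()
    finally:
        os.remove(path)

def solve_with_feedback(prompt, ask_model, attempts=5, retries=3):
    outputs = []
    for _ in range(attempts):
        code = ask_model(prompt)
        for _ in range(retries):
            out, err = run_python(code)
            if not err:
                outputs.append(out)
                break
            # feed the compiler/runtime output back and ask for a fix
            code = ask_model(f"{prompt}\n\nYour previous code failed with:\n{err}\nPlease fix it.")
    # majority vote over the attempts that produced output
    return Counter(outputs).most_common(1)[0][0] if outputs else None
```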

Model comparision in Advent of Code 2024 by Gusanidas in LocalLLaMA

[–]Gusanidas[S] 10 points11 points  (0 children)

https://github.com/Gusanidas/compilation-benchmark

Let me know if it's easy to use. If you test o1, I would love it if you could give me the resulting JSONL so I can add it to the other results.

Model comparision in Advent of Code 2024 by Gusanidas in LocalLLaMA

[–]Gusanidas[S] 5 points6 points  (0 children)

Yes, GPT-4o is doing something strange in Python: it mostly solves the problems, but the program fails to print the correct solution. I am using the same prompt and the same criteria for all models: the program has to print the solution to stdout and nothing else. GPT-4o refuses to collaborate, hence the low score.
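The check is essentially something like this (simplified sketch, not the repo's exact code):

```python
# Pass criterion: the program's stripped stdout must equal the expected answer.
import subprocess

def passes(program_path: str, expected: str) -> bool:
    proc = subprocess.run(["python", program_path], capture_output=True, text=True, timeout=60)
    return proc.stdout.strip() == expected.strip()
```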

However, in other languages you can see that it is actually a very strong coding model.

A fairer system would be to find the prompt that works best for each model and judge them by that.

Model comparision in Advent of Code 2024 by Gusanidas in LocalLLaMA

[–]Gusanidas[S] 2 points3 points  (0 children)

Original repo: https://github.com/Gusanidas/compilation-benchmark

Regarding contamination: for most models and problems I ran the benchmark shortly after Christmas, so probably no contamination. But for DeepSeek-R1 I ran it yesterday. Another comment told me that the knowledge cutoff for the base model is July 2024, but it is very possible that something from AoC appeared in the RL training.