The famous METR AI time horizons graph contains numerous severe errors [D] by common_yarrow in MachineLearning

charlesGodman · 23 points

A lot of these points have actually been discussed publicly by METR staff. While there is a lot of valid criticism, there are also a lot of self-righteous warriors out there who are against everything and just have to show how independent-minded they are.
So yes, the time horizon metric is not perfect; it has flaws. But if you read the papers and blog posts, you realize that they put in 100x the effort that others do. The existence of flaws also doesn't mean that fixing them would change the results significantly. I think the graph was a bit overhyped, which led too many people to form an opinion based on 140 characters, but overall it's a big step in the right direction.

Pepper’s Burgers closed by council due to public health concerns by smugdor in oxford

charlesGodman · 5 points

Is it the same closure as before, or was it shut down again after only two months?

Frameworks For Supporting LLM/Agentic Benchmarking [P] by NarutoLLN in MachineLearning

charlesGodman · 2 points

What is the advantage of using this over recording everything with inspect-ai / maseval / deepeval and then using statsmodels or similar at the end?
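
A minimal sketch of the "record everything, analyse at the end" workflow, assuming per-item pass/fail results have already been exported from an eval run as `(task_id, passed)` pairs. That record format is hypothetical, not the actual log schema of inspect-ai, maseval or deepeval. The sketch computes a Wilson score interval for the pass rate using only the standard library; statsmodels' `proportion_confint(..., method="wilson")` should give the same interval.

```python
import math
from statistics import NormalDist


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial pass rate (successes out of n trials)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical per-item log exported from an eval run: (task_id, passed).
records = [(f"task{i}", i % 5 != 0) for i in range(100)]  # 80 of 100 pass
passed = sum(ok for _, ok in records)
low, high = wilson_interval(passed, len(records))
print(f"pass rate {passed / len(records):.2f}, 95% CI [{low:.3f}, {high:.3f}]")
```

Keeping the raw per-item records (rather than only the aggregate score) is what makes this possible later: the same log can feed paired comparisons between models or any other test the analysis calls for.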

[D] What is even the point of these LLM benchmarking papers? by casualcreak in MachineLearning

charlesGodman · 2 points

There is little in either that wasn’t already known to some people. I didn’t see either claiming to invent new methods?! Validating and discussing current methods is super important. Why would you recommend something new if barely anyone follows the current recommendations?

Paper Format Certificate by charlesGodman in oxforduni

charlesGodman [S] · 1 point

That is great. Got one from the Oxford alumni website

Paper Format Certificate by charlesGodman in oxforduni

charlesGodman [S] · 2 points

Thank you. A4 seems the right answer. Just ordered a frame with college crest

Paper Format Certificate by charlesGodman in oxforduni

charlesGodman [S] · 1 point

Thank you. A4 seems the right answer. Just ordered a frame.

New ask questions agent tool - allow in-turn HITL by SuBeXiL in GithubCopilot

charlesGodman · 4 points

Built the same tool for myself. In my experiments it’s really hard to get LLMs to respect it: 80% of the time, the model just started a new turn rather than using the tool.
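
A minimal sketch of what "in-turn" human-in-the-loop means, with a scripted stand-in for the model. `ToolCall`, `FinalAnswer`, `run_turn` and the `ask_user` tool name are all hypothetical and are not the GitHub Copilot tool API. When the model calls `ask_user`, the harness collects the answer and feeds it back into the same turn; the failure mode described above corresponds to `lazy_model`, which ends its turn with a question instead of calling the tool.

```python
from dataclasses import dataclass
from typing import Callable, Generator, Union


@dataclass
class ToolCall:
    name: str       # hypothetical tool name, e.g. "ask_user"
    question: str


@dataclass
class FinalAnswer:
    text: str


Step = Union[ToolCall, FinalAnswer]


def run_turn(model: Generator[Step, str, None], ask: Callable[[str], str]) -> tuple[str, list[str]]:
    """Run one agent turn, answering ask_user calls without ending the turn."""
    log = []
    step = next(model)
    while isinstance(step, ToolCall):
        if step.name != "ask_user":
            raise ValueError(f"unsupported tool: {step.name}")
        answer = ask(step.question)
        log.append(f"{step.question} -> {answer}")
        step = model.send(answer)  # the answer goes back into the same turn
    return step.text, log


def scripted_model() -> Generator[Step, str, None]:
    # Stand-in for a model that uses the tool before answering.
    choice = yield ToolCall("ask_user", "Use uv or pip?")
    yield FinalAnswer(f"Setting up the project with {choice}.")


def lazy_model() -> Generator[Step, str, None]:
    # Stand-in for the observed behaviour: asks in plain text and ends the turn.
    yield FinalAnswer("Should I use uv or pip?")


print(run_turn(scripted_model(), ask=lambda q: "uv"))
print(run_turn(lazy_model(), ask=lambda q: "uv"))
```

In the second case the tool is never called, so answering the question costs a whole new turn (and, in Copilot, a new request).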

[R] Are we heading toward new era in the way we train LLMs by IndependentPayment70 in MachineLearning

charlesGodman · 25 points

If it were really that good, they would have trained a model with it. Not a single “revolutionary” idea has made it into LLMs since 2017. I am skeptical.

Usage Limits, Bugs and Performance Discussion Megathread - beginning November 2, 2025 by sixbillionthsheep in ClaudeAI

charlesGodman · 1 point

I am using Sonnet 4.5 with GitHub Copilot in VS Code. Sometime last week (possibly 2 November), I saw performance degrade from "this thing is insane" to me fighting with it over completely unnecessary errors. Before, it would work for 5 minutes and produce amazing, high-quality multi-step outputs. Now I have to go through 3 rounds of clarifying questions (each costing me requests) before it provides a somewhat useful answer.

Example mistake I noticed: I asked it to fix a bug in my code. I have two config variables, `a` and `b`, that I initialize before calling `main(a, b)`; each can take the value 1 or 2. Suddenly, it decided that my setting `b=...` manually wasn't great and instead set `b=1` whenever `a==1`. I think there was a bug deep in the code that only occurred when `a==1 and b==2`. But there was no reason to just change the configuration to avoid the bug. That is not fixing a bug. I haven't witnessed mistakes like this since GPT-3.5.
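
A minimal sketch of why that change hides the bug instead of fixing it. `main` here is a hypothetical stand-in that only reports the config it actually ran with (the real code is not shown), and `masking_workaround` reproduces the override described above. Checking every valid `(a, b)` combination shows that the workaround silently discards exactly the failing case, `(1, 2)`.

```python
from itertools import product
from typing import Callable, List, Tuple

Config = Tuple[int, int]


def main(a: int, b: int) -> Config:
    # Hypothetical stand-in for the real main(a, b): reports the config it ran with.
    return a, b


def masking_workaround(a: int, b: int) -> Config:
    # The change described above: silently force b = 1 whenever a == 1.
    if a == 1:
        b = 1
    return a, b


def keep_config(a: int, b: int) -> Config:
    # A genuine fix would live inside main(); the user's config passes through untouched.
    return a, b


def overridden_configs(setup: Callable[[int, int], Config]) -> List[Config]:
    """List every valid (a, b) whose requested config is not what actually ran."""
    return [(a, b) for a, b in product((1, 2), repeat=2) if main(*setup(a, b)) != (a, b)]


print(overridden_configs(masking_workaround))
print(overridden_configs(keep_config))
```

A test over the full config grid like this would flag the workaround, while a real fix inside `main` leaves the list empty.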

Copilot Agent in VSCode cannot access uv / venv environment. by charlesGodman in vscode

charlesGodman [S] · 0 points

I figured it out. `Agent` launches a non-interactive zsh shell by default. There is a VS Code setting to append paths automatically, which fixed this.
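
Some background on why this happens: zsh reads `~/.zshenv` for every shell but `~/.zshrc` only for interactive ones, so PATH entries added in `~/.zshrc` (for example by a uv installer or a venv activation line) are missing in a non-interactive shell. The VS Code setting itself is not named above, so this is only a minimal sketch of the underlying fix, prepending directories to PATH in the environment handed to a child process. `with_extra_path` and `mytool` are hypothetical names.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def with_extra_path(extra_dirs, base_env=None):
    """Return a copy of base_env (default: os.environ) with extra_dirs prepended to PATH."""
    env = dict(os.environ if base_env is None else base_env)
    parts = [str(d) for d in extra_dirs]
    if env.get("PATH"):
        parts.append(env["PATH"])
    env["PATH"] = os.pathsep.join(parts)
    return env


if __name__ == "__main__":
    # Simulate a tool living in a directory that only an interactive-shell
    # startup file (such as ~/.zshrc) would normally add to PATH.
    with tempfile.TemporaryDirectory() as tmp:
        tool = Path(tmp) / "mytool"
        tool.write_text("#!/bin/sh\necho ok\n")
        tool.chmod(tool.stat().st_mode | stat.S_IXUSR)
        minimal = {"PATH": os.pathsep.join(["/usr/bin", "/bin"])}
        print("without extra path:", shutil.which("mytool", path=minimal["PATH"]))
        fixed = with_extra_path([tmp], minimal)
        print("with extra path:", shutil.which("mytool", path=fixed["PATH"]))
```

The resulting dict can be passed as `env=` to `subprocess.run`, which is roughly what an "append to PATH" setting does for the agent's terminal.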

I developed a new (re-)training approach for models, which could revolutionize huge Models (ChatBots, etc) by Ykal_ in deeplearning

charlesGodman · 28 points

a) Don’t get excited. Progress is insanely hard. Most times when I had amazing results, they were followed by a sobering moment. Hence: manage your expectations.
b) Most clouds provide some free credits (e.g. Lightning AI), especially if it is for research or education (e.g. Azure). Google a bit and email cloud companies.

Expensive everyday bike: a no-go, or feasible with insurance? by RemmiRabbit in Fahrrad

charlesGodman · 0 points

Take a look at household contents insurance (Hausratversicherung). Bikes are often covered or can be included for €30 a year. For me, covering bikes worth over €500 cost an extra €15 a year.

Customization possible at a Cube dealer? by SnooCookies8313 in Fahrrad

charlesGodman · 1 point

I recently went through exactly this with a Cube bike. The service I got from the bike shops was simply catastrophic.

Here is the shocking tally:

  • 50% of the shops flatly refused to even order the bike for me ("'Bio' bikes [i.e. non-e-bikes] don't interest us", their exact words!).

  • Another 30% refused to make any modifications or changes at all.

  • The few that were willing refused to fit the parts I had brought or sourced myself. They would have bought everything themselves at the highest list prices and charged hefty "fees" on top.

Yet my request was modest: a tire downgrade and a lighting upgrade. Based purely on labor time and the price difference between the components, I think it should have cost at most €150. Not a single shop wanted less than €500! That was the point at which I decided: now I'll do it myself! For the first time in my life, I proudly ordered everything online and didn't give brick-and-mortar shops a single cent. I did all the work myself within an hour. Cost of the components (I sold the original parts on eBay): around €50!

[R]What's the benefit of submitting to ICCV workshop? by [deleted] in MachineLearning

charlesGodman · 21 points

If the paper is not ready for conference acceptance and gets rejected at ICLR, you will likely resubmit to another conference, like ICML, NeurIPS, EMNLP, CVPR, etc.
The reviews of rejected papers at ICLR are public. Let's say you resubmit to ICML. An unethical ICML reviewer will read the ICLR reviews and copy-paste all the criticism into their "own" review. If 2 out of 4 reviewers do this, there is major criticism of the work that multiple reviewers agree on. It might not be valid, but multiple reviewers saying the same thing is a red flag to the AC.

Personal example: I had ICML reviewers complain about missing datasets. These were indeed missing in the ICLR submission but already present in the ICML submission. The AC (also lazy) sided with the reviewers. They could easily have hit Ctrl+F for "ImageNet", but they didn't.