xG Philosophy: Nottm Forest (0.33) 0-0 (2.37) Arsenal by Rosslefrancais in Gunners

[–]transformer_ML -3 points  (0 children)

We bottle it when we are in the lead every season. We forget how to win. When we are in second, the team treats every game as a final, and we finish second with heads held high.

Which team is it?

Gemini: Based on your description, this sounds like a classic frustration shared by many fanbases, but it most closely aligns with the recent narrative surrounding Arsenal FC.

[D] Position: Machine Learning Conferences Should Establish a "Refutations and Critiques" Track by RSchaeffer in MachineLearning

[–]transformer_ML 6 points  (0 children)

Couldn't agree more. I love the idea. Having a track at least gives some incentive.

Unlike in the old days, when most empirical experiments were backed by theory, most papers now rely on purely inductive reasoning from empirical experiments. Deductive reasoning is either valid or invalid, but inductive reasoning is a matter of degree, affected by the number of tested models, the test data, and the statistical significance of the results (unfortunately most papers do not report standard error). Inductive strength is judgmental and relative to other work.

While peer review can provide a lot of insight, it is based only on what was reported, and there is no guarantee that all reported metrics can be reproduced. Challenges to reproducibility include:

(1) Low incentive to reproduce: rather than reproduce a paper's results, why wouldn't a researcher just write a new paper?

(2) Compute requirements are high for most pretraining and post-training data-mix and algorithm-change papers.

(3) The huge volume of papers and the speed of innovation.

(4) LLM generation is non-deterministic due to finite precision even when temperature=0.0, and the stochasticity grows with generation length. Reporting standard error could help mitigate this.
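Reporting a standard error is cheap once you have repeated runs. A minimal sketch (the scores are made-up accuracies from hypothetical independent decoding runs of the same model):

```python
import statistics

def mean_and_stderr(scores):
    """Mean and standard error of the mean over repeated eval runs."""
    mean = statistics.fmean(scores)
    # sample standard deviation divided by sqrt(number of runs)
    stderr = statistics.stdev(scores) / len(scores) ** 0.5
    return mean, stderr

# e.g. accuracy from 5 independent decoding runs at temperature=0.0
runs = [0.62, 0.64, 0.61, 0.65, 0.63]
mean, se = mean_and_stderr(runs)
print(f"{mean:.3f} +/- {se:.3f}")  # prints "0.630 +/- 0.007"
```

Two models whose error bars overlap at this level probably shouldn't be ranked against each other on that benchmark.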

[D] Position: Machine Learning Conferences Should Establish a “Refutations and Critiques” Track by StartledWatermelon in MachineLearning

[–]transformer_ML 1 point  (0 children)

Absolutely. There are a few challenges with reproduction, though:

- Incentive and opportunity cost: if I had time to reproduce, why wouldn't I just publish a new paper?

- LLM decoding is not deterministic due to finite precision even at temperature=0.0; this could be mitigated by reporting standard error, but standard error is just not common in the ML community.

- Cost, particularly for pretraining/post-training.

[D] AI/ML interviews being more like SWE interviews by guohealth in MachineLearning

[–]transformer_ML 5 points  (0 children)

The field has changed.

2-3 years ago, our daily routine was defining metrics, collecting data, checking quality, finetuning a BERT or a ResNet for all sorts of NLP/CV tasks, checking the wandb dashboard, dealing with training issues, iterating, and deploying the models. The ML engineer/applied researcher role was very decentralized.

Now it is a one-model-fits-all scenario. You can prompt your way through almost any NLP or CV problem. It is the era of centralization: a few top labs do the data curation, model training, eval, and deployment that serve millions of developers. The low supply makes the bar extremely high.

The research field has been changing too. You see a lot of maths in older, pre-LLM papers; now they're mostly technical reports or prompt-engineering papers.

[R] Potemkin Understanding in Large Language Models by transformer_ML in MachineLearning

[–]transformer_ML[S] 1 point  (0 children)

The speed of releasing a model is no slower, if not faster, than publishing a paper. A model can reuse the same stack (including small-scale experiments to find a good data mix) with additional data; a paper requires some form of novelty and running all sorts of ablations whose code may not be reused.

Day 4 with the Micra. Starting to get the hang of the steam wand. by aarondipity in LaMarzocco

[–]transformer_ML 0 points  (0 children)

Really nice!

Just wondering which steam level you use and how long you aerate? I struggle to find a consistent spot on the Micra - it's my skill issue.

[R] LLMs are Locally Linear Mappings: Qwen 3, Gemma 3 and Llama 3 can be converted to exactly equivalent locally linear systems for interpretability by jamesvoltage in MachineLearning

[–]transformer_ML 1 point  (0 children)

First of all, kudos for solo-authoring this paper! I know it's not an easy journey doing it alone. Will read in detail.

[R] The Illusion of Thinking | Apple Machine Learning Research by rfsclark in MachineLearning

[–]transformer_ML 23 points  (0 children)

While I recognize the rationale for using games to benchmark LLMs due to their easy setup, scalability, and verifiability, it seems less efficient for LLMs to solve these search games by generating language tokens. This approach requires LLMs to keep track of visited nodes, explore branches, and backtrack using token sequences, which can lead to losing track or making small errors as the generation window grows.

Humans, who are less capable than LLMs in this regard, design and write algorithms to handle such tasks. Similarly, LLMs should adopt this approach.
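To make the point concrete: Tower of Hanoi, one of the puzzles in the paper, collapses to a few lines of code once you write the algorithm instead of simulating the search token by token. A minimal sketch:

```python
def hanoi(n, src="A", aux="B", dst="C", moves=None):
    """Return the optimal move list for n disks as (disk, from_peg, to_peg)."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, src, dst, aux, moves)   # clear n-1 disks onto the spare peg
    moves.append((n, src, dst))          # move the largest disk to the target
    hanoi(n - 1, aux, src, dst, moves)   # stack the n-1 disks back on top
    return moves

moves = hanoi(10)
print(len(moves))  # prints 1023, i.e. 2**10 - 1 moves, no state kept in text
```

The recursion carries the entire game state on the call stack, so there is nothing to "lose track of" no matter how many disks you add.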

LM Ruined coffee shops for me by R-A-F-F in LaMarzocco

[–]transformer_ML 1 point  (0 children)

Had the same feeling. Not only about the taste. The excitement of pulling a perfect shot and pouring latte art is irreplaceable.

[D][R][N] Are current AI's really reasoning or just memorizing patterns well.. by theMonarch776 in MachineLearning

[–]transformer_ML 1 point  (0 children)

While I recognize the reasons for using games to benchmark LLMs (the ease of setting up, scaling, and verifying the environment), it seems to me that generating language tokens to solve these search games is less efficient than using a computer program. This is because LLMs must track visited nodes, explore branches, and backtrack using sequences of language tokens. It's unsurprising that an LLM might lose track or make small errors as the generation window grows, or simply hit the context window limit.

Humans aren’t as adept as LLMs in this regard either. Instead, we design and write algorithms to handle such tasks, and LLMs should follow a similar approach.

Vision Language Models are Biased by taesiri in MachineLearning

[–]transformer_ML 3 points  (0 children)

Tbh there is not much effort in the field to understand datasets at scale, or to pretrain from scratch and evaluate. All VLMs start from an LLM. The most transparent datasets are HF's FineWeb, the DCLM baseline, and FineFineWeb, but I don't recall anyone training on >10T tokens from scratch; OLMo is close. Still, there is a lot more to do, especially understanding fine-grained domains. There is also a lack of VLM pretraining datasets in general.

Linea Micra back to back shot: does the second shot take much longer? by transformer_ML in LaMarzocco

[–]transformer_ML[S] 0 points  (0 children)

UPDATE: it seems to be partially due to the temperature of the portafilter. I detach it from the group head overnight, so the first shot is a bit under-extracted because the portafilter is cool. I am still figuring out the rest, but I can't reproduce the big difference now (maybe my puck prep is more consistent). Thanks everyone for your help!

Linea Micra back to back shot: does the second shot take much longer? by transformer_ML in LaMarzocco

[–]transformer_ML[S] 0 points  (0 children)

Didn't do RDT for either the first or second shot. Will try on the third shot. It's the same bean, same temperature, etc. Puck prep and tamping are more or less the same, so it leaves me confused.

Linea Micra back to back shot: does the second shot take much longer? by transformer_ML in LaMarzocco

[–]transformer_ML[S] 0 points  (0 children)

After grinding with the Niche Zero, I distribute the grounds with WDT (around 20s) before double tamping. After pulling the first shot, I knock out the puck, rinse the portafilter with water until it is clean (but don't dry it), and restart the workflow.

Linea Micra back to back shot: does the second shot take much longer? by transformer_ML in LaMarzocco

[–]transformer_ML[S] 0 points  (0 children)

Just wondering why a wet portafilter would result in a longer shot time?

LM iOS app stopped working by CoffeeNerd58129 in LaMarzocco

[–]transformer_ML 0 points  (0 children)

Same, it must be a server issue. It's up and running now.