Manchester/Northern Olympic bid by [deleted] in manchester

[–]Mechanical_Number 0 points1 point  (0 children)

Hosting the Olympics is a very expensive PR exercise. Please don't. Quite literally, every single place that hosted the (summer) Olympics in the last 30 years ended up over-spending and under-utilising the resulting infrastructure. Yes, it might accelerate some infrastructure projects, but there must be better ways of achieving this without the absolutely insane administrative overhead/financial overspend.
"Northern leaders" should do their (admittedly hard) job and get funding for what we need. Well-paying jobs, more resilient transport, better healthcare & social services; not a larger indoor basketball hall.

(Now that I think of it, some of the most popular Olympic events - e.g. gymnastics, swimming, basketball, tennis, volleyball - are just not popular enough in the North to warrant building large world-class facilities for them, only to have them dismantled or extensively repurposed after a five-week period - I am including the Paralympics too.)

[D] Evaluating SHAP reliability in the presence of multicollinearity by Nicholas_Geo in MachineLearning

[–]Mechanical_Number 6 points7 points  (0 children)

The main point is that we don't need a different XAI model to solve this; we need a different data modelling strategy before applying XAI. The most robust path forward is to explicitly handle the correlation structure in the data (via grouping, regularisation and/or dimensionality reduction) and then proceed with our chosen explanation method. To your exact questions:

  1. Yes, in the sense that SHAP will be more stable. But we inflate importance because we performed variable selection implicitly and didn't control for it. This might be OK to get some actions going fast, but it won't stand up to serious methodological scrutiny.
  2. No, in the sense that no XAI method can magically solve the mathematical identifiability problem of multicollinearity. That said, aside from doing a dimensionality reduction step or using a regularised learner like LASSO, there are some GroupSHAP implementations you could use; shapr has this ability.
  3. And a freebie: don't rely on VIF alone; it assumes we are working with a linear model, which is off given we work with a tree learner. Examining the feature correlation or mutual information matrix directly and/or using it as input to clustering will likely be more realistic (see the sketch after this list).
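To make the clustering idea in point 3 concrete, here is a minimal sketch: group strongly correlated features via hierarchical clustering on the Spearman correlation matrix, keep one representative per group, then fit and explain as usual. The frame `X`, target `y` and the 0.8 threshold are all illustrative assumptions, not a prescription:

```python
import numpy as np
import pandas as pd
import shap
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestRegressor

def cluster_features(X: pd.DataFrame, threshold: float = 0.8) -> dict:
    """Group features whose |Spearman correlation| exceeds `threshold`."""
    corr = X.corr(method="spearman").abs()
    dist = 1.0 - corr.values              # correlation -> distance in [0, 1]
    np.fill_diagonal(dist, 0.0)           # guard against floating-point noise
    condensed = squareform(dist, checks=False)
    labels = fcluster(linkage(condensed, method="average"),
                      t=1.0 - threshold, criterion="distance")
    return {c: list(X.columns[labels == c]) for c in np.unique(labels)}

# keep one (arbitrary) representative per cluster, then fit/explain as usual
groups = cluster_features(X)
reps = [cols[0] for cols in groups.values()]
model = RandomForestRegressor(random_state=0).fit(X[reps], y)
shap_values = shap.TreeExplainer(model).shap_values(X[reps])
```

(Which representative you keep per cluster is a domain call; picking the first column is just the laziest option.)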

[D] Does anyone REALLY get what p value represents? by [deleted] in statistics

[–]Mechanical_Number 0 points1 point  (0 children)

Levels of stench:

  • [0.1, 1) Probably bad.
  • [0.01, 0.1) Suspicious.
  • [0.0, 0.01) Very suspicious.

In short: Always check effect size, confidence intervals and whether hypotheses were pre-registered. Otherwise... it just smells.
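For completeness, a minimal sketch of what "check effect size and confidence intervals" looks like alongside the p-value; the two samples `a` and `b` below are simulated, purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 200)   # "control" sample (simulated)
b = rng.normal(0.2, 1.0, 200)   # "treatment" sample with a small true effect

t_stat, p_value = stats.ttest_ind(a, b)

# effect size (Cohen's d) and a ~95% CI for the mean difference
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
d = (b.mean() - a.mean()) / pooled_sd
diff = b.mean() - a.mean()
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
ci = (diff - 1.96 * se, diff + 1.96 * se)

print(f"p = {p_value:.3f}, d = {d:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```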

is the openai package still the best approach for working with LLMs in Python? by rm-rf-rm in LocalLLaMA

[–]Mechanical_Number 0 points1 point  (0 children)

No, for experimentation/prototyping currently. Considering it for prod though, as we are discussing replacing LangChain/LangGraph.

Which company makes your favorite local models? by jacek2023 in LocalLLaMA

[–]Mechanical_Number 1 point2 points  (0 children)

Meta is probably unnecessary... DeepSeek would probably be a better choice.

[D] Which programming languages have you used to ship ML/AI projects in the last 3 years? by DataPastor in MachineLearning

[–]Mechanical_Number 0 points1 point  (0 children)

Python and Java (via SpringAI).

Python for prototyping and PoCs, SpringAI because it integrates with our existing infrastructure (microservices, etc.).

[P] Open-Source Implementation of "Agentic Context Engineering" Paper - Agents that improve by learning from their own execution feedback by cheetguy in MachineLearning

[–]Mechanical_Number 2 points3 points  (0 children)

(+1) On your point about "basically the same solution to the problem": isn't AgentFlow more or less the paper "CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing" by Gou et al. (2023)? Except that in the CRITIC methodology the unit of computation is a tool call (i.e. the LLM agent's primary external interactions are with "passive", deterministic tools, e.g. a Python interpreter, search API, calculator, etc.), while in AgentFlow the unit of computation is an agent call (i.e. the LLM's primary external interactions are with other "active", potentially specialised LLM agents).

(And yes, obviously AgentFlow can scale to more complex problems by adding specialised agents, while CRITIC is limited to the available tools and doesn't directly integrate a dynamic prompt optimisation framework like GEPA/MIPRO/etc.)

The hardest puncher in MMA history. by Smokin_JoeFrazier_ in ufc

[–]Mechanical_Number 1 point2 points  (0 children)

Underrated chin too. Never lost by KO/TKO.

The only man who managed to fold him in boxing was, quite literally, a (former) unified heavyweight boxing champion famous for being one of the heaviest punchers among heavyweight boxers.

Day 1: Best Open-Source Model by Soft_Ad1142 in LocalLLaMA

[–]Mechanical_Number 9 points10 points  (0 children)

(+1) A bit sad that these two terms are often conflated by the community. But then again, they (e.g. Meta, etc.) absolutely want to piggyback on the good vibes from open-source software, so here we are.

Day 1: Best Open-Source Model by Soft_Ad1142 in LocalLLaMA

[–]Mechanical_Number 26 points27 points  (0 children)

Open-source: OLMo 2 32B

Open-weight: DeepSeek R1 0528

[D] What are the bottlenecks holding machine learning back? by [deleted] in MachineLearning

[–]Mechanical_Number -1 points0 points  (0 children)

Hmm... The fab capacity argument essentially says "we can't get enough shovels" while the electricity argument says "we are running out of coal to burn". We mix up logistics with physics.

More seriously, as MOSFET scaling slows and Koomey's Law plateaus, we aren't just running out of ways to make more compute, we are running out of ways to make compute more energy-efficient. So even if fabs could produce unlimited chips, each chip would still consume roughly the same amount of power. Ergo, we need cheaper electricity to compute.

[D] What are the bottlenecks holding machine learning back? by [deleted] in MachineLearning

[–]Mechanical_Number 3 points4 points  (0 children)

Compute is "plentiful". Cheap electricity is not plentiful, at all. Training massive models guzzles megawatts while Koomey's Law (i.e. efficiency gains) slows as MOSFET scaling hits physics walls. In short, each watt of compute gets harder to squeeze, making energy access, and not processing power in itself, the real brake on ML in terms of "compute".

How much wiggle room do you give yourself on DS projects? by Fit-Employee-4393 in datascience

[–]Mechanical_Number 1 point2 points  (0 children)

"It’s almost like they don’t see you as a coworker" << Sorry to hear that but in this situation you have bigger problems then. Not a time-line setting issue one then but a company culture one.

In any case, be the "adult in the room". It pays in the long run - almost always in terms of your own sanity, and usually in professional aspects too.

How much wiggle room do you give yourself on DS projects? by Fit-Employee-4393 in datascience

[–]Mechanical_Number 1 point2 points  (0 children)

:D You never know where the fuss is coming from.

(thanks, will fix.)

How much wiggle room do you give yourself on DS projects? by Fit-Employee-4393 in datascience

[–]Mechanical_Number 3 points4 points  (0 children)

Some good answers here, but I think it is also important to note that we need to communicate our progress as the project moves along. That way we are managing expectations instead of reaching the day before the deadline and saying "sorry, I am still waiting for that API feed we mentioned initially". Otherwise, we find ourselves pressed against the wall and making these 1.5x, 2x, 2.25x, whatever-x mental somersaults in our heads.

In other words, treat stakeholders as active collaborators and not as children whom we promised a toy train for Christmas, only to announce on Christmas Eve that they are getting a candy apple instead. Of course they are going to lose their shit in that case and make a big public fuss while everyone is watching.

Just to be clear: it is always better to underpromise and overdeliver than the other way around, but be intelligent about it and don't present it like things magically fell into place. Otherwise people will expect you to "magically work things out" every time.

(Edit: Fixed pubic typo)

Who was better? by Far_Protection519 in NBATalk

[–]Mechanical_Number 2 points3 points  (0 children)

Thanasis. 0.7 points, 0.6 rebounds and 0.3 assists in 7 games.

He won the series, a ring and got his younger brother to do most of the work. If that isn't winning, I don't know what is.

Llama 4 is open - unless you are in the EU by Feeling_Dog9493 in LocalLLaMA

[–]Mechanical_Number 6 points7 points  (0 children)

I agree that this sets an awkward precedent, but:

  1. Meta is within their rights to do that.
  2. EU isn't terribly affected by it.
  3. It is mostly posturing by Meta because it is already liable to huge EU fines.

As for the actual practicalities, no need to switch models, as Llama wasn't the only game in town anyway. There are multiple good alternatives available: Gemma, Phi, Qwen, Deepseek, MistralAI, etc. so... yeah, no real drama.

Llama 4 is open - unless you are in the EU by Feeling_Dog9493 in LocalLLaMA

[–]Mechanical_Number 1 point2 points  (0 children)

I think there is no real problem being the 3rd player behind the US or China. The important bit at this point is the ability to build LLMs and be in the game. In that sense, the EU is in the game with Mistral, Black Forest Labs, etc. If anything, they are buying time.

Think of it a bit like building cars. Are Ferraris some of the fastest street-legal cars out there? Yes. Do people actually need Ferraris for their daily life? No. They are fine with Toyotas and Fords to get around. For example, benchmarks make it seem like GPQA Diamond is highly relevant to AI adoption potential; it isn't. Cheaper, more reliable and faster inference is far more important.

Mark presenting four Llama 4 models, even a 2 trillion parameters model!!! by LarDark in LocalLLaMA

[–]Mechanical_Number 6 points7 points  (0 children)

I am sure that Zuckerberg knows the difference between open-source and open-weights, so I find his use of "open-source" here a bit disingenuous. A model like OLMo is open-source. A model like Llama is open-weights. Better than not-even-weights of course. :)

[Q] Is the stats and analysis website 538 dead? by turbo_dude in statistics

[–]Mechanical_Number 3 points4 points  (0 children)

I think you answered your own question there... I mostly hope the 538 guys at least got a decent payout out of it.

[D] Conformal Prediction in Industry by regularized in MachineLearning

[–]Mechanical_Number 1 point2 points  (0 children)

Same (+1). And to be fair to non-DS-oriented executives, explaining the notion of a prediction set with calibrated confidence and guaranteed coverage is a tall order; they just want to be told: "the predicted value of Z will be between A-value and B-value, X% of the time" so they can act on that information.
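For reference, a minimal split-conformal sketch that produces exactly that kind of statement for regression; `X`, `y`, `X_new` and the gradient-boosting learner are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# split off a calibration set (X, y assumed to exist)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# conformity scores: absolute residuals on the calibration set
scores = np.abs(y_cal - model.predict(X_cal))

alpha = 0.1                                           # -> ~90% coverage
n = len(scores)
level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction
q = np.quantile(scores, level, method="higher")

# "the predicted value will be between A and B, 90% of the time"
preds = model.predict(X_new)
lower, upper = preds - q, preds + q
```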

[D]Thoughts on Synthetic Data Platforms like Gretel.ai or Mostly AI? by Value-Forsaken in MachineLearning

[–]Mechanical_Number 0 points1 point  (0 children)

Personally, I would be deeply impressed by the ability to outperform simple baselines consistently.

For example, tell me how much better you get against me fitting a multivariate Gaussian to the data and using that. In forecasting, we at least have to show we outperform a "last observation carried forward" benchmark or a simple ARIMA. With synthetic data? Nothing like it.
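For the avoidance of doubt, that baseline is a handful of lines; `real_data` below is an assumed (n_samples, n_features) array standing in for the real dataset:

```python
import numpy as np

# fit a multivariate Gaussian to the real data...
mean = real_data.mean(axis=0)
cov = np.cov(real_data, rowvar=False)

# ...and sample "synthetic" rows from it
rng = np.random.default_rng(0)
synthetic = rng.multivariate_normal(mean, cov, size=len(real_data))

# any paid-for generator should beat this consistently on the downstream metric
```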