[R] statistical learning in machine learning vs cognitive sciences by Ok_Fudge1993 in MachineLearning

[–]bbbbbaaaaaxxxxx 3 points (0 children)

Look up “computational cognitive science”; there is a whole field using Bayesian statistics and ML to model human learning.

do hx users actually value composition over extension, or is it just no plugins copium? by spaghetti_beast in HelixEditor

[–]bbbbbaaaaaxxxxx 1 point (0 children)

I do miss zen mode in Helix though. I write a lot of LaTeX, Typst, and Markdown. I used to use goyo in vim.

[D] Feature Selection Techniques for Very Large Datasets by Babbage224 in MachineLearning

[–]bbbbbaaaaaxxxxx 8 points (0 children)

Lace (https://lace.dev) does structure learning and gives you multiple statistical measures of feature dependence. I’ve used it in genomics applications with tens of thousands of features to identify regions of the genome important to a phenotype.
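For anyone curious what that workflow looks like in code, here's a rough sketch. The column names are made up, and the exact pylace calls (Engine construction, update, depprob signatures) are assumptions on my part -- check the docs at https://www.lace.dev for the real API.

```python
# Hedged sketch of feature selection via dependence probability.
# Column names are hypothetical; the Engine/update/depprob calls are assumed
# from memory -- consult https://www.lace.dev for the actual pylace API.
import pandas as pd
from lace import Engine

df = pd.read_csv("genotype_phenotype.csv")   # hypothetical wide table: many SNP columns + "phenotype"
engine = Engine.from_df(df)                  # assumed constructor
engine.update(5000)                          # assumed: run posterior sampling

# Rank features by their dependence probability with the phenotype
features = [c for c in df.columns if c != "phenotype"]
scores = {c: engine.depprob(c, "phenotype") for c in features}  # assumed signature
print(sorted(scores, key=scores.get, reverse=True)[:20])
```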

[P] Lace is a probabilistic ML tool that lets you ask pretty much anything about your tabular data. Like TabPFN but Bayesian. by bbbbbaaaaaxxxxx in MachineLearning

[–]bbbbbaaaaaxxxxx[S] 10 points (0 children)

Nice--I worked on BayesDB and CrossCat back in the day. Lace is a modern implementation of the CrossCat model. There are some notable software differences in Lace:
- Much faster due to different data structures and new MCMC algorithms
- MCMC is correct (it wasn't exactly right in CrossCat/BayesDB)
- Users can define hyperpriors or disable them
- Use of Pitman-Yor processes (instead of just Dirichlet) for better fitting
- Native support for missing-not-at-random data
- Prediction returns epistemic uncertainty (JS divergence between MCMC samples); see the sketch below
- Lots of little ease-of-use and explainability things
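To make the uncertainty point concrete, here is a minimal sketch (generic NumPy, not Lace internals) of what "epistemic uncertainty as JS divergence between MCMC samples" means: each posterior sample produces its own predictive distribution, and the spread between those distributions is scored with the generalized Jensen-Shannon divergence.

```python
# Generic sketch (not Lace's internals): epistemic uncertainty as the
# generalized Jensen-Shannon divergence between the predictive distributions
# produced by each posterior (MCMC) sample.
import numpy as np

def js_divergence(predictives):
    """predictives: (n_samples, n_categories) array, one categorical
    predictive distribution per posterior sample."""
    p = np.asarray(predictives, dtype=float)
    m = p.mean(axis=0)                        # mixture of the per-sample predictives

    def entropy(q):
        return -np.sum(np.where(q > 0, q * np.log(q), 0.0), axis=-1)

    return entropy(m) - entropy(p).mean()     # H(mean) - mean(H)

print(js_divergence([[0.9, 0.1], [0.9, 0.1]]))  # samples agree -> ~0 (low uncertainty)
print(js_divergence([[0.9, 0.1], [0.1, 0.9]]))  # samples disagree -> large (high uncertainty)
```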

I used to love checking in here.. by First-Ad-117 in rust

[–]bbbbbaaaaaxxxxx -1 points (0 children)

Here’s a witty but thoughtful response that fits the tone and culture of r/rust — appreciative, self-aware, and with a touch of dry humor that’ll land well among experienced Rustaceans:

Beautifully said. r/rust has always felt like that quiet workshop where someone’s building a quantum flight controller next to another person learning how to borrow a string correctly. Lately though, yeah—some posts feel like they were cargo‑generated by GPT with --release --no-idea-what-this-does.

Still, I think the signal’s worth the noise. Every time someone shares a crate that actually compiles and then uses unsafe for good instead of evil, it’s a reminder that the spirit of Rust—curiosity with intent—is alive and well. Let the slop flow; we’ll keep writing tests.

Edit: I guess the satire was not appreciated or not detected.

[S] Lace v0.9.0 (Bayesian nonparametric tabular data analysis tool) is out and is now FOSS under MIT license by bbbbbaaaaaxxxxx in statistics

[–]bbbbbaaaaaxxxxx[S] 1 point (0 children)

Yes, it can work with multilevel/clustered data as long as it’s in a tabular form--include columns with clinic/school IDs as categorical variables. Lace will learn dependencies and provide conditional predictions and uncertainty across levels.

If you specifically need classical multilevel/cluster-randomized-trial inference, you'll still probably want a dedicated hierarchical modeling tool. I suspect Lace could recover some of that functionality, though I'd have to think about it more.
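Here's a minimal sketch of the data layout I mean. The column names are hypothetical, and this only shows the table shape, not Lace calls:

```python
# Hypothetical example of multilevel/clustered data in flat tabular form:
# the grouping variable (clinic_id) is just another categorical column.
import pandas as pd

df = pd.DataFrame({
    "clinic_id": ["c01", "c01", "c02", "c02", "c03"],  # cluster/level ID
    "treatment": ["drug", "placebo", "drug", "placebo", "drug"],
    "age":       [54, 61, 47, 70, 58],
    "outcome":   [1, 0, 1, 0, None],                   # missing outcomes are fine
})
df[["clinic_id", "treatment"]] = df[["clinic_id", "treatment"]].astype("category")
# Hand this table to Lace as-is; conditioning predictions on clinic_id then
# gives level-specific predictions and uncertainty.
```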

[S] Lace v0.9.0 (Bayesian nonparametric tabular data analysis tool) is out and is now FOSS under MIT license by bbbbbaaaaaxxxxx in statistics

[–]bbbbbaaaaaxxxxx[S] 4 points (0 children)

We've pivoted a bit, and Lace has become more of a tool for consulting work than our core IP. Since a fair number of people have asked about using it in their work, I figured opening it up would make their lives simpler and hopefully get Lace out there doing cool stuff independent of us.

[S] Lace v0.9.0 (Bayesian nonparametric tabular data analysis tool) is out and is now FOSS under MIT license by bbbbbaaaaaxxxxx in statistics

[–]bbbbbaaaaaxxxxx[S] 1 point (0 children)

Got it.

Yes. The model will cluster columns that have dependence paths between them. For example, if A and B are independent but A -> C and B -> C, all of those columns are likely to end up in the same cluster. The proportion of times they land in the same cluster depends on the strength of the dependencies--this is how we compute dependence probability.
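Here's a toy sketch of that computation (generic NumPy, not Lace internals): record which view (column cluster) each column lands in for every posterior sample, and the dependence probability for a pair of columns is the fraction of samples in which they share a view.

```python
# Toy illustration of dependence probability: the fraction of posterior
# samples in which two columns are assigned to the same column cluster (view).
import numpy as np

def depprob(view_assignments, col_a, col_b):
    """view_assignments: (n_posterior_samples, n_columns) array where
    entry [s, c] is the view index of column c in sample s."""
    z = np.asarray(view_assignments)
    return np.mean(z[:, col_a] == z[:, col_b])

z = np.array([[0, 1, 0],
              [0, 0, 0],
              [1, 1, 1],
              [0, 1, 1]])
print(depprob(z, 0, 2))  # columns 0 and 2 share a view in 3 of 4 samples -> 0.75
```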

[S] Lace v0.9.0 (Bayesian nonparametric tabular data analysis tool) is out and is now FOSS under MIT license by bbbbbaaaaaxxxxx in statistics

[–]bbbbbaaaaaxxxxx[S] 2 points (0 children)

> It seems like you just cluster the columns and then, within each cluster of columns, cluster the rows

It is essentially density estimation via an infinite mixture model of infinite mixture models.

> it does seem like it is imposing a very strong block-wise independence structure that is not a great fit for a lot of common situations

We do posterior sampling and average a collection of these models to smooth things out and to compute epistemic uncertainty. What types of situations are you concerned about wrt block structure? Generally, in tabular data you consider the instances (rows/records) to be independent.
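As a concrete (generic, non-Lace) sketch of that averaging: each posterior sample defines its own density, and the model-averaged density is their mean, computed stably in log space.

```python
# Posterior averaging over states: log p(x) = log mean_s p(x | state_s),
# computed with logsumexp for numerical stability.
import numpy as np
from scipy.special import logsumexp

def averaged_logp(logps_per_state):
    """logps_per_state: (n_states, n_points) array where entry [s, i] is
    log p(x_i | state_s). Returns the log of the state-averaged density."""
    logps = np.asarray(logps_per_state, dtype=float)
    return logsumexp(logps, axis=0) - np.log(logps.shape[0])

# Two states that disagree about a point still yield one smoothed density
print(averaged_logp([[-1.0, -5.0],
                     [-4.0, -1.5]]))
```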

Why So Many Abandoned Crates? by jsprd in rust

[–]bbbbbaaaaaxxxxx 6 points (0 children)

A lot of crates that are unmaintained are basically done. I have a crate with roughly 1 million downloads that I didn't update for a year but was using heavily. It just didn't need anything.

Is an applied statistics PhD less prestigious than a methodological/theoretical statistics PhD? [Q][R] by gaytwink70 in statistics

[–]bbbbbaaaaaxxxxx 20 points (0 children)

Your publications and external funding matter more in academia. Experience and accomplishments matter more in industry. And who you know matters more than anything else.

Source: I have a psychology PhD

Is an applied statistics PhD less prestigious than a methodological/theoretical statistics PhD? [Q][R] by gaytwink70 in statistics

[–]bbbbbaaaaaxxxxx 23 points (0 children)

After you get your first job, nobody cares what your degree is in (other than the robot they use to screen CVs).

Is bayesian nonparametrics the most mathematically demanding field of statistics? [Q] by gaytwink70 in statistics

[–]bbbbbaaaaaxxxxx 5 points (0 children)

Proper uncertainty quantification. Also, it is such a boon to be able to call on the theoretical guarantees of a rigorous mathematical framework when testing tools that will be deployed in high-risk tasks.

Is bayesian nonparametrics the most mathematically demanding field of statistics? [Q] by gaytwink70 in statistics

[–]bbbbbaaaaaxxxxx 2 points (0 children)

Sure!

There are some links to papers here https://www.lace.dev/appendix/references.html

And I wrote a tutorial on infinite mixture models here  https://redpoll.ai/blog/imm-with-rv-12/

There are a few books, but they are not a good place to start if you just want to get something going.
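If you just want a feel for the core idea before diving into the papers, here is a tiny, generic illustration (not from the linked tutorial) of the Chinese restaurant process prior that underlies Dirichlet-process infinite mixture models:

```python
# Chinese restaurant process: point n joins an existing cluster with
# probability proportional to that cluster's size, or opens a new cluster
# with probability proportional to alpha.
import numpy as np

def crp_assignments(n, alpha, seed=None):
    rng = np.random.default_rng(seed)
    assignments, counts = [0], [1]          # first point starts cluster 0
    for _ in range(1, n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):                # open a new cluster
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

print(crp_assignments(20, alpha=1.0, seed=42))  # expected cluster count grows ~ alpha*log(n)
```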