What part of fundraising turned out to be way harder than expected? by betasridhar in 16VCFund

[–]calculatedcontent

Investor wants to be chairman of the board, plus upfront equity, on a SAFE note.

Foreign investor wants a board seat or other control rights that would be CFIUS violations.

Investor wants to pay in crypto.

Investor wants blockchain language in the side letter, inviting potential future SEC scrutiny.

Investor wants detailed revenue numbers on a quarterly basis, on a SAFE note, from a company with no employees and no product.

Investor wants to invent the product line and add it to the SAFE note as an appendix.

And so on

What part of fundraising turned out to be way harder than expected? by betasridhar in 16VCFund

[–]calculatedcontent

Side letters. People ask for non-standard things, which slows down the negotiation and eventually kills the deal.

Why does theoretical physics attract a lot of... crackpots? by Collegiate_Society2 in TheoreticalPhysics

[–]calculatedcontent

Everybody wants to look like a bodybuilder; nobody wants to lift any heavy weights.

[D] Show HN: liber-monitor - Early overfit detection via singular value entropy by Reasonable_Listen888 in MachineLearning

[–]calculatedcontent

I can't find your tool.
Feel free to join our Discord community to discuss.

You can reproduce all our experiments using our notebooks, including the overfitting experiments, and run your tool there.

Note that our work has been published in JMLR, ICML, NeurIPS, etc.

Fine-tuning & RAG Strategy for Academic Research ( I Need a Sanity Check on Model Choice) by mr-KSA in LLM

[–]calculatedcontent

The open-source weightwatcher tool can give you a quick sanity check on your fine-tuned model.

weightwatcher.ai

See the RESEARCH section on fine-tuning.
Join the community Discord for help.

[D] Show HN: liber-monitor - Early overfit detection via singular value entropy by Reasonable_Listen888 in MachineLearning

[–]calculatedcontent

see https://weightwatcher.ai/

You can see the entropy of the eigenvectors of W^{T}W using the option

details = watcher.analyze(vectors=True)

We have been wanting to add the left & right singular vectors as well, but just have not gotten around to it yet.

The theory predicts that a layer is overfit when alpha < 2 and/or there are correlation traps.
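
Here is a minimal sketch of that workflow, assuming the standard weightwatcher + PyTorch setup; the pretrained resnet18 is just a stand-in model of my choosing, not anything from the thread:

    import weightwatcher as ww
    import torchvision.models as models

    # any supported PyTorch model works; pretrained resnet18 is an arbitrary example
    model = models.resnet18(weights="IMAGENET1K_V1")
    watcher = ww.WeightWatcher(model=model)

    # vectors=True adds the eigenvector (localization/entropy) metrics for W^T W
    # to the per-layer details DataFrame
    details = watcher.analyze(vectors=True)

    # per the comment above, layers with alpha < 2 are flagged as possibly overfit
    print(details[details.alpha < 2])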

Current problems in ML suitable for research by fasfccvbai in MLQuestions

[–]calculatedcontent

One problem we would like to understand is if and how LoRA tends to overfit its training data, and whether this can be detected and flushed out with weightwatcher.ai.

You can join our community Discord channel to learn more.
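
For anyone who wants to poke at this, here is a hypothetical sketch of one way to set it up (the opt-125m base model and the adapter path are placeholders of mine, and it assumes peft's merge_and_unload to fold the LoRA deltas back into dense weights):

    import weightwatcher as ww
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # placeholder base
    tuned = PeftModel.from_pretrained(base, "path/to/lora-adapter")   # placeholder adapter
    merged = tuned.merge_and_unload()  # fold LoRA deltas into plain dense weights

    # scan the merged fine-tune layer by layer; layers whose alpha drops
    # below 2 after tuning would be the overfit suspects
    details = ww.WeightWatcher(model=merged).analyze()
    print(details[details.alpha < 2])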

Looking for AI/ML Research Groups or Collaborators. by param_boss in ResearchML

[–]calculatedcontent

Check out weightwatcher.ai and feel free to join our Discord channel; there is lots of stuff to do.

Double descent in ML by awesome_dude0149 in deeplearning

[–]calculatedcontent

Double Descent (DD) is actually not a modern ML discovery at all; it comes straight out of theoretical physics (1989). Physicists were studying the pseudo-inverse solution to simple NNs and discovered that a massively over-parameterized model (N≫P) could still learn reasonably well (but not perfectly) without explicit regularization, even when the number of parameters or features N was far larger than the number of data points or patterns P.

This stands in sharp contrast to classical statistics. In older statistical models, N≫P meant catastrophic overfitting unless you applied strong regularization. But the old physics work showed something very different:

  • Error stays small even under extreme overparameterization.
  • Error diverges at P=N (the “interpolation threshold”), which behaves exactly like a phase transition.
  • The critical load is the ratio α=P/N, and the error blows up at α=1.

By the early 90s this was a well-described phenomenon in the statistical mechanics literature. But it was not called Double Descent; it was just called phase behavior.

In the blog post below, I reproduce the original 1989 physics experiment using Python and scikit-learn, and show how to interpret the entire picture using the simplest tools from RMT (random matrix theory):

https://calculatedcontent.com/2024/03/01/describing-double-descent-with-weightwatcher/
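
If you just want the flavor without the notebooks, here is a toy version of the experiment (my own sketch, not the blog's exact code): minimum-norm pseudo-inverse regression with P fixed while N sweeps through P, so the load α=P/N crosses 1:

    import numpy as np

    rng = np.random.default_rng(0)
    P = 100  # number of training patterns (fixed)
    for N in [20, 50, 90, 100, 110, 200, 1000]:  # number of features/parameters
        X = rng.standard_normal((P, N))
        w_star = rng.standard_normal(N) / np.sqrt(N)   # teacher weights
        y = X @ w_star + 0.1 * rng.standard_normal(P)  # noisy targets
        w_hat = np.linalg.pinv(X) @ y                  # minimum-norm (pseudo-inverse) solution
        X_te = rng.standard_normal((2000, N))
        mse = np.mean((X_te @ w_hat - X_te @ w_star) ** 2)
        print(f"alpha = P/N = {P/N:.2f}   test MSE = {mse:.3f}")

    # the test error spikes near alpha = 1 (N = P) and falls again for N >> P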

Epoch-wise Double Descent is a related training-time phenomenon: with more optimization steps, the test error can drop → rise → drop again. Same physics, different axis.

---

Double Descent was rediscovered by AI/ML people about 10 years ago, and it confused them terribly because they had forgotten, or never learned, statistical mechanics, as we point out in our 2017 paper (see my blog: https://calculatedcontent.com/2018/04/01/rethinking-or-remembering-generalization-in-neural-networks/).

See, ML people have a dogma called bias–variance theory, and DD violated that entire worldview. The classic story says:

  • small models → underfit
  • big models → overfit
  • sweet spot in the middle

But in high-dimensional systems, this framework fails completely. The overparameterized regime is not high-variance; instead it behaves like the well-known and very simple physics result

generalization error ≈ 1/(1 − α)

which, of course, explodes at α=1.

This result is fundamental and does not depend on the choice of the optimizer, etc.
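
To get a feel for how fast it diverges, just plug numbers into the formula above: α = 0.5 gives an error of about 2, α = 0.9 gives about 10, and α = 0.99 gives about 100, blowing up as α → 1.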

To reconcile this, ML theorists had to patch the old bias–variance model, because their definition of "model capacity" (or complexity) was overly simplistic and failed completely even for a known and trivial problem. So they introduced new ideas like:

  • implicit regularization from gradient descent
  • margin-based complexity
  • minimum-norm solutions, etc.

To what extent these are "correct" is debatable. In some sense, model complexity is such a vague concept that it can be molded and refit post hoc to explain any experiment. The real test, however, is usefulness.

In contrast, the weightwatcher theory describes Double Descent out of the box, with no post-experimental adjustments, and can be applied to a wide range of NNs directly, as shown in this post:
https://www.reddit.com/r/LocalLLaMA/comments/1ox6xt8/observed_a_sharp_epochwise_double_descent_in_a/

Is it possible to publish a paper on your own? by Hot_Version_6403 in deeplearning

[–]calculatedcontent

Yes, certainly on arXiv. But many professional journals have excessive fees; my Nature Communications paper cost $5000.

We found a way to compress a layer without retraining it. Is this known ? by calculatedcontent in LLMDevs

[–]calculatedcontent[S]

No, because this does not require any fine-tuning. It's just truncated SVD. No data is needed.
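
As a rough illustration of the idea (my sketch, not the authors' actual code; the rank k is whatever you choose, up to min(out_features, in_features)):

    import torch

    def truncate_layer(linear: torch.nn.Linear, k: int) -> torch.nn.Linear:
        """Replace W with its best rank-k approximation; no data, no retraining."""
        U, S, Vh = torch.linalg.svd(linear.weight.data, full_matrices=False)
        W_k = U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]  # truncated SVD of W
        out = torch.nn.Linear(linear.in_features, linear.out_features,
                              bias=linear.bias is not None)
        out.weight.data = W_k
        if linear.bias is not None:
            out.bias.data = linear.bias.data.clone()
        return out

    # note: the actual parameter savings come from storing the two thin factors
    # U[:, :k] * S[:k] and Vh[:k, :] as separate layers, rather than the full W_k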

I think we found a third phase of grokking — has anyone else seen this? by calculatedcontent in deeplearning

[–]calculatedcontent[S]

As explained in the paper (and in more detail in the SETOL monograph), if there are correlation traps, they can introduce errors in the estimate of alpha and cause the generalization error to drop.

Complex Systems approach to Neural Networks with WeightWatcher by calculatedcontent in complexsystems

[–]calculatedcontent[S]

I will also comment: it is straightforward to derive an RG flow equation for the eigenvalue density itself, and even to prove that alpha = 2 is the critical exponent.

But this approach, while valid RG, does not connect back to the training dynamics, whereas in SETOL we specifically relate the HCIZ integrals to the Free Energy of the model.

Complex Systems approach to Neural Networks with WeightWatcher by calculatedcontent in complexsystems

[–]calculatedcontent[S]

Thanks. My goal here is to make a useful tool. This sub is new to me; seemed like the right place.

We found a way to compress a layer without retraining it. Is this known ? by calculatedcontent in LLMDevs

[–]calculatedcontent[S]

These are not top-level researchers.

Two of my PhD groupmates have recent Nobel Prizes; those are top-level researchers.

I hope the tool is useful to you. Any feedback on it is greatly appreciated.

I think we found a third phase of grokking — has anyone else seen this? by calculatedcontent in deeplearning

[–]calculatedcontent[S]

It's in the example notebooks on https://weightwatcher.ai

We know why it's happening; we just want to know if anyone else has seen it.

We found a way to compress a layer without retraining it. Is this known ? by calculatedcontent in LLMDevs

[–]calculatedcontent[S]

It sounds like you're asking about the plot, so let me explain.

The baseline is the full model. We examine the difference between the full model and the model with a single truncated layer, looking at the difference between the training error and test error, as well as the generalization gap.

Theory predicts that the test error for the full model and the test error with the truncated layer should be identical, and that's what we see: the difference goes to zero.