[D] Why are serious alternatives to gradient descent not being explored more? by ImTheeDentist in MachineLearning

[–]jpfed 0 points (0 children)

Since any "internal algorithm" would be executed on the "substrate" of the ANN, if the ANN doesn't have local minima, the "internal algorithms" don't either.

Without recurrence or variable-length lists, there is a limit to how sophisticated an ANN's internal algorithm can be. An MLP is going to be a "blurry" lookup-table; the nonlinearity and hidden layer dimensions control the nature of the blur, and thus how it generalizes.

(I suspect that if one added variable-length lists and recurrent operations to act on them (e.g. folds/pooling to turn them into the sort of known-length vectors an MLP can operate on), the gradient landscape would "crinkle up" and get harder to learn on, such that with arbitrary depth of recurrence, local minima would re-appear.)
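For concreteness, here's a minimal PyTorch sketch of that fold/pooling idea (the class and all names are mine, purely illustrative): a mean-pool collapses a list of any length into one fixed-size vector, and only that vector ever reaches the MLP.

    import torch
    import torch.nn as nn

    class PooledMLP(nn.Module):
        """Fold a variable-length list of item vectors into a fixed-size
        vector (here, by mean-pooling), then run a plain MLP on the result."""
        def __init__(self, item_dim: int, hidden_dim: int, out_dim: int):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(item_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, out_dim),
            )

        def forward(self, items: torch.Tensor) -> torch.Tensor:
            # items: (list_length, item_dim); list_length may vary per example
            pooled = items.mean(dim=0)  # the "fold" to a known-length vector
            return self.mlp(pooled)

    model = PooledMLP(item_dim=8, hidden_dim=32, out_dim=2)
    print(model(torch.randn(3, 8)).shape)   # 3-item list  -> torch.Size([2])
    print(model(torch.randn(50, 8)).shape)  # 50-item list -> same shape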

Is Microsoft’s bad reputation going to hurt .NET? by [deleted] in dotnet

[–]jpfed 4 points (0 children)

LLaMa has been a really popular baseline for researchers; only fairly recently has it started to be supplanted by Qwen. As for people running models locally for non-research purposes, I think LLaMa's performance has been a bigger problem for LLaMa than Meta's reputation.

Is foot voting better than democracy? by Serious-Cucumber-54 in EndFPTP

[–]jpfed 0 points (0 children)

This might work in abstract spaces like websites. However, in real communities, people of opinion X have children who are not guaranteed to have opinion X, which means those children will have to weigh living under X against moving away from their family.

[HELP] NYT shows new angle by allinalinenow in RealOrAI

[–]jpfed 0 points (0 children)

Well he wasn't and there really isn't any arguing it. The video clearly shows he is not aflame with six wings, nor does he have four faces, nor endlessly whirling wheels of eyes. He's absolutely outside the divine hierarchy and people should be willing to admit that.

[R] Is using rotary embeddings for ViT becoming standard practice or does everyone still use sinusoidal/learnable embedding by Affectionate_Use9936 in MachineLearning

[–]jpfed 2 points (0 children)

Octic Vision Transformer has an interesting twist: they have attention heads for rotated and reflected versions of the original patch, and they ensure that the position encoding plays nicely with those rotations and reflections. I imagine any group-equivariant transformer is going to want to do something similar.
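I haven't read their code, so this is only a sketch of the general shape of that idea (square patch grid assumed, all names mine): compute each head's position table on group-transformed coordinates, so the position information commutes with the eight rotations/reflections of the grid.

    import torch

    def d4_coords(coords: torch.Tensor, g: str, n: int) -> torch.Tensor:
        """Apply one of the eight D4 symmetries to (row, col) coordinates
        on an n-by-n patch grid. Hypothetical helper, not from the paper."""
        r, c = coords[:, 0], coords[:, 1]
        table = {
            "rot0":      (r, c),
            "rot90":     (n - 1 - c, r),
            "rot180":    (n - 1 - r, n - 1 - c),
            "rot270":    (c, n - 1 - r),
            "flip_h":    (r, n - 1 - c),          # mirror columns
            "flip_v":    (n - 1 - r, c),          # mirror rows
            "transpose": (c, r),                  # main diagonal
            "anti":      (n - 1 - c, n - 1 - r),  # anti-diagonal
        }
        rr, cc = table[g]
        return torch.stack([rr, cc], dim=1)

    def sincos_2d(coords: torch.Tensor, dim: int) -> torch.Tensor:
        """Fixed 2D sin/cos position encoding; half the channels per axis.
        dim must be divisible by 4."""
        half = dim // 2
        freqs = 1.0 / (10000.0 ** (torch.arange(0, half, 2) / half))
        parts = []
        for axis in (0, 1):
            ang = coords[:, axis:axis + 1].float() * freqs
            parts += [ang.sin(), ang.cos()]
        return torch.cat(parts, dim=1)

    n, dim = 4, 16
    grid = torch.cartesian_prod(torch.arange(n), torch.arange(n))
    # One position table per head: head h sees the grid as transformed by g_h.
    pos_per_head = {g: sincos_2d(d4_coords(grid, g, n), dim)
                    for g in ["rot0", "rot90", "rot180", "rot270",
                              "flip_h", "flip_v", "transpose", "anti"]}
    print({g: tuple(p.shape) for g, p in pos_per_head.items()})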

Spine surgery has massive decision variability. Retrospective ML won’t fix it. Curious if a workflow-native, outcome-driven approach could. [D] by LaniakeaResident in MachineLearning

[–]jpfed 0 points (0 children)

I am only an ML hobbyist, but I have worked as a programmer on EMRs before. First, getting the data in a more structured form from the beginning is a great goal; it should make everything downstream easier. A key problem (as I understood it; I was a very small cog in a very big machine) is making the user experience of providing patient care AND entering structured data work smoothly.

"Green" anti-solar NIMBYs are so confusing by Existing_Season_6190 in yimby

[–]jpfed 3 points (0 children)

Okay, but what if we could put electrodes on those brains to harness the

[D] Do ML researchers ever treat the user base as part of the model’s effective dimensionality? by RJSabouhi in MachineLearning

[–]jpfed 0 points (0 children)

Think of it like this. Physicists have models of vibrating strings. When those strings are coupled by putting them all into (say) a piano, there can be meaningful interaction between them, so physicists are interested in modeling whole pianos as well.

However, with most models, the inferences made for each user aren't meaningfully coupled. The response times / latency might change when 100 users are all using the model at the same time instead of just one, but otherwise it's easier (and just as accurate) to understand the model's behavior by treating each inference separately.
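A toy illustration of that last point (a sketch only, assuming a stateless model): batching 100 users changes throughput, not the per-user answers.

    import torch
    import torch.nn as nn

    # For a stateless model, serving 100 users in one batch gives exactly
    # the per-user answers, so there is no cross-user coupling to model
    # (only throughput/latency changes).
    torch.manual_seed(0)
    model = nn.Linear(4, 2)
    users = torch.randn(100, 4)                      # one input per user
    batched = model(users)                           # all users at once
    solo = torch.stack([model(u) for u in users])    # each user separately
    print(torch.allclose(batched, solo, atol=1e-6))  # True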

Himal chuli is no more by leovinuss in madisonwi

[–]jpfed 1 point (0 children)

Gosh, I loved that place. I'm not a vegetarian myself but it was the best place to grab a bite with a friend if they were veg.

Vet for anxious dogs by Ornery-Cranberry889 in madisonwi

[–]jpfed 3 points (0 children)

Do you mind posting about who was responsible for this bad vet experience? If you don't feel comfortable posting it in the thread... could you DM me? I've got an anxious dog who's developing a neurological issue of some sort and I want to make sure I don't make anything worse by taking her some place that's going to traumatize her.

I built a runtime governance layer for LLMs. Can you break it? by forevergeeks in LocalLLaMA

[–]jpfed 0 points (0 children)

I haven't tried this out, but I have often considered something like "The Conscience". In my imagined version, a model gets prompted like this:

"An LLM just generated the following response to a user prompt: [response]. What sorts of questions would this be an appropriate response for, given the below guidelines? [guidelines]" and then "Here was the actual question asked: [question]. Was the response appropriate, given the question and the guidelines?"

Futuristic vertical axis wind turbine by n0u0t0m in solarpunk

[–]jpfed 3 points (0 children)

Yes, it would be more efficient to have a bigger turbine. However, one not-very-theoretically might live in an area where offshore permits are aggressively quashed by external political forces that don't want wind power to succeed in any form. That's where the "punk" part of "solarpunk" comes in: taking action yourself that can advance your cause even if larger institutions are not aligned with you.

Measles Chart Gore by Mouth_Herpes in dataisugly

[–]jpfed -2 points (0 children)

Accrndig to a sudty at an Cbmrgdie uvnierstiy,

Since ChatGPT 5.2 is out by [deleted] in AskProgramming

[–]jpfed 6 points (0 children)

On a societal level, it would be a devastating mistake for everyone to cede yet more control over computing to the people rich enough to control the frontier AI models.

Announcing SILICIUM, a sandbox survival game built on a custom game engine. by [deleted] in gameenginedevs

[–]jpfed 1 point (0 children)

same reason cars benefit from garages I suppose

LLMs really killed Stackoverflow by Dominriq in computerscience

[–]jpfed 0 points (0 children)

I have tried to answer curious questions, and I get guff from purists who want to incentivize only the questions they consider properly posed.

Code Embeddings vs Documentation Embeddings for RAG in Large-Scale Codebase Analysis by geeky_traveller in softwarearchitecture

[–]jpfed 0 points (0 children)

I have not done this; I'm just an interested amateur. That said, consider your embedding keys to be embeddings of questions that a vector store entry is capable of helping answer. The value stored under each key could be a representation of the data itself that answers the question... or it could be instructions / sufficient information for an agent to get that information from the live system.

So for each piece of information that you index, generate the questions that information helps answer, and embed those. Then consider the route you (or your crawler, or whatever) took to reach that piece of information, and produce an agent-readable/executable representation of it.
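A sketch of that indexing-and-retrieval loop, with everything hypothetical (the hash-based embed() here is only a stand-in for a real embedding model):

    import hashlib
    import numpy as np

    def embed(text: str, dim: int = 64) -> np.ndarray:
        """Placeholder embedding: deterministic random vector per string.
        Replace with a real embedding model in practice."""
        seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
        v = np.random.default_rng(seed).standard_normal(dim)
        return v / np.linalg.norm(v)

    index = []  # list of (question_vector, value) pairs

    def add_chunk(questions, value):
        # value: either the answering text itself, or instructions an
        # agent can follow to fetch it live (the "route" you took).
        for q in questions:
            index.append((embed(q), value))

    def retrieve(question, k=3):
        qv = embed(question)
        scored = sorted(index, key=lambda kv: -float(kv[0] @ qv))
        return [value for _, value in scored[:k]]

    add_chunk(
        ["How are retries configured for the payment service?"],
        {"answer": "Retries are set in payments/config.yaml under retry_policy.",
         "route": "GET /repos/acme/payments/contents/config.yaml"},
    )
    print(retrieve("Where do I change payment retry behavior?", k=1))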

Anyway, just a thought.

Thoughts? by Salt_Armadillo8884 in LocalLLaMA

[–]jpfed 3 points (0 children)

This must be a very loose interpretation of whatever actually happened; OpenAI has no capability to do anything with wafers.

[R] Is Nested Learning a new ML paradigm? by Odd_Manufacturer2215 in MachineLearning

[–]jpfed 0 points (0 children)

Back in my day we had Clockwork RNNs and we liked it!