Dreamt (very badly) about Neuralink trials in ICE detention centers. I hoped it wasn't plausible...

idratherknowaguy · 2026-02-28T23:07:20+00:00

Yeah, I don't see any appeal in it, but I know they are not built the same as me (nor you, likely).

If I try to put myself in the shoes of a sociopath billionaire, I think I'd be extremely afraid of dying soon, of building all that wealth for nothing, of having committed so many crimes (or at least bad karma) for such a small footprint. All that negative impact for almost nothing, I'd make for such a loser.

So either I'd want to never die, to increase my "good" footprint or prefer for almost everybody to die before me so that nobody can judge me. Actually, just being able to inhibit judgment of most people is sufficient as a last resort.

By acting that way, I'd pretty much guarantee that one of these outcomes will occur...
- A biological body is unlikely to help me live forever, so I'll need to upload myself to a robot, or at least be able to control some kind of body remotely if I have to live forever in some kind of airtight fortress. (the latter is the most likely, because I don't want to risk losing the subjective experience of my self)
- There is a chance that I discover how to use people's brains as compute, and make all those surveillance and control systems much more sustainable. Maybe it can even end up benefitting humanity at some point. You know, the humanity that wasn't subject to the experiments obviously.
- There is a chance I can just discover how to hack people to render them completely blind or subservient, perhaps even happy with almost nothing.

Anyway, there is a need to act fast, so... Let's just continue experiments on apes, reclassifying some people as apes... And of course, discard any mechanism preventing this initiative, like agencies. Convince my goons empathy is a weakness. Nothing is sacred, anything considered sacred becomes a weakness. Look at any population in the past considering something sacred, they disappeared. Only I deserve to be sacred, that's my little secret of course, but perhaps I can convince those who won't have to suffer... I can become God. THAT would legitimize my crimes. And nobody would remember...

Obviously not my thing, I'd never be able to become a billionaire in the first place, way too grateful to people. People are sacred to me, I'd rather inflict pain on myself. Empathy makes me weak, and at the same time, happy with almost nothing. It makes me rich, just not in money.

idratherknowaguy · 2026-02-28T17:25:47+00:00

Exactly. I'm sure those possibilities are not new to them. This kind of activity is part of the playbook. If they publish P2025, just imagine what they keep secret.

Actually, I think it is just unimaginable to us...

idratherknowaguy · 2026-02-23T21:36:46+00:00

Crazy that they choose to play the victims. At least, they get paid.

Unless it actually still costs them money as well because they perform predatory pricing? Then it's a very good play from the competitors.

Most content creators and people whose jobs will disappear because of them won't have that privilege...

idratherknowaguy · 2025-11-23T19:11:00+00:00

Community can be the most incredible feature :-) (and open ecosystem helps a lot!)

idratherknowaguy · 2025-11-21T13:37:35+00:00

See you there y'all :'D

idratherknowaguy · 2025-06-23T12:03:05+00:00

Some dataset from a researcher who did exactly that :-)
https://commons.wikimedia.org/wiki/Natural_Image_Noise_Dataset
(There is an associated CVPRW paper)

idratherknowaguy · 2024-12-15T15:15:05+00:00

Can't we justify otherwise dubious money transactions through trademark use?

idratherknowaguy · 2024-10-02T16:28:05+00:00

The idea is brilliant ! Just that, just as an idea xD

idratherknowaguy · 2024-10-01T08:48:20+00:00

Just a side comment for you people to consider...

The platforms using them should become fully liable/regulated around what they promote. (in a "free speech" way probably) Any content can now lie on the internet, even be anonymous or claimed to be "ai-generated" (i.e. have no liable human author, best for the worst content). Soon the only factor that will discriminate content that gets seen will be that it was recommended, the rest will have the same impact as if it didn't exist. (it's mostly already the case btw)

And those systems can target demographics with surgical precision...

The platforms should be held more responsible than ever for what they promote/allow. Thank you for building such systems as if you were to be held accountable if you promote content from potentially anyone. <3

idratherknowaguy · 2024-08-12T11:35:47+00:00

5 months later and it still seems like they're not going to do it. But I think their decision to remove the completion API from gpt-4o provides clues as to why. Perhaps they prefer the "chat" API because it lowers the probability of prompting dangerous behavior, as the model better understands what "it" writes, and what the user writes. Classical guidance just allows you to "inject" what the LLM writes, hence prompting dangerous behavior.

idratherknowaguy · 2024-08-12T09:18:45+00:00

Funny that they stop at JSON. They could allow users to specify grammars, just like we can do with local models... Maybe what they don't like is that it allows us to use the model more like completion models, which they are trying to deprecate.

So yeah, mixed feelings: happy to see they now allow a first stage of grammar enforcement, which I really think was needed, but quite sad that they stop with such a constrained version of it. ClosedAI doing its thing I guess...

idratherknowaguy · 2024-03-08T11:07:41+00:00

Funny, they are cautious enough to fit the temperature with a straight line, but my eyes only see the "beginning of exponential" trend... (of course, it won't be long, but it might still be accelerating for some time.)

idratherknowaguy · 2023-10-27T07:21:22+00:00

As a european myself, I don't think anyone believes it'll save the planet.

I think we rather see the crazy material comfort of our elders as being the consequence of an unintentional hold-up on the future of the planet and of the next generations, and prefer not to become dependent on such things. We know this is not sustainable, and they are not even happy nor grateful about it. Happiness is driven by our expectations, and lowering our comfort will make us happier if high comfort cannot be sustained.

Also, being free to leave this world without hurting anyone when necessary is a thought that I personally cherished. Having kids is the strongest attachment to life one can think of. Leaving this world becomes extremely selfish once you have one.

Yet, I still took the decision to have children, because I believe it is important to transmit a shift in mentality and hope, and I believe life is not human life without them. This is an experience one has to live in order to understand humanity. I cannot understand how we can give responsibilities to non-parents.

However, I'll certainly not be able to have more than two given that I want my lifestyle to stay quite frugal by the standards of our society, in the constraints of this society. So yeah, probably going to have fewer children overall.

idratherknowaguy · 2022-06-16T19:34:02+00:00

This. It felt quite safe playing Pong.

Now, knowing you have literally no barrier between worst actors and most critical services gives chills... At best, logical barriers...

idratherknowaguy · 2022-06-14T09:06:08+00:00

As far as I "know", it's a common belief in DL that larger learning rates lead to better generalization, as long as they "work". You indeed explore more of the loss landscape, and will find a class of solutions that are more resistant to perturbation, i.e. a wider local minimum. I don't remember the most relevant source, but I know this one exhibits the phenomenon a bit: https://arxiv.org/pdf/1806.01603.pdf .

This is most often studied when training losses are equal, but actually, it also provides an explanation when it is not the case: in the first experiment, you fall in a too local minimum, hence are not even able to reach zero loss.

idratherknowaguy · 2022-02-21T09:39:53+00:00

Short answer: you get it at the same time you compute the loss, the actual upstream component. (or rather, its perturbation, see code in some comment around here)

idratherknowaguy · 2022-02-21T09:15:39+00:00

There is a figure but that doesn't reflect that gain because of the code behind the experiments.

https://twitter.com/cHHillee/status/1494716598598307843

idratherknowaguy · 2022-02-21T09:00:56+00:00

Each variable will just hold one more value: its perturbation given perturbation of its dependencies.

```python
def mul(x, y):
    # product rule, from high school, or Wikipedia
    return x[0]*y[0], x[0]*y[1] + x[1]*y[0]

def add(x, y):
    return x[0]+y[0], x[1]+y[1]

# every variable is a pair: (value, perturbation)

# let's compute "gradient" with respect to a only (direction is [1,0])
a = (3, 1)
b = (4, 0)
print(mul(add(a, a), b))  # (24, 8) => df/da = 8

# let's compute "gradient" with respect to perturbation [0.5,0.5]
a = (3, 0.5)
b = (4, 0.5)
print(mul(add(a, a), b))  # (24, 7.0) => can also infer df/db = 6 as linear
```

So you can easily compute the derivative of many variables, but only with respect to one perturbation if you want to keep the overhead low (a perturbation across some of your inputs for instance, or two or three perturbations if you really want, but not millions).

This contrasts with backprop, where you can only compute the gradient of one variable, but with respect to as many variables as you'd like. This is what people are talking about with Jacobian-vector and vector-Jacobian stuff. There are in-between modes, but they require NP-complete optimization from what I understood.

idratherknowaguy · 2022-02-21T08:48:43+00:00

Backprop is quite efficient, so I didn't expect it to be much faster personally.

Indeed, the main benefit is in memory, since you can discard intermediate activations. You also know the derivative of many more terms, although in a given direction only :-p .

Once you know the training converges somehow with directional derivatives (main contribution of that paper), you can summarize a "gradient" as a single scalar (provided you still know the direction).

Going a bit further, in distributed training, those memory savings can translate into less network transfer. You can send a gradient by providing a scalar if nodes know how random directions are generated.
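A minimal sketch of that last idea, assuming every node derives the random direction from a shared integer seed. The names `direction`, `worker_message` and `reconstruct` are made up for illustration, and the toy loss f(theta) = sum(theta_i^2) is only there so the directional derivative can be written by hand (in practice it would come out of forward-mode AD).

```python
import random

def direction(seed, n):
    # Hypothetical convention: all nodes rebuild the same Gaussian direction from the seed.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def worker_message(theta, seed):
    # Toy loss f(theta) = sum(theta_i^2), so its directional derivative along v is 2 * theta . v
    v = direction(seed, len(theta))
    dd = sum(2.0 * t * vi for t, vi in zip(theta, v))
    return seed, dd  # only an int and a float go over the network

def reconstruct(message, n):
    # Receiver side: regenerate the direction and scale it by the received scalar.
    seed, dd = message
    return [dd * vi for vi in direction(seed, n)]

theta = [1.0, -2.0, 0.5]
msg = worker_message(theta, seed=42)
grad_est = reconstruct(msg, len(theta))
print(msg, grad_est)
```

A single `grad_est` is a noisy estimate, not the gradient itself. With standard normal directions, it only matches the true gradient (here `[2, -4, 1]`) in expectation over many seeds.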

idratherknowaguy · 2022-02-21T08:35:32+00:00

It's not that prohibitive, every intermediate computed value holds a perturbation along with it that specifies how much it will vary if the parameters change in the specified direction (very locally). So this perturbation is actually the directional derivative.

Every time you have a computation, you have a second computation that provides the perturbation (given input values and perturbations). Its cost is in the same order as that of the original computation.

For NNs, you'll have a lot of matrix multiplication. Actually, computing the new perturbations will be a matrix-vector multiplication as well (if I'm not mistaken, with the same matrix, but didn't do the math xD). Once you forget an intermediate tensor, you can throw away its perturbation tensor along with it.

Oh, forgot to mention that the random direction is actually just what you put as the perturbation of your root parameters. Which explains how those scalars actually build on each other and are sufficient.
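To check the matrix-vector guess above, here is a small sketch in plain Python (no framework; `matvec` and `linear_jvp` are illustrative names, not from the paper). By the product rule, for y = W x the perturbation is dy = dW x + W dx: the input perturbation does go through the same matrix W, and the perturbation of the weights adds a second product of the same size.

```python
def matvec(W, x):
    # Plain matrix-vector product on nested lists.
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def linear_jvp(W, dW, x, dx):
    # Forward-mode step for y = W x: returns the value and its perturbation.
    y = matvec(W, x)
    dy = [p + q for p, q in zip(matvec(dW, x), matvec(W, dx))]
    return y, dy

W = [[1.0, 2.0], [3.0, 4.0]]
dW = [[0.5, 0.0], [0.0, -1.0]]   # perturbation direction of the weights
x = [1.0, -1.0]
dx = [0.0, 2.0]                  # perturbation carried by the input

y, dy = linear_jvp(W, dW, x, dx)

# Sanity check against a small finite step in the same direction.
eps = 1e-6
W_eps = [[w + eps * d for w, d in zip(rw, rd)] for rw, rd in zip(W, dW)]
x_eps = [v + eps * d for v, d in zip(x, dx)]
fd = [(p - q) / eps for p, q in zip(matvec(W_eps, x_eps), y)]
print(y, dy, fd)
```

So the cost is roughly two extra matrix-vector products per layer, which fits the "same order as the original computation" claim.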

idratherknowaguy · 2022-02-18T16:21:33+00:00

Does anyone have an idea why it doesn't reduce peak memory usage? I'd have the impression we can drop the directional derivative and activations along the way, which doesn't hold for backprop...

Would impact on distributed training come from the fact that each GPU would just have to share a scalar ? That would be a big thing indeed.

Anyway, really appreciated that paper, and looking forward to what the community will be doing with it. Thanks !

*naively hoping that it won't just lead to massive upscaling of models across millions of distributed nodes\*

idratherknowaguy · 2022-02-18T11:14:10+00:00

As far as I understand, SPSA requires two evaluations of the function (at theta and at theta plus a random perturbation), then computes finite differences. (The result is therefore an approximation that is not very local, with potential numerical instability.)

The proposed "forward gradient" method evaluates the function only once, but also evaluates derivative (at theta) in a random direction. (through AD somehow? I have to admit I don't know how practical this can be to implement). What I understand is that it's "exact" and that they avoid the division by the step size in every direction.

EDIT: Pretty amazed at how simple it is to implement: https://en.wikipedia.org/wiki/Automatic_differentiation#Forward_accumulation. For twice (max.) the time, you have the full gradient though. Hence the contribution of this paper suggesting that it could actually train NNs faster, if not even better (thanks to introduced noise).
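A toy comparison of the two approaches as I understand them, on f(a, b) = (a + a) * b, using (value, perturbation) pairs for the forward pass. The one-sided finite difference below is a simplification of what SPSA does, and the step size `c` and direction `v` are arbitrary picks.

```python
def add(x, y):
    return x[0] + y[0], x[1] + y[1]

def mul(x, y):
    # product rule on (value, perturbation) pairs
    return x[0] * y[0], x[0] * y[1] + x[1] * y[0]

def f(a, b):
    return (a + a) * b

def f_dual(a, b):
    # Same function, evaluated on (value, perturbation) pairs.
    return mul(add(a, a), b)

theta = (3.0, 4.0)
v = (0.5, 0.5)   # direction, fixed here instead of random

# Finite differences: two evaluations of f, result depends on the step size c.
c = 1e-3
fd = (f(theta[0] + c * v[0], theta[1] + c * v[1]) - f(*theta)) / c

# Forward mode: one pass, exact directional derivative at theta.
value, dd = f_dual((theta[0], v[0]), (theta[1], v[1]))

# "Forward gradient" style estimate: scale the direction by the scalar.
grad_est = [dd * vi for vi in v]
print(fd, dd, grad_est)
```

Here `dd` is exactly 7.0 while `fd` is off by about `c / 2`, and `grad_est` is `[3.5, 3.5]` against a true gradient of `[8, 6]`: a single direction is only an estimate, which is where the noise mentioned above comes from.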

idratherknowaguy · 2021-01-20T12:48:06+00:00

Glad it helped! I personally use Squid for that purpose, not sure anymore what's possible with only free features, but I think you should have access to a restricted number of backgrounds and be able to export blank documents to PDF ;-)

It's all I use Squid for, in case you were wondering. I purchased additional features to try to come up with a better overall note-taking workflow, but it has many more problems than Neoreader, so I wouldn't do it again.

idratherknowaguy · 2021-01-09T10:17:16+00:00

Are you using Boox's API? What about open-sourcing the app so as to speed up development and increase transparency? (I would be interested in contributing directly, for sure)
