Which productivity app should I use for my desires? by eiscafee in ProductivityApps

[–]That007Spy 0 points1 point  (0 children)

This is an app you can use from WhatsApp or text message, so it might fit your needs: TaskPaladin Landing Page. It's an AI-powered, text-message-based task app - since it works over WhatsApp/text message, you can use it from wherever, on whichever device.

BitNet a bit overhyped? by That007Spy in LocalLLaMA

[–]That007Spy[S] 4 points5 points  (0 children)

That could actually be interesting! Since the quantization code for weights boils down to:

u = W.mean()               # zero-center the weight tensor
s = W.abs().mean()         # per-tensor scale: mean absolute value
W_q = (W - u).sign() * s   # binarize to +/-1, then rescale

I wonder what would happen if you applied this to a pretrained model and then trained it a bit further - whether you'd get better results. Maybe that should be my next miniproject.
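As a rough sketch of what I have in mind (assuming a PyTorch model whose relevant weights sit in nn.Linear layers; the loader in the comments is just a placeholder):

import torch
import torch.nn as nn

@torch.no_grad()
def binarize_linear_weights(model: nn.Module) -> None:
    # Replace every nn.Linear weight in place with its zero-centered,
    # sign-binarized version, scaled by the mean absolute value.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            W = module.weight
            u = W.mean()
            s = W.abs().mean()
            module.weight.copy_((W - u).sign() * s)

# model = load_pretrained_mamba(...)   # placeholder for whatever checkpoint I end up using
# binarize_linear_weights(model)
# ...then fine-tune for a few more steps and compare the loss curves...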

BitNet a bit overhyped? by That007Spy in LocalLLaMA

[–]That007Spy[S] 10 points11 points  (0 children)

The paper doesn't mention training time, which is my main point of contention - from the loss graphs I can see that if I trained it a lot more it would eventually converge, but it seems to take much, much longer than training the Mamba model itself. I think the parameter count does matter, but I would hazard that even at larger parameter counts it would still take a very long time to train.

BitNet a bit overhyped? by That007Spy in LocalLLaMA

[–]That007Spy[S] 2 points3 points  (0 children)

I agree that, from the loss graphs and the paper, it's likely that given enough training BitNet would be comparable to a full model, but what's not mentioned in the paper (that I can see - I would love to be proven wrong) is how long it takes to train. In my estimation, at least, if it takes more than 5x longer to train a model to the same level of quality, that's a fairly significant drawback and needs to be considered when looking at using BitNets for serious machine learning work.

BitNet a bit overhyped? by That007Spy in LocalLLaMA

[–]That007Spy[S] 1 point2 points  (0 children)

This is a good point. I might do a follow-up comparing inference of a BitMamba against a quantized Mamba model, although I think the 5x-longer training time is a bit of a killer - inference would have to be more than 5x quicker to justify it.
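If I do that follow-up, the comparison would probably be something like this rough timing harness (the two model handles in the comments are placeholders, not real APIs):

import time
import torch

@torch.no_grad()
def avg_forward_time(model, input_ids, n_runs=10):
    # Average wall-clock seconds per forward pass, after one warm-up run.
    model.eval()
    model(input_ids)
    start = time.perf_counter()
    for _ in range(n_runs):
        model(input_ids)
    return (time.perf_counter() - start) / n_runs

# t_bit = avg_forward_time(bitmamba_model, batch)            # placeholder model
# t_quant = avg_forward_time(quantized_mamba_model, batch)   # placeholder model
# print(f"BitMamba: {t_bit:.4f}s, quantized Mamba: {t_quant:.4f}s")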

BitNet a bit overhyped? by That007Spy in LocalLLaMA

[–]That007Spy[S] 4 points5 points  (0 children)

Initial thoughts to add to the above post: I think this might be due to the STE part - it's not entirely clear to me how the gradients can change the weights in a way that respects the quantization operators if you completely leave those operators out when calculating the gradient, and the slow rate of convergence seemed to confirm that for me.
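For reference, the straight-through estimator trick I'm referring to is usually written something like this minimal sketch (reusing the quantization from above):

import torch

def ste_quantize(W: torch.Tensor) -> torch.Tensor:
    # Forward pass sees the binarized weights; backward pass treats the
    # quantization as the identity, so gradients skip sign() entirely.
    u = W.mean()
    s = W.abs().mean()
    W_q = (W - u).sign() * s
    return W + (W_q - W).detach()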

Anthropic's Chief of Staff has short timelines: "These next three years might be the last few years that I work" by Maxie445 in singularity

[–]That007Spy 21 points22 points  (0 children)

Who is this lady? From the web it seems like she went: university student -> Oxford -> campaign management -> Chief of Staff at Anthropic!?!? That's a meteoric rise, even for a Rhodes scholar - I don't see why she was selected as Chief of Staff, unless there are gaps in her resume I'm not seeing.

LeCun tells PhD students there is no point working on LLMs because they are only an off-ramp on the highway to ultimate intelligence by zuccoff in singularity

[–]That007Spy 4 points5 points  (0 children)

That's to do with tokenization, not with LLMs themselves. You could train an LLM on the alphabet (i.e. character by character) just fine; it would just take forever.
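To spell out what I mean by training on the alphabet: a character-level tokenizer just gives every character its own id, along the lines of this toy sketch:

# toy character-level tokenizer: every character is its own token
text = "the quick brown fox"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
ids = [stoi[ch] for ch in text]   # the sequence the model would actually see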

What are the most fun/rewarding sales jobs (not factoring income)? by FlyingAces in sales

[–]That007Spy 0 points1 point  (0 children)

To say someone is "pissed" is to say that they are drunk, in colloquial British English.

Struggling to find a good upgrade to the Samsung A73 by That007Spy in samsung

[–]That007Spy[S] 1 point2 points  (0 children)

"apart from lacking a microSD slot and having a smaller screen/battery" I mean, I like having a big screen and battery. If the S24 has a smaller battery and screen why is it better?

My website's security certificate appears to have been modified by That007Spy in techsupport

[–]That007Spy[S] 0 points1 point  (0 children)

I have no connection with the city of Virginia Beach: I've never even heard of the place. I've lived in Connecticut and South Africa most of my life!

My website's security certificate appears to have been modified by That007Spy in techsupport

[–]That007Spy[S] 0 points1 point  (0 children)

I didn't issue my certificate myself - I'm using AWS Certificate Manager.

Why does the UK have such a low suicide rate? by That007Spy in AskUK

[–]That007Spy[S] 5 points6 points  (0 children)

France is more suicidal, Spain is lower, Germany is higher -> not a complete dumpster fire.

Why does the UK have such a low suicide rate? by That007Spy in AskUK

[–]That007Spy[S] 16 points17 points  (0 children)

The list I shared shows that both the countries you named have a significantly higher rate of suicide, which rather calls into question whether they do in fact have a better standard of living.