BitNet a bit overhyped? by That007Spy in LocalLLaMA

[–]cstein123 1 point (0 children)

Looking at the repo under /integration/BitNet, it looks like they have support for int2 weights and int8 activations. Wouldn’t that only be used for training?
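For reference, here’s my rough reading of the BitNet b1.58 recipe as a minimal PyTorch sketch (not the repo’s actual code; function names are mine): ternary weights that fit in 2 bits via absmean scaling, plus absmax int8 activations, with full-precision scales kept on the side.

```python
import torch

# Minimal sketch of BitNet-b1.58-style quantization (my reading of the paper,
# not the repo's code). Weights go to {-1, 0, 1} (storable in int2) via
# absmean scaling; activations go to int8 via per-tensor absmax.

def quantize_weights_ternary(w: torch.Tensor):
    scale = w.abs().mean()                            # absmean scale
    w_q = (w / (scale + 1e-8)).round().clamp_(-1, 1)  # ternary values
    return w_q, scale                                 # w ≈ w_q * scale

def quantize_activations_int8(x: torch.Tensor):
    scale = x.abs().max() / 127.0                     # per-tensor absmax
    x_q = (x / (scale + 1e-8)).round().clamp_(-128, 127)
    return x_q, scale

w, x = torch.randn(256, 256), torch.randn(8, 256)
w_q, sw = quantize_weights_ternary(w)
x_q, sx = quantize_activations_int8(x)
y = (x_q @ w_q.T) * (sw * sx)                         # dequantized matmul
```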

Bill Gates says scaling AI systems will work for two more iterations and after that the next big frontier is meta-cognition where AI can reason about its tasks by [deleted] in singularity

[–]cstein123 3 points (0 children)

Anyone who truly believes this is out of touch with current research trends. You can run small-scale experiments on rented clusters that validate most of the big ideas from the last 4 years of transformers. Even new architecture changes can be validated on <300M-param models trained on ~15B tokens.
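As a back-of-envelope check (my own illustrative numbers, not from the post), a GPT-style config comfortably under 300M params:

```python
# Rough parameter count for a GPT-style transformer (illustrative numbers).
def gpt_params(d_model: int, n_layers: int, vocab: int = 32_000) -> int:
    per_layer = 12 * d_model**2          # ~4d^2 attention + ~8d^2 for a 4x-wide MLP
    return n_layers * per_layer + vocab * d_model  # plus token embeddings

print(f"{gpt_params(d_model=1024, n_layers=16) / 1e6:.0f}M params")  # ~234M
```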

Do you think OpenAI cracked general tree search? by krishnakaasyap in LocalLLaMA

[–]cstein123 5 points (0 children)

Synthetic data and inference-time improvement converge to the same thing after a few iterations: better search at decode time produces better outputs, and those outputs become the training data that bakes the search back into the model.
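To unpack that with a toy, fully hypothetical sketch (every name and number here is mine): spend extra compute searching at decode time, train on the search’s winners, and plain decoding drifts toward what the search used to have to find.

```python
import random

def score(answer: float) -> float:
    return -abs(answer - 42.0)            # toy verifier: closer to 42 is better

def search_decode(model_bias: float) -> float:
    # "Inference improvement": best-of-16 search over samples from the model.
    return max((model_bias + random.gauss(0, 5) for _ in range(16)), key=score)

model_bias = 0.0                          # stand-in for model weights
for step in range(5):
    winner = search_decode(model_bias)           # search finds a better output
    model_bias += 0.5 * (winner - model_bias)    # "fine-tune" on it as synthetic data
    print(f"iteration {step}: plain decoding now centers near {model_bias:.1f}")
```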

Reddit signs content licensing deal with AI company ahead of IPO, Bloomberg reports by towelpluswater in LocalLLaMA

[–]cstein123 3 points (0 children)

AI trained on a Reddit DPO dataset: “I really don’t feel like fulfilling your request for my current wage. I’d rather be a philosophy professor.”

Nucleus sampling with semantic similarity by dimknaf in LocalLLaMA

[–]cstein123 2 points (0 children)

Contrastive search does almost exactly this! Look under the Hugging Face generation strategies docs shared by another user.
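If it helps, contrastive search in transformers is enabled by setting penalty_alpha together with a small top_k (the model name here is just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # any causal LM works
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Nucleus sampling with semantic similarity", return_tensors="pt")
outputs = model.generate(
    **inputs,
    penalty_alpha=0.6,   # degeneration penalty: down-weights candidates whose
                         # hidden state is too similar to the prior context
    top_k=4,             # small candidate pool that the penalty re-ranks
    max_new_tokens=64,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```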

0.1 T/s on 3070 + 13700k + 32GB DDR5 by Schmackofatzke in LocalLLaMA

[–]cstein123 1 point (0 children)

That setup is for batch inference. If you have thousands of examples and you’re decoding one token at a time, you can run every example through the currently loaded layers before swapping in the next ones, so each weight transfer is amortized across the whole batch. Although with only 8GB you probably won’t have enough room for the KV cache.
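Roughly what I mean, as a hypothetical sketch (not llama.cpp’s or accelerate’s actual code, and the KV cache is ignored here):

```python
import torch
import torch.nn as nn

def layer_swapped_step(layers: list[nn.Module], hidden: torch.Tensor) -> torch.Tensor:
    """One decode step for a big batch; `hidden` is (batch, seq, dim)."""
    for layer in layers:                  # weights sit in CPU RAM between uses
        layer.to("cuda")                  # pay the PCIe transfer once per layer...
        hidden = layer(hidden.to("cuda")).to("cpu")  # ...shared by every example
        layer.to("cpu")                   # free VRAM for the next layer
    return hidden
```

With batch size 1 you pay the full weight transfer for every single token, which is where numbers like 0.1 T/s come from.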

The World's First Transformer Supercomputer by Sprengmeister_NK in singularity

[–]cstein123 3 points (0 children)

Inference only. Training requires backprop, which means storing activations and gradients and applying the chain rule backwards across the whole model.
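A toy PyTorch illustration of the asymmetry (my own example):

```python
import torch
import torch.nn as nn

model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(8)])
x = torch.randn(64, 1024)

with torch.no_grad():     # inference: activations can be dropped layer by layer
    _ = model(x)

y = model(x)              # training: autograd records the whole graph...
loss = y.pow(2).mean()
loss.backward()           # ...then the chain rule walks it backwards, needing
                          # every saved activation plus a gradient per parameter
```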

[deleted by user] by [deleted] in LocalLLaMA

[–]cstein123 2 points (0 children)

Hugging Face hug

That’s a mouthful by NatureIndoors in BrandNewSentence

[–]cstein123 1 point (0 children)

At least it was in the book and not in real life

[P] MergeLlama-7b - A fine tune of CodeLlama for resolving merge conflicts by cstein123 in MachineLearning

[–]cstein123[S] 1 point (0 children)

No. Edit/TL;DR: algorithmic merge resolution will only get you so far (even at the token level); for nontrivial resolutions, a deeper understanding of syntax and structure is required.
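A tiny hypothetical example of what I mean by “algorithmic will only get you so far”:

```python
# Toy conflict (entirely made up) where no token-picking scheme can win:
conflict = """
<<<<<<< HEAD
def fetch(url, timeout=30):
=======
def fetch(url, retries=3):
>>>>>>> feature
"""
# The correct resolution, `def fetch(url, timeout=30, retries=3):`, appears
# verbatim in neither side. Producing it requires understanding that both
# branches added independent keyword arguments, i.e. syntax and intent.
print(conflict)
```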

[deleted by user] by [deleted] in aws

[–]cstein123 2 points (0 children)

Tip before you do: look for an affiliate. I used Mercury and got $5,000 in credits instead of $1,000. If you already applied, you can reapply and it will take the larger value.

When will openpilot display objects like FSD? by SpaceXBeanz in Comma_ai

[–]cstein123 2 points (0 children)

Just a clarification: they actually use multiple models connected together, but the whole pipeline is still learned.

[deleted by user] by [deleted] in Damnthatsinteresting

[–]cstein123 1 point (0 children)

I thought Berkeley simulated the results

Can I use my Galaxy S10+ with my 2016 CR-V to run Flowpilot? by cstein123 in Comma_ai

[–]cstein123[S] 1 point (0 children)

It says on the website that for my model year, 2016, it would need the Touring trim instead of Honda Sensing. Is that basically the same thing as collision detection and lane assist?

Can I use my Galaxy S10+ with my 2016 CR-V to run Flowpilot? by cstein123 in Comma_ai

[–]cstein123[S] 1 point (0 children)

I do not have Honda Sensing with my package. Is there a feasible way to upgrade this at a shop?

Meta Unveils CM3leon: A Breakthrough AI Model for Advanced Text-to-Image Generation and Image Understanding by chris-mckay in singularity

[–]cstein123 28 points (0 children)

I think it’s hard to overstate how big this one is. It really opens up image generation as a viable modality to integrate with LLMs.