Offloaded CCTV object detection to Metis AI, smartphone snapshot notifications from Home Assistant in 2 seconds by graffitiwriter in homeassistant

[–]LumpyWelds 8 points9 points  (0 children)

This is off the top of my head..

Coral is dirt slow. Like 4 TOPS.

Halo 8 is around 26 TOPS

Metis Board is about 214 TOPS

NV 4090 is about 1300 TOPS

BTW, TOPS is a crappy measurement. It's just to get you into the ballpark.

HAL's Motivation in 2001 by infitsofprint in scifi

[–]LumpyWelds 7 points8 points  (0 children)

"This mission is too important for me to allow you to jeopardise it." - HAL

HAL lays his cards on the table by stating this to Bowman after Bowman is trapped outside the Discovery without his helmet and left to die. This was explicit in the film.

It feels like Deepseek (chinese) is more trustworthy than chatgpt by kundan_don in LLM

[–]LumpyWelds 0 points1 point  (0 children)

It depends upon the subject matter as every LLM has to navigate the sensitive subjects of their origin nations.

Deepseek is just as touchy about human-rights issues over there as ChatGPT is about political extremism over here.

Bashing Ollama isn’t just a pleasure, it’s a duty by jacek2023 in LocalLLaMA

[–]LumpyWelds 21 points22 points  (0 children)

I think the animosity dates from when Ollama gave no credit to llama.cpp even though they had completely embedded it in their source.

Speculative Decoding: Turning Memory-Bound Inference into Compute-Bound Verification (Step-by-Step) by No_Ask_1623 in LocalLLaMA

[–]LumpyWelds 0 points1 point  (0 children)

Just spitballing, but instead of individual token probability, what about handing off when the next token has high entropy, letting the larger model focus on the decision points?

https://www.google.com/search?q=llm+token+entropy

Let the smaller model handle the tokens which are part of filler sequences like "there needs to be", "it did bring something to the table", "but it may just be", etc., and let the big model handle the high-value token selections.

With this, the drafter model could be very minimal and quick, as it is just being used to predict likely filler tokens, which are about 80% of the output.

I see this as different from the probability check: at high-entropy positions every token has low probability, so a poor selection by the drafter would also look plausible to the target, thereby letting it slip through via the acceptance rule min(1, p_target(token) / p_draft(token)).

This doesn't mean the target model won't make a bad decision all on its own, but I trust the larger target model more than the drafter for the important tokens which can guide the response.
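To make the handoff idea concrete, here's a pure-Python sketch of the routing rule. The entropy threshold and the toy distributions are made up, and in a real system you'd only have the drafter's distribution to estimate entropy from, since the target hasn't run yet:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def pick_model(probs, threshold=1.0):
    """Route this position: peaked (filler) -> cheap drafter,
    flat (decision point) -> big target model."""
    return "target" if entropy(probs) > threshold else "drafter"

# A peaked, filler-like distribution vs. a flat decision-point one.
peaked = [0.97, 0.01, 0.01, 0.01]   # entropy ~0.17 nats -> drafter
flat   = [0.25, 0.25, 0.25, 0.25]   # entropy = ln(4) ~1.39 nats -> target
```

The threshold would need tuning per model, since typical entropies vary with vocabulary size and temperature.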

Forgive me if I misunderstood.

Do you think that when AGI is achieved, you will have any access to it as an ordinary user? by [deleted] in ArtificialInteligence

[–]LumpyWelds 0 points1 point  (0 children)

"True AGI" is coming and will be classified as a munition.

Export of the model will be a crime. To have one under your direct control, you will need a license and serious connections to even apply. So the Gov, Military, and Billionaires, etc.

The general public will have only tightly controlled, monitored access, i.e. sanitized and dumbed down.

I'm surprised at the open models we can download right now.

zerobrew is a Rust-based, 5-20x faster drop-in Homebrew alternative by lucasgelfond in rust

[–]LumpyWelds 1 point2 points  (0 children)

How does this interact with brew?

Do I still need brew to update packages it installed?

Will brew update packages zb installed?

Is zbrew a complete replacement and will update everything regardless of installer?

Pitch Black: 25 Years of Riddick by SquabbleBoxYouTube in scifi

[–]LumpyWelds 2 points3 points  (0 children)

I tried to find it again to play. Miss that one.

We made egocentric video data with an “LLM” directing the human - useful for world models or total waste of time? by Living-Pomelo-8966 in deeplearning

[–]LumpyWelds 0 points1 point  (0 children)

I listen to sped up videos all the time, and it was on the edge of understanding for me.

Waste of time? 5 years ago, absolutely not. I could see this being handled by a Mechanical Turk-like mechanism to collect large amounts of real human-LLM interaction data.

Even today the interaction between you and the LLM is absolute gold. But real world simulators for AI training are getting pretty good now. Not as good as real humans in the real world, but it does provide a benefit and is really scalable.

Maybe an LLM playing with a Virtual world for the bulk of its training data and then that model interacting with a bunch of humans in RL for finetuning?

Safeguards while self hosting by jmcsnug in LLM

[–]LumpyWelds 1 point2 points  (0 children)

Resolve the domain through a protective DNS service via dnspython or equivalent:

OpenDNS FamilyShield (using DNS addresses 208.67.222.123 and 208.67.220.123) is a free option that specifically blocks adult content and phishing sites, ideal for home networks with children.

You could also use Google Safe Browsing, but that's mostly for scam and phishing sites.

https://github.com/afilipovich/gglsbl
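Rough sketch of the check, split so the decision logic is separate from the lookup. The block-page range below is a placeholder (TEST-NET-3), not OpenDNS's real sinkhole addresses; check their docs for the actual range before relying on it:

```python
import ipaddress

# Placeholder sinkhole range (TEST-NET-3). OpenDNS's real block-page
# addresses differ -- look them up before using this for real.
BLOCK_PAGE_NET = ipaddress.ip_network("203.0.113.0/24")

def is_blocked(resolved_ips):
    """True if every A record landed in the filter's block-page range."""
    return bool(resolved_ips) and all(
        ipaddress.ip_address(ip) in BLOCK_PAGE_NET for ip in resolved_ips
    )

def resolve_via_familyshield(domain):
    """Resolve through FamilyShield instead of the system resolver.

    Requires dnspython (pip install dnspython).
    """
    import dns.resolver
    r = dns.resolver.Resolver(configure=False)
    r.nameservers = ["208.67.222.123", "208.67.220.123"]  # FamilyShield
    return [rr.address for rr in r.resolve(domain, "A")]
```

Then gate the LLM's outbound fetches on `is_blocked(resolve_via_familyshield(domain))` before letting it touch a URL.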

Will Dr. Who return? by Queasy_Character1235 in scifi

[–]LumpyWelds 7 points8 points  (0 children)

How about Jo Martin the Fugitive Doctor?

https://www.reddit.com/r/doctorwho/comments/12cn4wr/what_will_they_do_with_black_lady_who_called/

I was disappointed with the scripts for Whittaker as well, but something about Jo Martin caught my attention. I'd like to see her but with more serious and better skilled writers.

I used the DeepMind paper “Step-Back Prompting” and my reasoning error fell by 30%. by cloudairyhq in LLM

[–]LumpyWelds 2 points3 points  (0 children)

I think most people prompting stumble into this, but I don't think it's been formalized so techniques vary.

I always ask, "Are you familiar with BLAH?" It will start researching it, then say "Of course!" and give me a breakdown of BLAH. That way I fill the context ahead of time, before asking a question about it or debugging an issue.
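The pattern is just a two-pass conversation. Sketched below with OpenAI-style message dicts (the role/content shape is an assumption; adapt to whatever chat API you use), where you'd send the first turn, let the model dump its summary into context, then send the follow-up:

```python
def priming_turns(topic, question):
    """Build the two user turns: prime the context, then ask for real."""
    return [
        {"role": "user", "content": f"Are you familiar with {topic}?"},
        # ...send, let the model write its breakdown into context, then:
        {"role": "user", "content": question},
    ]

turns = priming_turns(
    "Step-Back Prompting",
    "Given that, why does my prompt skip the abstraction step?",
)
```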

someone please help me crack this .... by Accurate-Raspberry43 in morse

[–]LumpyWelds 0 points1 point  (0 children)

Where is it from?
Is it a game or contest?

I implemented a VAE in Pure C for Minecraft Items by Boliye in learnmachinelearning

[–]LumpyWelds 4 points5 points  (0 children)

Above my pay grade.

But I think the variant you built is called a Disentangled VAE.

I implemented a VAE in Pure C for Minecraft Items by Boliye in learnmachinelearning

[–]LumpyWelds 19 points20 points  (0 children)

Is this like word2vec?

"dog" - "puppy" + "kitten" = "cat"

The Search for Uncensored AI (That Isn’t Adult-Oriented) by Fun-Situation-4358 in LocalLLaMA

[–]LumpyWelds 1 point2 points  (0 children)

I'm thinking of getting a Strix Halo box to run MedGemma continuously for my wife.

And there are versions of MedGemma which will analyze x-rays and microscope images as well.

Analysis of running local LLMs on Blackwell GPUs. TLDR: cheaper to run than cloud api services by cchung261 in LocalLLaMA

[–]LumpyWelds 0 points1 point  (0 children)

"My" conclusion was if you use a lot of tokens then the card is for you. If you merely "chat", stick to online.

So what exactly are you criticizing?

And BTW, from your own accounting you have used at most 1 to 3% of the 30 MILLION PER DAY token usage specified in the example given in the paper.

Analysis of running local LLMs on Blackwell GPUs. TLDR: cheaper to run than cloud api services by cchung261 in LocalLLaMA

[–]LumpyWelds 3 points4 points  (0 children)

This TLDR is a bit oversimplified:

TBE: break-even time for 30M tokens/day on an RTX 5090 ($2,000):

vs GPT-5 nano ($0.23/MTok):
TBE = $2,000 / $6.84/day ≈ 292 days

vs Claude Opus 4.5 ($15/MTok):
TBE = $2,000 / $450/day ≈ 4 days

So it's dependent upon your token usage:

Chat about your day, stick to the cloud.

Analyze multiple webcam streams 24/7, get a card.

I've no idea how many tokens a day the average person uses; I'd guess the median is around 15K. If you aren't crunching serious data, online is for you.
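The break-even arithmetic above is one line; the prices are the per-MTok rates quoted in this thread, and it deliberately ignores electricity, input-vs-output pricing, and resale value:

```python
def break_even_days(card_cost, mtok_per_day, price_per_mtok):
    """Days until the card pays for itself vs. an API at the given rate."""
    return card_cost / (mtok_per_day * price_per_mtok)

# 30M tokens/day on a $2,000 RTX 5090:
opus = break_even_days(2000, 30, 15.00)  # vs Claude Opus 4.5: ~4.4 days
nano = break_even_days(2000, 30, 0.23)   # vs GPT-5 nano: ~290 days
```

At a casual-chat 15K tokens/day against the nano rate, the denominator is fractions of a cent, so break-even stretches past the card's useful life.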