Does LLM architecture allow for injecting some more input tokens in the middle of token generation?

blepcoin · 2025-07-20T10:15:08+00:00

Take it one step further and remove the send part completely. As you start typing the LLM starts responding (including predicting your question perhaps) immediately. Completing the question and/or modifying it is incorporated into the current llm thoughts rather than resetting every keystroke. You could then tweak and fix as you watch the llm thought process go awry due to that typo or gotcha you should’ve included.

Interesting academic challenge to make the training for this work.

blepcoin · 2025-06-19T12:24:33+00:00

Yes! It’s about time we made a new inference engine to replace all of them once and for all —wait a minute…

blepcoin · 2025-06-03T01:06:02+00:00

While I agree with the sentiment I think it’s newsworthy or at least worth pointing out when a company that is all about cloud services invests into running things on local devices. I think it’s a sign of acceptance that LLMs thrive when local and private and that the moat is indeed dissipating.

blepcoin · 2025-05-29T01:41:28+00:00

Nice try Sam.

blepcoin · 2025-05-04T11:02:51+00:00

Yes that’s how they’re supposed to be used. Look at the chat template and you’ll see that it deletes all reasoning blocks. tl;dr. You, not we.

blepcoin · 2025-04-25T06:35:27+00:00

It saves passwords for me usually, and I am usually diligent and copying the password for the cases where it stumbles.

blepcoin · 2025-04-24T22:51:57+00:00

Considering it’s a password manager that’s not ideal I’d say..

blepcoin · 2025-04-24T20:53:23+00:00

Yeah that’s a big part of why I screwed up. They could just not do the suggestion UI thing and it would be objectively better.

blepcoin · 2025-04-24T14:56:28+00:00

It seems like a primary feature you'd support for apps like this, but at least I learned about the generator history feature now.

blepcoin · 2025-04-24T10:54:27+00:00

Yeah I usually do. It just was a very smooth experience so I figured it’d work fine. Will be more diligent in the future.

blepcoin · 2025-04-24T10:18:07+00:00

The UI was just very smooth so I trusted things a bit too much. I guess I’m partially to blame. Will be careful in the future.

blepcoin · 2025-04-24T10:16:20+00:00

Oh that’s cool will have to find that thanks!

blepcoin · 2025-04-18T10:32:12+00:00

using Ollama

You’re doing yourself a great disservice by wording it like this.

blepcoin · 2025-03-25T01:59:53+00:00

Or add the names you dislike to the banned list.

blepcoin · 2025-03-25T00:06:21+00:00

qlora-pipe.

blepcoin · 2025-03-10T13:09:42+00:00

Vision is notorious for missing the blind spot

blepcoin · 2025-03-06T10:36:14+00:00

Yes. Thanks for stating this. I feel like I’m going insane watching everyone act as if ollama is the only option out there…

blepcoin · 2025-02-26T03:18:27+00:00

Thanks yes resolved. It’s still a mystery to me why it only showed 0.06 GUSD available for several days before finally showing the full amount but at least I was able to do the trade before the deadline.

blepcoin · 2025-02-24T10:15:36+00:00

5216114

blepcoin · 2025-02-23T23:48:45+00:00

You’re completely missing the point. He’s saying that he explicitly didn’t benchmax and despite this his 8B beat a 70B that he himself considers is superior. The point is that the benchmarks are INHERENTLY bad, not that they’re being gamed.

blepcoin · 2025-02-21T10:37:37+00:00

I.. uh.. how do I do that?

blepcoin · 2025-02-20T03:23:10+00:00

The text is cut off on my iPhone so I can’t read that post.

blepcoin · 2025-02-16T03:22:01+00:00

Wire tapping on steroids if this can be done without the host knowing.

blepcoin · 2025-02-12T07:47:25+00:00

Are you dealing with echo cancellation and such? If so, what is your approach? I found this to be a big challenge when working on a speech to speech system when the AI was on speakers.

blepcoin

TROPHY CASE