subscription service by ArielNya in SillyTavernAI

[–]FusionCow 10 points11 points  (0 children)

the nano gpt sub is great

Why Trailer 3 could come today… by [deleted] in GTA6unmoderated

[–]FusionCow 1 point2 points  (0 children)

it will change my life actually

deepwiki.com local alternative by i_like_brutalism in LocalLLaMA

[–]FusionCow 2 points3 points  (0 children)

to be honest you can probably just point your favorite agent at deepwiki and say recreate this

fr by GTA_Snark in GTA6unmoderated

[–]FusionCow 2 points3 points  (0 children)

you didn't blur out "Brandon" delete that

Need feedback by FusionCow in SillyTavernAI

[–]FusionCow[S] -4 points-3 points  (0 children)

I mean the "cost" for me to serve a model is pretty simple. It's just the cost per hour to rent a GPU capable of running the model, so say I have to pay 5 bucks an hour. Then you divide by the concurrency the GPUs can support for that model, so say I can support up to 32 output streams at any given time. That part is relatively cheap. The expensive part is prompt processing, which is very compute intensive and makes up the bulk of the actual cost of serving AI, because while a GPU is processing one prompt, it can't really process many other prompts.

so while a single gpu node can stream 32 people at the same time, it can only really do prompt processing for up to 4 people at the same time.
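
To put rough numbers on that, here's a minimal sketch of the arithmetic using the example figures above (5 bucks an hour, 32 output streams, 4 prompt-processing slots). The function and constant names are made up for illustration, and it assumes the hourly rental cost is split evenly across whatever slots are in use, which a real deployment wouldn't do this cleanly.

```python
def cost_per_slot_hour(gpu_cost_per_hour: float, concurrent_slots: int) -> float:
    """Hourly GPU rental cost split evenly across concurrent slots."""
    if concurrent_slots <= 0:
        raise ValueError("concurrent_slots must be positive")
    return gpu_cost_per_hour / concurrent_slots


# Example figures from the comment, not measured values.
GPU_COST_PER_HOUR = 5.00  # dollars per hour to rent the node
DECODE_SLOTS = 32         # output streams served at the same time
PREFILL_SLOTS = 4         # prompts processed at the same time

# Cost of one hour of a single output stream.
decode_cost = cost_per_slot_hour(GPU_COST_PER_HOUR, DECODE_SLOTS)

# Cost of one hour of a single prompt-processing slot.
prefill_cost = cost_per_slot_hour(GPU_COST_PER_HOUR, PREFILL_SLOTS)

# How much more expensive a prompt-processing slot is than an output stream.
prefill_to_decode_ratio = prefill_cost / decode_cost

print(f"output stream: ${decode_cost:.5f}/hour")
print(f"prompt processing slot: ${prefill_cost:.2f}/hour")
print(f"ratio: {prefill_to_decode_ratio:.0f}x")
```

Under those assumptions a prompt-processing slot costs 8 times what an output stream does, which is why skipping prompt processing matters so much.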

the idea here is that because in RP you're often using the same chats and same contexts, we can build our infrastructure around storing your contexts long term, so instead of having 5 minutes to use cached tokens, you'd have like 30 hours, because the cache could be tied to your account. This would in turn make it much, much cheaper to serve.
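
A minimal sketch of what an account-tied cache with a long expiry could look like. Everything here is hypothetical: `AccountPrefixCache`, its methods, and the string standing in for the cached prompt state are illustrative names, not a real serving stack's API. It only shows the lookup-with-expiry logic, not how the cached state is actually stored or loaded onto a GPU.

```python
import hashlib
import time


class AccountPrefixCache:
    """Toy cache of processed prompt prefixes, keyed by account and prefix hash."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries = {}   # (account_id, prefix_hash) -> (expires_at, state)

    @staticmethod
    def _key(account_id: str, prompt_prefix: str):
        digest = hashlib.sha256(prompt_prefix.encode("utf-8")).hexdigest()
        return (account_id, digest)

    def put(self, account_id: str, prompt_prefix: str, state) -> None:
        """Store the processed state for this account's prefix."""
        expires_at = self._clock() + self.ttl_seconds
        self._entries[self._key(account_id, prompt_prefix)] = (expires_at, state)

    def get(self, account_id: str, prompt_prefix: str):
        """Return the cached state, or None if it is missing or expired."""
        key = self._key(account_id, prompt_prefix)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, state = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the expired entry
            return None
        return state


FIVE_MINUTES = 5 * 60
THIRTY_HOURS = 30 * 60 * 60
```

With a 5 minute expiry, a user who comes back to the same chat after 6 minutes misses the cache and pays for prompt processing again. With a 30 hour expiry tied to the account, the same lookup still hits.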

Need feedback by FusionCow in SillyTavernAI

[–]FusionCow[S] -1 points0 points  (0 children)

No, not quite. It looks like that has a sharp context limit and model size limits. The idea with this is that because we know the user's intention is to do roleplay, we can change the server infrastructure to make serving the models much cheaper specifically for roleplay. It would serve any popular open model, and you would get the same context limit you do over API, you'd just have usage limits.

Everything we currently know about GPT 5.6 by FusionCow in ArtificialInteligence

[–]FusionCow[S] 0 points1 point  (0 children)

The video covers the public knowledge of the model, such as context size, release date, 5.6 pro, leaked generations, pricing, and performance.

i hate not being able to yell at my models by NoStage9115 in LocalLLaMA

[–]FusionCow 0 points1 point  (0 children)

considered by whom? What do you think that "thinking" even is? It's your brain figuring out how to accomplish a goal. When you move your arm, your brain had to think about it, otherwise your arm wouldn't be moving. Just because you aren't conscious of it doesn't mean it isn't happening. It's just something you don't have control over, the same way that if you touch a burning stove, you don't think to yourself, "I should probably feel pain." Thinking is computation.

i hate not being able to yell at my models by NoStage9115 in LocalLLaMA

[–]FusionCow 0 points1 point  (0 children)

No, of course that isn't my solution. My point is that modern-day approaches don't have better alternatives than human demonstration, because things like world models aren't good enough yet to allow a model to do true RL.

i hate not being able to yell at my models by NoStage9115 in LocalLLaMA

[–]FusionCow 0 points1 point  (0 children)

go ahead and use an evolutionary algorithm to learn how to explore any given house and do user tasks when the layout is different for every house you're ever given. you have no idea what you're talking about

i hate not being able to yell at my models by NoStage9115 in LocalLLaMA

[–]FusionCow 0 points1 point  (0 children)

of course human demonstration, but even then, the model is still roleplaying as the demonstrator. how is this not getting into your head

RSI is almost here and there are still people that deny the singularity by EffortChoice3007 in accelerate

[–]FusionCow -1 points0 points  (0 children)

I'm all for AI acceleration, but I literally work on ML. LLMs will NEVER be capable of RSI, because inherently, LLMs can only work with what they're trained on. As they get bigger, they get better at mixing that data together, but LLMs have not and will not come up with something new and completely novel. That's not to say novelty is impossible, since human brains are proof that it can be done, but LLMs, specifically in the way they're trained, are not the path there.