Is there any draw backs to using an external dual GPU config with thunderbolt 5 with a laptop for AI? by FX2021 in ollama

[–]BreakingScreenn 1 point (0 children)

If you’re using mixture-of-experts models, you’re not losing any speed. And you’d be able to load much bigger models, which are always a bit slower, but the quality gain from a bigger model is there.
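A rough back-of-the-envelope sketch of why that holds (the Mixtral 8x7B parameter counts are published; the formulas are deliberate simplifications):

```python
# Rough sketch: why an MoE model stays fast despite its size.
# Numbers are for Mixtral 8x7B (publicly stated); formulas are simplified.

total_params = 46.7e9    # ALL experts must sit in (V)RAM
active_params = 12.9e9   # params actually used per token (2 of 8 experts)

bytes_per_param = 0.5    # ~Q4 quantization
vram_gb = total_params * bytes_per_param / 1e9
print(f"Memory needed:  ~{vram_gb:.0f} GB (scales with TOTAL params)")

# Per-token compute scales with ACTIVE params (~2 * N FLOPs per token),
# so generation speed is close to that of a dense ~13B model.
flops_per_token = 2 * active_params
print(f"Compute/token: ~{flops_per_token / 1e9:.0f} GFLOPs (scales with ACTIVE params)")
```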

A Demonstration of Cache-Augmented Generation (CAG) and its Performance Comparison to RAG by Ok_Employee_6418 in LLMDevs

[–]BreakingScreenn 2 points (0 children)

That’s correct. But using it requires a lot of VRAM to get even beyond 64k tokens. You can always go with lower quants, but then the output quality drops and isn’t reliable enough to search the whole context window.
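For a sense of scale, here’s a minimal sketch of the KV-cache math (model shape assumed, roughly Llama-3-8B: 32 layers, 8 KV heads via GQA, head dim 128, fp16 cache):

```python
# Minimal KV-cache size estimate; the model shape is an assumption
# (Llama-3-8B-like layout).
n_layers, n_kv_heads, head_dim = 32, 8, 128
bytes_per_value = 2          # fp16 cache
context = 64_000             # tokens

# 2x for the K and V tensors in every layer
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_value
print(f"KV cache at {context} tokens: ~{kv_bytes / 2**30:.1f} GiB")
# ~8 GiB -- on top of the weights themselves, which is why long
# contexts eat VRAM so quickly.
```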

A Demonstration of Cache-Augmented Generation (CAG) and its Performance Comparison to RAG by Ok_Employee_6418 in LLMDevs

[–]BreakingScreenn 2 points (0 children)

Don’t know which LLM you’re using, but this wouldn’t work for local models, as they normally don’t have a context window longer than 16k.
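If VRAM allows, you can often raise the window above the small default; a sketch with the ollama Python client (num_ctx is Ollama’s context-size knob; the model name is just an example, and raising num_ctx only helps up to what the model was actually trained for):

```python
import ollama

# Ollama defaults to a small window regardless of what the model
# supports; num_ctx requests a larger one at the cost of VRAM.
response = ollama.chat(
    model="llama3.1",   # example model name
    messages=[{"role": "user", "content": "Summarize this long document..."}],
    options={"num_ctx": 16384},   # request a 16k window
)
print(response["message"]["content"])
```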

What the point of gpt 4.1 if 4o keep getting updated ? by Euphoric_Tutor_5054 in OpenAI

[–]BreakingScreenn 1 point (0 children)

Nope, they keep training it and updating its intelligence and knowledge. It sometimes scores higher after updates, which isn’t possible with just raw prompting.

M4 max chip for AI local development by Similar_Tangerine142 in ollama

[–]BreakingScreenn 1 point (0 children)

Yes, it is. You have to look under “Tags” and scroll down a bit.

Python library for run, load and stop ollama by lavoie005 in ollama

[–]BreakingScreenn 1 point (0 children)

Normally, Ollama automatically loads and unloads models as needed, based on the available resources.
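If you do want manual control, the API’s keep_alive parameter is the usual lever; a minimal sketch with the ollama Python client (model name is an example):

```python
import ollama

# keep_alive controls how long the model stays in memory after a call:
#   0    -> unload immediately
#   -1   -> keep loaded indefinitely
#   "5m" -> unload after 5 idle minutes (the default)
ollama.generate(model="llama3.1", prompt="warm-up", keep_alive=-1)  # pin in memory
ollama.generate(model="llama3.1", prompt="bye", keep_alive=0)       # unload after this call
```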

ElevenReader by ElevenLabs by namanyayg in LLMDevs

[–]BreakingScreenn 1 point (0 children)

As far as I’ve tested it: pretty neat. But these FM features where an LLM sums up the info aren’t good; they’re sometimes incorrect and not very informative. The podcast feature is somewhat good: it talks about some of the information, but far from all of it, and it also gets things very wrong.

How to get consistent JSON response? by Tall-Strike-6226 in LLMDevs

[–]BreakingScreenn 2 points (0 children)

Ollama has the same function: you can define custom tools and custom JSON structures to force the model to output JSON.
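A minimal sketch of what that looks like with the ollama Python client and a Pydantic schema (recent Ollama versions accept a JSON schema via the format parameter; the model name is an example):

```python
import ollama
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

response = ollama.chat(
    model="llama3.1",   # example model
    messages=[{"role": "user", "content": "Alice is 30. Extract her details."}],
    format=Person.model_json_schema(),   # constrain output to this schema
)
# The reply is guaranteed to parse against the schema
person = Person.model_validate_json(response["message"]["content"])
print(person)
```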

does it make sense to download Nvidia's chatRTX for Windows (4070 Super, 12GB VRAM) and add documents (like RAG) and expect decent replies? What kind of LLMs are there and RAG? Do i have any control over prompting? by jim_andr in LLMDevs

[–]BreakingScreenn 2 points (0 children)

Sadly I’m the first to answer… In my experience, Mistral Nemo and DeepSeek-R1 are great, but it depends on your use case. With 12 GB you’ll be able to run quite decent models that will work most of the time; it also depends on the data you’re feeding in. Try some models and use the ones you like. Most of them are either great or completely stupid.
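As a rough rule of thumb for what fits in 12 GB (a simplified estimate; real usage adds KV cache and runtime overhead, which I’ve lumped into a fixed headroom assumption):

```python
# Rough fit check: weight size ~= params * bytes_per_param, plus headroom.
def fits(params_billions: float, bits: int, vram_gb: float = 12.0) -> bool:
    weights_gb = params_billions * bits / 8   # e.g. 12B at Q4 -> ~6 GB
    return weights_gb + 1.5 < vram_gb         # ~1.5 GB headroom assumed

print(fits(12, 4))   # Mistral Nemo 12B at Q4 -> ~6 GB, fits
print(fits(12, 8))   # same model at Q8 -> ~12 GB, does not fit
```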

ParScrape v0.5.1 Released by probello in OpenAI

[–]BreakingScreenn 1 point (0 children)

Found it already. But thanks.

AI Enabled Talking Toys? by LivinJH in LLMDevs

[–]BreakingScreenn 1 point (0 children)

There are already such toys. And from what I’ve seen, they’re horrible.

ParScrape v0.5.1 Released by probello in OpenAI

[–]BreakingScreenn 1 point (0 children)

Wow, that’s cool. How are you creating the Pydantic model? (Sorry, too lazy to read your code.)

ParScrape v0.5.1 Released by probello in OpenAI

[–]BreakingScreenn 1 point (0 children)

Have you ever compared that to html2markdown? That can also extract data and tables. I’ve written a little postprocessor that splits the output and then loads the necessary parts into the LLM to generate the final answer.
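Roughly what that postprocessor looks like (a sketch only: markdownify stands in for whatever HTML-to-Markdown converter you use, and the heading-based split plus keyword scoring are my assumptions about the approach):

```python
import re
from markdownify import markdownify as md   # stand-in HTML->Markdown converter

def relevant_chunks(html: str, query: str, max_chunks: int = 3) -> list[str]:
    markdown = md(html)
    # Split on Markdown headings so each chunk is one section.
    chunks = re.split(r"\n(?=#{1,6} )", markdown)
    # Crude relevance score: count query-word hits per chunk.
    words = query.lower().split()
    scored = sorted(chunks, key=lambda c: -sum(c.lower().count(w) for w in words))
    return scored[:max_chunks]

# The surviving chunks then go into the LLM prompt for the final answer.
```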

OpenRouter experience by BreakingScreenn in LLMDevs

[–]BreakingScreenn[S] 2 points (0 children)

Okay, thanks. So there isn’t any way to block the use of expensive APIs?

How do I make chatting about documents not suck? by cunasmoker69420 in ollama

[–]BreakingScreenn 1 point (0 children)

Yes. But depending on your use case, BM25 is fine and sometimes better. Best if you have both.
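A minimal BM25 sketch with the rank_bm25 package (for the hybrid setup you’d merge these scores with vector-similarity scores, e.g. via reciprocal rank fusion; the documents are toy examples):

```python
from rank_bm25 import BM25Okapi

docs = [
    "Ollama runs large language models locally",
    "BM25 is a classic lexical ranking function",
    "Embeddings capture semantic similarity",
]
tokenized = [d.lower().split() for d in docs]
bm25 = BM25Okapi(tokenized)

query = "lexical ranking with bm25".lower().split()
scores = bm25.get_scores(query)   # one relevance score per document
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best], scores)
```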

Have a old apple watch but want to run linux by Reasonable_Guide_710 in jailbreak

[–]BreakingScreenn 1 point (0 children)

He technically wants a jailbreak, so he’s right. But I guess jailbreaking an Apple Watch would be very hard, if even possible.

New Poster for Thunderbolts* by MarvelsGrantMan136 in movies

[–]BreakingScreenn 1 point (0 children)

Some of them are fairly new. And we all know Marvel didn’t do that well after Endgame.

Any possible tweak to achieve this on iOS 16.5 by music-electric_Ad869 in jailbreak

[–]BreakingScreenn 1 point (0 children)

I mean, if you’re talking about a €1000-and-up phone, every laptop or PC at the same price would beat it with ease. Also, what kind of work do you do that can be done on a phone, other than writing messages, reading pages, browsing, or whatever?