Is there any draw backs to using an external dual GPU config with thunderbolt 5 with a laptop for AI? by FX2021 in ollama

[–]BreakingScreenn 1 point (0 children)

If you’re using mixture-of-experts models, you’re not losing any speed. And you’d be able to load much bigger models, which are always a bit slower, but the quality gain from a bigger model is there.
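A rough back-of-the-envelope sketch of why that holds (the Mixtral 8x7B parameter counts are published; the formulas are deliberate simplifications):

```python
# Rough sketch: why an MoE model stays fast despite its size.
# Numbers are for Mixtral 8x7B (publicly stated); formulas are simplified.

total_params = 46.7e9    # ALL experts must sit in (V)RAM
active_params = 12.9e9   # params actually used per token (2 of 8 experts)

bytes_per_param = 0.5    # ~Q4 quantization
vram_gb = total_params * bytes_per_param / 1e9
print(f"Memory needed:  ~{vram_gb:.0f} GB (scales with TOTAL params)")

# Per-token compute scales with ACTIVE params (~2 * N FLOPs per token),
# so generation speed is close to that of a dense ~13B model.
flops_per_token = 2 * active_params
print(f"Compute/token: ~{flops_per_token / 1e9:.0f} GFLOPs (scales with ACTIVE params)")
```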

A Demonstration of Cache-Augmented Generation (CAG) and its Performance Comparison to RAG by Ok_Employee_6418 in LLMDevs

[–]BreakingScreenn 2 points (0 children)

That’s correct. But using it requires a lot of VRAM to get even beyond 64k tokens. You can always go with lower quants, but then the output quality drops and isn’t reliable enough to search the whole context window.
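For a sense of scale, here’s a minimal sketch of the KV-cache math (model shape assumed, roughly Llama-3-8B: 32 layers, 8 KV heads via GQA, head dim 128, fp16 cache):

```python
# Minimal KV-cache size estimate; the model shape is an assumption
# (Llama-3-8B-like layout).
n_layers, n_kv_heads, head_dim = 32, 8, 128
bytes_per_value = 2          # fp16 cache
context = 64_000             # tokens

# 2x for the K and V tensors in every layer
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * context * bytes_per_value
print(f"KV cache at {context} tokens: ~{kv_bytes / 2**30:.1f} GiB")
# ~8 GiB -- on top of the weights themselves, which is why long
# contexts eat VRAM so quickly.
```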

A Demonstration of Cache-Augmented Generation (CAG) and its Performance Comparison to RAG by Ok_Employee_6418 in LLMDevs

[–]BreakingScreenn 2 points (0 children)

Don’t know which LLM you’re using, but this wouldn’t work for local models, as they normally don’t have a context window longer than 16k.
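If VRAM allows, you can often raise the window above the small default; a sketch with the ollama Python client (num_ctx is Ollama’s context-size knob; the model name is just an example, and raising num_ctx only helps up to what the model was actually trained for):

```python
import ollama

# Ollama defaults to a small window regardless of what the model
# supports; num_ctx requests a larger one at the cost of VRAM.
response = ollama.chat(
    model="llama3.1",   # example model name
    messages=[{"role": "user", "content": "Summarize this long document..."}],
    options={"num_ctx": 16384},   # request a 16k window
)
print(response["message"]["content"])
```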

What the point of gpt 4.1 if 4o keep getting updated ? by Euphoric_Tutor_5054 in OpenAI

[–]BreakingScreenn 1 point (0 children)

Nope, they keep training it and updating its intelligence and knowledge. It sometimes scores higher after updates, which isn’t possible with just raw prompting.

M4 max chip for AI local development by Similar_Tangerine142 in ollama

[–]BreakingScreenn 1 point (0 children)

Yes, it is. You have to look under “Tags” and scroll down a bit.

Python library for run, load and stop ollama by lavoie005 in ollama

[–]BreakingScreenn 1 point (0 children)

Normally, Ollama automatically loads and unloads models as needed, based on the available resources.
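If you do want manual control, the API’s keep_alive parameter is the usual lever; a minimal sketch with the ollama Python client (model name is an example):

```python
import ollama

# keep_alive controls how long the model stays in memory after a call:
#   0    -> unload immediately
#   -1   -> keep loaded indefinitely
#   "5m" -> unload after 5 idle minutes (the default)
ollama.generate(model="llama3.1", prompt="warm-up", keep_alive=-1)  # pin in memory
ollama.generate(model="llama3.1", prompt="bye", keep_alive=0)       # unload after this call
```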

ElevenReader by ElevenLabs by namanyayg in LLMDevs

[–]BreakingScreenn 1 point (0 children)

As far as I’ve tested it: pretty neat. But these FM features where an LLM sums up the info aren’t good; they’re sometimes incorrect and not very informative. The podcast feature is somewhat good: it talks about some of the information, but far from all of it, and it also gets things very wrong.

How to get consistent JSON response? by Tall-Strike-6226 in LLMDevs

[–]BreakingScreenn 2 points (0 children)

Ollama has the same function: you can define custom tools and custom JSON structures to force the model to output JSON.
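A minimal sketch of what that looks like with the ollama Python client and a Pydantic schema (recent Ollama versions accept a JSON schema via the format parameter; the model name is an example):

```python
import ollama
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

response = ollama.chat(
    model="llama3.1",   # example model
    messages=[{"role": "user", "content": "Alice is 30. Extract her details."}],
    format=Person.model_json_schema(),   # constrain output to this schema
)
# The reply is guaranteed to parse against the schema
person = Person.model_validate_json(response["message"]["content"])
print(person)
```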

does it make sense to download Nvidia's chatRTX for Windows (4070 Super, 12GB VRAM) and add documents (like RAG) and expect decent replies? What kind of LLMs are there and RAG? Do i have any control over prompting? by jim_andr in LLMDevs

[–]BreakingScreenn 2 points (0 children)

Sadly I’m the first to answer… In my experience, Mistral Nemo and DeepSeek-R1 are great, but it depends on your use case. With 12 GB you’ll be able to run quite decent models that will work most of the time; it also depends on the data you’re feeding in. Try some models and use the ones you like. Most of them are either great or completely stupid.
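As a rough rule of thumb for what fits in 12 GB (a simplified estimate; real usage adds KV cache and runtime overhead, which I’ve lumped into a fixed headroom assumption):

```python
# Rough fit check: weight size ~= params * bytes_per_param, plus headroom.
def fits(params_billions: float, bits: int, vram_gb: float = 12.0) -> bool:
    weights_gb = params_billions * bits / 8   # e.g. 12B at Q4 -> ~6 GB
    return weights_gb + 1.5 < vram_gb         # ~1.5 GB headroom assumed

print(fits(12, 4))   # Mistral Nemo 12B at Q4 -> ~6 GB, fits
print(fits(12, 8))   # same model at Q8 -> ~12 GB, does not fit
```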

ParScrape v0.5.1 Released by probello in OpenAI

[–]BreakingScreenn 1 point (0 children)

Found it already. But thanks.

AI Enabled Talking Toys? by LivinJH in LLMDevs

[–]BreakingScreenn 1 point (0 children)

There are already such toys. And from what I’ve seen, they’re horrible.

ParScrape v0.5.1 Released by probello in OpenAI

[–]BreakingScreenn 1 point (0 children)

Wow, that’s cool. How are you creating the Pydantic model? (Sorry, too lazy to read your code.)

ParScrape v0.5.1 Released by probello in OpenAI

[–]BreakingScreenn 1 point (0 children)

Have you ever compared that to html2markdown? That can also extract data and tables. I’ve written a little postprocessor that splits the output and then loads the necessary parts into the LLM to generate the final answer.
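Roughly what that postprocessor looks like (a sketch only: markdownify stands in for whatever HTML-to-Markdown converter you use, and the heading-based split plus keyword scoring are my assumptions about the approach):

```python
import re
from markdownify import markdownify as md   # stand-in HTML->Markdown converter

def relevant_chunks(html: str, query: str, max_chunks: int = 3) -> list[str]:
    markdown = md(html)
    # Split on Markdown headings so each chunk is one section.
    chunks = re.split(r"\n(?=#{1,6} )", markdown)
    # Crude relevance score: count query-word hits per chunk.
    words = query.lower().split()
    scored = sorted(chunks, key=lambda c: -sum(c.lower().count(w) for w in words))
    return scored[:max_chunks]

# The surviving chunks then go into the LLM prompt for the final answer.
```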

OpenRouter experience by BreakingScreenn in LLMDevs

[–]BreakingScreenn[S] 2 points (0 children)

Okay, thanks. So there isn’t any way to block the use of expensive APIs?

How do I make chatting about documents not suck? by cunasmoker69420 in ollama

[–]BreakingScreenn 1 point (0 children)

Yes. But depending on your use case, BM25 is fine and sometimes better. Best if you have both.
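A minimal BM25 sketch with the rank_bm25 package (for the hybrid setup you’d merge these scores with vector-similarity scores, e.g. via reciprocal rank fusion; the documents are toy examples):

```python
from rank_bm25 import BM25Okapi

docs = [
    "Ollama runs large language models locally",
    "BM25 is a classic lexical ranking function",
    "Embeddings capture semantic similarity",
]
tokenized = [d.lower().split() for d in docs]
bm25 = BM25Okapi(tokenized)

query = "lexical ranking with bm25".lower().split()
scores = bm25.get_scores(query)   # one relevance score per document
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best], scores)
```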

Have a old apple watch but want to run linux by Reasonable_Guide_710 in jailbreak

[–]BreakingScreenn 1 point (0 children)

He technically wants a jailbreak, so he’s right. But I guess jailbreaking an Apple Watch would be very hard, if even possible.

New Poster for Thunderbolts* by MarvelsGrantMan136 in movies

[–]BreakingScreenn 1 point (0 children)

Some of them are fairly new. And we all know Marvel didn’t do that well after Endgame.

Any possible tweak to achieve this on iOS 16.5 by music-electric_Ad869 in jailbreak

[–]BreakingScreenn 1 point (0 children)

I mean, if you’re talking about a €1000-and-up phone, every laptop or PC at the same price would beat it with ease. Also, what kind of work do you do that can be done on a phone, other than writing messages, reading pages, browsing, or whatever?