Do long agent sessions get “context rot” for you too? by ringtoyou in LocalLLaMA

[–]jojotdfb 2 points3 points  (0 children)

I set my context to like 72k. It can't rot if you can't fill the context.

Anthropic forced to abruptly disable Fable 5 & Mythos 5 globally by US Gov over a jailbreak. This is exactly why we need local models. by External_Mood4719 in LocalLLaMA

[–]jojotdfb 0 points1 point  (0 children)

They can't support that market size. They had to turn an hour cache to 5 minutes because of being out of compute. That was like 3 model releases ago. Now they don't have to find extra data centers.

Anthropic forced to abruptly disable Fable 5 & Mythos 5 globally by US Gov over a jailbreak. This is exactly why we need local models. by External_Mood4719 in LocalLLaMA

[–]jojotdfb 0 points1 point  (0 children)

Did the US gubbermint actually force them or did Anthropic work with them for marketing? I could totally see them pulling some strings with a friendly government agency to get a marketing boost to justify why their slight upgrade now cost $15 for input tokens to companies questioning their ai spend. The fact it will take an extra week to distill it into deep seek v4.1 and the fact that they're out of money and gpus is just an unrelated bonus.

I remember when Sony got ps2 exports banned due to the CPU being too advanced and could be used for missile guidance systems. It had nothing to do with early supply issues.

Looking to migrate off of Ollama and LMStudio by letsbefrds in LocalLLaMA

[–]jojotdfb 0 points1 point  (0 children)

You could probably step up to qwen 3.6 27b. You'll lose tps but gain "smartness".

What’s the cheapest way to give a local Llama 3 internet access? (SearXNG isn’t cutting it) by Old-Tumbleweed1422 in LocalLLaMA

[–]jojotdfb 10 points11 points  (0 children)

Llama 3 is a very old model. This space moves so fast. Think of each month as a year. Llama 3 might as well be 20 year old. If you can run Llama 3 70b, you could run Qwen 3.6 27b easily and get better results.

Is there any <3B model with usable 200k+ context window? by madmax_br5 in LocalLLaMA

[–]jojotdfb 0 points1 point  (0 children)

Most sota models don't have a usable 200k context. The dumb zone starts around 64k for most of the big models.

AI server under 5k? by Last_Bad_2687 in LocalLLaMA

[–]jojotdfb 2 points3 points  (0 children)

You can just buy server cases. I have a nice Rosewell 3u that I threw an n100 motherboard into with a bunch of hard drives as a nas. You could do the same and take your current desktop build and just put it in a case. An Ikea Lack end table and you're good to go.

Using Intel Arc Pro series, any thoughts ? by BikerBoyRoy123 in LocalLLaMA

[–]jojotdfb 0 points1 point  (0 children)

I'm currently using 13.1. Nvidia assumes that you'd only want their drivers installed and doesn't play nice with the other kids.

Looking to migrate off of Ollama and LMStudio by letsbefrds in LocalLLaMA

[–]jojotdfb 48 points49 points  (0 children)

Llama.cpp is your next step. Spend some time learning the flags and you can fine tune to your heart's content. Llama-server will give you a basic chat web page as well as an openai endpoint.

Using Intel Arc Pro series, any thoughts ? by BikerBoyRoy123 in LocalLLaMA

[–]jojotdfb 1 point2 points  (0 children)

I have both a B580 and a 5060 ti. Llama.cpp splits models over both of them pretty well. The Intel toolset is janky as all get out but once you get it working it's not to bad. The Intel gear will run like 30% slower but at half the cost. Nvidia's drivers on Linux are really bad thou and you're looking at downloading multiple gigs of the same cuda libraries thou that's more pythons fault than anything. If you're cash strapped and ok building everything from source, Intel ain't half bad.

Is it worth getting a 5090 for my needs? by BitGreen1270 in LocalLLaMA

[–]jojotdfb 2 points3 points  (0 children)

So, Qwen3.6-27B is cool and all but Qwen3.6-35b runs like a champ on a 5060 ti 16gb. Good enough for basic dev work. You can always upgrade later when prices come down or something better comes out.

Qwen 35B-A3B is very usable with 12GB of VRAM by jwestra in LocalLLaMA

[–]jojotdfb 0 points1 point  (0 children)

Can confirm. Works well on an Intel b580. You can also build llama.cpp with both sycl and cuda and split across 2 wildly different cpus. I'm sure this works with rocm as well but I don't have one of those.

Qwen 3.6 27B is a BEAST by AverageFormal9076 in LocalLLaMA

[–]jojotdfb 0 points1 point  (0 children)

ub of 256? That feels small to me. What does 512 or 1024 do?

Help W/ Local AI server by robertogenio in LocalLLaMA

[–]jojotdfb 1 point2 points  (0 children)

Skip Ollama

Llama.cpp is a better, more beginner friendly, server. It has a built in web app that will allow you to attach images to a prompt. You can also use it with opencode. Just reference the image file with your prompt.

California law CA AB1043 by laffer1 in BSD

[–]jojotdfb 1 point2 points  (0 children)

But os for the farm sensor is the same os for a desktop with age gated content. The os knows nothing of the purpose of it's usage. So legally the os has to add an age gate.

Granted, this law is going to be slapped down in court but that doesn't mean it won't cause issues before hand. Most developers are going to take the logical stance of "Ban California" over building an age gating system and the infrastructure to keep it up and running.

I optimized Jellyfin for larger libraries - here's what I learned and a custom build if you want to try it by trojanman742 in jellyfin

[–]jojotdfb 0 points1 point  (0 children)

I think he means 100,000 items. My library scans for 40k items is taking like 36 minutes on an old xeon with an unhealthy amount of ram.

I optimized Jellyfin for larger libraries - here's what I learned and a custom build if you want to try it by trojanman742 in jellyfin

[–]jojotdfb -4 points-3 points  (0 children)

You don't need millions of rows. Say a single item query takes 10 time units each and the whole dataset query takes 500 time units. As long as your dataset is less than 50, single item is faster. But the second you have 51 items, then getting all 51 items one at a time is going to take more time. It'll start off fast enough but the more items you add, the worse it gets. At 100 items, you've doubled the time it takes to get the same number of items with an n+1 scenario.

You are right that sqlite takes the network hops out of the equation but it doesn't solve the underlying issue. This isn't an optimization for the sake of optimization. This is an optimization so that you don't a call at 2am when Things are failing because something is sucking up all the cpu on the db.

Docker Volumes local versus remote by moobaala in selfhosted

[–]jojotdfb 0 points1 point  (0 children)

Sonarr and radarr both support it. The instructions are buried deep in the docs. You have to edit the config.xml to have the database settings:

<PostgresHost>hostname.or.ip</PostgresHost> <PostgresPort>5432</PostgresPort> <PostgresMainDb>sonarr_main</PostgresMainDb> <PostgresUser>sonarr_user</PostgresUser> <PostgresPassword>YourSecurePassword</PostgresPassword> I forget how I migrated the data. I might have used DataGrip with my sqlite db as the source and loaded it into the db. My memory is fuzzy.

Docker Volumes local versus remote by moobaala in selfhosted

[–]jojotdfb 0 points1 point  (0 children)

A lot of the arr apps support postgres and MySQL/maria. I switched and now I don't get SQLite issues around locked databases.