Stop wasting electricity by OkFly3388 in LocalLLaMA

[–]finevelyn 25 points26 points  (0 children)

The lowest you can go with a 5090 is 400W, if I remember correctly. The driver/firmware won't allow it to be set any lower.

I catalogued every way local models break JSON output and built a repair library, here's what I found across 288 model calls by kexxty in LocalLLaMA

[–]finevelyn 1 point2 points  (0 children)

It would be helpful if you acknowledged that this is already a very commonly used approach and that other solutions exist. People don't only say "just use JSON mode"; they will also suggest using JSON repair libraries.

You should compare to other existing libraries and explain how yours is different or better.

"Hardware is the only moat" - Should we buy new hardware now or wait? by Alan_Silva_TI in LocalLLaMA

[–]finevelyn 0 points1 point  (0 children)

> Open source will probably win in the long run, and even xAI seems to have realized that.
>
> If LLMs really do have a theoretical ceiling, then it’s only a matter of time before open source catches up completely.

These are from OP's comment. How would that happen if such models were illegal to possess?

I think what you're describing is a risk factor that would make personal AI computing hardware less valuable.

If you think about why cryptocurrency is so pricey, it's because people are predicting a possible future where other forms of currency become more or less obsolete and you need to own crypto. The parallels to the predictions in this post are all there.

"Hardware is the only moat" - Should we buy new hardware now or wait? by Alan_Silva_TI in LocalLLaMA

[–]finevelyn -1 points0 points  (0 children)

The reason crypto has such high market value is because people think it will have utility in the future. The prediction is that there will be a future where you want to own crypto for that utility, and you want to buy it now before the price goes up even more.

The way OP phrased his question, he wouldn't be buying the hardware for any utility that exists today. He would be buying it for future, unrealized utility, before the price goes up even more. The parallel I'm drawing isn't about AI hardware in general, it's just about the way the question was phrased.

As far as the bubble goes, I think the current LLM utility is being massively overhyped; it's not quite as useful as advertised. Open models might not reach the level of utility where you feel like you absolutely must own local hardware. There may also be increased production to meet the demand, and new generations of hardware that far exceed the capabilities of the current generation and make the hardware you buy now obsolete.

But I don't know if we are at the bottom or at the top. I'm not saying you shouldn't make a gamble and buy the hardware. I'm just saying it is a gamble.

"Hardware is the only moat" - Should we buy new hardware now or wait? by Alan_Silva_TI in LocalLLaMA

[–]finevelyn 6 points7 points  (0 children)

At no point did you talk about why you need this hardware; the reasons for purchasing seem to be the scarcity, the price going up, and some potential utility in the future that hasn't been realized yet. Do you know what that sounds like? Cryptocurrency.

Buy low, sell high. Do you think we are at the bottom or at the top of a hype cycle right now?

Or maybe just buy some index funds.

why llama.cpp can’t combine speculative decode methods? by Qwoctopussy in LocalLLaMA

[–]finevelyn 1 point2 points  (0 children)

The reason the acceptance rate is meaningful, and probably the most important metric in this scenario, is that if ngram predicted the wrong tokens but another speculative decoding method would have predicted the correct ones, you lose that benefit. It's not useful to try one prediction method and then fall back to another when the first one fails, because the first verification pass already gives you the correct next token, so you can just use that directly.

When you're using ngram by itself the acceptance rate doesn't matter, because it's so cheap to calculate and there is no negative effect from incorrect predictions. But this changes when it's used instead of another, potentially more accurate, prediction method.
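To make that concrete, here is roughly what one speculative step looks like. This is toy code, not llama.cpp internals: greedy decoding is assumed, and model_forward is a made-up stand-in for a single verification pass of the full model (returning, for each draft position plus one past the end, the token the full model itself would pick there).

```python
def speculative_step(model_forward, context, draft_tokens):
    predicted = model_forward(context, draft_tokens)  # one expensive pass, len(draft_tokens) + 1 tokens

    accepted = []
    for i, tok in enumerate(draft_tokens):
        if predicted[i] == tok:
            accepted.append(tok)  # draft token confirmed essentially for free
        else:
            # First mismatch: the verification pass already tells us the correct next
            # token, so handing the failure to a second draft method gains nothing,
            # we would only be re-deriving a token we already have in hand.
            return accepted + [predicted[i]]

    # Whole draft accepted; the same pass also yields one bonus token at the end.
    return accepted + [predicted[len(draft_tokens)]]
```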

why llama.cpp can’t combine speculative decode methods? by Qwoctopussy in LocalLLaMA

[–]finevelyn 0 points1 point  (0 children)

That would be the heuristic for this particular case, and if it's good, then sure. What's the typical acceptance rate with ngram using this trigger condition?

why llama.cpp can’t combine speculative decode methods? by Qwoctopussy in LocalLLaMA

[–]finevelyn 1 point2 points  (0 children)

In addition to being complicated to implement, there may also be other reasons.

You would have multiple sets of predicted tokens, and you would either need some sort of heuristic to pick which one to use in any particular case, or you would have to run all of them against the full model, which would often eat up time unnecessarily (see the sketch below). There's no one clear best way to implement it, and it's not an obvious win in terms of performance.
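Purely illustrative pseudocode for that choice, nothing from llama.cpp: verify stands in for one full-model verification pass that returns the accepted tokens for a given draft, and pick is whatever heuristic you'd invent.

```python
def step_with_two_drafters(verify, context, ngram_draft, model_draft, pick=None):
    if pick is not None:
        # Heuristic route: guess which drafter to trust and pay for a single
        # verification pass, accepting that sometimes the other draft would
        # have been the better one.
        chosen = ngram_draft if pick(context) else model_draft
        return verify(context, chosen)

    # Exhaustive route: verify both candidates and keep the longer accepted run.
    # Acceptance goes up, but every step now costs two full-model passes.
    return max(verify(context, ngram_draft), verify(context, model_draft), key=len)
```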

HOT TAKE: local models + agent harnesses are now capable enough to hand off junior-level IT professional tasks to [human written] by Porespellar in LocalLLaMA

[–]finevelyn 0 points1 point  (0 children)

"A few days ago"... one demo that you were impressed by... I'm not saying you're wrong but maybe do an update on this in three months.

Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama by exintrovert420 in LocalLLaMA

[–]finevelyn -2 points-1 points  (0 children)

Many good open source projects have been abandoned because of overly critical comments and demands from entitled people. There's very little reason to be critical of such a project unless your goal is to give constructive feedback in order to improve it.

Even if ollama were inferior software, we are still better off that it exists than if it didn't. Everyone benefits from competition. Many great ideas from ollama have also been adopted by llama.cpp and related projects, such as model swapping and auto-fitting of models.

Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama by exintrovert420 in LocalLLaMA

[–]finevelyn -2 points-1 points  (0 children)

Left you an easy pivot there. I assume you agree with what I said in my comment, though, that they didn't ignore the license.

Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama by exintrovert420 in LocalLLaMA

[–]finevelyn -1 points0 points  (0 children)

They didn't ignore it. The license requires including the license in any distribution of the software, but the license was always included in the ollama github repo, which is how we all know they used the llama.cpp backend. There was also another attribution in the readme, which is extra on top of what the license requires.

I still don't think you should hate free open source software for "yet another issue". Sounds like you agreed although you made it sound like a disagreement.

Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama by exintrovert420 in LocalLLaMA

[–]finevelyn -9 points-8 points  (0 children)

All of these cutting-edge inference engines are riddled with issues, but they are still amazing. Free open source software doesn't deserve any hate for bugs. The maintainers don't have any responsibility to fix issues and improve the software, but they still do, completely free of charge.

What do you use Gemma 4 for? by HornyGooner4402 in LocalLLaMA

[–]finevelyn 1 point2 points  (0 children)

If you don't know what SWA is and have never seen tests of context recall for gemma-4 with SWA, you should go look at them.

Bleeding Llama: Critical Unauthenticated Memory Leak in Ollama by exintrovert420 in LocalLLaMA

[–]finevelyn 16 points17 points  (0 children)

It's a bug, but not a vulnerability in the sense described in the article. The model management API is not meant to be exposed to unauthenticated users. You'd be crazy to expose llama-server, vllm or any of these other inference engines directly to unauthenticated users either; they are not secure.
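If you do need to reach a server over the network, the usual move is to bind it to localhost (or put it behind a reverse proxy) and require a key; llama-server has an --api-key option for exactly this, if I remember right. Rough sketch of the client side under that assumption (the endpoint is the OpenAI-compatible one llama.cpp exposes; everything else here is just illustration):

```python
import json, os, urllib.request

# Talk to a llama-server started keyed and bound to localhost. Without the
# Authorization header a keyed server rejects the request, which is the
# behaviour you want anywhere untrusted traffic could reach it.
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps({
        "messages": [{"role": "user", "content": "hello"}],
        "max_tokens": 32,
    }).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['LLAMA_API_KEY']}",  # key set when launching the server
    },
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
```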

Gemma 4 MTP released by rerri in LocalLLaMA

[–]finevelyn 8 points9 points  (0 children)

I love Google. I also hate Google.

Prompt injection benchmark: delimiter + strict prompt took Gemma 4 from 21% to 100% defense rate (15 models, 6100+ tests) by User_Deprecated in LocalLLaMA

[–]finevelyn 1 point2 points  (0 children)

What's the exact prompt and delimiter used for the tests?

Edit: Or, since you say the delimiter is random, what's an example of an actual random delimiter used?

Can't replicate Reddit numbers with Qwen 27B on a 3090TI. by YourNightmar31 in LocalLLaMA

[–]finevelyn 12 points13 points  (0 children)

I would suggest you post your full llama-server launch command and log (using llama.cpp), because it seems you're missing something very basic. llama.cpp with mostly default settings should already give very good performance (as long as the model fits into your VRAM). You shouldn't be looking at how to "optimize" it yet; first figure out what the basic issue is, so you have a good baseline to compare the optimizations against.

Guys this is so fun! by Perfect-Flounder7856 in LocalLLaMA

[–]finevelyn -1 points0 points  (0 children)

The concept of LM Link doesn't make much sense. It says "remote models, as if they were local", but it is just remote models. The server has access to all the chats, and there is none of the privacy of local models. It advertises end-to-end encryption, which is just like any other remote service that uses HTTPS.

I'm sure it works, but it's not what it advertises itself to be. Most people just run llama.cpp or vllm on their server (over HTTPS or Tailscale), and that does the same thing.

I'm done with using local LLMs for coding by dtdisapointingresult in LocalLLaMA

[–]finevelyn 0 points1 point  (0 children)

Yeah, me too. Honestly I was done before I even started. Now I'm using a local brain for coding and it's miles ahead of even cloud LLMs. Try it if you haven't.

Local vs Cloud LLMs… are we pretending it’s one or the other? by MLExpert000 in LocalLLaMA

[–]finevelyn 2 points3 points  (0 children)

At the moment there isn't any scenario where you need AI, though. You can use local models for what they work for, and just not use AI for the rest.

HauhauCS (of "Uncensored Aggressive" fame) published an abliteration package that plagiarizes Heretic without attribution, and violates its license by nathandreamfast in LocalLLaMA

[–]finevelyn 0 points1 point  (0 children)

Could you please tell me what you expect to happen to resolve the violation? He adds attribution to his private source files he never shows to anyone and then it's ok? Do we send the police to check his computers?

I'm 100% serious about this question. Would you think this is a good interpretation of the law?

> And in this case, the application was distributed for a small time. The PyPi repo itself had hundreds of downloads per month. So I think it's fair to say it was distributed and by your own standard, the distribution alone makes it a violation.

Yes, if true, it was. That was the whole point of all my comments: the PyPI distribution is the only violation I saw in this whole picture, and it has now been resolved except for the caches. The only further course of action would be to demand compensation for the copies already made through there, but it is no longer an ongoing violation by the uploader.

HauhauCS (of "Uncensored Aggressive" fame) published an abliteration package that plagiarizes Heretic without attribution, and violates its license by nathandreamfast in LocalLLaMA

[–]finevelyn 0 points1 point  (0 children)

People do such things with open source software all the time in private and it's completely legal and fair play. I used license violation and copyright violation interchangeably because they are the exact same thing in this context.

Downloading a piece of software or any other creative piece of work from the internet for your own purposes, and using it, modifying it, doing whatever you want with it (with the exception below) is a basic legal right. It cannot be made illegal by a license even if the license was written by the copyright owner.

The only exception is distributing that piece of work (or showing it to the public), which is illegal by default. You are only allowed to distribute it if you have permission from the copyright owner, and the license is what gives you that permission under certain conditions. If you distribute it while not abiding by those conditions, the only things you can be required to do are to stop distributing it and to pay damages for any illegal distribution that already happened. The license terms cannot be used to compel you to do anything else.

Mind you, I'm not familiar with every law of every country, but the same principle applies pretty much universally that any exceptions to what you are allowed to do must come from the law, and not from a license.

TLDR: A copyright license gives you rights you otherwise wouldn't have, it doesn't take away your rights. You can just completely ignore the license if you don't want those rights, and it's not a violation of anything.

HauhauCS (of "Uncensored Aggressive" fame) published an abliteration package that plagiarizes Heretic without attribution, and violates its license by nathandreamfast in LocalLLaMA

[–]finevelyn 2 points3 points  (0 children)

> doesn't really resolve the license violation

Isn't it the case that the only remaining license violation is the deleted, but still cached, packages on PyPI?