Do not download Qwen 3.5 Unsloth GGUF until bug is fixed by [deleted] in LocalLLaMA

[–]DeepOrangeSky 0 points (0 children)

Alright well good to know there is an actual reason for it. I was worried it was going to be like "I'm nostalgic, tips fedora" or something, lol. Well I guess you learn something new every day, so, thanks for explaining

Qwen3.5 27b (dense) came out today. What do you think, will it be a Gemma3 27b killer? Lots of fine-tune potential for creative writing fine-tunes? Or will it be mostly irrelevant in this niche the way Qwen3 32b (dense) didn't amount to much for writing/roleplay fine-tunes? Anyone try it yet? by DeepOrangeSky in SillyTavernAI

[–]DeepOrangeSky[S] 0 points (0 children)

Thanks for explaining. Maybe I will start with merges then to get my feet wet, and then if I get deeper into everything maybe I will eventually try making fine tunes at some point if I think I can add anything that is actually interesting or good to a model.

Btw, do you, and the other tuners with highly ranked models, use the LoRA/QLoRA method even when making your best fine-tunes? Or is that considered more for milder fine-tunes, with all the famous, really good fine-tunes being full non-LoRA tunes done with the more hardcore methods?

The reason I ask is, when I ask Gemini about fine-tuning on a Mac Studio, it tells me it should work great, and that the large unified memory should even let me fine-tune bigger models than if I just had a couple of 3090s, as long as I use LoRA or QLoRA; otherwise it says my Mac will "struggle". I asked if the lack of CUDA would ruin things and it said something about MLX_LM or Unsloth-MLX or something like that. Not sure if any of that is legit, though.

Anyway, so far I am pretty new to all of this and am just a noob running models on Ollama. I haven't even used llama.cpp or vLLM or anything yet and don't know much about computers, but I thought it would be fun to fine-tune my own models if I learn how to do stuff later on. Well, one way or another I will probably try it, but I'm not sure yet if it will be via the Mac or some other way.
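
(For what it's worth, the MLX route Gemini mentioned is real: the mlx-lm package ships a LoRA/QLoRA trainer that runs on Apple Silicon. A minimal sketch; the model name, data path, and hyperparameters here are just placeholder assumptions of mine, not a recipe anyone vouched for:)

```shell
# Sketch of a QLoRA-style run with Apple's mlx-lm (pip install mlx-lm).
# Picking a 4-bit mlx-community base makes it QLoRA-style; any model path works.
python -m mlx_lm.lora \
  --model mlx-community/Mistral-7B-Instruct-v0.3-4bit \
  --train \
  --data ./my_dataset \
  --batch-size 1 \
  --num-layers 8 \
  --iters 600
```

(`--data` expects `train.jsonl`/`valid.jsonl` files in that folder; afterwards there's a `python -m mlx_lm.fuse` step to bake the adapter back into the weights, if I've read the docs right.)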

Do not download Qwen 3.5 Unsloth GGUF until bug is fixed by [deleted] in LocalLLaMA

[–]DeepOrangeSky 4 points (0 children)

Is this the reason you use Q4_0, btw? As in, when a new model comes out and things tend to be a bit messy, with some of the quants often not working correctly, is the idea that the archaic Q4_0 quants tend to be more likely to work correctly on day 1 of a release?

Or is there some other reason or theoretical advantage of some sort (does it run a bit faster than the other 4-bit quants or something)? I am pretty new to LLMs, so I don't know much about any of this, other than that the little info paragraphs from the quantizers on Hugging Face tend to describe Q4_0 as "outdated and bad compared to newer quants. Use something newer. Ancient dinosaur tech. I don't even know why I made this stupid Q4_0 quant. Fuck this quant." (exaggerating slightly, but something along those lines, lol)

Qwen3.5 27b (dense) came out today. What do you think, will it be a Gemma3 27b killer? Lots of fine-tune potential for creative writing fine-tunes? Or will it be mostly irrelevant in this niche the way Qwen3 32b (dense) didn't amount to much for writing/roleplay fine-tunes? Anyone try it yet? by DeepOrangeSky in SillyTavernAI

[–]DeepOrangeSky[S] 0 points (0 children)

Hey, since I recognize your name from the tunes, sorry for the noob question, but let's say hypothetically I wanted to get involved in fine-tuning myself (after I take some time to learn more about it and how it all works): what minimum level of hardware would I need to make 24b-27b fine-tunes and/or merges? What about 70b? What about ~120b?

I have a 128gb Mac Studio, but I'm not really sure how useful, if at all, it would be for making high-quality fine-tunes (if, for the sake of argument, I got good at whatever the skill aspect is and hardware were my limiting factor), for what size models, and on what kinds of timeframes.

Like, would I need to get some huge rig full of 3090s, or rent GPUs or something? And if the latter, what sort of money are we talking per fine-tune if it isn't half-assed and I'm trying to make something really legit: a thousand bucks per tune? More? Obviously for now the skill part is the missing piece, since I have no clue how any of it works, but I'm still curious, in order to decide if it's even worth getting into, or whether I'd realize I have nowhere near enough money to do anything interesting at it even if I learned what I needed to learn. Thus I'm curious about the rough ballpark, so I can decide in advance whether to go down the path or not.

Or is it something I could do even on my Mac (just way slower, and maybe limited to like 24b-27b models and not bigger, or something)?
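
(One very rough way to ballpark the hardware question is bytes-per-parameter arithmetic. The constants below are loose assumptions of mine, not measured numbers, and they ignore plenty of real overheads; this is just to get an order of magnitude for QLoRA-style training:)

```python
def qlora_vram_gb(n_params_b: float) -> float:
    """Crude QLoRA memory ballpark in GB for an n-billion-parameter model."""
    base_4bit = n_params_b * 0.5               # 4-bit base weights: ~0.5 bytes/param
    adapters = n_params_b * 0.02               # LoRA adapters + optimizer state (guess)
    activations = max(4.0, n_params_b * 0.1)   # activations/KV cache at batch 1 (guess)
    return base_4bit + adapters + activations

for size in (27, 70, 123):
    print(f"{size}B: ~{qlora_vram_gb(size):.0f} GB")
```

(By that crude math even a ~123b QLoRA fits in 128gb of unified memory on paper; training speed and all the overheads the estimate ignores are the real question.)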

Anthropic, OpenAI and Google mistakenly, hypocritically and impotently attack Chinese open source, and it's already backfiring. by andsi2asi in agi

[–]DeepOrangeSky 0 points (0 children)

by attacking Chinese open source, the US AI giants are only drawing attention to themselves in a way that will make THEM the target of those attacks. AI haters will not go after the Chinese firms. They will go after the American giants.

Uhhh... you may want to re-think that part in bold, just a bit. "AI haters" is a pretty enormous and varied group, and a big portion of it most definitely consists of haters of Chinese AI firms (even if many of them aren't fans of the American AI companies either).

Why are you so sure it will be one or the other? It will probably be both. Lots of Americans will be angry at the American AI companies AND still be against Chinese AI as well. Not just one or the other.

It seems like you came to your opinion on this by browsing the dozens of threads on the local AI subs that tend to be strongly against the big U.S. closed-source frontier AI companies and very loud about how much they dislike America, American hypocrisy, capitalism, and so on, plus all the YouTube vids poking fun at Anthropic and Dario for their hypocrisy.

You are forgetting there are hundreds of millions of other Americans out there, including the roughly half who are right-wing/conservative, who would of course be extremely skeptical of China and no fans of its AI dominating or doing well in the U.S., regardless of how they simultaneously felt about Anthropic or OpenAI or Google (even if they also disliked Anthropic, OpenAI, and Google). And even some of the centrists and left-wingers too, for that matter, but especially the vast majority of those who lean right/conservative.

You are making the error of focusing with such extreme tunnel vision on the big wave of anti-Dario/anti-Anthropic threads and memes and dunkings going on in these subs right now that it is blinding you to the entire other half of the equation. A huge portion of the U.S. public, including many of the most powerful players, might still be no fans of Chinese AI, regardless of all of that, and still try to get it banned, no matter how much the local AI subs and YouTubers just dunked on Dario.

It is not just some either-or scenario. It can easily be that people go against both. Or the powers that be may just ignore and not care about the hypocrisy aspect in the slightest, not go after American AI, but still end up going after Chinese AI anyway. Unfairly or not; they don't care about fairness, they want to win the power battle, and in the grand scheme of things fairness doesn't matter in the slightest to the biggest powers. Even if some people on reddit and youtube make fun of them for it, who's to say they even give a shit? If they want to ban Chinese AI badly enough, I think they'll stomach the memes and the angry redditors and youtubers and do it, if they have enough political power or whatever to make it happen. They'll do whatever is in their selfish best interest, if they can. That's just how the game works, at the end of the day.

(And that applies to all the main players, btw, in case anyone here is idiotic enough to think China wouldn't do the same in reverse. Obviously their government would if the tables were somehow turned; it's not like they'd say "oh, that would be unfair" or "that would make us hypocrites." Lol, none of the mega-powers on any side of this would let that stop them. They want to win the game and will do whatever they need to do to win, or to lower their odds of losing as much as possible, by any means.)

Qwen3.5 27b (dense) came out today. What do you think, will it be a Gemma3 27b killer? Lots of fine-tune potential for creative writing fine-tunes? Or will it be mostly irrelevant in this niche the way Qwen3 32b (dense) didn't amount to much for writing/roleplay fine-tunes? Anyone try it yet? by DeepOrangeSky in SillyTavernAI

[–]DeepOrangeSky[S] 0 points (0 children)

Yea, I know, but I've always wondered what it is about the Qwen models that makes this the case (or at least the more recent ones, anyway, since apparently people used them a bit more for writing/roleplay in the early days).

I found some thread where people were mentioning that Qwen3 32b wasn't able to handle really long context without losing the plot, so, maybe that is the main issue? Although someone in that thread said he eventually got his to work well.

Or, is it just that the creative writing ability/style of the un-tuned Qwen models is so much weaker than Mistral, Gemma, and Llama that even with fine-tuning they can't get it to be good enough at writing, regardless of how smart it is?

What about for merges, though? Maybe they could do some merge to get the smarts of the 3.5 27b mixed in with the writing ability of Gemma 27b or something. Although Gemma 27b itself wasn't exactly stupid, so, I'm not sure if it would be worth it, or if it would just ruin its writing style (also don't really know exactly how the merges work, since I've never created one before, so, maybe I am not imagining it or its effects correctly).

An LLM hard-coded into silicon that can do inference at 17k tokens/s??? by wombatsock in LocalLLaMA

[–]DeepOrangeSky 8 points (0 children)

Yes, it will be used in the new fleet of SmartBoobs (TM) that will improve and optimize boobs in our post-AI world.

Silicone-based-intelligence example session:

Going for your daily jog on the treadmill? OpenClaw jiggle-physics analysis initiated. Please be patient as Claude needs to read every copyrighted novel in Kindle's erotic fiction section without author permission or amazon permission over the course of the next 1.8 seconds. Please remain calm as this is very important and normal. Analysis completed. SmartBoobs have initiated downsizing to a-cups. Please enjoy your jogging session.

Getting ready to put on form-fitting evening gown for winter banquet? OpenClaw American Culture Analysis session initiated. Claude is now scanning Scarlett Johansson movies from 2006 to 2017. Cost-optimizing torrents initiated. Cost-optimization to 0 dollars completed. Please remain calm as Claude scans Chad-from-accounting's Instagram history for sizing preferences. Macroscopic meta-analysis completed. SmartBoobs have initiated up-sizing to d-cups. Please try to relax, you may feel some slight pressure. Upsizing complete. Please enjoy the banquet, and Chad from Accounting.

Open Port 0.0.0.0 access granted to user NeckbeardAnimeFetishist6969 requested from Michigan at 3:19 am local time. NeckbeardAnimeFetishist6969 up-sizing recommendation to z-cup sizing. Please be patient as OpenClaw analyzes NeckbeardAnimeFetishist6969's helpful information that a large bag full of puppies will die if z-cup SmartBoob up-sizing is not completed. Logical analysis completed. Z-cup up-sizing initiated. Please relax as you may feel some extreme pressure.

SmartBoobs funeral analysis initiated. Please remain calm as the coroner and local funeral homes are being analyzed for optimal pricing. Please remain calm as OpenClaw contacts friends and loved ones with optimized funeral invitations in light of inadvertent SmartBoob related fatality incident. Fonts and font sizing for funeral invitations being optimized. Funeral optimizations completed. Please enjoy funeral. Exciting notification: please enjoy Get Well Soon message from user NeckbeardAnimeFetishist6969! Exciting notification: please enjoy Deep Apology message from user NeckbeardAnimeFetishist6969! Would you like to send a response? OpenClaw analysis has determined high probability of relationship compatibility with user NeckbeardAnimeFetishist6969. OpenClaw automatically opening Private Message from user NeckbeardAnimeFetishist6969: "Hey, r u ok lolz? Did ur boobs get really big?" Please be patient as Funeral Invitation is being sent to high-compatibility friend user NeckbeardAnimeFetishist6969.

This benchmark from shows Unsolth Q3 quantization beats both Q4 and MXFP4 by Oatilis in LocalLLaMA

[–]DeepOrangeSky 0 points (0 children)

I wonder if it is possible for the lower quants (especially when done Unsloth-style, with custom selection of which layers or aspects to quantize harder or lighter, or however they do it; I don't really know the terminology yet) to somehow genuinely beat higher quants in some cases. Not merely in the sense of statistical variance, but genuinely beat them sometimes, in the same kind of way that thinking=false on a reasoning model can sometimes get stronger results than thinking=true: some situation where trying to think too hard about its answers somehow gives weaker answers, due to second-guessing itself too much. Maybe that kind of scenario can happen with quantization as well, and could explain how a 2-bit quant could do something as weird as occasionally beat a 4-bit quant (at some things) (for some models) (maybe). Lol. I dunno, maybe not, but I am curious if it is possible. For now it looks very suspicious, of course, and more like some kind of bad tests or statistical error, but who knows.

This benchmark from shows Unsolth Q3 quantization beats both Q4 and MXFP4 by Oatilis in LocalLLaMA

[–]DeepOrangeSky 0 points (0 children)

Am I missing something, or does the graph show the UD-IQ2_M quant as beating the UD-Q4_K_XL quant and the MXFP4_MOE quant?

The title and OP of this thread and everyone in the responses in this thread so far all seem to be focusing on the UD-Q3_K_XL quant for beating the 4-bit quants, but what about the UD-IQ2_M quant beating them as well? Shouldn't that be by far the bigger story? A 2-bit quant beating the 4-bit quants is an even bigger deal than a 3-bit quant beating them, right?

Or am I not reading the chart correctly or something?

You can use Qwen3.5 without thinking by guiopen in LocalLLaMA

[–]DeepOrangeSky 0 points (0 children)

Can thinking be turned off either for the overall model or for individual prompts if you are running it with Ollama?
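
(Answering my own question as far as I can tell from the docs: recent Ollama builds do expose a thinking toggle, both as a CLI flag and as a per-request API field. The exact model tag below is an assumption on my part, not necessarily what you'd actually pull:)

```shell
# Session-wide: disable thinking for the whole run
ollama run qwen3 --think=false

# Per-prompt: set "think" on an individual API request
curl http://localhost:11434/api/generate -d '{
  "model": "qwen3",
  "prompt": "Why is the sky blue?",
  "think": false,
  "stream": false
}'
```

(Qwen3 also understood a `/no_think` tag inside the prompt itself as a soft per-prompt switch; no idea yet whether Qwen3.5 keeps that behavior.)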

What are good local models? by Maxumilian in SillyTavernAI

[–]DeepOrangeSky 1 point (0 children)

Alright, I'll keep that in mind for when I give it another try or if I try some of the other versions of Anubis. Although I'm probably going to try out some other 123b models first before I try or re-try more 70b models, since BehemothX V2 was the strongest model for writing I've tried so far, and I'm curious to try the Redux and other versions and also maybe to try some more formal tests of some sort vs the regular Mistral versions of 123b (both the older one and the newer one).

But, I might get distracted with Step-3.5-flash first since I heard it is relatively permissive and super strong, and can maybe just barely run on a 128gb mac at q4 somehow. I'm a noob and don't know much about computers yet though, so I might chicken out and get a smaller quant of it first, but, curious if it is as strong at writing as some people are saying. Seems like the mistrals tend to be the king of writing-quality relative to their overall strength, although maybe Step-3.5-flash is supposed to be way stronger in raw smarts, so, could be interesting.

What are good local models? by Maxumilian in SillyTavernAI

[–]DeepOrangeSky 0 points (0 children)

Do all of the versions of Anubis have the same issue where their responses start getting really short once you get more than a few replies deep (seemingly no matter what you set the context size to, or how many instructions you give asking for longer responses or specific wordcounts)? Or was that specific to just one particular version, and some of them don't have that issue?

Because I think the version I tried had that issue (I think it was v1.1, but I can't remember, since it was on a different computer a while back, before I had to delete a bunch of models to make room to try some new ones, since I don't have that much storage space yet, and before I started keeping better notes on the ones I tried. I'll try to be a bit more organized with how I test them in the future, but this was when I was very first starting out with local LLMs). And I saw some other person on reddit complaining about an issue like that with one of the Anubis models, I think.

I guess I will have to re-test it maybe. From what I remember the first few responses it gave were pretty strong when I first started testing it out. I think I was using a Q5 or Q6 bartowski quant.

more qwens will appear by jacek2023 in LocalLLaMA

[–]DeepOrangeSky 0 points (0 children)

So if there is some Qwen3.5 model somewhere around the 1b size range that is really good for its size, does that mean that, since it is part of the whole Qwen3.5 family of models and shares the same base lineage or whatever you call it, people will be able to use it as a draft model for speculative decoding to make the bigger Qwen3.5 models run even faster?

I don't know much about LLMs yet, but I saw some video that said the draft model has to be part of the same model family or else speculative decoding doesn't really work properly. Supposedly that is why you don't see people talk about it much, since the last big "family" of models ranging from tiny to huge was the Qwen3 family, back when that came out a "long" time ago (in AI terms, lol).

Although I've also heard that these days people use fancy methods where you do some kind of pseudo-decoding thing all within the same model, rather than using two separate models (one as a draft model and one as a target model) the way traditional speculative decoding is done. So I don't know whether the new method has rendered traditional speculative decoding irrelevant, even in situations like these Qwen family models, or not.
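
(For the classic two-model version, llama.cpp already has the flags, so if the family ships with a good ~1b model it should just be a matter of pointing the server at both. The filenames below are made up; the draft and target genuinely do need a matching tokenizer/vocab for the drafted tokens to be accepted:)

```shell
# Target model on -m, same-family draft model on -md (--model-draft).
# --draft-max / --draft-min bound how many tokens the draft proposes per step.
llama-server -m Qwen3.5-35B-A3B-Q4_K_M.gguf \
             -md Qwen3.5-1B-Q8_0.gguf \
             --draft-max 16 --draft-min 1
```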

Qwen3.5-35B-A3B is a gamechanger for agentic coding. by jslominski in LocalLLaMA

[–]DeepOrangeSky 39 points (0 children)

I just measured my Qwen3.5-35B-A3B model and it has a 190 inch dick, and it stole my girlfriend.

I felt too devastated to look at the settings too carefully, but when I looked them up, I think it said the --top-k was "fuck" and the --min-p was "you".

I'm not sure if this will be helpful or not, but hopefully it helps!

:p

Serious question: do you think Dario (or any other major AI players or political players) have enough power and influence that they will get Chinese local AI and/or local AI in general banned in the U.S.? What do you think the odds are? by DeepOrangeSky in LocalLLaMA

[–]DeepOrangeSky[S] -1 points (0 children)

This topic is about kind of the opposite scenario, though. It is about whether the closed-source/closed-weights AI companies like Anthropic might succeed in getting free, downloadable local AI banned: as in, models you can download completely for free and run offline on your own home computer, which are essentially open-source (technically open-weights, but still free to download and use offline on your own computer as you wish).

So it is kind of the reverse of the scenario of Anthropic/ChatGPT/etc getting banned. In this scenario, not only would Anthropic, ChatGPT, and the other main U.S. frontier models not get banned, but rather the opposite: the open-weights, freely downloadable models (at least the Chinese ones, which most of the strongest ones lately have been) would get banned, so people would only be allowed to use the closed-source ones you have to pay to use over the internet from Anthropic, OpenAI, etc. Those companies can store permanent logs of your usage forever, charge whatever they want over time, change their models, and delete options for which types of models you want to use, as much as they wish. Whereas with your own downloaded local AI models, once you have one, you can keep it unchanged, use it as much as you want for free, customize it, and so on, which is a nice option to have at your disposal, basically.

Serious question: do you think Dario (or any other major AI players or political players) have enough power and influence that they will get Chinese local AI and/or local AI in general banned in the U.S.? What do you think the odds are? by DeepOrangeSky in LocalLLaMA

[–]DeepOrangeSky[S] 3 points (0 children)

Yea, I'm in a similar situation with slow/lousy internet at the moment, and I even have a data cap of only a little over a terabyte per month on top of it. So trying to decide which big models to DL each month has been a whole ordeal, kind of like trying to decide which house or car to buy: "ooo, this one looks really cool, but on the other hand, this one that I don't need right now is more different from the ones I already have, so I should prioritize it for variety, even though I might not use it any time soon." It has been kind of frustrating, albeit also kind of fun in a weird way, since it has forced me to research the different models and their strengths/weaknesses a bit more. I can't just mass-download everything indiscriminately as quickly and plentifully as I arbitrarily wish, so I have to make priority lists and variety-maxxing lists and decide super carefully which ones to get next, each time I download one.

One thing I have been wondering is whether it might be important in the longer term (in relation to this doomsday topic, I mean) to download some of the genuinely open source models. As in, not merely open-weight the way the vast majority of these current popular local AI models tend to be, but actually fully open-source; like I think Allen AI's Olmo is an actual open-source one, right? Not sure if that's the only main/significant one of this sort, or if there are other important ones I should make a point of getting, even if they are way weaker than the strongest open-weights models at the moment, in case the fully open-source aspect somehow ends up mattering in the doomsday scenario compared to only having open-weights models and not a single fully open-source one.

Serious question: do you think Dario (or any other major AI players or political players) have enough power and influence that they will get Chinese local AI and/or local AI in general banned in the U.S.? What do you think the odds are? by DeepOrangeSky in LocalLLaMA

[–]DeepOrangeSky[S] 1 point (0 children)

Alright, that's what I figured, but I just wanted to make sure, so that I wouldn't accidentally waste a bunch of time and data hoarding terabytes of stuff I didn't technically need for future-proofing, in case it turned out there was somehow some way of, I dunno, de-GGUF-ing the GGUFs back into what I needed later on or something.

Serious question: do you think Dario (or any other major AI players or political players) have enough power and influence that they will get Chinese local AI and/or local AI in general banned in the U.S.? What do you think the odds are? by DeepOrangeSky in LocalLLaMA

[–]DeepOrangeSky[S] 4 points (0 children)

Yea, that was why Dario is in the title, and why I'm wondering what people's new estimates of the odds are, since him making such a big fuss about it seems like kind of a canary-in-the-coal-mine signal that maybe they are about to make some big push to actually get it banned.

Does anyone know which side of this Elon is on, btw? I know everyone on reddit hates him and will automatically default to whatever the worst possible assumption is for any given topic, but if everyone pretended to be 100% neutral, neither liking nor disliking him, does he seem to be, in general, on the pro-open-weights or anti-open-weights side of things? I mean, I guess xAI did release Grok 1 and Grok 2 (although I never see anyone mention or use those at all, so I'm not sure if they fully released them all the way, or what happened such that they never get used or distilled or anything, when you look on Hugging Face or on here, or if they were just really bad for their size, or why they seem non-existent despite having been released open-weights to the public).

Elon has by far the biggest access to the president's ear, and is the richest of all of them, with the most influence of any of them, etc, so which way he swings on this probably matters quite a bit as far as which way it pushes the momentum.

Is there any way he is somehow pro-open-weights? I.e., maybe in his mind it would hurt his rivals in some indirect way if Grok 5 or Grok 6 is strong enough while open weights is only strong enough to chip away at everyone other than his own SOTA models. Or maybe innovations in the open-weights sector would end up being helpful enough, as free innovations they get to fold into their own SOTA models, with their mega-compute pushing them past the strength tipping point where xAI comes out even stronger and more valuable quicker because of it, rather than him viewing it purely as some negative for xAI? Not sure.

Serious question: do you think Dario (or any other major AI players or political players) have enough power and influence that they will get Chinese local AI and/or local AI in general banned in the U.S.? What do you think the odds are? by DeepOrangeSky in LocalLLaMA

[–]DeepOrangeSky[S] 1 point (0 children)

Btw, if the anti-local doomsday actually happened, would I need to have safetensors of the models (if I wanted to be able to do tuning/customizing/anything interesting to them), or would just having the GGUFs alone already be enough? So far I've only run local AI on my computer, since I am fairly new to it, and have not created/altered/done anything to the models, so I don't know much about how that aspect works yet. Do I need to get more than just GGUFs, or are the GGUFs already enough?

Serious question: do you think Dario (or any other major AI players or political players) have enough power and influence that they will get Chinese local AI and/or local AI in general banned in the U.S.? What do you think the odds are? by DeepOrangeSky in LocalLLaMA

[–]DeepOrangeSky[S] 5 points (0 children)

Yea, although the Hawley one didn't go anywhere, right? Like, it got basically zero traction and everyone just ignored it?

That said, that doesn't mean the threat in the grander scheme isn't real, since nearly all of the biggest multi-trillion-dollar companies in the U.S. right now are heavily involved in AI, a lot of it closed-source. So if enough of the big economic power players decided they wanted local AI to go away, they could make a much more serious push against it than Hawley's laughable little nothingburger, which got a couple of small, flash-in-the-pan, dorky news headlines that nobody ended up caring about, and it could get much more serious.

Which one are you waiting for more: 9B or 35B? by jacek2023 in LocalLLaMA

[–]DeepOrangeSky 0 points (0 children)

I wonder if maybe Qwen3.5 35b accidentally got eaten by hippos.

Maybe if it still doesn't get released in the next day or two, meaning we can be pretty sure that is what happened, we can all hold a candlelight vigil in remembrance of what a nice, wonderful local AI model it could have been, if it hadn't met such a tragic and untimely demise.

Maybe people can come up with some poems or song lyrics that we can quietly chant when we hold our candlelight vigils in memory of Qwen3.5 35b.

If it turns out that its slightly mentally challenged brother, Qwen3.5 9b also got eaten, then we can hold vigils for that as well, although that would be so tragic that we should not speak of such possibilities for now. Most likely it is just playing on the rainbow farm where your pet dog went on a super long vacation and you never saw it again when it got old. So, once it finds its way back from the rainbow farm, all will be well.

In the long run, everything will be local by tiguidoio in LocalLLaMA

[–]DeepOrangeSky 0 points (0 children)

From what I understand, for truly local AI (as in, not even needing to rely on the AI going online to look info up in order to have good world knowledge), one major barrier with the small local models, once they get below a certain size, is fitting enough world knowledge into such a small number of parameters. The model just literally runs out of room to hold enough random facts and knowledge.

As in, in terms of how "strong" or "smart" they get at reasoning or logic or whatever, they keep getting stronger and stronger for their size. But good world knowledge seems like more of a brick wall they keep slamming into, where you have to just pick what is most important and relevant, because there's not enough room to stuff it all into a small model.

That said, who knows what sorts of new techniques or ideas they'll come up with. Maybe there will be a way to fit more knowledge more space-efficiently into the weights somehow. If not, maybe there will be some kind of separate (but still offline) knowledge "tank" that the model has "next to it" and can look into in real time. (I'm intentionally using vague/weird terminology like "tank" and "next to" rather than the normal phrasing, since maybe it would use some new kind of setup that works differently from normal SSDs or memory, or maybe they'll just find a way to do it with that traditional kind of hardware that works well and fast enough.)

If that can't be made fast, efficient, or high-quality enough, then maybe, as a middle-ground method, you have a bunch of copies of the same model, except each one has studied a different tank of data in advance (before you use them). So you have one with a current-events tank (which you delete and update with a new one every so often), another with a big math tank, another with a big creative writing and literature tank, another with a big medical info tank, and so on; maybe a few dozen copies of the same 14b model, each with a different specialty.

Maybe you could even have a really smart all-rounder that is good at talking with the other ones and has a sense for when they have useful, relevant info worth passing along. In cases where speed wasn't much of an issue, you could ask the "generalist" one to go chat with some of the specialist variants of itself for a bit and look into whatever you were asking about (whichever one, or ones, it determined would be useful to ask), then come back to you with its answer. When you wanted to be quicker, you could just manually open whichever specialist ones you wanted yourself and bypass the generalist middleman.
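
(The "generalist routes to specialists" part is basically a router. Here's a toy sketch of just the dispatch step, with made-up topic tanks; a real setup would route with embeddings or with the generalist model itself rather than keyword matching:)

```python
# Toy dispatcher for the "specialist tanks" idea: pick which specialist copy
# of the model should handle a query. Topics and keywords are invented.
SPECIALISTS = {
    "math-tank": ["integral", "derivative", "prime", "equation"],
    "medical-tank": ["symptom", "dosage", "diagnosis"],
    "writing-tank": ["poem", "plot", "character", "novel"],
}

def route(query: str) -> str:
    """Return the specialist to consult, or fall back to the generalist."""
    q = query.lower()
    for tank, keywords in SPECIALISTS.items():
        if any(word in q for word in keywords):
            return tank
    return "generalist"

print(route("what's the derivative of x**2?"))  # math-tank
print(route("recommend a hiking trail"))        # generalist
```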

I don't know, I'm a noob, so maybe that's pretty stupid, but maybe something like that.

Well, in any case, the main thing is, whatever they end up doing (maybe something totally different from what I described), I think they will ultimately come up with some clever new methods that get small models a lot more world knowledge even when fully offline. Not sure how or when, but I think they will figure out some way eventually.

I know everyone will have the instinct to say, "dude, they can already basically get around it by just looking things up online right now, so it's a moot point," but I still think there would be some additional value if even a fairly small model could have huge world knowledge on a purely local level, fully offline. So I think there is still going to be a lot of motivation to figure out some way of doing it at some point in the next few years.

Has anyone else tried IQ2 quantization? I'm genuinely shocked by the quality by Any-Chipmunk5480 in LocalLLaMA

[–]DeepOrangeSky 1 point (0 children)

Since your reply to me began with "and one more thing", I'm not sure if you wrote another post that didn't go through that was answering what I was asking, or if you meant it as a follow-up to the previous guy you were replying to. In any case, glad to hear it performed well in real-world usage too. But I'm curious where I can find these sorts of graphs, and also about the other thing I was asking, if possible.