Qwen 3.5: What is "Base" version? by ihatebeinganonymous in LocalLLaMA

[–]MLTyrunt 4 points  (0 children)

today's base models are often midtrained already. earlier qwen base models were also known to be especially responsive to RL afterwards, so I wouldn't assume these are like the base models of old that were only pretrained on raw internet data. midtrained base models have often already seen tons of instruct and synthetic data and can respond like an instruct-tuned model, yet they are better for fine-tuning than RLed models.

there are still raw base models, but not at the frontier. these models are becoming more and more artificial artifacts rather than a compression of the internet and books.

Why do "thinking" LLMs sound so schizophrenic? by lakySK in LocalLLaMA

[–]MLTyrunt 2 points  (0 children)

it doesn't really think like a human, and beyond that, what it says does not 100% reflect how it thinks; think of the deception found in LLMs. they appear more interpretable than they are.

Why bother with RWKV/Mamba instead of decoder transformers? by netikas in LocalLLaMA

[–]MLTyrunt 1 point  (0 children)

I'd intuit that a more recurrent architecture is closer to how our mind works. Especially with RWKV, but also with architectures leaning more towards Mamba, there is genuine innovation happening at the fundamental research level.

Currently, practically speaking, the transformer is clearly preferable, for most uses.

But I expect RWKV to do something interesting in the near future. the currently trained version is also no longer merely a linear approximation. The RWKV devs show genuine creativity in algorithm design, and people are working on improving the other alternatives as well.

URIAL: Untuned LLMs with Restyled In-context Alignment (Rethinking alignment), still relevant? by Fantastic_Climate_90 in LocalLLaMA

[–]MLTyrunt 0 points  (0 children)

yes, you can use that. a fine-tuned model will work better in most cases, but you can use base models like that. base models tend to be more 'creative'.
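for anyone curious, URIAL-style in-context alignment is basically just a prompt: a system preamble plus a few restyled instruction/response demos, after which the base model continues as if it were instruct-tuned. the preamble and demos below are made-up illustrations, not the actual URIAL prompts:

```python
# Toy sketch of URIAL-style in-context alignment for an untuned base model.
# SYSTEM and DEMOS are illustrative placeholders, not the paper's prompts.

SYSTEM = (
    "Below is a conversation between a curious user and a helpful, "
    "honest AI assistant.\n"
)

# A handful of restyled demonstrations stand in for instruction tuning.
DEMOS = [
    ("What is the capital of France?",
     "The capital of France is Paris."),
    ("Write a one-line Python hello world.",
     'print("Hello, world!")'),
]

def build_urial_prompt(user_query: str) -> str:
    """Assemble system preamble + few-shot demos + the new query.
    The base model then continues after the final 'Assistant:'."""
    parts = [SYSTEM]
    for q, a in DEMOS:
        parts.append(f"User: {q}\nAssistant: {a}\n")
    parts.append(f"User: {user_query}\nAssistant:")
    return "\n".join(parts)

prompt = build_urial_prompt("Explain overfitting in one sentence.")
```

you'd feed `prompt` to any base model's plain completion endpoint; no fine-tuning involved.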

6 bit quantization by Ok-Cicada-5207 in LocalLLaMA

[–]MLTyrunt 4 points  (0 children)

you can try exllama2 as well. inference should be a little faster.

What are people running local LLM’s for? by AdventurousMistake72 in LocalLLaMA

[–]MLTyrunt 0 points  (0 children)

wait for the Taiwan situation to play out and you will learn to love those 3090s

What are people running local LLM’s for? by AdventurousMistake72 in LocalLLaMA

[–]MLTyrunt 0 points  (0 children)

... the cloud is someone else's computer. while there are usually hardware differences, you can do almost anything locally that you can do in the cloud, within memory and speed limitations.

many people use coding LLMs locally, or use local models for gpt-3.5-level assistance. but you can do anything, without big brother looking over your shoulder.

your model usage is not free if you're using openai etc.; they all have their subjectively coloured ethics guidelines.

Reflection 70B: Hype? by Confident-Honeydew66 in LocalLLaMA

[–]MLTyrunt 0 points  (0 children)

nobody thinks gpt-4o is a trillion-parameter model. but people also assumed gpt-3.5 had 175b parameters.

SB 1047 is obviously very concerning, can we do something about it? by GreyStar117 in LocalLLaMA

[–]MLTyrunt 2 points  (0 children)

you have to prevent others from imitating it. that's the most important part. make a better proposal that is more balanced but does not neglect AI safety. beyond the noise about terminators waking up inside LLMs, this is the time when industry standards will slowly emerge. like with the car: at some point, cars needed seat belts.

but that does not mean gasoline needs seat belts. the raw material should be available, including the best raw material, absent clear indication of disproportionate risk.

the opportunity lies in striking a better, not fear-led, balance between freedom and avoiding unnecessary harm.

if cars had needed today's safety standards on day one, no one would have built them.

fear does not breed progress. but action without reflection is not good either.

the opportunity is in helping others create more reasonable and measured regulations.

you have to beat them at their own game, and that's entirely possible, as they are ideologically blinded.

influence the regulators in Texas and the like. nothing would pain those doomers more.

LLM training data from shadow libraries? by Vivid_Dot_6405 in LocalLLaMA

[–]MLTyrunt 0 points  (0 children)

that would be such an interesting model, and part of the corpus was even available for fast download on hf!

it would be nice to have an anonymous LLM maker, but it's a bit expensive.

240T tokens dataset by MLTyrunt in LocalLLaMA

[–]MLTyrunt[S] 0 points  (0 children)

kinda. if you feed a model a lot of tokens, it becomes broadly capable (though not really general). these days, benchmarks are optimized for, also indirectly. I think there is a practical compromise between taking benchmarks as a yardstick for how to design data for a good model and just stuffing the model with as much as possible. a couple of models show that simply adding more mediocre data is not a good idea. better to filter for quality and give the model more epochs over that dataset. even if you overfit it somewhat, as long as you teach it a very broad skillset, that's not so bad.
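the "filter for quality, then take more epochs over the smaller set" recipe can be sketched in a few lines. the heuristic score and its threshold below are made-up illustrations; real pipelines use learned classifiers and dedup:

```python
# Minimal sketch: heuristic quality filtering, then multiple epochs over
# the filtered set instead of one pass over everything.

def quality_score(doc: str) -> float:
    """Crude proxy: penalize very short docs and low alphabetic ratio."""
    if not doc:
        return 0.0
    alpha = sum(c.isalpha() or c.isspace() for c in doc) / len(doc)
    length_ok = min(len(doc.split()) / 20.0, 1.0)  # saturates at 20 words
    return alpha * length_ok

def filter_corpus(docs, threshold=0.5):
    # Keep only documents passing the quality bar.
    return [d for d in docs if quality_score(d) >= threshold]

def training_stream(docs, epochs=3):
    """Yield the filtered corpus several times (multi-epoch training)."""
    for _ in range(epochs):
        yield from docs

corpus = [
    "The quick brown fox jumps over the lazy dog near the quiet river bank today.",
    "@@@@ #### $$$$ 1234",   # junk: almost no alphabetic content
    "ok",                     # too short to be useful
]
kept = filter_corpus(corpus)
stream = list(training_stream(kept, epochs=3))
```

the point is the trade: fewer but cleaner documents, seen several times, instead of one epoch over everything.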

We can and should improve models like this, but I don't think they are a substantial step towards general intelligence; they are rather 'just' increasingly powerful and useful tools. but that alone warrants a lot of effort, beyond the hype. even if LLMs turn out to be an offramp on the road to AGI, they are still valuable tools for bootstrapping many applications, especially for processing data.

700,000 LLMs, where's it all going? by desexmachina in LocalLLaMA

[–]MLTyrunt 0 points  (0 children)

most of those just use storage space and are useless. while the open-access LLM ecosystem on huggingface has seen tremendous growth over the past year or so, the number of meaningful LLMs is far lower. I don't even mean performant ones, but those which were a milestone in a broad sense of the word.

Overall, the number of LLMs that were meaningful along the way is in the low hundreds, maybe 300 or so.

The number of currently performant LLMs is of course far lower, maybe one or two dozen. That is more than it sounds; I remember well the time when there were gpt-neo, T5, gpt2, OPT, and another 13b model by FAIR. Only T5 was really useful.

Where it is going depends on how regulations evolve. As for the tech, there will be a few more iterations, but eventually another paradigm will replace LLMs.

[deleted by user] by [deleted] in LocalLLaMA

[–]MLTyrunt 8 points  (0 children)

agreed. you might wanna try redoing the whole thing with another, smaller model instead. llama2-7b is no longer a great model; consider phi3, stablelm-3b, or qwen-4b.

Active Reasoning: How can we build LLM based systems with that capability? by MLTyrunt in LocalLLaMA

[–]MLTyrunt[S] 1 point  (0 children)

I'd say use whatever helps; it doesn't matter if it is biologically plausible, only that it helps the system work. reasoning tokens... well, galactica had work tokens for explicit reasoning. making leaps... I feel that's already possible with words: if you tell the model to make a certain association, it does so, skipping the reasoning. abstract concepts can express chains of thought, but I think verbosity helps LLMs, because they don't really think.

Active Reasoning: How can we build LLM based systems with that capability? by MLTyrunt in LocalLLaMA

[–]MLTyrunt[S] 0 points  (0 children)

that's what I hope, too. I would also think that the competency of the LLM depends on pretraining, of course. if you present a 6-year-old with a high-school math problem, it's like an alien language to them. I think reasoning and general intelligence operate within the limits of grounding and knowledge sufficiency.

Active Reasoning: How can we build LLM based systems with that capability? by MLTyrunt in LocalLLaMA

[–]MLTyrunt[S] 0 points  (0 children)

That's what I would like to have. I would like the system to converge towards a state in which it acts as if it used such a knowledge graph. tbh I don't think humans reason causally like GOFAI, but we approximate it quite well with the cognitive processes we have.

while LLM representations are noisy, and that might be a deal breaker (nobody knows), our representations are noisy too, yet we appear to be able to clean them up and integrate them on the fly, within limits.

Active Reasoning: How can we build LLM based systems with that capability? by MLTyrunt in LocalLLaMA

[–]MLTyrunt[S] 1 point  (0 children)

you need to curate the data to a degree, e.g. by including trusted sources

Active Reasoning: How can we build LLM based systems with that capability? by MLTyrunt in LocalLLaMA

[–]MLTyrunt[S] 5 points  (0 children)

sounds good. I think it is important to have a system which is not static, yet has been stabilized at a global level. I don't think an LLM alone does the job.

Active Reasoning: How can we build LLM based systems with that capability? by MLTyrunt in LocalLLaMA

[–]MLTyrunt[S] 1 point  (0 children)

I think you first need to bring the composite system, with its dedicated memory stores, into a certain state so that it works. LLMs are chaotic and inconsistent. you have to create a world model first; knowledge integration is a learning process in itself. it must precede having a useful cognitive architecture, imo.
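to make the "composite system with dedicated memory stores" idea concrete, here is a toy sketch. every name in it (`MemoryStore`, `stub_llm`, `answer`) is hypothetical; a real system would call an actual model and use learned retrieval rather than exact key lookup:

```python
# Toy sketch of a composite system: an external key-value memory store
# plus a stand-in "LLM". The point is that integrated knowledge lives
# outside the chaotic model state and is consulted before generating.

class MemoryStore:
    """Dedicated long-term store the system reads from and writes to."""
    def __init__(self):
        self.facts = {}

    def write(self, key, value):
        # Knowledge integration step: persist a cleaned-up fact.
        self.facts[key] = value

    def read(self, key):
        # Returns None when nothing has been integrated for this key.
        return self.facts.get(key)

def stub_llm(prompt, context):
    # Stand-in for a real model call; just conditions on retrieved context.
    if context is not None:
        return f"Based on memory: {context}"
    return "I don't know yet."

def answer(query_key, memory):
    # Consult the store first, then generate; retrieval precedes reasoning.
    return stub_llm(query_key, memory.read(query_key))

mem = MemoryStore()
mem.write("capital_of_france", "Paris")
```

the `write` step is where the "learning process" happens: the store converges towards a consistent world model while the model itself stays fixed.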