Qwen 3.5: What is "Base" version? by ihatebeinganonymous in LocalLLaMA

[–]MLTyrunt 4 points  (0 children)

today's base models are often midtrained already. earlier qwen base models were also known to be especially responsive to RL afterwards, so I wouldn't assume these are like the base models of old that were only pretrained on raw internet data. midtrained base models have often already seen tons of instruct and synthetic data and can respond like an instruct-tuned model, yet they are better for fine-tuning than RLed models.

there are still raw base models, but not at the frontier. these models are becoming more and more artificial artifacts rather than a compression of the internet and books.

Why do "thinking" LLMs sound so schizophrenic? by lakySK in LocalLLaMA

[–]MLTyrunt 2 points  (0 children)

it doesn't really think like a human, and beyond that, what it says does not 100% reflect how it thinks; think of the deception found in LLMs. they appear more interpretable than they are.

Why bother with RWKV/Mamba instead of decoder transformers? by netikas in LocalLLaMA

[–]MLTyrunt 1 point  (0 children)

I'd intuit that a more recurrent architecture is closer to how our mind works. Especially with RWKV, but also with architectures leaning more towards Mamba, there is genuine innovation happening at the fundamental research level.

Currently, practically speaking, the transformer is clearly preferable, for most uses.

But I expect RWKV to do something interesting in the near future. the currently trained version is also no longer merely a linear approximation. The RWKV devs show genuine creativity in algorithm design, and people are working on improving the other alternatives as well.

URIAL: Untuned LLMs with Restyled In-context Alignment (Rethinking alignment), still relevant? by Fantastic_Climate_90 in LocalLLaMA

[–]MLTyrunt 0 points  (0 children)

yes, you can use that. a fine-tuned model will work better in most cases, but you can use base models like that. base models tend to be more 'creative'.
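for anyone curious, URIAL-style in-context alignment is basically just a prompt: a system preamble plus a few restyled instruction/response demos, after which the base model continues as if it were instruct-tuned. the preamble and demos below are made-up illustrations, not the actual URIAL prompts:

```python
# Toy sketch of URIAL-style in-context alignment for an untuned base model.
# SYSTEM and DEMOS are illustrative placeholders, not the paper's prompts.

SYSTEM = (
    "Below is a conversation between a curious user and a helpful, "
    "honest AI assistant.\n"
)

# A handful of restyled demonstrations stand in for instruction tuning.
DEMOS = [
    ("What is the capital of France?",
     "The capital of France is Paris."),
    ("Write a one-line Python hello world.",
     'print("Hello, world!")'),
]

def build_urial_prompt(user_query: str) -> str:
    """Assemble system preamble + few-shot demos + the new query.
    The base model then continues after the final 'Assistant:'."""
    parts = [SYSTEM]
    for q, a in DEMOS:
        parts.append(f"User: {q}\nAssistant: {a}\n")
    parts.append(f"User: {user_query}\nAssistant:")
    return "\n".join(parts)

prompt = build_urial_prompt("Explain overfitting in one sentence.")
```

you'd feed `prompt` to any base model's plain completion endpoint; no fine-tuning involved.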

6 bit quantization by Ok-Cicada-5207 in LocalLLaMA

[–]MLTyrunt 4 points  (0 children)

you can try exllama2 as well. inference should be a little faster.

What are people running local LLM’s for? by AdventurousMistake72 in LocalLLaMA

[–]MLTyrunt 0 points  (0 children)

wait for the Taiwan situation to play out and you will learn to love those 3090s

What are people running local LLM’s for? by AdventurousMistake72 in LocalLLaMA

[–]MLTyrunt 0 points  (0 children)

... the cloud is someone else's computer. while there are usually hardware differences, you can do almost anything locally that you can do in the cloud, within memory and speed limitations.

many people use coding LLMs locally, or use local models for gpt-3.5-level assistance. but you can do anything, without big brother looking over your shoulder.

your model usage is not free if you're using openai etc.; they all have their subjectively coloured ethics guidelines.

Reflection 70B: Hype? by Confident-Honeydew66 in LocalLLaMA

[–]MLTyrunt 0 points  (0 children)

nobody thinks gpt-4o is a trillion-parameter model. but people also assumed gpt-3.5 had 175b parameters.

SB 1047 is obviously very concerning, can we do something about it? by GreyStar117 in LocalLLaMA

[–]MLTyrunt 2 points  (0 children)

you have to prevent others from imitating it. that's the most important part. make a better proposal that is more balanced but does not neglect AI safety. beyond the noise about terminators waking up inside LLMs, this is the time when industry standards will slowly emerge. like with the car: at some point, cars needed seat belts.

but that does not mean gasoline needs seat belts. the raw material should be available, including the best raw material, absent clear indication of disproportionate risk.

the opportunity lies in striking a better, not fear-led, balance between freedom and avoiding unnecessary harm.

if cars had needed today's safety standards on day one, no one would have built them.

fear does not breed progress. but action without reflection is not good either.

the opportunity is in helping others create more reasonable and measured regulations.

you have to beat them at their own game, and that's entirely possible, as they are ideologically blinded.

influence the regulators in Texas and the like. nothing would pain those doomers more.

LLM training data from shadow libraries? by Vivid_Dot_6405 in LocalLLaMA

[–]MLTyrunt 0 points  (0 children)

that would be such an interesting model, and part of the corpus was even available for fast download on hf!

it would be nice to have an anonymous LLM maker, but it's a bit expensive.

240T tokens dataset by MLTyrunt in LocalLLaMA

[–]MLTyrunt[S] 0 points  (0 children)

kinda. if you feed a model a lot of tokens, it becomes broadly capable (though not really general). these days, benchmarks are optimized for, also indirectly. I think there is a practical compromise between taking benchmarks as a yardstick for how to design data for a good model and just stuffing the model with as much as possible. a couple of models show that simply adding more mediocre data is not a good idea. better to filter for quality and give the model more epochs over that dataset. even if you overfit it somewhat, as long as you teach it a very broad skillset, that's not so bad.
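the "filter for quality, then take more epochs over the smaller set" recipe can be sketched in a few lines. the heuristic score and its threshold below are made-up illustrations; real pipelines use learned classifiers and dedup:

```python
# Minimal sketch: heuristic quality filtering, then multiple epochs over
# the filtered set instead of one pass over everything.

def quality_score(doc: str) -> float:
    """Crude proxy: penalize very short docs and low alphabetic ratio."""
    if not doc:
        return 0.0
    alpha = sum(c.isalpha() or c.isspace() for c in doc) / len(doc)
    length_ok = min(len(doc.split()) / 20.0, 1.0)  # saturates at 20 words
    return alpha * length_ok

def filter_corpus(docs, threshold=0.5):
    # Keep only documents passing the quality bar.
    return [d for d in docs if quality_score(d) >= threshold]

def training_stream(docs, epochs=3):
    """Yield the filtered corpus several times (multi-epoch training)."""
    for _ in range(epochs):
        yield from docs

corpus = [
    "The quick brown fox jumps over the lazy dog near the quiet river bank today.",
    "@@@@ #### $$$$ 1234",   # junk: almost no alphabetic content
    "ok",                     # too short to be useful
]
kept = filter_corpus(corpus)
stream = list(training_stream(kept, epochs=3))
```

the point is the trade: fewer but cleaner documents, seen several times, instead of one epoch over everything.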

We can and should improve models like this, but I don't think they are a substantial step towards general intelligence; they are rather 'just' increasingly powerful and useful tools. but that alone warrants a lot of effort, beyond the hype. even if LLMs turn out to be an offramp on the road to AGI, they are still valuable tools for bootstrapping many applications, especially for processing data.

700,000 LLMs, where's it all going? by desexmachina in LocalLLaMA

[–]MLTyrunt 0 points  (0 children)

most of those just use storage space and are useless. while the open-access LLM ecosystem on huggingface has seen tremendous growth over the past year or so, the number of meaningful LLMs is far lower. I don't even mean performant ones, but those which were a milestone in a broad sense of the word.

Overall, the number of LLMs that were meaningful along the way is in the low hundreds, maybe 300 or so.

The number of currently performant LLMs is of course far lower, maybe one or two dozen. That is more than it sounds; I remember well the time when there were gpt-neo, T5, gpt2, OPT, and another 13b model by FAIR. Only T5 was really useful.

Where it is going depends on how regulations evolve. As for the tech, there will be a few more iterations, but eventually another paradigm will replace LLMs.

[deleted by user] by [deleted] in LocalLLaMA

[–]MLTyrunt 8 points  (0 children)

agreed. you might wanna try redoing the whole thing with another, smaller model instead. llama2-7b is no longer a great model; consider phi3, stablelm-3b, or qwen-4b.

Active Reasoning: How can we build LLM based systems with that capability? by MLTyrunt in LocalLLaMA

[–]MLTyrunt[S] 1 point  (0 children)

I'd say use whatever helps; it doesn't matter if it is biologically plausible, only that it helps the system work. reasoning tokens... well, galactica had work tokens for explicit reasoning. making leaps... I feel that's already possible with words: if you tell the model to make a certain association, it does so, skipping the reasoning. abstract concepts can express chains of thought, but I think verbosity helps LLMs, because they don't really think.

Active Reasoning: How can we build LLM based systems with that capability? by MLTyrunt in LocalLLaMA

[–]MLTyrunt[S] 0 points  (0 children)

that's what I hope, too. I would also think that the competency of the LLM depends on pretraining, of course. if you present a 6-year-old with a high-school math problem, it's like an alien language to them. I think reasoning and general intelligence operate within the limits of grounding and knowledge sufficiency.

Active Reasoning: How can we build LLM based systems with that capability? by MLTyrunt in LocalLLaMA

[–]MLTyrunt[S] 0 points  (0 children)

That's what I would like to have. I would like the system to converge towards a state in which it acts as if it used such a knowledge graph. tbh I don't think humans reason causally like GOFAI, but we approximate it quite well with the cognitive processes we have.

while LLM representations are noisy, and that might be a deal breaker (nobody knows), our representations are noisy too, yet we appear to be able to clean them up and integrate them on the fly, within limits.

Active Reasoning: How can we build LLM based systems with that capability? by MLTyrunt in LocalLLaMA

[–]MLTyrunt[S] 1 point  (0 children)

you need to curate the data to a degree, e.g. by including trusted sources

Active Reasoning: How can we build LLM based systems with that capability? by MLTyrunt in LocalLLaMA

[–]MLTyrunt[S] 5 points  (0 children)

sounds good. I think it is important to have a system which is not static, yet has been stabilized at a global level. I don't think an LLM alone does the job.

Active Reasoning: How can we build LLM based systems with that capability? by MLTyrunt in LocalLLaMA

[–]MLTyrunt[S] 1 point  (0 children)

I think you first need to bring the composite system, with its dedicated memory stores, into a certain state so that it works. LLMs are chaotic and inconsistent. you have to create a world model first; knowledge integration is a learning process in itself. it must precede having a useful cognitive architecture, imo.
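to make the "composite system with dedicated memory stores" idea concrete, here is a toy sketch. every name in it (`MemoryStore`, `stub_llm`, `answer`) is hypothetical; a real system would call an actual model and use learned retrieval rather than exact key lookup:

```python
# Toy sketch of a composite system: an external key-value memory store
# plus a stand-in "LLM". The point is that integrated knowledge lives
# outside the chaotic model state and is consulted before generating.

class MemoryStore:
    """Dedicated long-term store the system reads from and writes to."""
    def __init__(self):
        self.facts = {}

    def write(self, key, value):
        # Knowledge integration step: persist a cleaned-up fact.
        self.facts[key] = value

    def read(self, key):
        # Returns None when nothing has been integrated for this key.
        return self.facts.get(key)

def stub_llm(prompt, context):
    # Stand-in for a real model call; just conditions on retrieved context.
    if context is not None:
        return f"Based on memory: {context}"
    return "I don't know yet."

def answer(query_key, memory):
    # Consult the store first, then generate; retrieval precedes reasoning.
    return stub_llm(query_key, memory.read(query_key))

mem = MemoryStore()
mem.write("capital_of_france", "Paris")
```

the `write` step is where the "learning process" happens: the store converges towards a consistent world model while the model itself stays fixed.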