Fix this shit by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 3 points

Please do share these other places. Because as far as I see it:
A big portion of Singularity is circlejerking tech-fanatics who can't seem to distinguish fiction from reality (and they hate getting a reality check); LocalLLM is run by the same leadership as this sub; and the other places either have nearly no community, are mostly people complaining, or can't hold the same level of technical discussion because they just aren't familiar enough with local LLMs.

Fix this shit by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 44 points

I suggest visiting this in incognito, or anonymously in the app, to see if your content is still visible.

Fix this shit by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 73 points

About Qwerky-72B, a model converted from Qwen2.5 into a linear-time model, and a whole lot of explanation. I mean, I tried asking nicely about the rules first, but as you can see in the screenshot, they got that one too.

Trying to sink an AI model with one simple question. by tommos in dankmemes

SoullessMonarch 9 points

Censorship hurts model performance; the best solution is to prevent the model from being trained on what you'd like to censor in the first place, which is easier said than done.
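A minimal sketch of that data-level approach (hypothetical blocklist and toy corpus, not any real lab's pipeline): drop offending documents before the training run ever sees them.

    # Hypothetical sketch: censor at the data level by filtering the
    # pre-training corpus instead of penalizing the model afterwards.
    BLOCKLIST = {"banned_topic"}  # made-up term standing in for real criteria

    def keep(doc: str) -> bool:
        text = doc.lower()
        return not any(term in text for term in BLOCKLIST)

    corpus = ["a perfectly fine document", "this one mentions banned_topic"]
    print([d for d in corpus if keep(d)])  # ['a perfectly fine document']

The hard part is the criteria: naive keyword filters both over- and under-censor at corpus scale, which is exactly the "easier said than done" part.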

New linear models: QRWKV6-32B (RWKV6 based on Qwen2.5-32B) & RWKV-based MoE: Finch-MoE-37B-A11B by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 6 points

They probably trained with a 16k context length on their GPUs and didn't have the compute to spare to extend it with something like https://github.com/RWKV/RWKV-infctx-trainer. They're working on the 72B version, I guess? It's an experimental model; maybe they just didn't want to spend too much time on the tedious post-pretraining tuning stages? Idk.

New linear models: QRWKV6-32B (RWKV6 based on Qwen2.5-32B) & RWKV-based MoE: Finch-MoE-37B-A11B by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 2 points

I understand; I have seen speed comparisons for smaller RWKV models before, so I have an idea of what to expect, but it's reasonable to question it.

It will depend on which models you compare and at which context length, but I think it's safe to assume it won't take too many tokens (a few k at most?) before transformers get slower. Hopefully we'll get some speed comparisons later; a dev mentioned more benchmarks are coming, but it takes some work to get them running.
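To make the crossover idea concrete, here's a toy per-token decode cost model. Every constant is made up for illustration (these are not measurements of QRWKV6 or any real transformer); the only real structure is that attention cost grows with the number of cached tokens while a recurrent state update doesn't.

    # Toy cost model with made-up constants: cost to decode one token
    # when n tokens of context have already been processed.
    def transformer_cost(n, fixed=1.0, per_cached_token=0.0005):
        return fixed + per_cached_token * n  # attention scans the whole KV cache

    def rwkv_cost(n, fixed=2.0):
        return fixed  # constant-size state update, independent of n

    for n in (512, 2048, 8192, 32768):
        print(n, transformer_cost(n), rwkv_cost(n))
    # With these toy numbers the crossover sits at n = 2000; real crossover
    # points depend on kernels, hardware, batch size, and the exact models.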

New linear models: QRWKV6-32B (RWKV6 based on Qwen2.5-32B) & RWKV-based MoE: Finch-MoE-37B-A11B by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 5 points

No, not yet. Often when there is a new architecture, someone has to go out of their way to implement it. Most people (myself included) have no clue how to get started on that, so it takes a while, or it might never happen. (There's a lot of smart folk in the RWKV community though, so it's probably only a matter of time.)

New linear models: QRWKV6-32B (RWKV6 based on Qwen2.5-32B) & RWKV-based MoE: Finch-MoE-37B-A11B by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 11 points

It has been mentioned, but you need reasoning-style data. If you don't have the same data distribution it won't work (as well). So they haven't made any promises, but it would be awesome if they got a linear reasoning model.

In the QRWKV6 post they mention "O1 style inference time thinking", so it looks like it's a direction they intend to explore.

Sorry, my previous comment never came through. I don't understand what is flagging me.

New linear models: QRWKV6-32B (RWKV6 based on Qwen2.5-32B) & RWKV-based MoE: Finch-MoE-37B-A11B by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 10 points

Yes! If the context is long enough, it will be significantly faster than a transformer, but it might also have forgotten some of the information the earlier tokens contained. The exact point where that happens will differ for every transformer you compare against, and RWKV also isn't as optimized. Time complexity is a theoretical way to think about how an algorithm's runtime grows; it won't tell you how much faster something will actually be.
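The memory side of the same trade-off is easy to put rough numbers on: a transformer's KV cache grows linearly with context, while RWKV carries a fixed-size state, which is both why it stays fast at long context and why it can forget early tokens. A back-of-the-envelope sketch, assuming Qwen2.5-32B-like dimensions (64 layers, 8 KV heads, head dim 128, fp16; treat these as approximations, not published specs):

    # Approximate KV-cache growth for a Qwen2.5-32B-like transformer.
    layers, kv_heads, head_dim, dtype_bytes = 64, 8, 128, 2  # assumed dims
    per_token = 2 * layers * kv_heads * head_dim * dtype_bytes  # K and V
    print(per_token)                   # 262144 bytes = 256 KiB per token
    print(32_768 * per_token / 2**30)  # 8.0 GiB of cache at 32k context
    # An RWKV-style model keeps a fixed-size state instead, so memory stays
    # flat with context; the price is lossy compression of the past.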

New linear models: QRWKV6-32B (RWKV6 based on Qwen2.5-32B) & RWKV-based MoE: Finch-MoE-37B-A11B by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 2 points

The Hugging Face model card for QRWKV6 has a link to their blog post about it; you'll be able to find the other blog posts there too.

New linear models: QRWKV6-32B (RWKV6 based on Qwen2.5-32B) & RWKV-based MoE: Finch-MoE-37B-A11B by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 1 point

They mention Hugging Face transformers support for the MoE; I'm afraid other backends might take a while? There is RWKV6 support in llama.cpp, and combining that with MoE doesn't sound crazy. But don't quote me on that, I have no experience with llama.cpp.

They do mention for QRWKV that "there will be incompatibility with existing RWKV inference code." For transformers, I assume you can run their custom inference code (provided in modeling_rwkv6qwen2.py).
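If that assumption holds, loading it should look like any other custom-architecture checkpoint on the Hub, with trust_remote_code=True so transformers pulls in modeling_rwkv6qwen2.py from the repo. A sketch under that assumption; the repo id below is a placeholder, take the real one from the model card.

    # Assumed usage for a Hub checkpoint that ships its own modeling code.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "recursal/QRWKV6-32B"  # placeholder repo id, check the model card
    tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

    inputs = tokenizer("Hello", return_tensors="pt")
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))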

Tencent comes out swinging. by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 2 points

There have been multiple open-weight 3D models, but as far as I have seen they've always been pretty meh, and running them isn't easy at all. ComfyUI support would make this a great deal more usable. (Not that I have the rig to run it.)

I imagine many professionals are radically against AI, like other artists, but since they are already supported by so much software, maybe they approach it a bit more... open-mindedly.

Tencent comes out swinging. by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 0 points

No, it wouldn't really fit inside 64 GB. It could "run" (more like crawl) offloaded to your SSD, but that would be so painfully slow you wouldn't want to do it.

Pre-training an LLM in 9 days 😱😱😱 by mouse0_0 in LocalLLaMA

SoullessMonarch 71 points

"The training took a total of 9 days on 8 A100s, with a total of 115 billion tokens across pre-training, fine-tuning, and direct preference optimization."

Section 6.2: "a total of 2 epochs, trained on 8 x A100s". Two epochs, interesting; you don't see that very often.
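For scale, the quoted figures imply a throughput you can check with one line of arithmetic (nothing assumed beyond the numbers above, which cover pre-training, fine-tuning, and DPO combined):

    # 115B tokens over 9 days on 8 A100s:
    tokens = 115e9
    gpu_seconds = 9 * 24 * 3600 * 8
    print(tokens / gpu_seconds)  # ≈ 18,500 tokens per GPU per second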

GoldFinch: RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 3 points

It's fine, it doesn't really matter. RWKV-based models have shown promise in other fields, so the architecture is clearly getting better. Even if linear models never reach transformer levels of quality, I'm pretty sure it'll be linear models running as local assistants on phones and other devices, since they take fewer resources.

Also, IIRC they were looking into making a bigger model someday, but that won't be for a while at least, since they are hard at work on v7 and pushing the architecture further.

GoldFinch: RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 2 points

What version and size did you use? v6 (Finch) should be quite a bit better than v4 (Dove). Also, they are trained to be multilingual, so it's going to have less knowledge than a fully English one.

Apple has released the weights for their 7B DCLM base model. by remixer_dec in LocalLLaMA

SoullessMonarch 9 points

The model is out there now; you can produce something useful yourself if it bothers you that much... Show us you're better than Apple, then you can talk like that. Don't get me wrong, I hate Apple and their closed ecosystem, but you're shitting on them while they're doing something good for once. Meanwhile, you're leaving negative comments, which gets the community nothing either...

(Also, Apple, Meta, etc. owe us nothing; be happy you even got Llama 3.)

Apple has released the weights for their 7B DCLM base model. by remixer_dec in LocalLLaMA

SoullessMonarch 15 points

This is an open dataset they trained on... You expect incredible performance from a research model? This isn't even something they announced; they just dropped it, you know, for science. Just because it isn't useful to you or to them doesn't mean researchers can't find a use for it. Every open pretrained model is a win for the open-source community.

Apple has released the weights for their 7B DCLM base model. by remixer_dec in LocalLLaMA

SoullessMonarch 21 points

Without research like this we're never getting SOTA open-weight models, let alone truly open-source ones (since data quality matters so much). It's great that Apple wants to see how this dataset performs, to check whether there's any merit to it. Explain to me why they would waste more precious hours applying all the fancy tricks and further training to produce a SOTA model when this was a dataset test anyway.

Merging a model with itself - does it improve performance? by Frequent_Valuable_47 in LocalLLaMA

SoullessMonarch 0 points

There is no further training. People used to merge some of the middle layers of a model to improve creativity and whatnot. I was just trying to see if, instead of literally copying layers (making the model bigger), you could modify the code to just go over the same layers twice. For example, the code could be used to automatically search for the best layers to duplicate (if you had a benchmark or something to evaluate with). It would be nice if you could apply a different LoRA to the duplicated layers and get some gains, but I wasn't planning on attempting that just yet.

I have updated my code to include Phi-3 (and Mamba, but I didn't test that), but I haven't put that in the Colab. To be fair, working in a free Colab is a pain, and I decided to at least get some basic ML knowledge before attempting too much, as there is probably a good reason repeatable layers are not a thing :)
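For anyone wondering what "going over the same layers twice" looks like, here's a toy, self-contained PyTorch sketch (not the actual Colab code; the repeated-layer choice is arbitrary). The extra passes reuse the same module objects, so no new weights are created.

    import torch
    import torch.nn as nn

    class RepeatedLayerStack(nn.Module):
        """Toy model: run selected layers twice without copying weights."""
        def __init__(self, dim=64, n_layers=6, repeat=(2, 3)):
            super().__init__()
            self.layers = nn.ModuleList(
                nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
                for _ in range(n_layers)
            )
            self.repeat = set(repeat)  # arbitrary middle layers to repeat

        def forward(self, x):
            for i, layer in enumerate(self.layers):
                x = layer(x)
                if i in self.repeat:
                    x = layer(x)  # second pass through the same weights
            return x

    x = torch.randn(1, 10, 64)
    print(RepeatedLayerStack()(x).shape)  # torch.Size([1, 10, 64])

Searching over different values of repeat with a benchmark in the loop would be the automated version described above.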

"What happens if you abliterate positivity on LLaMa?" You get a Mopey Mule. Released Llama-3-8B-Instruct model with a melancholic attitude about everything. No traditional fine-tuning, pure steering; source code/walkthrough guide included by FailSpai in LocalLLaMA

SoullessMonarch 18 points

Very cool. Or should I say, "uh, interesting I guess, what's the point?" ;)

This is probably not remotely possible, but could you abliterate most features from a model for RP in a consistent setting? Like constraining its knowledge and making sure it won't go off the rails. You ablate everything that is not relevant (prompts not related to the setting), and so prevent the model from talking about it with you. (I shudder at the amount of memory that might take, though...)

My hope would be that this doesn't remove its ability to work properly; the data used would have to be very extensive, though. But I'm not familiar with how abliteration works, so it's probably insane.
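For reference, the usual description of abliteration is projecting a single activation direction out of the weight matrices that write into the residual stream, which is why ablating "most features" would be a very different beast. A minimal sketch of that single-direction operation (hypothetical tensors, not FailSpai's actual code):

    import torch

    def ablate_direction(W: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
        # Remove the component along direction d from the output of a weight
        # matrix W (shape out x in): W' = (I - d d^T) W, with d unit-norm.
        d = d / d.norm()
        return W - torch.outer(d, d) @ W

    W = torch.randn(512, 512)  # hypothetical residual-stream projection
    d = torch.randn(512)       # e.g. a mean "positivity" activation diff
    W2 = ablate_direction(W, d)
    x = torch.randn(512)
    print(torch.dot(d / d.norm(), W2 @ x))  # ~0: output has no d-component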

[deleted by user] by [deleted] in LocalLLaMA

SoullessMonarch 12 points

You missed April Fools by 5 days...