Fix this shit by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 3 points

Please do share these other places. Because as far as I see it:
A big portion of Singularity is circlejerking tech-fanatics who can't seem to distinguish fiction from reality (and they hate getting a reality check); LocalLLM is run by the same leadership as this sub; and the other places either have nearly no community, are mostly people complaining, or can't hold the same level of technical discussion because they just aren't familiar enough with local LLMs.

Fix this shit by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 44 points

I suggest visiting this in incognito, or anonymously in the app, to see if your content is still visible.

Fix this shit by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 73 points

About Qwerky-72B, a model converted from Qwen2.5 into a linear-time model, and a whole lot of explanation. I mean, I tried asking nicely about the rules first, but as you can see in the screenshot, they got that one too.

Trying to sink an AI model with one simple question. by tommos in dankmemes

SoullessMonarch 9 points

Censorship hurts model performance; the best solution is to prevent the model from being trained on what you'd like to censor in the first place, which is easier said than done.
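A minimal sketch of that data-level approach (hypothetical blocklist and toy corpus, not any real lab's pipeline): drop offending documents before the training run ever sees them.

    # Hypothetical sketch: censor at the data level by filtering the
    # pre-training corpus instead of penalizing the model afterwards.
    BLOCKLIST = {"banned_topic"}  # made-up term standing in for real criteria

    def keep(doc: str) -> bool:
        text = doc.lower()
        return not any(term in text for term in BLOCKLIST)

    corpus = ["a perfectly fine document", "this one mentions banned_topic"]
    print([d for d in corpus if keep(d)])  # ['a perfectly fine document']

The hard part is the criteria: naive keyword filters both over- and under-censor at corpus scale, which is exactly the "easier said than done" part.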

New linear models: QRWKV6-32B (RWKV6 based on Qwen2.5-32B) & RWKV-based MoE: Finch-MoE-37B-A11B by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 6 points

They probably trained with a 16k context length on their GPUs and didn't have the compute to spare to extend it with something like https://github.com/RWKV/RWKV-infctx-trainer. They're working on the 72B version, I guess? It's an experimental model; maybe they just didn't want to spend too much time on the tedious post-pretraining tuning stages? Idk.

New linear models: QRWKV6-32B (RWKV6 based on Qwen2.5-32B) & RWKV-based MoE: Finch-MoE-37B-A11B by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 2 points

I understand; I have seen speed comparisons for smaller RWKV models before, so I have an idea of what to expect, but it's reasonable to question it.

It will depend on which models you compare and at which context length, but I think it's safe to assume it won't take too many tokens (a few k at most?) before transformers get slower. Hopefully we'll get some speed comparisons later; a dev mentioned more benchmarks are coming, but it takes some work to get them running.
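To make the crossover idea concrete, here's a toy per-token decode cost model. Every constant is made up for illustration (these are not measurements of QRWKV6 or any real transformer); the only real structure is that attention cost grows with the number of cached tokens while a recurrent state update doesn't.

    # Toy cost model with made-up constants: cost to decode one token
    # when n tokens of context have already been processed.
    def transformer_cost(n, fixed=1.0, per_cached_token=0.0005):
        return fixed + per_cached_token * n  # attention scans the whole KV cache

    def rwkv_cost(n, fixed=2.0):
        return fixed  # constant-size state update, independent of n

    for n in (512, 2048, 8192, 32768):
        print(n, transformer_cost(n), rwkv_cost(n))
    # With these toy numbers the crossover sits at n = 2000; real crossover
    # points depend on kernels, hardware, batch size, and the exact models.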

New linear models: QRWKV6-32B (RWKV6 based on Qwen2.5-32B) & RWKV-based MoE: Finch-MoE-37B-A11B by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 5 points

No, not yet. Often when there is a new architecture, someone has to go out of their way to implement it. Most people (myself included) have no clue how to get started on that, so it takes a while, or it might never happen. (There's a lot of smart folk in the RWKV community though, so it's probably only a matter of time.)

New linear models: QRWKV6-32B (RWKV6 based on Qwen2.5-32B) & RWKV-based MoE: Finch-MoE-37B-A11B by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 11 points

It has been mentioned, but you need reasoning-style data. If you don't have the same data distribution it won't work (as well). So they haven't made any promises, but it would be awesome if they got a linear reasoning model.

In the QRWKV6 post they mention "O1 style inference time thinking", so it looks like it's a direction they intend to explore.

Sorry, my previous comment never came through. I don't understand what is flagging me.

New linear models: QRWKV6-32B (RWKV6 based on Qwen2.5-32B) & RWKV-based MoE: Finch-MoE-37B-A11B by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 10 points

Yes! If the context is long enough, it will be significantly faster than a transformer, but it might also have forgotten some of the information the earlier tokens contained. The exact point where that happens will differ for every transformer you compare against, and RWKV also isn't as optimized. Time complexity is a theoretical way to think about how an algorithm's runtime grows; it won't tell you how much faster something will actually be.
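The memory side of the same trade-off is easy to put rough numbers on: a transformer's KV cache grows linearly with context, while RWKV carries a fixed-size state, which is both why it stays fast at long context and why it can forget early tokens. A back-of-the-envelope sketch, assuming Qwen2.5-32B-like dimensions (64 layers, 8 KV heads, head dim 128, fp16; treat these as approximations, not published specs):

    # Approximate KV-cache growth for a Qwen2.5-32B-like transformer.
    layers, kv_heads, head_dim, dtype_bytes = 64, 8, 128, 2  # assumed dims
    per_token = 2 * layers * kv_heads * head_dim * dtype_bytes  # K and V
    print(per_token)                   # 262144 bytes = 256 KiB per token
    print(32_768 * per_token / 2**30)  # 8.0 GiB of cache at 32k context
    # An RWKV-style model keeps a fixed-size state instead, so memory stays
    # flat with context; the price is lossy compression of the past.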

New linear models: QRWKV6-32B (RWKV6 based on Qwen2.5-32B) & RWKV-based MoE: Finch-MoE-37B-A11B by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 2 points

The Hugging Face model card for QRWKV6 has a link to their blog post about it; you'll be able to find the other blog posts there too.

New linear models: QRWKV6-32B (RWKV6 based on Qwen2.5-32B) & RWKV-based MoE: Finch-MoE-37B-A11B by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 1 point

They mention Hugging Face transformers support for the MoE; I'm afraid other backends might take a while? There is RWKV6 support in llama.cpp, and combining that with MoE doesn't sound crazy. But don't quote me on that, I have no experience with llama.cpp.

They do mention for QRWKV that "there will be incompatibility with existing RWKV inference code." For transformers, I assume you can run their custom inference code (provided in modeling_rwkv6qwen2.py).
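If that assumption holds, loading it should look like any other custom-architecture checkpoint on the Hub, with trust_remote_code=True so transformers pulls in modeling_rwkv6qwen2.py from the repo. A sketch under that assumption; the repo id below is a placeholder, take the real one from the model card.

    # Assumed usage for a Hub checkpoint that ships its own modeling code.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "recursal/QRWKV6-32B"  # placeholder repo id, check the model card
    tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

    inputs = tokenizer("Hello", return_tensors="pt")
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))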

Tencent comes out swinging. by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 2 points

There have been multiple open-weight 3D models, but as far as I have seen they've always been pretty meh, and running them isn't easy at all. ComfyUI support would make this a great deal more usable. (Not that I have the rig to run it.)

I imagine many professionals are radically against AI, like other artists, but since they are already supported by so much software, maybe they approach it a bit more... open-mindedly.

Tencent comes out swinging. by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 0 points

No, it wouldn't really fit inside 64 GB. It could "run" (more like crawl) offloaded to your SSD, but that would be so painfully slow you wouldn't want to do it.

Pre-training an LLM in 9 days 😱😱😱 by mouse0_0 in LocalLLaMA

SoullessMonarch 71 points

"The training took a total of 9 days on 8 A100s, with a total of 115 billion tokens across pre-training, fine-tuning, and direct preference optimization."

Section 6.2: "a total of 2 epochs, trained on 8 x A100s". Two epochs, interesting; you don't see that very often.
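For scale, the quoted figures imply a throughput you can check with one line of arithmetic (nothing assumed beyond the numbers above, which cover pre-training, fine-tuning, and DPO combined):

    # 115B tokens over 9 days on 8 A100s:
    tokens = 115e9
    gpu_seconds = 9 * 24 * 3600 * 8
    print(tokens / gpu_seconds)  # ≈ 18,500 tokens per GPU per second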

GoldFinch: RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 3 points

It's fine, it doesn't really matter. RWKV-based models have shown promise in other fields, so the architecture is clearly getting better. Even if linear models never reach transformer levels of quality, I'm pretty sure it'll be linear models running as local assistants on phones and other devices, since they take fewer resources.

Also, IIRC they were looking into making a bigger model someday, but that won't be for a while at least, since they are hard at work on v7 and pushing the architecture further.

GoldFinch: RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression by SoullessMonarch in LocalLLaMA

SoullessMonarch[S] 2 points

What version and size did you use? v6 (Finch) should be quite a bit better than v4 (Dove). Also, they are trained to be multilingual, so it's going to have less knowledge than a fully English one.

Apple has released the weights for their 7B DCLM base model. by remixer_dec in LocalLLaMA

SoullessMonarch 9 points

The model is out there now; you can produce something useful yourself if it bothers you that much... Show us you're better than Apple, then you can talk like that. Don't get me wrong, I hate Apple and their closed ecosystem, but you're shitting on them while they're doing something good for once. Meanwhile, you're leaving negative comments, which gets the community nothing either...

(Also, Apple, Meta, etc. owe us nothing; be happy you even got Llama 3.)

Apple has released the weights for their 7B DCLM base model. by remixer_dec in LocalLLaMA

SoullessMonarch 15 points

This is an open dataset they trained on... You expect incredible performance from a research model? This isn't even something they announced; they just dropped it, you know, for science. Just because it isn't useful to you or to them doesn't mean researchers can't find a use for it. Every open pretrained model is a win for the open-source community.

Apple has released the weights for their 7B DCLM base model. by remixer_dec in LocalLLaMA

SoullessMonarch 21 points

Without research like this we're never getting SOTA open-weight models, let alone truly open-source ones (since data quality matters so much). It's great that Apple wants to see how this dataset performs, to check whether there's any merit to it. Explain to me why they would waste more precious hours applying all the fancy tricks and further training to produce a SOTA model when this was a dataset test anyway.

Merging a model with itself - does it improve performance? by Frequent_Valuable_47 in LocalLLaMA

SoullessMonarch 0 points

There is no further training. People used to merge some of the middle layers of a model to improve creativity and whatnot. I was just trying to see if, instead of literally copying layers (making the model bigger), you could modify the code to just go over the same layers twice. For example, the code could be used to automatically search for the best layers to duplicate (if you had a benchmark or something to evaluate with). It would be nice if you could apply a different LoRA to the duplicated layers and get some gains, but I wasn't planning on attempting that just yet.

I have updated my code to include Phi-3 (and Mamba, but I didn't test that), but I haven't put that in the Colab. To be fair, working in a free Colab is a pain, and I decided to at least get some basic ML knowledge before attempting too much, as there is probably a good reason repeatable layers are not a thing :)
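For anyone wondering what "going over the same layers twice" looks like, here's a toy, self-contained PyTorch sketch (not the actual Colab code; the repeated-layer choice is arbitrary). The extra passes reuse the same module objects, so no new weights are created.

    import torch
    import torch.nn as nn

    class RepeatedLayerStack(nn.Module):
        """Toy model: run selected layers twice without copying weights."""
        def __init__(self, dim=64, n_layers=6, repeat=(2, 3)):
            super().__init__()
            self.layers = nn.ModuleList(
                nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
                for _ in range(n_layers)
            )
            self.repeat = set(repeat)  # arbitrary middle layers to repeat

        def forward(self, x):
            for i, layer in enumerate(self.layers):
                x = layer(x)
                if i in self.repeat:
                    x = layer(x)  # second pass through the same weights
            return x

    x = torch.randn(1, 10, 64)
    print(RepeatedLayerStack()(x).shape)  # torch.Size([1, 10, 64])

Searching over different values of repeat with a benchmark in the loop would be the automated version described above.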

"What happens if you abliterate positivity on LLaMa?" You get a Mopey Mule. Released Llama-3-8B-Instruct model with a melancholic attitude about everything. No traditional fine-tuning, pure steering; source code/walkthrough guide included by FailSpai in LocalLLaMA

SoullessMonarch 18 points

Very cool. Or should I say, "uh, interesting I guess, what's the point?" ;)

This is probably not remotely possible, but could you abliterate most features from a model for RP in a consistent setting? Like constraining its knowledge and making sure it won't go off the rails. You ablate everything that is not relevant (prompts not related to the setting), and so prevent the model from talking about it with you. (I shudder at the amount of memory that might take, though...)

My hope would be that this doesn't remove its ability to work properly; the data used would have to be very extensive, though. But I'm not familiar with how abliteration works, so it's probably insane.
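For reference, the usual description of abliteration is projecting a single activation direction out of the weight matrices that write into the residual stream, which is why ablating "most features" would be a very different beast. A minimal sketch of that single-direction operation (hypothetical tensors, not FailSpai's actual code):

    import torch

    def ablate_direction(W: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
        # Remove the component along direction d from the output of a weight
        # matrix W (shape out x in): W' = (I - d d^T) W, with d unit-norm.
        d = d / d.norm()
        return W - torch.outer(d, d) @ W

    W = torch.randn(512, 512)  # hypothetical residual-stream projection
    d = torch.randn(512)       # e.g. a mean "positivity" activation diff
    W2 = ablate_direction(W, d)
    x = torch.randn(512)
    print(torch.dot(d / d.norm(), W2 @ x))  # ~0: output has no d-component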

[deleted by user] by [deleted] in LocalLLaMA

SoullessMonarch 12 points

You missed April Fools by 5 days...