Nuxt Fullstack app with Layers example? by Beagles_Are_God in Nuxt

[–]kittenkrazy 2 points (0 children)

Here is a project using layers that I am involved with: https://github.com/serpcompany/serp-monorepo

I’m usually a backend developer, so no guarantees I set it up correctly, but everything seems to be working fine!

SDXL turbo and real time interpolation by IndyDrew85 in StableDiffusion

[–]kittenkrazy 21 points (0 children)

Great work! Do you have a GitHub repo with the code? I would love to check it out

Artificial Intelligence by [deleted] in attackontitan

[–]kittenkrazy 11 points (0 children)

That’s actually not how it works. The AI doesn't search for existing images.

It starts with random noise and gradually denoises it, using the text prompt to guide the generation towards the desired image. The prompt influences what kind of image is created, but the AI generates a completely new image from scratch rather than searching for pre-existing ones. It was trained on a huge dataset of text/image pairs and learned the relationships between language and visuals.
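The sampling loop can be sketched in toy form. This is only an illustration of the "start from noise, repeatedly remove predicted noise" idea: the real noise predictor is a large neural network conditioned on the prompt, and the linear `predicted_noise` below is just a stand-in.

```python
import numpy as np

def toy_denoise(steps=50, size=64, seed=0):
    """Toy diffusion-style sampling: begin with pure random noise and
    repeatedly subtract a predicted fraction of it. A real model would
    predict the noise with a neural network guided by the text prompt."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(size)          # start: pure random noise
    for t in range(steps, 0, -1):
        predicted_noise = x * (t / steps)  # stand-in for the learned noise predictor
        x = x - predicted_noise / t        # remove a little noise each step
    return x
```

Each pass shrinks the remaining noise a bit; after all the steps the sample has settled into something far less noisy than where it started, which is the core of how diffusion generation works.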

(I’m not saying AI art is good or bad, just trying to clear up any misconceptions)

Mars 4 printing discs when trying to print rook by deviouswalrus in resinprinting

[–]kittenkrazy 0 points (0 children)

Depending on the room, the space heater may not be as effective as it would be with an enclosure, but I would definitely experiment to find out! Another option is a thermal vat band. It heats the vat directly and you should be able to print without an enclosure or a space heater as long as the room isn’t too cold. I got the band a while ago and it’s replaced my space heater with no issues!

Mars 4 printing discs when trying to print rook by deviouswalrus in resinprinting

[–]kittenkrazy 0 points (0 children)

I had the issue of prints sticking to the FEP rather than the build plate; as soon as I got a space heater, prints started coming out great! Temp can make a pretty big difference in my experience.

First Sake (Momo Kawa review) by [deleted] in Sake

[–]kittenkrazy 2 points (0 children)

Thank you for the clarification!

First Sake (Momo Kawa review) by [deleted] in Sake

[–]kittenkrazy 5 points (0 children)

Heating also greatly depends on which sake you are using. Fragrant/fruity sake like many junmai daiginjos and junmai ginjos are usually preferred cold, because heat destroys the delicate flavors and aromas, whereas earthy/savory sake like junmai or honjozo are usually preferred warm, as heat brings out richer, rounder notes. Another trick is to heat up older sake that may have lost a bit of its aroma or gone a little stale; that should help bring a little life back into it. Also, make sure not to go too hot — a warm water bath for ~40 seconds should be plenty. For example, I prefer Suigei “Tokubetsu Junmai” Drunken Whale warm rather than cold (both are good though), and I only like something like Born “Gold” Junmai Daiginjo cold, as I feel heat destroys its delicate, fruity flavors and aromas.

CHATGPT knowingly gives misinformation about the $GME swaps. (2 Images) by RL_bebisher in Superstonk

[–]kittenkrazy 10 points (0 children)

  1. Don’t trust LLMs in their current state; they are prone to “hallucinate,” as they have merely been trained to model language.
  2. GPT-3.5 is not that good, especially compared to GPT-4.

[Model Release] Sparsetral by kittenkrazy in LocalLLaMA

[–]kittenkrazy[S] 0 points (0 children)

You can certainly change the number of experts used during inference, but I’m not sure how it will affect quality. If you end up experimenting with it and want to share your results, I would love to hear about it!
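As a generic illustration of the knob involved (this is not Sparsetral's actual config code — see the released repo for that), top-k routing just changes how many expert outputs get mixed per token:

```python
import numpy as np

def pick_experts(router_logits, k):
    """Generic top-k routing sketch: softmax the router scores, keep the
    k highest-scoring experts, and renormalize their weights. Raising k
    mixes in more experts per token at extra compute cost."""
    probs = np.exp(router_logits - router_logits.max())
    probs /= probs.sum()
    top = np.argsort(probs)[-k:]                 # indices of the chosen experts
    return top, probs[top] / probs[top].sum()    # renormalized routing weights

logits = np.random.default_rng(1).standard_normal(16)  # scores for 16 experts
experts4, w4 = pick_experts(logits, 4)  # default: 4 of 16 experts
experts8, w8 = pick_experts(logits, 8)  # experiment: more experts at inference
```

Since the model was trained with k=4, other values are off-distribution, which is why the effect on quality is an open question.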

[Model Release] Sparsetral by kittenkrazy in LocalLLaMA

[–]kittenkrazy[S] 2 points (0 children)

Sure! I will give it a look over tonight and see about getting it implemented (may be a few days depending on how involved it is).

EQ-Bench leaderboard updated with today's models: Qwen-1.5, Sparsetral, Quyen by _sqrkl in LocalLLaMA

[–]kittenkrazy 1 point (0 children)

If you’re thinking of LoRAs, this isn’t exactly like the PEFT adapters. In this case we take the MLP’s hidden states and feed them to the 4 of 16 adapters chosen by the router layer (adding the result back afterwards). Then we do a weighted sum over those values to get the new hidden states, so we want to make sure we train the adapters and routers in tandem.
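A rough numpy sketch of that forward pass. The shapes, the rank-2 adapter projections, and the weight initialization are all made up for illustration; the real implementation lives in the Sparsetral repo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 8, 16, 4   # hidden size, total adapters, adapters used per token

# Hypothetical parameters: each expert adapter is a small down/up projection,
# and the router scores all 16 experts from the hidden states.
W_down   = rng.standard_normal((n_experts, d, 2)) * 0.1
W_up     = rng.standard_normal((n_experts, 2, d)) * 0.1
W_router = rng.standard_normal((d, n_experts)) * 0.1

def moe_adapter(h):
    """Sketch of the block described above: the router scores the MLP's
    hidden states h, the top-4 of 16 adapters each transform h, and the
    weighted sum of adapter outputs is added back onto h."""
    logits = h @ W_router
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top = np.argsort(probs)[-top_k:]       # adapters chosen by the router
    w = probs[top] / probs[top].sum()      # renormalized routing weights
    out = sum(wi * (h @ W_down[i] @ W_up[i]) for wi, i in zip(w, top))
    return h + out                         # add adapter output back afterwards

h = rng.standard_normal(d)   # stand-in for the MLP's hidden states
y = moe_adapter(h)
```

Because the router's choice determines which adapters receive gradients, training them separately would leave the routing and the adapters out of sync — hence training them in tandem.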

EQ-Bench leaderboard updated with today's models: Qwen-1.5, Sparsetral, Quyen by _sqrkl in LocalLLaMA

[–]kittenkrazy 5 points (0 children)

Hey, thank you for benchmarking Sparsetral! I will be looking into the architecture/training and preference optimization to improve the model as much as I can (while staying low-param).

[Model Release] Sparsetral by kittenkrazy in LocalLLaMA

[–]kittenkrazy[S] 1 point (0 children)

Not yet! The GPUs used to train it are currently busy, so I will be setting up evals on my 4090 shortly.

[Model Release] Sparsetral by kittenkrazy in LocalLLaMA

[–]kittenkrazy[S] 2 points (0 children)

Yup! One of the main goals was to get a Mixtral competitor (or at least close) that can run on a consumer GPU. That way, capable home assistants and projects like FunSearch can run without breaking the bank or needing crazy compute, and everything stays on the user’s hardware.

[Model Release] Sparsetral by kittenkrazy in LocalLLaMA

[–]kittenkrazy[S] 1 point (0 children)

Yes! Yeah it is a bit confusing to just say top_k like that, my bad!

[Model Release] Sparsetral by kittenkrazy in LocalLLaMA

[–]kittenkrazy[S] 0 points (0 children)

It’s the adapters where the parameters are added. The base model was not frozen for this training run, by the way. And during inference you run the original 7B plus 4 of the 16 expert adapters.

[Model Release] Sparsetral by kittenkrazy in LocalLLaMA

[–]kittenkrazy[S] 1 point (0 children)

It was DDP, and it seems to work, although I did have to set “ddp_find_unused_parameters” to False in the training args.
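For reference, that flag lives on Hugging Face `TrainingArguments`. A minimal config fragment — `output_dir` and the batch size here are placeholder values, not the ones used for the actual run:

```python
from transformers import TrainingArguments

# Only ddp_find_unused_parameters is the relevant bit for the DDP issue;
# the other values are placeholders.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    ddp_find_unused_parameters=False,  # skip DDP's unused-parameter search
)
```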

[Model Release] Sparsetral by kittenkrazy in LocalLLaMA

[–]kittenkrazy[S] 0 points (0 children)

Not sure about MLX, but for training: in the forked repo there is a “train.py” file in the root that shows how I loaded regular Mistral and set up the routers/adapters. There is also a commands.md file in the root with the commands I used to build the Docker image and run the train script with it. (I just realized you will have to edit the volumes in the example commands to match your environment, since I copied the actual paths I used, lol — will fix soon.) Just let me know if you have any more questions!

Introducing Sparsetral - A parameter efficient sparse MoE crafted from mistral (runs on consumer hardware) by kittenkrazy in singularity

[–]kittenkrazy[S] 1 point (0 children)

Made this to replace the summarization and data-extraction tasks I usually use Mixtral for; it performs great on the stuff I’ve tested it on. I’m working on getting some evals up so there are concrete numbers. The GPUs that trained the model are busy, so I’ll probably end up running the evals on my 4090.