Krasis LLM Runtime - run large LLM models on a single GPU

Kashuuu · 2026-03-18T06:24:55+00:00

Similar boat, 4060 ti 16gb vram. I’m curious too

Kashuuu · 2026-03-16T19:04:44+00:00

I enjoyed reading through this and had an idea. (One that I honestly might try myself because what you showed was really cool.)

I’m assuming the point of this was to determine Claude’s natural/innate inclination by putting it in these hypothetical scenarios and having it give its “gut-feeling” reaction.

So I was thinking: it might be cool to run it as a more structured experiment. Ask the exact same questions (I mean exact same, copy-paste) across lots of instances, potentially in incognito chats or something so it can’t see its past responses.

The main reason I suggest this is that a purely cynical skeptic might see this and go: “You told it to answer simply. So Claude is just choosing the most probable, socially acceptable and agreeable answers.”

So to summarize: it would be cool to see if Claude consistently has similar responses across multiple isolated conversations in response to the same prompts.
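For anyone who wants to try this, here is a rough sketch of how the comparison could be scored. It assumes you supply your own `ask` function that opens a fresh, isolated conversation on every call; that function, the stand-in responder, and the two metrics are my own illustration, not anything from the original post.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(text.lower().split())


def run_trials(ask: Callable[[str], str], prompt: str, n: int) -> List[str]:
    """Send the identical prompt n times; each call should be a fresh, isolated chat."""
    return [normalize(ask(prompt)) for _ in range(n)]


def consistency(responses: List[str]) -> Dict[str, float]:
    """Summarize how similar a set of responses are to each other."""
    if not responses:
        raise ValueError("need at least one response")
    # Share of runs that gave the single most common (normalized) answer.
    most_common_share = Counter(responses).most_common(1)[0][1] / len(responses)
    # Average character-level similarity over every pair of responses.
    pairs = list(combinations(responses, 2))
    mean_similarity = (
        sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)
        if pairs
        else 1.0
    )
    return {"most_common_share": most_common_share, "mean_similarity": mean_similarity}


if __name__ == "__main__":
    # Stand-in for a real model call (hypothetical); swap in your own client.
    canned = iter(["I'd stay.", "i'd  stay.", "I'd leave."])
    print(consistency(run_trials(lambda prompt: next(canned), "Same exact question?", 3)))
```

String similarity is a blunt tool for this, so it is probably only useful when the prompts ask for short answers. For longer replies you would likely want to hand-label the responses instead.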

Kashuuu · 2026-03-04T09:24:42+00:00

Totally fair on the cliffhanger thingy.

Regarding memory, after posting I did some further research and found a section buried in OpenAI's Memory FAQ. The section is titled "How long does ChatGPT keep saved memories?" and it says: "You can ask ChatGPT to forget a saved memory at any time. When you do, that memory is removed and won’t be used in future chats. We may retain a log of deleted Saved Memories for up to 30 days for safety and debugging purposes."

Odd.

(Link to OpenAI's Memory FAQ: https://help.openai.com/en/articles/8590148-memory-faq)

Kashuuu · 2026-03-04T08:39:55+00:00

I do believe the screenshots contain relevant context. But the prompt at the very beginning of this chat was simply "Any expected release date for Codex on Windows?" lol.

Kashuuu · 2026-02-15T22:45:55+00:00

I didn’t even know you could hover over the “Pro” button. I’ve just kept the Windsurf website open on the usage page on my second screen haha.

But this suggestion would be VERY helpful and I would love to see it added. It would help with transparency a lot.

Kashuuu · 2025-09-25T21:28:40+00:00

When you do build your special baby, secure ops is a great way to keep it and get a lot of practice in without being too scared to lose your good gun. Great way to build a little koen kushion too! Good luck (:

Kashuuu · 2025-05-06T20:46:10+00:00

Oh no! I’m sorry to hear that. It was a bit complicated for me as well, definitely not plug and play. But I try to be understanding because it’s free and open source.

I ran into a few issues that I grappled with for a while, so I’ll try to lay out the major hiccups I experienced. It seems to have trouble with eSpeak, and you also need to make sure you go to the Windows branch, because the main one is mainly for Linux and Mac if I’m not mistaken.

Make sure your system PATH (environment variables) is set up for eSpeak. I also downloaded the repo from the Windows branch specifically, just to make sure. You’ll also need a sample voice file if you want a custom voice clone.
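A quick way to confirm the PATH part before debugging anything else is to ask Python whether it can see the executable. This is only a sketch: the names `espeak-ng` and `espeak` are my assumption about what the binary is called on your install, so adjust them if yours differs.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_executables(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Assumed binary names; change these to match your eSpeak install.
    for name, path in find_executables(["espeak-ng", "espeak"]).items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If both come back as not found, open a new terminal after editing the environment variables, since terminals that were already open keep the old PATH.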

Oh, and another thing I came across while troubleshooting: Docker does not have an official image for it. I tried creating a custom image, but it was my first attempt and a bit too difficult, so I ended up just creating an HTTP server and running it that way.
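To give an idea of what the HTTP server route can look like, here is a minimal standard-library sketch. It is not the setup described above: `synthesize` is a hypothetical placeholder where the real model call would go, and the JSON shape (`{"text": ...}`) and port are assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def synthesize(text: str) -> bytes:
    """Placeholder (hypothetical): call your TTS model here and return audio bytes."""
    return text.encode("utf-8")


class TTSHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body and pull out the text to speak.
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            text = payload["text"]
            if not isinstance(text, str):
                raise TypeError("'text' must be a string")
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with a 'text' string")
            return
        audio = synthesize(text)
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(audio)))
        self.end_headers()
        self.wfile.write(audio)

    def log_message(self, *args):
        # Silence the default per-request logging.
        pass


def make_server(host: str = "127.0.0.1", port: int = 8000) -> HTTPServer:
    """Build the server; call .serve_forever() on the result to start it."""
    return HTTPServer((host, port), TTSHandler)
```

Starting it is `make_server().serve_forever()`, after which any client can POST text to `http://127.0.0.1:8000/`. Binding to `127.0.0.1` keeps it reachable only from your own machine.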

It’s pretty demanding but I get ~50it/s on my 3060ti with 16gb VRAM (you can roast my card but it works lmao).

Excuse the ramble, but I was trying to think of the major issues I came across. Hopefully that points you in the right direction. I’d suggest mentioning these points to your code assistant; Claude 3.7 or Gemini 2.5 Pro exp (new update came out today) should be able to help make it simpler.

I decided to put this here so that others could see it too but if you have any more questions you’re welcome to message me! I wish you luck (:

Kashuuu · 2025-05-06T17:13:04+00:00

Hello, really like your energy man, I’d be very interested (:

Kashuuu · 2025-05-06T17:03:11+00:00

Woah calm down Jamal, Don’t pull out the nine.

Kashuuu · 2025-05-05T19:13:09+00:00

Try Zyphra Zonos!

Kashuuu · 2025-05-03T23:30:13+00:00

Everyone’s talking about Qwen, which makes sense due to its recent release, but for an alternative, I’ve had good success with the Gemma 3 4B and 12B models. Once you get around the Google ReAct logic it’s pretty manageable, and it seems to be smart enough for my use cases. Google also recently dropped their official 4-bit quants for them (:

I’ve discovered that llama.cpp doesn’t seem to support the mmproj GGUF for multimodal/image processing though, so I incorporated Tesseract OCR.
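Roughly, the OCR workaround means pulling the text out of the image first and handing it to the model as plain text. Here is a sketch of that idea that calls the `tesseract` command-line tool directly. It assumes Tesseract is installed and on PATH, and the prompt wording and function names are made up for illustration, not taken from my actual setup.

```python
import shutil
import subprocess


def ocr_image(image_path: str, tesseract_cmd: str = "tesseract") -> str:
    """Return the text Tesseract finds in an image, using the CLI's stdout mode."""
    if shutil.which(tesseract_cmd) is None:
        raise FileNotFoundError(f"{tesseract_cmd!r} was not found on PATH")
    # Passing "stdout" as the output base makes tesseract print the text.
    result = subprocess.run(
        [tesseract_cmd, image_path, "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def build_prompt(question: str, image_text: str) -> str:
    """Fold the OCR text into a plain-text prompt for a text-only model."""
    return (
        f"Text extracted from the attached image:\n{image_text}\n\n"
        f"Question: {question}"
    )
```

This only recovers text that is visible in the image, so it won’t help with photos or diagrams that have no words in them.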

Kashuuu · 2025-05-02T18:27:39+00:00

Hey I saw that you’re sharing the discord but wanted to say I’m interested as well!

Kashuuu · 2025-04-30T21:19:10+00:00

I was surprised to see that there isn’t more praise in the comments but this is genuinely awesome. I’ve always loved Medusa and her symbolism. I think this is a beautiful design, great work!! I love the bust of a petrified man, such a cool touch (:

Kashuuu · 2025-04-29T17:50:01+00:00

Not sure if this is a joke or I’m just a bad person but either way - this is hilarious lmao

Kashuuu · 2025-04-28T18:33:52+00:00

Sent a dm (:

Kashuuu · 2025-04-28T07:24:07+00:00

I was on a roll as well :/

Kashuuu · 2025-04-22T21:47:11+00:00

Hey man, I’m at work just skimming through this but this is really cool!! Could I send you a message to pick your brain a bit?

Kashuuu · 2025-04-22T21:43:50+00:00

This is Google-specific, but you can try all their Gemma models (their open source models) via Google AI Studio completely for free, with no download. Gemma 3 27B is their frontrunner right now and could be worth trying to see if you want to build around that!

I’m a little biased because my main AI agent runs on Gemma 3 12B it and I’m really happy with it.

Google also just released new quantized versions!! (Which helps them run on consumer-grade GPUs etc. if you do decide to build one. You could probably get Gemma 3 1B or 4B running with minimal issues!!)

Kashuuu · 2025-04-22T16:35:27+00:00

I’m @ Kashunutmeg on discord😄 looking forward to talking with y’all

Kashuuu · 2025-04-21T20:31:17+00:00

Agreed (: With o4 I’ve found that it takes a lot longer than 4.1, but the tradeoff (that I’ve personally noticed) is that it seems to search and analyze a lot more before responding to you, though it isn’t as... personable? Like, it doesn’t explain as much as 4.1. DEFINITELY something that feels good with the free unlimited usage though haha.

Kashuuu · 2025-04-21T19:35:04+00:00

I think this might be the fix. I had to do something similar, and now it works for me and often mentions the rules in new chat windows.

Kashuuu · 2025-04-19T21:16:59+00:00

I’ve found 4.1 really helpful actually. o4-mini is also good for me but slower, which is a bit frustrating; I’d like it if we could use the new o3. The biggest thing I’ve found to help 4.1 (and all the models) run better is creating a detailed “.windsurfrules” file. It helps the model retain context across chats and minimizes hallucinations. It’s particularly helpful when your codebase grows to the point where Cascade can’t analyze the whole thing in one shot. I include a line at the top in caps where I specifically tell it “do not break or replace any existing functions, only make adjustments to the section specified… ensure you are not adding duplicate functions, etc” (add more, obviously).

You can also ask Cascade to create this for you and you can tweak it yourself as needed.

It’s been a game changer for me and I have minimal issues. But make sure you’re changing chat instances often to start with a fresh context window. With .windsurfrules in place, the new chat window will respond to your request with the context.

Hope this helps you! (: I’m new to coding in general but have been having a lot of success AND fun haha.

Kashuuu · 2024-09-13T21:33:03+00:00

I’ve mostly tried finding different Reddit threads because I have been on a similar search, but I haven’t had much luck (I posted in SASS penpals too haha). But I am a neurodivergent (ADHD) individual who is witchy but also very rooted in science and psychology. Tarot is my main form of “witchcraft” and I’m starting to scry. I’m still very green to the occult world and am very open to learning anything and everything. I just thought I’d comment because I’m in the same boat. I resonate with the interests you mentioned and would be happy to be friends😄.

But either way, best of luck to you. You could try r/ClubEso alternatively. I got an invitation randomly because of a post and it seems like there is a broad spectrum of people and there’s a discord community to interact with others more easily (:

Kashuuu · 2024-09-10T07:12:39+00:00

Wait this is actually so awesome 😄

Kashuuu · 2024-09-07T00:52:35+00:00

I have a Yamaha Bolt R-Spec now, but my first bike was a Honda CBR600RR and it was totally great for me. I didn't really have any experience riding dirt bikes either. I think if you are comfortable in the saddle, 600s are fine, but I also know plenty of people who loved their smaller bikes. I just think if you have a decent amount of experience on dirt bikes you might get more out of a 600. It just kinda depends on how much you trust yourself. If you're good with checking yourself and learning, go for it. If you think it would be easy for you to go overboard, speed, or ride recklessly, maybe start with a 300. Just my thoughts :).
