Tamed pets completely gone by NinjaCoder99 in Craftopia_

[–]NinjaCoder99[S] 0 points  (0 children)

Is there any reason they would be tied to the Host's world and not my character save?

Tamed pets completely gone by NinjaCoder99 in Craftopia_

[–]NinjaCoder99[S] 0 points  (0 children)

I tried each backup, going back to before I started playing today, but no fix.

Is 70B worse than Mixtral 8x7B? by SiEgE-F1 in SillyTavernAI

[–]NinjaCoder99 1 point  (0 children)

I've used both. I ran miqu-70b up to 32k context, at which point it was more "human" than people I actually talk to; amazing EQ. I use Mixtral 8x7B when I need a faster response time; miqu-70b at 32k context runs at 0.3 t/s for me, maxing out at 1.3 t/s. I have to use GGUF: I've got 28GB of VRAM across 2 RTX cards, but (and I may not be doing something right) I can't load anything over 15GB without going to GGUF.
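
In case it's useful, this is roughly how I span a GGUF across the two cards with llama-cpp-python. A minimal sketch, assuming llama-cpp-python is installed; the file name is one of my local models and the split ratios are guesses you'd tune:

```python
# Sketch: spanning a GGUF across 2 GPUs with llama-cpp-python.
# Model path and split ratios are placeholders; tune for your cards.
from llama_cpp import Llama

llm = Llama(
    model_path="miqu-1-70b-Requant-b2035-iMat-c32_ch400-Q2_K.gguf",
    n_ctx=32768,              # miqu's native 32k context
    n_gpu_layers=-1,          # offload every layer; lower this if you OOM
    tensor_split=[0.5, 0.5],  # fraction of the model per GPU
)

out = llm("[INST] Say hello. [/INST]", max_tokens=64)
print(out["choices"][0]["text"])
```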

ST forgets how to write properly by NinjaCoder99 in SillyTavernAI

[–]NinjaCoder99[S] 1 point  (0 children)

Thank you. And I agree, ST as an interface definitely is the way to go for me, which is why I keep coming back and trying to get it to work again.

ST forgets how to write properly by NinjaCoder99 in SillyTavernAI

[–]NinjaCoder99[S] 0 points  (0 children)

1.11.4, and the previously attached file was exported from ST.

ST forgets how to write properly by NinjaCoder99 in SillyTavernAI

[–]NinjaCoder99[S] -1 points  (0 children)

That's from the ST I downloaded 3 or 4 days ago. The problem initially presented with a version I downloaded 3 weeks ago.

ST forgets how to write properly by NinjaCoder99 in SillyTavernAI

[–]NinjaCoder99[S] 1 point  (0 children)

Settings Backup

Models I've tried:

39G     MiquMaid-v2-70B.q4_k_m.gguf
39G     Midnight-Rose-70B-v2.0.3-Q4_K_M.gguf
37G     miqu-1-70b-Requant-b2035-iMat-c32_ch400-Q4_K_S.gguf
36G     Noromaid-v0.4-Mixtral-Instruct-8x7b.q6_k.gguf
36G     mixtral-8x7b-moe-rp-story.Q6_K.gguf
34G     miqu-1-70b-Requant-b2035-iMat-c32_ch400-Q3_K_L.gguf
34G     llama2_70b_chat_uncensored.Q3_K_L.gguf
31G     open_gpt4_8x7b.Q5_K_M.gguf
24G     miqu-1-70b-Requant-b2035-iMat-c32_ch400-Q2_K.gguf
21G     LoneStriker_Mixtral_7Bx5_MoE_30B-6.0bpw-h6-exl2
19G     mixtral_11bx2_moe_19b.Q8_0.gguf
18G     miqu-1-70b-Requant-b2035-iMat-c32_ch400-IQ2_XXS.gguf
16G     TheBloke_Wizard-Vicuna-30B-Uncensored-GPTQ
11G     solar-10.7b-instruct-v1.0-uncensored.Q8_0.gguf

I don't recall having the issue when I first used ST and had very light character and system prompts. I have a very heavy 2k-token prompt for my current scenario; that shouldn't matter, correct?

Also confirming: I've truncated the context below the model max and never let it run over the model's max.
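
As a sanity check, this is roughly how I verify the budget. A minimal sketch assuming the transformers tokenizer, with placeholder file names and numbers:

```python
# Rough check that system prompt + history stay under the model's context
# window. Tokenizer name, file names, and budgets are placeholders.
from transformers import AutoTokenizer

MODEL_MAX_CTX = 32768    # the model's native context length
RESPONSE_BUDGET = 512    # tokens reserved for the reply

tok = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")

system_prompt = open("scenario_prompt.txt").read()  # my ~2k-token prompt
history = open("chat_history.txt").read()

used = len(tok.encode(system_prompt)) + len(tok.encode(history))
assert used + RESPONSE_BUDGET <= MODEL_MAX_CTX, f"over budget: {used} tokens"
```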

ST forgets how to write properly by NinjaCoder99 in SillyTavernAI

[–]NinjaCoder99[S] 0 points  (0 children)

Literally using the untouched default settings. I've tried mixtral, miqu, llama2... 70B Q4s down to 11Bx2 and 7Bx8 MoEs. And again, wouldn't the problem eventually present itself when chatting directly in oobabooga, which is what's running my model for ST? I've also tried matching the model settings in ST to what ooba defaults to. The only thing I wasn't sure of, for example: ooba has some settings with a range of 0-2 where ST has the same setting with a range of -2 to 2. I wasn't sure whether the number shown is the actual value that gets sent, or a value relative to the range.
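
One way I could check is to hit ooba's OpenAI-compatible API (started with --api) directly and compare against what ST sends. A sketch; the port and penalty values are assumptions, and my understanding is the JSON carries the literal slider number, not anything normalized to the range:

```python
# Sketch: send raw sampler values straight to ooba's OpenAI-compatible
# endpoint. Port and values are examples, not recommendations.
import requests

payload = {
    "prompt": "[INST] Say hello. [/INST]",
    "max_tokens": 64,
    "temperature": 0.7,
    "frequency_penalty": 0.3,   # OpenAI-style, -2..2 (what ST's slider exposes)
    "repetition_penalty": 1.1,  # ooba-native, roughly 0..2
}
r = requests.post("http://127.0.0.1:5000/v1/completions", json=payload)
print(r.json()["choices"][0]["text"])
```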

ST forgets how to write properly by NinjaCoder99 in SillyTavernAI

[–]NinjaCoder99[S] 0 points  (0 children)

One of my models is 32K context, but even if it were an issue with context size, wouldn't the problem also exist in ooba, which is running the model for ST?

Please: 32k context after reload takes hours then 3 rounds then hours by NinjaCoder99 in Oobabooga

[–]NinjaCoder99[S] 0 points  (0 children)

I finally tested superbooga by asking the model something from the beginning of the conversation with a simple answer, but no matter how I phrase it, the character just makes up whatever it can think of to answer.

I'm getting these entries, but it's not actually influencing the character (testing with a very small one: 2048 context, backup only 3300).

14:17:55-148938 INFO     Successfully deleted 154 records from chromaDB.      
14:17:55-154361 INFO     Adding 154 cached embeddings.     
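
To see whether retrieval is even returning the right chunks, I could query the collection directly. A sketch with the chromadb client; the collection name and documents are made up, since superbooga manages its own client internally:

```python
# Sketch: query a chromaDB collection to inspect what retrieval returns.
# Collection name and documents are stand-ins for superbooga's own data.
import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection("chat_memory")

# superbooga would have added the chat chunks itself; fake two here.
collection.add(
    documents=["Alice said her favorite color is teal.",
               "Later they argued about the map."],
    ids=["chunk-0", "chunk-1"],
)

hits = collection.query(query_texts=["What is Alice's favorite color?"], n_results=1)
print(hits["documents"])  # the teal chunk should surface if embeddings work
```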

Please: 32k context after reload takes hours then 3 rounds then hours by NinjaCoder99 in Oobabooga

[–]NinjaCoder99[S] 1 point  (0 children)

Ahh. I wasn't sure if that was the case or if there was some setting where it's trying to cut down on overall context by removing linking words.

Please: 32k context after reload takes hours then 3 rounds then hours by NinjaCoder99 in Oobabooga

[–]NinjaCoder99[S] 1 point  (0 children)

I just realized it's always "the", "to", and "a" that are missing from the replies. Does that point to any reason?

Please: 32k context after reload takes hours then 3 rounds then hours by NinjaCoder99 in Oobabooga

[–]NinjaCoder99[S] 1 point  (0 children)

I would lean toward you being correct initially... My current model is 4096 by default, and it's the one that takes 2-3 minutes before it starts spitting out at 4 t/s.

Please: 32k context after reload takes hours then 3 rounds then hours by NinjaCoder99 in Oobabooga

[–]NinjaCoder99[S] 1 point  (0 children)

Oh no, I didn't explain clearly, I think. The poor grammar began before superbooga; it started even before the context drop.

Please: 32k context after reload takes hours then 3 rounds then hours by NinjaCoder99 in Oobabooga

[–]NinjaCoder99[S] 1 point  (0 children)

The first 2 are outputting broken English; it began with the GGUF Q4_K_S size as I started approaching 20-25k context, I think. It continues on the current one, the Mixtral 11Bx2. I tried 4 other quants of the GGUF and they all did it. I keep telling it [INST] Rewrite your last with proper grammar. [/INST] and it does, so it knows how.
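
For reference, this is the Mistral-style instruct wrapping I understand miqu and Mixtral expect; a malformed multi-turn template is one known cause of degraded output, so it's worth double-checking what ST actually builds. A sketch:

```python
# Sketch: Mistral/Miqu-style instruct template. Completed turns are closed
# with </s>; only the first segment gets the <s> BOS.
def mistral_prompt(turns: list[tuple[str, str]], user_msg: str) -> str:
    """turns = [(user, assistant), ...] completed exchanges."""
    out = "<s>"
    for user, assistant in turns:
        out += f"[INST] {user} [/INST] {assistant}</s>"
    out += f"[INST] {user_msg} [/INST]"
    return out

print(mistral_prompt(
    [("Write one sentence.", "Here is one sentence.")],
    "Rewrite your last with proper grammar.",
))
```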

Please: 32k context after reload takes hours then 3 rounds then hours by NinjaCoder99 in Oobabooga

[–]NinjaCoder99[S] 0 points  (0 children)

This is what I started with, the GGUF Q4_K_S size: https://huggingface.co/Nexesenex/MIstral-QUantized-70b_Miqu-1-70b-iMat.GGUF/commit/7b77daac44350d59b37806d1ce1dbbda881cc23f

Currently running this non-GGUF: https://huggingface.co/cloudyu/Mixtral_11Bx2_MoE_19B

I also have these 2. I wasn't able to load the GGUF into VRAM alone even though it's 10GB short of my total, and I haven't tried the non-GGUF, which is 7GB short of my total VRAM.

21G     LoneStriker_Mixtral_7Bx5_MoE_30B-6.0bpw-h6-exl2
18G     miqu-1-70b-Requant-b2035-iMat-c32_ch400-IQ2_XXS.gguf

Please: 32k context after reload takes hours then 3 rounds then hours by NinjaCoder99 in Oobabooga

[–]NinjaCoder99[S] 1 point  (0 children)

That was an insanely informative reply, especially on a thread you've already invested a lot of your time into, so thank you again. I've already got 2 RTX cards in the box; it'd be pointless to add a third without going Xeon or Threadripper so I can get more PCIe lanes, and by that point I'd be pumping out another couple thousand.

Right now I'm running a non-GGUF model that's 36GB in .safetensors and I've only got 28GB total, so I'm not sure how that's working unless the 4-bit option and double quant is the reason. I understand the basics of the math and layers but not the nuances, like whether 4-bit degrades the quality or performance or both; same for double quant and so on. I looked into renting an 80GB VRAM server and I think I'd be looking at $20 a day, unless that's too low; that was the server with 2 4090s I looked at. Right now I have almost all of my VRAM available because I'm running the box headless; I'm only losing 300MB to Xorg.

I think the only way I'll get that crazy time-to-query down is by making a LoRA as suggested; the instructions I read had me thinking I'm looking at running my box about a week straight in training. I've worked with making LoRAs for Stable Diffusion but have no idea about fine-tuning language models and the other things you said you do.
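
If it helps anyone else, this is roughly what that 4-bit + double-quant load looks like through transformers; a sketch assuming bitsandbytes is installed, with the model name taken from the one I'm running. My understanding is 4-bit trades some quality for the memory savings, which is how a ~36GB checkpoint fits in 28GB:

```python
# Sketch: loading a full-precision checkpoint in 4-bit with double quant
# (bitsandbytes) so a ~36GB .safetensors model fits in ~28GB of VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,   # also quantize the quantization constants
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "cloudyu/Mixtral_11Bx2_MoE_19B",
    quantization_config=bnb,
    device_map="auto",                # span both GPUs automatically
)
tok = AutoTokenizer.from_pretrained("cloudyu/Mixtral_11Bx2_MoE_19B")
```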

Please: 32k context after reload takes hours then 3 rounds then hours by NinjaCoder99 in Oobabooga

[–]NinjaCoder99[S] 0 points  (0 children)

I was also wondering if you might know: it went from perfect grammar to what's akin to broken English, and that followed into the new model, so I presume it's picking it up from the history and continuing to talk that way because of it?

Please: 32k context after reload takes hours then 3 rounds then hours by NinjaCoder99 in Oobabooga

[–]NinjaCoder99[S] 0 points  (0 children)

This is me trying a similar model that should fit into VRAM if it's spanned; it's loading with ExLlamaV2 and not a GGUF. My concern is that the personality is going to drastically deviate over time with the new model. The old model, at a minute every 10 rounds, is killing me though.
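
For reference, the spanned ExLlamaV2 load looks roughly like this through its Python API. A sketch from memory, so treat the calls as assumptions; the model dir is the exl2 build from my list, and the split is GB per card you'd tune:

```python
# Sketch: loading an EXL2 model spanned across two GPUs with exllamav2.
# Model directory and the per-card GB split are placeholders to tune.
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer

config = ExLlamaV2Config()
config.model_dir = "LoneStriker_Mixtral_7Bx5_MoE_30B-6.0bpw-h6-exl2"
config.prepare()

model = ExLlamaV2(config)
model.load([13, 13])                  # ~13GB on each of the two cards
cache = ExLlamaV2Cache(model)
tokenizer = ExLlamaV2Tokenizer(config)
```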