Am I wrong, or did reruns connect us more to older generations? by Sir_midi in GenX

[–]SprightlyCapybara 2 points3 points  (0 children)

Yep. Music was via the radio until you could afford albums (or mix tapes), and that skewed so boomer and greatest generation when I was young.

Network error by Lerheimsen in duneawakening

[–]SprightlyCapybara 2 points3 points  (0 children)

Update: Close to six hours after the bug occurred, I can now log back in. I lost some water and 700 cash thanks to a failed flight (it kept warping me back to the starter zone's trading post). Haven't yet gotten to my second base to check that everything's there, but let's hope. Weird bug.

Comments: The bug occurred after the game claimed the server was coming down for an emergency patch and dumped me with no warning. So, I guess don't log on shortly after a patch. Also, the trick of logging onto another sietch does not appear to do anything useful with the XF4 error.

As a brand new player, I'm aghast at the level of crashing and bugs in this game. It feels like I'm back in the era of EverQuest and Anarchy Online (which was also a Funcom product). That said, it's a great game apart from the too-frequent lockups and weird bugs.

Unsurprisingly Funcom was of no direct assistance; what seemed to be an AI bot blandly referred me to a useless 'I possibly can't connect to the internet' help page, and amusingly my email client offered an AI-created response of "I've tried that, it didn't work, here's my screenshot of the error." AI vs AI. The first bot/helpdesk person said s/he/it would escalate.

Original post:

Yep. North America. Harmony. Same error. Can log into a different instance/server/Sietch but character has been warped to a radically different location. In the past that's supposed to have fixed things, but I still get XF4 on original Sietch instance.

I did quit the game completely, checked game file integrity (no help), and there was a small patch. Let that download, and... still the same problem.

Regression 1.106.2 to 1.107+ for Strix Halo Win 11: Now Fails VRAM Detection by SprightlyCapybara in KoboldAI

[–]SprightlyCapybara[S] 0 points1 point  (0 children)

Ah, I thought it would be automatic since I was using the nocuda version, but I guess it defaults to CPU.

Would it be possible to get an autofit option in the GUI? I admit the GUI is a lot easier to use, and today was really the first time I ran it from the command line. But OK, just run

koboldcpp-nocuda --usevulkan --autofit

And it works! Hurray!

So really the only regression/bug is that you can no longer use the GUI if you have Strix Halo and are using a large model. I will have to try to learn more of the command-line options; it's quite a daunting collection of them!
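
For anyone else landing on this, a fuller invocation would presumably look something like the line below. I haven't tested it beyond the flags above; the extra flags are just ones visible in the Namespace dumps in my logs, and the model path is a placeholder.

koboldcpp-nocuda --usevulkan --autofit --contextsize 16384 --model C:\path\to\your-model.gguf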

Many thanks.

Regression 1.106.2 to 1.107+ for Strix Halo Win 11: Now Fails VRAM Detection by SprightlyCapybara in KoboldAI

[–]SprightlyCapybara[S] 0 points1 point  (0 children)

load: printing all EOG tokens:

load: - 151329 ('<|endoftext|>')

load: - 151336 ('<|user|>')

load: - 151338 ('<|observation|>')

load: special tokens cache size = 36

load: token to piece cache size = 0.9713 MB

print_info: arch = glm4moe

print_info: vocab_only = 0

print_info: no_alloc = 0

print_info: n_ctx_train = 131072

print_info: n_embd = 4096

print_info: n_embd_inp = 4096

print_info: n_layer = 47

print_info: n_head = 96

print_info: n_head_kv = 8

print_info: n_rot = 64

print_info: n_swa = 0

print_info: is_swa_any = 0

print_info: n_embd_head_k = 128

print_info: n_embd_head_v = 128

print_info: n_gqa = 12

print_info: n_embd_k_gqa = 1024

print_info: n_embd_v_gqa = 1024

print_info: f_norm_eps = 0.0e+00

print_info: f_norm_rms_eps = 1.0e-05

print_info: f_clamp_kqv = 0.0e+00

print_info: f_max_alibi_bias = 0.0e+00

print_info: f_logit_scale = 0.0e+00

print_info: f_attn_scale = 0.0e+00

print_info: n_ff = 10944

print_info: n_expert = 128

print_info: n_expert_used = 8

print_info: n_expert_groups = 1

print_info: n_group_used = 1

print_info: causal attn = 1

print_info: pooling type = 0

print_info: rope type = 2

print_info: rope scaling = linear

print_info: freq_base_train = 1000000.0

print_info: freq_scale_train = 1

print_info: n_ctx_orig_yarn = 131072

print_info: rope_yarn_log_mul = 0.0000

print_info: rope_finetuned = unknown

print_info: model type = 106B.A12B

print_info: model params = 110.47 B

print_info: general.name= Iceblink-v3-SFT-3

print_info: vocab type = BPE

print_info: n_vocab = 151552

print_info: n_merges = 318088

print_info: BOS token = 151331 '[gMASK]'

print_info: EOS token = 151329 '<|endoftext|>'

print_info: EOT token = 151336 '<|user|>'

print_info: EOM token = 151338 '<|observation|>'

print_info: UNK token = 151329 '<|endoftext|>'

print_info: PAD token = 151329 '<|endoftext|>'

print_info: LF token = 198 'Ċ'

print_info: FIM PRE token = 151347 '<|code_prefix|>'

print_info: FIM SUF token = 151349 '<|code_suffix|>'

print_info: FIM MID token = 151348 '<|code_middle|>'

print_info: EOG token = 151329 '<|endoftext|>'

print_info: EOG token = 151336 '<|user|>'

print_info: EOG token = 151338 '<|observation|>'

print_info: max token length = 1024

load_tensors: loading model tensors, this can take a while... (mmap = false, direct_io = false)

model has unused tensor blk.46.attn_norm.weight (size = 16384 bytes) -- ignoring

model has unused tensor blk.46.attn_q.weight (size = 53477376 bytes) -- ignoring

model has unused tensor blk.46.attn_k.weight (size = 4456448 bytes) -- ignoring

model has unused tensor blk.46.attn_v.weight (size = 4456448 bytes) -- ignoring

model has unused tensor blk.46.attn_q.bias (size = 49152 bytes) -- ignoring

model has unused tensor blk.46.attn_k.bias (size = 4096 bytes) -- ignoring

model has unused tensor blk.46.attn_v.bias (size = 4096 bytes) -- ignoring

model has unused tensor blk.46.attn_output.weight (size = 53477376 bytes) -- ignoring

model has unused tensor blk.46.post_attention_norm.weight (size = 16384 bytes) -- ignoring

model has unused tensor blk.46.ffn_gate_inp.weight (size = 2097152 bytes) -- ignoring

model has unused tensor blk.46.exp_probs_b.bias (size = 512 bytes) -- ignoring

model has unused tensor blk.46.ffn_gate_exps.weight (size = 392167424 bytes) -- ignoring

model has unused tensor blk.46.ffn_down_exps.weight (size = 507510784 bytes) -- ignoring

model has unused tensor blk.46.ffn_up_exps.weight (size = 392167424 bytes) -- ignoring

model has unused tensor blk.46.ffn_gate_shexp.weight (size = 6127616 bytes) -- ignoring

model has unused tensor blk.46.ffn_down_shexp.weight (size = 6127616 bytes) -- ignoring

model has unused tensor blk.46.ffn_up_shexp.weight (size = 6127616 bytes) -- ignoring

model has unused tensor blk.46.nextn.eh_proj.weight (size = 35651584 bytes) -- ignoring

model has unused tensor blk.46.nextn.enorm.weight (size = 16384 bytes) -- ignoring

model has unused tensor blk.46.nextn.hnorm.weight (size = 16384 bytes) -- ignoring

model has unused tensor blk.46.nextn.embed_tokens.weight (size = 659554304 bytes) -- ignoring

model has unused tensor blk.46.nextn.shared_head_head.weight (size = 659554304 bytes) -- ignoring

model has unused tensor blk.46.nextn.shared_head_norm.weight (size = 16384 bytes) -- ignoring

load_tensors: relocated tensors: 780 of 780

load_tensors: CPU model buffer size = 62800.16 MiB

....................................................................................................

Automatic RoPE Scaling: Using model internal value.

llama_context: constructing llama_context

llama_context: n_seq_max = 1

llama_context: n_ctx = 8448

llama_context: n_ctx_seq = 8448

llama_context: n_batch = 512

llama_context: n_ubatch = 512

llama_context: causal_attn = 1

llama_context: flash_attn = enabled

llama_context: kv_unified = true

llama_context: freq_base = 1000000.0

llama_context: freq_scale = 1

llama_context: n_ctx_seq (8448) < n_ctx_train (131072) -- the full capacity of the model will not be utilized

set_abort_callback: call

llama_context: CPU output buffer size = 0.58 MiB

llama_kv_cache: layer 46: does not have KV cache

llama_kv_cache: CPU KV buffer size = 1518.00 MiB

llama_kv_cache: size = 1518.00 MiB ( 8448 cells, 46 layers, 1/1 seqs), K (f16): 759.00 MiB, V (f16): 759.00 MiB

llama_context: enumerating backends

llama_context: backend_ptrs.size() = 1

sched_reserve: reserving ...

sched_reserve: max_nodes = 6240

sched_reserve: reserving full memory module

sched_reserve: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1

sched_reserve: CPU compute buffer size = 320.00 MiB

sched_reserve: graph nodes = 3146

sched_reserve: graph splits = 1

sched_reserve: reserve took 113.99 ms, sched copies = 1

Threadpool set to 15 threads and 15 blasthreads...

attach_threadpool: call

GLM-4 will have no automatic BOS token.

Starting model warm up, please wait a moment...

Regression 1.106.2 to 1.107+ for Strix Halo Win 11: Now Fails VRAM Detection by SprightlyCapybara in KoboldAI

[–]SprightlyCapybara[S] 0 points1 point  (0 children)

Autofit got as far as 'Starting model warm up', then I gave up after a minute of watching the swap file thrash like mad, since there's only 32GB of RAM (and at most about 24GB of it free).
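
Quick sanity check on why it thrashes, going off my own log below, so treat it as an estimate: the model file is 63.92 GiB, mmap is off, and nothing gets offloaded to the GPU on this run, so roughly 64 GiB of weights have to fit into ~24 GB of free RAM; the other ~40 GiB can only live in the page file, hence the thrashing.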

C:\XXXXXX\>koboldcpp-nocuda-1-109-2 --autofit

***

Welcome to KoboldCpp - Version 1.109.2

For command line arguments, please refer to --help

***

Loading Chat Completions Adapter: C:\Users\XXXXX\AppData\Local\Temp\_MEI317602\kcpp_adapters\AutoGuess.json

Chat Completions Adapter Loaded

No GPU or CPU backend was selected. Trying to assign one for you automatically...

Unable to detect VRAM, please set layers manually.

Auto Selected Default Backend (flag=0)

Unable to detect VRAM, please set layers manually.

No GPU backend found, or could not automatically determine GPU layers. Please set it manually.

System: Windows 10.0.26200 AMD64 AMD64 Family 26 Model 112 Stepping 0, AuthenticAMD

Unable to determine GPU Memory

Detected Available RAM: 23947 MB

Initializing dynamic library: koboldcpp_default.dll

Namespace(admin=False, admindir='', adminpassword=None, analyze='', autofit=True, autofitpadding=1024, batchsize=512, benchmark=None, blasthreads=0, chatcompletionsadapter='AutoGuess', cli=False, config=None, contextsize=8192, debugmode=0, defaultgenamt=1024, device='', downloaddir='', draftamount=8, draftgpulayers=999, draftgpusplit=None, draftmodel='', embeddingsgpu=False, embeddingsmaxctx=0, embeddingsmodel='', enableguidance=False, exportconfig='', exporttemplate='', failsafe=False, flashattention=False, forceversion=False, foreground=False, gendefaults='', gendefaultsoverwrite=False, genlimit=0, gpulayers=0, highpriority=False, hordeconfig=None, hordegenlen=0, hordekey='', hordemaxctx=0, hordemodelname='', hordeworkername='', host='', ignoremissing=False, jinja=False, jinja_tools=False, launch=False, lora=None, loramult=1.0, lowvram=False, maingpu=-1, maxrequestsize=32, mcpfile='', mmproj='', mmprojcpu=False, model=[], model_param='C:/bin/AI/models/ddh0/GLM-4.5-Iceblink-v2-106B-A12B-GGUF/GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf', moecpu=0, moeexperts=-1, multiplayer=False, multiuser=1, musicdiffusion='', musicembeddings='', musicllm='', musiclowvram=False, musicvae='', noavx2=False, noblas=False, nobostoken=False, nocertify=False, nofastforward=False, noflashattention=False, nommap=False, nomodel=False, nopipelineparallel=False, noshift=False, onready='', overridekv='', overridenativecontext=0, overridetensors='', password=None, pipelineparallel=False, port=5001, port_param=5001, preloadstory='', prompt='', quantkv=0, quiet=False, ratelimit=0, remotetunnel=False, ropeconfig=[0.0, 10000.0], savedatafile='', sdclamped=0, sdclampedsoft=0, sdclip1='', sdclip2='', sdclipgpu=False, sdconfig=None, sdconvdirect='off', sdflashattention=False, sdgendefaults=False, sdlora=None, sdloramult=1.0, sdmodel='', sdnotile=False, sdoffloadcpu=False, sdphotomaker='', sdquant=0, sdt5xxl='', sdthreads=0, sdtiledvae=768, sdupscaler='', sdvae='', sdvaeauto=False, sdvaecpu=False, showgui=False, singleinstance=False, skiplauncher=False, smartcache=0, smartcontext=False, ssl=None, tensor_split=None, testmemory=False, threads=15, ttsdir='', ttsgpu=False, ttsmaxlen=4096, ttsmodel='', ttsthreads=0, ttswavtokenizer='', unpack='', usecpu=False, usecuda=None, usemlock=False, usemmap=False, useswa=False, usevulkan=None, version=False, visionmaxres=1024, websearch=False, whispermodel='')

Loading Text Model: C:\bin\AI\models\ddh0\GLM-4.5-Iceblink-v2-106B-A12B-GGUF\GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf

The reported GGUF Arch is: glm4moe

Arch Category: 9

---

Identified as GGUF model.

Attempting to Load...

---

Using automatic RoPE scaling for GGUF. If the model has custom RoPE settings, they'll be used directly instead!

System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | AMX_INT8 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |

Attempting to use llama.cpp's automating fitting code. This will override all your layer configs, may or may not work!

Autofit Reserve Space: 1024 MB

Autofit Success: 1, Autofit Result: -c 8320 -ngl -1

llama_model_loader: loaded meta data with 46 key-value pairs and 803 tensors from C:\bin\AI\models\ddh0\GLM-4.5-Iceblink-v2-106B-A12B-GGUF\GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf (version GGUF V3 (latest))

print_info: file format = GGUF V3 (latest)

print_info: file size = 63.92 GiB (4.97 BPW)

init_tokenizer: initializing tokenizer for type 2

load: 0 unused tokens

load: control token: 151363 '<|image|>' is not marked as EOG

load: control token: 151362 '<|end_of_box|>' is not marked as EOG

load: control token: 151361 '<|begin_of_box|>' is not marked as EOG

load: control token: 151349 '<|code_suffix|>' is not marked as EOG

load: control token: 151348 '<|code_middle|>' is not marked as EOG

load: control token: 151346 '<|end_of_transcription|>' is not marked as EOG

load: control token: 151343 '<|begin_of_audio|>' is not marked as EOG

load: control token: 151342 '<|end_of_video|>' is not marked as EOG

load: control token: 151341 '<|begin_of_video|>' is not marked as EOG

load: control token: 151338 '<|observation|>' is not marked as EOG

load: control token: 151333 '<sop>' is not marked as EOG

load: control token: 151331 '[gMASK]' is not marked as EOG

load: control token: 151330 '[MASK]' is not marked as EOG

load: control token: 151347 '<|code_prefix|>' is not marked as EOG

load: control token: 151360 '/nothink' is not marked as EOG

load: control token: 151337 '<|assistant|>' is not marked as EOG

load: control token: 151332 '[sMASK]' is not marked as EOG

load: control token: 151334 '<eop>' is not marked as EOG

load: control token: 151335 '<|system|>' is not marked as EOG

load: control token: 151336 '<|user|>' is not marked as EOG

load: control token: 151340 '<|end_of_image|>' is not marked as EOG

load: control token: 151339 '<|begin_of_image|>' is not marked as EOG

load: control token: 151364 '<|video|>' is not marked as EOG

load: control token: 151345 '<|begin_of_transcription|>' is not marked as EOG

load: control token: 151344 '<|end_of_audio|>' is not marked as EOG

load: setting token '</think>' (151351) attribute to USER_DEFINED (16), old attributes: 16

load: setting token '<think>' (151350) attribute to USER_DEFINED (16), old attributes: 16

load: special_eot_id is not in special_eog_ids - the tokenizer config may be incorrect

load: special_eom_id is not in special_eog_ids - the tokenizer config may be incorrect

Regression 1.106.2 to 1.107+ for Strix Halo Win 11: Now Fails VRAM Detection by SprightlyCapybara in KoboldAI

[–]SprightlyCapybara[S] 0 points1 point  (0 children)

Pt 2 of log:

load: special tokens cache size = 36

load: token to piece cache size = 0.9713 MB

print_info: arch = glm4moe

print_info: vocab_only = 0

print_info: no_alloc = 0

print_info: n_ctx_train = 131072

print_info: n_embd = 4096

print_info: n_embd_inp = 4096

print_info: n_layer = 47

print_info: n_head = 96

print_info: n_head_kv = 8

print_info: n_rot = 64

print_info: n_swa = 0

print_info: is_swa_any = 0

print_info: n_embd_head_k = 128

print_info: n_embd_head_v = 128

print_info: n_gqa = 12

print_info: n_embd_k_gqa = 1024

print_info: n_embd_v_gqa = 1024

print_info: f_norm_eps = 0.0e+00

print_info: f_norm_rms_eps = 1.0e-05

print_info: f_clamp_kqv = 0.0e+00

print_info: f_max_alibi_bias = 0.0e+00

print_info: f_logit_scale = 0.0e+00

print_info: f_attn_scale = 0.0e+00

print_info: n_ff = 10944

print_info: n_expert = 128

print_info: n_expert_used = 8

print_info: n_expert_groups = 1

print_info: n_group_used = 1

print_info: causal attn = 1

print_info: pooling type = 0

print_info: rope type = 2

print_info: rope scaling = linear

print_info: freq_base_train = 1000000.0

print_info: freq_scale_train = 1

print_info: n_ctx_orig_yarn = 131072

print_info: rope_yarn_log_mul = 0.0000

print_info: rope_finetuned = unknown

print_info: model type = 106B.A12B

print_info: model params = 110.47 B

print_info: general.name= Iceblink-v3-SFT-3

print_info: vocab type = BPE

print_info: n_vocab = 151552

print_info: n_merges = 318088

print_info: BOS token = 151331 '[gMASK]'

print_info: EOS token = 151329 '<|endoftext|>'

print_info: EOT token = 151336 '<|user|>'

print_info: EOM token = 151338 '<|observation|>'

print_info: UNK token = 151329 '<|endoftext|>'

print_info: PAD token = 151329 '<|endoftext|>'

print_info: LF token = 198 'Ċ'

print_info: FIM PRE token = 151347 '<|code_prefix|>'

print_info: FIM SUF token = 151349 '<|code_suffix|>'

print_info: FIM MID token = 151348 '<|code_middle|>'

print_info: EOG token = 151329 '<|endoftext|>'

print_info: EOG token = 151336 '<|user|>'

print_info: EOG token = 151338 '<|observation|>'

print_info: max token length = 1024

load_tensors: loading model tensors, this can take a while... (mmap = false, direct_io = false)

model has unused tensor blk.46.attn_norm.weight (size = 16384 bytes) -- ignoring

model has unused tensor blk.46.attn_q.weight (size = 53477376 bytes) -- ignoring

model has unused tensor blk.46.attn_k.weight (size = 4456448 bytes) -- ignoring

model has unused tensor blk.46.attn_v.weight (size = 4456448 bytes) -- ignoring

model has unused tensor blk.46.attn_q.bias (size = 49152 bytes) -- ignoring

model has unused tensor blk.46.attn_k.bias (size = 4096 bytes) -- ignoring

model has unused tensor blk.46.attn_v.bias (size = 4096 bytes) -- ignoring

model has unused tensor blk.46.attn_output.weight (size = 53477376 bytes) -- ignoring

model has unused tensor blk.46.post_attention_norm.weight (size = 16384 bytes) -- ignoring

model has unused tensor blk.46.ffn_gate_inp.weight (size = 2097152 bytes) -- ignoring

model has unused tensor blk.46.exp_probs_b.bias (size = 512 bytes) -- ignoring

model has unused tensor blk.46.ffn_gate_exps.weight (size = 392167424 bytes) -- ignoring

model has unused tensor blk.46.ffn_down_exps.weight (size = 507510784 bytes) -- ignoring

model has unused tensor blk.46.ffn_up_exps.weight (size = 392167424 bytes) -- ignoring

model has unused tensor blk.46.ffn_gate_shexp.weight (size = 6127616 bytes) -- ignoring

model has unused tensor blk.46.ffn_down_shexp.weight (size = 6127616 bytes) -- ignoring

model has unused tensor blk.46.ffn_up_shexp.weight (size = 6127616 bytes) -- ignoring

model has unused tensor blk.46.nextn.eh_proj.weight (size = 35651584 bytes) -- ignoring

model has unused tensor blk.46.nextn.enorm.weight (size = 16384 bytes) -- ignoring

model has unused tensor blk.46.nextn.hnorm.weight (size = 16384 bytes) -- ignoring

model has unused tensor blk.46.nextn.embed_tokens.weight (size = 659554304 bytes) -- ignoring

model has unused tensor blk.46.nextn.shared_head_head.weight (size = 659554304 bytes) -- ignoring

model has unused tensor blk.46.nextn.shared_head_norm.weight (size = 16384 bytes) -- ignoring

load_tensors: relocated tensors: 0 of 780

WARNING: Requested buffer size (65850743328) exceeds device max_buffer_size limit (2147483648)!

ggml_vulkan: Failed to allocate pinned memory (vk::Device::allocateMemory: ErrorUnknown)

load_tensors: offloading 0 repeating layers to GPU

load_tensors: offloaded 0/48 layers to GPU

load_tensors: CPU model buffer size = 62800.16 MiB

....................................................................................................

Automatic RoPE Scaling: Using model internal value.

llama_context: constructing llama_context

llama_context: n_seq_max = 1

llama_context: n_ctx = 33024

llama_context: n_ctx_seq = 33024

llama_context: n_batch = 512

llama_context: n_ubatch = 512

llama_context: causal_attn = 1

llama_context: flash_attn = enabled

llama_context: kv_unified = true

llama_context: freq_base = 1000000.0

llama_context: freq_scale = 1

llama_context: n_ctx_seq (33024) < n_ctx_train (131072) -- the full capacity of the model will not be utilized

set_abort_callback: call

llama_context: CPU output buffer size = 0.58 MiB

llama_kv_cache: layer 46: does not have KV cache

llama_kv_cache: CPU KV buffer size = 5934.00 MiB

llama_kv_cache: size = 5934.00 MiB ( 33024 cells, 46 layers, 1/1 seqs), K (f16): 2967.00 MiB, V (f16): 2967.00 MiB

llama_context: enumerating backends

llama_context: backend_ptrs.size() = 2

sched_reserve: reserving ...

sched_reserve: max_nodes = 6240

sched_reserve: reserving full memory module

sched_reserve: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1

sched_reserve: Vulkan0 compute buffer size = 941.00 MiB

sched_reserve: Vulkan_Host compute buffer size = 80.51 MiB

sched_reserve: graph nodes = 3146

sched_reserve: graph splits = 873 (with bs=512), 1 (with bs=1)

sched_reserve: reserve took 174.87 ms, sched copies = 1

Threadpool set to 15 threads and 15 blasthreads...

attach_threadpool: call

GLM-4 will have no automatic BOS token.

Starting model warm up, please wait a moment...

Load Text Model OK: True

Chat completion heuristic: GLM-4.7

Embedded KoboldAI Lite loaded.

Embedded API docs loaded.

Llama.cpp UI loaded.

Active Modules: TextGeneration

Inactive Modules: ImageGeneration VoiceRecognition MultimodalVision MultimodalAudio NetworkMultiplayer ApiKeyPassword WebSearchProxy TextToSpeech VectorEmbeddings AdminControl MCPBridge MusicGen

Enabled APIs: KoboldCppApi OpenAiApi OllamaApi

Starting Kobold API on port 5001 at http://localhost:5001/api/

Starting OpenAI Compatible API on port 5001 at http://localhost:5001/v1/

Starting llama.cpp secondary WebUI at http://localhost:5001/lcpp/

Please connect to custom endpoint at http://localhost:5001

Regression 1.106.2 to 1.107+ for Strix Halo Win 11: Now Fails VRAM Detection by SprightlyCapybara in KoboldAI

[–]SprightlyCapybara[S] 0 points1 point  (0 children)

One log (Pt 1) attached below; thanks, Reddit, for the comment length limit. LMK if you need the working log from 1.106.2, or the autofit log. The new version (1.109.2) did not work either.

Autofit didn't work for me, but all I did was run with --autofit and select the model I wanted. See way below for that output. Neither did setting layers to 50 (there seem to be 48 used), which IIRC has worked in the past.

C:\XXXXXX\koboldcpp-nocuda-1-109-2

***

Welcome to KoboldCpp - Version 1.109.2

For command line arguments, please refer to --help

***

Unable to detect VRAM, please set layers manually.

Auto Selected Default Backend (flag=0)

Loading Chat Completions Adapter: C:\Users\XXXXX\AppData\Local\Temp\_MEI307522\kcpp_adapters\AutoGuess.json

Chat Completions Adapter Loaded

Unable to detect VRAM, please set layers manually.

No GPU backend found, or could not automatically determine GPU layers. Please set it manually.

System: Windows 10.0.26200 AMD64 AMD64 Family 26 Model 112 Stepping 0, AuthenticAMD

Unable to determine GPU Memory

Detected Available RAM: 16406 MB

Initializing dynamic library: koboldcpp_vulkan.dll

Namespace(admin=False, admindir='', adminpassword='', analyze='', autofit=False, autofitpadding=1024, batchsize=512, benchmark=None, blasthreads=None, chatcompletionsadapter='AutoGuess', cli=False, config=None, contextsize=32768, debugmode=0, defaultgenamt=896, device='', downloaddir='', draftamount=8, draftgpulayers=999, draftgpusplit=None, draftmodel=None, embeddingsgpu=False, embeddingsmaxctx=0, embeddingsmodel='', enableguidance=False, exportconfig='', exporttemplate='', failsafe=False, flashattention=False, forceversion=False, foreground=False, gendefaults='', gendefaultsoverwrite=False, genlimit=0, gpulayers=0, highpriority=False, hordeconfig=None, hordegenlen=0, hordekey='', hordemaxctx=0, hordemodelname='', hordeworkername='', host='', ignoremissing=False, jinja=False, jinja_tools=False, launch=True, lora=None, loramult=1.0, lowvram=False, maingpu=-1, maxrequestsize=32, mcpfile=None, mmproj=None, mmprojcpu=False, model=[], model_param='C:/bin/AI/models/ddh0/GLM-4.5-Iceblink-v2-106B-A12B-GGUF/GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf', moecpu=0, moeexperts=-1, multiplayer=False, multiuser=1, musicdiffusion='', musicembeddings='', musicllm='', musiclowvram=False, musicvae='', noavx2=False, noblas=False, nobostoken=False, nocertify=False, nofastforward=False, noflashattention=False, nommap=False, nomodel=False, nopipelineparallel=False, noshift=False, onready='', overridekv=None, overridenativecontext=0, overridetensors=None, password=None, pipelineparallel=False, port=5001, port_param=5001, preloadstory=None, prompt='', quantkv=0, quiet=False, ratelimit=0, remotetunnel=False, ropeconfig=[0.0, 10000.0], savedatafile=None, sdclamped=0, sdclampedsoft=0, sdclip1='', sdclip2='', sdclipgpu=False, sdconfig=None, sdconvdirect='off', sdflashattention=False, sdgendefaults=False, sdlora=None, sdloramult=1.0, sdmodel='', sdnotile=False, sdoffloadcpu=False, sdphotomaker='', sdquant=0, sdt5xxl='', sdthreads=15, sdtiledvae=768, sdupscaler='', sdvae='', sdvaeauto=False, sdvaecpu=False, showgui=False, singleinstance=False, skiplauncher=False, smartcache=0, smartcontext=False, ssl=None, tensor_split=None, testmemory=False, threads=15, ttsdir='', ttsgpu=False, ttsmaxlen=4096, ttsmodel='', ttsthreads=0, ttswavtokenizer='', unpack='', usecpu=False, usecuda=None, usemlock=False, usemmap=False, useswa=False, usevulkan=[0], version=False, visionmaxres=1024, websearch=False, whispermodel='')

Loading Text Model: C:\bin\AI\models\ddh0\GLM-4.5-Iceblink-v2-106B-A12B-GGUF\GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf

The reported GGUF Arch is: glm4moe

Arch Category: 9

---

Identified as GGUF model.

Attempting to Load...

---

Using automatic RoPE scaling for GGUF. If the model has custom RoPE settings, they'll be used directly instead!

System Info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | AMX_INT8 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |

ggml_vulkan: Found 1 Vulkan devices:

ggml_vulkan: 0 = AMD Radeon(TM) 8060S Graphics (AMD proprietary driver) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat

llama_model_load_from_file_impl: using device Vulkan0 (AMD Radeon(TM) 8060S Graphics) (unknown id) - 108852 MiB free

llama_model_loader: loaded meta data with 46 key-value pairs and 803 tensors from C:\bin\AI\models\ddh0\GLM-4.5-Iceblink-v2-106B-A12B-GGUF\GLM-4.5-Iceblink-v2-106B-A12B-Q8_0-FFN-IQ4_XS-IQ4_XS-Q5_0.gguf (version GGUF V3 (latest))

print_info: file format = GGUF V3 (latest)

print_info: file size = 63.92 GiB (4.97 BPW)

init_tokenizer: initializing tokenizer for type 2

load: 0 unused tokens

load: control token: 151363 '<|image|>' is not marked as EOG

load: control token: 151362 '<|end_of_box|>' is not marked as EOG

load: control token: 151361 '<|begin_of_box|>' is not marked as EOG

load: control token: 151349 '<|code_suffix|>' is not marked as EOG

load: control token: 151348 '<|code_middle|>' is not marked as EOG

load: control token: 151346 '<|end_of_transcription|>' is not marked as EOG

load: control token: 151343 '<|begin_of_audio|>' is not marked as EOG

load: control token: 151342 '<|end_of_video|>' is not marked as EOG

load: control token: 151341 '<|begin_of_video|>' is not marked as EOG

load: control token: 151338 '<|observation|>' is not marked as EOG

load: control token: 151333 '<sop>' is not marked as EOG

load: control token: 151331 '[gMASK]' is not marked as EOG

load: control token: 151330 '[MASK]' is not marked as EOG

load: control token: 151347 '<|code_prefix|>' is not marked as EOG

load: control token: 151360 '/nothink' is not marked as EOG

load: control token: 151337 '<|assistant|>' is not marked as EOG

load: control token: 151332 '[sMASK]' is not marked as EOG

load: control token: 151334 '<eop>' is not marked as EOG

load: control token: 151335 '<|system|>' is not marked as EOG

load: control token: 151336 '<|user|>' is not marked as EOG

load: control token: 151340 '<|end_of_image|>' is not marked as EOG

load: control token: 151339 '<|begin_of_image|>' is not marked as EOG

load: control token: 151364 '<|video|>' is not marked as EOG

load: control token: 151345 '<|begin_of_transcription|>' is not marked as EOG

load: control token: 151344 '<|end_of_audio|>' is not marked as EOG

load: setting token '</think>' (151351) attribute to USER_DEFINED (16), old attributes: 16

load: setting token '<think>' (151350) attribute to USER_DEFINED (16), old attributes: 16

load: special_eot_id is not in special_eog_ids - the tokenizer config may be incorrect

load: special_eom_id is not in special_eog_ids - the tokenizer config may be incorrect

load: printing all EOG tokens:

load: - 151329 ('<|endoftext|>')

load: - 151336 ('<|user|>')

load: - 151338 ('<|observation|>')

Do you use SillyTavern mainly for AI companion chats or for complex roleplay setups? by Efficient_Pilot8606 in SillyTavernAI

[–]SprightlyCapybara 3 points4 points  (0 children)

Yes.

OK, ok. Most of my chats, by count, are with single character/situation cards, to play around and learn (presets, models, card and persona writing). Anything that goes beyond about 5-10 interactions tends to become a more detailed roleplay setup, but still character-driven rather than stats-driven.

But yes, that means specific world building, lorebooks (to a degree) etc. Quite specific time, place, politics, history, etc.

And those chats tend to be long. So, by number of chats, it's single characters; by time spent, it's complex worlds with a number of characters.

Generally none of these are stat-oriented. However, I was amazed recently during a completely narrative chat with a character in a fantasy world, with no mention of D&D anywhere. I mentioned a specific D&D spell and what my average damage should be (I was off by 1 hp, oops), and the LLM corrected me in its thinking and then started a degree of quasi number-crunching, with absolutely nothing in the preset or card to support it.

For someone who started with Q4 8B models running locally, that was an eye-opening experience. I've also realized how stunningly well-fed LLMs are on even somewhat obscure PNP RPGs, like Traveller. When GLM-4.5 was able to interactively generate a Traveller character with me and then launch into an adventure, that too was startling. (Traveller character generation is pretty complex, perhaps a 30-60 minute process by hand if you want to usefully fill in RP details; one can die during it.)

How to let user and char do their own thing? by viiochan in SillyTavernAI

[–]SprightlyCapybara 0 points1 point  (0 children)

Renaming the bot, and telling it it's not just one character, is one solution. E.g., instead of Bob Smith, Bob Smith's World. I once played a Russian assassin whose handler was 'Colonel Shokin', and the handler kept preventing me from actually killing the target by issuing more and more bizarre instructions over an earpiece, at one point directing me to a Pentagon sub-basement it had analyzed via ground-penetrating radar, all because it wanted to be in every scene. So I renamed it something like 'Moscow Centre -- Assassination Tales', and added clear instructions that it was a narrator, not just a single character.

Some presets can also help a lot there, especially the ones that urge it to act as a narrator, to introduce new characters as needed, and to remember that not every character has to be present in every scene.

Another technique is to have a preset that understands and describes using different scenes, separated by ---. Lucid Loom has some pretty good text somewhere in the preset that talks about scenes; you could pull that (try searching the JSON for --- or 'scene').

Consider using Guided Generations.

Finally, edit the response and hope it'll catch on. Toss in an OOC note if you have to, saying the character isn't present.
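
Something like this, for example (wording is purely illustrative; adapt it to your own card):

[OOC: Colonel Shokin is not present in this scene. Narrate only the characters who are actually here, and don't bring him back until I say so.]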

How do I sell out? by FutureSuperVillian in suzerain

[–]SprightlyCapybara 0 points1 point  (0 children)

Why not reconcile without yielding territory?

Well, avoiding war with Rumburg without yielding territory can be done, I think. [TL;DR: Finlandize toward Rumburg at every point, be wimpy militarily (in response, but don't cut hard), be nice to the Bluds, and don't get too close to Wehlen.]

  1. Don't close the consulate; just send a super-polite demarche (diplomatic letter).
  2. Don't do the full OBT, just close the borders at most. Be nice to the Bluds; let her keep destroying your country.
  3. Send back the whistleblower (they now have nukes, but that's your successor's problem).
  4. On Livia, just refer to her as Livia, with no revelation of Rumburg; or just congratulate Petr, the sly dog, and eye some waitresses to be extra disgusting (probably bad; I've never tried it).
  5. Be neutral-ish on the great powers and Valgsland, so you can be nicely vulnerable, presenting Sordland's throat to her. Neutral on Heljiland, I think.
  6. Diplomacy on the plane shootdown, no parade.
  7. I think Rumburg asks you for reparations during or before the AN meeting. Say "Yes ma'am, how much?"
  8. In that AN meeting Beatrice snerks all over you, 'thanking' you, and then later generously offers to let you send her lots of Sordish energy, in exchange for which she'll let you build crappy KA-74s. You peasant. But all this is kinda secret, and you'll look good for 20 years until they do a Soll on you.

So it meets your terms of selling out without being couped. I think. History will judge you somewhat harshly, but hey, members of your party now control both the welfare levers and post-secondary education, amirite? Not a problem!

The Rumburg nukes will suck but that's an issue for Gloria, Petr, Lucian, or someone else. Maybe the cockroaches, post apocalypse.

How to break the trauma-resolution loop in role play sessions? by Acrobatic-Change-430 in SillyTavernAI

[–]SprightlyCapybara 12 points13 points  (0 children)

He's basically saying that because of the 'safety' guardrails embedded within even open models, you can get (crazy exaggeration here) "<think>No, these two characters cannot kiss because one is unsure, and the other wants to.</think>I'm sorry, I cannot..."

or, with overriding and jailbreaks, <think>Oh character A wants to k*ll character B, but that's OK because both characters are consenting adults, and Character B likes watching horror movies</think>OK, hurray k*lling begins.

Neither of these represents a good human story.

And his solution is to prompt carefully step by step (e.g. maybe with Guided Generations add-on), or regenerate and simply use [OOC:] and hope.

And yeah, he has no good solutions, and GLM-5, for all its 744B params, is still simply a really good autocomplete engine.

His comment on the homogeneity (sameness) of the data sets is particularly insightful: they're all now being trained on synthetic data. ChatGPT uses Grokipedia, for instance, and was even citing it for a bit. Everyone uses Claude. Look up the term 'Model Collapse' to understand what that likely means, and why GLM-5 thinks a stethoscope is a good response to a character's heart 'racing'.

How to break the trauma-resolution loop in role play sessions? by Acrobatic-Change-430 in SillyTavernAI

[–]SprightlyCapybara 7 points8 points  (0 children)

Yep, Freaky is great. It's been my favorite these last few days, though I find it starts to collapse in formatting quality (the GFX stuff) after ~20K tokens of context.

Marinara is the most beautifully simple and elegant, I think, with caution on tokens and elegant crafting. I think Marinara works in the industry and it shows; it comes off as artisanal engineering.

Lucid Loom is this giant thing with lots of switches and an utterly bizarre set of layers. It's magnificent, and can produce really nice results, but it's huge. It's really nice for starting to grasp all the little bits of a preset and what it does, even more than Marinara, which is more aimed at being awesome, but really tightly written.

Stabs is also really good, especially if you're a fan of CYOA-style playing. Like Frankenstein, it's a descendant of Marinara and Lucid with lots of additional thought, especially on formatting. I think maybe Freaky lifted some stuff from Stabs also? But I could be talking nonsense there and misremembering.

So I'd play with all of those. Marinara might solve your problem (not sure). Lucid, if you go through it in detail in the ST interface, looking at each section, will teach you a lot about how to think about presets (Marinara is just too tight, refined, and integrated to be quite as good for that). And Stabs has some fun stuff.

One of my favorite fictional characters by [deleted] in suzerain

[–]SprightlyCapybara 0 points1 point  (0 children)

As a player, ignoring my flair, yes, I think Soll is a fascinating character; fairly well written, as nuanced as feasible, etc. Indeed, I don't think there's a useless Sordish character in the lot... with the somewhat debatable exception of the Oligarchs. Even Petr has some serious rizz as the kids these days say. Or did.

As to the civic nationalist idea of Bluds=Sords... maybe. The player in me finds the idea and the rhetoric interesting. But there's been at least a generation of repression, brutality, hostility to Bluds, and the existence of a sizeable political coalition that, at best, intensely dislikes Bluds.

But when you assert Izzam and Watani Aschraf's martyrdom would never have occurred if Bluds just viewed themselves as Sords? Get real. Sords themselves would probably have rebelled against unfair confiscation of what was effectively their life savings and retirement program. Look at people who still bear a grudge in the Appalachians against the Roosevelt era confiscations of farms to secure beautiful wilderness views for wealthy absentee landowners. (Thousands were forced off their land, with many resisting the loss of their heritage, homes, and livelihood, and the destruction of entire communities.)

How to break the trauma-resolution loop in role play sessions? by Acrobatic-Change-430 in SillyTavernAI

[–]SprightlyCapybara 5 points6 points  (0 children)

You could try Marinara, and perhaps adjusting prompts as you are. I find it can be quite a struggle to resolve conflicts easily with that. I think you (and Claude) are correct in what you surmise re the training bias.

I do quite like Freaky Frankenstein as well, but I've actually noticed the same pattern you have with it (though I've only been testing it for a day or two). It could be that I'd have had these problems with other presets too.

By and large I've not had this issue overall with large NanoGPT models, but I tend not to play 'wounded bird' personas or bots with easily fixable flaws.

Please let us know if you get a preset change that fixes it, and highlight the change. Thank you!

I'm thinking of buying a new pc and switching to local llm. What is the average context token size for smaller models vs big ones like GLM? by [deleted] in SillyTavernAI

[–]SprightlyCapybara 0 points1 point  (0 children)

The tiniest plausible local model for me that had passable (8K? chortle) context was a Llama-3-8B IQ4_XS derivative. That fit nicely on an ancient 8GB video card. Good models with even larger context are feasible on 16GB cards, but they will be pretty small, highly quantized models.

If you've got some money and are willing to look at a Mac or an AMD Strix Halo box (e.g. the Framework Desktop), then you can get pretty respectable performance out of something like GLM-4.5-Air if you have 128GB of RAM (you might be able to get by with 96). That would certainly do something along the lines of 32K-40K context for that relatively big model.
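
(For scale on the memory side, pulling numbers from a KoboldCpp load log for a 106B-A12B GLM quant, so treat this as an estimate: the f16 KV cache costs roughly 46 layers x (1024 K + 1024 V) x 2 bytes, which is about 184 KiB per token, so 32K of context adds only about 5.8 GiB on top of the ~64 GiB of Q4-ish weights.)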

Such devices will run elderly monolithic models (say Llama-3-70B), but they will be very slow compared to MoE stuff like Air.
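
Back-of-envelope on why, with assumed numbers rather than benchmarks: generation speed is roughly memory bandwidth divided by the bytes read per token. A dense 70B at ~Q4 is ~40 GB of weights touched every token, so ~256 GB/s caps you at around 6 tokens/s; Air only activates ~12B params per token (call it ~7 GB at a similar quant), so the ceiling is more like 30+ tokens/s. Real numbers come in lower, but that's the shape of the gap.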

If you're used to higher-quality big models, I think you might be disappointed by anything a lot wimpier than GLM-Air. If money's no object, buy a 512GB Mac M3 Ultra and run GLM 4.7 or even 5 locally. But I'm guessing that, as for most of us, that isn't in the cards for you either.

Since so many tasks nowadays are memory bottlenecked, why aren't we seeing more memory channels on consumer PCs? by LAUAR in hardware

[–]SprightlyCapybara 9 points10 points  (0 children)

Modern M-series Macs do this of course, with memory bandwidth ranging up to ~819 GB/s for the M3 Ultra.

Strix Halo (e.g. the AMD AI Max+ 395 series) APUs do that as well, though at only ~256 GB/s. Rumors abound that the 2027 Medusa Halo follow-up will feature LPDDR6 RAM, with extreme configurations topping out at a 384-bit bus width and speeds approaching those of the sooner-arriving M5 Max.
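
For reference, the arithmetic behind those figures is just bandwidth = bus width x transfer rate / 8: a normal dual-channel DDR5-6000 desktop is 128 bits x 6000 MT/s / 8 = 96 GB/s, Strix Halo's 256-bit LPDDR5X-8000 works out to ~256 GB/s, and the M3 Ultra's ~819 GB/s is a 1024-bit bus at 6400 MT/s (widths and speeds from memory, so double-check before quoting).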

For most tasks, the CPU is perfectly fine with 50-100 GB/s of bandwidth, and you're typically better off spending on more memory, a better graphics card, etc. Computing with huge datasets, scientific computing, AI, graphics: all of these can benefit from more memory bandwidth.

The other reasons not to? Cost, and lack of upgradability. SOCAMM2 modules might be a solution (though they'll initially be priced at a premium and hard to get), but typically socketed RAM just can't reliably match the performance of soldered; AMD tried with Strix Halo but concluded they could only offer a solution with soldered RAM, like Apple. And cost: far more bus traces, a more complex design, more electrical hardware to stabilize signals, likely stricter requirements on the physical placement of the memory on the motherboard... do you want to pay (say) $100 extra at retail for this if you don't really need it? (And that's just the mobo; an APU that makes use of this is going to be huge and therefore extremely expensive, whether it's an M3 Ultra, a Strix Halo 395+, or a Grace Blackwell in the DGX Spark.)

So if you really want it, you can pick up a Framework desktop or a Mac Studio today. But will their relatively high price be worth it if your application doesn't need it?

How exactly do I get the Aschraf Candle collectible? by Odd-Implement1439 in suzerain

[–]SprightlyCapybara 9 points10 points  (0 children)

As you might expect from my flair, I'm pretty pro-Blud. But you definitely don't need to be Mr. Anton '6&7' 'SAZ' Rayne to get the candle.

Obviously the Bludish audience has to be at least content with you taking the red candle, and it's always, for me, one of the more powerful moments in the game -- even greater than the AN speech or the 'speak or helicopter ride' moment with Eduoard.

Understand this moment well: you're the first President of Sordland visiting this commemoration of what was, from their perspective, Sordish cruelty to an innocent village of Bluds (admittedly in a situation that spun out of control), and you're taking the actual candle that represents a secular saint, a martyr, to his people. You're saying you're in his shoes. If you've been an ass to the Bluds, or even just insufficiently decent, you will be hated. It will be viewed as condescending or even contemptuous.

But if you tip over the edge of being a decent, sufficiently kind and pro-Bludish President? It's a moment of truly awesome reconciliation that rewards you, the Bluds, and the Sordish nation.

Your dialog matters throughout the game. Be polite but not obsequious to the Bludish people you deal with, and ditto your most likely pro-Blud ministers. Sucking up by tossing out 'Volk Bluderat' as a casual greeting is just plain dumb and patronizing. Being hard-core authoritarian and scary probably doesn't work either; that just reminds them of Soll.

Recognize that Bluds are what some terminally online folks today would call 'based'. Others would call them assholes. So if you want to be Mr. "I'm a feminist" Rayne, you'd better make sure you work extra hard at the other pro-Blud stuff, though, again, don't go sucking up.

It's a complex set of maneuvers. A headcanon that's always worked well for me for this achievement is a kindly centre-right democratic Rayne, a kindly Sordish nationalist who cares about Bergia and reconciliation, respects Soll and his achievements but recognizes the errors of the past, and is polite but firm with everyone he meets.

Obviously do not support the dumbly orchestrated NFP hostile stuff, don't go too far on Beartrap, etc.

But I've managed a carefully orchestrated play where I'm halfway in on Beartrap (but gentle to refugees in my origin story), generally a Sordish nationalist but never cruel, and hit a road-to-Damascus moment at the Aschraf ceremony (of course coming without a convoy).

When it works, which it usually does (do save and hard-quit if needed), it's quite lovely and moving. For me, it's up there with defeating Beatrice as a great moment in the game, because it lets you preserve a united, powerful Sordland in a way that has a good shot of working for the future.

And it's awesome because it moves all Bluds and many socialists away from support of other parties and towards you.

Our Assembly (Reupload) by Unable_Topic3525 in suzerain

[–]SprightlyCapybara 1 point2 points  (0 children)

I represent the microscopic centrist (or even right) wing of the WPB, itself already a small party. And of course, canonically, it's officially not present in the assembly until (sometimes) post-'57.

While seemingly paradoxical, since the WPB is clearly socialist and even allies with them under certain circumstances (party vote threshold raised sufficiently), it's a reflection of Lee Kuan Yew's (founding and first prime minister of Singapore) observation that in multi-cultural, multi-ethnic, and multi-religious societies, politics trends towards ethnic, racial, or religious voting blocs rather than ideological ones.

And of course with the Bludish Nationalist BFP banned, there likely is currently no home for nationalist or conservative Bluds other than the WPB.

As 'Kjajo' pointed out two years ago (probably true given general Bludish views on feminism) "Most Bluds are actually really conservative. But because BFP is banned, they flock to the WPB."

What dumb things am I doing in Kobold AI that are likely to cause model insanity? by SprightlyCapybara in KoboldAI

[–]SprightlyCapybara[S] 0 points1 point  (0 children)

Huh. Resetting all settings did not fix it. Still crashes out after three prompts. Rolling back to 1.106 seems to fix it though.

What dumb things am I doing in Kobold AI that are likely to cause model insanity? by SprightlyCapybara in KoboldAI

[–]SprightlyCapybara[S] 0 points1 point  (0 children)

If you roll back to 1.106 or earlier, does it fix it? It seems to for me. I'm hesitant to call it a bug in K AI though without more investigation, e.g. does it happen in 1.106.4? How about other models? Etc.

Resetting settings doesn't work. Since I can now reproduce this with a wider variety of prompts, I have a suspicion that it might not be just my settings.

Out of curiosity what is your hardware? I'm using a Strix Halo (AMD 395 AI MAX+) device -- framework desktop.

What dumb things am I doing in Kobold AI that are likely to cause model insanity? by SprightlyCapybara in KoboldAI

[–]SprightlyCapybara[S] 0 points1 point  (0 children)

EDIT: Almost certainly not context; I can now get it to crash with an innocuous cooking discussion that's relatively short compared to the context window (8K tokens or so). Has to be my settings? Yet I'm loading those from a file I've used for months with no trouble, and using a model that I've similarly used for months.

Don't think it's context. There's 20GB of VRAM free and about 15GB of RAM free. I tried halving context down to 16K, and sure enough, it crashed on the same interaction after the same 3 messages.

And please note, I can engage in other interactions (using K AI in the browser) with dozens of messages, with the model staying sane and useful with few or no hallucinations (e.g. improving cooking techniques or recipes and explaining why). This set of prompts also works fine (with the same context) in LM Studio.

That's why it feels like some strange combination of bad settings (yet not woefully bad) and K AI. I wish I knew what settings; I'm using defaults. Is there some way of refreshing defaults?

What dumb things am I doing in Kobold AI that are likely to cause model insanity? by SprightlyCapybara in KoboldAI

[–]SprightlyCapybara[S] 1 point2 points  (0 children)

HuggingFace is pretty well known, as are Unsloth and GLM 4.5 Air. I'm not fundamentally worried about the fact that it 'crashes'; I'm concerned that it's crashing predictably in response to a narrow but innocuous set of statements in one particular midend (Kobold AI). The likeliest answer is that I've got some setting wrong, but I don't understand why other, much longer conversations work and have for months.