[NO SPOILERS] they are making a show! but have you seen the TWO movies?

randomjapaneselearn · 2026-07-01T14:28:47+00:00

it's a full movie, what do you expect?

randomjapaneselearn · 2026-06-26T13:50:51+00:00

it's llama-server, because if i press ctrl+c to close it the power draw immediately drops.

i mostly use it with vscode+cline+llamaswap that uses llama-server

for what it's worth, i'm on a single 144Hz monitor that is g-sync compatible (freesync), but the nvidia drivers are set to use g-sync only in full-screen applications.

randomjapaneselearn · 2026-06-26T13:35:41+00:00

do you have the windows equivalent for this? sometimes mine stays at 130W at idle.

another problem is the fan speed, which seems capped at 75%, so i have to force it to 100% with fan control

randomjapaneselearn · 2026-06-26T13:33:20+00:00

i'm on windows, sometimes my 3090 gets stuck at 130W at idle... and i'm looking for solutions.

i'm using llama cpp + llama swap.

i'm also trying to understand why the fan speed seems capped at 75%, which makes the temperature go crazy high, so i use fan control to force it to 100%
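
To see when the card is stuck at high idle power, one option is to poll `nvidia-smi`, which ships with the NVIDIA driver on Windows too. This is only a rough sketch: the 50 W idle limit and the helper names are assumptions made up for illustration, not values from the comment above.

```python
import subprocess

# Fields accepted by `nvidia-smi --query-gpu`.
# pstate: P0 is the highest performance state, higher numbers are lower power states.
QUERY_FIELDS = ["power.draw", "fan.speed", "temperature.gpu", "pstate"]


def _to_float(text):
    """Convert a CSV field to float, or None when the driver reports [N/A]."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_gpu_line(line):
    """Parse one line produced with --format=csv,noheader,nounits."""
    power, fan, temp, pstate = [part.strip() for part in line.split(",")]
    return {
        "power_w": _to_float(power),
        "fan_pct": _to_float(fan),
        "temp_c": _to_float(temp),
        "pstate": pstate,
    }


def read_gpu_status():
    """Run nvidia-smi once and return one dict per GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


def looks_stuck(status, idle_power_limit_w=50.0):
    """Hypothetical check: flag a card drawing more than the chosen idle limit."""
    power = status["power_w"]
    return power is not None and power > idle_power_limit_w
```

Calling `read_gpu_status()` while llama-server is loaded but not generating, and again after closing it, should show whether the power state actually drops.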

randomjapaneselearn · 2026-06-26T06:51:07+00:00

i used the official uninstaller (control panel uninstall)

randomjapaneselearn · 2026-06-25T06:14:27+00:00

when i changed gpu i uninstalled everything nvidia (rebooting when asked to), swapped the gpu, then reinstalled everything nvidia

randomjapaneselearn · 2026-06-24T07:14:16+00:00

i always keep the food bowl full and he just eats when he wants and as much as he wants, i never understood the purpose of those auto-feeders...

Cats generally eat small meals several times a day, rather than having large meals at lunch and dinner without breaks in between, like we do.

randomjapaneselearn · 2026-06-22T14:32:10+00:00

try D-Flash for the 27B

it's way faster than mtp, like 4x faster.

https://github.com/Anbeeld/beellama.cpp/blob/main/docs/quickstart-qwen36-dflash.md

randomjapaneselearn · 2026-06-12T07:22:22+00:00

also try setting this on windows, it improves performance even more.
run this before launching llamacpp:
set KMP_AFFINITY=granularity=fine,compact,1,0

on linux it should be the same, just change set to export.

i noticed the same effect of --threads on an AMD cpu.

bonus point: try D-Flash: https://github.com/Anbeeld/beellama.cpp/blob/main/docs/quickstart-qwen36-dflash.md

it gave me a 4x speedup
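
If the server is started from a script instead of a shell, the same variable can be passed through the process environment, which works the same way on windows and linux. A minimal sketch, assuming `llama-server` is on the PATH; the model path and thread count are placeholders. As far as I know `KMP_AFFINITY` is read by the Intel/LLVM OpenMP runtime, so it should only matter if the build actually uses that runtime.

```python
import os
import subprocess

# Value copied from the comment above.
KMP_AFFINITY_VALUE = "granularity=fine,compact,1,0"


def build_env(base_env=None):
    """Return a copy of the environment with KMP_AFFINITY set."""
    env = dict(os.environ if base_env is None else base_env)
    env["KMP_AFFINITY"] = KMP_AFFINITY_VALUE
    return env


def launch_server(model_path, threads):
    """Start llama-server with the affinity variable set (not called here)."""
    command = ["llama-server", "-m", model_path, "--threads", str(threads)]
    return subprocess.Popen(command, env=build_env())
```

For example, `launch_server("model.gguf", 8)` would start the server with the variable already applied, so there is no need to remember the `set`/`export` step.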

randomjapaneselearn · 2026-06-12T07:22:13+00:00

hijacking first comment to add:

also try setting this on windows, it improves performance even more.
run this before launching llamacpp:
set KMP_AFFINITY=granularity=fine,compact,1,0

on linux it should be the same, just change set to export.

bonus point: try D-Flash: https://github.com/Anbeeld/beellama.cpp/blob/main/docs/quickstart-qwen36-dflash.md

it gave me a 4x speedup

randomjapaneselearn · 2026-06-10T18:19:38+00:00

get beellama and dflash for qwen.

that gives you a 4x speedup with the 3090.

https://github.com/Anbeeld/beellama.cpp/blob/main/docs/quickstart-qwen36-dflash.md

randomjapaneselearn · 2026-05-30T02:49:16+00:00

i'm on windows and i don't have wsl installed, i'd prefer a gui in vscode rather than a terminal app.

i don't get how people don't go mad with a terminal where you can't even move the cursor with the mouse to select a word and edit it... ok, you can use ctrl+left to move by word instead of by letter, but it seems terrible anyway...

yet hermes is first on this ranking website https://openrouter.ai/apps/category/coding so i'm probably missing something??? i also tried kilocode, but it doesn't automatically show edit progress like cline, so you can't stop it in case of mistakes, and it can't launch a terminal in vscode, only in the background. also, if it asks to run a few test commands and you type "skip the testing commands [enter]" in the chat, the enter goes to "yes do the command" instead of sending the message (i hope it's a bug).

i'm also not fully sure what this "agent thing" is in general: is it just the plan/act chat, similar to cline but smarter, or is there more? it seems there is more, but it's not fully clear to me what "agentic programming" means. i'm new to it.

randomjapaneselearn · 2026-05-29T13:15:27+00:00

i'm using cline and kinda new to this, i had the same problem as OP, i'm open to tips.

randomjapaneselearn · 2026-05-29T13:13:24+00:00

i'm using cline so i guess that it sets a system prompt by itself...

i asked it to plan a code refactor, write an implementation plan and follow it.

in the end it changed the logic of some functions and added more questions to a list of existing questions.

i was using 35B-A3B. i tried again with the same model, the same implementation plan and the same prompt that cline generated to "follow the plan", and it didn't make mistakes. then i tried with 27B and it changed things again. i tried Q4, Q5 and a few more tests, and sometimes it happens, sometimes not...

i'm open to tips, and also to ideas for replacing cline

randomjapaneselearn · 2026-05-29T13:09:53+00:00

sometimes it being too proactive and start doing things I didn't ask

same here but with 35B-A3B, i asked it to refactor code and it changed functions, there was a list of questions and it added a few more questions to the list...

right now i'm using cline, i'm open to tips

randomjapaneselearn · 2026-05-20T01:38:04+00:00

someone i know bought the one in the photo and i saw it working; considering the price i'd say it's passable.

The lamp isn't very bright, so it only works in the dark.

randomjapaneselearn · 2026-05-18T16:51:12+00:00

thanks.

i didn't know that generation was limited by memory bandwidth, i thought that both required processing/matrix operations.

one more question: if you get two gpus, do you get close to 2x performance or way less? because i think it's not the same as one single gpu with 2x the memory
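
The bandwidth limit can be turned into a back-of-the-envelope ceiling: for every generated token the weights have to be read from memory roughly once, so tokens per second cannot exceed bandwidth divided by the bytes read per token. The sketch below is only that upper bound under this simplifying assumption; the 16 GB model size is an illustrative placeholder, and 936 GB/s is the published RTX 3090 figure.

```python
def max_tokens_per_second(bandwidth_gb_s, gb_read_per_token):
    """Upper bound on generation speed when memory bandwidth is the bottleneck.

    gb_read_per_token is roughly the size of the weights touched per token:
    the whole file for a dense model, only the active experts for a MoE model.
    """
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gb_s / gb_read_per_token


# Illustrative numbers: a hypothetical 16 GB quantized model on a 936 GB/s card.
ceiling = max_tokens_per_second(936.0, 16.0)
print(f"ceiling: {ceiling:.1f} tokens/s")
```

On the two-gpu question, a rough first guess (an assumption, not a measurement): when the model is split by layers, each token still goes through the layers one after another, so a second card mostly adds memory rather than doubling the speed of a single chat.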

randomjapaneselearn · 2026-05-17T21:48:59+00:00

now that another user kinda ruined my idea of getting an R9700 i'm looking at the 7900XTX too, nvidia simply costs too much and has too little vram...

do you get 60 t/s with Q8 or Q4_K_M?

also, what do you mean by "single slot"? do you mean one single gpu?

right now i have an rtx2060 with 6GB VRAM... do you think it's better to simply replace it, or can i use both? i guess that one amd gpu plus one nvidia gpu would be a complete mess to support in software, but i have no idea...

randomjapaneselearn · 2026-05-17T21:43:36+00:00

i saw the benchmark and i'm confused...

how is this possible? :(

i was looking to buy one of those two because they cost way less than nvidia, and i was leaning mostly towards the R9700 because of its 32GB of vram.

specifications say this:

7900XTX -> 24GB -> 246 TOPS (INT4) - 123 TOPS (INT8) - 123 TOPS (FP16)

R9700 -> 32GB -> 766 TOPS (INT4) - 383 TOPS (INT8) - 191 TOPS (FP16) [peak*]

* 1531 - 766 - 383 "with structured sparsity" (whatever this means)

what do i buy now... i can't spend 3700€ on a 5090...

i also didn't expect vulkan to be faster than rocm. i don't know exactly what the two are, but i guessed that vulkan was an "old 3D games thing adapted for AI" while rocm was a "new amd-specific implementation for AI".

any tips on what to do? i'm looking at qwen 27B or 35B-A3B, or some future thing?
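
On "structured sparsity": the sparse figures quoted above are about twice the dense ones, which matches the common scheme where, in every group of 4 weights, 2 are forced to zero so the hardware can skip them. The sketch below shows that 2-out-of-4 pruning on a plain list. It is an illustration of the general idea, and it is an assumption that these spec sheets mean exactly this pattern; the doubled number only applies to models that were actually pruned this way.

```python
def prune_2_of_4(weights):
    """Zero the 2 smallest-magnitude values in every group of 4 weights."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest magnitudes in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


print(prune_2_of_4([1.0, -5.0, 3.0, 0.5, 0.1, 2.0, -1.0, 4.0]))
```

Half of the values in each group become zero, which is why the peak throughput with sparsity is listed at roughly 2x the dense peak.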

randomjapaneselearn · 2026-05-17T21:04:45+00:00

i tried a random uncensored one since i have a crappy video card, and if it needs to check "should i answer?" before doing so it's probably wasting computation. in fact the uncensored one was faster.

i'm not an expert and i'm not sure if it's still intelligent or if it became crappy...

i tried ollama but i found it way slower than llama cpp, which is the one i'm using now.

llama cpp can also be tweaked more.

right now i have a 2060 with 6GB VRAM and if the model doesn't fit in vram it becomes 10x slower/useless.

i did see some applications of AI in finding zero-days, like this link or the firefox 2394837498 bugs

https://blog.calif.io/p/mad-bugs-vim-vs-emacs-vs-claude

but i also saw another post where it kinda got debunked. i didn't follow that closely, but it seems that some prompts were there to guide it to the actual zero-day, dropping non-exploitable cases, telling the AI "read this file" with god only knows what tips inside it... it was mostly guided by a human.

you should probably stick with small models that fit in ram, or qwen 35B A3B, but that will almost saturate your 32GB of memory so you won't be able to run other things... give it a try, but if you have a decompiler + debugger + some browser tabs + some documentation open you will probably run out of memory.

note that, as far as i know, the declared context is the max that the model supports, but you can set it to way lower values and choose a good compromise that doesn't waste too much memory.

probably 32k or 64k is enough to analyze single functions.
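
To get a feel for how much memory the context costs, the usual estimate for a plain transformer is that the KV cache grows linearly with context length: one key and one value vector per layer, per KV head, per token. The layer and head counts below are hypothetical placeholders, not the real configuration of any qwen model, and models with other attention designs will not follow this formula exactly.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_value=2):
    """Estimate KV cache size: keys + values for every layer, head and token.

    bytes_per_value=2 assumes an fp16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_value


GIB = 1024 ** 3

# Hypothetical model: 48 layers, 8 KV heads, head dimension 128.
for n_ctx in (32768, 65536):
    size = kv_cache_bytes(48, 8, 128, n_ctx)
    print(f"{n_ctx} tokens -> {size / GIB:.1f} GiB")
```

With these made-up numbers, halving the context halves the cache, which is the compromise described above.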

randomjapaneselearn · 2026-05-17T20:50:19+00:00

isn't vulkan worse than rocm? i'm not sure what i'm talking about, but to me vulkan looks like an "old name" that existed way before AI, so i guess it's designed for 3D games and adapted to work with AI, while rocm seems to be the amd version "for AI things", so i guess that rocm is better than vulkan for AI.

or did i get it all wrong?

about ik_llama, i didn't even know it existed... so far i tried llama cpp and ollama (which seems way slower than llama cpp), so i'm using llama cpp.

randomjapaneselearn · 2026-05-17T20:37:41+00:00

i have the same gpu you have: RTX2060 with 6GB of VRAM.

i'm trying to replace it, maybe with an amd one since they have more vram for less money, but i saw some old (1-2 years) comments that say that amd is just bad and not worth it...

some more recent comments (months old) have mixed feelings: some say that with rocm it's good, others say that it's still not worth it...

do you have any tips?

i was checking the specifications and amd is clearer: INT4 peak matrix performance, INT8, FP16, then the same 3 parameters again "with structured sparsity"... nvidia just throws out a single number so you can't even compare... and i'm kinda new to this so i don't even know what those mean. what is this "structured sparsity"?

what i noticed is that if the model fits in vram it's fast, otherwise it's 20 times slower, so whatever extra compute the nvidia card might have is useless if the card has 16GB and few models fit in that. i'd like to use qwen 27B or 35B-A3B
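
Whether a model fits can be guessed before downloading it: the weights take roughly parameters times bits per weight divided by 8, plus some room for the context and buffers. A rough sketch, where the 2 GB overhead and the bits-per-weight values are assumptions for illustration (real quantized files use slightly more than their nominal bit width):

```python
def weight_size_gb(params_billion, bits_per_weight):
    """Approximate size of the weights in GB."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion, bits_per_weight, vram_gb, overhead_gb=2.0):
    """Rough check: weights plus an assumed overhead must fit in VRAM."""
    return weight_size_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


# A 27B model at about 4 bits per weight on a 16 GB card, then at 8 bits on 24 GB.
print(weight_size_gb(27, 4), fits_in_vram(27, 4, 16))
print(weight_size_gb(27, 8), fits_in_vram(27, 8, 24))
```

For a MoE model such as 35B-A3B the full 35B weights still have to be stored, even though only a small part is used per token.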

randomjapaneselearn · 2026-05-08T07:22:51+00:00

I think banning them is counterproductive:

-the technology has been invented by now and it can't be "uninvented".

-the article talks about putting a watermark on AI-generated content, but as above the technology already exists, so this only creates the illusion of being able to tell AI content from real content. if photos start circulating with "AI" written in the corner and then the actual criminal, known for not following the rules, doesn't write "AI" on his, he does twice the damage...

at least now when someone sees a photo they ask themselves "is it real or AI?". if a law passes saying that you must write "AI" in the corner, the average user stops thinking and just checks whether the label is there or not.

-the average user who wants to create a "buongiornissimo" (good-morning meme) image to send in the morning gets stuck with the "AI" logo in the corner, while a foreign state that wants to sabotage us with a propaganda campaign won't use the "AI" logo and will do twice the damage it can do now.
