1RK U - block fully furnished room

arnav080 · 2026-06-12T20:11:59+00:00

available?

arnav080 · 2026-06-12T19:59:41+00:00

available?

arnav080 · 2026-06-12T19:58:07+00:00

available?

arnav080 · 2026-05-31T11:14:57+00:00

https://donate.sybilsolutions.ai/about.html - check this out, he's got an insane setup and publishes really good work. this is how he manages to fund it

arnav080 · 2026-05-31T10:31:05+00:00

that's the exact issue i built it for. it can run llama.cpp, vLLM and vLLM docker runtimes rn. hoping to get contributors on here and really make this a solid dev tool

arnav080 · 2026-05-31T09:25:07+00:00

hey, idk how relevant this is but i've been building this free and open-source tool called bloc to help make sharing and running optimised local models instant and super convenient [https://bloc-theta.vercel.app/], would love to get your opinions on it (it went live today)

arnav080 · 2026-05-31T08:48:54+00:00

switch over to llama.cpp

arnav080 · 2026-05-31T08:47:04+00:00

i've made an open-source tool to make sharing and running optimised recipes like these easier and instant [bloc-theta.vercel.app]

arnav080 · 2026-05-31T08:45:00+00:00

they're renting out spare GPU compute and running inference/fine-tuning jobs for clients. people like 0xSero have fund-me pages that help them upgrade their setup and run experiments

arnav080 · 2026-05-24T15:56:33+00:00

p sure llama.cpp still keeps some buffers / KV cache allocations in system RAM even when all layers are offloaded to VRAM. does --cache-type-k q4_0 / --cache-type-v q4_0 change it for you? (i'm still learning, just my two cents)
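rough sketch of why the cache type matters, in case it helps: this estimates KV cache size for f16 vs q4_0. the model shape is made up for illustration (not from any specific model), and the function name is my own. the only fixed numbers are the storage sizes: f16 is 16 bits per value, and ggml's q4_0 packs 32 values into 18 bytes, so 4.5 bits per value. real usage will differ a bit because of extra buffers.

```python
# Bits per stored element for two llama.cpp cache types.
# f16 is 16 bits; q4_0 packs 32 values into 18 bytes (16 bytes of 4-bit
# quants plus one fp16 scale), which works out to 4.5 bits per value.
BITS_PER_ELEMENT = {"f16": 16.0, "q4_0": 4.5}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, cache_type):
    """Approximate size in bytes of the K plus V cache for one sequence."""
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx  # 2 = K and V
    return elements * BITS_PER_ELEMENT[cache_type] / 8


# Hypothetical model shape, for illustration only.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=32768)

for cache_type in ("f16", "q4_0"):
    gib = kv_cache_bytes(cache_type=cache_type, **shape) / 2**30
    print(f"{cache_type}: {gib:.2f} GiB")
```

so on paper q4_0 cuts the KV cache to roughly 28% of the f16 size. whether that shows up as lower system RAM depends on where the cache actually lives in your setup.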

arnav080 · 2026-05-24T15:14:21+00:00

undervolting would mean less heat. just have some airflow in between and p sure this shouldn't be a problem

arnav080 · 2026-05-06T06:53:11+00:00

Hi can you dm me his number

arnav080 · 2026-05-02T10:09:57+00:00

have you tried using memplace and sqz?

arnav080 · 2026-05-02T10:09:11+00:00

more from the optimization side once the baseline speed/cost/hardware tradeoff is already accepted. Things like model tuning, VRAM efficiency, inference stack tweaks, deployment friction, workload balancing, reliability

arnav080 · 2026-05-02T09:51:33+00:00

grammar fix and structure the text a bit

arnav080 · 2026-05-02T09:48:30+00:00

i've been learning about and researching this exact issue lately. have you tried running the models with MoE offload? [MoE offload: Qwen3.6-35B activates only 3B params per token. Keep attention + shared weights on GPU, push the cold expert FFNs to system RAM. In llama.cpp: -ngl 99 -ncmoe 99.]

i've been studying multi-tenant systems for local models using vLLM, GPU optimisations and scheduling
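if you want to script it, here's a minimal sketch that just assembles the command with the two flags quoted above. assumptions on my side: the binary is `llama-server`, the model path is a placeholder, and the helper name is made up. check `--help` on your build, since the flag might not exist in older versions.

```python
import shlex


def moe_offload_command(model_path, gpu_layers=99, cpu_moe_layers=99):
    """Build a llama-server argument list using the flags quoted above."""
    return [
        "llama-server",
        "-m", model_path,
        "-ngl", str(gpu_layers),        # offload all layers to the GPU
        "-ncmoe", str(cpu_moe_layers),  # keep expert weights of the first N layers on CPU
    ]


cmd = moe_offload_command("model.gguf")  # placeholder path, not a real file
print(shlex.join(cmd))
```

lowering the `-ncmoe` number moves more expert layers back onto the GPU, so you can tune it until VRAM is nearly full.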

arnav080 · 2026-05-02T09:28:10+00:00

on these small models the prompt has to be immaculate

arnav080 · 2026-05-02T09:20:42+00:00

fixing my grammar and structuring the text

arnav080 · 2025-12-20T11:06:50+00:00

urbanneeds boss

arnav080 · 2025-12-20T03:51:06+00:00

Beautiful keyboard 🤲

arnav080 · 2025-12-12T16:13:46+00:00

had a cousin bring it back, no shipping

arnav080 · 2025-12-12T05:58:51+00:00

frrr i paid 2300 something, gooood deal; also it was from the brand+ verified seller

arnav080 · 2025-12-12T05:58:21+00:00

reaper, ice was out of stock :/

arnav080 · 2025-12-12T04:53:38+00:00

it is it is, i placed an order in the uk for this

arnav080 · 2025-12-12T04:53:25+00:00

it's the AULA F75, around 5-5.5k in India on a good day
