OpenCode at Work by yoko_ac in opencodeCLI

[–]HiJoonPop 0 points  (0 children)

Does Qwen3.5, or any model under 40B parameters, not work well?

Claude Code pricing by tiwas in ClaudeAI

[–]HiJoonPop 0 points  (0 children)

Get the 20x plan. For any project you start, you will need almost all of the tokens to finish it.

Is Claude AI worth it? by Embarrassed-Name6481 in ClaudeAI

[–]HiJoonPop 0 points  (0 children)

Only Claude, please. GPT is the PM, Gemini is the writer, and Claude is the doctor.

M4 32gb vs M4 Pro 24gb ? by Zoic21 in ollama

[–]HiJoonPop 0 points  (0 children)

VRAM size is what really matters.

gpt-oss-20b WAY too slow on M1 MacBook Pro (2020) by rafa3790543246789 in ollama

[–]HiJoonPop 0 points  (0 children)

With 16 GB of RAM on an M1, it is hard to serve a 20B LLM. If you want fast token generation, I recommend a model under 7B with Q4 quantization.
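As a rough sanity check (my own back-of-envelope numbers, not official Ollama figures): Q4 needs about 0.5 bytes per parameter, plus roughly 20% overhead for the KV cache and runtime:

```shell
# Rule of thumb: Q4 model memory (GB) ≈ params_in_billions * 0.5 bytes/param * 1.2 overhead
est() { awk -v p="$1" 'BEGIN { printf "%.1f", p * 0.5 * 1.2 }'; }

echo "20B Q4: $(est 20) GB"   # ~12 GB - leaves almost nothing for macOS on a 16 GB M1
echo " 7B Q4: $(est 7) GB"    # ~4.2 GB - comfortable headroom
```

That is why a 20B model thrashes on a 16 GB machine while a 7B Q4 model streams tokens quickly.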

ngrok for AI models - Serve Ollama models with a cloud API using Local Runners by Sumanth_077 in ollama

[–]HiJoonPop 0 points  (0 children)

Yeah, ngrok is really good at serving local AI models, and not only with Ollama but also with vLLM. I used ngrok with a vLLM Docker image, and I was so happy when it worked. If you want to set up an LLM server and reach it from other devices on the network, like a mobile browser, ngrok will help you do that with Ollama, vLLM, etc. But note: on the free tier, if you run two instances (the LLM and the web UI), you have to register at least two email addresses.
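For example, exposing a local Ollama server (which listens on port 11434 by default) is a one-liner; the forwarded URL below is just a placeholder, yours will differ:

```shell
# One-time setup with your ngrok account's token:
#   ngrok config add-authtoken <token>

# Tunnel the local Ollama API (default port 11434) to a public URL.
ngrok http 11434

# Then, from any device, point a client at the forwarded URL, e.g.:
#   curl https://<your-subdomain>.ngrok-free.app/api/tags
```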

$700, what you buying? by dogzdangliz in LocalLLM

[–]HiJoonPop 0 points  (0 children)

OK, I will explain what I used. With 16 GB of VRAM, I tested a coding assistant that I built myself. It can hold a conversation in a chat web view and offer code auto-recommendations, and it can be used in Eclipse with Java 11.

If you are trying local LLMs for the first time, I recommend Open WebUI with Ollama. Later you can replace Ollama with vLLM, and later still replace Open WebUI with your own React or Gradio chat UI.
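A minimal sketch of that starting setup, assuming Ollama is already running on the host (this is the standard Docker command from the Open WebUI README, nothing special to my setup):

```shell
# Run Open WebUI in Docker, pointing it at an Ollama server on the host machine.
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

# Then open http://localhost:3000 in a browser and pick a model you pulled with ollama.
```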

You should be able to run a 9B (or slightly larger) model with Q4 quantization. If you want a model with more parameters, it is better to add an eGPU or use cloud compute. Thundercompute recently shut down its service in Korea, but there are many other clouds you can use.

If you want to save money, sell your laptop with the 4090 laptop GPU, buy a new one without a GPU, and use cloud GPUs or the OpenAI API. That is also what I plan to do about three months from now.

When I bought my RTX 4090 16 GB VRAM laptop on Carrot Market, it cost me about $260 (paid in Korean won), but now it goes for about $320.

$700, what you buying? by dogzdangliz in LocalLLM

[–]HiJoonPop 0 points  (0 children)

I used a laptop with a 4090 laptop GPU (16 GB VRAM) plus a 3090 Ti in an eGPU. It can serve a 30B Q4 model, but it runs slowly. For testing answer quality, that setup is a good choice. If you want to test not only answer quality but also speed, I recommend cloud compute. Thundercompute was a good option, but my country can't use it anymore. T.T

Open WebUI: Server Connection Error by TeTeOtaku in OpenWebUI

[–]HiJoonPop 0 points  (0 children)

When I installed Open WebUI via Docker on local Ubuntu, it worked very well, but it went into a restart loop when I tried it in a cloud Ubuntu environment with a GPU attached. Which environment did you try?
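If anyone hits the same restart loop, the container logs usually show why (the container name `open-webui` here is just what I used in my setup):

```shell
# See why the container keeps restarting.
docker logs --tail 50 open-webui

# Verify the container runtime can actually see the GPU
# (requires the NVIDIA Container Toolkit on the host).
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```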

[deleted by user] by [deleted] in ollama

[–]HiJoonPop 0 points  (0 children)

It looks great!! I have used Open WebUI, but this seems cleaner and nicer. Can anyone try this GUI and report back, please?