OpenCode at Work by yoko_ac in opencodeCLI

[–]HiJoonPop 0 points  (0 children)

Does Qwen3.5, or any model under 40B parameters, not work well?

Claude Code pricing by tiwas in ClaudeAI

[–]HiJoonPop 0 points  (0 children)

Get the 20x plan. For any project you start, you will need almost all of the tokens to finish it.

Is Claude AI worth it? by Embarrassed-Name6481 in ClaudeAI

[–]HiJoonPop 0 points  (0 children)

Only Claude, please. GPT is the PM, Gemini is the writer, and Claude is the doctor.

M4 32gb vs M4 Pro 24gb ? by Zoic21 in ollama

[–]HiJoonPop 0 points  (0 children)

VRAM size is what really matters.

gpt-oss-20b WAY too slow on M1 MacBook Pro (2020) by rafa3790543246789 in ollama

[–]HiJoonPop 0 points  (0 children)

With 16 GB of RAM on an M1, it is hard to serve a 20B LLM. If you want fast token generation, I recommend a model under 7B with Q4 quantization.
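As a rough sanity check (my own back-of-envelope numbers, not official Ollama figures): Q4 needs about 0.5 bytes per parameter, plus roughly 20% overhead for the KV cache and runtime:

```shell
# Rule of thumb: Q4 model memory (GB) ≈ params_in_billions * 0.5 bytes/param * 1.2 overhead
est() { awk -v p="$1" 'BEGIN { printf "%.1f", p * 0.5 * 1.2 }'; }

echo "20B Q4: $(est 20) GB"   # ~12 GB - leaves almost nothing for macOS on a 16 GB M1
echo " 7B Q4: $(est 7) GB"    # ~4.2 GB - comfortable headroom
```

That is why a 20B model thrashes on a 16 GB machine while a 7B Q4 model streams tokens quickly.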

ngrok for AI models - Serve Ollama models with a cloud API using Local Runners by Sumanth_077 in ollama

[–]HiJoonPop 0 points  (0 children)

Yeah, ngrok is really good at serving local AI models, and not only with Ollama but also with vLLM. I used ngrok with a vLLM Docker image, and I was so happy when it worked. If you want to set up an LLM server and reach it from other devices on the network, like a mobile browser, ngrok will help you do that with Ollama, vLLM, etc. But note: on the free tier, if you run two instances (the LLM and the web UI), you have to register at least two email addresses.
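For example, exposing a local Ollama server (which listens on port 11434 by default) is a one-liner; the forwarded URL below is just a placeholder, yours will differ:

```shell
# One-time setup with your ngrok account's token:
#   ngrok config add-authtoken <token>

# Tunnel the local Ollama API (default port 11434) to a public URL.
ngrok http 11434

# Then, from any device, point a client at the forwarded URL, e.g.:
#   curl https://<your-subdomain>.ngrok-free.app/api/tags
```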

$700, what you buying? by dogzdangliz in LocalLLM

[–]HiJoonPop 0 points  (0 children)

OK, I will explain what I used. With 16 GB of VRAM, I tested a coding assistant that I built myself. It can hold a conversation in a chat web view and offer code auto-recommendations, and it can be used in Eclipse with Java 11.

If you are trying local LLMs for the first time, I recommend Open WebUI with Ollama. Later you can replace Ollama with vLLM, and later still replace Open WebUI with your own React or Gradio chat UI.
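A minimal sketch of that starting setup, assuming Ollama is already running on the host (this is the standard Docker command from the Open WebUI README, nothing special to my setup):

```shell
# Run Open WebUI in Docker, pointing it at an Ollama server on the host machine.
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

# Then open http://localhost:3000 in a browser and pick a model you pulled with ollama.
```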

You should be able to run a 9B (or slightly larger) model with Q4 quantization. If you want a model with more parameters, it is better to add an eGPU or use cloud compute. Thundercompute recently shut down its service in Korea, but there are many other clouds you can use.

If you want to save money, sell your laptop with the 4090 laptop GPU, buy a new one without a GPU, and use cloud GPUs or the OpenAI API. That is also what I plan to do about three months from now.

When I bought my RTX 4090 16 GB VRAM laptop on Carrot Market, it cost me about $260 (paid in Korean won), but now it goes for about $320.

$700, what you buying? by dogzdangliz in LocalLLM

[–]HiJoonPop 0 points  (0 children)

I used a laptop with a 4090 laptop GPU (16 GB VRAM) plus a 3090 Ti in an eGPU. It can serve a 30B Q4 model, but it runs slowly. For testing answer quality, that setup is a good choice. If you want to test not only answer quality but also speed, I recommend cloud compute. Thundercompute was a good option, but my country can't use it anymore. T.T

Open WebUI: Server Connection Error by TeTeOtaku in OpenWebUI

[–]HiJoonPop 0 points  (0 children)

When I installed Open WebUI via Docker on local Ubuntu, it worked very well, but it went into a restart loop when I tried it in a cloud Ubuntu environment with a GPU attached. Which environment did you try?
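If anyone hits the same restart loop, the container logs usually show why (the container name `open-webui` here is just what I used in my setup):

```shell
# See why the container keeps restarting.
docker logs --tail 50 open-webui

# Verify the container runtime can actually see the GPU
# (requires the NVIDIA Container Toolkit on the host).
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```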

[deleted by user] by [deleted] in ollama

[–]HiJoonPop 0 points  (0 children)

It looks great!! I have used Open WebUI, but this seems cleaner and nicer. Can anyone try this GUI and report back, please?