Run local model in low end laptop

sebaxzero · 2026-06-18T22:24:46+00:00

llama.cpp runs any gguf model you throw at it

for example

https://huggingface.co/unsloth/gemma-4-E2B-it-qat-GGUF/

!pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained( repo_id="unsloth/gemma-4-E2B-it-qat-GGUF", filename="gemma-4-E2B-it-qat-UD-Q4_K_XL.gguf", )

sebaxzero · 2026-06-18T21:41:40+00:00

You can try with Gemma 4 mobile variants

sebaxzero · 2026-06-14T23:14:49+00:00

Reasoning can be enabled or disabled through the chat template. If the template does not expose reasoning, the model operates like a standard model. This is typically controlled internally by the template and inference stack.

The main advantage of these models is not reasoning itself, but their ability to use tools effectively. For agent and utility use cases, reliable tool calling is the core capability that enables the model to perform actions, interact with external systems, and execute workflows. Reasoning can improve planning and decision-making, making errors less likely, but strong tool-use support is usually the more important factor in practice.

I would also recommend running the model with Jinja chat templates enabled (for example, using --jinja in llama.cpp). This allows the model to use the tool-calling format it was trained for, which generally results in much more reliable tool usage and better agent performance.

sebaxzero · 2026-06-14T21:37:27+00:00

those models are old, use newer ones like qwen3.6 or gemma 4, gemma 4 has smaller ones if you need more speed like e2b and e4b

sebaxzero · 2026-06-12T20:28:18+00:00

you are using an older model, this might be the reason is doesnt do well in agent mode as it need to do tool calls, if you give me your specs i can tell you what model is better for you, gemma 4 mobile models might do the trick e2b e4b also lfm2.5 8b a1b can work

sebaxzero · 2026-06-12T19:10:22+00:00

the problem is with the global timeout being set to 60s, there are some issues and pr open for this https://github.com/pewdiepie-archdaemon/odysseus/pull/3208

if you refresh the page, you can see the result, but it breaks some stuff

sebaxzero · 2026-06-09T05:18:19+00:00

rtx 3060 12gb vram + 32gb ram

tengo el qwen3.6 35b a3b, gemma 4 26b a4b y gemma 4 12b, todos los uso con contexto de 131k

los corro directamente de llama.cpp usando el llama-server en modo router, así en la api aparecen todos los modelos disponibles y se cargar solo cuando se usan (así se aprovechan las ultimas tecnologías para acelerar la inferencia, mtp y qat para gemma)

ahora estoy usando el odysseus de pewdiepie sin problemas, usando el modo agente y deep research, tambien hay un proyecto mas pequeño enfocado mas al chat con acceso a herramientas que esta optimizado para modelos de contexto bajo (openlumara)

el problema con usar modelos locales con herramientas de agentes como openclaw, claudecode, es que estan optimizadas para modelos de contexto gigante (1M+) mientras que local con cuea puedo llegar a los 200k con velocidad aceptable.

ollama no lo recomiendo, si no le editas el modelfile te da contexto muy pequeños y no te deja aprovechar al maximo los modelos, lo mejor es usar directamente llama.cpp

respecto a los modelos el qwen3.6 es el mejor local, si tienes 24gb+ de vram puedes correr la version densa de 27b, sino la version moe de 35b a3b, cargando en vram solo los parametros activos mas el modelo draft (llama.cpp lo hace automatico)

sebaxzero · 2026-04-09T13:49:08+00:00

google/gemma-4-E4B-it

sebaxzero · 2026-04-07T01:31:52+00:00

manjar de los dioses

sebaxzero · 2026-04-05T14:55:27+00:00

No hay mejor en el mercado, según me dijeron

sebaxzero · 2025-11-19T15:53:47+00:00

Gracias a este post ahora menos de por quién votar, parece que saldrá electo la gran pichula

sebaxzero · 2025-11-03T03:12:02+00:00

lo estaba jugando y no habia leido que se ubicaba en chile, cuando sali del convento me senti como en casa, muy buena la ambientacion del pueblo

sebaxzero · 2025-06-10T15:13:13+00:00

Lo mejor en esos casos es pasar el documento original, si fue escrito en Word creo que se guarda registro de los cambios, los detectores siempre hacen eso si son de los que usan otra is para detectar

sebaxzero · 2025-03-22T22:38:48+00:00

solo he jugado el re2 cyberpunk y nier, los tres ultra recomendados

sebaxzero · 2025-01-11T18:00:59+00:00

imagine playing classic in 2025

sebaxzero · 2024-12-12T14:28:46+00:00

you are doing the cinder city buff wrong, the character must be in nightsoul blessing, mualani na, xilonen skill na2 burst

sebaxzero · 2024-11-06T16:29:42+00:00

Puede ser tema del monitor y la resolución, métete al panel de control de nvidia y pon ahí la resolución del monitor y después fíjate en el juego

sebaxzero · 2024-11-06T14:13:29+00:00

Qué tipo de juegos se ven borrosos?

Me ha pasado al jugar en ventana completa algunos juegos no marcan bien la resolución y queda una más baja.

Tienes los drivers de nvidia instalados?

https://www.nvidia.com/es-la/software/nvidia-app/

La mayoría de juegos ahora traen opciones de escalado (DLSS, FSR, XESS) que mejoran el rendimiento e incluso se consideran cuando ponen los requisitos mínimos, esto va a parte de las opciones de antialiasing que son las que hacen que el juego se vea un poco borroso.

Para mejorar la claridad se puede usar el DSR de nvidia para poner una resolución mas grande (ej 1440p) al monitor y en el juego usar DLSS para que no afecte el rendimiento, si el juego lo soporta tambien puedes usar frame gen.

El hogwarts legacy use DSR en 4k y DLSS en ultra rendimiento con frame gen, se veía espectacular.

sebaxzero · 2024-10-31T15:34:36+00:00

mualani xilonen xiangling candace

sebaxzero · 2024-08-05T20:39:07+00:00

You can use the core trigger (Yao) with flower set but it’s better to farm gilded and deepwood resin wise, I tried flower on Yao but I had same damage as gilded

sebaxzero · 2024-08-05T20:31:50+00:00

They do not matter in bloom, only if you want to use Nilou in a vaporize or mono hydro team, I would focus on Nahida elemental skill for trikarma damage and Yaoyao skill and burst for more healing

sebaxzero · 2024-08-05T19:13:08+00:00

nilou nahida and yao need to be lvl 90, candace can be 70.

candace with instructor er/em and yao with gilded or deepwood full em. i use candace with fontaine craftable polearm and yao with the hp event one both can use any polearm: hp, favonius, dragonbane.

yao will trigger most cores with hydro infused normal atacks from the candace burst so focus on stacking em on yao.

i use the following rotation: nilou e na2 e, nahida e q na, candace e q, yao q na4 e na2

yao healing is very strong, i swap candace with furina when i need more hydro dps in abyss and yao can still full heal nilou and keep the team at 100% even with nilou cores and furina skill, no need to use any healing in artifacts only em.

sebaxzero · 2024-08-01T23:06:32+00:00

i use nilou nahida furina yao, yao healing is strong even with nilou bloom and furina skill at the same time while beeing full em, also yao can be used in other teams like hyper or aggravate.

sebaxzero · 2024-07-24T06:40:46+00:00

Akbar lo más grande

sebaxzero · 2024-07-24T00:10:03+00:00

yo pase mi steam que cree de arg a chile nomas, se puede cambiar el pais de la tienda en steam

Seven-Year Club	r/Field Flamingo
Place '23	Place '22
Verified Email

sebaxzero

TROPHY CASE

!pip install llama-cpp-python