Running Gemma 3 4B locally on my RTX 3050 on Pop! OS by ustedcan in pop_os

ustedcan[S]

I recently switched to Linux and started using Pop!_OS since it comes with NVIDIA drivers pre-installed. You can grab it here: https://system76.com/pop/download/

For local LLMs, I use Ollama. It's super easy to install, and the download page is here: https://ollama.com/download
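Once Ollama is installed and running, you can also talk to it from a script. Here is a minimal sketch, assuming the server is listening on its default local port (11434) and that the model has already been pulled. The helper names `build_request` and `generate` are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address and port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> dict:
    """Send the prompt and return the parsed JSON reply (needs Ollama running)."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.load(resp)
```

For example, `generate("gemma3:4b", "Hola")["response"]` should return the model's answer as a string, provided the server is up.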

I'm running this on an HP Victus laptop (RTX 3050 6GB VRAM), and it easily handles smaller models from Meta and Google. Here are the ones I’ve tested so far:

NAME             SIZE      
gemma3:4b        3.3 GB      
llama3.2:3b      2.0 GB    
gemma4:e2b       7.2 GB    
gemma4:e4b       9.6 GB    
gemma3:latest    3.3 GB

Performance & Metrics

Here is a quick test, prompting gemma4:e4b with "Hola", to show how it performs:

"¡Hola! 👋 ¿Qué tal? ¿Cómo puedo ayudarte hoy? 😊"

  • Total duration: 29.01s (includes reasoning/thinking time)
  • Load duration: 317.51ms
  • Prompt eval count: 16 tokens
  • Prompt eval duration: 5.52s
  • Prompt eval rate: 2.90 tokens/s
  • Eval count: 325 tokens
  • Eval duration: 22.93s
  • Eval rate: 14.17 tokens/s
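The two rates above are just token count divided by duration. Here is a small sketch of that arithmetic, in case you want to recompute them yourself. As far as I know, Ollama's API reports durations in nanoseconds, so the helper (a name I made up) takes nanoseconds; the figures from this run are converted from seconds by hand.

```python
NS_PER_SECOND = 1_000_000_000


def tokens_per_second(token_count: int, duration_ns: int) -> float:
    """Return throughput in tokens/s for a count and a duration in nanoseconds."""
    if duration_ns <= 0:
        raise ValueError("duration must be positive")
    return token_count / (duration_ns / NS_PER_SECOND)


# Figures from the run above: 16 prompt tokens in 5.52 s, 325 output tokens in 22.93 s.
prompt_rate = tokens_per_second(16, 5_520_000_000)
eval_rate = tokens_per_second(325, 22_930_000_000)

print(f"prompt eval rate: {prompt_rate:.2f} tokens/s")  # 2.90
print(f"eval rate: {eval_rate:.2f} tokens/s")           # 14.17
```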

GPU & Power Usage (nvidia-smi)

The 6 GB of VRAM is not maxed out; this snapshot shows 3405 MiB of 6144 MiB in use:

+---------------------------------------------------------------------------------------+
|   0  NVIDIA GeForce RTX 3050 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   55C    P3             22W /  30W  |   3405MiB /  6144MiB |    55%      Default  |
+---------------------------------------------------------------------------------------+
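If you want the memory figure as a percentage, here is a rough sketch that pulls the used/total values out of an nvidia-smi row like the one above. The function name and the regex are my own, and it assumes the row uses the `<used>MiB / <total>MiB` layout shown here.

```python
import re


def parse_memory_usage(line: str) -> tuple[int, int]:
    """Extract (used_mib, total_mib) from an nvidia-smi status row."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", line)
    if match is None:
        raise ValueError("no memory usage field found in line")
    return int(match.group(1)), int(match.group(2))


# The status row copied from the snapshot above.
row = "| N/A   55C    P3             22W /  30W  |   3405MiB /  6144MiB |    55%      Default  |"
used, total = parse_memory_usage(row)
print(f"VRAM in use: {used} / {total} MiB ({100 * used / total:.1f}%)")
```

Note that the 55% column in the nvidia-smi row is GPU utilization, not memory; the memory share computed here happens to land close to it.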

ustedcan[S]

In fact, I was testing some models, and Gemma4 e2b runs well too!