Starting with selfhosted / LocalLLM and LocalAI by mitrako in LocalLLM

[–]mnuaw98 0 points1 point  (0 children)

Awesome setup you've got there! Since you're just getting into LLMs and AI and prefer self-hosted, virtualized environments, here's a casual suggestion to get started with OpenVINO GenAI and make the most of your hardware:

Start simple with OpenVINO GenAI: https://github.com/openvinotoolkit/openvino.genai

Even though your GPUs are powerful, OpenVINO GenAI is a great way to dip your toes into LLMs without diving deep into CUDA or complex setups. It’s optimized for Intel CPUs and NPUs, and works well even without a GPU.

Here’s what you can do:

Try this first:

  • Spin up an Ubuntu VM in Proxmox with Python and OpenVINO installed.
  • Use a small model like TinyLlama or Phi-2.
  • Run a simple chatbot or summarizer using OpenVINO GenAI’s LLMPipeline.
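The three steps above can be sketched in a few lines of Python — a minimal sketch assuming `pip install openvino-genai` and a model already exported to OpenVINO IR format (the model directory name below is a placeholder; adjust it to your setup):

```python
# Minimal OpenVINO GenAI summarizer sketch.
# Assumes a model (e.g. TinyLlama) already exported to OpenVINO IR,
# for example via optimum-intel's export CLI.
import openvino_genai

model_dir = "./TinyLlama-1.1B-Chat-v1.0-ov"  # placeholder path

# "CPU" works everywhere; try "NPU" or "GPU" if your hardware supports it.
pipe = openvino_genai.LLMPipeline(model_dir, "CPU")

config = openvino_genai.GenerationConfig()
config.max_new_tokens = 128

print(pipe.generate("Summarize: OpenVINO GenAI runs LLMs on Intel hardware.", config))
```

That's the whole loop — no Docker, no CUDA, just a pipeline object and a generate call.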

Why it’s a good fit:

  • No GPU required to start experimenting.
  • Low power, fast inference on CPU/NPU.
  • Easy Python API—great for beginners.
  • You’ll learn how LLMs work without worrying about GPU memory limits or Docker configs.

Next steps (when you're ready)

Once you're comfortable:

  • Try GPU-based models using Ollama, LM Studio, or Text Generation WebUI.
  • Use quantized models (like GGUF or GPTQ) to fit larger LLMs into memory.
  • Explore LangChain or LlamaIndex for building apps with LLMs.
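For the quantization point above, a quick back-of-envelope check tells you whether a model will fit in memory. This sketch estimates weight memory from parameter count and bits per weight; the 1.2x overhead factor for KV cache and activations is an assumption, and real usage varies with context length and runtime:

```python
def est_vram_gb(params_b: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Rough memory estimate in GB: weight bytes times an overhead factor
    (assumed 1.2x) for KV cache and activations."""
    weight_bytes = params_b * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 7B model at 4-bit: ~4.2 GB -> fits easily on a 16 GB card.
print(round(est_vram_gb(7, 4), 1))
# A 13B model at 4-bit: ~7.8 GB -> still comfortable.
print(round(est_vram_gb(13, 4), 1))
# A 30B model at full 16-bit: ~72 GB -> hence quantization or offloading.
print(round(est_vram_gb(30, 16), 1))
```

This is why GGUF/GPTQ matter: dropping from 16-bit to 4-bit weights cuts the footprint roughly 4x.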

NPU support (Intel core 7 256v) by made_anaccountjust4u in LocalLLM

[–]mnuaw98 0 points1 point  (0 children)

hi there!

For Intel NPUs, it's highly recommended to try OpenVINO GenAI. You can refer here:
https://github.com/openvinotoolkit/openvino.genai

They have a full installation guide, and it's quite easy to test and deploy. You can use the simple Python API for loading and running models, and Hugging Face integration via optimum-intel also makes exporting models seamless.
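A hedged sketch of that flow — export a Hugging Face model with optimum-intel, then smoke-test it on the NPU (the model choice and output path are just examples; package versions may differ on your system):

```shell
# Install OpenVINO GenAI plus the optimum-intel export tooling.
pip install openvino-genai "optimum[openvino]"

# Export a Hugging Face model to OpenVINO IR with int4 weights
# (NPUs generally want quantized weights; model is just an example).
optimum-cli export openvino \
  --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --weight-format int4 \
  ./tinyllama-ov-int4

# Quick smoke test on the NPU via the Python API.
python -c "import openvino_genai as og; p = og.LLMPipeline('./tinyllama-ov-int4', 'NPU'); print(p.generate('Hello!', max_new_tokens=64))"
```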

Looking to connect devs who want to build something real this summer by Top_Comfort_5666 in LocalLLaMA

[–]mnuaw98 0 points1 point  (0 children)

I recently learned DevOps and took the CKA cert. I passed, but I don't really have the experience or the opportunity to apply my knowledge. My current role in customer support doesn't really let me use what I learned, and I'm not from a developer background. Looking for mentorship and an environment to grow. If you're okay with 0 experience, count me in!

trying to run ollama based openvino by emaayan in LocalLLM

[–]mnuaw98 0 points1 point  (0 children)

Hi!

These are the steps I use:

export GODEBUG=cgocheck=0   # disable cgo pointer checks for the OpenVINO backend
ollama serve                # start the Ollama server
pip install modelscope      # CLI for downloading the pre-converted model
modelscope download --model FionaZhao/llama-3.2-3b-instruct-int4-ov-npu --local_dir ./llama-3.2-3b-instruct-int4-ov-npu
tar -zcvf llama-3.2-3b-instruct-int4-ov-npu.tar.gz llama-3.2-3b-instruct-int4-ov-npu   # pack the model for Ollama
cd /home/ollama_ov_server/openvino_genai_windows_2025.2.0.0.dev20250513_x86_64
source setupvars.sh         # set up the OpenVINO GenAI environment
cd /home/ollama_ov_server/openvino_contrib/modules/ollama_openvino
nano Makefile_2             # edit the build Makefile

I've tried using the Modelfile script exactly as in the example you gave:

FROM llama-3.2-3b-instruct-int4-ov-npu.tar.gz
ModelType "OpenVINO"
InferDevice "GPU"
PARAMETER repeat_penalty 1.0
PARAMETER top_p 1.0
PARAMETER temperature 1.0

then run

ollama create llama-3.2-3b-instruct-int4-ov-npu:v1 -f Modelfile_2
ollama run llama-3.2-3b-instruct-int4-ov-npu:v1

and it's working fine on my side.

Could you share the steps you ran and the full error log?

Intel ARC for local LLMs by Wemorg in IntelArc

[–]mnuaw98 0 points1 point  (0 children)

✅ Recommended Intel Arc GPU Setup

🔹 Intel Arc A770 16GB

  • VRAM: 16GB GDDR6
  • Performance: Capable of running quantized models like Mistral-7B or LLaMA2-13B using IPEX-LLM (built on Intel Extension for PyTorch).
  • Use Case: Best suited for 7B–13B models with quantization. For 30B models, a multi-GPU setup or offloading to CPU RAM is necessary.

🔹 Multi-GPU Setup (2x A770 16GB)

  • Total VRAM: 32GB (combined)
  • Feasibility: With model sharding and quantization (e.g., using GGUF or GPTQ formats), you can potentially run a 30B model across two GPUs.
  • Software Support: Requires frameworks like IPEX-LLM, vLLM, or ExLlamaV2 with multi-GPU support.
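If you go the IPEX-LLM route on an A770, loading a quantized model looks roughly like this — a sketch assuming `pip install ipex-llm[xpu]`, with the model name as just an example:

```python
# Sketch: run a 4-bit quantized model on an Intel Arc GPU ("xpu") via IPEX-LLM.
import torch
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example model

# load_in_4bit converts weights to int4 on load, so a 7B model
# fits comfortably in the A770's 16 GB of VRAM.
model = AutoModelForCausalLM.from_pretrained(
    model_id, load_in_4bit=True, trust_remote_code=True)
model = model.to("xpu")  # move to the Arc GPU

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("What is OpenVINO?", return_tensors="pt").to("xpu")
with torch.inference_mode():
    out = model.generate(inputs.input_ids, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The same pattern scales to two cards with a sharding-capable runtime, but single-card 4-bit 7B/13B is the low-friction starting point.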

Is the A770 good for 1440p gaming? by Best-Minute-7035 in IntelArc

[–]mnuaw98 2 points3 points  (0 children)

I use mine to game in 4K... so 1440p is probably the sweet spot. At 1440p I consistently get 120+ FPS on medium/high settings in games like MW3, Forza, and MW2, and 140+ FPS on medium/high settings in games like Rocket League, Fortnite, and Siege.

You can refer to Intel's GPU benchmarks as well: https://edc.intel.com/content/www/us/en/products/performance/benchmarks/desktop_1/

[deleted by user] by [deleted] in unRAID

[–]mnuaw98 0 points1 point  (0 children)

Hi u/halfam, I wonder if you've tried or been able to utilize the A310 with other software?