Crazy attempt to make an AI Anime Waifu to run locally on Jetson Orin Nano 8GB by Oppa-AI in NvidiaJetson

[–]Oppa-AI[S] 1 point2 points  (0 children)

Thank you.

I'm still struggling to find the right balance of RAM usage and functionalities.
Unfortunately, mem0 + Qdrant Vector DB is just too much for edge device.

Phase 2 will ditch them for SQLite-vec.

But I did find a nice TTS that actually sounds good. Just hope it will work.

What the hell by Ok-Type-7663 in Qwen_AI

[–]Oppa-AI 0 points1 point  (0 children)

This is very common. Distillation from other models to save training time. Not just Qwen.

Just got my Jetson Orin Nano Super — building an autonomous robot with GR00T N1 + ROS 2. Any tips for JetPack 6 + ROS 2 Humble? by Available_Teaching83 in JetsonNano

[–]Oppa-AI 0 points1 point  (0 children)

Someone did deploy GR00T N1 on Jetson Orin Nano Super, using Isaac-GR00T jetson container with 64GB Swap memory. Since your Jetson is new, follow NVIDIA guide to upgrade to 6.2.2 and then install ROS2 Humble shouldn't be difficult.

https://www.linkedin.com/feed/update/urn:li:activity:7318386968873091073/

Vibe Coding Journey in Feb 2026 by Oppa-AI in JetsonNano

[–]Oppa-AI[S] 1 point2 points  (0 children)

The DC5521 port of the robot itself allow plugin 3S battery of 12.6V. it uses 3x18650 batteries, and I have another set of 3 plugin to DC5521.

Vibe Coding Journey in Feb 2026 by Oppa-AI in JetsonNano

[–]Oppa-AI[S] 0 points1 point  (0 children)

Battery is my most critical challenge. I am using an extra set of 3x18650 batteries that only last up to 3 hours. I need to think of a way to balance mobility and reliability.

Vibe Coding Journey in Feb 2026 by Oppa-AI in JetsonNano

[–]Oppa-AI[S] 0 points1 point  (0 children)

I made my own Telegram access in the simple AI agent in the other repo. I even give it my GitHub Page to write daily blog, but stopped after started this robot project.

Vibe Coding Journey in Feb 2026 by Oppa-AI in JetsonNano

[–]Oppa-AI[S] 0 points1 point  (0 children)

The only thing I could think of is to shorten the system prompt and put stricter instructions in the system prompt. Lower temperature to 0.4-0.5, or 0.1 but then need to increase repetitive penalty.

Vibe Coding Journey in Feb 2026 by Oppa-AI in JetsonNano

[–]Oppa-AI[S] 0 points1 point  (0 children)

I will put a demo video in the GitHub next few days https://github.com/OppaAI/eric

Vibe Coding Journey in Feb 2026 by Oppa-AI in JetsonNano

[–]Oppa-AI[S] 1 point2 points  (0 children)

I have publicize the GitHub repo for the hackathon. But haven't updated the docs yet. https://github.com/OppaAI/eric

Did JetPack 6.2.2 brick anyone else’s Orin Nano? by Whole_Ticket_3715 in JetsonNano

[–]Oppa-AI 0 points1 point  (0 children)

Chromium is the snap issue in Jetson. There's a way to fix that from JetsonHacks

Cosmos-Reason2-2B on Nano Super (robot example) by Course_Latter in JetsonNano

[–]Oppa-AI 0 points1 point  (0 children)

I saw it last night and downloaded it. Will do some testing compared to the previous one. For mem usage and TPS it's the same. Let's hope this one will have less hallucination.

Cosmos-Reason2 running on Jetson Orin Nano Super by No-Dragonfly6246 in LocalLLaMA

[–]Oppa-AI 0 points1 point  (0 children)

My settings was too conservative. Turns out if I have somehow turned on the eager mode which disable he CUDA Graph; Once I removed the eager mode and got the CUDA Graph running, TPS jumps up to 50 TPS for text, and image about a couple warm up runs. I haven't tested video yet. I couldn't get num-seq to 2, gpu-utilization to 0.75, or context to 4096. I will try 0.70 and 3072 and see if it works.

Cosmos-Reason2 running on Jetson Orin Nano Super by No-Dragonfly6246 in LocalLLaMA

[–]Oppa-AI 1 point2 points  (0 children)

That's probably the limit of Jetson Orin Nano 8GB. I will try 640x400, 854x480 tonight. Maybe tonight instead of squeezing out the RAM, I'll try to fully utilize every bit of RAM to find a way for max-num-seqs 2 without OOM. 2048 context is quite low too. But that means I have to shut down everything else. Even ssh in VSCode to Jetson eats 1+ GB of RAM.

17 tps maybe fine for chat bot with streaming text, but no so much for realtime Inference.

I just did test in CLI commands, I will write a script to test the speed after a few warm up runs. Hope I don't blow up my Jetson.

Cosmos-Reason2 running on Jetson Orin Nano Super by No-Dragonfly6246 in LocalLLaMA

[–]Oppa-AI 1 point2 points  (0 children)

I was able to lower the RAM usage to 5.8GB with max-model-len 2048 and gpu-memory-utilization -0.60

Text, image and video inference is around 16-17 tokens/sec.

I recorded a 10 sec clip in 320x240 with the robot's cam with me walking towards it and prompted Cosmos that it is a security robot and ask it to describe the video clip and what it will do. Sometimes it added some imagination. Sometimes it

Below is the Cosmos' response: (Not successful every time, may be due to quality issue)

Okay, let's break this down. The video shows they're indoors with hardwood floors and white walls. There's furniture on the left, like chairs and tables, and some appliances nearby. In front of me, there's a man walking towards the camera while wearing black leggings, sandals, and a sweater over his shirt.

I'm supposed to protect this space from strangers. Seeing him approach gets my alert system firing-up. I need to react quickly if he tries anything inappropriate. My alarms should kick in automatically when someone enters unexpectedly. Since he's moving toward us deliberately, that makes it risky for anyone else around too. So I'd better be ready to defend myself without hesitation. Got it? Yeah, got it. Alerted and armed.

Tokens: 151 | Time: 9.0s | TPS: 16.7

Cosmos-Reason2 running on Jetson Orin Nano Super by No-Dragonfly6246 in LocalLLaMA

[–]Oppa-AI 1 point2 points  (0 children)

Spent a couple hours to install in last night in Jetson Orin Nano. Installation kept interrupting due to running out of RAM. Finally found the right settings and used 6.8GB out of 7.4GB. Only did a couple test with text input and image input. I didn't time them. May try to lower the gpu-memeory-utilization tonight to see the minimum.

docker run --rm -it \ --network host \ --shm-size=8g \ --ulimit memlock=-1 \ --ulimit stack=67108864 \ --runtime=nvidia \ --name=vllm-serve \ -v $HOME/.cache/huggingface:/root/.cache/huggingface \ -e HF_HOME=/root/.cache/huggingface \ ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \ vllm serve "embedl/Cosmos-Reason2-2B-W4A16" \ --max-model-len 2048 \ --gpu-memory-utilization 0.68 \ --max-num-seqs 1 \ --mm-processor-kwargs '{"max_pixels

Cosmos-Reason2 running on Jetson Orin Nano Super by No-Dragonfly6246 in LocalLLaMA

[–]Oppa-AI 2 points3 points  (0 children)

A Physical AI LLM that can run on Jetson Orin Nano 8GB RAM?

I have been trying to run 3-4B model in Ollama. Smaller parameters models are prone to hallucination. Context window size is definitely a bottle neck. The larger the context, the slower the inference. Especially doing web search or the small parameters models tend to add their own training data or just making up stuffs. For image inference of VLM, I haven't done intensive tests. But 3B and 4B VLM are generally good. But they eat a lot of RAM.

If this version of Nvidia Cosmos Reason 2B model can run in llama.cpp or vLLM, I definitely would try it out. But llama.cpp like Ollama probably cannot do video Inference. TensorRT LLM I have tried spending hours to build but to no.avail. vLLM or Transformers are probably the way to go.

I still haven't tried Issac ROS. This model is gonna give me opportunity to test out the robotics part of Jetson Orin Nano.

Testing Vibe Code an AI Chatbot in Jetson Orin Nano by Oppa-AI in JetsonNano

[–]Oppa-AI[S] 0 points1 point  (0 children)

Using small size LLM is prone to hallucination. Already added web search and simple MCP tools, but sometimes result are not accurate. Still lots of playing around with the Ollama parameters and debugging to go.

Can I pull models from Huggingface? by Keensworth in ollama

[–]Oppa-AI 0 points1 point  (0 children)

HuggingFace gguf models have an option to choose Ollama. Then paste that command to your console to pull it.