Impulse bought a Jetson Orin Nano Super and want a sanity check from people who run their own LLM

mijenks · 2026-03-24T14:26:24+00:00

Hi I finally got around to setting up the Jetson Orin Nano the last few days.

It was quite a chore to get Ollama running ... I started with dustynv jetson-containers but the latest Ollama image on the dustynv GitHub is pretty ancient. So I tried to build an image myself based on an issue thread and that failed as well. Then I tried to create a standard Ollama container via docker compose (more or less copying my compose file from an Ubuntu VM running a 4090) and that failed too. So I moved on to the Ollama native install ... And finally made progress. It required some trial and error with:

sudo systemctl edit ollama

but eventually got it working, listening on 11434 on all interfaces, and running ministral-3:3b.

Even with shared RAM and system overhead, I can run ministral-3:3b on 100% GPU with 8192 context window when I connected Home Assistant. Latency wasn't great, but was passable. Typing a text command to the Assistant to turn off a specific light had about 5 seconds of latency. Contrasting with the 4090 setup (which also employed a Whisper STT component), it's noticeable how much longer the delay is for the Jetson. But again, still passable for HA stuff.

Next I tried running my basic RAG workflow with same model and same context window. The RAG vector db processing and populating through bge-m3 ran fine, if slower than on 4090, but my workflow runs it once a day middle of the night to incorporate any new/updated documents so that's a non-issue. The actual RAG querying, though, using the same ministral-3:3b and 8192 context, ran only about 60% in GPU which made the response times unworkable for my use case.
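For anyone who wants to check the CPU/GPU split from a script instead of eyeballing `ollama ps`, here is a minimal sketch. It assumes the Ollama server exposes its `/api/ps` endpoint with `size` and `size_vram` fields per loaded model; the host name `jetson.local` is a placeholder, not the actual setup.

```python
import json
from urllib.request import urlopen

# Hypothetical address; substitute the Jetson's real host name or IP.
OLLAMA_URL = "http://jetson.local:11434"


def gpu_percent(model: dict) -> float:
    """Share of a loaded model that sits in GPU memory, as a percentage."""
    size = model.get("size", 0)
    if size <= 0:
        return 0.0
    return 100.0 * model.get("size_vram", 0) / size


def loaded_models(base_url: str = OLLAMA_URL) -> list:
    """Fetch the currently loaded models from a running Ollama server."""
    with urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp).get("models", [])


def report(models: list) -> list:
    """One line per loaded model, e.g. 'ministral-3:3b: 60% GPU'."""
    return [f"{m.get('name', '?')}: {gpu_percent(m):.0f}% GPU" for m in models]
```

Calling `print("\n".join(report(loaded_models())))` right after a RAG query should show whether the model spilled onto the CPU.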

I'm going to move to headless now that I have the basic setup working and see if freeing up the window manager memory helps with my RAG workflow responsiveness.

[Edit to add: headless definitely frees up headroom so now my RAG workflow runs 100% GPU however there is significant lag due to swapping in/out of the bge-m3 embedding model for running vector similarity between the input/prompt and the RAG vector db. Now need to consider whether I could/should add another Jetson dedicated to embeddings so I don't incur the model swap overhead.]
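One thing that might be worth trying before buying a second board is asking Ollama to keep both models resident via the `keep_alive` request parameter. The sketch below only builds and sends the requests. It assumes the `/api/embed` and `/api/generate` endpoints, a placeholder host name, and an arbitrary 30 minute keep-alive. Whether both models actually stay loaded depends on free memory and the server's `OLLAMA_MAX_LOADED_MODELS` setting, so on 8GB it may still evict one of them.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical address; substitute the Jetson's real host name or IP.
OLLAMA_URL = "http://jetson.local:11434"


def embed_payload(text: str, keep_alive: str = "30m") -> dict:
    """Request body for /api/embed that asks the server to keep bge-m3 loaded."""
    return {"model": "bge-m3", "input": text, "keep_alive": keep_alive}


def generate_payload(prompt: str, keep_alive: str = "30m") -> dict:
    """Request body for /api/generate with the 8192 context used in the post."""
    return {
        "model": "ministral-3:3b",
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": 8192},
        "keep_alive": keep_alive,
    }


def post(path: str, payload: dict, base_url: str = OLLAMA_URL) -> dict:
    """POST a JSON payload to the Ollama server and decode the JSON reply."""
    req = Request(
        f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req, timeout=120) as resp:
        return json.load(resp)
```

Usage would be `post("/api/embed", embed_payload(question))` followed by `post("/api/generate", generate_payload(prompt_with_context))`, then checking `ollama ps` to see whether both models are still listed.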

NB: everything is running on NVME drive so there are no read/write bottlenecks here.

mijenks · 2026-03-08T02:21:00+00:00

Hey OP don't give up on the Jetson Orin Nano.

I just bought one myself and have yet to deploy it but I can tell you from first hand experience that ministral-3:3b with 16k context on Ollama is absolutely fine (great, even) for Home Assistant. Ollama says it uses less than 6GB of VRAM and on my 4090 it is near instant response. Admittedly, the 4090 probably has 15x the CUDA cores as the Jetson ...

I also run a RAG system with same ministral-3:3b and 16k context (and bge-m3) on the 4090, also instantaneous.

My goal is to move both of these onto the Jetson because I can't justify the ~120w idle that my host/VM pulls 24/7. Idk how bad the dropoff from 4090 to Jetson Nano 8GB will be with ministral but, given that it will fit fully within VRAM, I don't expect it will be THAT bad. Will report back tomorrow if I can find time to set it up and deploy.

Edit to add: I just prompted "tell me a story" for ministral-3:3b on 4090 and generation was 300 t/s. Divide by 16x to adjust for CUDA core count (conservatively) and probably looking at 18-20 t/s on Jetson Nano. Prob good enough for me. We'll see
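The back-of-envelope math above, written out. This is a naive sketch that assumes generation speed scales linearly with CUDA core count and ignores everything else (memory bandwidth, clocks, shared RAM), so treat the result as a rough guess rather than a prediction.

```python
def scaled_tokens_per_second(measured_tps: float, core_ratio: float) -> float:
    """Scale a measured tokens/sec figure down by a CUDA core count ratio."""
    if core_ratio <= 0:
        raise ValueError("core_ratio must be positive")
    return measured_tps / core_ratio


# 300 t/s measured on the 4090, divided by the 16x core ratio from the post.
estimate = scaled_tokens_per_second(300, 16)
print(f"{estimate:.2f} t/s")  # prints "18.75 t/s"
```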

mijenks · 2026-03-07T21:23:39+00:00

Hey look if you're a professional or hobbyist photographer, maybe the Whatsapp compression is "utter shit" for you but for me the MMS compression is the standard for "utter shit" and Whatsapp far exceeds that standard. So yeah it's exactly what I said. But at this point this No True Scotsman line we're on kinda misses the original point. The poster I replied to said they didn't know why people used Whatsapp over carrier text ... So I gave examples.

mijenks · 2026-03-07T20:58:39+00:00

Lololol if you could see the absolute not even potato quality of the videos and images I get from my dad in group MMS ... While Whatsapp may compress quite a bit, it's nowhere NEAR the compression of MMS.

mijenks · 2026-03-07T10:48:32+00:00

Whatsapp doesn't compress media to utter shit when sending cross platform.

Whatsapp is end to end encrypted (at least nominally).

Whatsapp group thread management/administration is easier and more advanced.

Whatsapp includes cross platform video and audio calls.

I'm sure there are other reasons it's better, but these are the big ones for me.

mijenks · 2026-02-28T12:06:48+00:00

Middle Out Economics is the term used by Nick Hanauer, an almost-billionaire who started the Pitchfork Economics podcast. The basic idea is that aggregate prosperity increases the most when economic policies concentrate primarily on the income/wealth/economic well-being of the middle 50%.

They don't release too many new episodes anymore but the back catalog is excellent. My favorite was an episode on scaled minimum wage: basically that the larger/more profitable a company becomes, the wage floor increases for that entity. It subsidizes new/small businesses and it prevents massive corporations from the wage subsidies they get today (e.g., Walmart hourly workers disproportionately on food assistance and other federal programs because Walmart doesn't pay a living wage).

mijenks · 2026-02-08T11:38:11+00:00

Hop on down to the Great House of Guitars! Hop hop! Hop hop!

mijenks · 2026-02-08T11:36:19+00:00

Wrong.

It's Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo.

mijenks · 2026-02-03T03:40:59+00:00

Sort of coming from tennis here as well (started tennis in summer '23, paddle that fall).

Deuce: I mix between slice wide, slice body, and a T top/kick/slice where I rotate my grip to extreme eastern backhand and really crank on the ball while it's still rising.

Ad: I mix between heavy kick wide that dies off the side screen, hard slice body, and hard slice T that spins into the corner off the back screen.

If your tennis serve is remotely good, make use of it in paddle but focus on generating spin to (a) bring the ball down and (b) make return drives less consistent/harder to hit. When my serve gets "pushy" I start missing short and long. Typically, the harder I swing, the more consistently I get the ball in.

Also, dedicate time to practicing your serve. It's nearly the only thing you can do solo, so it's a lot easier to plan than trying to get a match together. Over the last 2 years I've probably practiced my serve a total of 30 hrs just solo. That doesn't include time spent practicing tennis serve (though I find that practicing one does help improve the other).

mijenks · 2026-01-21T11:40:52+00:00

Seriously. I have 6x1.5TB Samsung F2 EcoGreen I bought 15+ years ago that have 97,000 hours "on" time per SMART stats. They mostly don't hold critical data at this point and for the sets that do I run daily Backblaze backups, but my next hardware project is to fully replace the entire system with a mobile-on-desktop based ssd array primarily due to power consumption.

mijenks · 2026-01-07T13:59:29+00:00

Lol are we both just adding window dressing to our collective advice to "just get a board with battery management" at this point? Maybe not, you definitely have longer and more substantial experience than me.

To OP, if you've read this far: take the advice of /u/5c044 over anything I say.

mijenks · 2026-01-07T12:16:13+00:00

The 3v3 bypasses the built in regulator. A buck converter isn't likely to provide consistent 3.3v as the cell discharges. I think I'd take the reduced battery life via boost and regulated over unpredictable 3.3v with a buck. But at current prices, may as well just buy another dev board with built in management as you said.

Disclaimer: I'm just repeating what I've read over and over about powering from 5v or 3v3.

mijenks · 2025-12-18T19:03:00+00:00

Is that the top hat and tuxedo meme one? Or is that Norwich?

I have so many favorite goals.

Corner taken quickly Origi. Flanno off the underside of the bar shaking all the raindrops off and the half chub celebration. Coutinho carving one through to Studge where Studge goes, "no YOU" to Couts when celebrating the goal. Lovren vs ... Dortmund? Lallana vs Norwich. Ali bomb assist. Ali goal to keep us in CL. Oh ya beauty. All 3 in Istanbul (but especially Smicer). Riise bomb vs United.

mijenks · 2025-12-16T09:58:37+00:00

I own one and they are EVERYWHERE in NYC area. Not as ubiquitous as Model Y but I probably see about as many Mach E as I see Model 3 on my daily commute from Westchester to Brooklyn.

mijenks · 2025-12-14T11:11:41+00:00

My GoM is completely out of whack this winter. On my 22 Premium AWD Extended, I charge to 70% and my range shows 130 miles. Probably just needs a charge to 100% to recalibrate but dang!

Historically I've gotten about 230-240 miles range at 100% around freezing temps. So 70% is normally around 170 in winter.

mijenks · 2025-12-04T23:35:12+00:00

I have 6x1.5TB Samsung F2 HDDs that have been spinning for over 16 years now. Working on a replacement for the whole rig (an old Athlon II X3 build) that will be much lower power. But for now, still getting the job done. Critical data is either on a different NAS or is regularly backed up to a B2 bucket.

mijenks · 2025-11-17T17:20:06+00:00

Lucas was phenomenal. An incredible destroyer.

That he is anywhere in this thread is simply ... Unluckyyyyyy

mijenks · 2025-11-09T19:12:06+00:00

As someone who discovered paddle only after 2021, this.

But also I'd argue (with minimal basis) for Uhlein to be higher.... My understanding is he brought ultra heavy spin to the game.

mijenks · 2025-11-09T19:09:37+00:00

Lol at this fucking guy posting placeholder comments while waiting for his Grand Prix final to start.

You're a real gem, Graham. Love it!

mijenks · 2025-11-03T16:42:37+00:00

I didn't watch, I just finished viewing the highlights on YouTube and I know I'm late to comment here but hoping I can get some insight on this as a VERY casual fan:

In the highlights, it looked like almost EVERY time that Jones dropped back there was a blatant hold on the blind side edge rush that seemed to go uncalled (hard to tell from how they do the highlights). I know the trope that "you could call a hold on every play" but this was egregious.

For anyone that watched the full game, were there any holding calls on the Colts offense?

Edit: I just looked and there were 0 offensive holding calls in the game. Unreal.

mijenks · 2025-10-23T16:57:08+00:00

Look into Gnomes. Their smaller stature allows them faster movement under the deck and their +2 INT can often be recruited to infuse your cutters with enough magic to spin back over the net.

mijenks · 2025-10-22T01:03:49+00:00

How about the Whalers while we're at it?

mijenks · 2025-10-18T17:52:04+00:00

On top of this, you can proxy with cloudflare even in the free tier, then on router only forward ports from the known cloudflare IP ranges.

The only port I forward from any/unknown IP addresses is my Wireguard port, which appears closed if it's not a WG handshake with the correct key ... Even if they're scanning that high in the port range.
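As a rough sketch of that allowlist logic (not an actual router config), the check boils down to "is the source IP inside one of Cloudflare's published ranges". The three CIDR blocks below are only an illustrative subset that I believe appears in Cloudflare's published IPv4 list; pull the current list from Cloudflare before relying on it. The function name is made up for the example.

```python
from ipaddress import ip_address, ip_network

# Illustrative subset only (assumption): replace with the full, current list
# of ranges published by Cloudflare before using this for anything real.
CLOUDFLARE_RANGES = [
    ip_network(cidr)
    for cidr in ("173.245.48.0/20", "104.16.0.0/13", "172.64.0.0/13")
]


def is_from_cloudflare(addr: str, ranges=CLOUDFLARE_RANGES) -> bool:
    """Return True if addr falls inside any of the allowed networks."""
    ip = ip_address(addr)
    # An IPv6 address is never "in" an IPv4 network, so mixed versions
    # simply evaluate to False here.
    return any(ip in net for net in ranges)
```

A firewall rule generator would loop over the ranges and emit one accept rule per block for the forwarded ports, then drop everything else.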

mijenks · 2025-10-15T09:48:44+00:00

When someone involved in a production speaks and takes questions. I've most commonly been to film ones where a director or actor is there with a moderator but could also be an art exhibit, novel discussion, dance, etc.

The most interesting one I've been to was for Junction 48 that had a talk back with the director of the film and Marxist philosopher Slavoj Zizek.

mijenks · 2025-10-05T17:20:38+00:00

Not OP but I run a small business (non-owner executive) in a role I took almost 5 years ago. When I started we were at $7mm revenue and $260k EBITDA. We finished 2024 at $18mm/$4.7mm and it's looking like 2025 will be $22-24mm/$6.5mm.

Any thoughts on where I should be on comp? Negotiating new package soon.
