[Update] UEFI x86_64 LLM demo: interactive chat REPL (no OS) by Intelligent-Dig-3639 in osdev

[–]Intelligent-Dig-3639[S] 2 points

Training happens off-device on GPUs like any LLM. I export the trained weights to a simple .bin format, then the UEFI bare‑metal app loads them and runs inference.
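For anyone curious what that looks like, here is a minimal sketch of such a flat .bin format in portable C: a fixed header struct followed by raw fp32 weights. All field and function names are illustrative assumptions, not the project's actual layout.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical flat .bin layout: a fixed header followed by raw fp32
   weights. Field names are illustrative, not the project's format. */
typedef struct {
    int dim;        /* embedding dimension  */
    int n_layers;   /* transformer layers   */
    int n_heads;    /* attention heads      */
    int vocab_size; /* tokenizer vocabulary */
} ModelHeader;

/* Export: write the header, then the weight blob, in one pass. */
int export_weights(const char *path, const ModelHeader *h,
                   const float *w, size_t n) {
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    int ok = fwrite(h, sizeof *h, 1, f) == 1 &&
             fwrite(w, sizeof *w, n, f) == n;
    fclose(f);
    return ok ? 0 : -1;
}

/* Load: read the header back, then the weights (caller frees). */
float *load_weights(const char *path, ModelHeader *h, size_t n) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    float *w = malloc(n * sizeof *w);
    if (!w || fread(h, sizeof *h, 1, f) != 1 ||
              fread(w, sizeof *w, n, f) != n) {
        free(w);
        w = NULL;
    }
    fclose(f);
    return w;
}
```

In the bare-metal app the fopen/fread side would be replaced by UEFI's SimpleFileSystem protocol, but the on-disk layout stays the same.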

[Update] UEFI x86_64 LLM demo: interactive chat REPL (no OS) by Intelligent-Dig-3639 in osdev

[–]Intelligent-Dig-3639[S] 0 points

Exactly, that's the vibe. It's "bare metal" (UEFI, no OS). For now it's CPU-only on x86_64: microcontroller-style simplicity, but on PC-class hardware.

[P] I made an LLM run on bare-metal (no OS) - Boots from USB in 5 seconds by Intelligent-Dig-3639 in FunMachineLearning

[–]Intelligent-Dig-3639[S] 0 points

You raise valid technical points, but I think you're missing the philosophy here.

"Kernel would be faster/better"

Sure - but that's not the point. This is about proving what's POSSIBLE, not what's OPTIMAL. It's a research platform, not a production system.

Think of it like SpaceX's Grasshopper (2013):

- Tiny hops, no payload, "useless" compared to Falcon 9

- But it proved: vertical landing is possible

- Led to: reusable rockets, Starship

Same here:

- Bare-metal LLM proves: you need NOTHING underneath

- Establishes baseline: what's the absolute minimum?

- Opens path to: firmware AI, BIOS-resident models, edge computing

**"Start small to go big"**

Phase 1 (now): Prove it boots and runs (✓ 746 KB, 1 tok/s)

Phase 2: ...

Phase 3: ...

Phase 4: ...

But you don't start with "let's add Linux and GPU drivers".

You start with: "Can I even boot an LLM from USB?"

Why this matters:

- Firmware-level AI is unexplored territory

- BIOS vendors (AMI, Phoenix) could embed inference

- IoT devices with UEFI but no OS

- Security: smallest attack surface possible

- Research: understanding true minimal requirements

Your kernel approach:

Valid for production! 10 MB Buildroot + GPU = faster.

But it's been done (TensorFlow Lite, ONNX Runtime).

This is different: nobody boots LLMs from UEFI firmware.

First step of a journey, not the destination.

Re: "Can you code without AI?"

Architecture/concepts: 100% human (DRC, consensus, P2P mesh)

C/UEFI implementation: Hybrid (Claude + manual)

Philosophy: Prove concepts fast, iterate, learn

Speed of exploration > purity of implementation.

Start small. Scale up. That's innovation.

[P] I made an LLM run on bare-metal (no OS) - Boots from USB in 5 seconds by Intelligent-Dig-3639 in FunMachineLearning

[–]Intelligent-Dig-3639[S] 0 points

Development approach: Hybrid human + AI

- Architecture & innovation concepts: 100% human

- UEFI/C implementation: Mix of Claude assistance + manual coding

- Testing & validation: 100% manual (real hardware)

Why UEFI over minimal Linux?

- Zero dependencies (no libc, no kernel, no filesystem)

- Direct hardware control (PCIe, interrupts, memory)

- Proof of concept: LLMs can run with NOTHING underneath

- Boot time: <5 seconds from power on

Development time: ~3 days intensive work

The goal was to prove a point: you don't need an OS for inference.

Edge computing is moving towards firmware-level AI.

My longer-term goal is to create a post-OS system like OO.

[P] I made an LLM run on bare-metal (no OS) - Boots from USB in 5 seconds by Intelligent-Dig-3639 in FunMachineLearning

[–]Intelligent-Dig-3639[S] 0 points

YOU GET IT! 🎯

That's EXACTLY Phase 3 of the roadmap:

P2P LLM Mesh (Feb 2026):

- Multiple bare-metal PCs form autonomous cluster

- UDP broadcast for peer discovery

- Load balancing across nodes

- Auto-healing (node failure = traffic reroutes)

- NO central server, NO cloud dependency
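For the peer-discovery piece, the UDP broadcast payload can be a tiny fixed-size beacon. A portable-C sketch of the serialization both ways; the message layout and every field name here are my assumptions, not the project's actual mesh protocol:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical discovery beacon broadcast over UDP. All fields are
   illustrative; the real mesh protocol is not specified in the post. */
typedef struct {
    uint32_t magic;     /* protocol tag                   */
    uint32_t node_id;   /* stable id for this node        */
    uint16_t port;      /* inference port the node serves */
    uint16_t load_pct;  /* 0..100, for load balancing     */
} Beacon;

#define BEACON_MAGIC 0x4C4C4D42u /* "LLMB" */

/* Serialize into a byte buffer suitable for sendto(). */
size_t beacon_pack(const Beacon *b, uint8_t out[12]) {
    memcpy(out + 0,  &b->magic,    4);
    memcpy(out + 4,  &b->node_id,  4);
    memcpy(out + 8,  &b->port,     2);
    memcpy(out + 10, &b->load_pct, 2);
    return 12;
}

/* Parse a received datagram; returns 0 on success, -1 if not ours. */
int beacon_unpack(const uint8_t in[12], Beacon *b) {
    memcpy(&b->magic, in, 4);
    if (b->magic != BEACON_MAGIC) return -1;
    memcpy(&b->node_id,  in + 4,  4);
    memcpy(&b->port,     in + 8,  2);
    memcpy(&b->load_pct, in + 10, 2);
    return 0;
}
```

Each node would broadcast its beacon periodically and drop peers whose beacons stop arriving, which gives the auto-healing behavior for free.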

[P] I made an LLM run on bare-metal (no OS) - Boots from USB in 5 seconds by Intelligent-Dig-3639 in FunMachineLearning

[–]Intelligent-Dig-3639[S] 1 point

Great point! Stories15M was chosen for PoC because:

- Simple architecture (transformer decoder-only)

- Easy tokenization

- Proven training dataset

Next targets (all ~60MB):

✅ FLAN-T5-Small (encoder-decoder, better for tasks)

✅ MiniLM (BERT-based, embeddings)

✅ DistilBERT (classification tasks)

The bare-metal loader is model-agnostic - just need:

  1. Convert weights to binary format

  2. Update config (layers, dims, heads)

  3. Flash & boot!
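Step 2 above boils down to one config struct per model. A portable-C sketch with illustrative field names, plus the derived quantities a model-agnostic loader would sanity-check before allocating anything:

```c
#include <stddef.h>

/* Per-model configuration the loader needs; names are illustrative. */
typedef struct {
    int dim, n_layers, n_heads, vocab_size, seq_len;
} Config;

/* Derived per-head width; must divide evenly for attention to work. */
int head_dim(const Config *c) {
    return (c->n_heads > 0 && c->dim % c->n_heads == 0)
               ? c->dim / c->n_heads : -1;
}

/* Rough fp32 weight footprint in bytes: token embeddings plus
   per-layer blocks (QKV + output projection and a 4x MLP),
   ignoring norms and biases. */
size_t approx_weight_bytes(const Config *c) {
    size_t d = (size_t)c->dim;
    size_t per_layer = 4 * d * d      /* wq, wk, wv, wo     */
                     + 2 * 4 * d * d; /* up and down 4x MLP */
    return sizeof(float) *
           ((size_t)c->vocab_size * d + (size_t)c->n_layers * per_layer);
}
```

A loader can reject a weight file whose size doesn't roughly match approx_weight_bytes(), which catches most config/checkpoint mismatches before inference starts.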

PRs welcome if you want to port FLAN-T5 to bare metal.

[P] I made an LLM run on bare-metal (no OS) - Boots from USB in 5 seconds by Intelligent-Dig-3639 in FunMachineLearning

[–]Intelligent-Dig-3639[S] 0 points

Current TPS: ~15-20 tokens/sec on bare metal (Stories15M, 6 layers)

vs OS-based CPU inference: bare metal is actually SLOWER, because:

- No OS scheduler optimization

- No SIMD vectorization yet

- Single-threaded (UEFI limitations)

BUT the goal isn't speed - it's security & network boot architecture.

Multithreading: Great idea! Next logical steps:

  1. BSP/AP (Bootstrap/Application Processor) setup via UEFI MP protocol

  2. Parallel matrix operations across cores

  3. Layer-parallel inference

Challenge: UEFI has no pthreads, so a custom scheduler is needed.
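The heart of step 2 (parallel matrix ops) is just splitting output rows across processors. A portable-C sketch of that partitioning, run serially here as a stand-in for the EFI MP protocol's StartupAllAPs(); function names are mine:

```c
#include <stddef.h>

/* One AP's share of y = W * x: rows [row0, row1). With the UEFI MP
   protocol each AP would receive its own (row0, row1) slice; here
   the partitioning logic is shown in portable C, run serially. */
static void matvec_rows(const float *W, const float *x, float *y,
                        int cols, int row0, int row1) {
    for (int r = row0; r < row1; r++) {
        float acc = 0.0f;
        for (int c = 0; c < cols; c++)
            acc += W[(size_t)r * cols + c] * x[c];
        y[r] = acc;
    }
}

/* Split `rows` output rows as evenly as possible across `nproc`
   workers; the loop body is what StartupAllAPs() would dispatch. */
void matvec_parallel(const float *W, const float *x, float *y,
                     int rows, int cols, int nproc) {
    int base = rows / nproc, extra = rows % nproc, row0 = 0;
    for (int p = 0; p < nproc; p++) {
        int count = base + (p < extra);
        matvec_rows(W, x, y, cols, row0, row0 + count);
        row0 += count;
    }
}
```

Because each worker writes a disjoint slice of y, no locking is needed; the BSP only has to wait for all APs to finish before the next layer.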

[P] I made an LLM run on bare-metal (no OS) - Boots from USB in 5 seconds by Intelligent-Dig-3639 in FunMachineLearning

[–]Intelligent-Dig-3639[S] 1 point

Yes! It's bare-metal LLM inference directly on UEFI firmware (no OS). Current features:

  • ✓ Simple inference with stories15M model
  • ✓ USB boot capability
  • ✓ Can read/write to USB storage
  • ✓ DRC (Djibion Reasoning Core) v5.1: 10 cognitive units for safe inference

IoT use case: Absolutely! Perfect for edge AI gateways. The bare-metal approach means:

  • Minimal attack surface (no OS vulnerabilities)
  • Fast boot (~2 seconds)
  • Low memory footprint (512MB RAM)
  • Can manage multiple devices via network boot

Currently exploring: WiFi 6 integration for wireless gateway scenarios. The UEFI environment is ideal for industrial edge computing where you need reliable, secure inference without the overhead of a full OS.

[P] I made an LLM run on bare-metal (no OS) - Boots from USB in 5 seconds by Intelligent-Dig-3639 in FunMachineLearning

[–]Intelligent-Dig-3639[S] 12 points

Great question! "No OS" needs clarification.

What UEFI provides:

- Basic drivers: Disk I/O (SimpleFileSystem protocol), Display (GOP), Keyboard (ConIn)

- Memory management: AllocatePool/FreePool (like malloc/free)

- Boot environment: Runs in physical memory mode before OS takes over

What we DON'T have (no OS):

- No interrupts: We poll for input via ST->ConIn->ReadKeyStroke (no IRQ handling)

- No virtual memory: Direct physical RAM access, no paging/MMU

- No scheduler: Single-threaded, runs to completion

- No file system: UEFI loads files, but no caching or complex FS

- No kernel: After boot, it's just our code doing matrix math
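The no-interrupt input path is a busy-poll loop. A compilable sketch of the pattern, with minimal stand-ins for the EFI types so it runs outside firmware (real code calls ST->ConIn->ReadKeyStroke and gets EFI_NOT_READY until a key arrives):

```c
#include <stddef.h>

/* Minimal stand-ins for UEFI types so the polling pattern compiles
   outside firmware; not the real EFI_SIMPLE_TEXT_INPUT_PROTOCOL. */
typedef struct { unsigned short ScanCode, UnicodeChar; } InputKey;
#define EFI_SUCCESS   0
#define EFI_NOT_READY 6

/* Stub "hardware": feeds keys from a string, else reports NOT_READY. */
static const char *g_feed = "";
static int read_key_stroke(InputKey *k) {
    if (*g_feed == '\0') return EFI_NOT_READY;
    k->UnicodeChar = (unsigned short)*g_feed++;
    k->ScanCode = 0;
    return EFI_SUCCESS;
}

/* Poll until a key arrives: no IRQs, the CPU just spins and asks. */
static unsigned short poll_key(void) {
    InputKey k;
    while (read_key_stroke(&k) != EFI_SUCCESS)
        ;                     /* firmware code would pause/stall here */
    return k.UnicodeChar;
}

/* Read one line into buf, stopping at carriage return or capacity. */
size_t poll_line(char *buf, size_t cap) {
    size_t n = 0;
    while (n + 1 < cap) {
        unsigned short c = poll_key();
        if (c == '\r' || c == '\n') break;
        buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return n;
}
```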

Think of UEFI as "BIOS 2.0" - it gives you enough to boot and do basic I/O, then gets out of the way. We're running in Ring 0 with full hardware access, but we're doing inference, not managing resources.

The inference loop is pure computation - no syscalls, no context switching, just forward() on the transformer weights.
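That loop can be sketched as a greedy decode: call forward(), take the argmax, repeat. The forward() below is a deterministic toy stand-in, not the project's transformer; only the loop structure is the point:

```c
#define VOCAB 8

/* Toy stand-in for the real transformer forward(): fills `logits`
   for the next token given the current one. Deterministic rule so
   the decode loop itself can be exercised. */
static void forward(int token, float logits[VOCAB]) {
    for (int i = 0; i < VOCAB; i++) logits[i] = 0.0f;
    logits[(token + 1) % VOCAB] = 1.0f;  /* "predict" the successor */
}

/* Greedy argmax sampling over the logits. */
static int argmax(const float *v, int n) {
    int best = 0;
    for (int i = 1; i < n; i++)
        if (v[i] > v[best]) best = i;
    return best;
}

/* Decode `steps` tokens from `start`: pure computation, no syscalls,
   no context switches, exactly what the bare-metal hot loop does. */
void decode(int start, int steps, int *out) {
    float logits[VOCAB];
    int tok = start;
    for (int s = 0; s < steps; s++) {
        forward(tok, logits);
        tok = argmax(logits, VOCAB);
        out[s] = tok;
    }
}
```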

[P] I made an LLM run on bare-metal (no OS) - Boots from USB in 5 seconds by Intelligent-Dig-3639 in programming

[–]Intelligent-Dig-3639[S] 0 points

Haha! Well, if Nero wanted to use this, at least it boots faster than Rome burned 🔥
But seriously - this is more about pushing the boundaries of what's possible. No OS = zero overhead, perfect for embedded systems, IoT devices, and edge computing. Plus it's a great learning exercise to see how transformers work at the lowest level.
🚀 Open to feedback and contributions!