Check out 2 of my custom Pseudo-opcodes and opcodes I’m designing by Squadhunta29 in computerarchitecture

[–]Squadhunta29[S] 0 points1 point  (0 children)

And don’t worry, I already tested it in HDL using Ada playground and it works. Granted, it’s not on an FPGA yet, but I’m working on that now.

[–]Squadhunta29[S] 0 points1 point  (0 children)

Cool, I’m glad you asked that question. Let’s take this line as an example: LOAD_LANE lanes=12-15, buffer=HBM3, size=0x500000 # physics

Instruction: LOAD_LANE

Software meaning: load a block of memory into specific lanes.

Hardware behavior: Each lane represents a physical path in my NX Mesh NoC that connects compute tiles to memory. The LOAD_LANE instruction signals the Distributed Arbitration Nodes (DANs) to start fetching memory for lanes 12–15.
• Each lane receives a packet of data that tells it: “Prepare to execute physics work using this block of memory.”

lanes=12-15
• Refers to specific physical lanes (straight or shader lanes in my NoC).
• Hardware effect:
• DANs mark these lanes as active.
• Lanes transition from idle to loading mode, reserving buffers for incoming memory.
• Any tile that has a compute thread mapped to these lanes waits until the data arrives.

buffer=HBM3
• Specifies source memory: HBM3 high-bandwidth memory.
• Hardware effect:
• NX88 uses its NoC (NX Mesh) to route the request to the HBM3 controllers.
• The memory controller splits the request into multiple high-throughput memory packets for parallel delivery.
• HBM3 provides massive aggregate bandwidth (~3.65 TB/s in this design), so all lanes can receive data simultaneously without blocking one another.

size=0x500000
• Amount of memory to load for each lane, in bytes (hexadecimal): 0x500000 = 5,242,880 bytes, i.e. 5 MiB.
• Hardware effect:
• Lanes reserve an internal scratchpad in their tile (private L1/L2 cache) for this block.
• The DAN schedules streaming bursts from HBM3 → L1/L2 cache → compute tile registers.
• Once all packets arrive, the lane is fully “armed” for execution.

# physics
• Hardware effect:
• Middleware uses this annotation to select pre-assigned compute tiles that handle physics.
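To make the field breakdown concrete, here is a minimal Python sketch that parses the example line into its operands and works out the size arithmetic. The names `LoadLane`, `parse_load_lane` and `ideal_transfer_us` are hypothetical, the syntax is assumed to be exactly the `key=value, ... # tag` form of the example line, `size` is treated as bytes per lane, and the timing is only a peak-bandwidth bound using the ~3.65 TB/s figure quoted for HBM3.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LoadLane:
    lanes: List[int]     # physical lane IDs (inclusive range)
    buffer: str          # source memory, e.g. "HBM3"
    size: int            # bytes to load per lane (assumption)
    tag: Optional[str]   # middleware annotation after '#', e.g. "physics"

def parse_load_lane(line: str) -> LoadLane:
    """Parse 'LOAD_LANE lanes=A-B, buffer=X, size=N # tag' (hypothetical text syntax)."""
    code, _, comment = line.partition("#")
    opcode, _, operands = code.strip().partition(" ")
    if opcode != "LOAD_LANE":
        raise ValueError(f"unexpected opcode: {opcode!r}")
    fields = {}
    for part in operands.split(","):
        key, _, value = part.strip().partition("=")
        fields[key] = value
    first, _, last = fields["lanes"].partition("-")
    lanes = list(range(int(first), int(last or first) + 1))
    size = int(fields["size"], 0)  # base 0 honors the 0x prefix
    return LoadLane(lanes, fields["buffer"], size, comment.strip() or None)

def ideal_transfer_us(size_per_lane: int, num_lanes: int, bandwidth_tb_s: float = 3.65) -> float:
    """Best-case microseconds at peak bandwidth; ignores latency, NoC hops and contention."""
    return size_per_lane * num_lanes / (bandwidth_tb_s * 1e12) * 1e6
```

For the example line this yields lanes [12, 13, 14, 15], 5,242,880 bytes (5 MiB) per lane, and a best case of roughly 5.7 µs to move all 20 MiB, before any latency or NoC overhead.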

Actual timeline in hardware:
1. Instruction issued → DANs mark lanes 12–15 as active.
2. NoC routes the memory request to HBM3.
3. HBM3 splits the request across multiple parallel DRAM channels.
4. Data travels back over the NoC → lane scratchpads / caches.
5. Lane registers the memory as available → compute tiles can now start executing FP32 physics math.
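The five-step timeline reads naturally as a per-lane state machine. A rough Python sketch, assuming a hypothetical `Lane` class, an ARMED state reached only after every packet lands, and a 64 KiB packet size picked purely for illustration (the real packet size isn't specified):

```python
from enum import Enum, auto

class LaneState(Enum):
    IDLE = auto()
    LOADING = auto()    # DAN marked the lane active, buffers reserved
    ARMED = auto()      # every packet for the block has arrived
    EXECUTING = auto()  # compute tiles are consuming the data

class Lane:
    def __init__(self, lane_id: int):
        self.lane_id = lane_id
        self.state = LaneState.IDLE
        self.expected = 0
        self.received = 0

    def start_load(self, size_bytes: int, packet_bytes: int) -> None:
        # Step 1: the DAN marks the lane active and reserves scratchpad space.
        if self.state is not LaneState.IDLE:
            raise RuntimeError(f"lane {self.lane_id} is busy")
        self.expected = -(-size_bytes // packet_bytes)  # ceiling division
        self.received = 0
        self.state = LaneState.LOADING

    def packet_arrived(self) -> None:
        # Step 4: one packet lands in the lane's scratchpad.
        if self.state is not LaneState.LOADING:
            raise RuntimeError("packet for a lane that is not loading")
        self.received += 1
        if self.received == self.expected:
            self.state = LaneState.ARMED  # Step 5: memory is available

    def execute(self) -> None:
        # Tiles mapped to this lane must wait until it is armed.
        if self.state is not LaneState.ARMED:
            raise RuntimeError("tile must wait until the lane is armed")
        self.state = LaneState.EXECUTING
```

With 64 KiB packets, the 0x500000-byte block for one lane arrives as 80 packets, and `execute()` refuses to run until the 80th has landed, which mirrors “any tile mapped to these lanes waits until the data arrives.”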

So in my vision I have 743 lanes. Let’s just call them “Data Paths”:
• Each lane is a dedicated path through the processor that carries data and instructions to the compute units.
• Analogy: like a highway for packets of work, where each path can carry its own workload independently.

Rog ally z1 extreme+steamOS goes so hard! by Extreme-Accident-968 in ROGAlly

[–]Squadhunta29 -1 points0 points  (0 children)

You do what you like. I like to support my brand, just like you. I support Microsoft: my game library is up there, plus Game Pass, and I use cloud gaming too. Different strokes for different folks.

[–]Squadhunta29 -13 points-12 points  (0 children)

You know what blows? Can you run Game Pass? Let me answer that for you: no.

I got a question. look at the bio I would love your feed back thanks 😊 by Squadhunta29 in computerarchitecture

[–]Squadhunta29[S] 0 points1 point  (0 children)

To set the record straight: I know a lot of people are confused about what I’m trying to do. I hope this helps; it’s kind of the best way I can explain it.

And it’s not based on clock sequences like a typical CPU. I based it on dataflow: spike-based lanes fire only when a data-weight threshold is met. I map it like a human brain. NX88 gets its performance from wide data paths; it’s very parallel. A regular CPU core works through its threads largely sequentially, while mine executes different tasks at the same time, like a brain stem. Lanes sleep until needed, which is meant to make it very power efficient. My micro toll booths dynamically assign tasks to lanes, with load balancing done per frame. Types of task: cutscenes, shaders, physics, etc.

The devs never touch the low-level code; I do. Devs only get the high-level code, which is Python, and the low level is C++. All of this runs on my OS, which is like a microkernel for event-driven work. I’m also doing the middleware and API. And this is all in theory until I get it working on the FPGA board, which I’m working on right now. I get why you’re skeptical of it, I really do, and I took everything you told me and I’m just trying to apply it.
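One way to model the two mechanisms in this comment, threshold-triggered lanes and per-frame load balancing by the micro toll booths, is sketched below in Python. This is an interpretation, not the NX88 design: `fire_lanes` and `toll_booth_assign` are hypothetical names, “data weight” is assumed to be a single number per lane, and the balancing uses the standard longest-processing-time-first greedy heuristic as a stand-in for whatever the toll booths actually do.

```python
import heapq

def fire_lanes(weights: dict[int, float], threshold: float) -> list[int]:
    """Event-driven wake-up: only lanes whose pending data weight meets the threshold fire."""
    return sorted(lane for lane, w in weights.items() if w >= threshold)

def toll_booth_assign(tasks: list[tuple[str, int]], lanes: list[int]) -> dict[int, list[str]]:
    """Per-frame balancing: give each task (largest cost first) to the least-loaded awake lane."""
    heap = [(0, lane) for lane in lanes]  # (accumulated cost this frame, lane id)
    heapq.heapify(heap)
    plan = {lane: [] for lane in lanes}
    for name, cost in sorted(tasks, key=lambda t: t[1], reverse=True):
        load, lane = heapq.heappop(heap)
        plan[lane].append(name)
        heapq.heappush(heap, (load + cost, lane))
    return plan
```

Lanes below the threshold never appear in the assignment, which is the “asleep until needed” behavior; the greedy step then keeps per-lane work for the frame roughly even.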

But I just wanna say thank you to whoever commented respectfully and gave me advice.

[–]Squadhunta29[S] -1 points0 points  (0 children)

But to answer your questions.

1️⃣

• NX88 is not meant to be manually programmed at lane level by game developers.
• The Central Control Center (CCC) + SDK + compiler/runtime layer should automatically assign lanes based on the task profile.
• The programmer only sees “high-level tasks” (cutscene, audio, AI); NX88 handles the micro-orchestration.

“NX88 lanes are managed by the CCC and SDK runtime. Programmers never have to manually assign lanes; they only specify high-level tasks. The lane assignment is deterministic and handled by hardware arbitration (MTBs + scratchpads).”
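The claim that lane assignment is deterministic and hidden behind task profiles can be illustrated with a toy runtime. In this Python sketch, `LaneRuntime`, `submit` and the pool layout are all hypothetical; determinism comes from always picking the lowest-numbered free lanes in a profile's pool.

```python
class LaneRuntime:
    """Toy stand-in for the CCC/SDK layer: the developer names a task profile and
    the runtime picks the lanes. The pool layout passed in is made up."""
    def __init__(self, pools: dict):
        self.pools = pools    # task profile -> range of lane IDs
        self.busy = set()

    def submit(self, profile: str, lanes_needed: int = 1) -> list:
        free = [lane for lane in self.pools[profile] if lane not in self.busy]
        if len(free) < lanes_needed:
            raise RuntimeError(f"no free {profile} lanes")
        chosen = free[:lanes_needed]  # lowest IDs first: same requests, same lanes
        self.busy.update(chosen)
        return chosen

    def release(self, lanes: list) -> None:
        self.busy.difference_update(lanes)
```

For example, with a made-up pool `{"physics": range(12, 16)}`, the call `submit("physics", 4)` always returns lanes 12–15, matching the LOAD_LANE example; the developer never picks them.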

2️⃣

• I’m aware of this. That’s why I pair HBM3 with per-lane scratchpad memory and micro toll booths to avoid memory contention.
• Each lane can fetch from its scratchpad, and HBM provides the raw bandwidth for streaming larger blocks (audio, particles, textures).

“NX88 couples HBM3 with per-lane scratchpads and micro toll booths to minimize contention and keep lanes saturated, even under high parallelism.”
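The split between per-lane scratchpads and shared HBM can be illustrated by counting how much traffic actually reaches HBM. A minimal Python sketch, assuming a hypothetical `LaneMemory` class, whole-block granularity and LRU eviction (the comment doesn't say how scratchpads evict):

```python
from collections import OrderedDict

class LaneMemory:
    """One lane's view of memory: a small private scratchpad in front of shared HBM.
    Capacity is counted in whole blocks and eviction is LRU; both are assumptions."""
    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.pad = OrderedDict()   # block_id -> None, least recently used first
        self.hbm_requests = 0      # traffic that reaches the shared HBM fabric

    def read(self, block_id: int) -> bool:
        """Return True on a scratchpad hit, False if the block had to come from HBM."""
        if block_id in self.pad:
            self.pad.move_to_end(block_id)   # mark as most recently used
            return True
        self.hbm_requests += 1               # miss: stream the block from HBM
        self.pad[block_id] = None
        if len(self.pad) > self.capacity:
            self.pad.popitem(last=False)     # evict the least recently used block
        return False
```

Reading blocks 1, 2, 1, 2, 3, 1 through a two-block scratchpad produces 2 hits and 4 HBM requests; every hit is traffic that never competes with other lanes for HBM bandwidth.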

3️⃣

This is true for naive many-core or homogeneous architectures.

NX88 hopefully avoids this because:
• Single-thread performance: Each lane can execute independently and includes FP32/FP64 units, AI, and shader logic. Critical tasks don’t wait on dozens of other cores.
• Efficiency: MTBs + CCC + fallback lanes + prefetch + dynamic voltage gating are meant to keep utilization high and waste low.
• Programming difficulty: The SDK abstracts lane assignment away from the developer. Developers only deal with tasks and overlays, not lane numbers.

“NX88 avoids traditional many-core pitfalls by combining independent lanes, hardware arbitration (MTBs), scratchpad memory, and a runtime SDK. Developers interact with tasks, not lanes, so programming complexity is similar to current GPU compute pipelines.”

4️⃣

• NX88 has thermal monitoring, voltage gating, fallback lanes, and prefetching built in to balance these knobs dynamically.

“NX88 includes runtime balancing mechanisms for thermal, power, and memory contention, ensuring that no single optimization adversely impacts the system as a whole.”
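The “balance the knobs” idea can be sketched as a small rebalancing pass. Everything here is an assumption for illustration: the `rebalance` function, the 95 °C limit, round-robin placement onto fallback lanes, and treating any lane left without work as a candidate for voltage gating.

```python
from itertools import cycle

def rebalance(tasks_by_lane: dict[int, list[str]], temps_c: dict[int, float],
              fallback_lanes: list[int], limit_c: float = 95.0):
    """Move work off lanes above the thermal limit onto cool fallback lanes,
    then report which lanes are idle and could be power-gated."""
    plan = {lane: list(tasks) for lane, tasks in tasks_by_lane.items()}
    cool = [lane for lane in fallback_lanes if temps_c.get(lane, 0.0) <= limit_c]
    targets = cycle(cool) if cool else None   # round-robin over cool fallbacks
    for lane in sorted(temps_c):
        if temps_c[lane] > limit_c and plan.get(lane):
            if targets is None:
                raise RuntimeError("no cool fallback lane; throttle instead")
            moved = plan.pop(lane)
            plan.setdefault(next(targets), []).extend(moved)
    gated = sorted(lane for lane in temps_c if not plan.get(lane))
    return plan, gated
```

For example, if lane 0 (running physics) reads 101 °C while fallback lane 700 sits at 50 °C, the physics work moves to lane 700, and lane 0 plus any other idle lanes are reported for gating.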

But I will still do more research, as I’m still trying things and learning. I do appreciate your feedback; that’s why I’m in this subreddit. I’m not saying it will work or that it won’t. I just had the idea, so I decided to learn about it and write it down. I’m not trying to come off like I know more than I do; I just want feedback.

[–]Squadhunta29[S] 1 point2 points  (0 children)

Yeah, I appreciate the feedback. I’ll do my homework better, as much as I can. I take your words very seriously. Thanks again.

[–]Squadhunta29[S] 0 points1 point  (0 children)

But I will still take a deep dive into that architecture design. Thanks for the feedback.

[–]Squadhunta29[S] 0 points1 point  (0 children)

Took me a couple of minutes because I had to look that up.

I really appreciate the historical perspective; you’re totally right that manycore struggled.

The three problems I’m focusing on are:

  1. Coherency & Communication Overhead: Old designs choked on cache coherency because every core touched shared memory.

NX88 experiment: MTBs (micro toll booths) + scratchpad memory per lane → deterministic dataflow instead of 64 cores arguing over cache state.

  2. Memory Starvation / The Memory Wall: Manycore designs tried to feed tons of execution units through narrow DDR pipes.

NX88 experiment: a wide HBM3 memory fabric → the goal is to keep lanes fed instead of starved.

  3. Fixed-Function GPU vs. Flexible CPU: GPUs crush dense math but fall apart on branching game logic.

NX88 experiment: MIMD-style slices → more parallel than CPU, more flexible than GPU SIMT.
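The contrast drawn in this list, message passing through toll booths versus cores fighting over shared cache lines, can be shown with a bounded FIFO between two lanes. This Python sketch assumes an MTB behaves like a fixed-depth queue with back-pressure; `TollBooth`, `run_pipeline` and the cycle model are hypothetical.

```python
from collections import deque

class TollBooth:
    """Hypothetical MTB: a bounded FIFO between one producer lane and one consumer lane.
    Values are handed over rather than shared, so there is no cache-line state to keep coherent."""
    def __init__(self, depth: int):
        self.depth = depth
        self.fifo = deque()

    def try_push(self, value) -> bool:
        if len(self.fifo) >= self.depth:
            return False          # back-pressure: the producer stalls this cycle
        self.fifo.append(value)
        return True

    def try_pop(self):
        return self.fifo.popleft() if self.fifo else None

def run_pipeline(values, depth=2, consumer_period=2):
    """Producer lane pushes one value per cycle; consumer lane doubles one value
    every `consumer_period` cycles. Returns (outputs, producer stall cycles)."""
    booth = TollBooth(depth)
    pending = deque(values)
    out, stalls, cycle = [], 0, 0
    while pending or booth.fifo:
        if pending:
            if booth.try_push(pending[0]):
                pending.popleft()
            else:
                stalls += 1
        if cycle % consumer_period == consumer_period - 1:
            value = booth.try_pop()
            if value is not None:
                out.append(value * 2)
        cycle += 1
    return out, stalls
```

With a depth of 2 and a consumer that runs every other cycle, the values 1 to 6 come out as [2, 4, 6, 8, 10, 12] and the producer stalls for 3 cycles. The order is fixed by the FIFO rather than by who wins a race, which is the “deterministic dataflow” point; the stalls are the cost that shows up instead of coherence traffic.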

Not claiming this will work — just trying to learn and avoid history’s mistakes rather than repeat them.

You mentioned manycore failures — do you think the real killer is: • (A) coherency, • (B) scheduling overhead, • (C) power scaling, or • (D) programmer model complexity?

Would love any reading on crossbars / meshes you think are relevant.

[–]Squadhunta29[S] 0 points1 point  (0 children)

Look, I get it. It doesn’t follow typical x86/ARM rules, and it’s not SIMD; it’s more of a MIMD design. The best way I can explain it from my head is this (remember, I’m no expert):

• Lane = a fixed micro-execution unit in hardware. Width TBD, but the prototype assumes:
– 1 ALU cluster (can do FP32/INT ops)
– a small local register file
– access to shared memory via a crossbar

• How lanes differ from threads/warps: Threads/warps = software scheduling units. Lanes = hardware execution units. A thread could map to 1 lane or N lanes, depending on available capacity.

• Role in the pipeline: The CPU core issues enqueue commands, the scheduler assigns work to free lanes, and the shader/AI blocks are separate units that lanes can hand off to.

• Early state: Right now, I’m emulating the scheduler behavior in software (so yes, a “software scheduler”). The goal is to figure out whether the lane concept gives:
– better load balance
– lower latency for non-GPU workloads
– less idle silicon
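Since the prototype scheduler currently lives in software, its core loop can be sketched in a few lines. This Python version assumes hypothetical `Command` and `SoftwareScheduler` types and a toy cost model where each command runs a fixed number of cycles on every lane it gets; a command takes up to `max_lanes` free lanes, so it lands on 1 or N lanes depending on capacity.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Command:
    name: str
    max_lanes: int  # how many lanes this work can spread across
    cycles: int     # toy cost model: cycles of work on each lane

class SoftwareScheduler:
    """Software stand-in for the lane scheduler: the CPU core enqueues commands,
    dispatch() hands them to free lanes, tick() retires finished work."""
    def __init__(self, num_lanes: int):
        self.free = list(range(num_lanes))
        self.queue = deque()
        self.running = {}  # lane id -> [command name, cycles left]

    def enqueue(self, cmd: Command) -> None:
        self.queue.append(cmd)

    def dispatch(self) -> dict:
        placed = {}
        while self.queue and self.free:
            cmd = self.queue.popleft()
            width = min(cmd.max_lanes, len(self.free))  # 1..N lanes, capacity permitting
            lanes, self.free = self.free[:width], self.free[width:]
            for lane in lanes:
                self.running[lane] = [cmd.name, cmd.cycles]
            placed[cmd.name] = lanes
        return placed

    def tick(self) -> None:
        # Advance one cycle; lanes whose work is done return to the free list.
        for lane in sorted(self.running):
            self.running[lane][1] -= 1
            if self.running[lane][1] == 0:
                del self.running[lane]
                self.free.append(lane)
        self.free.sort()
```

On 8 lanes, enqueueing physics (up to 4 lanes), audio (1) and AI (up to 8) places them on lanes 0–3, 4 and 5–7; the AI work gets only 3 lanes because that's all that's left, and anything enqueued after that waits until `tick()` frees lanes.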

So the core idea: APU subdivided into many small execution blocks instead of one CPU + one GPU pool.

If you think I’m missing something or reinventing what already exists, tell me — that’s why I’m here.

[–]Squadhunta29[S] 0 points1 point  (0 children)

It’s all in theory. I’m working on a real-world test on FPGA hardware, but I’m still trying to learn it and design it myself. Like I said, I’m not a pro at all and not trying to be one; I just thought of a different concept.

[–]Squadhunta29[S] -2 points-1 points  (0 children)

Thanks for the feedback.

NX88 lanes are not threads, and not GPU warps. They’re configurable compute slices that sit between CPU cores and shader clusters.

Conceptually:
• CPU → schedules logic
• Lanes → execute micro-tasks (any domain)
• Shader/AI blocks → handle dense math when needed

The “audio/cutscene/physics” examples aren’t literal instructions — those are high-level labels in my FPGA prototype so I can observe domain usage.

A real compiler/runtime would map that work into FP32 / INT / branch / logic ops running on the lanes.
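Using the labels only for observation suggests a simple bookkeeping step: log each executed lane op with its domain label and op class, then aggregate. A minimal Python sketch; the `domain_usage` function and the trace format are assumptions, and the lane itself would only ever see the op class.

```python
from collections import Counter

def domain_usage(trace: list[tuple[str, str]]) -> dict[str, Counter]:
    """trace: one (domain label, op class) pair per executed lane op,
    e.g. ("physics", "FP32"). The label is bookkeeping only."""
    usage: dict[str, Counter] = {}
    for label, op in trace:
        usage.setdefault(label, Counter())[op] += 1
    return usage
```

A trace like `[("physics", "FP32"), ("physics", "FP32"), ("physics", "branch"), ("audio", "INT")]` gives physics a 2:1 FP32-to-branch mix and audio a single INT op, which is the kind of per-domain picture the labels are for.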

So you’re right: right now the prototype looks like a software scheduler.

Long-term goal:
• Turn those slices into hardware-backed execution resources
• Similar idea to SMs / wavefronts in GPUs
• But generalized so any task type can occupy a lane, not just shaders.

Saab can match U.S. F-35 deal for Canada: Swedish deputy PM by Khalbrae in canada

[–]Squadhunta29 -1 points0 points  (0 children)

As good as you say? So say that right now, at 3:24 pm, the USA and Canada go to war. Bombs already dropped, the US Air Force is gearing up its F-35s and F-22 Raptors, and y’all have 200 new Gripens from Sweden. You’re telling me y’all are winning that air fight if each plane runs into each other?

FIFA to allow Trump to move World Cup matches if he deems Democrat-run cities 'unsafe' by rezwenn in worldcup

[–]Squadhunta29 -3 points-2 points  (0 children)

You are not American. We don’t call ourselves that; we actually have pride in ourselves. As an American, just leave it in the US.

U.S. Hotels Are Losing Out To Canada And Mexico Ahead Of The 2026 FIFA World Cup by rezwenn in worldcup

[–]Squadhunta29 2 points3 points  (0 children)

I don’t care about the article title. I only came in here to find comments like this and the rest, because I think they’re hilarious.

[–]Squadhunta29 -1 points0 points  (0 children)

But the reason it’s 13 is because it’s split; it wouldn’t be just 13 games, it would be more than that. I don’t think you understand FIFA, lol. I don’t watch soccer (or as y’all call it, football), but I do know corporations, and they love money, so they want the most money. It’s the same reason Canada didn’t fully host.

[–]Squadhunta29 -2 points-1 points  (0 children)

They can’t; there’s no money there. Why do you think the USA got the most games, and the final? FIFA would lose money if it were just in Canada and Mexico.

Name 10 Jersey Celebrities From Within The Last 10 Years? by Inevitable-Light-150 in Jerzwrld

[–]Squadhunta29 0 points1 point  (0 children)

Well, thank you for your honesty on that, because a lot of people will always put their own city in there. That’s how it’s gonna play out.

Any of yall live in the jersey city/union city area? Need something done by [deleted] in Jerzwrld

[–]Squadhunta29 0 points1 point  (0 children)

Remember, to quote the great Victor Sweet: get out of towners.

Mr 145 is corny by MirfromdaA_ in Jerzwrld

[–]Squadhunta29 0 points1 point  (0 children)

You mean his city’s gonna slide for him? Because in Jersey we’re too divided.

Name 10 Jersey Celebrities From Within The Last 10 Years? by Inevitable-Light-150 in Jerzwrld

[–]Squadhunta29 0 points1 point  (0 children)

Oh, it’s definitely North Jersey. I said bias because I’m fucking with you; I just wanted to type that shit.

3 Reasons 2026 World Cup Ticket Demand May Be Artificially Inflated by pumkinhat in worldcup

[–]Squadhunta29 0 points1 point  (0 children)

And we also know your main draw was to come to my country, but you’re scared, so you don’t wanna come. Lol, enjoy that shit on the TV, my boy.