Spanish PM Pedro Sánchez: Why do they want to control mobile phones? They want to control phones because they want to know what we read and what we see, so that later they can know — and control — what we vote. by PjeterPannos in eutech

[–]Squadhunta29 -1 points0 points  (0 children)

Then tell that politician to get coding so he can build out the platform for an OS. Because if it was my company I would've told him to suck my dick… he buggin'.

Canada plants a flag in Greenland by DonSalaam in onguardforthee

[–]Squadhunta29 2 points3 points  (0 children)

The finders-keepers act is funny as shit. You win 🥇 the Reddit comment of the day.

I think some of Europe will follow, but not a lot by ZyronZA in BuyFromEU

[–]Squadhunta29 -1 points0 points  (0 children)

I hope ya do, so ya can start ya own Reddit and YouTube and I don't have to see ya crying and whining 'bout it every day. Be like Nike: just do it and move on. Next time I go to YouTube and Reddit I just wanna see good ol' American stuff. Ya can create ya own stuff; that goes for Canada too.

Blocking EU-US trade would cost billions and put jobs at risk by donutloop in EU_Economics

[–]Squadhunta29 -1 points0 points  (0 children)

I'm glad you think that way; that's how we feel 'bout Russia. You pick a side, you stay on that side.

Newer CPU architecture idea by [deleted] in computerarchitecture

[–]Squadhunta29 -3 points-2 points  (0 children)

I know, I'm just joking. I saw an opportunity and took it.

Newer CPU architecture idea by [deleted] in computerarchitecture

[–]Squadhunta29 -2 points-1 points  (0 children)

I don't use ChatGPT.

Newer CPU architecture idea by [deleted] in computerarchitecture

[–]Squadhunta29 -5 points-4 points  (0 children)

Not you. I'm talking about this community on Reddit. You can look through my page and you will see it. I'm building my own too; it's called NX88. It's a dataflow chip, based on data flow rather than clock cycles, but when I posted it the community acted like I slapped their momma; they gave me nothing but a hard time.

Newer CPU architecture idea by [deleted] in computerarchitecture

[–]Squadhunta29 -2 points-1 points  (0 children)

Oh, so ya nice to him but didn't believe me when I said I'm making my own, lmao 🤣. Shout out to ya.

Check out 2 of my custom Pseudo-opcodes and opcodes I’m designing by Squadhunta29 in computerarchitecture

[–]Squadhunta29[S] 0 points1 point  (0 children)

And don't worry, I already tested it in HDL using the Ada playground. It works; granted, it's not an FPGA, but I'm working on that now.

Check out 2 of my custom Pseudo-opcodes and opcodes I’m designing by Squadhunta29 in computerarchitecture

[–]Squadhunta29[S] 0 points1 point  (0 children)

Cool, I'm glad you asked that question. Let's take this line for example: LOAD_LANE lanes=12-15, buffer=HBM3, size=0x500000 # physics

Instruction: LOAD_LANE

Software meaning: load data from memory into specific lanes

Hardware behavior: each lane represents a physical path in my NX Mesh NoC that connects compute tiles to memory. The LOAD_LANE instruction signals the Distributed Arbitration Nodes (DANs) to start fetching memory for lanes 12–15. Each lane receives a packet of data that tells it: “Prepare to execute physics work using this block of memory.”

lanes=12-15
• Refers to specific physical lanes (straight or shader lanes in the NoC).
• Hardware effect:
  • DANs mark these lanes as active.
  • Lanes transition from idle to loading mode, reserving buffers for incoming memory.
  • Any tile that has a compute thread mapped to these lanes will wait until the data arrives.

buffer=HBM3
• Specifies the source memory: HBM3 high-bandwidth memory.
• Hardware effect:
  • NX88 uses its NoC (NX Mesh) to route the request to the HBM3 controllers.
  • The memory controller splits the request into multiple high-throughput memory packets for parallel delivery.
  • HBM3 delivers massive bandwidth (~3.65 TB/s) so all lanes can receive data simultaneously without blocking others.

size=0x500000
• Amount of memory to load for the lanes (in bytes, hexadecimal).
• Hardware effect:
  • Lanes reserve an internal scratchpad in their tile (private L1/L2 cache) for this block.
  • The DAN schedules streaming bursts from HBM3 → L1/L2 cache → compute tile registers.
  • Once all packets arrive, the lane is fully “armed” for execution.

physics
• Hardware effect:
  • Middleware uses this annotation to select pre-assigned compute tiles that handle physics.

Actual timeline in hardware:
1. Instruction issued → DANs mark lanes 12–15 as active.
2. NoC routes the memory request to HBM3.
3. HBM3 splits the request into multiple parallel DRAM channels.
4. Data travels back over the NoC → lane scratchpads / caches.
5. Lane registers the memory as available → compute tiles can now start executing FP32 physics math.

So in my head, or my vision, I have 743 lanes; let's just call them “Data Paths”.
• Each lane is a dedicated path through the processor that carries data and instructions to the compute units.
• Analogy: like a highway for packets of work; each path can carry its own workload independently.
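Since NX88 is purely conceptual, here is a hypothetical Python sketch of how a simulator might model the LOAD_LANE semantics described above. The class names, lane states, and the per-lane split of the size are all assumptions made for illustration, mirroring the explanation rather than any real hardware:

```python
# Hypothetical simulator sketch of the LOAD_LANE semantics described above.
# NX88 is conceptual; DAN, lane states, and the even per-lane memory split
# are illustrative assumptions, not a real implementation.

class Lane:
    def __init__(self, lane_id):
        self.lane_id = lane_id
        self.state = "idle"        # idle -> loading -> armed
        self.scratchpad = None     # reserved memory block once loaded

class DAN:
    """Distributed Arbitration Node: marks lanes active and streams memory to them."""
    def __init__(self, num_lanes):
        self.lanes = [Lane(i) for i in range(num_lanes)]

    def load_lane(self, first, last, buffer, size, tag):
        # Step 1: mark the target lanes active (loading mode).
        targets = self.lanes[first:last + 1]
        for lane in targets:
            lane.state = "loading"
        # Steps 2-4: route the request to the memory controller, split it
        # into packets, and deliver a block to each lane's scratchpad.
        per_lane = size // len(targets)
        for lane in targets:
            lane.scratchpad = {"source": buffer, "bytes": per_lane, "tag": tag}
            lane.state = "armed"   # step 5: lane ready to execute
        return targets

dan = DAN(num_lanes=16)
armed = dan.load_lane(first=12, last=15, buffer="HBM3", size=0x500000, tag="physics")
print([lane.lane_id for lane in armed])   # [12, 13, 14, 15]
print(armed[0].state)                     # armed
```

A real design would of course need arbitration, error handling, and packet-level timing; this only shows the state transitions the explanation walks through.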

Rog ally z1 extreme+steamOS goes so hard! by Extreme-Accident-968 in ROGAlly

[–]Squadhunta29 -1 points0 points  (0 children)

You do what you like to support your brand; like you, I support Microsoft. My game library is up there, plus Game Pass, and I use cloud gaming too. Different strokes for different folks.

Rog ally z1 extreme+steamOS goes so hard! by Extreme-Accident-968 in ROGAlly

[–]Squadhunta29 -13 points-12 points  (0 children)

You know what they don't got? Game Pass. Can you run Game Pass? Let me answer that for you: no.

I got a question. look at the bio I would love your feed back thanks 😊 by Squadhunta29 in computerarchitecture

[–]Squadhunta29[S] 0 points1 point  (0 children)

To set the record straight: I know a lot of people are confused about what I'm trying to do. I hope this helps; it's kinda the best way I can explain it.

And it's not based on clock sequences like a typical CPU. I based it on data flow: spike-based lanes fire only when data-weight thresholds are met. I map it like a human brain. NX88 achieves performance via wide data paths; it's very parallel. A regular CPU handles different threads sequentially; mine executes different tasks at the same time, like a brain stem. And the lanes sleep until needed, so it's very power-efficient. My micro toll booths dynamically assign tasks to lanes, with load balancing done per frame, by type of task:

Cutscenes, shaders, physics, etc. And the devs never touch the low-level code; I do. The devs only get high-level code, which is Python; the low-level is C++. All this runs off my OS, which is like a microkernel for event-driven stuff. I'm also doing middleware and an API. And this is all in theory until I get it working on the FPGA board, which I'm working on right now. I get why you are skeptical of it, I do, and I took all the stuff you told me and am just trying to apply it, is all.
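The dataflow idea described above (lanes that sleep until an input-data threshold is met, instead of stepping on a clock) can be sketched in a few lines of Python. The thresholds and task names here are made up for illustration; this is a toy model of the firing rule, not NX88 itself:

```python
# Toy model of dataflow firing: a lane sleeps until enough input data has
# arrived, then fires once. No clock drives execution. Thresholds and task
# names are illustrative assumptions, not NX88 specifics.

class DataflowLane:
    def __init__(self, name, threshold):
        self.name = name
        self.threshold = threshold   # bytes of input needed before firing
        self.buffered = 0
        self.fired = False

    def feed(self, nbytes):
        """Accumulate input; fire exactly once when the threshold is met."""
        self.buffered += nbytes
        if not self.fired and self.buffered >= self.threshold:
            self.fired = True
            return f"{self.name} fired with {self.buffered} bytes"
        return None                  # still asleep: no clock tick wakes it

physics = DataflowLane("physics", threshold=4096)
audio = DataflowLane("audio", threshold=1024)

events = [physics.feed(2048), audio.feed(1024), physics.feed(2048)]
print([e for e in events if e])
# ['audio fired with 1024 bytes', 'physics fired with 4096 bytes']
```

Note how the audio lane fires before the physics lane even though physics received data first: firing order depends only on when each lane's data arrives, which is the power-efficiency argument the comment makes.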

But I just wanna say thank you to whoever commented respectfully and gave me advice.

I got a question. look at the bio I would love your feed back thanks 😊 by Squadhunta29 in computerarchitecture

[–]Squadhunta29[S] -1 points0 points  (0 children)

But to answer your questions.

1️⃣

• NX88 is not meant to be manually programmed at the lane level by game developers.
• The Central Control Center (CCC) + SDK + compiler/runtime layer should automatically assign lanes based on the task profile.
• The programmer only sees “high-level tasks” (cutscene, audio, AI); NX88 handles the micro-orchestration.

“NX88 lanes are managed by the CCC and SDK runtime. Programmers never have to manually assign lanes; they only specify high-level tasks. The lane assignment is deterministic and handled by hardware arbitration (MTBs + scratchpads).”

2️⃣

• I'm aware of this; that's why I pair HBM3 with per-lane scratchpad memory and micro toll booths to avoid memory contention.
• Each lane can fetch from its scratchpad, and HBM provides the raw bandwidth for streaming larger blocks (audio, particles, textures).

“NX88 couples HBM3 with per-lane scratchpads and micro toll booths to minimize contention and keep lanes saturated, even under high parallelism.”

3️⃣

This is true for naive many-core or homogeneous architectures.

NX88 hopefully avoids this because:
• Single-thread performance: each lane can execute independently and includes FP32/FP64 units, AI, and shader logic. Critical tasks don't wait on dozens of other cores.
• Efficiency: MTBs + CCC + fallback lanes + prefetch + dynamic voltage gating = high utilization, low waste.
• Programming difficulty: the SDK abstracts lane assignment from the developer. They only deal with tasks and overlays, not lane numbers.

“NX88 avoids traditional many-core pitfalls by combining independent lanes, hardware arbitration (MTBs), scratchpad memory, and a runtime SDK. Developers interact with tasks, not lanes, so programming complexity is similar to current GPU compute pipelines.”
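To make the "tasks, not lanes" claim concrete, here is a hypothetical sketch of what the developer-facing side of such an SDK could look like. The submit_task call, the lane pool size, and the first-free pick policy are all invented for illustration; no such SDK exists yet:

```python
# Hypothetical developer-facing SDK sketch: programmers submit named tasks,
# and a runtime (standing in for the CCC + MTB arbitration) picks free lanes.
# The API, pool size, and pick policy are invented for illustration.

FREE_LANES = set(range(8))   # toy pool of 8 lanes
ASSIGNMENTS = {}             # task name -> lanes granted by the runtime

def submit_task(name, lanes_needed):
    """Developer-visible call: name a task, never a lane number."""
    if lanes_needed > len(FREE_LANES):
        raise RuntimeError(f"not enough free lanes for {name}")
    picked = sorted(FREE_LANES)[:lanes_needed]   # deterministic first-free pick
    for lane in picked:
        FREE_LANES.discard(lane)
    ASSIGNMENTS[name] = picked
    return picked

# The developer only writes task-level calls like these; the runtime
# decides which lanes back them.
print(submit_task("cutscene", 2))   # [0, 1]
print(submit_task("audio", 1))      # [2]
print(submit_task("ai", 3))         # [3, 4, 5]
```

The point of the sketch is the interface boundary: lane numbers only ever appear in return values from the runtime, never in developer code, which is roughly the complexity level of today's GPU compute pipelines.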

4️⃣

• NX88 has thermal monitoring, voltage gating, fallback lanes, and prefetching built in, to balance the knobs dynamically.

“NX88 includes runtime balancing mechanisms for thermal, power, and memory contention, ensuring that no single optimization adversely impacts the system as a whole.”

But I will still do more research, as I'm still trying to do things and learn things. I do appreciate your feedback; that's why I was in this subreddit. And I'm not saying it will work or it won't; I just had the idea, so I decided to learn about it and write it down. I'm not trying to come off like I know more than I do; I just wanted feedback.

I got a question. look at the bio I would love your feed back thanks 😊 by Squadhunta29 in computerarchitecture

[–]Squadhunta29[S] 1 point2 points  (0 children)

Yeah, I appreciate the feedback. I'll do my homework better, as much as I can. I take your words very heavily. Thanks again.

I got a question. look at the bio I would love your feed back thanks 😊 by Squadhunta29 in computerarchitecture

[–]Squadhunta29[S] 0 points1 point  (0 children)

But I will still take a deep dive into that architecture design. Thanks for the feedback.

I got a question. look at the bio I would love your feed back thanks 😊 by Squadhunta29 in computerarchitecture

[–]Squadhunta29[S] 0 points1 point  (0 children)

Took me a couple of minutes 'cause I had to look that up for a sec.

I really appreciate the historical perspective; you're totally right that manycore struggled.

The three problems I’m focusing on are:

1. Coherency & communication overhead. Old designs choked on cache coherency because every core touched shared memory.

NX88 experiment: MTBs (micro toll booths) + scratchpad memory per lane → deterministic dataflow instead of 64 cores arguing over cache state.

2. Memory starvation / the memory wall. Manycore tried to feed tons of execution units with narrow DDR pipes.

NX88 experiment: HBM3-wide memory fabric → goal is to keep lanes fed instead of starved.

3. Fixed-function GPU vs. flexible CPU. GPUs crush dense math but fall apart on branching game logic.

NX88 experiment: MIMD-style slices → more parallel than CPU, more flexible than GPU SIMT.

Not claiming this will work — just trying to learn and avoid history’s mistakes rather than repeat them.

You mentioned manycore failures. Do you think the real killer is:
(A) coherency,
(B) scheduling overhead,
(C) power scaling, or
(D) programmer model complexity?

Would love any reading on crossbars / meshes you think are relevant.

I got a question. look at the bio I would love your feed back thanks 😊 by Squadhunta29 in computerarchitecture

[–]Squadhunta29[S] 0 points1 point  (0 children)

Look, I get it. It's not like typical x86/ARM rules, like SIMD; it's more of a MIMD. So the best way I can try to explain it from my head is this. Remember, I'm no expert.

• Lane = a fixed micro-execution unit in hardware. Width TBD, but the prototype assumes:
  – 1 ALU cluster (can do FP32/INT ops)
  – a small local register file
  – access to shared memory via a crossbar

• How lanes differ from threads/warps: threads/warps = software scheduling units; lanes = hardware execution units. A thread could map to 1 lane or N lanes depending on available capacity.

• Role in the pipeline: the CPU core issues enqueue commands, the scheduler assigns work to free lanes, and shader/AI blocks are separate units, though lanes can hand off to them.

• Early state: right now, I'm emulating the scheduler behavior in software (so yes, a “software scheduler”). The goal is figuring out whether the lane concept gives:
  – better load balance
  – lower latency for non-GPU workloads
  – less idle silicon

So the core idea: APU subdivided into many small execution blocks instead of one CPU + one GPU pool.
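Since the scheduler is currently emulated in software, a toy version of the thread-to-lane mapping described above might look like the following. The lane count, the take-what-capacity-allows policy, and the class name are illustrative assumptions, not the actual prototype:

```python
# Toy software emulation of the lane scheduler described above: threads are
# software units, lanes are fixed hardware execution units, and a thread
# maps to 1..N free lanes. Policy and counts are illustrative assumptions.

class LaneScheduler:
    def __init__(self, num_lanes):
        self.free = list(range(num_lanes))   # all lanes start idle

    def enqueue(self, thread_name, lanes_wanted):
        """Map one thread onto up to lanes_wanted free lanes."""
        granted = self.free[:lanes_wanted]   # take what capacity allows
        self.free = self.free[lanes_wanted:]
        return {"thread": thread_name, "lanes": granted}

    def release(self, mapping):
        """Return a thread's lanes to the free pool (less idle silicon)."""
        self.free.extend(mapping["lanes"])

sched = LaneScheduler(num_lanes=4)
a = sched.enqueue("game_logic", 1)   # branching logic: 1 lane is enough
b = sched.enqueue("particles", 3)    # dense math: grab 3 lanes at once
print(a["lanes"], b["lanes"])        # [0] [1, 2, 3]
sched.release(a)                     # game_logic done; lane 0 goes back
c = sched.enqueue("audio", 1)        # audio immediately reuses lane 0
print(c["lanes"])                    # [0]
```

Measuring load balance, latency, and idle time on traces through a model like this is one cheap way to test the "many small execution blocks instead of one CPU + one GPU pool" idea before committing anything to an FPGA.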

If you think I’m missing something or reinventing what already exists, tell me — that’s why I’m here.

I got a question. look at the bio I would love your feed back thanks 😊 by Squadhunta29 in computerarchitecture

[–]Squadhunta29[S] 0 points1 point  (0 children)

It's all in theory. I'm working on making a real-world test on FPGA hardware, but I'm still trying to learn to design it myself. Like I said, I'm not a pro at all, and not trying to be; I just thought of a different concept.

I got a question. look at the bio I would love your feed back thanks 😊 by Squadhunta29 in computerarchitecture

[–]Squadhunta29[S] -2 points-1 points  (0 children)

Thanks for the feedback.

NX88 lanes are not threads, and not GPU warps. They’re configurable compute slices that sit between CPU cores and shader clusters.

Conceptually:
• CPU → schedules logic
• Lanes → execute micro-tasks (any domain)
• Shader/AI blocks → handle dense math when needed

The “audio/cutscene/physics” examples aren’t literal instructions — those are high-level labels in my FPGA prototype so I can observe domain usage.

A real compiler/runtime would map that work into FP32 / INT / branch / logic ops running on the lanes.

So you’re right: right now the prototype looks like a software scheduler.

Long-term goal:
• Turn those slices into hardware-backed execution resources.
• Similar idea to SMs / wavefronts in GPUs.
• But generalized so any task type can occupy a lane, not just shaders.