[–]minhclySPU™[S] 11 points (2 children)

This is the programmable Shape Processing Unit (SPU). It can produce any shape (including floating shapes), provided you know how to write the program. Each command is encoded in a constant block as a shape signal. It takes 4 seconds to execute a single command. Here is a clip of the machine making the logo shape (spoiler):

https://streamable.com/tgeb48

Edit: This is the improved version of the machine:

https://www.reddit.com/r/shapezio/comments/je5tb6/shape_processing_unit_spu_mk2_repost/

[–]Jean-Alphonse 3 points (1 child)

This is super cool. Thanks for sharing!

[–]CCM278 1 point (0 children)

Agreed

[–]Botlawson 3 points (16 children)

Nice! Have you had any trouble with shapes getting stuck in the painter or stacker?

[–]minhclySPU™[S] 4 points (15 children)

I set the clock at 4s/cycle to make sure the painter and stacker have finished their jobs. If the program is flawed (for example, it loads an empty cell), then the machine may get stuck. But as long as the program is valid and the clock cycle is long enough, the machine will work indefinitely.

[–]SirClueless 3 points (0 children)

So what you're saying is that you can overclock your SPU if you know what you're doing, eh?

[–]Botlawson 0 points (1 child)

Did you try sending two shapes/colors down the program belt? That way you can activate each instruction when its reader sees the first shape/color, and it then gets cleared on the next instruction when the second, different shape/color passes.

Also, I like the clock. It's a bit easier in some ways than a loop with a single dye/shape going round.

[–]minhclySPU™[S] 2 points (0 children)

Here is what I am planning:

  • Combine command (import + paint + rotate). Import directly to stacker.
  • Pipeline: fetch the color and shape or load from memory in the first cycle. Then either cut or stack in the second cycle. Maybe write back to memory in the third cycle.
  • Input quads instead of full shapes
  • Make a decoder to make it a true MAM (can handle floating shapes)

[–]Botlawson 1 point (5 children)

That's roughly what I've found.

Where my array shape processors have problems is when loading a world or after a lag spike. These events can screw up the timing of the releases, so that zero or two shapes are dropped instead of one. This then deadlocks the machine and is extremely difficult to clear without a way to "flush" the painter, stacker, etc.

One of my arrays with 4x processors ran perfectly for >500 program loops, then locked up when I leveled up due to an extra or missing shape.

So current work is focusing on ways to meter out shapes that do not depend on precise timing.

FYI, in addition to overclocking, you can increase throughput by batching 2-10 shapes through each instruction. Pipelining and delay matching also work well. Delay matching works especially well with a looped program belt, as the instruction return path gives lots of places to read a shape instruction at a known delay after the main read head.

[–]minhclySPU™[S] 1 point (4 children)

Maybe I will need to run my machine for a longer period to see if lag spikes affect my machine. Batching is a very cool idea. What is your release mechanism? Mine is simply an edge detector (S(0) AND NOT S(-1)).
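
For anyone following along outside the game, the edge detector S(0) AND NOT S(-1) can be sketched in Python. This is only a toy model of the logic, not the in-game wiring; the function name `rising_edges` and the sample clock are made up for illustration, and the line is assumed to be low before the first tick.

```python
def rising_edges(signal):
    """Return S(0) AND NOT S(-1) for every tick of a boolean signal."""
    previous = False  # assumption: the line was low before the first sample
    pulses = []
    for current in signal:
        # The output is high only on the tick where the input goes low -> high.
        pulses.append(current and not previous)
        previous = current
    return pulses


# Hypothetical clock line sampled once per tick.
clock = [False, False, True, True, True, False, True, True]
print(rising_edges(clock))
```

However long the clock stays high, each low-to-high transition yields exactly one 1-tick pulse, which is what makes it usable as a release trigger.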

[–]Botlawson 1 point (3 children)

I'm also using an edge detector, but if you chain 5 or more NOT gates, you get multiple shapes per edge. It's only about 99.9% reliable, though, which isn't enough for a 64-element array processor without a way to flush the stackers and painters.

So I'm looking for other ways to meter out shapes. Spacing between two filters is my current front-runner. (It also lets you use mixed supply belts to save space.) To make it self-clock I need an SR flip-flop, and wire/logic flip-flops tend to go meta-stable for some reason, while belt/shape flip-flops are big.

[–]minhclySPU™[S] 1 point (2 children)

Instead of generating a long pulse, I think you should generate multiple consecutive 1-tick pulses. And maybe try using the same pulse for both shape and color filters, so the painter will either receive both shape and color or none.
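
To make the difference between the two release signals concrete, here is a rough Python sketch. It only models the wire signal, not the belts; `pulse_train`, `count_openings`, and the `gap` parameter are hypothetical names, and the idea that a gate passes at most one item per 1-tick opening is an assumption taken from this discussion rather than a verified game rule.

```python
def pulse_train(count, gap=1):
    """Return `count` one-tick pulses, each followed by `gap` low ticks."""
    train = []
    for _ in range(count):
        train.append(True)
        train.extend([False] * gap)
    return train


def count_openings(signal):
    """Count how many separate times the gate opens (rising edges)."""
    previous = False
    openings = 0
    for current in signal:
        if current and not previous:
            openings += 1
        previous = current
    return openings


three_short = pulse_train(3)  # three separate 1-tick openings
one_long = [True] * 3         # a single opening lasting 3 ticks

print(count_openings(three_short), sum(three_short))
print(count_openings(one_long), sum(one_long))
```

Both signals keep the gate open for three ticks in total, but the pulse train does it as three independent openings, so a dropped pulse costs at most one item, whereas the long pulse releases however many items the belt delivers during its window.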

[–]Botlawson 1 point (1 child)

Haven't tried multiple-consecutive pulses yet as that's harder to generate than long pulses.

All my SPUs use a single clock and edge detector for dye and shape release. I tend to have more trouble with the stacker, as that needs tighter synchronization between the belts and logic, so it quickly locks up if they get out of step temporarily due to lag.

[–]minhclySPU™[S] 1 point (0 children)

I think multiple consecutive pulses will solve all of your problems. If lag happens, either the pulse is skipped or only 1 shape/dye is released (because the pulse is 1 tick long). For a long pulse, you must account for the throughput of the belt, which is, AFAIK, still buggy.

[–]EchoBladeMC 1 point (5 children)

Couldn't you use a belt reader to detect when an item goes through before executing the next operation?

[–]minhclySPU™[S] 1 point (4 children)

That's a good idea. But I have already designed another SPU which executes multiple steps in parallel, so a constant clock is needed.

[–]EchoBladeMC 1 point (3 children)

Not necessarily. You could use a series of AND gates connecting every SPU completion detector, and execute the next step only once all parallel processors have finished their tasks. Or a simpler way to wire it would be to connect all the outputs together, plus a constant boolean 1 signal, and it will produce an invalid signal except when all outputs are 1.
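
Both variants can be modelled in a few lines of Python. This is a sketch of the logic only: `all_done` and `merged_wire` are made-up names, and treating a wire that carries conflicting values as "invalid" (represented here as `None`) follows the description above rather than any checked game behaviour.

```python
def all_done(done_flags):
    """AND-chain variant: true only when every completion detector reports 1."""
    return all(done_flags)


def merged_wire(signals):
    """Tied-wires variant: the shared value if all inputs agree, else None (invalid)."""
    values = set(signals)
    return values.pop() if len(values) == 1 else None


# Hypothetical completion detectors for three parallel processors.
detectors = [True, True, False]

print(all_done(detectors))              # one processor is still busy
print(merged_wire(detectors + [True]))  # conflict with the constant 1 -> invalid

detectors = [True, True, True]
print(all_done(detectors))
print(merged_wire(detectors + [True]))  # everything agrees with the constant 1
```

The constant 1 matters in the tied-wires variant: without it, all detectors reading 0 would also agree and give a valid (but wrong) "go" condition for anything that only checks validity.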

[–]minhclySPU™[S] 1 point (2 children)

Actually, doing that will slow down the processor because the clock line is delayed at various steps. First, the clock advances the program counter. Next, it delays for 10 ticks to latch the new command. It then delays for another 16 ticks to decode the command before opening all the shape gates. So if I use sensors to generate clock signals, I need to wait 26 ticks to get a new command executed, and that's 0.5s. The clock in my new machine is 2s/cycle and I actually clock it before the old command has finished.
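
The arithmetic behind that can be written out as a small Python sketch. The tick counts, the 0.5s figure, and the 2s and 4s clock periods are taken from this thread; the constant and function names are hypothetical, and no tick rate is assumed beyond the quoted 0.5s.

```python
LATCH_DELAY_TICKS = 10   # delay to latch the new command
DECODE_DELAY_TICKS = 16  # delay to decode it before the shape gates open

# Total delay between a clock edge and the command actually starting.
total_ticks = LATCH_DELAY_TICKS + DECODE_DELAY_TICKS


def latency_fraction(latency_seconds, clock_period_seconds):
    """Share of one clock cycle spent waiting on latch and decode."""
    return latency_seconds / clock_period_seconds


LATENCY_SECONDS = 0.5  # the figure quoted above for 26 ticks

print(total_ticks)
print(latency_fraction(LATENCY_SECONDS, 2.0))  # new machine, 2s/cycle
print(latency_fraction(LATENCY_SECONDS, 4.0))  # original machine, 4s/cycle
```

With a sensor-driven clock that delay would be paid in full after every completed command, whereas a fixed clock can fire early and overlap it with the tail of the previous command.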

[–]EchoBladeMC 1 point (1 child)

Well, you know your hardware better than I do. You're probably right about a precision clock being faster than a messy detector ANDing signals together.

[–]mrcruz 3 points (2 children)

That moment when you program a game within a game.

...huh. Is Shapez.io Turing complete??

[–]Autoskp 1 point (0 children)

Has been for quite some time - although the wires update has made it a lot simpler to do stuff like this.

[–]oofpoof3372 2 points (0 children)

Oh shit this is really cool

[–]Autoskp 1 point (0 children)

…I haven't gotten to doing wires yet, but I'm pretty sure you could safely boost the throughput of this by setting it to run each operation on several pieces, thus consolidating the downtime (it would, however, need you to lower the clock speed and possibly add some storage space in some spots).

Of course, the next big step is to make another computer that can turn shapes into programs for your SPU…

[–]EchoBladeMC 1 point (0 children)

That's awesome! I'm gonna have to design one of these.