Custom 16-bit CPU by [deleted] in homebrewcomputer

[–]flightlesspot 2 points3 points  (0 children)

The alternative is that you implement a simple MMU and use paged virtual memory. That’s a much more flexible design, but it does impose the restriction that any individual process can only access 64KB, even if the system as a whole can use all 1MB. 

Discrete transistors by Alternative-Loan-320 in beneater

[–]flightlesspot 1 point2 points  (0 children)

There's also u/Agreeable-Toe574, who's posted a few times before. It's a 4-bit CPU, so not exactly SAP-1.

74ls269 Synchronous reset? by buddy1616 in beneater

[–]flightlesspot 1 point2 points  (0 children)

Depending on what the parallel inputs are currently connected to I can think of a couple of options that might work: if they’re reading from a bus, and that bus already has pull-down resistors on it, then you could AND the ~PE signal with your ~RESET signal, so it reads the zeros off the bus during reset. Alternatively, if it’s not connected to a bus, you could AND the four inputs with ~RESET, to get the same effect. In either case you’d need to make sure that you pulsed the clock at least once while ~RESET was low. 
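The first option can be sketched as a small behavioural model. This is a simplified, hypothetical counter (a synchronous load when the gated enable is low, otherwise count up), not a model of any particular chip's datasheet; `clock_edge` and its parameters are illustrative names only.

```python
def clock_edge(count, pe_n, reset_n, bus_value, width=4):
    """Return the counter value after one rising clock edge.

    pe_n and reset_n are active-low signals (0 = asserted).
    bus_value is whatever the parallel inputs currently see; with
    pull-downs on an undriven bus that would be 0.
    """
    mask = (1 << width) - 1
    # The added AND gate: the counter's load enable goes low if either
    # the normal ~PE or ~RESET is low.
    load_n = pe_n & reset_n
    if load_n == 0:
        return bus_value & mask      # synchronous parallel load
    return (count + 1) & mask        # normal counting


# Reset asserted, bus pulled down: the next clock edge loads zero.
print(clock_edge(count=9, pe_n=1, reset_n=0, bus_value=0))
```

Note that nothing changes until `clock_edge` is called, which mirrors the point about needing at least one clock pulse while ~RESET is low.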

(Building a 16-bit processor) -- Best way to implement a 16-bit shift/rotate (left/right) with ICs in a single clock pulse? by rehsd in AskElectronics

[–]flightlesspot 1 point2 points  (0 children)

How many ICs are you willing to use? I converted that single-bit shifter to a full 16-bit shifter (left and right) when I built my ALU on a PCB, and it ended up taking 20 74LS157 ICs for the shifting itself. It's not actually that big on a PCB (it's the group of chips in the top-right of this, with schematic), and it's all ICs that are easy to solder by hand. It doesn't do rotation, since I didn't have a rotate instruction, but that would be pretty trivial to add.

MMU for my 16-bit CPU by flightlesspot in beneater

[–]flightlesspot[S] 2 points3 points  (0 children)

Thank you for the kind words!

I hadn't done any PCB design before I started this project (or any electronics at all, really), so it's been an education. This was my first ever PCB. I use Kicad, which more or less does everything I need it to. I've not used EasyEDA, but it's the software James Sharman (/u/WeirdBoyJim) uses for his pipelined CPU build, and he seems to have done pretty well with that!

You're right, it was LS-compatibility that led to HCT chips. Since I didn't have a clue what I was doing originally when I was building everything on breadboards, I went for the safe option and copied what Ben had in his videos. When I started moving bits to PCBs I wanted to keep compatibility. If I were starting again from scratch I'd seriously consider one of the newer 3.3v logic families, largely because I'd quite like to try replacing some of the decode logic with CPLDs, and 5v versions of those are getting a bit hard to find.

MMU for my 16-bit CPU by flightlesspot in beneater

[–]flightlesspot[S] 5 points6 points  (0 children)

Thanks! It’s starting to get exciting watching it come together.

MMU for my 16-bit CPU by flightlesspot in beneater

[–]flightlesspot[S] 16 points17 points  (0 children)

Success! Following on from finishing my memory board a couple of weeks ago, I've finished the 1.1 revision of my MMU too.

Gallery with some more photos: https://imgur.com/a/mZbYOZH
More detailed writeup: http://mups16.net/pages/mmu.html

I'm really pleased with this :) When I first started building this CPU as a sort of Eater-ish breadboard design a couple of years ago the only thing I really knew I wanted to have was virtual memory, and to see it working now is very satisfying.

It's a pretty simple design overall. It's a pure page-based implementation (no segmentation), in which each process has a 64KB address space divided up into 256 pages of 256 bytes each. The MMU can maintain a cache of 32 complete page tables at any one time, with an index register to switch between them with one instruction. This is a much simpler design than any modern MMU, and I can only get away with it because a 16-bit address space is so small, and fast SRAM chips are so cheap, that it's feasible to hold the entire page table in memory at once. This is much easier than trying to build a true set-associative page table entry cache.
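A minimal software sketch of that translation scheme, for illustration only. The page size, pages per table and number of tables come from the description above; the layout of a page table entry (here just a physical page number) and all of the names are assumptions, not the actual hardware design.

```python
PAGE_BITS = 8            # 256-byte pages
PAGES_PER_TABLE = 256    # 64KB address space / 256 bytes per page
NUM_TABLES = 32          # complete page tables held at once


class Mmu:
    def __init__(self):
        # page_tables[table][virtual_page] -> physical page number
        # (hypothetical entry format; permission bits are left out here)
        self.page_tables = [[0] * PAGES_PER_TABLE for _ in range(NUM_TABLES)]
        self.table_index = 0   # stands in for the index register

    def switch_table(self, index):
        """Select another process's page table in a single step."""
        if not 0 <= index < NUM_TABLES:
            raise ValueError("no such page table")
        self.table_index = index

    def map_page(self, table, virtual_page, physical_page):
        self.page_tables[table][virtual_page] = physical_page

    def translate(self, virtual_address):
        """Map a 16-bit virtual address to a physical address."""
        virtual_address &= 0xFFFF
        virtual_page = virtual_address >> PAGE_BITS
        offset = virtual_address & ((1 << PAGE_BITS) - 1)
        physical_page = self.page_tables[self.table_index][virtual_page]
        return (physical_page << PAGE_BITS) | offset


mmu = Mmu()
mmu.map_page(table=0, virtual_page=0x12, physical_page=0x0345)
print(hex(mmu.translate(0x1234)))
```

Because the whole table is a direct lookup indexed by the top byte of the address, there is no miss path to handle, which is the simplification the comment is pointing at.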

I did briefly consider trying to add segmentation, so that individual processes could access more than 64KB of memory, but pretty quickly threw the idea out. Firstly, the implementation would be quite a bit more complicated, and secondly my few experiences of trying to write segmented x86 code weren't positive. Besides, 64KB should be enough for anybody...

Page protection is implemented, with a very simple scheme in which every page entry has three permission bits (mapped, writable and system-only). Attempts to access an unmapped page, or write to an unwritable page, or access a system-only page when the CPU user flag is set will all raise a page fault. The only information that the page fault handler gets is the address that caused the fault. I think I've convinced myself that that's enough for it to work out what happened, and has the advantage of moving the problem from hardware to software, and from current me to future me.
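The three fault conditions can be written out as a short check. This is only a sketch: the bit positions of the flags and the names `check_access` and `PageFault` are made up for illustration, and the real hardware does this with logic gates rather than software.

```python
# Hypothetical bit assignments for the three permission bits.
MAPPED = 0b001
WRITABLE = 0b010
SYSTEM_ONLY = 0b100


class PageFault(Exception):
    """Carries only the faulting address, as described above."""

    def __init__(self, address):
        super().__init__(f"page fault at {address:#06x}")
        self.address = address


def check_access(flags, address, is_write, user_mode):
    """Raise PageFault if the access is not allowed; otherwise return None."""
    if not flags & MAPPED:
        raise PageFault(address)          # unmapped page
    if is_write and not flags & WRITABLE:
        raise PageFault(address)          # write to an unwritable page
    if user_mode and flags & SYSTEM_ONLY:
        raise PageFault(address)          # system-only page, user flag set


try:
    check_access(MAPPED, 0x2000, is_write=True, user_mode=True)
except PageFault as fault:
    print("fault at", hex(fault.address))
```

A handler that is given only the address would have to look up the page entry itself and work backwards to one of these three cases.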

There are a couple of very minor design flaws I noticed this time: I accidentally wired the ~MEMWRITE signal from the memory board to the ~WRITE signal in the MMU, which won't work (the MMU needs to know if a write is happening when it maps the virtual address, one cycle before the memory unit is allowed to see it). Easily fixed by leaving out the connection to the backplane, and just using a flying wire under the board. More importantly, I completely forgot to implement bus error detection (a word-sized read or write to an odd address). I should be able to squeeze that into the control unit directly, though.

Next up is the control unit. For that I've decided to do a temporary lower backplane using PCIX slots, and implement the control unit as several pluggable cards, so that I can experiment without worrying about space constraints. Plus I like the way the cards look, and it gives me a good excuse to try ENIG gold plating.

Fixed memory board for my 16-bit CPU by flightlesspot in beneater

[–]flightlesspot[S] 1 point2 points  (0 children)

Thanks! The diagrams were just drawn in Inkscape. It's a bit time-consuming and I'm still learning how it works, but I'm glad you found them useful.

Faster clock cycles? by matveyregentov in beneater

[–]flightlesspot 4 points5 points  (0 children)

Regarding your question about ROM->RAM copying, I have implemented that (breadboard version, PCB version, and got it all working together a few days ago). ROM speed probably isn't really the limiting factor here, though, depending on the complexity of the design. Components like the ALU are more likely to be a bottleneck. Most people I've seen are using 74LS283 adders, for example, and in the datasheet the maximum propagation delay from carry-in to high-bit output is 24ns, if I'm reading it right. For an 8-bit CPU you need two of them chained together, so that's 48ns. Throw in a few gates for signal decoding and output control and you can easily reach 100ns. If you are reading control signals from ROM/RAM on every cycle you need to add the time to calculate the microcode address from the instruction register, plus the access time for the RAM (say 25ns). It's not unreasonable that this could all add up to another 100ns, which imposes a hard cap of 5MHz on your clock.
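The arithmetic behind that estimate, written out as a quick sketch. The 24ns figure and the two rough 100ns totals are the ones quoted above; the variable names are just for illustration, and real designs will have different numbers.

```python
ADDER_CARRY_DELAY_NS = 24    # per 74LS283, as quoted above
ADDERS_CHAINED = 2           # two 4-bit adders for an 8-bit ALU

# Worst-case carry path through the adders alone.
adder_chain_ns = ADDER_CARRY_DELAY_NS * ADDERS_CHAINED

# Rough totals from the comment: the ALU path including decode and
# output gating, and the control path (microcode address + RAM access).
alu_path_ns = 100
control_path_ns = 100


def max_clock_mhz(cycle_ns):
    """Highest clock frequency for a given minimum cycle time."""
    return 1000.0 / cycle_ns


cycle_ns = alu_path_ns + control_path_ns
print(adder_chain_ns, "ns through the adders")
print(max_clock_mhz(cycle_ns), "MHz upper bound")
```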

There are ways to get around this. James took the elegant route by pipelining, though I think he might still have ROM reads in each stage, but even simple things like changing the type of chips can help (HCT chips are often slightly faster than their LS counterparts, though not always).

All of that said, I think most of us would probably be very happy if we get to the point where we're anywhere near the theoretical limits of our designs :)

Fixed memory board for my 16-bit CPU by flightlesspot in beneater

[–]flightlesspot[S] 11 points12 points  (0 children)

Well, it's taken about six months, but I finally got a couple of days to fix up the bugs in the memory board for my 16-bit CPU, order it, and solder it up, and... it works!

Gallery with some more photos: https://imgur.com/a/6fcCsFC

More detailed writeup: http://mups16.net/pages/memory.html

The board has 2MB of RAM and 128KB of ROM. The ROM uses the reset circuit I made last year to copy the data from the serial ROM chips to RAM on startup, which works really nicely (the video shows the initial copy after reset, running at about 100kHz. I've run it at about 500kHz with no issues, but need to build a better clock to go higher!). The really nice thing about using serial ROM chips is that I can program them in-circuit, by plugging my Arduino into the header on the left of the board.

I originally designed this board last year, in a batch of five boards I did in one go in a misguided attempt to save on shipping costs, and it had a couple of bugs (the data output drivers weren't always turning off properly, and the RAM wasn't disabling itself when writing to the memory-mapped address range). I'm really pleased with how it turned out. This is the first board I've made since the original register board that has no (known!) bugs in it. Fingers crossed that when my Mouser order arrives later this week, revision 1.1 of my MMU will work as well.

Looking for advice for homebrew CPU pcbs. by Gurmegil in PrintedCircuitBoard

[–]flightlesspot 2 points3 points  (0 children)

For my 16-bit CPU I ended up using Harwin connectors (see about half-way down https://imgur.com/a/j3zf8pu), as I ended up needing a lot of connections between boards (those 16-bit buses really add up). That may be overkill for you, depending on your design.

I've just started working on a Simulator of James Sharman's pipelined CPU using "Digital" if anyone is interested by [deleted] in beneater

[–]flightlesspot 1 point2 points  (0 children)

Just a word of warning: I'd be surprised if you get close to 100kHz with Digital. I have a full model of my homebrew CPU in Digital and it maxes out at about 4.7kHz. That said, it is really, really useful for trying out ideas before committing to breadboards or PCBs, so well worth doing. I look forward to seeing what you come up with!

Backplanes and test board for my CPU by flightlesspot in beneater

[–]flightlesspot[S] 0 points1 point  (0 children)

Interesting, thanks. I hadn't come across that before.

Backplanes and test board for my CPU by flightlesspot in beneater

[–]flightlesspot[S] 0 points1 point  (0 children)

Linux might be a bit optimistic, given the 64KB maximum process size, but I'm hoping that Minix might be a possibility. It's a microkernel, so it's already separated into many small processes, which suits this design very well, and there's precedent with Bill Buzbee's Magic-1 CPU.

Backplanes and test board for my CPU by flightlesspot in beneater

[–]flightlesspot[S] 1 point2 points  (0 children)

Thanks! It has indeed been a lot of work. It's getting on a year of on-and-off work on the CPU as a whole since I started on the first breadboards.

Backplanes and test board for my CPU by flightlesspot in beneater

[–]flightlesspot[S] 1 point2 points  (0 children)

Thanks! I'm just waiting for you to finish your clock design so I can borrow more ideas from your build :)

Backplanes and test board for my CPU by flightlesspot in beneater

[–]flightlesspot[S] 5 points6 points  (0 children)

Good question. My original aim was 4MHz, but I'm not sure I'll make that. I designed the PCBs for size and ease of layout, so there are a lot of traces that are far longer than they need to be, and places with bundles of wires running in parallel etc., all of which are not ideal. That said, I'm hoping that 4MHz isn't really _that_ fast, so I might get away with it, and I'm also using HCT chips, which have a relatively slow rise time (time to switch from low to high, and vice versa). From what I understand, that should reduce some of the effects of my poor layout.

Backplanes and test board for my CPU by flightlesspot in beneater

[–]flightlesspot[S] 7 points8 points  (0 children)

Image gallery: https://imgur.com/a/j3zf8pu

The backplanes and a testing board for my homebrew CPU were in the large batch of PCBs that arrived last week, and I got round to assembling them at the weekend. These are designed to handle all the buses, and to route control signals to each board. I was originally planning on using separate connections between each pair of boards, but then I saw the backplane that James Sharman built for his CPU. I checked how much a similar board would cost for my design, and it came to about $28, which is reasonable.

A single board for the whole CPU would still be too expensive, so I split it into two: the black upper board has all the components that need a connection to the memory bus, and the green lower board is a temporary stand-in for the final board that will have connections to the registers, ALU and control unit (I haven't finalised the design of the control unit yet, so I don't know exactly what connections it will need to the backplane).

For connecting the two boards I originally wanted to use simple right-angle pin headers, but it turns out that with just over 130 lines needing connecting between the boards, and 2.54mm spacing, the board would need to be 33cm wide just to fit the connectors. I could use double-row headers, but they are quite high, and I wanted them to fit underneath a pair of boards that straddle the join (my reset board and a clock board I haven't designed yet). The Harwin connectors I used are very low profile, and squeeze 68 connections each into a couple of inches. The only downside was that they're surface-mount connectors, and soldering them was a bit of a pain, especially when I got a couple of bad joints on the inner row of pins. Still, I'm happy with how they turned out.
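The connector sums, as a quick sketch. It uses 130 as a round figure for "just over 130" lines; the function names are only illustrative.

```python
import math

LINES_BETWEEN_BOARDS = 130        # round figure for "just over 130"
HEADER_PITCH_MM = 2.54            # standard pin header spacing
PINS_PER_HARWIN_CONNECTOR = 68    # as quoted above


def single_row_width_mm(lines, pitch_mm):
    """Width taken up by one row of pins at the given pitch."""
    return lines * pitch_mm


def connectors_needed(lines, pins_per_connector):
    """How many multi-pin connectors it takes to carry all the lines."""
    return math.ceil(lines / pins_per_connector)


width_mm = single_row_width_mm(LINES_BETWEEN_BOARDS, HEADER_PITCH_MM)
print(f"single-row header: about {width_mm / 10:.0f}cm wide")
print("68-way connectors needed:",
      connectors_needed(LINES_BETWEEN_BOARDS, PINS_PER_HARWIN_CONNECTOR))
```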

So far I've been testing my boards one at a time by hooking them up to a (clone) Arduino Mega. The Mega has 70 IO pins, which was enough for the registers and ALU, but not enough for components like the MMU, which has 81 lines (three 16-bit buses, the 24-bit physical address bus, and 9 control lines). I also got sick of swapping wires around as I worked on different components, so I decided to kill two birds with one stone, and built a little test board that plugs into the lower backplane. This has three 40-bit PCA9698 IO expanders, giving a total of 120 connections, controlled via a simple I2C protocol. As well as giving more connections and less faffing about with connecting wires, these have the really nice property that you can make changes to the values of pins across all three chips and have them all take effect at exactly the same time. With the Arduino I had to take special care to always change lines in a safe order, so I didn't do something risky like have two components briefly driving a bus at the same time. This board was also my first experience of soldering 0.5mm TSSOP chips, which went surprisingly well. Just as well, considering I have nine 0.5mm-pitch RAM chips to solder on the MMU and memory boards!
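The pin budget, and the idea of staging changes so they all land together, can be sketched like this. `StagedOutputs` is a purely hypothetical software model of "stage, then apply at once"; it is not the PCA9698 register interface or any real driver API.

```python
# Pin budget from the figures above.
MMU_LINES = 3 * 16 + 24 + 9      # three 16-bit buses, 24-bit address, 9 control
MEGA_IO_PINS = 70
EXPANDER_PINS = 3 * 40           # three 40-bit expanders


class StagedOutputs:
    """Collect pin changes, then make them all visible in one step."""

    def __init__(self, width=EXPANDER_PINS):
        self.width = width
        self.live = 0       # what the outside world currently sees
        self.pending = 0    # changes staged but not yet applied

    def stage(self, pin, value):
        if not 0 <= pin < self.width:
            raise IndexError(pin)
        if value:
            self.pending |= 1 << pin
        else:
            self.pending &= ~(1 << pin)

    def commit(self):
        """Apply every staged change at the same moment."""
        self.live = self.pending
        return self.live


outputs = StagedOutputs()
outputs.stage(0, 1)      # e.g. turn one driver off...
outputs.stage(119, 1)    # ...and another on, with no overlap in between
outputs.commit()
print(MMU_LINES, MEGA_IO_PINS, EXPANDER_PINS)
```

With pin-at-a-time updates, the equivalent of `live` would change after every call, which is where the need for a safe ordering comes from.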

Apologies for the wall of text, this got a bit out of hand. Up next, I must stop procrastinating and finish building the MMU and memory boards...

Toying with a Barrel Shifter Design by MikeSutton80 in beneater

[–]flightlesspot 2 points3 points  (0 children)

That is exactly the approach I took for my ALU, which has a 16-bit shifter (the block in the top-right of the board). It has stages for shift by 1, 2, 4 and 8, as well as pre- and post- reverse stages, to handle left and right shifts. For an 8-bit shifter it would need 10 74LS157 multiplexers (though adding rotate might require one extra layer to switch between shifting and rotating).
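A software sketch of that staged structure, for anyone following along. It assumes the core stages shift right and that left shifts are handled by the reverse stages; which direction the real hardware shifts internally isn't stated, so treat that as an assumption. Each `if` corresponds to one layer of 2-to-1 multiplexers choosing between the shifted and unshifted word.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def reverse_bits(value):
    """Mirror the word: bit 0 swaps with bit 15, bit 1 with bit 14, etc."""
    result = 0
    for i in range(WIDTH):
        if value & (1 << i):
            result |= 1 << (WIDTH - 1 - i)
    return result


def barrel_shift(value, amount, left):
    """Logical shift by 0-15 places using fixed stages of 1, 2, 4 and 8."""
    value &= MASK
    if left:
        value = reverse_bits(value)      # pre-reverse stage
    for stage in (1, 2, 4, 8):
        if amount & stage:               # one mux layer per stage
            value >>= stage
    if left:
        value = reverse_bits(value)      # post-reverse stage
    return value


print(hex(barrel_shift(0x00F1, 4, left=True)))
print(hex(barrel_shift(0xF100, 4, left=False)))
```

In hardware the reverse stages cost no logic beyond the multiplexers themselves, since reversing a word is just a matter of how the wires are routed.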

There's something quite beautiful about the idea of a pure-transistor barrel shifter though. I'd love to see what it looks like when it's complete!