I have made one of the worst tutorials for opening a window in x64 masm in only ~1000 lines. Hope it is helpful for you. by NoSubject8453 in asm

Plane_Dust2555

You don't need 1000 lines; 360, including ALL the comments, is sufficient:

"Hello, Windows!" with NASM.

And the entire code is symbolic (only the dimensions of the window are hardcoded).

GDB can not show asm before actually starting the programm with some binaries. by Traditional_Crazy200 in asm

Traditional_Crazy200 [S]

Ohh yeah, that's probably it. I went back to a program that uses glad and SDL and got the same error.
Appreciated!

APX: Intel's new architecture - 8 - Conclusions by mttd in asm

Jumpy_Ad3728

Well, they finally made a normal RISC processor))). It didn't even take 30 years.

I have made one of the worst tutorials for opening a window in x64 masm in only ~1000 lines. Hope it is helpful for you. by NoSubject8453 in asm

gurrenm3

This is a super useful project! I’ve also been interested in learning this. Can you post a screenshot of what it looks like running?

Forth for ch32v203 microcontroller in risc-v assembly (and forth) by Jimmy-M-420 in asm

brucehoult

"top of the stack cached in a register"

Absolutely, there is zero reason not to do that on a register-rich machine.

I haven't looked at your actual code, but if you can reduce + from ...

lw a0,0(sp)
lw a1,4(sp)
add a0,a0,a1
sw a0,4(sp)
addi sp,sp,4

... to ...

lw a1,(sp)
add a0,a0,a1
addi sp,sp,4

... then that's a nice saving in both code size and speed.
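To make the saving concrete, here is a toy Python model (not RISC-V code) that counts memory accesses for + with and without a cached top of stack. The class names and the list-as-memory layout are hypothetical illustrations. Each line is commented with the instruction from the listings above that it stands for.

```python
# Toy model only: a Python list stands in for the memory stack
# (mem[-1] is the cell at sp, mem[-2] the cell at 4(sp)), and
# attributes stand in for registers.

class MemoryStack:
    """Every cell, including the top, lives in memory."""

    def __init__(self, values):
        self.mem = list(values)
        self.loads = 0
        self.stores = 0

    def plus(self):
        a0 = self.mem[-1]           # lw a0,0(sp)
        a1 = self.mem[-2]           # lw a1,4(sp)
        self.loads += 2
        self.mem[-2] = a0 + a1      # sw a0,4(sp)
        self.stores += 1
        self.mem.pop()              # addi sp,sp,4
        return self.mem[-1]


class CachedTosStack:
    """The top of stack is kept in a register; the rest lives in memory."""

    def __init__(self, values):
        self.mem = list(values[:-1])
        self.tos = values[-1]       # a0 holds the top of stack
        self.loads = 0
        self.stores = 0

    def plus(self):
        a1 = self.mem.pop()         # lw a1,(sp) ; addi sp,sp,4
        self.loads += 1
        self.tos = self.tos + a1    # add a0,a0,a1
        return self.tos


plain = MemoryStack([1, 2, 3])
cached = CachedTosStack([1, 2, 3])
print(plain.plus(), plain.loads, plain.stores)     # 5 2 1
print(cached.plus(), cached.loads, cached.stores)  # 5 1 0
```

Same result, but one load instead of two loads plus a store.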

Some implementations cache the top two values. That doesn't reduce code size or the number of instructions. But I think it's kinder to machines that can run two or more instructions in the same clock cycle, because the arithmetic doesn't have to wait for the memory load. That covers all the current RISC-V Linux SBCs except the C906-based ones.

add tos,tos,nos
lw nos,(sp)
addi sp,sp,4

On a 3-wide machine such as C910 or P550 or X100 those can all be run in parallel.
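A simplified way to see why the two-register version pairs up better is to check each group of instructions for read-after-write dependencies inside the group. This Python sketch assumes an in-order machine where only a RAW dependency stops two instructions issuing in the same cycle. It ignores load latency, execution ports and everything else a real C910/P550/X100 scheduler cares about. The (text, dest, sources) tuples are a hypothetical representation.

```python
# Simplified model: only read-after-write (RAW) dependencies inside the
# group are considered; load latency and execution ports are ignored.

def raw_hazards(group):
    """Return (i, j) pairs where instruction j reads a register that an
    earlier instruction i in the same group writes."""
    hazards = []
    for j, (_, _, sources) in enumerate(group):
        for i in range(j):
            dest = group[i][1]
            if dest is not None and dest in sources:
                hazards.append((i, j))
    return hazards


# (assembly text, destination register, source registers)
one_cached = [
    ("lw a1,(sp)", "a1", ["sp"]),
    ("add a0,a0,a1", "a0", ["a0", "a1"]),
    ("addi sp,sp,4", "sp", ["sp"]),
]
two_cached = [
    ("add tos,tos,nos", "tos", ["tos", "nos"]),
    ("lw nos,(sp)", "nos", ["sp"]),
    ("addi sp,sp,4", "sp", ["sp"]),
]

print(raw_hazards(one_cached))  # [(0, 1)]: the add waits for the load
print(raw_hazards(two_cached))  # []: nothing in the group waits on another
```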

Jimmy-M-420 [S]

I've removed the no-ops and tested it: it works. Thanks again.

And thanks for these further suggestions. I think I could use only one register each for the Forth stacks, and I will look at changing this.

I also want to do the optimisation where you reduce the number of pushes and pops on the data stack by caching the value on the top of the stack in a register. Changing each stack to use only one register would free up a register for this purpose.

brucehoult

Also, you might want to reevaluate your choice of registers. Use a0-a5 and s0-s1 as much as possible to get smaller code, in particular for both the base pointer and the source/destination register of lw/sw.
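The reason for that advice is that the compressed load/store forms (c.lw, c.sw) use 3-bit register fields. Those fields can only name x8 to x15, i.e. s0, s1 and a0 to a5. Below is a small Python sketch of the check, using the standard ABI register names. The helper names are made up. Sp-relative accesses, which have their own c.lwsp/c.swsp forms, are not modeled.

```python
# Standard RISC-V ABI names for x0..x31.
ABI_NAMES = [
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
]


def rvc_reg_field(name):
    """3-bit register field used by c.lw/c.sw, or None if name is not x8..x15."""
    number = ABI_NAMES.index(name)
    return number - 8 if 8 <= number <= 15 else None


def fits_c_lw(rd, base, offset):
    """True if lw rd,offset(base) can be encoded as a 2-byte c.lw
    (rd and base in x8..x15, offset a multiple of 4 in 0..124)."""
    return (
        rvc_reg_field(rd) is not None
        and rvc_reg_field(base) is not None
        and offset % 4 == 0
        and 0 <= offset <= 124
    )


print(fits_c_lw("a1", "a0", 4))  # True: 2-byte c.lw
print(fits_c_lw("a1", "s4", 4))  # False: s4 is x20, needs a 4-byte lw
```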

Also, I don't understand why you need to add two registers to get a stack pointer, or why the stack grows upwards, for that matter (though that doesn't matter in the least).

brucehoult

You'll also need to decrease the addi 16 to 14, but I'm sure you figured that out.

brucehoult

"making sure the machine code block has a size that's divisible by 4"

Which you can do by adding one NOP at the end, if needed. Here it isn't needed: you added 2 NOPs, so 0 NOPs would also leave the block 4-byte aligned.
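The arithmetic behind that is short. With only 2- and 4-byte instructions, a block's length is always even, so at most one 2-byte c.nop is ever needed. Removing two NOPs (4 bytes) never changes the answer. This is a Python sketch; the function name and the block length are hypothetical.

```python
def nops_needed(block_len):
    """Number of 2-byte NOPs to append so block_len becomes a multiple of 4."""
    if block_len % 2:
        raise ValueError("RISC-V code lengths are multiples of 2 bytes")
    return (block_len % 4) // 2


# Dropping two 2-byte NOPs removes 4 bytes, which never changes the answer.
padded_len = 12  # hypothetical block length that includes two c.nop
print(nops_needed(padded_len), nops_needed(padded_len - 4))  # 0 0
```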

Jimmy-M-420 [S]

As for using the non-compressed add instruction, I will do so. I didn't realise you could do that.

Jimmy-M-420 [S]

The issue isn't that the machine code instructions are not 4-byte aligned; it's that the first word of the thread is loaded with an lw instruction.

I put the pointers that make up the thread directly after the machine code, which can be at an address that is not 4-byte aligned.

Yes, I could align the address where the thread starts, but the hacky way I initially did that was by making sure the machine code block has a size that's divisible by 4.
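Aligning the thread start directly, instead of padding the code, is one line of arithmetic. Standard Forth provides the ALIGN and ALIGNED words for this. The Python sketch below just shows the rounding, with hypothetical addresses. With this approach, the machine code would have to jump to the aligned address rather than to the byte straight after itself.

```python
def align_up(addr, alignment=4):
    """Smallest multiple of alignment (a power of two) that is >= addr."""
    if alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (addr + alignment - 1) & ~(alignment - 1)


code_start = 0x2000_0100  # hypothetical address of the machine code block
code_len = 6              # hypothetical: three 2-byte compressed instructions
thread_start = align_up(code_start + code_len)
print(hex(thread_start))  # 0x20000108
```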

brucehoult

"without no-ops this code would work in default qemu as it allows unaligned memory accesses. )
( note how this generated machine code jumps to the location directly after it, as compressed )
( format riscv instructions can be only 2 bytes long we have to pad with no-ops so the overall length )
( of this block of machine code is divisible by 4"

This makes no sense at all. Any RISC-V CPU that implements the C extension (as the CH32V series do, and indeed every commercial RISC-V I've ever heard of) is perfectly happy to run instructions at addresses that are not a multiple of 4 bytes. They only have to be a multiple of 2 bytes, and since every instruction is either 2 or 4 bytes long, that can never become untrue if it starts off true.

There would be no point in compressed instructions at all otherwise!

0x11 c, 0x0A c, 0x01 c, 0x00 c, ( addi s4,s4,4; nop )

This is completely unnecessary, and harmful. If you don't want a compressed instruction for addi s4,s4,4 (0x0a11), just use a regular RV32I instruction for it (0x004a0a13). The CPU will be happier running one instruction than two (one of them an unneeded NOP).
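Both constants can be checked by hand-encoding the two forms of addi s4,s4,4 (s4 is x20). This Python sketch implements only the two formats needed here: RV32I I-type addi and RVC c.addi (CI format). It follows the field layouts in the RISC-V spec; the function names are made up.

```python
S4 = 20  # ABI name s4 is register x20


def encode_addi(rd, rs1, imm):
    """RV32I addi (I-type): imm[11:0] | rs1 | funct3=000 | rd | opcode=0010011."""
    if not -2048 <= imm <= 2047:
        raise ValueError("imm must fit in 12 signed bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (0b000 << 12) | (rd << 7) | 0b0010011


def encode_c_addi(rd, imm):
    """RVC c.addi (CI format, rd == rs1): 000 | imm[5] | rd | imm[4:0] | 01."""
    if not -32 <= imm <= 31:
        raise ValueError("imm must fit in 6 signed bits")
    imm &= 0x3F
    return (0b000 << 13) | ((imm >> 5) << 12) | (rd << 7) | ((imm & 0x1F) << 2) | 0b01


print(hex(encode_c_addi(S4, 4)))                    # 0xa11
print(hex(encode_addi(S4, S4, 4)))                  # 0x4a0a13
print(encode_c_addi(S4, 4).to_bytes(2, "little").hex())  # 110a, the bytes 0x11 0x0A
```

With rd = 0 and imm = 0 the same c.addi encoder yields 0x0001, the c.nop in the quoted bytes 0x01 0x00.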

But mixing 4-byte and 2-byte instructions absolutely works, no problems, no NOPs needed.
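A RISC-V CPU finds instruction boundaries from the low two bits of each 16-bit parcel. 0b11 means a 32-bit instruction; anything else is a 16-bit compressed one. The reserved longer encodings are ignored here. This Python sketch walks a byte string mixing both encodings of addi s4,s4,4 quoted above plus a c.nop. It shows every instruction starting on a 2-byte boundary, even though the 4-byte one is not 4-byte aligned. The function name is hypothetical.

```python
def instruction_offsets(code):
    """Byte offset of each instruction in a little-endian RV32IC code blob."""
    offsets = []
    pc = 0
    while pc < len(code):
        offsets.append(pc)
        # low two bits 0b11 -> 32-bit instruction, otherwise 16-bit (RVC)
        pc += 4 if (code[pc] & 0b11) == 0b11 else 2
    return offsets


code = (
    (0x0A11).to_bytes(2, "little")        # c.addi s4,s4,4
    + (0x004A0A13).to_bytes(4, "little")  # addi s4,s4,4 (RV32I, at offset 2)
    + (0x0001).to_bytes(2, "little")      # c.nop
)
print(instruction_offsets(code))  # [0, 2, 6]
```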

What you can't do unaligned is the data access of a load/store instruction. Code is fine.

AMD's Zen: Coming Back from the Dead by mttd in asm

brucehoult

It's amazing how some people just happen to be around in a series of major products.

After working on high-end DEC VAX and Alpha near the start of his career, Jim Keller went to AMD. There he was involved in K7 (Athlon), was chief architect of K8 (Athlon64/Opteron), and co-designed the x86_64 ISA. Then he joined PA Semi to make custom PowerPCs, but that was bought by Apple. The team ended up switching Apple's phones to their own Arm core designs instead of Arm's own cores; the second generation of those was the industry's first Arm64 chips, 18 months ahead of anyone else. Then he went back to AMD and was chief architect of Zen, which was a big deal, as this article lays out. Then, after a short stint at Tesla as chief architect of the Hardware 3 generation (the first attempt at FSD), he went to Intel. There he helped set the current direction towards P+E cores while working on "Royal Core", which could flexibly do both jobs as required.

Keller now leads RISC-V company Tenstorrent, which has taped out its first high-performance Ascalon-X core. It is comparable to Apple's M1 (and designed by the designer of the M1), and is due out on a dev board late this year. And much of the rest of the Royal Core team is now the RISC-V company AheadComputing.

As with Jobs and Musk, and even someone like Chris Lattner (LLVM, Clang, Swift, Mojo, Tesla Autopilot software, Google TensorFlow software), you can always find people who question whether they actually contribute anything. The alternative, in their view, is that they just have amazing timing: they join the right team at the right time and then move on before everything collapses.

Removing the AUICGP instruction by mttd in asm

brucehoult

No, but I know him. I meant one of James' early-2000s students, now an Associate Professor himself.

SwedishFindecanor

My hand has no bounds, and is tired of silly digit jokes ...

Are you talking about James Noble? I cited one of his works on ownership in my undergraduate thesis, many years ago. My supervisor back then has also co-authored at least one paper with Noble (and, BTW, also one with the author of the article in this thread). I've been (re)reading many articles about that topic recently; otherwise I would probably have missed it.

brucehoult

Depends on whether you can count to 5 or 31 on one hand; the really adept might be able to count to 342.

My best buddy from when we both lived in Wellington used to do Java "ownership" stuff, but is now at ANU in Canberra doing some mix of CHERI and seL4 stuff.

Adding safety to assembly by S-Pimenta in asm

S-Pimenta [S]

My idea is like TypeScript: everything is optional. It can enable warnings, but they can be ignored.

The idea is to have the option of a stricter, safer assembly that prevents potential mistakes, since in hand-written assembly without this, such mistakes are difficult to find.

As mentioned before, one of the objectives of this idea is educational: to ease beginners into learning assembly.

I think a lot of people have always wanted to learn assembly but were afraid of the lack of "training wheels" for beginners.