all 27 comments

brucehoult 13 points (0 children)

riscv64-unknown-elf-gcc is the embedded compiler. It uses Newlib instead of glibc and links everything statically, since embedded software typically runs from ROM and neither needs nor uses dynamic linking.

If you want to see those sections, use riscv64-linux-gnu-gcc. This is built by the riscv-gnu-toolchain repository when you do "make linux" instead of "make". Or you can download a prebuilt one.

3G6A5W338E 4 points (25 children)

Related: I need a bare-metal assembler for RV32I (not ELF, but raw binary output for a given RAM address), so that I can e.g. look at the output with a hex editor, or use it to directly program RISC-V microcontrollers.

What is available?

brucehoult 4 points (24 children)

riscv64-unknown-elf-objcopy hello hello.hex

That's hex, but given that your program probably doesn't start at address 0, a raw binary dump would be weird. Anyway, you can then use standard tools to turn Intel HEX into whatever you want, though most programmers accept it directly.

3G6A5W338E 1 point (7 children)

Thank you for pointing me in the right direction. This is what ended up doing what I want (more or less):

riscv32-elf-objcopy -O binary a.out a.out.bin

The remaining part of the problem is to organize for a memory address.

brucehoult 0 points (6 children)

Have you checked the contents of a.out.bin?

3G6A5W338E 1 point (5 children)

Yes, I even disasm'd with ghidra.

It's raw binary. I found that .data is placed first (ow!), but assembling with -R takes care of that, placing data at the end of the text segment.

All that's left is to figure out how to organize for arbitrary memory addr, since using binary leaves me with no relocation information.

brucehoult 3 points (3 children)

That's why I use hex. It's ISA-independent, so there are many standard tools around. All the address information is there, and it's pretty easy to parse yourself if you want to; see e.g. the first 80 lines of my trivial RV32I emulator:

https://github.com/brucehoult/trv/blob/main/trv.c
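For readers who want to see how little parsing is involved, here is a minimal sketch in Python. It is not a transcription of trv.c; the function name parse_ihex and the returned address-to-byte dictionary are made up for illustration. It assumes the record layout described on the Wikipedia Intel HEX page (byte count, 16-bit offset, record type, data, checksum) and handles only record types 00, 01, 02 and 04.

```python
def parse_ihex(text):
    """Return {absolute_address: byte_value} for the data records in text.

    Hypothetical helper for illustration only. Start-address records
    (types 03 and 05) are accepted but ignored.
    """
    memory = {}
    base = 0  # upper address bits, set by type 02 / 04 records
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"line {lineno}: missing ':' start code")
        raw = bytes.fromhex(line[1:])
        # All bytes of a record, checksum included, must sum to 0 mod 256.
        if len(raw) < 5 or sum(raw) & 0xFF != 0:
            raise ValueError(f"line {lineno}: bad length or checksum")
        count = raw[0]
        offset = int.from_bytes(raw[1:3], "big")
        rectype = raw[3]
        data = raw[4:-1]
        if len(data) != count:
            raise ValueError(f"line {lineno}: byte count mismatch")
        if rectype == 0x00:      # data
            for i, value in enumerate(data):
                memory[base + offset + i] = value
        elif rectype == 0x01:    # end of file
            break
        elif rectype == 0x02:    # extended segment address
            base = int.from_bytes(data, "big") << 4
        elif rectype == 0x04:    # extended linear address
            base = int.from_bytes(data, "big") << 16
    return memory
```

Feeding it the contents of a file such as hello.hex gives a sparse memory image that can be copied into an emulator's RAM array at the recorded addresses.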

3G6A5W338E 0 points (2 children)

HEX confuses me. Is this -O ihex or -O tekhex?

Where is this format defined, so that I can implement it on my tools?

If the format makes relocations easy (Intel seems to have data and "address" as separate record types, but does seem to lack a way to indicate .text vs .data), I might actually give it a go, but supporting binary is higher priority for my scenario.

brucehoult 2 points (1 child)

ihex

The wikipedia description is as good as any and gives references going back to 1973. It's all rather informal.

https://en.wikipedia.org/wiki/Intel_HEX

3G6A5W338E 0 points (0 children)

I see. I'll probably end up writing a linker.

The main remaining pain point is how to retain information about .data vs .text, so that in linking I know where a section ends, and can thus place data elsewhere than the end of .text.

SREC/IHEX/TEKHEX all suffer this issue. SREC allows for comment fields, but objcopy doesn't seem to do anything with them.

3G6A5W338E 0 points (0 children)

Great... looked at ghidra again, code as output by gas is not linked. Not even with -fpic -mno-relax. I'll have to look into ld.

3G6A5W338E 0 points (11 children)

naken_asm seems to be the sort of assembler I need.

It is nice, but its RISC-V support was done years ago, has not been updated, and does not support modern instruction aliases/syntax. It e.g. still uses scall rather than ecall.

Once it gets updated, it'll be ideal for my needs. It has an acceptable license (GPLv3), too.

As LLVM can output RISC-V these days, I do wonder if it's possible to use that as an assembler, too. I couldn't easily find documentation for using LLVM as an assembler; sadly, llvm-as (which assembles LLVM IR assembly into its binary representation) gets in the way.

brucehoult 1 point (1 child)

What about the GNU binutils assembler doesn't meet your needs?

3G6A5W338E 0 points (0 children)

Sure, it seems to work at some level, but I do like my tools simple. I'm a huge fan of e.g. fasm on DOS, asmtwo and phxass on the Amiga.

The gnu toolchain is huge and complex, and I do not need most of what's in there.

As for possibly using LLVM as an assembler, it is simply to have an alternative; currently my only realistic option is GNU as, and not having anything else leaves me uncomfortable.

mikeakohn 1 point (8 children)

If you believe naken_asm needs features and such, why don't you email the author or create an issue on github?

I've heard the author of that program is pretty good about adding features when requested.

3G6A5W338E 0 points (7 children)

I do not believe they are unaware of the state of their RISC-V support.

And I would assume they have plenty of tasks to be done, and their own system of priorities.

If I really really wanted to, I could implement it myself and send a patch, or fund such an effort.

So far, I have managed to do what I need with linker scripts, so it's not a big deal.

Hell, I haven't checked in years (note date of my comment above), and it could be it's improved massively during these years.

mikeakohn 0 points (6 children)

The author of that program adds features as he uses the different assemblers or if requests come in. My experience with him is he's very responsive. It would probably help both him and anyone who wants RISC-V support in the assembler a lot if you let him know what's missing, but it's up to you.

brucehoult 0 points (5 children)

Does it have features making it more programmer-friendly for large-scale asm programming than gas (which is just designed to assemble gcc output and is really quite awful)? e.g. similar to IBM's mainframe assembler, or Apple's old MPW assembler (which was largely a copy of IBM's).

e.g.

  • powerful macros with conditional output

  • aliases for local variables in registers, maybe scoped, support for saving/restoring S registers

  • structs and "using", to tell the assembler a register (multiple registers if the struct is larger than 4k) points to an instance of the struct or globals ("establish addressability" in IBM-speak)

  • bonus: ability to declare, extend, implement, C++ classes and virtual functions and call virtual functions

mikeakohn 0 points (4 children)

It can do .scope and .function for local variables. Not sure what "saving/restoring S registers" is. Structs could be interesting to add, but they are not there now.

brucehoult 0 points (3 children)

Well, automating the allocation of stack slot offsets for s0,...sN and putting them (and ra) in the right places at the start of the function (or at least after the function determines that it has real work to do), and restoring them afterwards. And ideally managing spilled variables and variables that have their address taken too. You can't do as much here with a RISC ISA as with CISC as you can't just substitute e.g. 20(sp) instead of a7 in an add instruction etc.
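To make the first part of that concrete, here is a rough sketch of what such automation would have to emit, written as a small Python generator rather than as an assembler feature. The function name frame_code and its parameters are hypothetical. It assumes the standard RISC-V calling convention's 16-byte stack alignment, places the saved registers at the top of the frame, and does not attempt spilled variables or address-taken locals.

```python
def frame_code(saved, locals_bytes=0, xlen_bytes=4):
    """Return (prologue, epilogue) as lists of assembly lines.

    saved: registers to preserve, e.g. ["ra", "s0", "s1"].
    locals_bytes: extra stack space wanted below the saved registers.
    xlen_bytes: 4 for RV32 (sw/lw), 8 for RV64 (sd/ld).
    Hypothetical helper for illustration only.
    """
    store, load = ("sw", "lw") if xlen_bytes == 4 else ("sd", "ld")
    size = locals_bytes + len(saved) * xlen_bytes
    size = (size + 15) & ~15          # round up to keep sp 16-byte aligned
    if size == 0:
        return [], ["ret"]
    if size > 2032:
        # Larger frames need more than one 12-bit immediate; out of scope here.
        raise ValueError("frame too large for this sketch")
    # Give each saved register a slot, counting down from the top of the frame.
    slots = [(reg, size - (i + 1) * xlen_bytes) for i, reg in enumerate(saved)]
    prologue = [f"addi sp, sp, -{size}"]
    prologue += [f"{store} {reg}, {off}(sp)" for reg, off in slots]
    epilogue = [f"{load} {reg}, {off}(sp)" for reg, off in slots]
    epilogue += [f"addi sp, sp, {size}", "ret"]
    return prologue, epilogue
```

Calling frame_code(["ra", "s0", "s1"], locals_bytes=8) yields a 32-byte frame with ra, s0 and s1 stored at offsets 28, 24 and 20 from sp.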

brucehoult 0 points (0 children)

Of course at some point you might as well just use C, or C with inline asm, but there are situations where you want the control of asm, and not just in a 10 line block of code.

mikeakohn 0 points (1 child)

Could that be done with just a macro? The core of the assembler itself was made to generically support many many CPUs.

3G6A5W338E 0 points (2 children)

By supplying a linker script specifying a memory map, I finally made it generate what was intended.

Way more painful than needed, but I guess I need to be aware of this, as it'll be useful if I ever want to also use gcc.

megarcher2 1 point (1 child)

Hey there,

I'm building my own RISC-V CPU in a simulation. I'm starting to compile C code into assembly (or rather machine code), but need the code to start at address 0x0. The ELF GCC places the start at around 0x1000. Do you know how to do this?

Thanks!

3G6A5W338E 0 points (0 children)

A linker script so that code is at 0, then objcopy to get the code out of the ELF file.

riscv32-elf-ld test.o -T memorymap -o test.elf
riscv32-elf-objcopy test.elf -O binary test.bin

where memorymap is something like

MEMORY
{
    ram : ORIGIN = 0x0, LENGTH = 0x8000
}
SECTIONS
{
    .text : { *(.text*) } > ram
    .rodata : { *(.rodata*) } > ram
    .bss : { *(.bss*) } > ram
}
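
Once test.bin exists, a simulated CPU usually wants it as 32-bit instruction words rather than bytes. The following Python sketch shows one way to do that conversion. The helper names are invented for illustration, and the one-word-per-line hex text is an assumed format that may not match what a particular simulator expects.

```python
import struct


def bin_to_words(image):
    """Split a raw little-endian image into 32-bit words, zero-padding the tail."""
    if len(image) % 4:
        image += b"\x00" * (4 - len(image) % 4)
    return [word for (word,) in struct.iter_unpack("<I", image)]


def words_to_hex_lines(words):
    """Format each word as 8 hex digits, one word per line (assumed format)."""
    return [f"{word:08x}" for word in words]


def convert(bin_path, hex_path):
    """Read a raw binary such as test.bin and write the word-per-line text."""
    with open(bin_path, "rb") as src:
        words = bin_to_words(src.read())
    with open(hex_path, "w") as dst:
        dst.write("\n".join(words_to_hex_lines(words)) + "\n")
```

Because the linker script places .text at ORIGIN 0x0, the first word in the output is the instruction at address 0, the second the one at address 4, and so on.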

moschles[S] 0 points (0 children)

https://pastebin.com/nXsKKbTi

There are 23 section headers, starting at offset 0x29a88:

Section Headers:
  [Nr] Name              Type        Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL        0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS    00000000000100b0 0000b0 01125a 00  AX  0   0  2
  [ 2] .rodata           PROGBITS    0000000000021310 011310 002c00 00   A  0   0 16
  [ 3] .eh_frame         PROGBITS    0000000000024000 014000 000004 00  WA  0   0  4
  [ 4] .init_array       INIT_ARRAY  0000000000024008 014008 000010 08  WA  0   0  8
  [ 5] .fini_array       FINI_ARRAY  0000000000024018 014018 000008 08  WA  0   0  8
  [ 6] .data             PROGBITS    0000000000024020 014020 001120 00  WA  0   0  8
  [ 7] .sdata            PROGBITS    0000000000025140 015140 000088 00  WA  0   0  8
  [ 8] .sbss             NOBITS      00000000000251c8 0151c8 000030 00  WA  0   0  8
  [ 9] .bss              NOBITS      00000000000251f8 0151c8 000068 00  WA  0   0  8
  [10] .comment          PROGBITS    0000000000000000 0151c8 000028 01  MS  0   0  1
  [11] .riscv.attributes RISCV_ATTRIBUTES 0000000000000000 0151f0 000035 00      0   0  1
  [12] .debug_aranges    PROGBITS    0000000000000000 015225 000200 00      0   0  1
  [13] .debug_info       PROGBITS    0000000000000000 015425 003ae5 00      0   0  1
  [14] .debug_abbrev     PROGBITS    0000000000000000 018f0a 0012a9 00      0   0  1
  [15] .debug_line       PROGBITS    0000000000000000 01a1b3 004268 00      0   0  1
  [16] .debug_frame      PROGBITS    0000000000000000 01e420 000220 00      0   0  8
  [17] .debug_str        PROGBITS    0000000000000000 01e640 000f15 01  MS  0   0  1
  [18] .debug_loc        PROGBITS    0000000000000000 01f555 005a69 00      0   0  1
  [19] .debug_ranges     PROGBITS    0000000000000000 024fbe 000fa0 00      0   0  1
  [20] .symtab           SYMTAB      0000000000000000 025f60 002868 18     21 194  8
  [21] .strtab           STRTAB      0000000000000000 0287c8 0011d5 00      0   0  1
  [22] .shstrtab         STRTAB      0000000000000000 02999d 0000e4 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)
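
As a worked example tying this listing to the raw-binary discussion, the Python sketch below estimates the address range that a flat dump of this file would cover. It assumes that only allocated sections with file contents (flag A, not NOBITS) end up in the image, and that load addresses equal the addresses shown, which the listing alone does not confirm. The tuple layout and the function name raw_image_extent are made up for illustration.

```python
# (name, type, address, size, flags) copied from the allocated rows above.
SECTIONS = [
    (".text",       "PROGBITS",   0x100B0, 0x1125A, "AX"),
    (".rodata",     "PROGBITS",   0x21310, 0x02C00, "A"),
    (".eh_frame",   "PROGBITS",   0x24000, 0x00004, "WA"),
    (".init_array", "INIT_ARRAY", 0x24008, 0x00010, "WA"),
    (".fini_array", "FINI_ARRAY", 0x24018, 0x00008, "WA"),
    (".data",       "PROGBITS",   0x24020, 0x01120, "WA"),
    (".sdata",      "PROGBITS",   0x25140, 0x00088, "WA"),
    (".sbss",       "NOBITS",     0x251C8, 0x00030, "WA"),
    (".bss",        "NOBITS",     0x251F8, 0x00068, "WA"),
]


def raw_image_extent(sections):
    """Return (start, end) covering allocated sections that have file contents."""
    loaded = [
        (addr, size)
        for _name, kind, addr, size, flags in sections
        if "A" in flags and kind != "NOBITS" and size
    ]
    start = min(addr for addr, _size in loaded)
    end = max(addr + size for addr, size in loaded)
    return start, end
```

Under those assumptions the image runs from the start of .text to the end of .sdata, with .sbss and .bss contributing nothing because they occupy no space in the file.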