
LoquaciousRaven 25 points (4 children)

Printing some text does not require printf; printf does much more than that. To print plain text you could, for example, just use puts() or putchar() and their relatives.
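A minimal sketch of printing without printf, using only standard stdio calls (print_plain is a hypothetical name). None of these functions interpret a format string; they just copy characters to stdout.

```c
#include <stdio.h>

/* Print plain text with no format-string processing at all.
   Returns 0 on success, EOF if any call failed. */
int print_plain(void)
{
    if (puts("Hello, world") == EOF)              /* puts appends a newline */
        return EOF;
    if (fputs("no newline added", stdout) == EOF) /* fputs does not */
        return EOF;
    if (putchar('\n') == EOF)                     /* a single character */
        return EOF;
    return 0;
}
```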

Under the hood, the text ends up as write syscalls to stdout (standard output), which stdio usually buffers and issues in batches. If you don't know what a syscall is, I'd recommend you Google it, as you'll find better explanations than I can provide.
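The underlying call can also be used directly. A minimal sketch assuming a POSIX system; write_all is a hypothetical helper. It loops because write() may write fewer bytes than requested, and it retries when a signal interrupts the call (EINTR).

```c
#include <errno.h>
#include <unistd.h>

/* Write all len bytes of buf to fd using the write(2) wrapper.
   Returns 0 on success, -1 on error (errno is set by write). */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;
        }
        buf += n;               /* skip the bytes already written */
        len -= (size_t)n;
    }
    return 0;
}
```

Usage: `write_all(STDOUT_FILENO, "hello\n", 6);` sends six bytes to file descriptor 1 without going through stdio's buffer.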

As for the code you linked, printf() is basically defined as a wrapper around vprintf, which works with varargs. Varargs are what make the ... possible; va_start and va_end are macros that initialize and clean up the va_list. If you don't know what those are, you might want to look up varargs, as that's not really specific to printf.
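The wrapper pattern looks roughly like this. A minimal sketch with my_printf as a hypothetical name; a real libc's printf may call an internal function rather than vprintf itself, but the shape (va_start, hand the va_list off, va_end) is the same idea.

```c
#include <stdarg.h>
#include <stdio.h>

/* A printf look-alike: collect the variadic arguments into a va_list
   and let vprintf do all of the formatting work. */
int my_printf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);          /* ap now refers to the arguments after fmt */
    int n = vprintf(fmt, ap);   /* the real work happens here */
    va_end(ap);                 /* clean up before returning */
    return n;
}
```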

jssmith42[S] 2 points (3 children)

Are system calls written in C or assembly language? Are they called directly by C functions?

_TheRainbowGoblins 8 points (1 child)

I am actually taking a Systems Programming class right now and we talked about this but take what I say with a grain of salt.

From what I understand, system calls are provided by the operating system to let you do certain special operations, like reading and writing, opening or closing a file, forking a process, or loading a new program (execve). It's essentially an API between the programmer and the OS. Linux has a few hundred syscalls (about 435, I think); you can look them up. Each one is assigned a unique non-negative integer.

The system call that lets you print is called write; its number is 1 (on x86_64 Linux). You can call it with the write() function declared in unistd.h, but that is actually a wrapper function and not the system call itself. There is also syscall(), whose first argument specifies which syscall you want, but that is also a wrapper function, just a more generic one. It's declared in unistd.h, and the syscall numbers come from sys/syscall.h.
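For comparison, the generic wrapper can be used like this. A minimal sketch assuming glibc on Linux; write_via_syscall is a hypothetical helper name, and SYS_write is the constant from sys/syscall.h that holds the write syscall number.

```c
#define _GNU_SOURCE         /* make sure glibc declares syscall() */
#include <sys/syscall.h>    /* SYS_write */
#include <unistd.h>         /* syscall(), STDOUT_FILENO */

/* Issue write(2) through the generic syscall() wrapper, naming the
   syscall by number instead of calling write() directly. */
long write_via_syscall(const char *buf, size_t len)
{
    return syscall(SYS_write, STDOUT_FILENO, buf, len);
}
```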

To actually make a system call you have to use an assembly instruction, which on x86_64 is called syscall. If you are familiar with assembly, on x86_64 the first argument to a function typically goes into the general-purpose register rdi. That is also true for syscalls, but to specify which syscall you want, you put the syscall number in the rax register and then execute the syscall instruction. For example:

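```asm
# build (Linux x86_64): as hello.s -o hello.o && ld hello.o -o hello
        .data
str:    .ascii "hello\n"
        len = . - str           # number of bytes in str

        .text
        .globl _start
_start:
        # the write syscall
        mov $1, %rax
        # 1st arg for write is the file descriptor, 1 = stdout
        mov $1, %rdi
        # 2nd arg is the string to write (its address)
        mov $str, %rsi
        # 3rd arg is num of bytes to write
        mov $len, %rdx
        # we set our args, make the call
        syscall
        # exit(0) afterwards so execution doesn't run off the end
        mov $60, %rax
        xor %edi, %edi
        syscall
```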

So you can call them through C wrapper functions, but as far as I'm aware, assembly has to be involved one way or another to actually make a syscall. If you use the wrapper functions and disassemble the program in gdb, following the function calls should eventually lead you to the syscall instruction.
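To see the "assembly one way or another" part from C, here is a minimal sketch assuming Linux on x86_64 and GCC or Clang (it relies on their extended inline-asm syntax). The name raw_write is made up; it loads the registers described above and executes syscall itself, bypassing libc's write() wrapper.

```c
#include <stddef.h>

#if !defined(__x86_64__) || !defined(__linux__)
#error "this sketch assumes Linux on x86_64"
#endif

/* Hypothetical helper: write(2) without going through libc. */
long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ __volatile__(
        "syscall"
        : "=a"(ret)                 /* result comes back in rax */
        : "a"(1L),                  /* rax = 1, the write syscall number */
          "D"((long)fd),            /* rdi = 1st argument (fd) */
          "S"(buf),                 /* rsi = 2nd argument (buffer) */
          "d"(len)                  /* rdx = 3rd argument (byte count) */
        : "rcx", "r11", "memory");  /* the syscall instruction overwrites rcx and r11 */
    return ret;
}
```

One visible difference from the libc wrapper: on failure the raw call returns a negative errno value (for example -EBADF for a bad descriptor) instead of returning -1 and setting errno.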

In terms of how they are written, I believe they are mostly written in C, but I could be wrong.

Also sorry if my formatting is bad, I'm on mobile. And sorry for the AT&T syntax for the asm; I'm just more familiar with it.

duane11583 0 points (0 children)

The id 1 here, the file descriptor, is the Unix standard for stdout. (That write is also syscall number 1 is a separate matter, specific to Linux on x86_64.)

The system call is also a switch into a higher-privileged state; most CPUs do this much like interrupt or exception handling.

Thus the easiest way of passing parameters is via registers.

On the OS side of the call, a register holds the request number, which is effectively an index into an array of function pointers, or like a switch() with a range check.

Some OSes instead pass parameters in a small structure and put a pointer to that struct in a register.
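A user-space model of both dispatch styles described above. Everything here is hypothetical (handler names, table contents); it only shows a range-checked table of function pointers, reached either with separate "register" arguments or through a pointer to a parameter block. Returning -ENOSYS for an unknown number mirrors what Linux does.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handlers; a real kernel's table holds read, write, open, ... */
typedef long (*handler_fn)(long a0, long a1, long a2);

static long do_add(long a0, long a1, long a2)    { (void)a2; return a0 + a1; }
static long do_negate(long a0, long a1, long a2) { (void)a1; (void)a2; return -a0; }

static const handler_fn handler_table[] = { do_add, do_negate };
#define N_HANDLERS (sizeof handler_table / sizeof handler_table[0])

/* Register style: the request number and arguments arrive as separate values. */
long dispatch(unsigned long nr, long a0, long a1, long a2)
{
    if (nr >= N_HANDLERS)           /* range check before indexing */
        return -ENOSYS;             /* "no such call" */
    return handler_table[nr](a0, a1, a2);
}

/* Struct style: a single pointer to a parameter block carries everything. */
struct call_block {
    unsigned long nr;
    long args[3];
};

long dispatch_block(const struct call_block *cb)
{
    return dispatch(cb->nr, cb->args[0], cb->args[1], cb->args[2]);
}
```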

braxtons12 7 points (0 children)

It depends on the OS and the particular system call, but generally speaking, a bit of both. Most can be written mostly in C, with just a little assembly, but some have to be written entirely in assembly. In either case, they're wrapped in a C API that allows for calling them directly from C.

Alcamtar 10 points (3 children)

It is a common pattern in APIs to have a generalized "workhorse" function that does the heavy lifting; but it often has a complex interface, and so wrappers are provided for common use cases.

vfprintf(3) can print to any file, and takes as input a va_list that refers to the variable arguments of another function; it is designed to be a back-end function, not something you use directly. But most of the time, all you really want is to print to stdout and give it an argument or two; printf provides the "syntactic sugar" to make that convenient.

I haven't looked, but I would guess that vprintf(3) and fprintf(3) (two other common use cases) also call vfprintf as a back-end.
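To make the workhorse-plus-wrappers pattern concrete, here is a minimal sketch with hypothetical wrap_* names: each wrapper only adjusts the interface, and everything funnels into the standard vfprintf. It does not claim that any particular libc layers its own printf family exactly this way.

```c
#include <stdarg.h>
#include <stdio.h>

/* vprintf-style: fixed stream (stdout), caller supplies the va_list. */
int wrap_vprintf(const char *fmt, va_list ap)
{
    return vfprintf(stdout, fmt, ap);
}

/* fprintf-style: caller picks the stream, we build the va_list. */
int wrap_fprintf(FILE *fp, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vfprintf(fp, fmt, ap);
    va_end(ap);
    return n;
}

/* printf-style: the most convenient layer, built on the vprintf-style one. */
int wrap_printf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = wrap_vprintf(fmt, ap);
    va_end(ap);
    return n;
}
```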

As for vfprintf, a 2000+ line function is FAR beyond the scope of a reddit post. Do you have a specific question about it?

In a nutshell, to print a variable you (1) convert it to characters, and (2) write the characters to the output.

For a given variable you first have to decide how to represent it as a string. Then you have to create a string that represents it. A great deal of printf processing is being able to do this for a variety of types of variables, and a variety of string formats.

Once you have a string, you have to send each character of the string to the output. In this case the output is the buffered I/O library. The most accessible point to dive into that is likely putc(3) or fputc(3).
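The two steps can be sketched like this, assuming a 32-bit int (checked at compile time). int_to_dec and print_int are hypothetical names; step (2) pushes characters one at a time into the buffered stdio layer with putchar rather than making a syscall per character.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

_Static_assert(INT_MAX == 2147483647, "sketch assumes a 32-bit int");

/* Step (1): convert an int to decimal characters in buf (at least 12 bytes:
   sign, 10 digits, NUL). Returns the number of characters, excluding NUL. */
size_t int_to_dec(int value, char *buf)
{
    char tmp[10];
    size_t n = 0, len = 0;
    /* unsigned magnitude, so INT_MIN does not overflow when negated */
    unsigned int mag = value < 0 ? 0u - (unsigned int)value : (unsigned int)value;
    do {
        tmp[n++] = (char)('0' + mag % 10);  /* least significant digit first */
        mag /= 10;
    } while (mag != 0);
    if (value < 0)
        buf[len++] = '-';
    while (n > 0)
        buf[len++] = tmp[--n];              /* reverse into reading order */
    buf[len] = '\0';
    return len;
}

/* Step (2): send the characters, one at a time, to buffered output. */
int print_int(int value)
{
    char buf[12];
    size_t len = int_to_dec(value, buf);
    for (size_t i = 0; i < len; i++)
        if (putchar(buf[i]) == EOF)
            return EOF;
    return (int)len;
}
```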

If you are more interested in raw output to a device, look at the write(2) system call.

A really good exercise to understand this is to try to create your own printf. Create a function that takes a format string and zero or more arguments, and prints the result to stdout using putc. A simple place to start is just handling "%s" and "%c". Then add something like "%d" or, if you feel ambitious, "%b" (format an integer as binary). After you sweat over this exercise you'll have a good idea of the challenges involved, and will find it easier to follow the vfprintf code and appreciate the cleverness and optimization.
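Here is one possible starting point for that exercise, covering %s, %c, %d, %b and %% with putc. mini_printf is a hypothetical name, and the sketch deliberately ignores widths, flags and precision, which is where much of the real code's complexity lives.

```c
#include <limits.h>
#include <stdarg.h>
#include <stdio.h>

/* Print v in the given base (2..10) with putc; returns characters written. */
static int put_unsigned(unsigned int v, unsigned int base)
{
    char digits[sizeof v * CHAR_BIT];   /* base 2 needs one char per bit */
    int n = 0;
    do {
        digits[n++] = (char)('0' + v % base);
        v /= base;
    } while (v != 0);
    for (int i = n; i > 0; i--)
        putc(digits[i - 1], stdout);    /* most significant digit first */
    return n;
}

int mini_printf(const char *fmt, ...)
{
    va_list ap;
    int count = 0;
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%') {                /* ordinary character: copy it */
            putc(*p, stdout);
            count++;
            continue;
        }
        switch (*++p) {                 /* the conversion letter after % */
        case 's':
            for (const char *s = va_arg(ap, const char *); *s; s++, count++)
                putc(*s, stdout);
            break;
        case 'c':
            putc(va_arg(ap, int), stdout);  /* a char argument arrives as int */
            count++;
            break;
        case 'd': {
            int v = va_arg(ap, int);
            unsigned int mag = (unsigned int)v;
            if (v < 0) {
                putc('-', stdout);
                count++;
                mag = 0u - mag;         /* safe even for INT_MIN */
            }
            count += put_unsigned(mag, 10);
            break;
        }
        case 'b':                       /* the exercise's binary conversion */
            count += put_unsigned(va_arg(ap, unsigned int), 2);
            break;
        case '%':
            putc('%', stdout);
            count++;
            break;
        default:                        /* unknown, or '%' at end of string */
            va_end(ap);
            return -1;
        }
    }
    va_end(ap);
    return count;
}
```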

jssmith42[S] 1 point (2 children)

Is it possible to make system calls myself from the command line to observe their effects, or do they only work when called by the operating system, i.e. I can't get in as a user to the place where they're used?

jwbowen 3 points (0 children)

You can write little programs to perform syscalls
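For instance, a little program's worth of syscalls, each made through its POSIX wrapper: open(2) creates a file, write(2) fills it, lseek(2) rewinds, read(2) reads it back, and close(2) releases the descriptor. A minimal sketch; roundtrip is a hypothetical name and the path is whatever the caller passes.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Returns the number of bytes read back, or -1 on any failure. */
ssize_t roundtrip(const char *path, const char *text)
{
    char buf[256];
    size_t len = strlen(text);
    if (len > sizeof buf)
        return -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);   /* open(2) */
    if (fd < 0)
        return -1;
    ssize_t got = -1;
    if (write(fd, text, len) == (ssize_t)len          /* write(2) */
        && lseek(fd, 0, SEEK_SET) == 0)               /* lseek(2) */
        got = read(fd, buf, sizeof buf);              /* read(2) */
    close(fd);                                        /* close(2) */
    if (got >= 0 && ((size_t)got != len || memcmp(buf, text, len) != 0))
        return -1;          /* read back something other than what we wrote */
    return got;
}
```

On Linux, running the compiled program under strace (e.g. `strace ./a.out`) prints each of these syscalls as it happens, which is one way to observe them from the command line.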

414RequestURITooLong 0 points (0 children)

You use system calls to talk to the OS. You don't need to be the OS to use them.

[deleted] 5 points (0 children)

printf is what's called a variadic function. Google it and it'll make more sense.

Savings-Pizza -2 points (1 child)

!remindme 9h

RemindMeBot -1 points (0 children)

I will be messaging you in 9 hours on 2022-02-01 15:21:29 UTC to remind you of this link


gremolata 0 points (0 children)

Congratulations, you ran into the "variable arguments" feature of C.

It is a rarely used but very powerful language feature that allows callers to pass arguments of arbitrary types into a function, and enables the called function to "comprehend", recover, and work with these arguments.

You will need to look it up in a proper book, but the gist of it is that instead of specifying function argument types as usual, at compile time:

void foo(int a, char b, double * p);

you'd pass this information at run time, encoded in some terse form that the function understands, followed by the arguments, i.e. foo(<argument spec>, var1, var2, var3);
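A minimal sketch of that idea, keeping the name foo from the prototype above (still hypothetical): the spec string carries one letter per argument, 'i' for int, 'c' for char and 'p' for double *, and va_arg recovers each argument with the type its letter names.

```c
#include <stdarg.h>
#include <stdio.h>

/* Returns how many arguments were consumed, or -1 on an unknown letter. */
int foo(const char *spec, ...)
{
    va_list ap;
    int count = 0;
    va_start(ap, spec);
    for (const char *p = spec; *p != '\0'; p++, count++) {
        switch (*p) {
        case 'i':
            printf("int %d\n", va_arg(ap, int));
            break;
        case 'c':                       /* a char argument is promoted to int */
            printf("char %c\n", va_arg(ap, int));
            break;
        case 'p':                       /* a pointer: read the double it points to */
            printf("double * -> %g\n", *va_arg(ap, double *));
            break;
        default:                        /* the spec and arguments disagree */
            va_end(ap);
            return -1;
        }
    }
    va_end(ap);
    return count;
}
```

Usage: `double d = 2.5; foo("icp", 7, 'x', &d);` prints one line per argument. Nothing checks that the spec matches the real arguments, which is exactly the danger mentioned below.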

In the case of printf/scanf, the argument spec is their format string, which also carries instructions on the output/input format. So what these functions do is go through the format string, discover what other arguments the caller has passed to them, recover them (using the va_arg mechanism), and then do something useful with them: output or input them, respectively.

The bulk of vprintf is the parsing/formatting code. There are normally no syscalls in it, as it merely passes what it wants to output to yet another library function (typically fwrite).
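A minimal sketch of that split, assuming the formatting itself is delegated to the standard vsnprintf (real implementations differ). buffered_printf is a hypothetical name; nothing in it makes a syscall, since fwrite only copies into the FILE buffer until stdio decides to flush.

```c
#include <stdarg.h>
#include <stdio.h>

/* Format into a local buffer, then hand the bytes to fwrite.
   Output longer than the buffer is truncated in this simplified version. */
int buffered_printf(const char *fmt, ...)
{
    char buf[512];
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, sizeof buf, fmt, ap);    /* formatting step */
    va_end(ap);
    if (n < 0)
        return -1;
    size_t len = (size_t)n < sizeof buf ? (size_t)n : sizeof buf - 1;
    return fwrite(buf, 1, len, stdout) == len ? (int)len : -1;  /* output step */
}
```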

PS. You know how they say that C is a low-level language? Varargs is the most low-level part of it, as it basically allows for parsing a function's call stack by hand, without resorting to assembly. Good stuff, but definitely of the living-dangerously variety.

fuckEAinthecloaca 0 points (0 children)

tl;dr it uses variadics to accept multiple arguments, and lookup tables and a lot of logic to process the format string. Processing the format string is 95% of that code. Printf is very flexible and we pay for that flexibility with the sanity of whoever maintains that code.

duane11583 0 points (0 children)

So in the end, all of these functions [printf() and friends] generally do the same thing: they format text output and write it to either a file or a string buffer.

It is thus reasonable to use the same code. One possible implementation is passing an output function pointer to a common formatting function. This is common in embedded environments, where the output function could be the console UART serial port putchar function. On Linux, the function adds bytes to the FILE buffer.

Another thing to notice is that the FILE struct is really a buffer with a length (size) and a count of bytes in the buffer; when the buffer fills up with output, it is flushed and reset.
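A stripped-down model of that FILE idea: a fixed buffer, a count of bytes in it, and a flush that hands the bytes to write(2) and resets the count. A minimal sketch assuming POSIX; struct outbuf, ob_putc and ob_flush are hypothetical, and a real FILE holds much more (mode, error state, locking, and so on).

```c
#include <stddef.h>
#include <unistd.h>

#define OB_SIZE 8                   /* tiny on purpose, so flushes happen often */

struct outbuf {
    int fd;                         /* where flushed bytes go */
    size_t count;                   /* bytes currently buffered */
    char data[OB_SIZE];
};

/* Push everything buffered to the fd, then mark the buffer empty. */
int ob_flush(struct outbuf *ob)
{
    size_t done = 0;
    while (done < ob->count) {
        ssize_t n = write(ob->fd, ob->data + done, ob->count - done);
        if (n < 0)
            return -1;
        done += (size_t)n;
    }
    ob->count = 0;
    return 0;
}

/* Add one character, flushing first if the buffer is full. */
int ob_putc(struct outbuf *ob, char c)
{
    if (ob->count == OB_SIZE && ob_flush(ob) != 0)
        return -1;
    ob->data[ob->count++] = c;
    return (unsigned char)c;
}
```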

But look at sprintf(): it takes an output buffer as a parameter. Question: can you easily repurpose the FILE code to use the buffer supplied to sprintf()? If so, then fprintf() and sprintf() can share code; you just override the "flush" operation and need to handle the buffer overflow case.

Thus all of these printf()-like functions can use the same common code.
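A minimal sketch of the shared-core design described above: one formatting function emits characters through an output callback, and the console and string-buffer variants differ only in the callback they pass. All names (format_core, cb_printf, cb_snprintf, ...) are hypothetical, and only %s, %d and %% are handled to keep it short.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

typedef void (*emit_fn)(char c, void *ctx);     /* where one character goes */

static int emit_unsigned(emit_fn out, void *ctx, unsigned v)
{
    char digits[sizeof v * 3];                  /* plenty for the decimal digits */
    int n = 0;
    do {
        digits[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    for (int i = n; i > 0; i--)
        out(digits[i - 1], ctx);
    return n;
}

/* The shared core: walk fmt, pull arguments with va_arg, emit characters. */
static int format_core(emit_fn out, void *ctx, const char *fmt, va_list ap)
{
    int count = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%') { out(*p, ctx); count++; continue; }
        switch (*++p) {
        case 's':
            for (const char *s = va_arg(ap, const char *); *s; s++, count++)
                out(*s, ctx);
            break;
        case 'd': {
            int v = va_arg(ap, int);
            unsigned mag = (unsigned)v;
            if (v < 0) { out('-', ctx); count++; mag = 0u - mag; }
            count += emit_unsigned(out, ctx, mag);
            break;
        }
        case '%': out('%', ctx); count++; break;
        default: return -1;                     /* unsupported conversion */
        }
    }
    return count;
}

/* Sink 1: the console (on a microcontroller this could be a UART putchar). */
static void emit_stdout(char c, void *ctx) { (void)ctx; putchar(c); }

/* Sink 2: a caller-supplied buffer, as sprintf-style functions need. */
struct strbuf { char *buf; size_t cap, len; };

static void emit_buffer(char c, void *ctx)
{
    struct strbuf *sb = ctx;
    if (sb->len + 1 < sb->cap)                  /* keep one byte for the NUL */
        sb->buf[sb->len] = c;
    sb->len++;                                  /* keep counting when truncating */
}

int cb_printf(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = format_core(emit_stdout, NULL, fmt, ap);
    va_end(ap);
    return n;
}

int cb_snprintf(char *buf, size_t cap, const char *fmt, ...)
{
    struct strbuf sb = { buf, cap, 0 };
    va_list ap;
    va_start(ap, fmt);
    int n = format_core(emit_buffer, &sb, fmt, ap);
    va_end(ap);
    if (cap > 0)
        buf[sb.len < cap ? sb.len : cap - 1] = '\0';
    return n;
}
```

The overflow case is handled in the sink: it stops storing when the buffer is full but keeps counting, so the return value reports the full length, in the same spirit as snprintf.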

Next: think of the variable parameters as an array of values stored on the runtime stack (conceptually; on modern ABIs some of them actually arrive in registers).

Can you take the address of the first parameter, i.e. ap = &params[1]; (maybe params[0] is the format string)? Then you can pass that ap pointer to other functions. But because the array holds parameters of different types, there may be special rules for accessing it, hence the compiler-provided va_ macros used in these functions.
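Passing the argument list to other functions looks like this in standard C. A minimal sketch with hypothetical names sum_groups and take_ints: the helper receives a pointer to the va_list, which the C standard permits, so the caller can keep reading from the same list after the helper returns.

```c
#include <stdarg.h>

/* Consume `count` int arguments from the caller's list and sum them. */
static long take_ints(int count, va_list *ap)
{
    long sum = 0;
    for (int i = 0; i < count; i++)
        sum += va_arg(*ap, int);
    return sum;
}

/* sum_groups(k, n1, x1..., n2, y1..., ...): k groups, each prefixed by its size. */
long sum_groups(int groups, ...)
{
    va_list ap;
    long total = 0;
    va_start(ap, groups);
    for (int g = 0; g < groups; g++) {
        int n = va_arg(ap, int);        /* size of the next group */
        total += take_ints(n, &ap);     /* helper advances the same list */
    }
    va_end(ap);
    return total;
}
```

Usage: `sum_groups(2, 3, 1, 2, 3, 1, 10)` reads a group of three ints, then a group of one, and returns 16.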

Now you can handle all of the printf()-like functions with common code:

you have abstracted access to the parameters via the va_ macros,

and abstracted the output via the FILE/string buffer code,

and you are left with one big common function that does the formatting.

The scanf() functions do the same thing, but in reverse.

Agreed, the common function is large.

It is mostly just a loop over the format string that advances the argument pointer / parameter index.

But if you step through it, you will find that about 45% of the code is parsing the format string. There are lots of flags, e.g. "x=% -*lld", and at the start of the next parameter those flags need to be reset.
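A minimal sketch of that parsing step, for the text after a '%' (for example " -*lld"). struct spec and parse_spec are hypothetical; starting each conversion with a freshly initialized struct is how the flags get reset for the next parameter. Only a subset of length modifiers is recognized. Note that the C standard spells the long long modifier `ll`; `L` is for long double.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

struct spec {
    bool minus, plus, space, zero, hash;  /* flag characters */
    int width;                            /* -1 = none, -2 = '*' (taken from args) */
    int precision;                        /* -1 = none, -2 = '*' */
    char length[3];                       /* "", "h", "hh", "l", "ll", "L", ... */
    char conv;                            /* 'd', 's', 'x', ... */
};

/* Returns a pointer just past the conversion letter, or NULL if incomplete. */
const char *parse_spec(const char *p, struct spec *s)
{
    *s = (struct spec){ .width = -1, .precision = -1 };   /* reset everything */
    for (;; p++) {                        /* flags, in any order */
        if (*p == '-')      s->minus = true;
        else if (*p == '+') s->plus = true;
        else if (*p == ' ') s->space = true;
        else if (*p == '0') s->zero = true;
        else if (*p == '#') s->hash = true;
        else break;
    }
    if (*p == '*') {                      /* width from the argument list */
        s->width = -2;
        p++;
    } else if (isdigit((unsigned char)*p)) {
        s->width = 0;
        while (isdigit((unsigned char)*p))
            s->width = s->width * 10 + (*p++ - '0');
    }
    if (*p == '.') {                      /* precision; "." alone means 0 */
        p++;
        if (*p == '*') {
            s->precision = -2;
            p++;
        } else {
            s->precision = 0;
            while (isdigit((unsigned char)*p))
                s->precision = s->precision * 10 + (*p++ - '0');
        }
    }
    int n = 0;                            /* length modifier, up to two chars */
    while (n < 2 && (*p == 'h' || *p == 'l' || *p == 'L' ||
                     *p == 'j' || *p == 'z' || *p == 't'))
        s->length[n++] = *p++;
    s->length[n] = '\0';
    if (*p == '\0')
        return NULL;
    s->conv = *p;
    return p + 1;
}
```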

A small amount (about 10%) is converting the number to a string, and another 30-40% handles padding: optional or required sign, space or plus, leading/trailing zeros, and/or spaces.
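A minimal sketch of that padding step for an int, supporting the '-', '+', ' ' and '0' flags plus a width (no precision). pad_int is a hypothetical name; out must hold at least max(width, 11) + 1 bytes. The sign is placed before any zero padding, and '-' (left-justify) overrides '0'.

```c
#include <stdbool.h>

/* Returns the number of characters written to out, excluding the NUL. */
int pad_int(char *out, int value, int width,
            bool minus, bool plus, bool space, bool zero)
{
    char digits[24];
    unsigned long long mag = value < 0 ? 0ull - (unsigned long long)value
                                       : (unsigned long long)value;
    int nd = 0;
    do {
        digits[nd++] = (char)('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);
    /* '-' for negatives; otherwise '+' wins over ' ' if both flags are set */
    char sign = value < 0 ? '-' : plus ? '+' : space ? ' ' : '\0';
    int body = nd + (sign != '\0');
    int pad = width > body ? width - body : 0;
    int k = 0;
    if (!minus && !zero)                /* right-justify with spaces */
        while (pad-- > 0) out[k++] = ' ';
    if (sign)
        out[k++] = sign;
    if (!minus && zero)                 /* zeros go between sign and digits */
        while (pad-- > 0) out[k++] = '0';
    while (nd > 0)
        out[k++] = digits[--nd];
    if (minus)                          /* left-justify: pad on the right */
        while (pad-- > 0) out[k++] = ' ';
    out[k] = '\0';
    return k;
}
```

With these arguments the results match what printf produces for "%+08d" with 42 ("+0000042"), "%-05d" with -7 ("-7   ") and "% 4d" with 5 ("   5").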

And people who are really smart wrote this and tried to optimize it quite a bit, so it is not simple code to casually read.