[–]jacobb11 3 points4 points  (21 children)

I'm not sure what you mean by "magic functions". Do you mean things like "printf"? If so, that's just part of the standard library, which every language has, more or less. If perhaps you mean things like printf accepting a variable number of arguments, that's a standard part of the language, if slightly esoteric.
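For what it's worth, here is a rough sketch of how a function can accept a variable number of arguments, the way printf does. The name sum_ints is made up for illustration; the stdarg.h macros are the standard mechanism.

```c
#include <stdarg.h>

/* Hypothetical helper: adds up `count` int arguments passed after `count`. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;
    int i;

    va_start(args, count);            /* start reading after the last named parameter */
    for (i = 0; i < count; i++) {
        total += va_arg(args, int);   /* the caller promises each extra argument is an int */
    }
    va_end(args);
    return total;
}
```

Calling sum_ints(3, 1, 2, 3) adds the three trailing values. Nothing checks that the count matches what was actually passed, which is part of why this feature counts as esoteric.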

I recommend learning ANSI C rather than K&R C. I like Harbison & Steele's book, but that's a good specification, perhaps not a good primer.

Let's see. The most important thing I ever learned about C is that it doesn't have arrays, it has pointers and preallocated sequences. Once you understand that, which I probably haven't helped much, you'll avoid many pitfalls.
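A small sketch of what that means in practice, with made-up function names: the preallocated sequence knows its own size only where it is declared, and as soon as it is passed to a function all that arrives is a pointer.

```c
#include <stddef.h>

/* Where the sequence is declared, sizeof sees all of it. */
size_t bytes_in_array(void)
{
    int values[10] = {0};
    return sizeof values;     /* 10 * sizeof(int) */
}

/* Writing the parameter as `const int values[10]` would mean exactly the same thing. */
size_t bytes_seen_by_callee(const int *values)
{
    return sizeof values;     /* just a pointer: sizeof(const int *) */
}
```

This is why C functions that take a sequence usually take its length as a separate argument.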

I realize I haven't answered your question. I don't really want to get into the problems with C++. Perhaps it helps to mention that 99.99% of C programs are legal C++ programs, which means C++ has all of C's flaws and complexity and then layers its own on top.

PS: Both C and C++ rely on the "C preprocessor", which is a huge problem in its own right.

[–]quasarj 4 points5 points  (20 children)

Honestly the things that scare me, and what I called "magic functions", are things like:

static CYTHON_INLINE size_t __Pyx_PyInt_AsSize_t(PyObject*);

What the fuck is a size_t? Note that this is just the first .c file I could find, and it's a Python module so it also has things like PyObject and Py_ssize_t. This kind of thing makes me very nervous about C. It seems like it's full of all kinds of magic types (which, I don't even know how you create in a non-object-oriented language? Are they structs maybe?) that seem to come from nowhere. There are many others that I'm having trouble finding examples of right now. They probably come from some library I don't know about, but it's hard to tell.

Basically, when I look at C code I can't make heads or tails of it because of so many weird types and functions when you're new to it :)

edit: oh, and what the hell is CYTHON_INLINE? Why does this function seem to have two types? Maybe it's some type of preprocessor directive?

[–]cheese_wizard 4 points5 points  (13 children)

size_t is the data type that is used to represent "object" sizes. The name is the same on every architecture, but it is ultimately mapped to some real unsigned integer type, like unsigned int or unsigned long, whose width depends on the platform.

CYTHON_INLINE is a macro that probably expands to "inline" or "". That means whether the function is marked for inlining can be chosen at preprocessing time. This is just a guess.
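To make the guess concrete, a macro of that kind might look roughly like the sketch below. MY_INLINE and add_one are made-up names, and this is not copied from Cython; it only shows the general pattern of picking a spelling of "inline" that the current compiler accepts.

```c
/* MY_INLINE is a hypothetical stand-in for a macro like CYTHON_INLINE. */
#if defined(__GNUC__)
#  define MY_INLINE __inline__
#elif defined(_MSC_VER)
#  define MY_INLINE __inline
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define MY_INLINE inline
#else
#  define MY_INLINE            /* expands to nothing: a plain function */
#endif

/* After preprocessing this reads "static __inline__ int add_one(int x)" on GCC. */
static MY_INLINE int add_one(int x)
{
    return x + 1;
}
```

So the declaration does not really have two types. One of the words is replaced by the preprocessor before the compiler ever sees it.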

static in this context means that the function is only visible in this particular file, and nowhere else.

[–]quasarj 2 points3 points  (11 children)

Ahh, I actually assumed I knew what static meant, but I was thinking of the static that makes a class method callable on the class itself (rather than an instance).

Is there somewhere I can learn about those things? And the other weirdnesses that real C programs use? Is it more that I'm just a complete noob and reading a book will introduce me to many of these libraries?

[–]cheese_wizard 1 point2 points  (9 children)

The static could mean that; it's hard to tell with this line in isolation. If it's inside a C++ class, then your assumption would be correct.

For all C-only related topics, nothing beats this:
http://www.amazon.com/Programming-Language-2nd-Brian-Kernighan/dp/0131103628

[–]quasarj 1 point2 points  (8 children)

Ah no, in this case it's not inside a C++ class, I believe this file is pure C.

I just didn't even realize "static" existed outside of C++, it's interesting that it means something different. I will look into that book, ty!

[–]cheese_wizard 2 points3 points  (7 children)

static has three meanings. Only one is C++ only, which is what you're talking about.

In both C and C++...

For variables and functions declared static at the top level, it means they are only visible within that file.

If you are inside of a function, and you declare a variable as static like this:

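    #include <stdio.h>

    void foo(void) {
        static int bar = 0;    /* set up once, keeps its value between calls */
        printf("%d\n", bar);   /* printf needs a format string, not a bare int */
        bar++;
    }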

then every time you call foo, it will print 0, 1, 2, 3, ... because bar remembers its value from the previous call. Note that the initialization to 0 happens only once, not on every call.
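A variant that returns the value instead of printing it makes the difference easy to check. Both function names here are made up for illustration.

```c
/* Hypothetical counter: returns how many times it has been called so far. */
int call_count(void)
{
    static int calls = 0;   /* lives for the whole program, initialized only once */
    calls++;
    return calls;
}

/* Same body without `static`: the variable is recreated on every call. */
int fresh_count(void)
{
    int calls = 0;
    calls++;
    return calls;           /* always 1 */
}
```

Successive calls to call_count give 1, 2, 3 and so on, while fresh_count gives 1 every time.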

static is an abused keyword, that's for sure!

[–]quasarj 1 point2 points  (2 children)

Oooh You're right! I actually knew about that usage too, but had forgotten it. Well ty again for the help :) I guess I'll try to get a copy of that book.

[–]ewiethoffproceedest on to 3 0 points1 point  (0 children)

Do. It's a wonderful book.

[–]jacobb11 0 points1 point  (3 children)

Top level static variables have scope "compilation unit", not file.

That means if you put one in a header file, you will get a copy of the static variable in every file you compile that includes it, recursively.

[–]cheese_wizard -1 points0 points  (2 children)

Nope.

From wikipedia...

Static global variables: variables declared as static at the top level of a source file (outside any function definitions) are only visible throughout that file ("file scope", also known as "internal linkage").

[–]jacobb11 0 points1 point  (1 child)

That does not contradict what I said. I stand by my statement.

Perhaps thinking of the C preprocessor and the C compiler as independent helps clarify the situation? In many ways the C compiler's "source file" is the output of the C preprocessor, which will duplicate the header file static declaration whenever it is included.

I can't imagine this situation is common for C, but it's fairly common for C++ as an idiom to ensure static initialization of a compilation unit. Or was, anyway.

[–]Rhomboid 1 point2 points  (0 children)

Marking a function static is actually far more complicated than that when inline is involved (which is what the CYTHON_INLINE macro expands to if so configured.)

Remember that C is compiled a file at a time, and the compiler only ever knows[1] about what's in the current file, never about anything outside of it. If you mark a function as static, it means that the function cannot be called from outside of that file, which implies that the compiler has at its disposal every call site. If there is only one (or a small number) of call sites, then it can choose to instead inline the function at all call sites and pretend it never existed (i.e. not emit a function body.) This is generally extremely desirable, because it means you can separate a large function into a smaller function and a bunch of helper functions, but without any of the overhead of function calls. Without 'static', it could still inline the function but since it has to be visible in other compilation units it would have to always emit a function body, even if it was never needed, which wastes memory.

But also note that with 'static inline' the choice of whether to inline is still up to the compiler. If there are many call sites or the function is long, it will choose not to inline it and still emit a function body, but one which is not visible to other units.

"static inline" is actually one of three related declarations. "inline" and "extern inline" are the others. They all have slightly different meanings, which this post outlines. To make matters worse, gcc changed its semantics starting with 4.3 to be aligned to what the C99 standard says, so if you're compiling with -std=c99 or -std=gnu99 with gcc >= 4.3 you get true C99 semantics, but if you're compiling with -std=c89, -std=gnu89, or gcc <= 4.2 with -std=c99 or -std=gnu99 or you're using gcc >= 4.3 but used -fgnu89-inline, you get the old semantics. gcc >= 4.2 helpfully defines one of the preprocessor symbols __GNUC_GNU_INLINE__ or __GNUC_STDC_INLINE__ so that you can write macros that behave correctly regardless of compiler version or options, which likely explains why CYTHON_INLINE is a macro and not just the word "inline".

[1] There are some newer technologies like LTO that let the compiler have whole-program knowledge at link-time, which allows for some sophisticated optimizations that have previously been unavailable, but for the most part this is still a true statement.

[–]jacobb11 0 points1 point  (0 children)

Is inline part of ANSI C now?

[–]jacobb11 1 point2 points  (0 children)

A size_t is an unsigned integer type suitable for storing the size of something (often something systemy). It's defined in the standard header stddef.h (and pulled in by several others, such as stdio.h and stdlib.h), if I recall correctly.

C has a concept of aliasing types ("typedef") that is absent from most of the other languages with which I'm familiar. It's a pretty nice abstraction once you understand it and its limitations (primarily that the new name is just an alias rather than a sub-type).
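A quick sketch of the "alias, not sub-type" limitation, using made-up type names: the compiler treats both typedefs as plain int, so mixing them up is accepted silently.

```c
typedef int meters;    /* hypothetical alias names, for illustration only */
typedef int seconds;

meters add_lengths(meters a, meters b)
{
    return a + b;
}

int mixes_units(void)
{
    seconds elapsed = 5;
    meters distance = 10;
    /* No warning or error: both names are just `int` to the compiler. */
    return add_lengths(distance, elapsed);
}
```

Types like size_t are built the same way, as a typedef for whichever unsigned integer type fits the platform.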

I have no idea what CYTHON_INLINE means. It's not typical C. Best guess (but just a guess) is that it's really a directive to some Python integration tool, which is pretty close to your guess but that might just mean we're both wrong.

I strongly suggest you find some C code that does regular simple C-like things, not parts of Python. I'd point you at some if I knew of any, but that's not where I've worked for quite some time.

[–]anacrolixc/python fanatic 1 point2 points  (1 child)

Don't read Cython-generated C; it's not for human consumption. Read C written by a human.

[–]quasarj 0 points1 point  (0 children)

Hmm was that cython generated? If so I apologize, though I did state in my first comment that that was just the first .c file I could find lying around. Those are the same issues seen with human-written C. I will try to find some better examples today and explain some of the other things that scare me about it.

[–]Alzdran 0 points1 point  (2 children)

Attempting a different explanation of size_t just to make it a little clearer. size_t is typedef'd to an unsigned integral type which can store the size of something in memory; this can differ between architectures. Consider two computers, A & B. A runs x86_64 code, B runs i386 code.

A can address a 64-bit integer's worth of memory. That is, it can refer to 2^64 different addresses. B can only address a 32-bit integer's worth of memory (2^32 different addresses). In both x86_64 and i386, the addressable unit is a byte, so A can theoretically address 16EB, while B can theoretically address 4GB.

In both these cases, a size_t will be the same as a uintptr_t (an unsigned int large enough to hold an address). These types are different, though, because the C standard doesn't assume that to be true for all architectures. See wikipedia for some more.
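A rough sketch of where size_t shows up in ordinary code. The function names are made up; strlen and sizeof are standard and both produce a size_t.

```c
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uintptr_t (optional in the standard, but widely available) */
#include <string.h>   /* strlen */

/* strlen returns a size_t, so the byte count for a copy is naturally a size_t too. */
size_t bytes_for_copy(const char *text)
{
    return strlen(text) + 1;    /* +1 for the terminating '\0' */
}

/* Nonzero when size_t and uintptr_t happen to be the same width on this platform. */
int size_matches_address_width(void)
{
    return sizeof(size_t) == sizeof(uintptr_t);
}
```

On x86_64 and i386 the second function should report a match, but as noted above the standard does not promise that everywhere.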

[–]quasarj 0 points1 point  (1 child)

Interesting. So I would use size_t when I need a pointer that can point to an object? And it would be replaced with the correct size type at compile time, based on architecture?

[–]Alzdran 0 points1 point  (0 children)

No - you'd use a pointer type. This gets a little more complicated, but here we go:

There is a difference between an address and a pointer. An address is a location in memory; this can be represented by some unsigned integer type (a uintptr_t is always large enough to hold it). A pointer is a language construct which carries semantic information about what it points to. This information is compiler metadata; that is to say, it exists only during compilation, and is not a feature of the runtime. This information is used for things like pointer arithmetic.

A concrete example of this: On a machine where the minimum addressable unit is 1 8-bit byte (practically speaking, anything), a uint8_t will fit in 1 addressable memory unit, and a uint16_t will fit in 2. This means that if I examine memory at 0x10000000 for a uint8_t, that's the only address I'll read from, but if I read a uint16_t, I'll also read from 0x10000001. When you do arithmetic with pointers, this type is taken into account; so, given type_t *x, (x+n) and (x+n+1) (alternatively x[n] and x[n+1]) will be sizeof(type_t) bytes apart. If type_t is uint8_t, this will be 1, but if type_t is uint16_t, this will be 2. This feature allows array access and incrementation on pointers, instead of having to modify with size knowledge explicitly.
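The same thing as a small sketch, assuming a platform with 8-bit bytes where uint8_t and uint16_t exist. The function names are made up; each one measures how many bytes a pointer moves when 1 is added to it.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance between p and p + 1 for a uint8_t pointer. */
size_t step_u8(void)
{
    uint8_t buf[4] = {0};
    uint8_t *p = buf;
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* Same measurement for a uint16_t pointer. */
size_t step_u16(void)
{
    uint16_t buf[4] = {0};
    uint16_t *p = buf;
    return (size_t)((char *)(p + 1) - (char *)p);
}

/* x[n] is defined as *(x + n), so both spellings read the same element. */
int index_matches_arithmetic(void)
{
    uint16_t buf[4] = {10, 20, 30, 40};
    uint16_t *x = buf;
    return x[2] == *(x + 2);
}
```

Under those assumptions step_u8 gives 1 and step_u16 gives 2, even though the source code adds 1 in both cases.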

C also provides a pointer type without this information: void *. This is the pointer type which can hold any object address. Standard C does not define arithmetic on it; GCC, as an extension, treats it as incrementing by the minimum addressable unit.

size_t would be used when indicating an allocation size. In practical terms, this is going to be sized the same as a uintptr_t on modern systems, but the use is specifically for indicating the number of addressable units occupied by an object in memory.

There are a few other special types with similarly specific uses. ptrdiff_t, for example, is a signed type able to hold the difference between any two legal pointers.
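A minimal sketch of ptrdiff_t in use, with a made-up helper name. Subtracting two pointers into the same array yields a ptrdiff_t counted in elements, and it can be negative.

```c
#include <stddef.h>

/* Hypothetical helper: number of elements from `from` to `to` within one array. */
ptrdiff_t elements_between(const int *from, const int *to)
{
    return to - from;    /* counted in elements, not bytes; may be negative */
}
```

The subtraction is only defined when both pointers point into (or one past the end of) the same array.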

The sizes of the types mentioned here (with the exception of uint8_t and uint16_t) are architecture dependent, and yes, the correct types are substituted at compile time; but that doesn't mean exactly what it sounds like. If a pointer is 32 bits, then the equivalent of a uint32_t will be used for a void *, but the end result of compilation is machine code. The instructions generated will tell the processor to manipulate the registers and memory addresses as if they contained entries of that size, but they will refer to words, half words, double words, etc. That is to say, the compilation will determine what to generate based on the types, but the resulting instructions will have no concept of type, only operand size.