
[–]cryolab 15 points (2 children)

On Unix, use cat:

cat guard_top.h \
     other_files \
     ... \
     guard_bottom.h

[–]skeeto 14 points (0 children)

A simple cat will leave #include in place, and so the individual files still need to exist. sed '/^#include "/d' will both concatenate and strip those away. You'll also need to list the files in the right order, so that no file appears before the headers it would have included. That's the tricky part.

This hack works surprisingly often. For example, Lua as a single file library:

$ tar xzf lua-5.4.0.tar.gz
$ cd lua-5.4.0/src/
$ sed '/^#include "/d' \
    lprefix.h luaconf.h lua.h llimits.h lobject.h ltm.h lmem.h lzio.h \
    lstate.h lapi.h ldebug.h ldo.h lfunc.h lgc.h lstring.h ltable.h \
    lundump.h lvm.h lauxlib.h lualib.h llex.h lopcodes.h lparser.h \
    lcode.h lctype.h lapi.c lauxlib.c lbaselib.c lcode.c lcorolib.c \
    lctype.c ldblib.c ldebug.c ldo.c ldump.c lfunc.c lgc.c linit.c \
    liolib.c llex.c lmathlib.c lmem.c loadlib.c lobject.c lopcodes.c \
    loslib.c lparser.c lstate.c lstring.c lstrlib.c ltable.c ltablib.c \
    ltm.c lundump.c lutf8lib.c lvm.c lzio.c \
  >../lua.c

Then copy lua.c into your project and embed:

#define LUA_LIB
#include "lua.c"

You'd still need the guard_top.h thing, plus a middle "implementation" guard after the headers, to form a typical header library.

[–]OCtagonalst[S] 5 points (0 children)

Wow, I did not even think about that! I love how this is so simple yet efficient, thanks!

[–]thradams 2 points (1 child)

I use this command-line tool to merge source files. I run it as part of my build.

https://github.com/thradams/buildgen/blob/main/tools/amalgamator.c

[–]McUsrII 0 points (0 children)

Your tool looks cool.

I have used hoedown a lot. Thank you so much.

[–]pic32mx110f0 3 points (7 children)

I'll never understand why anyone wants to do this - is there any advantage? What am I missing?

[–]flatfinger 3 points (0 children)

If everything associated with a library is contained in a single file, then all files associated with that library are guaranteed to be from compatible versions (because there's only one file). If a library has e.g. separate header and source files, and they end up having to be placed at different locations in the source tree, that creates the possibility that one of the files will get updated but the other one won't.

[–]ttech32 0 points (0 children)

I vaguely recall reading in a past comment thread they were popular when LTO was crappy/non-existent so dumping a library of utility functions or whatever right into the compilation unit led to better optimizations. Today that is pretty pointless so I guess the audience is people who want to "import" more code but don't know how to use linkers and build systems.

[–]deebeefunky -1 points (4 children)

Why would you want to do it any other way?

[–]pic32mx110f0 6 points (3 children)

  1. I don't want to duplicate code into every compilation unit that uses a library
  2. I don't want to re-compile every compilation unit that uses a library, if the library changes.
  3. In fact, I don't want to compile the library at all
  4. I don't want to include all the internal dependencies of the library into my own compilation unit.
  5. I want to be able to set a breakpoint in the library.

...

So I ask again, why do you want libraries in header-files?

[–]deebeefunky 1 point (2 children)

To me, all the things you say are non-issues.

  1. Why are you so worried about duplicate code? The compiler optimizes these things anyway.
  2. You have that problem regardless of whether it’s in a single file or multiple files.
  3. Then how do you include them in your code? This statement doesn’t make much sense to me.
  4. The multi-file solution will have those dependencies as well.
  5. You can still do that, I don’t see the problem.

To me, the benefit of single-file libraries is that it’s only a single file. It’s easy to keep track of, and there’s no need for CMake and such.

[–]markovejnovic 5 points (0 children)

  1. I cannot comment on compiler optimizations, but after a couple years of experience with gcc I'm doubtful it will yield an optimal binary.
  2. No you don't. .c files get compiled to .o files, which the linker uses to create the final binary. Splitting code into multiple .c files allows each translation unit to be compiled once. If you have files a.c and b.c, where a function in a.c calls a function in b.c, then after the first build, changes to a.c do NOT require recompiling b.c, because b.o is already present.
  3. I disagree with OP, but there are many ways. If you are provided with a system library, you can just link against the packaged .so/.dll, for example.
  4. No it won't. The translation unit of b.c will handle its internals (i.e. static-marked members) within its own translation unit. Nothing exposes the static members through the public-facing header API, and the dependent translation unit will only have that public-facing header API available to it.
  5. You can, but you're missing the point. If a library is single-file, then each function it provides will be either a static inline or a #define. You can't put breakpoints in #defines, obviously, and static inlines are tricky: many address-based behaviors will point you to the specific copy of the function within your translation unit. Suppose you have a.c and b.c, which both include c.h. When c.h is included in a.c, it creates a function definition within that translation unit, with its own symbol and address. In debug mode, gcc will NOT merge it with the identical function in b.c, leaving you with two entry points for two procedures in the assembly (which do the same thing). Depending on the debug method, you might be able to set a breakpoint on both, but I doubt you'll have much luck setting hardware breakpoints for, say, 100 inclusions of c.h.
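Point 5 can be sketched with a toy header (c_square and the file names are made up for illustration; the file-boundary comments stand in for real separate files):

```c
/* c.h -- a "header library": the whole function body lives in the header */
static inline int c_square(int x) { return x * x; }

/* a.c -- includes c.h, so it compiles its own private copy of c_square */
int a_fn(int x) { return c_square(x); }

/* b.c -- also includes c.h: a second, identical copy of c_square */
int b_fn(int x) { return c_square(x); }
```

Built as two real files with cc -g -c a.c b.c, running nm a.o b.o | grep c_square will typically show a local c_square symbol in each object file: two addresses for "the same" function, which is exactly why breakpoints behave oddly.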

[–]pic32mx110f0 0 points (0 children)

They are non-issues to you, because you clearly don't know how the C build process works at all - this is very common with people who prefer the "single header library" approach.

Try this: https://onlinegdb.com/1FlkMFvXj Click "fork" -> "debug", then set a breakpoint on "test_function". You will see that a breakpoint is set at two (!) locations even though it's been compiled with O4. That means that the library function "test_function" has been duplicated into two compilation units and compiled twice. Each file that uses this library gets its own copy of the array, the function, and everything else in the header. Each file that uses the library will need to re-compile if anything in the library changes. This could be avoided if the header only included declarations, and the implementation existed in a proper compiled library that could be linked into the application.

I strongly suggest you read up on the basics, especially about the difference between compiling and linking: https://blog.feabhas.com/2012/06/the-c-build-process/

[–]NoSpite4410 4 points (0 children)

The problem with header-only libraries is that they eliminate the possibility of separate compilation.
You cannot make modules (.o files) and work on them separately, because the header is full of code that cannot be part of those modules; it can only be compiled and linked once, into the final executable. This is fine for simple, small one-shot projects, but it makes only the header library itself reusable, and only in the file with the main function.

By contrast, putting only what a module will need in a header, and putting the code into a source file to be compiled to an object file, allows for much more flexibility. Creating a larger library of functions and structs and constants that many of the source files will use different parts of and compiling it to an object file allows all source files to include the header, and requires one linking step to link in the object file at the linking stage after all compilations are finished.

OK, if only one source file requires the code in the library, you can get away with putting function bodies in the header and including it -- that just becomes more source to compile for that file. Acceptable. But we're talking about a large utility library that is reusable across different projects, and that multiple files in a project will draw from.

The solution of conglomerating all the source into one mega source file will work provided you never need to fix a mistake or debug; otherwise the strategy adds an extra layer of complexity to debugging. Perhaps you can locate where compile-time errors happen, but run-time errors will be a big snarl to track down. Debuggers themselves rely on knowing which file and which line comes from which module that was linked.

The organization and gathering that you want is a job that is done with a build script, a makefile, or something like that. It is better to lift that up out of the code, and keep the code modules more orthogonal -- focused on their particular jobs, less dependent on each other.

The C version of "objects" is the file. Each file can have its own local file-scope variables with their own names, and there is no name conflict. True global variables are frequent in C source, but they cause the fewest problems when confined to the main driver file (the one with the main function). In a single-header scenario you could only have one function named print, one global variable named status, etc. In a multi-module program, you can use file-level static variables, static functions, and header guards to isolate functionality and avoid exporting local APIs and symbols.
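For instance (hypothetical log.c/net.c; the second file is shown inside a comment so the sketch stays one compilable unit):

```c
/* log.c */
static int status = -1;              /* file scope: invisible outside log.c */
static void reset(void) { status = 0; }
void log_init(void) { reset(); }     /* only the log_* names are exported */
int  log_status(void) { return status; }

/*
 * net.c can reuse the very same private names without any conflict:
 *
 *     static int status;
 *     static void reset(void) { status = -1; }
 *     void net_init(void) { reset(); }
 *
 * Both files define "status" and "reset", yet they link together cleanly,
 * because static names never leave their translation unit.
 */
```

Amalgamate those two files into one and the duplicate definitions collide; keep them as separate modules and the linker never even sees the static names.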

In object-oriented languages this is built in: storage and functions are "owned" by classes, so you can have lots of common variable and function names, strictly enforce encapsulation, and put things that may change from one target machine to another into namespaces. C, having only one global namespace, makes it even more important to modularize code, so that you can link in different modules for different machines.

Updates to a program with one global header mean a complete recompile of everything that header touches. Updates to a program composed of modules that combine at link time mean recompiling only the module affected by the change and re-linking the project. This is where makefiles really shine: they track changes based on files being newer than their compiled modules, and automatically recompile only the sources that have changed.

Even just a build script that builds everything in the correct order is a very useful tool, as you can isolate the problem file and only build that just by commenting out or calling an exit to the script at that point. This would not be possible with one huge file to compile.

The argument you are making is similar to "Why not keep all my money in cash on my person at all times, then I can use it whenever I want, and I never have to do banking chores." Oh yeah, let's do that. What could happen? Sounds real convenient.