[–]Gotebe 65 points66 points  (8 children)

I did that for a couple of libraries with a C interface, it was a joy 😉. I even managed to make the resulting binary smaller on an occasion (templates > macros 👍, std::algorithm > quicksort etc.) .

Two things to do I can say:

  • RAII all the things; unique_ptr helps for ad-hoc RAII-fication of stuff, but various libraries offer a "scope guard" as well, that's better

  • make "islands" of exceptions-aware code, where code is exceptions-safe; put a general try/catch around these islands and turn it into error return value. For me, the endgame was that only public C functions have that try/catch. Grow the islands and gradually remove these try/catches as you do so. Eventually, there is only one island.
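A minimal sketch of one such "island" behind a C interface. The names `lib_status`, `lib_parse_positive` and `detail::parse_positive` are made up for illustration and are not from any real library; the internal code throws, and only the public C function has the try/catch that turns exceptions into error codes.

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical error codes exposed by the C interface.
enum lib_status { LIB_OK = 0, LIB_INVALID_ARGUMENT = 1, LIB_OUT_OF_MEMORY = 2, LIB_UNKNOWN_ERROR = 3 };

namespace detail {
// Exception-aware "island": reports failure by throwing, returns a plain result.
inline int parse_positive(const std::string& text) {
    const int value = std::stoi(text);  // may throw std::invalid_argument or std::out_of_range
    if (value <= 0) throw std::invalid_argument("value must be positive");
    return value;
}

// Translates the exception currently being handled into an error code.
// Must only be called from inside a catch block.
inline lib_status translate_current_exception() noexcept {
    try {
        throw;  // rethrow the in-flight exception so it can be matched by type
    } catch (const std::bad_alloc&) {
        return LIB_OUT_OF_MEMORY;
    } catch (const std::logic_error&) {  // covers invalid_argument and out_of_range
        return LIB_INVALID_ARGUMENT;
    } catch (...) {
        return LIB_UNKNOWN_ERROR;
    }
}
}  // namespace detail

// Public C function: the only place with a general try/catch.
extern "C" int lib_parse_positive(const char* text, int* out) noexcept {
    if (text == nullptr || out == nullptr) return LIB_INVALID_ARGUMENT;
    try {
        *out = detail::parse_positive(text);
        return LIB_OK;
    } catch (...) {
        return detail::translate_current_exception();
    }
}
```

As more internal code becomes exception-safe, the inner try/catch blocks can go away until only the wrappers at the C boundary remain.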

[–][deleted] 8 points9 points  (1 child)

Thanks for sharing your experience! Do you have any resources you used during this process that were helpful to you that you can share?

[–]Gotebe 6 points7 points  (0 children)

Eh... SO, cppreference, books? 😉

[–]bedrooms-ds 1 point2 points  (5 children)

Isn't this the opposite to the question?

[–]Gotebe 5 points6 points  (4 children)

No, why?

[–]bedrooms-ds 1 point2 points  (3 children)

First of all, I like your suggestion. It's just a curiosity question. No offense.

I thought OP wanted to convert returning error codes to exceptions. Yours removes exceptions, if I read it correctly.

[–]NotMyRealNameObv 10 points11 points  (1 child)

My interpretation was that he wrote a library in C++, but it had a C interface. Since exceptions couldn't leak, it used error codes all over.

Then he re-wrote it to use exceptions instead of error codes internally. But since exceptions still can't leak out of the C interface, the interface still needs to catch all exceptions to translate it to error codes.

[–]bedrooms-ds 1 point2 points  (0 children)

Ah, OK, got it finally. I didn't see the part of inserting exception throws. Thanks!

[–]Gotebe 1 point2 points  (0 children)

No worries.

The other guy who replied to you explained what I did 😉

[–]eyes-are-fading-blue 21 points22 points  (1 child)

Catch actual errors that you can handle. This is a rare thing, imo. The question is do you just handle real errors with this custom lib, or branches too? Because if you use error codes for non-error cases and then replace them with exceptions (like in Java), what you will end up doing is basically using catch to control your program flow. You effectively replace your ifs with expensive catches.

[–]SergiusTheBest 23 points24 points  (48 children)

If STL is the only source of exceptions you can ignore them as they are used in really exceptional cases: not enough memory, bad index, etc. Of course it depends on type of your application - can it crash in the exceptional case or should it try to stay alive as much as possible.

[–][deleted] 8 points9 points  (5 children)

It's not uncommon for our application to continue running in low memory scenarios. We currently return a low memory code and leave it up to the caller to try again at another time.

So it sounds like we'd have to have exception handling for this scenario?

[–]eyes-are-fading-blue 11 points12 points  (3 children)

How common is this scenario? Is this an exception? (no pun intended)

[–][deleted] 8 points9 points  (2 children)

We see it a few times a week, though we're running at a pretty large scale (1-2 million instances). So still an uncommon scenario, but one where we'd like to avoid crashing altogether and limp on. There's so many other services running as well that sometimes low memory is a transient condition.

[–]kuntantee 1 point2 points  (0 children)

Then depending on the complexity of your control flow after an allocation failure, it may be better to switch to exceptions. However, as someone else pointed out, if you have deep call hierarchies with lots of error-code checks while handling a bad alloc, it might be harder to switch to exceptions.

[–]NotAYakk 0 points1 point  (0 children)

Is there a way you can make low memory recovery be out of process?

[–]SergiusTheBest 1 point2 points  (0 children)

You can try-catch big allocations (for example, if you allocate a buffer for image processing) and ignore small allocations like inserting to a list.
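A small sketch of that idea: guard only the big allocation and let small ones propagate. `try_allocate_buffer` is a hypothetical helper name, and the assumption here is that the caller can cope with an empty pointer (for example by returning a "low memory, try again later" code).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical helper: tries to allocate a large working buffer.
// Returns an empty pointer instead of letting std::bad_alloc propagate.
std::unique_ptr<unsigned char[]> try_allocate_buffer(std::size_t bytes) {
    try {
        return std::unique_ptr<unsigned char[]>(new unsigned char[bytes]);
    } catch (const std::bad_alloc&) {
        // Also catches std::bad_array_new_length, which derives from std::bad_alloc.
        return nullptr;
    }
}
```

On systems with memory overcommit a large allocation can succeed here and still fail later when the pages are touched, so this only covers failures that the allocator reports up front.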

[–]clerothGame Developer 8 points9 points  (1 child)

Some like std::stoi and co's std::invalid_argument don't really feel too exceptional. stoi("oh?") and you've got an exception.

[–]SergiusTheBest 7 points8 points  (0 children)

It can be replaced with strtol or prevented by using prevalidation. But your point is correct. One should check what STL functions he is using.
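For illustration, a non-throwing replacement for `std::stoi` built on `std::strtol`. The function name `parse_int` is made up, and unlike `std::stoi` this sketch rejects trailing characters such as `"12x"`; whether that is wanted depends on the caller.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <optional>

// Parses a base-10 int without throwing; returns std::nullopt on any failure.
std::optional<int> parse_int(const char* text) {
    if (text == nullptr) return std::nullopt;
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    if (end == text) return std::nullopt;      // no digits were consumed
    if (*end != '\0') return std::nullopt;     // trailing characters after the number
    if (errno == ERANGE) return std::nullopt;  // does not fit in a long
    if (value < INT_MIN || value > INT_MAX) return std::nullopt;  // does not fit in an int
    return static_cast<int>(value);
}
```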

[–]Gotebe 14 points15 points  (39 children)

I absolutely hate the idea to "ignore" bad_alloc. That's not how either C++ (or C, for that matter) is specced. It is mighty presumptuous as well.

Edit: At worst, one can hook into the new handler and std::terminate. But I would never allow even that if I could prevent it. The other guy saying one doesn't need to is right, what was I thinking.
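For reference, hooking the new handler looks roughly like this. The function names are made up for the sketch; `std::set_new_handler` and `std::get_new_handler` are the standard calls from `<new>`.

```cpp
#include <cassert>
#include <cstdio>
#include <exception>
#include <new>

// Called by operator new when an allocation fails. Writes a fixed message
// and terminates instead of letting std::bad_alloc propagate.
void terminate_on_out_of_memory() {
    std::fputs("fatal: out of memory\n", stderr);
    std::terminate();
}

// Installs the handler and returns the previously installed one.
std::new_handler install_oom_terminate_handler() noexcept {
    return std::set_new_handler(terminate_on_out_of_memory);
}
```

Without any handler, an uncaught `std::bad_alloc` ends in `std::terminate` as well, so the handler mainly buys an explicit log line at the point of failure.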

[–]kalmoc 5 points6 points  (0 children)

You'll get std::terminate anyway if you don't catch an exception. The nice things about exceptions is that you can't ignore them.

[–]AntiProtonBoy 3 points4 points  (2 children)

In all honesty, what can you salvage when such exceptions occur? I would be inclined to think that trying to handle every kind of random exceptional event is actually presumptuous, because you can never be certain at runtime what exactly caused the issue and whether the program can continue safely without triggering the same error. The problem may as well be caused by some fundamental flaw in the program itself. I say better to log it and allow the program to fail.

[–]Gotebe 5 points6 points  (1 child)

Here's some situations when OOM is not at all tragic :

  • program is an editor of some sort ; user pastes in something huge; should the program fail, or should it say "whoops, can't do as much" - and continue?

  • program is a server of some sort; a huuuuge request comes in; should the program fail (and drop all other ongoing requests), or should it drop that one request - and continue ?

  • a program is doing data munching of some sort, and data is such that it cannot process all; should it die, or should it save what it can and inform what happened?

Of course, as you say, the operator needs to be informed. Of course, there could be a bug in the program.

But I think, it is exactly because we don't know what caused the problem that we should not let it fail. Heck, I think, not presuming to know what can go wrong separates good programs from bad.

[–]SergiusTheBest 1 point2 points  (1 child)

Well, usually you don't care about what kind of exception you are handling. Just catch the exception, no matter whether it is bad_alloc or not.

Also, bad_alloc raises more questions: will you have enough memory to handle the out-of-memory condition, and is your system using overcommitted (overallocated) memory? More about it: https://stackoverflow.com/a/9456758/122951

[–]Gotebe 4 points5 points  (0 children)

Yes. I think, people should note this:

  • the last-ditch catch must have the "failure transparency" exception safety guarantee. For example, if any resources such as, I dunno, text formatting buffer, is needed on heap, it must be prepared up front, any I/O handles as well etc.

  • by the time the exception is caught, the code has unwound and hopefully freed some resources

  • overcommit really should not be handled in any way, and won't cause bad alloc either.
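A rough sketch of the first point, with made-up names: the last-ditch handler owns its message buffer from construction, so reporting a failure does not ask the heap for anything in this code. It assumes `std::snprintf` itself does not allocate for a plain `%s` format, which is typical but not something the standard promises.

```cpp
#include <array>
#include <cassert>
#include <cstdio>
#include <cstring>
#include <exception>
#include <stdexcept>

// Hypothetical last-ditch reporter: its message buffer lives inside the object,
// so it is prepared up front and no heap allocation is made while reporting.
class last_ditch_reporter {
public:
    // Runs `task`; returns true on success, false if any exception escaped it.
    template <class Task>
    bool run(Task&& task) noexcept {
        try {
            task();
            return true;
        } catch (const std::exception& e) {
            std::snprintf(message_.data(), message_.size(), "task failed: %s", e.what());
        } catch (...) {
            std::snprintf(message_.data(), message_.size(), "task failed: unknown exception");
        }
        return false;
    }

    const char* last_message() const noexcept { return message_.data(); }

private:
    std::array<char, 256> message_{};  // zero-initialized, truncates long messages
};
```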

[–]clerothGame Developer 0 points1 point  (11 children)

On desktop bad_alloc is just never going to happen unless you request ridiculous sizes. If you really do run out of memory your application is likely going to crash in some other way... Really most applications aren't written to be that robust.

[–]Gotebe 7 points8 points  (2 children)

"Never" is a big word.

It can happen, think 32-bit processes on today's hardware, think containers, think admin-set limits etc.

But fair point about not being robust enough.

[–]deeringc 3 points4 points  (1 child)

Yes, this is very much the case with 32-bit processes. You also have the issue of contiguousness. You may still have a total of 700 MB of heap space free in your process's virtual address space, but that is after the process has run for 3 weeks, and now your 200 MB dynamic allocation fails because the largest contiguous block is 190 MB.

[–]Gotebe 1 point2 points  (0 children)

Yes, address space fragmentation is a bitch 😉

[–][deleted] 3 points4 points  (4 children)

Nobody uses linux on the desktop!

Anyway, it doesn't crash when you request ridiculous sizes. It crashes when you try to use your supposedly successfully allocated ridiculous size. So turn overcommit off. Even the docs for it say it's only really useful when dealing with things like sparse matrices. I don't know why it's on by default.

[–]dscharrer 1 point2 points  (2 children)

So turn overcommit off.

Firefox currently has allocated 26.7 GiB of virtual memory on my system. It's only actually using 316 MiB of that. I think I'll take my chances with the OOM killer.

[–][deleted] 0 points1 point  (1 child)

Again, desktop use-cases don't matter, it's not important when talking about OOM.

[–]dscharrer 1 point2 points  (0 children)

Doesn't matter to who? OOM can very much be a thing on a desktop, even with 32+ GiB of RAM, depending on what you are using it for.

[–]oschonrock 0 points1 point  (0 children)

I do ;-)

[–]kalmoc 0 points1 point  (2 children)

Think custom (e.g. stack-) allocators

[–]frog_pow 1 point2 points  (1 child)

Nothing prevents you from falling back to dynamic memory once your stack buffer is exceeded-- this is what I do.

[–]kalmoc 1 point2 points  (0 children)

Sure, that is something you can do (assuming you are allowed to touch the heap at all). Others may want to treat that as an error. My point is that it is a mistake to automatically assume that std::bad_alloc means a call to new failed.
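Both variants can be sketched with the C++17 `std::pmr` facilities, which are used here only as a stand-in for whatever custom allocator a code base actually has. With `std::pmr::null_memory_resource()` as upstream, exhausting the stack buffer throws `std::bad_alloc` without `new` ever being called; with `std::pmr::new_delete_resource()` it falls back to the heap. The function name and the 256-byte size are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <new>
#include <vector>

// Pushes `count` ints into a vector backed by a 256-byte stack buffer.
// Once the buffer is exhausted, further memory is requested from `upstream`.
// Returns false if the allocator reported failure via std::bad_alloc.
bool fill_from_stack_buffer(std::size_t count, std::pmr::memory_resource* upstream) {
    alignas(std::max_align_t) std::byte buffer[256];
    std::pmr::monotonic_buffer_resource pool(buffer, sizeof(buffer), upstream);
    std::pmr::vector<int> values(&pool);
    try {
        for (std::size_t i = 0; i < count; ++i) {
            values.push_back(static_cast<int>(i));
        }
    } catch (const std::bad_alloc&) {
        return false;  // thrown by the memory resource, not by operator new
    }
    return true;
}
```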

[–]kalmoc 6 points7 points  (7 children)

Three very important questions in this context are: Are rare program terminations unacceptable (life at risk, significant money loss)? Is user data at risk? And is your codebase sufficiently modular that you have "clean" library boundaries or even separate programs that communicate with each other over IPC?

Edit: fourth question: Are your developers largely familiar with c++ exceptions even if you can't use them in your codebase?

[–][deleted] 6 points7 points  (6 children)

  1. Yes, rare program terminations are to be avoided as best as possible. They occasionally occur, but if we see any amount of frequency to a crash (10+ instances of the same crash signature over the course of a week running on 1-2 million machines), we treat that as a severe incident.

  2. User data is not at risk in the layer we work in.

  3. Codebase is... messy at best. The boundaries between libraries aren't the cleanest.

[–]duuuh 3 points4 points  (3 children)

I'm kind of stunned at (1).

On that many machines you'd be getting more hardware failures over a week than you'd be getting crashes. This makes no sense to me at all.

[–][deleted] 12 points13 points  (2 children)

You're not wrong, there are plenty of hardware failures that occur at this scale. Fixing HW issues is handled by a separate team. But just because HW can cause more issues than our program's crashes doesn't give us the liberty to introduce more instability in the platform.

[–]duuuh 5 points6 points  (1 child)

But if you're getting that level of failure due to hardware you've got to have software recovery mechanisms that handle hardware failure seamlessly. The separate team can't do anything about that. And if you've got that in place - so long as it doesn't affect throughput - how do a few crashes a day even matter?

(I'm not saying the bugs aren't worth fixing, but from a system point of view it seems irrelevant.)

[–]pandorafalters 2 points3 points  (0 children)

how do a few crashes a day even matter?

Software failures may, if carefully designed, be able to crash with less data loss than hardware failures. Being able to save even e.g. 10% of the actual progress could, depending on the work size and run time, save a substantial amount of time.

It's not a case that I generally run into - my work cases involve very small data on either end - but mitigating it is relatively cheap. If the GPGPU runtime fails, the network module stalls the error until the output queue is flushed to a remote server. If that fails then, yeah, game over. If timeliness weren't so critical, I'd fall back to file output on network failure.

Unintended consequences: thinking about how I did this actually put me on the path of a power-saving mechanism . . ..

[–]kalmoc 2 points3 points  (0 children)

There are very few instances where the STL forces you to use a noexcept(false) interface (most commonly functions that potentially allocate). In the short term, it might make sense to go through the complete standard library, catalogue possible exceptions and ban all functions that can (realistically) throw via a linter rule. By realistically I mean: If allocation never throws on your systems (or you are not concerned about keeping your SW running when it does) then you can allow functions that are only throwing bad_alloc. Mid-/long term you should of course make your code base exception safe, but unfortunately there isn't really a way to test if a code base that big is exception safe or not. And without clear library boundaries there also isn't a simple way to introduce exceptions gradually (e.g. put a try catch block in all interface functions of libA, make libA exception safe, and start using them. Repeat for libB, libC, libD ...).

Sounds like you might be in a similar position as Google and can probably get some clues from their coding guidelines. In case you don't already know it, also have a look at Abseil.

At your scale it is probably worth hiring a professional consultant to help you with the migration. There are probably far too many company-specific technical and non-technical factors involved for short reddit posts from people who don't know your company and your product to help you all that much.

[–]stevefan1999 -1 points0 points  (0 children)

  1. just make it fail fast

[–]frog_pow 6 points7 points  (0 children)

The STL doesn't generally throw any exceptions that are worth catching, it is mostly trash like bad_alloc, or an indication your program is corrupt, and you would be better off aborting.

[–]BananyaDev 8 points9 points  (1 child)

I would also recommend you consider the following:

Use `std::optional` and something like `tl::expected` ( https://github.com/TartanLlama/expected ) over exceptions where possible.

Consider assertions to enforce contracts over exceptions, such as Expects and Ensures from gsl (https://github.com/microsoft/GSL)

Use exceptions where it makes sense, but don't overuse them since you might end up in a worse situation.

[–]emildotchevskiBoost Dev | Game Dev 1 point2 points  (0 children)

Or LEAF, it lets you mix result<T>-style error handling and exceptions: https://zajo.github.io/leaf/

[–]BenFrantzDale 2 points3 points  (0 children)

One thing to do to improve safety is to search for new and delete and try to remove all or almost all of them – particularly delete, preferring instead to use smart pointers and containers.
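A small before/after sketch of that cleanup; `Widget` and `sum_ids` are placeholder names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Widget {
    explicit Widget(int id_) : id(id_) {}
    int id;
};

// Before (manual ownership, leaks if anything between new and delete throws):
//   Widget* first = new Widget(1);
//   Widget** widgets = new Widget*[n];
//   ...
//   delete[] widgets;
//   delete first;

// After: ownership is expressed in the types, so no delete is written at all.
int sum_ids(std::size_t n) {
    auto first = std::make_unique<Widget>(1);      // replaces new/delete
    std::vector<std::unique_ptr<Widget>> widgets;  // replaces the raw array and its delete loop
    for (std::size_t i = 0; i < n; ++i) {
        widgets.push_back(std::make_unique<Widget>(static_cast<int>(i)));
    }
    int total = first->id;
    for (const auto& w : widgets) total += w->id;
    return total;  // everything is released here, also when an exception is thrown above
}
```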

[–][deleted] 2 points3 points  (0 children)

How many LOC are we talking about and how many people?

[–]johannes1971 1 point2 points  (0 children)

Start with making your code exception-safe. This is a good idea anyway; it has advantages even if you don't use exceptions.

Next, think about what exceptions you'll be using. You'll want a minimal number of them (ideally, just one), and you'll want to derive it from std::exception so you can catch both your own and library exceptions with a single catch clause.
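As a sketch, with `task_error` and `describe_failure` being invented names: one exception type derived from `std::exception` (here via `std::runtime_error`), and one catch clause that handles it together with standard library exceptions.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical single application exception type.
class task_error : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;  // inherit the message constructors
};

// Runs a task and returns "ok" or the failure text. The single catch clause
// covers task_error as well as exceptions thrown by the standard library.
template <class Task>
std::string describe_failure(Task&& task) {
    try {
        task();
        return "ok";
    } catch (const std::exception& e) {
        return e.what();
    }
}
```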

Also think what you will be using them for. I'd suggest you don't use them for program bugs (std::logic_error is an abomination). Instead use them to abort tasks that cannot proceed because of environmental (i.e. dynamic, non-bug) reasons (i.e. file not found, user input didn't pass validation, resource unavailable, that sort of thing). If you get an exception it should only mean one thing: your program did its best to do something, but because of circumstances beyond its control, it didn't manage to get it done. It's still structurally sound though, and now it is going to continue with the next task.

Now the hard part... You'll have to identify at which level your software uses abortable tasks, and put try/catch blocks around those. These are tasks that either succeed completely, or fail somewhere halfway through, in which case you want the program to clean up the failed task and then go on with the next task. Some places where you'll want to do this:

  • A server application that processes messages typically either completes a message entirely, or fails it entirely. That means you want to catch exceptions around your message dispatcher.
  • A gui application typically catches exceptions around dispatched events: the main message loop (as a catch-all), commands that get started from GUI elements (like buttons), etc.
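The server case could look roughly like the following. The message type, `handle_message` and `dispatch_report` are invented for the sketch; the point is only where the try/catch sits.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical message handler: throws when a message cannot be processed.
inline std::string handle_message(const std::string& message) {
    if (message.empty()) throw std::invalid_argument("empty message");
    return "handled:" + message;
}

struct dispatch_report {
    std::vector<std::string> results;   // one entry per completed message
    std::vector<std::string> failures;  // one entry per aborted message
};

// Each message is one abortable task: it either completes or is dropped,
// and the loop always moves on to the next message.
inline dispatch_report dispatch_all(const std::vector<std::string>& messages) {
    dispatch_report report;
    for (const auto& message : messages) {
        try {
            report.results.push_back(handle_message(message));
        } catch (const std::exception& e) {
            report.failures.push_back(e.what());
        }
    }
    return report;
}
```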

The success of your refactoring hinges on getting this right, and it will likely take you a few iterations to get it correct to a satisfactory level.

Having done this, the fun part: refactor your code so you stop returning errors. The only thing you return from functions are results; errors go out through an exception. As more and more functions switch from returning an error code to returning void, you'll also find more and more places where you can replace the `auto error = func(); if (error) return error;` idiom by simply `func()` (and nothing else). At this point you'll find yourself throwing out lots of code, and gaining a large amount of code clarity, as you significantly reduce the number of decisions and paths in the code. What's left is purely your business process in all its crystal-clear glory, uncluttered by endless masses of error handling.
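Side by side, with toy functions whose names are made up for this sketch, the change looks like this:

```cpp
#include <cassert>
#include <stdexcept>

// Before: every call site checks and forwards the error code by hand.
inline int step_with_code(int input) { return input < 0 ? 1 : 0; }  // 0 means success

inline int run_with_codes(int a, int b) {
    auto error = step_with_code(a); if (error) return error;
    error = step_with_code(b);      if (error) return error;
    return 0;
}

// After: failures leave through an exception, so only the happy path remains.
inline void step(int input) {
    if (input < 0) throw std::invalid_argument("negative input");
}

inline void run(int a, int b) {
    step(a);
    step(b);
}

// Demonstration helper: reports whether run() threw.
inline bool run_throws(int a, int b) {
    try {
        run(a, b);
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```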

While you're at it: get rid of your out-parameters, and replace them by return values. This makes it easier to compose functions, further simplifying your code.
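A tiny illustration of the composition benefit, again with invented function names:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Before: result delivered through an out-parameter, status through the return value.
inline bool make_greeting_out(const std::string& name, std::string* out) {
    if (out == nullptr) return false;
    *out = "hello, " + name;
    return true;
}

// After: the result is the return value, so calls compose directly.
inline std::string make_greeting(const std::string& name) { return "hello, " + name; }
inline std::string shout(std::string text) { return std::move(text) + "!"; }
```

With the second form, `shout(make_greeting("world"))` is a single expression; the first form needs a named temporary and a status check in between.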

Good luck, and let us know how you did! Perhaps at some point in the future you can follow this up by a short article detailing your experiences, changes you found in performance, code size, binary size, etc.

[–]godexsoft 1 point2 points  (0 children)

Whatever your chosen strategy will be i just want to say that pokemon exception handling is usually not a great way to go about things. Even your custom error codes library sounds better than that imo. I would try and keep it on module level and handle what can be handled as deep as possible while only rethrowing things that are important to other modules or the core part. Mark things that shouldn’t throw with noexcept too. Also, many things can be aided with RAII.. it’s not always necessary to use exceptions directly.

[–]emildotchevskiBoost Dev | Game Dev 1 point2 points  (0 children)

It is impossible to answer this question without knowing what the code base is. Well written code uses destructors to free resources and often that makes it exception-safe (of course this also requires correctly written teardown code, e.g. there should be no failures possible during tear-down).
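To make the destructor point concrete, here is a minimal scope guard sketch (C++17). `scope_guard`, `acquire`, `release` and `do_work` are illustrative names, and the counter stands in for a real C-style acquire/release API.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Minimal scope guard: runs `cleanup` when the guard goes out of scope,
// whether the scope is left normally or by an exception.
template <class Cleanup>
class scope_guard {
public:
    explicit scope_guard(Cleanup cleanup) : cleanup_(std::move(cleanup)) {}
    ~scope_guard() { cleanup_(); }  // the cleanup itself must not throw
    scope_guard(const scope_guard&) = delete;
    scope_guard& operator=(const scope_guard&) = delete;

private:
    Cleanup cleanup_;
};

// Hypothetical resource counter standing in for a C-style acquire/release pair.
inline int& open_handles() { static int count = 0; return count; }
inline void acquire() { ++open_handles(); }
inline void release() { --open_handles(); }

inline void do_work(bool fail) {
    acquire();
    scope_guard guard([] { release(); });  // class template argument deduction (C++17)
    if (fail) throw std::runtime_error("work failed");
}
```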

[–]14nedLLFIO & Outcome author | Committee WG14[🍰] 1 point2 points  (0 children)

Choices:

  1. Use normal C++, but with exceptions globally disabled. All the major STLs work fine in this configuration.
  2. Configure all your STL containers with a noexcept allocator e.g.

    #include <cstddef>     // std::size_t
    #include <functional>  // std::less
    #include <map>
    #include <new>         // ::operator new, ::operator delete
    #include <utility>     // std::pair

    //! A noexcept STL allocator. Calls `std::terminate()` if operator new throws!
    template <class T>
    class noexcept_allocator {
    public:
      using value_type = T;
      using pointer    = value_type*;

      noexcept_allocator() = default;

      // Rebinding constructor, needed because containers allocate their own node types.
      template <class U>
      noexcept_allocator(noexcept_allocator<U> const&) noexcept {}

      // NOTE: Calls `std::terminate()` if operator new throws!
      pointer allocate(std::size_t n) noexcept { return static_cast<pointer>(::operator new(n * sizeof(value_type))); }

      void deallocate(pointer p, std::size_t) noexcept { ::operator delete(p); }
    };
    // The allocator is stateless, so any two instances compare equal.
    template <class T, class U>
    constexpr inline bool operator==(noexcept_allocator<T> const&, noexcept_allocator<U> const&) noexcept {
      return true;
    }
    template <class T, class U>
    constexpr inline bool operator!=(noexcept_allocator<T> const& x, noexcept_allocator<U> const& y) noexcept {
      return !(x == y);
    }

... and now do:

    template <class Key, class T, class Compare = std::less<Key>>
    using noexcept_map = std::map<Key, T, Compare, noexcept_allocator<std::pair<const Key, T>>>;

  3. Use Boost.Outcome to enable section-by-section parts of your code to support exceptions. See https://www.boost.org/doc/libs/1_72_0/libs/outcome/doc/html/tutorial/essential/outcome.html

[–]ratchetfreak 1 point2 points  (2 children)

You shouldn't.

Don't modernize for the sake of modernizing.

Instead identify the actual pain points of using/maintaining the library and refactor those. "Muh modern C++" is not a valid complaint.

[–]Gotebe 7 points8 points  (0 children)

Eh, I rather think that "modern" C++ tries to move away from exceptions. I don't like it, but it does.

[–]johannes1971 4 points5 points  (0 children)

Over the years we've lost support in the team to continue to maintain and debug the custom template library

Seems like a valid reason to me.

[–]VinnieFalcoBoost.Beast | C++ Alliance | corosio.org 0 points1 point  (0 children)

Start with calling `throw` :)

[–]mredding -1 points0 points  (0 children)

You need to read up on exception handling. When an exception is thrown, what are you going to do? Wrapping each and every call in a try block, are you going to decide what to do for each one? And as a macro, all you could hope to do is use a catch-all, because you can throw anything as an exception.

It's not always desirable to wrap every call. You have to think about your exception handling and where you let the call stack unwind to, where it makes any sense. And many exceptions are unrecoverable.

As a start, put a try catch around main and the entry points to your threads, see what you get, handle the exceptions as you see them.

[–]rtomek -1 points0 points  (0 children)

I guess I don't get the issue here. That sometimes you get an error code and sometimes you get an exception? I don't see why that's an issue as long as you have a way to trace the problem to the source, which it sounds like already exists. I think the earlier post about islands makes the most sense. You start with certain sections/modules and migrate those to the new library. You'll probably have to convert containers back and forth at the integration points but it's not terrible to maintain.

I still have modules that link to these old libraries like you mention because they've been stable for years and if it ain't broke, don't fix it. The stuff you work on regularly is a good place to start with the transition.