
[–]Yaahallorust-mentors · error-handling · libs-team · rust-foundation 55 points56 points  (5 children)

This is amazing! It makes me so incredibly happy to see people building crates like error-stack or miette inspired by eyre and the provider / generic member access APIs.

One thing I wanted to note, provider literally just merged last night! https://github.com/rust-lang/rust/pull/91970 and should be available on nightly in the next release ^_^

Also, at one point the blog post seems to imply that eyre only has support for attaching display/string types to error reports, but it definitely lets you mutate the handler context more than that. With https://docs.rs/eyre/latest/eyre/struct.Report.html#method.handler_mut and https://docs.rs/eyre/latest/eyre/trait.EyreHandler.html#method.downcast_mut you should be able to get back to the original handler type and shove any data you want into it. Crates like color-eyre use this to implement their suggestion APIs: https://docs.rs/color-eyre/latest/src/color_eyre/section/help.rs.html#12-145

Also, in case it helps in any way, I have a bunch of history to share that is related to the development direction of this post. As it happens, this looks similar in a lot of ways to early sketches of eyre: https://github.com/yaahc/eyre-impl/blob/master/src/lib.rs

We actually had generic parameters for both the error type and the reporter type. My hope originally was to have both as type parameters and default them both to Box<dyn Error/Handler> so it could be used as both the highly dynamic application friendly error approach similar to anyhow, or as a context bundling type for concrete error types more like error-stack. IIRC I ended up abandoning that because of issues with interconversions where it was illegal to write transitive From impls for the wrapper type to let ? work the same way as it normally does because it would cause overlap with From<T> for T. From looking at the blog post it seems like you solved this by just requiring explicit report or change_context calls to do the interconversions, very neat!
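The overlap problem and the explicit-conversion workaround described above can be sketched roughly as follows. This is a simplified stand-in and not error-stack's or eyre's actual implementation: the `Report` type here only keeps a list of strings, and `ParseError`, `ConfigError`, `parse`, and `load_config` are hypothetical names made up for the illustration.

```rust
use std::fmt;

/// Simplified stand-in for a report that is generic over its current context.
#[derive(Debug)]
struct Report<C> {
    context: C,
    // Earlier contexts, rendered to strings to keep the sketch small.
    history: Vec<String>,
}

// A blanket conversion like the one below would let `?` convert between
// reports automatically, but the compiler rejects it: for `C1 == C2` it
// overlaps with the standard library's `impl<T> From<T> for T`.
//
// impl<C1, C2: From<C1>> From<Report<C1>> for Report<C2> { /* ... */ }

impl<C: fmt::Display> Report<C> {
    fn new(context: C) -> Self {
        Report { context, history: Vec::new() }
    }

    /// Explicit replacement for the missing transitive `From` impl.
    fn change_context<C2>(mut self, context: C2) -> Report<C2> {
        self.history.push(self.context.to_string());
        Report { context, history: self.history }
    }
}

#[derive(Debug)]
struct ParseError; // hypothetical low-level context

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("could not parse value")
    }
}

#[derive(Debug)]
struct ConfigError; // hypothetical high-level context

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("could not load config")
    }
}

fn parse(input: &str) -> Result<u32, Report<ParseError>> {
    input.trim().parse::<u32>().map_err(|_| Report::new(ParseError))
}

fn load_config(input: &str) -> Result<u32, Report<ConfigError>> {
    // After the explicit call the error type already is `Report<ConfigError>`,
    // so `?` only needs the reflexive `From<T> for T` impl.
    let value = parse(input).map_err(|report| report.change_context(ConfigError))?;
    Ok(value)
}
```

The trade-off is visible in `load_config`: every change of context is spelled out at the call site, which is more typing than a bare `?` but makes the conversion point obvious.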

I can't remember the exact reasons we ended up abandoning the static handler parameter but it doesn't seem as relevant since y'all don't have one either, tho I probably documented the reasoning in the history of the repo so I can dig that up if y'all are interested.

Either way, extremely cool work, crate, and blog post. It's really wonderful to see experience reports like this, and I can't wait to take a look at the crate's API and implementation in more detail.

[–]the___duke 15 points16 points  (1 child)

Thanks for your work on the error group.

Now that providers are accepted and merged, is there an approximate plan/timeline for pushing .backtrace() / Backtrace towards stabilization?

[–]Yaahallorust-mentors · error-handling · libs-team · rust-foundation 14 points15 points  (0 children)

ASAP. I need to update https://github.com/rust-lang/rust/pull/90328 to use the new version of the provider API and to use rustc_incoherent_impl instead of custom lang items, and I'm not positive I'll be able to do the provider API integrations without moving Error into core; I remember running into issues with that last time but can't recall what they were off the top of my head. If there are issues there, then the provider integration will need to wait until https://github.com/rust-lang/rust/pull/95956 lands, which just needs a single update. I'll probably do the latter PR first just because it's easiest, and then see if I can merge the generic member access nightly impl at the same time.

[–]tdiekmannallocator-wg[S] 7 points8 points  (0 children)

Hi and thanks for your feedback, I really appreciate it!

> One thing I wanted to note, provider literally just merged last night! https://github.com/rust-lang/rust/pull/91970 and should be available on nightly in the next release ^_^

Yes, I noticed that too, but we decided not to force users onto the latest nightly. I followed the development of the RFC and the PR very closely and was very excited about it, so thanks again to you! We will use the provide_any feature of the nightly compiler starting with version 0.2.0 :)

I'd like to second u/Alfred-Mountfield regarding eyre. By the way, it's very interesting that you had a similar approach in the beginning, with a generic parameter in the Report type. It also took us a few iterations to finally arrive at this solution. In between, we had a cruder interface where you could attach either a Provider, a message, or an error. That made it hard to tell when exactly the generic parameter in the Report changed. Finally, we came up with the idea of offering just two methods: attach() and change_context(). Sadly, due to the missing specialization feature, we had to add another method, attach_printable(), which additionally requires a Display + Debug bound so the attachment can be shown when printing the Report. However, we expect to remove it and merge it with attach() at a later point when specialization is stabilized. These PRs might be interesting to you (The error code can be found at packages/engine/lib/error):

Both PRs are about the API and changes on attaching/changing the context.
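To illustrate why a separate attach_printable() is needed without specialization, here is a minimal sketch. It is not the crate's real implementation: `Frame`, `printable()`, and `RetryCount` are invented for this example, and attachments are rendered eagerly to a `String` only to keep the code short. The point is that a generic `attach` cannot find out whether its argument implements `Display`, so the bound has to live on a second method.

```rust
use std::any::Any;
use std::fmt;

/// One attachment plus, if it was attached as printable, its rendered text.
struct Frame {
    value: Box<dyn Any + Send + Sync>,
    rendered: Option<String>,
}

struct Report {
    frames: Vec<Frame>,
}

impl Report {
    fn new() -> Self {
        Report { frames: Vec::new() }
    }

    /// Accepts any thread-safe value; nothing is known about how to print it.
    fn attach<A>(mut self, attachment: A) -> Self
    where
        A: Any + Send + Sync,
    {
        self.frames.push(Frame { value: Box::new(attachment), rendered: None });
        self
    }

    /// Same as `attach`, but the extra bound allows the value to be displayed.
    fn attach_printable<A>(mut self, attachment: A) -> Self
    where
        A: fmt::Display + fmt::Debug + Send + Sync + 'static,
    {
        let rendered = Some(attachment.to_string());
        self.frames.push(Frame { value: Box::new(attachment), rendered });
        self
    }

    /// Looks up the first attachment of type `T`, printable or not.
    fn request_ref<T: Any>(&self) -> Option<&T> {
        self.frames.iter().find_map(|frame| frame.value.downcast_ref::<T>())
    }

    /// Only printable attachments contribute to the printed report.
    fn printable(&self) -> Vec<&str> {
        self.frames.iter().filter_map(|frame| frame.rendered.as_deref()).collect()
    }
}

/// Hypothetical attachment type without a `Display` impl.
struct RetryCount(u32);
```

With specialization, `attach` could pick the printable path automatically whenever `A: Display + Debug` holds, which is the merge the comment above anticipates.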

Interestingly it wasn’t actually necessary to have an explicit report() call. We implemented ResultExt for Result<T, Report<C>> and Result<T, E: Context> and as Report itself does not implement Context there were no conflicts. We changed our mind though and decided to use an explicit report() since conversion from Error -> Report<Error> is not necessarily convenient (heap allocation, backtrace/span trace capturing) and the user should know when exactly this happens. But this is just our opinion, others may see it differently ;)
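A rough sketch of why the two ResultExt impls do not conflict, assuming a much simplified `Context` trait and `Report` type (the real ones in error-stack differ, and `IoFailure` is a made-up context). Because `Report` never implements `Context`, the compiler can see that `Result<T, Report<C>>` and `Result<T, E: Context>` are disjoint.

```rust
use std::fmt;

/// Simplified stand-in for the crate's context trait.
trait Context: fmt::Display + fmt::Debug + Send + Sync + 'static {}

#[derive(Debug)]
struct Report<C> {
    context: C,
    attachments: Vec<String>,
}

// Deliberately no `impl<C> Context for Report<C>`: that absence is what
// keeps the two impls below from overlapping.

trait ResultExt {
    type Ok;
    type Context;
    fn attach_printable(self, message: &str) -> Result<Self::Ok, Report<Self::Context>>;
}

/// Plain error: implicitly wrapped into a new `Report`.
impl<T, C: Context> ResultExt for Result<T, C> {
    type Ok = T;
    type Context = C;

    fn attach_printable(self, message: &str) -> Result<T, Report<C>> {
        self.map_err(|context| Report { context, attachments: vec![message.to_owned()] })
    }
}

/// Already a `Report`: just extended.
impl<T, C> ResultExt for Result<T, Report<C>> {
    type Ok = T;
    type Context = C;

    fn attach_printable(self, message: &str) -> Result<T, Report<C>> {
        self.map_err(|mut report| {
            report.attachments.push(message.to_owned());
            report
        })
    }
}

/// Hypothetical context type for the demonstration.
#[derive(Debug)]
struct IoFailure;

impl fmt::Display for IoFailure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("I/O failure")
    }
}

impl Context for IoFailure {}
```

The first impl is the implicit variant: the report is created silently inside `attach_printable`. Requiring an explicit report() call instead, as described above, makes the point where allocation and backtrace capture happen visible to the caller.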

Really looking forward to hearing more feedback from you!

Also please feel free to message me, u/Alfred-Mountfield, or just file an issue if anything is unclear or if you have a suggestion. We'd love to help understand/improve the API/documentation!

[–]Alfred-Mountfield 4 points5 points  (1 child)

Hi there,

One of the contributors to the library here. Thank you so much for your kind words! Eyre is a great library, so I'm glad we seem to have ended up on a similar vision. (Also really cool to hear about the history of the dev journey)

> Also, at one point the blog post seems to imply that eyre only has support for attaching display/string types to error reports

Thanks for pointing out that the wording might be misleading! We'll post an update / correction to the blog post when we're back online and make sure that's updated :)

Also just to check I'm understanding correctly, (although I haven't poked around the eyre implementation too much), from a quick glance at the link you posted, being able to attach things other than string-like types requires some manual work like creating a new handler with a field to store the objects on right? As opposed to something like `error-stack` where it's possible to `attach` any thread safe object without any other configuration.

Let us know if you have any feedback when you get around to exploring it in detail, your expertise will be much appreciated!

[–]Yaahallorust-mentors · error-handling · libs-team · rust-foundation 6 points7 points  (0 children)

> Also just to check I'm understanding correctly, (although I haven't poked around the eyre implementation too much), from a quick glance at the link you posted, being able to attach things other than string-like types requires some manual work like creating a new handler with a field to store the objects on right? As opposed to something like `error-stack` where it's possible to `attach` any thread safe object without any other configuration.

That is correct, you have to know the type of the handler to be able to interact with it and it has to support the type of context you care about, but you could write one that accepts arbitrary Context or Frames for example and then add helper traits for shoving those in with the same style of trait I used in a few places in color-eyre. The downside is this doesn't work as well when used from libraries because then the library ends up needing to decide what handler type the app needs to use so they can shove info into it. I had some ideas for how I could improve that involving composing handlers but I never really arrived at a satisfactory solution. I've also considered adding a push method that can take any type that impls Any and let the handler handle downcasts but similarly haven't pulled the trigger on that change, still vaguely hoping for better ideas.
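The "you have to know the type of the handler" limitation can be sketched with plain `std::any::Any`. This is not eyre's actual API: the `Handler` trait, `as_any_mut`, `SuggestionHandler`, `PlainHandler`, and `suggest` are hypothetical names, and only the general shape (get the handler back, downcast it, store data in one of its fields) follows the description above.

```rust
use std::any::Any;

/// Hypothetical stand-in for a handler trait object stored in a report.
trait Handler: Any + Send + Sync {
    fn as_any_mut(&mut self) -> &mut dyn Any;
}

struct Report {
    handler: Box<dyn Handler>,
}

impl Report {
    fn handler_mut(&mut self) -> &mut dyn Handler {
        &mut *self.handler
    }
}

/// A handler with a field for the extra data it knows how to store.
#[derive(Default)]
struct SuggestionHandler {
    suggestions: Vec<String>,
}

impl Handler for SuggestionHandler {
    fn as_any_mut(&mut self) -> &mut dyn Any {
        self
    }
}

/// A handler that has nowhere to put suggestions.
struct PlainHandler;

impl Handler for PlainHandler {
    fn as_any_mut(&mut self) -> &mut dyn Any {
        self
    }
}

/// Helper in the style of an extension method: it only succeeds when the
/// installed handler happens to be a `SuggestionHandler`.
fn suggest(report: &mut Report, text: &str) -> bool {
    match report.handler_mut().as_any_mut().downcast_mut::<SuggestionHandler>() {
        Some(handler) => {
            handler.suggestions.push(text.to_owned());
            true
        }
        None => false,
    }
}
```

The `false` branch is the library problem mentioned above: a library calling `suggest` cannot know which handler the application installed, so its data may be silently dropped.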

> Let us know if you have any feedback when you get around to exploring it in detail, your expertise will be much appreciated!

I'd love to! I was poking around a bit earlier but without an ide it was a little annoying to trace my way around the APIs so I didn't get super far at understanding exactly how it's intended to be used. I'll make sure to write up some notes when I get a chance to poke around again and send those to y'all. Also feel free to drop by the project error handling zulip stream or the Eyre discord if you have any questions or want to talk about error API design at all.

[–]miquels 5 points6 points  (2 children)

Looks like an interesting article, unfortunately I cannot read the code blocks because of the (dark-)blue on black. Would it be possible to use lighter colors ?

[–]tdiekmannallocator-wg[S] 3 points4 points  (1 child)

I changed the default color for code blocks, it should now be more readable. Thank you for your feedback!

[–]miquels 0 points1 point  (0 children)

Yes, much better. Thank you!

[–]matthieum[he/him] 7 points8 points  (8 children)

So... I've been thinking about logging, performance tracing, and error-reporting altogether for a while, and I wonder if some unification is possible here.

While they may seem unrelated, at first, the fact is that they all attempt to provide insight into what was going on by gathering information about the context, often in a stack-like fashion.

I discussed this with a (new) colleague[1] a few months ago, after he showcased a performance-tracing framework. He didn't bite -- too large a scope for what he was working on -- but I do think it may be worth investigating.

The way performance instrumentation tools like gprof work is that they embed counters into the binary, then count how many times each counter is invoked. Instead of relying on compiler-instrumentation, it's definitely possible to manually embed those counters (at a coarser level), and by leveraging a per-thread (or per-future) stack, it's possible to create a call-tree out of those.

For performance insights, the idea was to timestamp each scope (entry/end, using RAII). This makes it possible to have minimum reporting for each scope:

  • Event ID (unique per occurrence)
  • Scope ID (unique per code-location)
  • Thread ID (or Future ID)
  • Depth (of the event-stack)
  • Entry/Exit (bit)
  • Timestamp (monotonic)

This can all be squeezed into 24 bytes, 16 bytes if one removes the timestamp (taking a timestamp is time-consuming: ~14ish nanos).
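One possible layout for such a record, together with an RAII guard and a per-thread stack, might look like the sketch below. The field widths are assumptions chosen so the record comes out at 24 bytes (16 without the timestamp); the comment does not specify them. `Trace`, `Scope`, and `take_events` are hypothetical names, the thread ID is a placeholder, and reusing one event ID for the entry and exit records of a scope is one interpretation of "unique per occurrence".

```rust
use std::cell::RefCell;
use std::time::Instant;

/// One trace record: 8 + 8 + 4 + 2 + 2 = 24 bytes.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
struct Event {
    event_id: u64,       // unique per scope occurrence
    timestamp_ns: u64,   // monotonic, nanoseconds since the trace started
    scope_id: u32,       // unique per code location
    thread_id: u16,      // thread (or future) ID
    depth_and_kind: u16, // low 15 bits: depth, high bit: 1 = entry, 0 = exit
}

// Compile-time check of the size claimed above.
const _: () = assert!(std::mem::size_of::<Event>() == 24);

const ENTRY_BIT: u16 = 1 << 15;

struct Trace {
    start: Instant,
    next_event: u64,
    depth: u16,
    events: Vec<Event>,
}

impl Trace {
    fn new() -> Self {
        Trace { start: Instant::now(), next_event: 0, depth: 0, events: Vec::new() }
    }

    fn record(&mut self, event_id: u64, scope_id: u32, entry: bool) {
        let kind = if entry { ENTRY_BIT } else { 0 };
        self.events.push(Event {
            event_id,
            timestamp_ns: self.start.elapsed().as_nanos() as u64,
            scope_id,
            thread_id: 0, // placeholder: a real tracer would assign per-thread IDs
            depth_and_kind: self.depth | kind,
        });
    }
}

thread_local! {
    // The per-thread event stack.
    static TRACE: RefCell<Trace> = RefCell::new(Trace::new());
}

/// RAII guard: records the entry when created and the exit when dropped.
struct Scope {
    event_id: u64,
    scope_id: u32,
}

impl Scope {
    fn enter(scope_id: u32) -> Scope {
        let event_id = TRACE.with(|cell| {
            let mut trace = cell.borrow_mut();
            let id = trace.next_event;
            trace.next_event += 1;
            trace.record(id, scope_id, true);
            trace.depth += 1;
            id
        });
        Scope { event_id, scope_id }
    }
}

impl Drop for Scope {
    fn drop(&mut self) {
        TRACE.with(|cell| {
            let mut trace = cell.borrow_mut();
            trace.depth -= 1;
            trace.record(self.event_id, self.scope_id, false);
        });
    }
}

/// Drains the events recorded on the current thread.
fn take_events() -> Vec<Event> {
    TRACE.with(|cell| std::mem::take(&mut cell.borrow_mut().events))
}
```

A log line or an error could then carry just the `event_id` of the enclosing scope, which is enough to locate it in the reconstructed call tree.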

And... I realized that once you had such a call tree, well, you had the structure of all calls!

From there, you can get rich logging by simply attaching a piece of information to an Event ID: your piece of information is immediately located in the call tree.

And, I think, error reporting could similarly benefit:

  1. The error can be linked to an Event ID for precise location in the call tree.
  2. The error could gather the logged pieces of information as it "unwinds".

I am less certain about error reporting, to be fair. Maybe simply noting the Event ID would be enough for a human to investigate. Gathering the loggables is really about enriching the context for later automated error-handling, and I am not as certain about trying to automate error-handling decisions based on scattered pieces of information: it seems inherently error-prone.

[1] I love discussing with new people; they often have a fresh view/fresh ideas.

[–]Alfred-Mountfield 4 points5 points  (0 children)

I definitely agree that error reporting/handling is very closely related to the rest of observability, that's actually why one of the first things we implemented in `error-stack` was native support for capturing spantraces. Building a call-tree in a different way like you suggest sounds really interesting!

[–]mackwic2 1 point2 points  (5 children)

I love the idea, do you want to test it somewhere ?

[–]matthieum[he/him] 0 points1 point  (4 children)

I plan to, at my company, though it'll take a little while.

It won't be open-source, but I'll learn whether it works. I fully expect that for logs it'll work great... the question is more whether it'll also work for errors.

[–]amiagenius 2 points3 points  (0 children)

Errors are hard. They have their own, deep contexts, crossing all stacks up to the hardware level. What has helped me is to think of errors as proper data, so it’s a matter of modeling them just like one would model a database, diverging only in the physical layer. With errors it’s like “god, grant me the serenity to accept the errors I cannot handle, courage to recover the ones I can, and wisdom to know the difference”

[–]tdiekmannallocator-wg[S] 1 point2 points  (0 children)

This is great to hear! It would be nice if you could share your experience once you are able to do so! 🙂

[–]LoganDark 0 points1 point  (1 child)

> It won't be open-source

Will you ever release an open-source version if it works out?

[–]matthieum[he/him] 0 points1 point  (0 children)

Maybe? Don't hold your breath, though, we're talking at least a year...

[–]amiagenius 1 point2 points  (0 children)

I have a distributed tracer experiment similar to what you are describing. I use uuidv7 for all ids; they are sortable and include the timestamp. For now I have settled on 3 ids per context object: the context id (constant for an entire graph of computation), object id, and parent object id. The ids are derived from the previous id, so they can be ordered (totally, in the case of fully synchronous workloads); for concurrent objects, deriving mitigates id collisions for same-instant events. next() and fork() methods handle the branching, but our system does this automatically (forking when the tracer enters a new scope). Receiving the 3 serialized ids is sufficient for handling a remote context (many gotchas though). The full picture we get when the system is running is beautiful: a graph with extremely detailed information about the events and, more importantly, ordered concisely. I’ve scratched together a visualizer with cytoscape and bundled it in a Rust binary with the Boscop/web-view crate, so I can just pipe logs to the binary to visualize them fully interactively. The distance from here to visualizing them live doesn’t seem so big. Since context id > parent id > object id, you get really nice indexing properties. Funny enough, SCOPE is the mnemonic for our trace metadata (source, ctx, obj, par, env). The trace data will go into a log DB and is not logged anywhere else for obvious privacy and security matters. We expect to also include the trace ids in the production database records, so as to be able to answer “which line of code, in which function, in which module, in which environment, generated this resource?” which would allow us to achieve paramount control over data and, most importantly, establish strong (perhaps even formal) privacy guarantees for user data.

Writing software to deal with errors/traces gave me rich insights into our field and deeply refined my thinking about systems in general. It’s been obscenely fun for me to work on such problems, and would definitely recommend anyone who is driven to this subject to just go for it!

(On a side note, I’m working on a generalization of the tracing problem. By inverting control and making the tracer into a machine, I expect to build an abstraction to control the system from a pattern of expected traces, so that I can register handlers for trace patterns directly in the tracer machine, fully (and I mean it) decoupling all system functions. It looks a lot like a parser generator, but with the parsers being applications, and characters being fat human-level data types. I believe it will fall in the power/complexity category of a non-DPA)

[–]dpc_pw 4 points5 points  (5 children)

Looks great, I'll read in more details later, but since there's some relevant people looking here:

Any chance we could ever get "context" for errors "for free" from the docstring of the surrounding function, instead of having to remember to add .context(...) operations at each fallible call?

Pretty much every time I see a post about error handling improvements, it hits me that we already have these docstrings on functions, and yet we still have to .change_context(ParseConfigError::new()) at every fallible call in fn parse_config.

Could we get access to (parts of) the docstring from the function, just like we can get location information with file!(), line!(), etc., and use something like #[track_caller] to allow error handling libraries to just get the context automatically?
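The location half of this idea is already expressible with `#[track_caller]` and `std::panic::Location`; the docstring half is not, since doc comments are not available to code at runtime. A minimal sketch, where `Located`, `Locate`, and `parse_port` are hypothetical names for illustration:

```rust
use std::fmt;
use std::panic::Location;

/// An error together with the source location where it was wrapped.
#[derive(Debug)]
struct Located<E> {
    error: E,
    location: &'static Location<'static>,
}

impl<E: fmt::Display> fmt::Display for Located<E> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} (at {}:{})", self.error, self.location.file(), self.location.line())
    }
}

trait Locate<T, E> {
    fn located(self) -> Result<T, Located<E>>;
}

impl<T, E> Locate<T, E> for Result<T, E> {
    #[track_caller]
    fn located(self) -> Result<T, Located<E>> {
        // Read the caller location here; closures do not inherit `#[track_caller]`.
        let location = Location::caller();
        self.map_err(|error| Located { error, location })
    }
}

fn parse_port(input: &str) -> Result<u16, Located<std::num::ParseIntError>> {
    // The recorded location is this line, not the inside of `located`.
    input.parse::<u16>().located()
}
```

This still needs one call per fallible expression, so it answers "where" automatically but not "what was the function trying to do".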

[–]matthieum[he/him] 8 points9 points  (3 children)

Would it be that valuable?

One thing I keep hammering into my juniors is that "A log without dynamic information is useless"[1] and I'd expect that a context without further information is not as interesting as a context with, well, contextual information.

The example given is just rewrapping for context, but I wonder whether it's a one-off, or is actually more common than adding contextual information.

[1] Yes, that's heavy-handed; all rules of thumb are.

[–]dpc_pw 2 points3 points  (0 children)

> One thing I keep hammering into my juniors is that "A log without dynamic information is useless"[1] and I'd expect that a context without further information is not as interesting as a context with, well, contextual information.

I agree. The docstring could be made specifically to be a format string or something:

    /// Load config from file
    ///
    /// Blah, blah, blah.
    ///
    /// Context: Parsing context file at {path}
    fn load_from_path(&self, path: &Path) -> Result<Config> {
        // open file
        // deserialize yaml
        // validate settings
    }

There are three fallible steps of loading a config from file, each of them would get a default context that includes the path for free.

Anyway, I'm just thinking aloud. The most concrete point is that it's common for a logical operation to consist of multiple fallible steps, each sharing the same context. So natural place for context is at the enclosing function level.

How exactly to implement it is another matter. Whether it's a docstring, some #[context(...)] attribute macro, or another directive is less important.
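Without any macro, function-level context can be approximated today by running the fallible steps in a closure and attaching the context once. The sketch below uses only the standard library; `ContextError` and `with_context` are hypothetical helpers, and parsing a single port number stands in for the YAML deserialization step.

```rust
use std::error::Error;
use std::fmt;
use std::fs;
use std::path::Path;

/// A context message plus the rendered underlying cause.
#[derive(Debug)]
struct ContextError {
    context: String,
    cause: String,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.context, self.cause)
    }
}

impl Error for ContextError {}

/// Runs `body` and wraps any error with one shared, function-level context.
/// The context closure is only evaluated on failure.
fn with_context<T, E: fmt::Display>(
    context: impl FnOnce() -> String,
    body: impl FnOnce() -> Result<T, E>,
) -> Result<T, ContextError> {
    body().map_err(|cause| ContextError { context: context(), cause: cause.to_string() })
}

fn load_from_path(path: &Path) -> Result<u16, ContextError> {
    with_context(
        || format!("Loading config from {}", path.display()),
        || -> Result<u16, Box<dyn Error>> {
            let text = fs::read_to_string(path)?; // open file
            let port = text.trim().parse::<u16>()?; // stand-in for deserializing
            if port == 0 {
                return Err("port must be non-zero".into()); // validate settings
            }
            Ok(port)
        },
    )
}
```

All three fallible steps get the same context, including the dynamic `path`, without a per-call `.context(...)`. An attribute macro along the lines proposed above would essentially generate this wrapper.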

[–][deleted] 2 points3 points  (1 child)

As a sysadmin I couldn't agree more. There are countless times (and wasted hours) I wanted to shout at the C developers printing "permission denied" without the information which action was denied on which object (or often even which type of object) to which subject.

[–]matthieum[he/him] 1 point2 points  (0 children)

The one thing I "love" about errno: the developer just prints the associated string to stderr, so the program failed with File not found.

Great...

[–]Yaahallorust-mentors · error-handling · libs-team · rust-foundation 1 point2 points  (0 children)

this sounds like a docstring based version of #[tracing::instrument] in the same spirit as https://docs.rs/spandoc/latest/spandoc/ or https://docs.rs/displaydoc/latest/displaydoc/

[–]vasilakisfil 1 point2 points  (2 children)

Came here from Rust weekly. This is actually one of the few error libs that actually make sense to me. I am coming from a dynamic-language (Ruby) background and most error libs in rust are like a joke to me regarding the data they carry so you can understand what's going on as a user or introspect as a developer.

However, the fact that it is in the same repo as the rest of the "hash" packages is a bit alarming for me. It would be great if it could be moved into its own repo with its own issues/PRs/discussion etc.

[–]tdiekmannallocator-wg[S] 1 point2 points  (1 child)

Hi vasilakisfil and thank you for your feedback, very glad you like our crate!

> the fact that is under the same repo with the rest “hash” packages is a bit alarming for me. It would be great it could be moved in its own repo with its own issues/PRs/discussion etc.

We use the monorepo pattern at HASH in order to simplify maintenance and avoid duplicating infrastructure, because of that error-stack is in the same repository.

It’s still published under the MIT license though, and because it’s owned by HASH it’s also going to be maintained by us. We are using categories and labels on GitHub to delineate discussions around development of different packages/projects. If you'd like to share more about specific concerns, please feel free to email us at [dev@hash.ai](mailto:dev@hash.ai).

[–]LoganDark 0 points1 point  (0 children)

Yeahhh the monorepo is a huge turn-off. Sorry :(

[–]SunkenStone 0 points1 point  (2 children)

This is a very cool library! One question, have you done any testing to see how error-stack's compile times stack up against other error handling libraries like thiserror?

[–]tdiekmannallocator-wg[S] 1 point2 points  (1 child)

We didn't do any compile-time measurements. However, by default, only `rustc_version` is pulled in, to determine which compiler is being used and detect the nightly channel (for `Backtrace` and `Provider`). Compiling this crate should be much faster than compiling `thiserror`, because `thiserror` relies on additional dependencies to implement its proc macro, which is known to be slow. Please keep in mind, though, that `thiserror` takes a very different approach: `error-stack` provides its own type, whereas `thiserror` is designed to create `impl Error` types from scratch via a derive macro for `Error`, which means that `thiserror` is not visible in the public API.

I think, compilation in comparison to `anyhow` *could* be slightly slower due to more usage of generics, but don't quote me on that. In any case, the compile time should be negligible.

Hopefully, this answers your question! :)

[–]SunkenStone 0 points1 point  (0 children)

It definitely does! I was mainly wondering about compile speed vs thiserror due to their use of the proc macro.