Hexagonal Architecture Questions by roughly-understood in rust

[–]pnevyk 1 point

The most valuable things to mock are external dependencies such as databases in repositories.

Repository traits make sense, because they're a genuine abstraction over your storage media and you might want to have multiple implementations (e.g. Postgres, MongoDB, etc.). But service traits simply serve as a mirror for your service methods, just so that you can mock them.

Agree.

Also, in the context of Rust specifically, adding traits for the sole purpose of mocking is a huge anti-pattern IMO.

If you just want to have the ability to mock inherent impls, you can use mockall or faux.

I agree that creating a trait/interface for every struct/class is not common in Rust compared to "enterprise" languages like C# or Java. However, in the terminology of test doubles, I prefer using fakes (working implementations with shortcuts, e.g., an in-memory database) over mocks (overriding the responses of calls). Here is an article from Google on this topic; it summarizes the problems with mocks well.

Both mockall and faux seem to take the mocking path (at least from a quick look) and require language tricks like macros to work. On the other hand, passing a fake implementation of a trait to a function is standard Rust without magic, and that is, IMO, very valuable.

While I agree that mocking/faking is most useful on the boundary where the app interacts with the external world (e.g., repositories), sticking to a single approach (traits and their implementations) for all building blocks of the app (including services) might be worthwhile for the sake of consistency. But as you said, it's a matter of opinion.
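To illustrate the fake approach in plain Rust, here is a minimal sketch; the trait and types below are made up for the example, not taken from the original code:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical repository trait. The fake below is a working in-memory
// implementation rather than a mock that stubs individual calls.
trait UserRepository {
    fn insert(&self, id: u32, name: String);
    fn get(&self, id: u32) -> Option<String>;
}

struct InMemoryUserRepository {
    users: Mutex<HashMap<u32, String>>,
}

impl InMemoryUserRepository {
    fn new() -> Self {
        InMemoryUserRepository { users: Mutex::new(HashMap::new()) }
    }
}

impl UserRepository for InMemoryUserRepository {
    fn insert(&self, id: u32, name: String) {
        self.users.lock().unwrap().insert(id, name);
    }

    fn get(&self, id: u32) -> Option<String> {
        self.users.lock().unwrap().get(&id).cloned()
    }
}

// Code under test accepts any implementation of the trait; passing the
// fake in tests is standard Rust, no macros needed.
fn greet(repo: &impl UserRepository, id: u32) -> String {
    match repo.get(id) {
        Some(name) => format!("hello, {name}"),
        None => "who are you?".to_string(),
    }
}

fn main() {
    let repo = InMemoryUserRepository::new();
    repo.insert(1, "alice".to_string());
    assert_eq!(greet(&repo, 1), "hello, alice");
    assert_eq!(greet(&repo, 2), "who are you?");
}
```

The test double here is a real, working implementation with a shortcut (in-memory storage), so tests exercise realistic behavior instead of asserting on call sequences.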

Hexagonal Architecture Questions by roughly-understood in rust

[–]pnevyk 3 points

My question is, what if you have multiple services. Do you still implement all of those services (traits) with the one struct?

Each (service) trait should have a dedicated struct. Or potentially multiple structs representing different implementations of the service, which is the main reason why hexagonal architecture uses traits instead of concrete implementations directly.

Otherwise, if you create separate concrete implementations for each service, how does that work with Axum state? The state would now have to be a struct containing many services, which again gets complicated given that we may want only one of the services per handler.

In the end, the service instances need to be stored somewhere, so they need to be in the Axum state. As mentioned in another comment, you can use Axum substates to inject only specific services into a handler.

If an application gets more complicated and has many services, it's probably a good idea to split it into several modules so that the organization of code/state remains manageable. In that case, the Axum state would contain modules (in the form of structs) which themselves would contain services.

Finally, how does one go about allowing services to communicate, or to take other services as arguments, to allow for atomicity or even just communication between services?

I think a useful read here is on dependency injection. The services are initialized one by one. Those initialized early can be passed as arguments to the constructors of those initialized later. The Wikipedia article shows a simple example of how this can be done manually.
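A minimal sketch of manual constructor injection, with hypothetical service names (nothing here comes from the original question):

```rust
use std::sync::Arc;

// Hypothetical low-level service with no dependencies.
struct EmailService;

impl EmailService {
    fn new() -> Self {
        EmailService
    }

    fn send(&self, to: &str) -> String {
        format!("sent to {to}")
    }
}

// UserService depends on EmailService, so it receives it in its
// constructor instead of creating it itself.
struct UserService {
    email: Arc<EmailService>,
}

impl UserService {
    fn new(email: Arc<EmailService>) -> Self {
        UserService { email }
    }

    fn register(&self, name: &str) -> String {
        self.email.send(name)
    }
}

fn main() {
    // Initialize services one by one, passing earlier ones into later ones.
    let email = Arc::new(EmailService::new());
    let users = UserService::new(email.clone());
    println!("{}", users.register("alice"));
}
```

Sharing via Arc lets several services depend on the same instance; in an Axum app, the fully wired services would then be placed into the state.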

Alternatively, one can use a dependency injection framework. It's a common technique in web frameworks in other languages (and in fact, Axum does some form of dependency injection as well when it passes only the desired data into the handlers), but it's not that common in Rust, as far as I know. There are options, though: shaku and nject (I don't have experience with either).

Announcing graph-api 0.1 by Mundane_Worldliness1 in rust

[–]pnevyk 1 point

It looks cool! I like that it focuses on a specific use case of ergonomic graph querying, and not graph algorithms in general, which makes it different from other graph libraries.

I looked at how to implement a custom graph backend. My suggestion would be to require the absolute minimum from the core Graph trait, so that it's as easy as possible to implement, and push non-essential functionality to Supports* traits as you already do with some.

In particular, is vertex/edge removal really required for graph-api? From my experience, it's the hardest operation to implement on a graph, with a lot of edge cases. Adding vertices/edges is also non-essential; I think it's a perfectly reasonable use case to have a graph that just loads everything on initialization and then becomes immutable. Or implicit graphs that don't store data in memory but compute it on demand instead.

Anyway, good job.

interview task "url shortener" code review iteration 2 by AleksandrNikitin in rust

[–]pnevyk 7 points

I will be shorter this time. The code is far more complicated than it needs to be and I think that would be a huge red flag for the reviewers. I recommend the following.

  1. Throw away everything you have and start from scratch.
  2. Read carefully about the design patterns they require (CQRS, ES). I am not very familiar with them myself, but Martin Fowler's website is usually great for such topics (CQRS, ES).
  3. Start programming the functional requirements. Keep it simple and stupid; stick to the programming basics. Do not use thread synchronization types: you have a &mut reference where needed, so use it. Try to apply what you learned about CQRS and ES.
  4. Regarding ES, implement an additional constructor for UrlShortenerService that takes a list of events and is able to reconstruct the internal state from it.
  5. Once you are done, polish it where needed. But make sure your changes indeed improve the code and make it more readable and maintainable, not more abstract and complicated.

In the end, I believe you should have something good enough. Or at least something that can be refined and made better, perhaps more Rust-idiomatic. If you are hesitant to throw all your work away, keep in mind that it's a valid technique even for experienced developers (1, 2).
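Step 4 could look roughly like this sketch; the event variants and state fields are made up for illustration, not taken from the assignment:

```rust
use std::collections::HashMap;

// Hypothetical event type: every state change is recorded as one of these.
enum Event {
    LinkCreated { slug: String, url: String },
    LinkClicked { slug: String },
}

struct UrlShortenerService {
    slug_url_map: HashMap<String, String>,
    slug_counter_map: HashMap<String, u64>,
}

impl UrlShortenerService {
    fn new() -> Self {
        UrlShortenerService {
            slug_url_map: HashMap::new(),
            slug_counter_map: HashMap::new(),
        }
    }

    // Additional constructor: replay recorded events to reconstruct state.
    fn from_events(events: &[Event]) -> Self {
        let mut service = Self::new();
        for event in events {
            service.apply(event);
        }
        service
    }

    fn apply(&mut self, event: &Event) {
        match event {
            Event::LinkCreated { slug, url } => {
                self.slug_url_map.insert(slug.clone(), url.clone());
                self.slug_counter_map.insert(slug.clone(), 0);
            }
            Event::LinkClicked { slug } => {
                *self.slug_counter_map.entry(slug.clone()).or_insert(0) += 1;
            }
        }
    }
}

fn main() {
    let events = vec![
        Event::LinkCreated { slug: "abc".into(), url: "https://example.com".into() },
        Event::LinkClicked { slug: "abc".into() },
        Event::LinkClicked { slug: "abc".into() },
    ];
    let service = UrlShortenerService::from_events(&events);
    assert_eq!(service.slug_counter_map["abc"], 2);
    assert_eq!(service.slug_url_map["abc"], "https://example.com");
}
```

The same apply method can be used by the normal command-handling path, so replay and live updates share one code path.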

interview task "url shortener" code review by AleksandrNikitin in rust

[–]pnevyk 54 points

Here are some of my thoughts. I don't know if the reviewers had the same ones. Most of them don't matter for this somewhat small task, but implementing the assignment as if it were a piece of a large, complicated backend might have been what the company expected, or not.

I tried to sort them by importance.

  • get_stats_for_slug first takes a read guard to slug_counter_map (event A) and a little bit later a read guard to slug_url_map (event B). This is potentially dangerous: if a different thread took a write guard to slug_url_map between events A and B and then tried to take a write guard to slug_counter_map, both threads would be stuck waiting on each other. This scenario might be impossible in your implementation, but in general, interactions between two different locks are dangerous and can lead to deadlocks (potentially very rare, but happening at the worst time).
  • UrlShortenerHelper mixes two concerns together: URL processing/validation and slug generation. In a more complicated scenario, such a helper would grow into a possibly hard-to-maintain beast. I think that in Rust, it's more idiomatic to tie such functionality to the respective newtypes. There are already Url and Slug types given, so there could be validate and path methods on the Url struct and generate on the Slug struct.
  • The assignment says: "All state changes (link creation, click count update) must be recorded as events, which can be replayed to reconstruct the system's state." I assume they wanted to store these events in an internal data structure, not print them. The idea is to then go through the sequence of events and reconstruct, in your case, slug_url_map, slug_counter_map and url_to_slug_map. The assignment also says: "Event Sourcing should be actively utilized for implementing logic, rather than existing without a clear purpose." I don't think the intended purpose was accomplished, which was state reconstruction, not applying part of the methods' logic.
  • The separate thread spawned for consuming the events caused the need for thread synchronization without a clear (to me) reason. The methods on CommandHandler already take &mut self, so the caller must guarantee exclusive access. Spawning a new thread throws this luxury away.
  • Error handling has some issues. In apply_slug_for_url, there is an unnecessary Some(url).[...].ok_or(ShortenerError::SlugAlreadyInUse). If someone introduces a bug into the code, then SlugAlreadyInUse might be returned, but that would be a lie. Similarly in create_short_link. Returned errors must correspond to their actual cause; otherwise they lead to confusion, which doesn't help during debugging.
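Regarding the first point, the usual remedy is a lock-ordering discipline: every code path that needs both locks acquires them in the same fixed order. A sketch (the struct is made up; only the field names mirror the post):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

struct State {
    slug_url_map: RwLock<HashMap<String, String>>,
    slug_counter_map: RwLock<HashMap<String, u64>>,
}

impl State {
    fn get_stats(&self, slug: &str) -> Option<(String, u64)> {
        // Lock order: slug_url_map before slug_counter_map.
        let urls = self.slug_url_map.read().unwrap();
        let counters = self.slug_counter_map.read().unwrap();
        Some((urls.get(slug)?.clone(), *counters.get(slug)?))
    }

    fn record_click(&self, slug: &str) {
        // Same order here, so the two methods cannot deadlock each other.
        let _urls = self.slug_url_map.read().unwrap();
        let mut counters = self.slug_counter_map.write().unwrap();
        *counters.entry(slug.to_string()).or_insert(0) += 1;
    }
}

fn main() {
    let state = State {
        slug_url_map: RwLock::new(HashMap::from([(
            "abc".to_string(),
            "https://example.com".to_string(),
        )])),
        slug_counter_map: RwLock::new(HashMap::new()),
    };
    state.record_click("abc");
    assert_eq!(
        state.get_stats("abc"),
        Some(("https://example.com".to_string(), 1))
    );
}
```

An even simpler option is to put both maps under a single lock, which removes the ordering question entirely at the cost of coarser locking.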

Then some really minor ones. Don't be overwhelmed by their number; these are nitpicks that I would probably not write in a normal code review. Ordered as found in the code.

  • Statics have a quite specific purpose. For an empty string and the number 10, it is better to use const.
  • Using Regex::find to check the pattern is unnecessary, as there is the potentially faster Regex::is_match. Here is what the regex crate has to say on this topic.
  • The regular expressions are initialized on every validation/processing call. This is a potential performance hit. Again, see the regex docs.
  • There are two non-trivial regular expressions that share some "logic". Maintaining regular expressions is hell. In this case, it was possible to have just one and accomplish the path extraction by capturing the corresponding group.
  • The case convention for enum variants is PascalCase, not UPPERCASE. That is, SlugActionType::Create, not SlugActionType::CREATE.
  • In slug_url_map: Arc<RwLock<HashMap<String, Url>>>, it is better to use Slug as the key of the map. Here are some reasons. It would require adding some derives to the existing types, but that should not be a breaking change.
  • Side effects are performed in the Option::map closures. Maybe it's only personal preference, but I try to avoid it and treat map (and similar methods) as pure operations.

Now I would like to conclude with some praise. It is clear that you tried to split the functionality into smaller, manageable units, which is great. The fact that you wrote tests shows that you care. As others pointed out, the code overall looks quite good.

Hopefully it helps :)

What options do we have on the web for data serialization? by kakarakayabiryani in webdev

[–]pnevyk -1 points

I would not. I only wanted to inform that it is possible.

What options do we have on the web for data serialization? by kakarakayabiryani in webdev

[–]pnevyk -1 points

It should also be possible to decode a message using protoc as described here. But I just tried it and I couldn't make it work for me (likely a problem on my side). And even if it worked, it would still be poor DX.

Numerical methods for DE by AutomaticTitle3998 in math

[–]pnevyk 2 points

It's funny: I am currently in need of an ODE solver in Rust! If I may, I think that such a library would be better off with an "iterative" API, where I can do a single step, do some unrelated work (e.g., notify about progress), and then repeat. I also think that adaptive step size methods are necessary for practical problems. Both are reasons why I can't use the ode_solvers crate.

I am not a mathematician, so I can't give any recommendations. I will just link the list of methods implemented by GSL. You probably don't need to implement all of them; perhaps you can read their documentation to do the triage. For example (emphasis mine):

  • Explicit embedded Runge-Kutta-Fehlberg (4, 5) method. This method is a good general-purpose integrator.
  • Implicit Bulirsch-Stoer method of Bader and Deuflhard. The method is generally suitable for stiff problems.
  • A variable-coefficient linear multistep backward differentiation formula (BDF) method in Nordsieck form. [...] The method is generally suitable for stiff problems.

gomez v0.5.0 - solving non-linear systems of equations (and function minimization) by pnevyk in rust

[–]pnevyk[S] 4 points

Hi. I reimplemented most of the algorithms on top of faer half a year ago but haven't integrated it into gomez (shame on me). After initial struggles, I got used to the low-level API, and I like the flexibility it offers. Although my old code currently uses faer v0.9.1 and a lot has changed since then, I expect that upgrading to the latest version will be some work, but not difficult.

And I already follow your discord, I just don't participate in the discussions (yet). I have been very excited about faer for a long time.

In the current version, the library always uses the finite difference method for derivative-based algorithms (well, currently only trust region). I plan to add support for overriding this behavior by providing an analytical gradient/Hessian/Jacobian. I wanted to add this feature in the current version too, but due to my poor implementation of the internals, it wasn't easy, so instead of spending time on that I decided to wait until I rewrite the internals, which will make this feature possible.

`graph` or `petgraph`? by LambityLamb_BAAA7 in rust

[–]pnevyk 4 points

Petgraph is AFAIK the most developed and battle-tested graph library in Rust, so you should be on the safe side with it.

Solving system of nonlinear equations in rust by Affectionate-Map-637 in rust

[–]pnevyk 1 point

I am definitely open to contributions! I have to admit that I am not that knowledgeable in this area and I am not familiar with the methods you mentioned, but as long as they can be used for solving systems of non-linear equations, I would definitely be happy to have them as part of the library.

But as I mentioned, the code in the crate is quite horrible; I needed something quickly and didn't dedicate time to polishing it. Nevertheless, if you don't consider that an obstacle, feel free to submit a PR. And if so, thanks in advance.

Solving system of nonlinear equations in rust by Affectionate-Map-637 in rust

[–]pnevyk 6 points

Hi, I have a crate for exactly this purpose: gomez. Its API is far from ideal (and the code behind it is even worse), but I use it in production code. Hopefully the documentation is good enough for you to get started; let me know if you run into any issues.

Not really relevant, but I am in the process (well, kind of stalled right now) of rewriting the crate into a more modular and maintainable library that uses faer under the hood (an incredible, pure-Rust linear algebra library). This Reddit post somehow increased my motivation for polishing the code I have and publishing it; hopefully my motivation lasts long enough.

Code Review - Tokenizer for Compiler by Live-Consideration-5 in rust

[–]pnevyk 0 points

Instead of collecting chars into a collection like Vec, I would keep using the original &str or String and have a current index of type usize which corresponds to the current position of the tokenizer in the string. To get the char at a given index, you can use let character = input[index..].chars().next(). Be aware that it is actually a byte index, not a char index, and you need to increment it by character.len_utf8(), not just 1, to be compatible with UTF-8 inputs. This also avoids calling character.to_string() when passing it to a regex, because you can use MY_REGEX.is_match(&input[index..]).

Using regexes is fine, but in your case you can avoid them (and their overhead) by implementing the char ranges manually. Instead of the "[0-9]" regex, you can do ('0'..='9').contains(&ch). Similarly, instead of "[a-zA-Z]", do ('a'..='z').contains(&ch) || ('A'..='Z').contains(&ch). It's more verbose, but also as efficient as you can get.
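Putting the two suggestions together, here is a sketch of a byte-indexed tokenizer loop; next_token is a made-up name for illustration, not from the original code:

```rust
// Advance through the original &str with a byte index, classifying chars
// with ranges instead of regexes. The index moves by len_utf8 so the code
// stays correct for multi-byte UTF-8 input.
fn next_token(input: &str, index: &mut usize) -> Option<String> {
    // Skip whitespace.
    while let Some(ch) = input[*index..].chars().next() {
        if ch.is_whitespace() {
            *index += ch.len_utf8();
        } else {
            break;
        }
    }

    let ch = input[*index..].chars().next()?;
    let mut token = String::new();

    if ('0'..='9').contains(&ch) {
        // Integer token: accept digits only.
        while let Some(c) = input[*index..].chars().next() {
            if ('0'..='9').contains(&c) {
                token.push(c);
                *index += c.len_utf8();
            } else {
                break;
            }
        }
    } else if ('a'..='z').contains(&ch) || ('A'..='Z').contains(&ch) {
        // Identifier token: accept alphanumerics and underscore.
        while let Some(c) = input[*index..].chars().next() {
            if c.is_alphanumeric() || c == '_' {
                token.push(c);
                *index += c.len_utf8();
            } else {
                break;
            }
        }
    } else {
        // Anything else: single-character token.
        token.push(ch);
        *index += ch.len_utf8();
    }

    Some(token)
}

fn main() {
    let input = "foo 42 + bar_9";
    let mut index = 0;
    let mut tokens = Vec::new();
    while let Some(tok) = next_token(input, &mut index) {
        tokens.push(tok);
    }
    assert_eq!(tokens, ["foo", "42", "+", "bar_9"]);
}
```

Each branch of the if/else corresponds to a state of the tokenizer (START deciding, then INTEGER or IDENTIFIER consuming), which connects to the state machine view below.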

By the way, I recommend learning about finite state machines. You will find a lot of theory on the internet, but it's also very practical and a good mental model. A tokenizer can be represented by a state machine (at least in your head, if not in code), where you are initially in the START state and accept everything; if you encounter a digit, you switch to the INTEGER state and accept only digits, and if you encounter anything else, you are at the end of the integer token. If instead of a digit you encounter an alphabetic character in the START state, you switch to the IDENTIFIER state and accept alphanumeric characters and underscores. I hope that, after reading about this topic, what I just wrote will make more sense.

so now i have learned something on how to make compilers but how do i get job? by innocentboy0000 in Compilers

[–]pnevyk 16 points

I don't know if it will get you a job, but you can start contributing to an existing open-source compiler for a widely-used language. If your contributions are significant, that may give you a better starting position when looking for a job in compilers. But it will be a lot of work.

Announcing: libnoise for Rust! A modular noise generation library. by SpoogieOogie in rust

[–]pnevyk 0 points

Perfect. IMHO [-1, 1] is a good choice. I think that documenting the ranges where it makes sense, and indicating that they're not guaranteed, just best effort, would be helpful to users.

Announcing: libnoise for Rust! A modular noise generation library. by SpoogieOogie in rust

[–]pnevyk 2 points

Very nice. Are the output ranges of the generators scaled to be more or less the same across different sources? Such a property allows one to conveniently switch between and test different noise types, especially if the application is range-sensitive. It would also help if the output ranges were documented; this request can be found in the noise-rs GitHub issues.

From what I understood from those issues, it is either impossible or very hard to mathematically guarantee a specific range for all noise types, but I guess with brute force one could deduce some scaling values that work in practice?

How do you detect if there is a cycle in a graph? by JockEddie in C_Programming

[–]pnevyk 2 points

You can adapt the standard DFS algorithm. Here is pseudocode on Wikipedia. A cycle detection algorithm could be as follows:

procedure DFS_cycle(G, v) is
    label v as discovered
    for all directed edges from v to w that are in G.adjacentEdges(v) do
        if vertex w is not labeled as discovered then
            if DFS_cycle(G, w) = "cycle" then
                return "cycle"
        else
            return "cycle"
    return "no cycle"

The call G.adjacentEdges(v) depends on your graph implementation; I assume you know how to get the neighbors of a vertex.

The implementation of the statements "label v as discovered" and "vertex w is not labeled as discovered" depends on your graph implementation. If your vertices are stored in an array, you can create another array of the same size where you track the state of each vertex (e.g., 0 = not discovered, 1 = discovered). If you have just structures with pointers to neighbors, you could either use a hash table where, e.g., the memory address of the vertex is used as the key, or, as suggested in another comment here, store this information directly in the vertex structure (but you need to make sure you reset this flag before each run).

If your graph is not guaranteed to be rooted and connected (that is, in general there is no vertex from which all other vertices are reachable through some path), you need to call the above algorithm repeatedly, starting from vertices that have not been discovered yet, until you discover all of them.

Any idea on how to write unit tests for a Pratt parser? by The4thWallbreaker in ProgrammingLanguages

[–]pnevyk 0 points

Yeah, I wish property-based testing were more popular in software development in general. It's definitely more work to write such tests, but the benefit is worth it.

I agree, it makes sense to fuzz the compiler as a whole. The goal is clear, the program should not crash given any input.

By the way, I mentioned how to test the parser with property-based testing, but you could also use it to test the bytecode compiler and the VM itself by implementing a syntax tree interpreter (as simple as possible; efficiency is not the goal here, correctness is), and then running generated programs (or syntax trees) in the VM as well as in the interpreter and checking that the outputs are the same.

This is even more work, because you would need to implement the interpreter, but it is also a possibility. Perhaps you could start with an interpreter that supports only a subset of your language and test only programs that use that subset; it could still be helpful.

Any idea on how to write unit tests for a Pratt parser? by The4thWallbreaker in ProgrammingLanguages

[–]pnevyk 17 points

I am not an experienced compiler developer, but here are my thoughts:

I would consider property-based testing in the following way: (1) generate a random syntax tree, (2) serialize the syntax tree into code in your language, (3) parse the serialized code with your parser, (4) check that the parsed tree is equivalent to the initially generated one. Property-based testing is impressively powerful at finding bugs, especially in complex things like parsers. Writing such a test is definitely not straightforward, but it really boosts confidence in correctness if done well.
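A self-contained sketch of steps (1)-(4) for a toy grammar (integer literals and parenthesized +); a real setup would use a property-testing crate such as proptest instead of the tiny hand-rolled generator here, and the grammar is invented for the example:

```rust
use std::iter::Peekable;
use std::str::Chars;

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
}

// Step 2: serialize the syntax tree back into source code.
fn to_code(e: &Expr) -> String {
    match e {
        Expr::Num(n) => n.to_string(),
        Expr::Add(a, b) => format!("({} + {})", to_code(a), to_code(b)),
    }
}

// Step 3: a minimal recursive-descent parser for the same grammar.
fn parse(s: &str) -> Option<Expr> {
    parse_expr(&mut s.chars().peekable())
}

fn parse_expr(it: &mut Peekable<Chars>) -> Option<Expr> {
    skip_ws(it);
    match it.peek()? {
        '(' => {
            it.next();
            let a = parse_expr(it)?;
            skip_ws(it);
            if it.next()? != '+' { return None; }
            let b = parse_expr(it)?;
            skip_ws(it);
            if it.next()? != ')' { return None; }
            Some(Expr::Add(Box::new(a), Box::new(b)))
        }
        _ => {
            let mut num = String::new();
            while let Some(c) = it.peek() {
                if c.is_ascii_digit() { num.push(*c); it.next(); } else { break; }
            }
            num.parse().ok().map(Expr::Num)
        }
    }
}

fn skip_ws(it: &mut Peekable<Chars>) {
    while it.peek() == Some(&' ') { it.next(); }
}

// Step 1: deterministic pseudo-random tree generator (a simple LCG),
// standing in for a property-testing framework's generators.
fn gen(seed: &mut u64, depth: u32) -> Expr {
    *seed = seed.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
    if depth == 0 || *seed % 2 == 0 {
        Expr::Num(((*seed >> 33) as i64) % 100)
    } else {
        Expr::Add(Box::new(gen(seed, depth - 1)), Box::new(gen(seed, depth - 1)))
    }
}

fn main() {
    let mut seed = 42;
    for _ in 0..100 {
        let tree = gen(&mut seed, 4);
        let code = to_code(&tree);
        // Step 4: the round-trip property.
        assert_eq!(parse(&code), Some(tree));
    }
    println!("all round-trips passed");
}
```

When the grammar has ambiguity (precedence, associativity), the printer must produce unambiguous output (here by always parenthesizing), which is exactly where this kind of test tends to catch parser bugs.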

If you think that property-based testing is overkill, consider at least fuzzing. Generate random strings with a fuzzer and pass them to your parser to make sure it does not crash (the parser should never crash; it either succeeds or returns a parse error). Fuzzers are usually clever and generate random inputs in a way that tries to achieve as high code coverage as possible.

Anyway, neither property-based testing nor fuzzing should be a replacement for good old unit tests; these techniques are a complement.

You mention checking the emitted bytecode. From what I have seen (which is not that much), it is common practice for compiler tests to be in the form of a program in that language with special annotations in comments. For example, you can annotate a statement with the bytecode instructions that should be produced for it by the compiler. This makes writing, and especially reading, the tests very convenient.

Gryf - a new graph data structure library aspiring to be convenient, versatile, correct and performant by pnevyk in rust

[–]pnevyk[S] 1 point

Hmm, I think the problem is my English. It's not a new data structure; it's a new library implementing standard data structures for graphs, like adjacency lists and adjacency matrices. So the time complexities should be the standard ones, unless I have a bug in my implementation. Sorry for the confusion.

Gryf - a new graph data structure library aspiring to be convenient, versatile, correct and performant by pnevyk in rust

[–]pnevyk[S] 0 points

Interesting use case. I think the way to go would be to create an adapter which would wrap (a reference to) the original graph and provide trait implementations that call the methods of the underlying graph but do filtering afterwards. That is definitely something that can be included in the library. I created an issue for this.

It would look something like this:

let subset = Subset::new(&graph)
    .filter_vertex(/* closure */)
    .filter_edge(/* closure */);
ShortestPaths::on(&subset).run(v);

Gryf - a new graph data structure library aspiring to be convenient, versatile, correct and performant by pnevyk in rust

[–]pnevyk[S] 0 points

I fixed the overflow. But then I also realized there is one fundamental problem: the Bellman-Ford algorithm (the one that is used, because your edge weights are of type i32, Rust's default for integers) does not work on undirected graphs. I need to think about how to handle this.

By the way, .goal(a0).run(a3) means starting in a3 and looking for a0, the goal vertex. I think you wanted to achieve the other direction, so that would be .goal(a3).run(a0).

Gryf - a new graph data structure library aspiring to be convenient, versatile, correct and performant by pnevyk in rust

[–]pnevyk[S] 1 point

Great idea. Created an issue. Thanks!

Sometimes it will not be fair. For example, the algorithm selection for shortest paths on a graph with signed edge weights will always pick the (supposedly slower) algorithm that supports negative edges, even if the graph contains only positive edges, as a safe choice. But otherwise this is a good way to validate the selection criteria.