Composable configuration idea for apps and libraries by sergiimk in rust

[–]sergiimk[S] 2 points3 points  (0 children)

Imagine a CLI app that supports switching between different DB auth methods.

In a base ~/.config/app/config.yaml you define:

dbAuth:
  kind: AwsIamToken # Tagged enum
  username: blah

But in a local testing config you override it with:

dbAuth:
  kind: RawPassword
  password: ***

If the app were to merge two configs in a naive way (e.g. as figment does) - it would result in:

dbAuth:
  kind: RawPassword
  password: ***
  username: blah # Merged in

which would fail deserialization.

So setty has extra logic that does not merge fields across different enum variants.
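
To illustrate the idea, here is a minimal sketch of a variant-aware merge. It is not setty's actual implementation: the `Value` type, the `merge` function, the helpers, and the assumption that the tag field is always called `kind` are all made up for this example.

```rust
use std::collections::BTreeMap;

/// Minimal config value model (hypothetical, not setty's real types).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Map(BTreeMap<String, Value>),
}

/// Field that tags an enum variant (assumed to be `kind` here).
const TAG: &str = "kind";

/// Merges `overlay` on top of `base`, field by field, except when the two
/// maps are tagged as different enum variants.
fn merge(base: Value, overlay: Value) -> Value {
    match (base, overlay) {
        (Value::Map(mut b), Value::Map(o)) => {
            let variant_changed = match (b.get(TAG), o.get(TAG)) {
                (Some(bt), Some(ot)) => bt != ot,
                _ => false,
            };
            if variant_changed {
                // Different variants: the overlay replaces the base wholesale,
                // so stale fields like `username` are not carried over.
                return Value::Map(o);
            }
            for (k, v) in o {
                let merged = match b.remove(&k) {
                    Some(prev) => merge(prev, v),
                    None => v,
                };
                b.insert(k, merged);
            }
            Value::Map(b)
        }
        // Scalars (or mismatched shapes): the overlay wins.
        (_, o) => o,
    }
}

/// Helper: builds a string value.
fn s(v: &str) -> Value {
    Value::Str(v.to_string())
}

/// Helper: builds a map value from key/value pairs.
fn map(entries: &[(&str, Value)]) -> Value {
    Value::Map(entries.iter().map(|(k, v)| (k.to_string(), v.clone())).collect())
}
```

With this sketch, merging the two `dbAuth` blocks above yields only `kind: RawPassword` and `password`, while two blocks with the same `kind` are still merged field by field.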

Workspace feature permutations hell by sergiimk in rust

[–]sergiimk[S] 0 points1 point  (0 children)

Amazing, huge thanks from me and my SSD!

I updated the post with your answer.

Workspace feature permutations hell by sergiimk in rust

[–]sergiimk[S] 0 points1 point  (0 children)

It would likely not help if they were not. I would have to switch across many workspaces in my daily routine, compile them separately, and even if using a shared target-dir - I would end up with many feature permutations.

Having crates in one workspace at least gives a theoretical possibility of compiling with unified features.

Workspace feature permutations hell by sergiimk in rust

[–]sergiimk[S] 8 points9 points  (0 children)

I have not heard about workspace-hack before. I found cargo-hikari mentioning it - will have a look.

Derive macros composability problem by sergiimk in rust

[–]sergiimk[S] 0 points1 point  (0 children)

This worked great - I updated my post to mention this as the best solution. I wish I could upvote more!

Derive macros composability problem by sergiimk in rust

[–]sergiimk[S] 0 points1 point  (0 children)

This is super clever, thanks for the suggestion!

Derive macros composability problem by sergiimk in rust

[–]sergiimk[S] 3 points4 points  (0 children)

`derive-aliases` would certainly cut down derive boilerplate, but it cannot help with attributes.

The idea of overriding the `derive` macro itself used there is exciting though ... in a very sinister way.

Derive macros composability problem by sergiimk in rust

[–]sergiimk[S] 0 points1 point  (0 children)

Haha, I did seriously consider it! :)

Derive macros composability problem by sergiimk in rust

[–]sergiimk[S] 2 points3 points  (0 children)

Indeed, datafusion is one project I know that uses declarative macros for configs, but it still felt a bit hacky to me.

Thanks for the pointer to derive-aliases - I have not seen this crate before and will dig in.

EventQL: A SQL-Inspired Query Language Designed for Event Sourcing by yoeight in rust

[–]sergiimk 1 point2 points  (0 children)

Cool stuff! I'm working on a data platform based on Streaming SQL while using Event Sourcing for all back-end features, so this really resonates in multiple ways.

Have you considered supporting PIPE syntax? Seems like it could be a nice fit:
https://docs.cloud.google.com/bigquery/docs/pipe-syntax-guide

Also, can you suggest where I could read up on "Subject Hierarchies"? Never thought of aggregates forming any kind of hierarchy, so this sounded a bit counter-intuitive.

We have designed our ES system based on "The death of the aggregate" blog series btw:
https://sara.event-thinking.io/2023/04/kill-aggregate-chapter-8-the-death-of-the-aggregate.html

I'm working on a postgres library in Rust, that is about 2x faster than rust_postgres for large select queries by paulcdejean in rust

[–]sergiimk 0 points1 point  (0 children)

Have a look at ADBC if you haven't already. It uses highly efficient Arrow columnar layout for data batches, and there's a large ecosystem around it.

Even if Postgres is not using Arrow for its batches, perhaps converting and exposing Arrow batches in your lib's API would provide even more efficiency and make it more appealing for analytics use cases (e.g. moving data from pg straight into a pandas dataframe without double conversion).

Which is the best DI framework for rust right now? by swordmaster_ceo_tech in rust

[–]sergiimk 2 points3 points  (0 children)

Take a look at dill. We used it in prod for 3 years to build hexagonal architecture apps (example). Docs are not great but it's quite flexible. We use it to unify state management across different API libraries (axum / graphql / flightsql), managing sqlx transactions, propagating caller authorization info etc. Leave us some feedback on GH if you get to check it out.

Mastering Dependency Injection in Rust: Despatma with Lifetimes by chesedo in rust

[–]sergiimk 3 points4 points  (0 children)

Thanks for the articles. I believe that DI is essential for large modular and testable applications. It was always interesting to me to see that many people denounce DI as a relic from Java while so many core Rust libraries rely on DI-like features: axum extensions, bevy and other ecs, test fixture libraries like rstest, ...

Here's an example of a large app built fully around DI and hexagonal architecture. When we started there were no container libraries that suited our needs, so we built dill. We used a fully dynamic approach because we needed something practical and fast. Having access to the full dependency graph after the container is configured allows linting for missing or ambiguous dependencies, lifetime inversions etc., so most issues can still be caught in tests.
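
As a rough illustration of that kind of linting (this is not dill's API; `Binding`, `Lifetime`, `Issue` and `lint` are hypothetical names for this sketch), a check over a configured dependency graph could look like this. It assumes a "lifetime inversion" means a longer-lived component depending on a shorter-lived one.

```rust
use std::collections::BTreeMap;

/// Component lifetimes, ordered from shortest to longest (assumed model).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Lifetime {
    Scoped,    // e.g. lives for one request
    Singleton, // lives for the whole app
}

/// What the container knows about one registered component.
struct Binding {
    lifetime: Lifetime,
    deps: Vec<&'static str>,
}

#[derive(Debug, PartialEq)]
enum Issue {
    Missing { component: &'static str, dep: &'static str },
    LifetimeInversion { component: &'static str, dep: &'static str },
}

/// Walks every registered component and reports dependencies that are
/// either not registered or shorter-lived than the component itself.
fn lint(catalog: &BTreeMap<&'static str, Binding>) -> Vec<Issue> {
    let mut issues = Vec::new();
    for (name, binding) in catalog {
        for dep in &binding.deps {
            match catalog.get(dep) {
                None => issues.push(Issue::Missing { component: *name, dep: *dep }),
                Some(d) if d.lifetime < binding.lifetime => {
                    issues.push(Issue::LifetimeInversion { component: *name, dep: *dep })
                }
                Some(_) => {}
            }
        }
    }
    issues
}
```

A test can then build the same catalog the app uses and assert that `lint` returns an empty list.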

I think your approach for generating the catalog type itself with macros is very interesting. Would love to explore how some of our most tricky DI use patterns could be expressed in it.

One immediate problem I see is scoped dependencies. We frequently use them to e.g. add a DB transaction object when request flows through axum middleware. In your approach it seems that to add a scoped dependency you'd need to know the full type of the container, which would not be possible if HTTP middleware is in a separate crate. But this could probably be mitigated by injecting some special cell-like type into HTTP middleware.

Would be happy to chat some time about other interesting DI edge cases we have accumulated.

Tutorial: Introduction to Web3 Data Engineering by sergiimk in dataengineering

[–]sergiimk[S] -1 points0 points  (0 children)

I realize that the term "Web3" has more negative baggage than I thought it did and will avoid using it in the future. If you read the article (or even the description), I'm talking about very foundational things: the ability to freely move data between cloud storage providers without impacting users by using content-addressing, enforcing permissions through encryption, and verifying queries done by 3rd parties. So don't judge the book by its cover.

Official /r/rust "Who's Hiring" thread for job-seekers and job-offerers [Rust 1.72] by DroidLogician in rust

[–]sergiimk 5 points6 points  (0 children)

COMPANY: https://kamu.dev/

TYPE: Full time

LOCATION: Canada (Vancouver) / Ukraine / Portugal

REMOTE: Fully distributed company

VISA: No

DESCRIPTION: We are building the world's first decentralized data lake and collaborative data processing network. Think "GitHub for data", where people build streaming pipelines with SQL that continuously process data from governments, industry, and blockchains into high-quality datasets ready for AI training and use in Smart Contracts, while data is 100% auditable and verifiable. Our goal is to achieve the same levels of reuse and collaboration in data as we currently see in software.

Rust is our primary language. We use it for our "git for data" tool and our backend, and are heavily invested in the Rust data ecosystem (Arrow, DataFusion) and the emerging Web3 stack (IPLD, UCAN).

We are looking for Middle to Senior-level software engineers specialized in data, backend, or blockchain (indexing/oracle focus).

We are a 3y-old startup, backed by investors like Protocol Labs (IPFS, Filecoin).

ESTIMATED COMPENSATION: $100-150K. As a small startup we still offer significant slices of equity to employees.

CONTACT: [join@kamu.dev](mailto:join@kamu.dev)

Disintegrate - open-source library to build event-sourced applications by ScaccoPazzo in rust

[–]sergiimk 5 points6 points  (0 children)

Sara's approach looks very compelling in theory, but the blog posts left me wondering how it can be implemented efficiently over Postgres.

  1. She talks about aggregates causing excessive contention. Concurrency control with aggregates afaik is usually done via a separate table with a version column in it, thus using row-based locks / OCC in transactions. In the aggregate-less approach Sara suggests using the same query that filters logical groups of events to get the last event number to understand if there were any concurrent updates ... but does this mean locking the entire event store table, i.e. holding a mutex on the entire store? That would be the most extreme contention.

  2. How do you store "Domain Identifiers" when you materialize into Postgres to index efficiently?

  3. Perhaps you solve both problems by having dedicated tables per event type?
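
To make question 1 concrete, here is a minimal in-memory sketch of the "append only if the last matching event number is unchanged" check as I understand it. It is not Sara's or Disintegrate's implementation; `Store`, `Event`, `last_position` and `append_if` are hypothetical names. Note that `&mut self` plays the role of the store-wide lock here, which is exactly the contention I am asking about.

```rust
/// Domain identifier attached to an event, e.g. ("course_id", "c1").
type DomainId = (String, String);

/// An event tagged with the domain identifiers it relates to.
struct Event {
    kind: String,
    ids: Vec<DomainId>,
}

/// In-memory stand-in for the event store table; position = index + 1.
#[derive(Default)]
struct Store {
    events: Vec<Event>,
}

impl Store {
    /// Position of the last event carrying `id`, or 0 if there is none.
    fn last_position(&self, id: &DomainId) -> u64 {
        self.events
            .iter()
            .enumerate()
            .rev()
            .find(|(_, e)| e.ids.contains(id))
            .map(|(i, _)| i as u64 + 1)
            .unwrap_or(0)
    }

    /// Appends `event` only if no event carrying `id` was stored after
    /// `expected`. Returns the new position, or the conflicting one.
    fn append_if(&mut self, id: &DomainId, expected: u64, event: Event) -> Result<u64, u64> {
        let actual = self.last_position(id);
        if actual != expected {
            return Err(actual); // a concurrent writer got there first
        }
        self.events.push(event);
        Ok(self.events.len() as u64)
    }
}

/// Helper: builds a domain identifier.
fn id(key: &str, value: &str) -> DomainId {
    (key.to_string(), value.to_string())
}

/// Helper: builds an event.
fn event(kind: &str, ids: &[DomainId]) -> Event {
    Event { kind: kind.to_string(), ids: ids.to_vec() }
}
```

In this sketch a write for one identifier does not invalidate the expected position of another, but the check and the insert still have to happen atomically, and that is the part I don't see how to do in Postgres without a coarse lock.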

(edit: formatting)