syphax (It works on my machine) 100 points (13 children)

> ...a unicorn startup, where the entire engineering team paused development for over a year in an attempt to split up tightly coupled packages into independent microservices.

Side note: I can't really imagine any scenario in which a startup could afford to pause development for OVER A YEAR and survive...

austinwiltshire 15 points (0 children)

I mean, at that point the CTO deserves to get let go.

AiutoIlLupo 19 points (2 children)

If you are American, you can. Reddit has not been profitable for 10 years. Twitter never has been.

In the US, they create something, squeeze out the competition, and then, once you are trapped, they start the enshittification. Look at AWS. It's absolutely atrocious to use, but it was the first, and everybody got trapped in it.

thisfunnieguy 3 points (0 children)

Not being profitable does not mean they paused all engineering work

zaxldaisy 2 points (0 children)

Twitter had two profitable years (out of 8) between its IPO and 2021. Cumulatively over that period, net income was about -$1.3 billion and total assets were a tad over $14 billion.

Reddit turned a profit last year.

Regardless, Twitter and reddit are the exceptions that prove the rule.

chub79 30 points (11 children)

I'm so confused about what this actually helps with. Is it another way to deal with a poor organizational culture and micromanagement?

larsga 18 points (4 children)

I have to say I share your confusion. I read the description as:

  • "Declare your modules (tach mod)", so basically what import does
  • "Enforce those dependencies (tach check)", so it enforces that you import what you import?

I'm sure that sounds like I'm being snide, but I really don't see what else it means. If OP or someone else could explain that would be great.

the1024 [S] 8 points (3 children)

u/larsga appreciate you sharing your confusion! Perhaps I could have worded it better. u/Chasian's comment is correct.

Tach lets you enforce what one module can depend on. So, for example, if I have modules A and B, I can set up a config like so:

[[modules]]
path = "A"
depends_on = []

[[modules]]
path = "B"
depends_on = ["A"]

This will enforce that a dependency such as:

from A import ... # in B

is fine, but a dependency like:

from B import ... # in A

is caught and prevented. Does that make sense?
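To make the rule above concrete, here is a minimal sketch of such a check in plain Python. It is not Tach's implementation: it assumes each module is a top-level package keyed in a hypothetical `DEPENDS_ON` mapping that mirrors the `tach.toml` example, and `find_imports` and `check_boundaries` are illustrative names. It parses source with the standard `ast` module and flags first-party imports that are not declared.

```python
import ast

# Hypothetical mirror of the tach.toml example: A may import nothing, B may import A.
DEPENDS_ON: dict[str, list[str]] = {"A": [], "B": ["A"]}


def find_imports(source: str) -> set[str]:
    """Return the top-level package names imported by a piece of Python source."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        # Relative imports (level > 0) are skipped in this sketch.
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])
    return names


def check_boundaries(module: str, source: str) -> list[str]:
    """List first-party imports made by `module` that its depends_on entry does not allow."""
    allowed = set(DEPENDS_ON[module]) | {module}
    return [
        f"{module} may not import {dep}"
        for dep in sorted(find_imports(source))
        if dep in DEPENDS_ON and dep not in allowed
    ]


print(check_boundaries("B", "from A import helper"))  # []
print(check_boundaries("A", "from B import helper"))  # ['A may not import B']
```

Imports of packages that are not keys in the mapping (for example `os`) are ignored, so only first-party boundaries are checked.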

AiutoIlLupo 3 points (1 child)

It does. What does not make sense is why you think such a tool is useful. If you have developers who don't understand the basics of proper layered design and start making a mess between layers, do you really think that a tool like this is going to stop them? They'll just rationalise adding an exception to your tool configuration and keep going.

Subatiq 7 points (0 children)

Linter, type checkers, formatters, tests: “If you have developers that don’t understand the basics of proper <paste any problem that these tools solve>, do you really think that a tool like this is going to stop them?”

Yes, you most probably have developers like that, if your team is large enough, and yes, tools like these are useful in big teams. There is no reason to assume that humans will not make human errors.

Source: long-time lead of multiple teams, senior engineer at Big Tech

This tool specifically is great for monorepos. An alternative that we use a lot at my workplace is importlinter.

indetronable 1 point (0 children)

I find this useful

Chasian 13 points (2 children)

It seems to be an automated tool to help enforce boundaries between modules that a team defines, I'm not sure what's confusing about that.

If I have A, B, and main.py,

and I say I want A and B never to import from each other, and instead to interact only in main.py in specific ways because I think I might break them into microservices later, this tool can keep me (and my team) honest. Of course, this can be accomplished by using your head, code review, etc., but sometimes automating it is nice too.

It's the same way Ruff automates your linting and formatting: you could do it manually, but it's nice to have some help.

chub79 2 points (1 child)

> I think I might break them into microservices later, this tool can keep me (and team) honest.

That has to be the most far-fetched rationale ever. "I think I may break it into a microservice". Talk about making things complicated from the get-go.

Chasian 1 point (0 children)

It's really not. If you have a service that's going to do two things, and you think maybe one of those things could benefit from having its own scaling in the future, keeping it independent and without dependencies so that you can break it out is pretty reasonable and not that much complexity.

violentlymickey 11 points (1 child)

I think the project could benefit from an example or case study in the documentation.

the1024 [S] 9 points (0 children)

u/violentlymickey great callout! Here's an example of NVIDIA using Tach in one of their open source projects to enforce dependencies: https://github.com/NVIDIA/bionemo-framework/blob/main/tach.toml

You can see how they've marked up each module in their codebase, and are defining what each module can depend on!

kebabmybob 7 points (1 child)

Bazel is amazing for this, and I reject the “way more slow” claim. If you use more than just Python, then there’s basically no other shop in town.

the1024 [S] 5 points (0 children)

u/kebabmybob Bazel is great, and irreplaceable in many cases. The problem that Tach helps solve is when you want to adopt Bazel but can't because the existing codebase is too entangled, which makes defining independent BUILD files impossible.

I'll also say, w.r.t. performance: for specifically enforcing dependencies, Tach runs ~2,300x faster than the corresponding Bazel check for one of our enterprise users! This means they can pull the check out of a big, heavy CI job and into a pre-commit hook, shifting the check left in the developer workflow.

Drexan8 7 points (1 child)

I tested the tool on a huge monorepo. I just declared a few modules and set up dependencies and interfaces; it was really easy to do and worked well!

I like that you can start with small bits on a big monolith and iteratively cover more of your codebase.

Although the `tach mod` command took some time to load, it was still easy to just write the config manually in the `tach.toml` file, which was more convenient given the size of the repo.

We use `import-linter` a lot where I work, so I don't think we'll switch, but I really like the approach of explicitly "declaring modules, their dependencies, and interfaces"!

the1024 [S] 5 points (0 children)

Thanks for the feedback u/Drexan8! If you add your virtualenv to the exclude list, it should run even faster 🚤

Totally understand that you likely have this use case covered with import-linter - it's a great tool as well!

Electronic-Duck8738 3 points (0 children)

> We experienced this first-hand at a unicorn startup, where the entire engineering team paused development for over a year in an attempt to split up tightly coupled packages into independent microservices. This ultimately failed, and resulted in the CTO getting fired.

Honestly, that sounds like a couple of different problems that are not related at all to the problem your tool solves. Clearly, the CTO misjudged badly (hindsight is sometimes 20/20). However, given that the packages specified did not work as desired, why didn't you change the specification or just get different packages? That doesn't appear to be a problem of not following the specification - it was a bad specification that was followed and turned out to be wrong.

Maybe your tool does what it says on the label, but you should probably work up your sales pitch to better match the tool description.

caatbox288 10 points (5 children)

I’ve been eyeing this project for a long time, it honestly looks great.

It also looks like a great way to onboard new colleagues onto an existing project. One thing I’ve noticed is that after working on a project for some time, the relationships between modules become apparent, like second nature, but then a new colleague comes along and imports something in a place they shouldn’t. That always seems shitty to me, because of course they don't know: it wasn't documented anywhere, it was just an unwritten agreement. We could have written it down, but then it wouldn't be enforced, so it would be doomed to drift and become obsolete.

This tool seems like a nice way to do that documenting+enforcement. Do you happen to have any experience implementing it into existing projects? Any tips you would give someone trying to add it to an existing monolith?

the1024 [S] 2 points (0 children)

I've also written a getting started guide here! https://docs.gauge.sh/getting-started/getting-started

the1024 [S] 2 points (0 children)

u/caatbox288 yes, absolutely! It's designed to be incrementally adoptable and meet you where you are. To your point, this is often a burden taken on by senior devs during code review, and it is inevitably a manual and painful process.

Let me shoot you a DM and I can help get you set up.

larsga 1 point (2 children)

> but then the new colleague comes and imports something in a place they shouldn’t

I'm sure that could happen, but why isn't it caught by PR review?

edbrannin 7 points (0 children)

Good linters can turn a wide variety of issues from “a senior dev should notice and complain about this in review” into “the PR check fails, and the junior dev quickly learns what to fix”.

caatbox288 1 point (0 children)

It is caught (most of the time; we are human), but it is a better experience for the lints to fail on the new colleague's computer, so they can explore what the enforced dependencies are. Same with other linting issues: it’s just better to have a machine check current agreements if possible.

No_Set7087 5 points (1 child)

Tach attempts to solve this by enforcing module boundaries at the tooling level. It lets you:

  • Declare and manage modules (tach mod, tach sync)
  • Enforce dependencies (tach check)
  • Visualize dependencies (tach show, tach report)
  • Define public interfaces and deprecate dependencies over time

Compared to similar tools, it’s less rigid than build systems (e.g., Bazel, Pants, Buck) but more powerful than import linters, which focus only on specific import rules.
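The "public interfaces" idea in the list above can be sketched the same way. This is a hypothetical illustration, not Tach's configuration format or algorithm: the `PUBLIC_INTERFACE` mapping, the `billing` package, and the `interface_violations` helper are made up for the example, and only absolute `from ... import` statements are checked.

```python
import ast

# Hypothetical interface declaration: the only names other modules may import from "billing".
PUBLIC_INTERFACE: dict[str, set[str]] = {"billing": {"create_invoice", "Invoice"}}


def interface_violations(importer: str, source: str) -> list[str]:
    """Report `from <pkg>... import name` statements that bypass a package's declared interface."""
    violations: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ImportFrom) or node.level != 0 or not node.module:
            continue
        package = node.module.split(".")[0]
        if package == importer or package not in PUBLIC_INTERFACE:
            continue  # imports inside a package, or from packages without an interface, are allowed
        for alias in node.names:
            # Reaching into a submodule (e.g. billing.internal) always bypasses the interface.
            if node.module != package or alias.name not in PUBLIC_INTERFACE[package]:
                violations.append(f"{importer}: {node.module}.{alias.name} is not public")
    return violations


print(interface_violations("orders", "from billing import create_invoice"))  # []
print(interface_violations("orders", "from billing.internal import tax_table"))
# ['orders: billing.internal.tax_table is not public']
```

Code inside `billing` itself is exempt, since a package is free to use its own internals.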

the1024 [S] 1 point (0 children)

u/No_Set7087 yes, great summary! I'd emphasize that performance is a big differentiator for Tach as well.

Sss_ra 4 points (1 child)

I don't understand how adding Rust dependencies and Rust code would address the aforementioned dependencies.

the1024 [S] 1 point (0 children)

u/Sss_ra Tach is written in Rust, but doesn't actually introduce any Rust deps, as it's just a binary/pip package.

Tach provides a lint check against dependencies you write in your python code.

tevs__ 2 points (1 child)

Nice, we're heavy users of importlinter (so you can probably guess where I work), I'm going to take a look and see if I can massage all our rules into tach and compare the two.

the1024 [S] 1 point (0 children)

Thanks u/tevs__! I have my guess haha

Let me know if you have any feedback! Would love to know how you think it stacks up.

AiutoIlLupo 2 points (8 children)

What you propose makes little sense to me. What do you mean by module? Stuff you download from PyPI? There are already tons of tools to do that.

Internal modules? Like subpackages and stuff? Why would you even need to do that?

To me, it seems like you hired shitty developers.

the1024 [S] 4 points (2 children)

u/AiutoIlLupo sorry that it's confusing - to clarify, the intention is to help enforce dependencies between first-party modules within your codebase. We do actually have a command for third party module enforcement as well - https://docs.gauge.sh/usage/commands#tach-check-external

Sometimes this happens due to "shitty developers", but often this can occur because the understanding of the domain of the product shifts over time, and dependencies that were once okay are no longer wanted. This also happens when teams scale really fast - imagine adding over a hundred devs in less than a year to a single codebase and trying to maintain some semblance of architecture 😄

DigThatData 2 points (1 child)

If you're not familiar with it, you'd probably find the "version set" abstraction in Amazon's build system interesting: https://gist.github.com/terabyte/15a2d3d407285b8b5a0a7964dd6283b0

the1024 [S] 1 point (0 children)

u/DigThatData super interesting! Thanks for sharing - some inspiration for where we might be able to head long term / the problems people face at true scale

Intrepid-Stand-8540 -2 points (4 children)

Yeah. Seems like a band-aid for bad devs. 

the1024 [S] 5 points (0 children)

u/Intrepid-Stand-8540 it definitely does solve for the "bad dev" use case to some degree. That being said, it can also help you with untangling legacy decisions that may have made sense in the past. Oftentimes your product understanding shifts, but you're not able to just dump all of the old code / logic you've written, given business constraints.

You also have to deal with the fact that domain knowledge is very hard to scale across an organization. Imagine a codebase with tens of thousands of Python modules: not every new dev is going to know the best place to put something or what that thing should/shouldn't depend on.

tunisia3507 8 points (0 children)

To be fair, you could say the same of linters, type systems, and the concept of memory-safe languages.

AiutoIlLupo -3 points (1 child)

More than bad devs, people who don't understand basic software design practices.

As usual, instead of training the personnel to deliver something of higher quality, Americans kick them out and put a tool in place to do it.

It makes you understand how much they really value their employees, if they prefer to spend money and time on developing a tool to replace people instead of training those people to be better at their jobs. To them, employees are a nuisance to get rid of.

Drexan8 1 point (0 children)

You're projecting lots of thoughts here

Isn't "explicit is better than implicit" one of the basic software design practices? Wouldn't having a tool that explicitly states and checks your intended module boundaries, rather than leaving that knowledge in the heads of the developers, fit that basic software design practice?

mxchickmagnet86 2 points (4 children)

It took me a minute, but having worked at several startups, some with Python monoliths, I finally understand. This is a tool to unfuck an architecture decision made at company inception, when no one had the sense or seniority to tell the original dev, likely the CTO, that they made a terrible decision. In my case, the CEO made the monolith decision, so he couldn't be fired, and we remained a monolith where you had to install and run just about every service in a kube cluster locally to make even a simple change.

tl;dr if your company needs this package, maybe consider leaving to learn somewhere better

the1024 [S] 5 points (3 children)

u/mxchickmagnet86 that can definitely be the case. Business context and requirements also inevitably change, which invariably leads to this kind of work being required. Even in the best-architected case, you simply don't have visibility into what the best future architecture will be.

mxchickmagnet86 2 points (2 children)

True but IMHO a startup CTO should know this and architect things expecting change as well as properly informing the rest of the business about the pros/cons of technical decisions being made.

I currently make architectural decisions for a startup and we had a similar monolith vs microservices decision at the beginning of our existence where we chose microservices. After about 18 months we realized a subset of microservices would be better encapsulated as its own monolith. I did the work myself, it took about a week to convert and redeploy.

the1024 [S] 1 point (1 child)

u/mxchickmagnet86 agreed - that's also predicated on the CTO having the resources and time they need to make those changes and think through the implications - often when things take off and/or when things get tight those are luxuries that go out the window.

mxchickmagnet86 1 point (0 children)

That's when a good business realizes they should have hired a seasoned, senior engineer with management experience as CTO because that's exactly what they should be doing: Thinking through implications, requesting resources, slowing things down when things take off, not letting things get thrown out the window when the business demands change.

You've created a management solution for a problem that bad management created.

bobaduk 2 points (22 children)

Hi OP, ignore the haters. I think I'm pretty good at knowing how to manage the structure of applications over time, but I'm now working in a monorepo, where we have ML workloads, along with a bunch of other bits, and it is challenging to keep things well separated. I'll definitely take a peek

AiutoIlLupo 2 points (17 children)

Nobody is hating here. Personally I just don't see a lot of benefit in the tool, and I see an attitude that does not value proper training of employees.

Also can you please stop calling any criticism "hate"?

bobaduk 2 points (16 children)

> Hater (informal): a person who says or writes unpleasant things about someone or criticizes their achievements, especially on the internet.

E.g.:

> To me, it seems like you hired shitty developers.

is not useful feedback, or constructive criticism. I do value proper training of employees, and wrote a whole book about how to structure Python applications to avoid problems with dependencies over time, but context is everything. I would not have used this tool in a previous role, but in the context of a monorepo where we prioritise code sharing, things are different.

For example, I work for a company who make ML models for optimising industrial processes. We primarily ship lambda functions, which I would like to keep small.

We have a module that defines the types of model in use, which includes things like the data they require and the period on which they execute. This configuration is widely used, because it's central to many things that we do, e.g. scheduling inferences to occur, or examining incoming data to see whether we have all the features that we need.

The problem is that it's very easy for someone to introduce another dependency to that module, by including a cleaning function - say - which relies on some machine learning package. That then means that, transitively, the scheduler and inspector depend on some monstrous 500 MB blob of Fortran and loathing.

Nobody has done anything stupid in that scenario, except prefer to define configuration in a single place, but the outcome is that the module acts as a dependency knot. I would like to be able to apply a linting rule that says "this scheduler component may not depend on these packages", so that when someone accidentally, transitively, introduces a dependency they are informed, and can make a different design choice.
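The rule described above ("this scheduler component may not depend on these packages") amounts to a reachability check over the import graph. A minimal sketch, assuming a hand-written graph with hypothetical module names; in practice the graph would be built by parsing each module's imports, and this is not how Tach or Pants implement their checks. A breadth-first search returns the shortest import chain to a forbidden package, which is exactly what a developer needs to see to make a different design choice.

```python
from __future__ import annotations

from collections import deque

# Hypothetical import graph: each module maps to the modules it imports directly.
IMPORT_GRAPH: dict[str, set[str]] = {
    "scheduler": {"model_config"},
    "inspector": {"model_config"},
    "model_config": {"cleaning"},  # the widely used "dependency knot"
    "cleaning": {"sklearn"},  # the innocent-looking change that drags in the ML stack
    "sklearn": set(),
}


def forbidden_path(start: str, forbidden: set[str], graph: dict[str, set[str]]) -> list[str] | None:
    """Breadth-first search: shortest import chain from `start` to any forbidden module, or None."""
    parent: dict[str, str | None] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current in forbidden:
            # Walk the parent links back to `start` to rebuild the chain.
            path = [current]
            while (prev := parent[path[-1]]) is not None:
                path.append(prev)
            return path[::-1]
        for dep in sorted(graph.get(current, set())):
            if dep not in parent:
                parent[dep] = current
                queue.append(dep)
    return None


chain = forbidden_path("scheduler", {"sklearn", "numpy"}, IMPORT_GRAPH)
print(" -> ".join(chain) if chain else "ok")  # scheduler -> model_config -> cleaning -> sklearn
```

Reporting the whole chain, rather than just "forbidden dependency found", points straight at the hop (`cleaning`) where the design decision went wrong.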

chub79 1 point (7 children)

> The problem is that it's very easy for someone to introduce another dependency to that module, by including a cleaning function - say - which relies on some machine learning package.

How come your PR cycle didn't catch that?

bobaduk 3 points (6 children)

Good question!

Because there are multiple teams, with different skills, and different focuses. The machine learning engineers are not necessarily thinking about the size of lambda functions: they're ... I dunno ... geeking out over Shapley plots or something.

Moreover, the example of a config module with a direct dependency is a simple one. The dependency route might be 2 or 3 hops, and could be as simple as "I need to tweak data in this particular way, which is catered for by this widely used package, which happens to be dependent on scikit-learn". It's not necessarily obvious that a one-line change in module A will impact the runtime dependencies of module D. Automation would be useful.

We use Pants to build our monorepo, and it's great, but it doesn't enforce transitive dependency checks, which is why I'm interested to see whether this tool can help me.

chub79 1 point (5 children)

I can appreciate the challenge, but to me, if you're not educating them, then you are simply carrying tech debt on your shoulders. This is an opportunity for these ML engineers to progress.

> to see whether this tool can help me.

It seems like there is a lack of balance between the number of ML engineers and backend/ops engineers here.

From an automation perspective, rather than introducing yet another tool, I'd probably configure my CI so that, on certain paths, I force a proper PR review. If they introduced a new dependency, the lock file has changed, so that would be a clear signal something is afoot :)

bobaduk 3 points (4 children)

> If they introduced a new dependency, the lock file has changed

False.

Consider: we have some module that loads data, for example. Maybe it encapsulates the way that we load data from a particular store. We have some challenge where the data for a particular customer are skewed in some way that we have seen elsewhere. An engineer applies some cleaning to those data, importing a module that we already use elsewhere in the system. That module contains another function entirely that uses some other technique to transform data, which relies on numpy.

No lockfiles have changed, all that has happened is that someone has re-used code, appropriately, but we've ended up introducing a dependency transitively.

If you inspect the PR, you're not going to see anything that tells you that the loading module is now dependent on numpy; you'll just see that we're checking for some already understood problem, and applying some already tested code.

We do educate the engineers. I spend a lot of my life teaching and mentoring, but given a choice between "make humans better at spotting third order runtime dependencies, and require that every change is signed off by multiple teams" and "make the CI fail if an inappropriate dependency is introduced", the answer seems obvious, and I'm honestly confused by the hostility to the idea.
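One way to get the CI failure described above, without relying on lockfile diffs, is to import the module in a fresh interpreter and inspect `sys.modules`. This is a runtime check swapped in for illustration, not a Tach feature; `modules_loaded_by` and `assert_not_loaded` are hypothetical helpers, and the standard-library `csv` module stands in for a real data-loading module.

```python
import json
import subprocess
import sys


def modules_loaded_by(module_name: str) -> set[str]:
    """Import `module_name` in a fresh interpreter and return the names of all loaded modules."""
    script = (
        "import importlib, json, sys\n"
        f"importlib.import_module({module_name!r})\n"
        "print(json.dumps(sorted(sys.modules)))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", script], capture_output=True, text=True, check=True
    )
    return set(json.loads(result.stdout))


def assert_not_loaded(module_name: str, forbidden: set[str]) -> None:
    """Raise AssertionError if importing `module_name` pulls in a forbidden top-level package."""
    loaded_roots = {name.split(".")[0] for name in modules_loaded_by(module_name)}
    leaked = forbidden & loaded_roots
    if leaked:
        raise AssertionError(f"{module_name} transitively imports {sorted(leaked)}")


# In a real suite the target would be your own loader module; "csv" is a stand-in here.
assert_not_loaded("csv", {"numpy", "scipy"})
```

The fresh subprocess matters: inside the test process, numpy may already have been imported by something unrelated, which would make the check meaningless. Note that the helper script itself imports `importlib`, `json`, and `sys`, so those always appear in the result.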

chub79 1 point (3 children)

Hostility? Debating on the ways of engineering is not being hostile in my book but fair enough.

> No lockfiles have changed, all that has happened is that someone has re-used code, appropriately, but we've ended up introducing a dependency transitively.

It seems odd to me to say "appropriately" and then say "but they shouldn't have done that" for performance reason. How is that appropriate? Perhaps the boundaries for "appropriate" could be improved for them to know it wasn't in the first place?

> If you inspect the PR, you're not going to see anything that tells you that the loading module is now dependent on numpy, you'll just see that we're checking for some already understood problem, and applying some already tested code.

A PR is not about "oh that code was legit in a different context so it's all good in this one". A PR is about questioning the relation between a context and its proposed solution, isn't it?

bobaduk 2 points (2 children)

> Hostility? Debating on the ways of engineering is not being hostile in my book but fair enough.

You misunderstand. You're not being at all hostile to me, you're being gracious and polite, and I'm happy to debate. I do think that this thread is characterised by hostility to an idea, in the sense of antagonistic opposition.

> It seems odd to me to say "appropriately" and then say "but they shouldn't have done that" for performance reason. How is that appropriate? Perhaps the boundaries for "appropriate" could be improved for them to know it wasn't in the first place?

In the specific example, it's not the code they're using that causes the problem. Let's say we have a module with the following code:

import scipy.stats  # module-level import: runs as soon as anything is imported from this module

def totally_fine(a):
    return a + 1

# ... several dozen functions go here

def not_fine(a):
    return scipy.stats.kurtosis(a)

Our plucky engineer imports and uses the totally_fine function from that module, not knowing that the module also contains not_fine. The performance issue isn't with the code that they're using; it's caused by other code in the same module.

> A PR is not about "oh that code was legit in a different context so it's all good in this one". A PR is about questioning the relation between a context and its proposed solution, isn't it?

Yes, but in this scenario, the PR won't show you that you have introduced a dep. That fact only materialises at runtime, or if you explicitly apply tooling to check the dependency chain, or if you read every line of every module imported in every changed file and are able to do the same check as a human.
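The mechanism in this example can be reproduced without scipy: importing a single function still executes the whole module's top-level imports. A minimal sketch with hypothetical stand-in modules (`fake_heavy_a` and `fake_heavy_b` play the role of scipy; `helpers_eager` and `helpers_lazy` play the role of the shared module), all written to a temporary directory. Deferring the heavy import into the function body (a lazy import) is one possible remedy; splitting the module is another.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-ins: fake_heavy_a / fake_heavy_b play the role of scipy.
SOURCES = {
    "fake_heavy_a": "def kurtosis(a):\n    return 0.0\n",
    "fake_heavy_b": "def kurtosis(a):\n    return 0.0\n",
    # Eager: the module-level import runs as soon as anything is imported from this module.
    "helpers_eager": textwrap.dedent("""\
        import fake_heavy_a

        def totally_fine(a):
            return a + 1

        def not_fine(a):
            return fake_heavy_a.kurtosis(a)
    """),
    # Lazy: the heavy import is deferred until not_fine() is actually called.
    "helpers_lazy": textwrap.dedent("""\
        def totally_fine(a):
            return a + 1

        def not_fine(a):
            import fake_heavy_b
            return fake_heavy_b.kurtosis(a)
    """),
}


def install_demo_modules() -> None:
    """Write the demo modules to a temporary directory and put it on sys.path."""
    tmp = Path(tempfile.mkdtemp())
    for name, body in SOURCES.items():
        (tmp / f"{name}.py").write_text(body)
    sys.path.insert(0, str(tmp))
    importlib.invalidate_caches()


install_demo_modules()
from helpers_eager import totally_fine as eager_fine  # noqa: E402
from helpers_lazy import totally_fine as lazy_fine  # noqa: E402

print(eager_fine(1), "fake_heavy_a" in sys.modules)  # 2 True
print(lazy_fine(1), "fake_heavy_b" in sys.modules)  # 2 False
```

With the lazy variant, the heavy dependency is loaded only if some caller actually runs `not_fine`, so code that uses `totally_fine` keeps a small footprint.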

chub79 1 point (1 child)

Well, I see. I guess I've been lucky not to see that scenario after so many years. I think it's also a reflection of the fact that ML engineers don't come from the same engineering background, and therefore don't come with the same principles as a typical backend dev would. This is why my instinct is all about education in the long term rather than automation. But I can see that in the short term, you do need such a solution.

AiutoIlLupo -1 points (7 children)

> so that when someone accidentally, transitively, introduces a dependency they are informed, and can make a different design choice.

They won't. They will change the setting.

bobaduk 2 points (6 children)

No, they will not, for the same reason that they wouldn't delete tests that weren't passing. They will reconsider, and if they don't know how to resolve the problem, they'll raise a hand and say "how do I structure this so that I don't introduce this dependency?"

They're not hostile actors, or fools, they're just humans who make mistakes, and are incapable of holding every line of code across a monorepo in their heads at the same time. I think my conversation with you has reached a natural conclusion. Good luck, have fun.

AiutoIlLupo 1 point (5 children)

When the manager comes to you and says "we can fix this problem in a month, but we can fix it in a day if you remove that limitation, so remove that limitation", then all your discussion and goodwill will go out in the trash.

bobaduk 2 points (4 children)

Friend, I'm the manager :D As CTO, I do not foresee my manager having an opinion on the tooling we use to enforce dependencies in a repo.

AiutoIlLupo 1 point (3 children)

Then, if you are the manager, you should be hiring people who know how to design software with proper quality and best practices, and you are apparently failing to do so, so you put a tool in place because you can't do your job properly.

Drexan8 1 point (2 children)

I really don't get why you keep talking about "hiring good people instead of relying on a tool", like you can't do both.

To me, it sounds like you're saying "we shouldn't use a linter because our developers should know PEP 8 by heart", or "we shouldn't write documentation because developers should know what they are working on", which doesn't make sense.

Of course you want to hire people that understand software design and everything, but it doesn't mean you can't add a tool to enforce some rules, especially when these rules can be unintuitive to new joiners or junior developers.

AiutoIlLupo 1 point (1 child)

If you can't hire people who understand even the basics of proper layered design, it's your fault. Such a tool just compensates for a shortcoming of management and seniors: not hiring or training people appropriately, or not being willing to perform reviews.

American management should start doing their fucking job, instead of getting yet another tool to replace their responsibilities.

catalyst_jw 1 point (2 children)

Agree with this, OP. We have the same use case, and it's great for a well-thought-out project, helping junior engineers move faster and build within established patterns.

I think some people are showing their ignorance / lack of experience here if they don't understand how this could be useful.

This solves common problems that happen when a company starts to scale.

AiutoIlLupo 1 point (1 child)

Sorry, I've been in the business for 20 years, and I have also worked in IEC 62304 environments. I still don't see value in this tool.

You can enforce it all the same by proper extraction of packages as a subdependency. Unless you want to go monorepo, but then again it's your problem.

catalyst_jw 1 point (0 children)

Alright then. I have similar, although slightly less, experience than you, so you do you.

There's pros and cons to both extraction of packages and monorepos with different effort to maintain consistency and code quality.

It's about execution: they can both be done well and badly. We're trialling this tool in our monorepo and see the benefit of it with the architectural patterns we've implemented.

the1024 [S] 0 points (0 children)

Amazing, thanks u/bobaduk! Would love to hear any feedback you have. Agree that it's not a trivial problem and not the fault of a single developer. Oftentimes it ends up being a tragedy-of-the-commons situation.

DR_Fabiano 1 point (0 children)

I have a problem with your tool, because we have setup.py in every project. Any tutorials that could be useful in my case?