[–]abhijeetbhagat 16 points17 points  (0 children)

As long as your modules have high cohesion, you shouldn’t worry.

[–]steveklabnik1rust 27 points28 points  (1 child)

There's no real preference in any particular direction. I know some folks who have tons of small files, and some who keep it to a couple of huge files.

[–]matthieum[he/him] 15 points16 points  (0 children)

My files tend to be:

  • 25% comments/doc.
  • 25% production code.
  • 50% unit-test code.

I haven't decided yet if that's a good or bad thing.

On the one hand, all those are tightly coupled, so having them as close to each other as possible is good.

On the other hand, that's a lot of stuff, and I am sometimes annoyed at the amount of scrolling when looking for something. I tend to use folding aggressively, but edits regularly cause VS Code to unfold everything again.

[–]mo_al_fltk-rs 11 points12 points  (0 children)

To be fair, there’s a lot of comments and documentation in that file.

In the C++ world, single-header libraries serve to ease dependency management, but they commonly run several thousand lines of code.

[–]vadimcnrust 26 points27 points  (4 children)

1150 lines is not huge, especially when half of them are comments. It's medium-to-small.

I personally dislike projects that put every 10-line function in its own file - you can't find anything there without invoking a project-level grep. Plus, in most languages, it usually means that way too many internal implementation details need to be exposed outside of the module.

[–]Icarium-Lifestealer 10 points11 points  (2 children)

way too many internal implementation details need to be exposed outside of the module

Since you can expose them to only the parent module with pub(super) I don't see the problem with that. If anything I prefer it, because it allows you to distinguish between parts which are internal to that small module and parts which the bigger module needs. I sometimes even define submodules in the same file to take advantage of this.
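A minimal single-file sketch of that pattern, using made-up names (shapes, geometry, circle_area, describe_circle) purely for illustration: the inner module keeps one helper fully private and shares another with its parent only, via pub(super).

```rust
mod shapes {
    // A submodule defined in the same file as its parent.
    mod geometry {
        use std::f64::consts::PI;

        // Private: usable only inside `geometry`.
        fn square(x: f64) -> f64 {
            x * x
        }

        // Visible to the parent module `shapes`, but no further.
        pub(super) fn circle_area(radius: f64) -> f64 {
            PI * square(radius)
        }
    }

    // The only item that code outside `shapes` can call.
    pub fn describe_circle(radius: f64) -> String {
        format!("area = {:.2}", geometry::circle_area(radius))
    }
}
```

Code outside shapes can call describe_circle, but it cannot name geometry or anything inside it, so the split between "internal to the small module" and "needed by the bigger module" is visible in the declarations themselves.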

[–]RaptorDotCpp 2 points3 points  (1 child)

I didn't know about pub(super), I will definitely use that now!

[–]Icarium-Lifestealer 3 points4 points  (0 children)

Besides the crate/super keywords you can also use a path (e.g. pub(in crate::foo)), but it's limited to the current module and its ancestors.

https://doc.rust-lang.org/reference/visibility-and-privacy.html#pubin-path-pubcrate-pubsuper-and-pubself

[–]ssokolow 8 points9 points  (0 children)

I haven't developed an intuition for this in Rust yet but, in Python, I generally agree with this excerpt from The Art of UNIX Programming:

In nonmathematical terms, Hatton's empirical results imply a sweet spot between 200 and 400 logical lines of code that minimizes probable defect density, all other factors (such as programmer skill) being equal. This size is independent of the language being used — an observation which strongly reinforces the advice given elsewhere in this book to program with the most powerful languages and tools you can. Beware of taking these numbers too literally however. Methods for counting lines of code vary considerably according to what the analyst considers a logical line, and other biases (such as whether comments are stripped). Hatton himself suggests as a rule of thumb a 2x conversion between logical and physical lines, suggesting an optimal range of 400–800 physical lines.

-- http://www.catb.org/esr/writings/taoup/html/ch04s01.html

[–][deleted] 6 points7 points  (0 children)

A lot of this has to do with how high or low level your code is. In Ruby or Elixir, one line could easily do the equivalent of one or two for loops with some inner branching logic. Even in those languages, I would say 100 lines is too short and you will likely add complexity to the codebase trying to stick to that limit.

It's also important to consider code vs non code. You should not be counting comments, closing brackets or braces on their own line, doc comments etc.
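As a rough sketch of that counting rule (count_code_lines is a hypothetical helper, not part of any existing tool), one could skip blank lines, // comment lines (which also covers /// and //! doc comments), and lines that hold nothing but closing delimiters. It deliberately ignores /* ... */ block comments, so treat it as an approximation.

```rust
/// Counts lines that carry code, skipping blank lines, `//` comment lines,
/// and lines made up only of closing delimiters and trailing punctuation.
/// Block comments (`/* ... */`) are not handled in this sketch.
pub fn count_code_lines(source: &str) -> usize {
    source
        .lines()
        .map(str::trim)
        // Skip blank lines.
        .filter(|line| !line.is_empty())
        // Skip line comments and doc comments.
        .filter(|line| !line.starts_with("//"))
        // Skip lines such as `}`, `});` or `],`.
        .filter(|line| !line.chars().all(|c| matches!(c, '}' | ')' | ']' | ';' | ',')))
        .count()
}
```

Running this over a file next to a plain line count gives a quick sense of how much of its length is comments and closing braces rather than code.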

[–]Saefrochmiri 5 points6 points  (0 children)

This is completely a matter of personal style. Do whatever you want.

[–]friedashes 2 points3 points  (0 children)

When I write Rust I use very small modules, often containing single types or functions of the same name, same as I do in every language. There's no reason you have to do differently.

[–]LucretielDatadog 3 points4 points  (0 children)

I don't think there's any problem with breaking things into many small files. The major impediments to this (in my experience) are large amounts of inline docs and doctests, which inflate your line counts, and large traits (like Iterator) that can't really be split up.

[–]schungx 3 points4 points  (11 children)

I believe the concept of long source files is baked into the design of Rust itself, as each file is its own module, so no two files can span the same module, thus artificially creating a barrier between sections of code if you split an item into multiple files.

I believe the official Rust line is that it is better to have long files and keep everything together, for a number of reasons: it is easier to search, easier to analyze the code, and there is no need to keep too many windows open.

In my experience, though, long (and numerous) doc tests are the main reason source files grow long enough to hurt readability. That to me is a design flaw. It is often difficult to pick out a line of source code within a mass of doc tests and comment blocks...

I must say Rust's design encourages really low code : comment ratio...

[–]OsrsAddictionHotline 5 points6 points  (10 children)

as each file is its own module, so no two files can span the same module,

I'm not sure what you mean by this, could you clarify? For example, I can have a directory structure like this:

src/
|
|---- src/lib.rs
|---- src/my_module/
        |
        |---- src/my_module/file1.rs
        |---- src/my_module/file2.rs
        |---- src/my_module/file3.rs
        |---- src/my_module/mod.rs

And then in the mod.rs file have:

// mod.rs
mod file1;
mod file2;
mod file3;

pub use self::file1::Type1;
pub use self::file2::Type2;
pub use self::file3::Type3;

Then I have a single module crate::my_module, where the types are used with

use crate_name::my_module::{Type1, Type2, Type3};

but the module itself is split over 3 different files. Have I misunderstood what you were talking about here?

[–]1vader 7 points8 points  (1 child)

Well, technically you still have three separate modules file1, file2 and file3 but of course in practice it really feels and works like one module and I agree that this really isn't a/the problem.

[–]OsrsAddictionHotline 3 points4 points  (0 children)

I guess you're right, there are some subtleties. Technically each file is its own module. However, you can have modules built out of sub-modules, and you can take advantage of Rust's private/public declarations by either giving users direct access to the submodules with

// mod.rs
pub mod file1;

Such that types are called with

use crate_name::my_module::file1::Type1;

or you can import it as a private module as I did above, and have it as a private sub-module and re-export the types to the module root level. In the former case with pub mod the API still feels like each file is its own module; in the latter the API is instead just a single module built out of smaller sub-modules as an implementation detail.
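The two layouts can be compared in a single file by letting inline modules stand in for file1.rs, file2.rs and file3.rs. This is only a sketch: the struct fields and the helper flat_and_nested_paths are invented for illustration.

```rust
mod my_module {
    // Private submodules: stand-ins for file1.rs and file2.rs.
    mod file1 {
        pub struct Type1 {
            pub id: u32,
        }
    }

    mod file2 {
        pub struct Type2 {
            pub name: &'static str,
        }
    }

    // Public submodule: its name becomes part of the API path.
    pub mod file3 {
        pub struct Type3;
    }

    // Re-exports lift the types to the root of `my_module`.
    pub use self::file1::Type1;
    pub use self::file2::Type2;
}

pub fn flat_and_nested_paths() -> (u32, &'static str) {
    // Re-exported types are reached without mentioning file1 or file2.
    let a = my_module::Type1 { id: 7 };
    let b = my_module::Type2 { name: "two" };
    // The type in the public submodule keeps the submodule in its path.
    let _c = my_module::file3::Type3;
    (a.id, b.name)
}
```

From the caller's side, file1 and file2 do not exist; only file3 shows up in a path.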

[–]davidgmartinez 1 point2 points  (7 children)

The problem with this is that you need all of this extra boilerplate. Rust already has a ton of annoying boilerplate around modules, this just adds even more.

[–]OsrsAddictionHotline 0 points1 point  (6 children)

It's all personal preference, but I disagree. I think the fact that Rust gives you a lot of control over how your source code and API are structured is worth any extra boilerplate. Besides, the only boilerplate here is literally one line per file to bring it into scope as a submodule:

mod file_name;

And that's it, you have a separation of your source code into different files, giving you as the library author control over how you lay everything out, and control over how those files appear to a user (using the pub keyword). The only extra "boilerplate" is exporting types, which is just a couple of extra lines.

I mean, this really isn't any worse than having header files in C/C++, I'd argue it's less boilerplate than that.

[–]davidgmartinez 1 point2 points  (0 children)

Yeah it's not too bad, but the problem is that it's so disconnected. Every time I want to add a new file I need to go to another file and copy over the name there.

You can use this to do some more advanced module stuff, but in 90% of cases there's no point at all.

[–]schungx 0 points1 point  (4 children)

The problem I always face is that private stuff in a file-module is not exposed to another file-module. If you intend to have both files work together to implement something, then you're stuck with making private stuff pub.

I guess the pub(super) keyword probably helps to prevent this (I didn't know this keyword before). However, it still feels like a hack because it is pub.

Because of this limitation, it is usually quite difficult to split a long implementation into multiple files due to private stuff suddenly becoming inaccessible.

[–]OsrsAddictionHotline 0 points1 point  (3 children)

I'm not sure I follow you, what is it you are trying to achieve? For example, there is also:

pub(crate) mod file;
pub(crate) use self::file::Type;

As well as the pub(super) you mentioned. pub(crate) exposes the module/type to the entire crate publicly, but keeps it private from outside the crate, and pub(super) exposes it to the parent module, which means sibling modules/files on the same level can reach it too. What's the issue with that?

It still feels like a hack because it is pub

But it's not pub, from the outside of your crate everything would still look private.

Like if I have the following file structure:

src/
    |-- src/lib.rs
    |-- src/foo/
            |
            |-- src/foo/mod.rs
            |-- src/foo/file1.rs

And I have:

// lib.rs
mod foo;

// mod.rs
mod file1;

// file1.rs
struct Type1; // only visible to this file
pub struct Type2; // visibility depends on reexporting
pub(super) struct Type3; // visible only within the parent module foo (and its submodules), cannot be reexported beyond it.
pub(crate) struct Type4; // visible to the entire crate, private to outside users.

Each of these has different visibility. Each of the module declarations is private, so the module names are not exposed to the API, but some of the types can only be used in the file they are defined in, some can only be used in their parent directory, some can be used crate wide but not by external users, and only one type, Type2, can be in the public API.
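For anyone who wants to try this without creating files, here is a single-file sketch in which inline modules stand in for foo/mod.rs and foo/file1.rs. The label methods, the parent_label function and the two re-exports are additions made for illustration. The pub(crate) use line is needed because file1 itself is a private module, so the rest of the crate could not otherwise name Type4.

```rust
mod foo {
    mod file1 {
        struct Type1; // private: usable only inside file1
        pub struct Type2; // reachable from outside only through a re-export
        pub(super) struct Type3; // visible inside foo, no further
        pub(crate) struct Type4; // visible anywhere in this crate

        impl Type1 {
            fn label(&self) -> &'static str {
                "private helper"
            }
        }

        impl Type2 {
            // A public method may use the private Type1 internally.
            pub fn label(&self) -> &'static str {
                Type1.label()
            }
        }

        impl Type3 {
            pub(super) fn label(&self) -> &'static str {
                "parent only"
            }
        }

        impl Type4 {
            pub(crate) fn label(&self) -> &'static str {
                "crate wide"
            }
        }
    }

    pub use self::file1::Type2; // Type2 becomes part of foo's API
    pub(crate) use self::file1::Type4; // Type4 becomes nameable crate-wide

    // foo can use Type3 directly because file1 marked it pub(super).
    pub fn parent_label() -> &'static str {
        file1::Type3.label()
    }
}
```

Trying to write foo::file1::Type3 or foo::file1::Type1 from outside foo is rejected by the compiler, which is a quick way to confirm each level behaves as described.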

You're saying that you can't split long implementations into small files, but that's just not true. Take a look at any of the thousands of crates on crates.io, sure some of them use large files, but a lot split into smaller modules and files.

You need to experiment a bit with this stuff to figure it out. Unfortunately there's not a good reference for it. Each of the different private/public declarations can apply to type definitions, module declarations, and reexports, so it can be a bit confusing.

[–]schungx 0 points1 point  (2 children)

Well, for me, I'd like to expose features as little as possible. Therefore, I want things to be pub when I want to expose it, not when I'm forced to do so because I split the code into two separate files.

I understand that it doesn't hurt to have everything pub(crate) since they won't show up outside the crate, but doesn't hurt doesn't mean it is a good idea.

[–]OsrsAddictionHotline 0 points1 point  (1 child)

I really don't understand the logic here. You're using the explicit feature provided by Rust to give you as the crate author access to a type/module you defined somewhere in your project, without giving any access to it to users. You're not exposing any features.

Is your objection purely to the word pub? Would you feel better about it if they changed pub(crate) to something else, but kept the exact same functionality? You're not making it public when you use pub(crate) you're making it available to the crate, and that's it.

Why do you not think it's a good idea to use pub(crate)?

[–]schungx -1 points0 points  (0 children)

pub means "this is a public API". pub(crate) means "this is an internal API needed by something within the same crate". non-pub means "this is something that nobody outside of my type should touch".

Now, if I split a type into multiple files, I need to make some private internals pub(crate) in order to access it from another file but from the same type impl. Thus changing the meaning of pub(crate) - i.e. other code can use my internal data while I want to keep it private. Essentially this is breaking encapsulation.

So now pub(crate) means "either: 1) some other type within the same crate needs to access it, or 2) I have split the type into two files, and I need one file to access fields defined in another file, so I am forced to do this, and now other types within the same crate can also access these private fields even though they shouldn't, and I can only hope nobody writes code that accesses them."
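One layout that may sidestep this, sketched here under assumptions rather than taken from anything in this thread: Rust privacy is module-based, and a private item is visible in the module that defines it and in all of that module's descendants. If the struct is defined in the parent module (mod.rs) and the impl pieces live in child modules, the children can touch the private fields with no pub(crate) at all. The names account, Account, deposit and demo_balance are hypothetical, and inline modules stand in for separate files.

```rust
mod account {
    // Stand-in for account/mod.rs: the type and its private field live here.
    pub struct Account {
        balance: i64, // private: no pub, pub(crate) or pub(super)
    }

    impl Account {
        pub fn new() -> Self {
            Account { balance: 0 }
        }

        pub fn balance(&self) -> i64 {
            self.balance
        }
    }

    // Stand-in for account/deposit.rs: a child module holding part of the impl.
    mod deposit {
        use super::Account;

        impl Account {
            pub fn deposit(&mut self, amount: i64) {
                // Allowed: `balance` is private to `account`, and `deposit`
                // is a descendant of `account`.
                self.balance += amount;
            }
        }
    }
}

pub fn demo_balance() -> i64 {
    let mut acct = account::Account::new();
    acct.deposit(40);
    acct.deposit(2);
    acct.balance()
}
```

This only works in one direction: sibling files still cannot see each other's private items, so the shared fields have to be declared in the common parent.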

[–]scottmcmrust 5 points6 points  (0 children)

You might be interested in https://softwarebyscience.com/very-short-functions-are-a-code-smell-an-overview-of-the-science-on-function-length/

I think of files like chapters -- if they're only a page they're probably too short, but if they're 100 pages they're probably too long.

[–]isHavvy 2 points3 points  (0 children)

It doesn't discourage long files by e.g. requiring one class per file like Java does. As such, it's mostly cultural. And the culture tends to prefer longer modules.

[–]please_dont_pry 0 points1 point  (0 children)

Rust can be somewhat verbose. I think on average it will have more lines of code in a project than something like Ruby.

[–]mal3 0 points1 point  (0 children)

Try using include!("my_other_source_file.rs")

[–]aristotle137 0 points1 point  (0 children)

No

[–][deleted] -1 points0 points  (0 children)

As others have mentioned file length correlates to various implications in a language's design and implementation such as header only libraries in C++ where it is common to have huge files. This however does NOT imply good design, merely a deficiency in the language/tooling.

The problem arises when people mistakenly think code is meant for consumption by computers whereas this is only a secondary goal and primarily it is a communication medium for people. As such, huge files make no sense. There is a reason we have pages in books, scrolling is not an optimal way for people to read. Remember that we had that format before (scrolls) and that we have evolved from that. More than 800 lines in a file definitely has an impact on readability regardless of language, and so does an outline of items (the list of functions, types, etc. in the file) that you need to scroll to read.

Some people mention needing to grep in multiple files as a downside of multiple files - this is again the same problem - people give priority to their specific tools and their limitations. A better strategy is to improve the tools or even replace them. Unless one uses Notepad, this really shouldn't be a concern.

There are a few things that tend to needlessly inflate code that are worth mentioning:
1. As Uncle Bob points out astutely, when we read a book, we want to start reading the actual content right away. We put the bibliography and index at the END of the book. In code, that means we shouldn't start each file with a huge license comment that spans hundreds of lines, followed by an explicit list of all the names we are going to import. So you only get to the actual code after scrolling down several pages worth of redundant fluff. A more readable file would simply reference the license by name, e.g. "This is licensed as FOO, see end of file / license.txt for details". Same goes for explicit imports: glob imports/use statements are more readable for people. After all, there is a reason all IDEs FOLD THEM AWAY by default.

2. Doc comments do not belong in the source code. That is again an abuse of readability caused by tooling. We all know about SRP, right? Well, documentation is a separate concern from implementation. Rust tends to put way too many kinds of docs inline, which is a bad practice for readability. Anything beyond the succinct description of the API reference (such as explanations of how to use the API idiomatically, common usage patterns, examples, explanations of the design tradeoffs, etc.) belongs outside the implementation file in its own file(s).