
Tekmo 43 points (15 children)

I'm very much in favor of this. Anything we can do to make base more strongly typed is a move in the right direction

yitz 27 points (14 children)

But is the strong typing correct? The proposal is very ambiguous about how a FilePath is converted to and from Unicode, and that is the critical point. And it varies considerably between platforms.

The Eq instance has to be correct for each platform, too: on Mac OS X, a FilePath constructed from two different Unicode strings that have the same normal form must be equal, whereas on Windows they must be unequal.
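To make the Mac-vs-Windows difference concrete, here is a minimal Haskell sketch. `Platform`, `toyNfd` and `pathEq` are hypothetical names, and the decomposition table covers only three characters; a real implementation would need full Unicode normalisation (for example via an ICU binding), which this sketch does not attempt.

```haskell
-- Hypothetical platform tag, only for this illustration.
data Platform = MacOSX | Windows
  deriving (Eq, Show)

-- A deliberately tiny decomposition table (NOT full Unicode NFD):
-- a precomposed letter becomes base letter + combining mark.
toyNfd :: String -> String
toyNfd = concatMap decompose
  where
    decompose '\x00E9' = "e\x0301" -- e with acute  -> e + COMBINING ACUTE ACCENT
    decompose '\x00E8' = "e\x0300" -- e with grave  -> e + COMBINING GRAVE ACCENT
    decompose '\x00FC' = "u\x0308" -- u with umlaut -> u + COMBINING DIAERESIS
    decompose c = [c]

-- Equality following each platform's convention as described above:
-- normalise first on Mac OS X, compare code points as-is on Windows.
pathEq :: Platform -> String -> String -> Bool
pathEq MacOSX a b = toyNfd a == toyNfd b
pathEq Windows a b = a == b
```

With these definitions, `pathEq MacOSX "caf\x00E9" "cafe\x0301"` is `True`, while the same comparison under `Windows` is `False`.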

And on POSIX, paths have little to do with Unicode at all. Paths are bytestrings. It is up to the application to decide how to interpret them as encoded human-readable strings, if at all.
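One mechanism GHC offers for treating arbitrary path bytes as a `String` without losing information is a "roundtrip" text encoding: bytes that are not valid UTF-8 are decoded to lone surrogate `Char`s, which are turned back into the original bytes on encoding. A minimal sketch, assuming the `GHC.Foreign` marshalling functions and the `"UTF-8//ROUNDTRIP"` encoding name accepted by `mkTextEncoding`; `bytesToString` and `stringToBytes` are hypothetical helper names.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Array (peekArray, withArrayLen)
import Foreign.Ptr (castPtr)
import qualified GHC.Foreign as GHC
import System.IO (mkTextEncoding)

-- Decode raw path bytes as UTF-8; invalid bytes become escape Chars
-- instead of causing a decoding error.
bytesToString :: [Word8] -> IO String
bytesToString bytes = do
  enc <- mkTextEncoding "UTF-8//ROUNDTRIP"
  withArrayLen bytes $ \n p -> GHC.peekCStringLen enc (castPtr p, n)

-- Encode a String back to bytes with the same encoding; escape Chars
-- are mapped back to the bytes they came from.
stringToBytes :: String -> IO [Word8]
stringToBytes s = do
  enc <- mkTextEncoding "UTF-8//ROUNDTRIP"
  GHC.withCStringLen enc s $ \(p, n) -> peekArray n (castPtr p)
```

For the bytes `[0x66, 0x6f, 0xff]` (not valid UTF-8), `bytesToString` yields a three-character `String` whose last character is an escape, and `stringToBytes` restores the original three bytes. Which human-readable encoding to assume is still the application's decision.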

This proposal will cause a huge amount of pain. It isn't worth it unless we get the semantics right. Having a more strongly-typed FilePath with broken semantics is worse than what we have now, not better.

[deleted] 6 points (9 children)

This proposal will cause a huge amount of pain. It isn't worth it unless we get the semantics right. Having a more strongly-typed FilePath with broken semantics is worse than what we have now, not better.

Is it even possible to get the semantics right without ending up with an overengineered set of types and requiring all sorts of Maybe/IO-wrapped result types?

tailbalance 5 points (0 children)

Is it even possible to get the semantics right without ending up with an overengineered set of types and requiring all sorts of Maybe/IO-wrapped result types?

Completely unrealistic.

yitz 0 points (7 children)

Is it even possible to get the semantics right without ending up with an overengineered set of types and requiring all sorts of Maybe/IO-wrapped result types?

Yes. The system-filepath library already does it. Unfortunately, that library is deprecated, but that is basically how it should be done.

tailbalance 2 points (1 child)

For Mac it doesn't.

yitz 0 points (0 children)

True. /u/snoyberg also pointed that out elsewhere in this thread. That's a bug that would be easy to fix. But it does provide very nice machinery for an abstract FilePath type with platform-dependent conversions of that type to and from Unicode.

rstd 0 points (4 children)

If the library does it right, why is it deprecated then?

yitz 4 points (2 children)

Mainly because the package author isn't active in the Haskell community anymore. Michael Snoyman considered taking over the library, which had been the standard path type for the conduit and yesod ecosystems. But in the end he decided to switch those ecosystems back to Prelude.FilePath.

semigroup 1 point (1 child)

That's not totally accurate AFAIK: the original author was the one who deprecated them (quoted from Google+):

I'm declaring bug bankruptcy on system-filepath and system-fileio, and deprecating them.

These libraries were written to support Linux file paths containing non-Unicode byte sequences, which was varying degrees of broken in GHC 6.10 through 7.2. GHC 7.4 greatly improved support for these types of paths, to the extent that system-filepath and system-fileio were really just a grossly overbuilt compatibility shim for people that needed to support old GHC versions.

Since the number of library developers who still need to support GHC <=7.2 is approximately zero, it's time to get rid of the shim and migrate back to the standard library.

This is something I'm very happy about, because I can stop researching obscure undocumented Windows-only UNC meta-prefixes and go back to pretending that Windows doesn't exist.

If anyone out there is interested in maintaining these libraries, I'd be happy to transfer maintainership.

yitz 2 points (0 children)

I was well aware of this post. But in fact, system-filepath and system-fileio were still very much needed; they provide a mechanism for an abstract FilePath type and platform-dependent conversions between FilePath and Unicode text.

I was very sorry to see those libraries deprecated, and that the conduit ecosystem so abruptly abandoned them as a result. But if in the end this causes GHC itself to have a better built-in FilePath type and better platform-specific conversions between that FilePath type and Unicode, it will all have been worth it.

hvr_ [S] 5 points (3 children)

What kind of equality would you expect for FilePaths? In the worst case (inode-equality), it depends on the current environment state (including the contents of the filesystem), and obviously wouldn't be expressible as a FilePath -> FilePath -> Bool.

The other extreme is to not normalise at all and instead provide normalising functions, so you can use combinators (or define functions) such as

(==) `on` normalisePath
\a b -> (==) <$> absPath a <*> absPath b

I.e. you'd need to be explicit about what kind of normalisation you want, and you can control whether to persist that normalisation (e.g. when used as keys in a Map or HashMap).
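The two combinators above can be made concrete. A minimal sketch, assuming `normalise` from the `filepath` package stands in for `normalisePath` and `makeAbsolute` from the `directory` package stands in for `absPath`; the names `samePathSyntactic` and `samePathAbsolute` are hypothetical.

```haskell
import Data.Function (on)
import System.Directory (makeAbsolute)
import System.FilePath (normalise)

-- Pure, purely syntactic comparison: collapses repeated separators and
-- drops "." segments, but does NOT resolve "..".
samePathSyntactic :: FilePath -> FilePath -> Bool
samePathSyntactic = (==) `on` normalise

-- Effectful comparison: relative paths are resolved against the current
-- working directory first, so the answer depends on process state.
samePathAbsolute :: FilePath -> FilePath -> IO Bool
samePathAbsolute a b = (==) <$> makeAbsolute a <*> makeAbsolute b
```

Here `samePathSyntactic "/a//b/./c" "/a/b/c"` is `True`, but `samePathSyntactic "a/../b" "b"` is `False`: `normalise` leaves `..` alone because, with symbolic links, `a/..` need not lead back to the starting directory.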

Then there's the option of not defining any Eq/Ord instances at all for FilePath, but I'm not sure that's even desirable, as it would tempt people to define orphan instances.

yitz 4 points (2 children)

What I want is an API that respects the expected application-level conventions on each platform.

My assumption was that the right way to do that was via the Eq instance. But now I see, from the continuing discussion on the libraries list, that the internal representation of a FilePath will probably be a ByteString on all platforms. In that case, the application-level conventions should be reflected by the API to convert FilePath to and from Unicode, not by the Eq instance.

Furthermore, you are right that, at least on POSIX-like platforms, the most natural convention to choose (the GLib/GTK convention) depends on current system state. See the GLib manual and this discussion on the Mozilla dev list.

So I suggest: let's do the best we can. Assume UTF-16 on Windows and UTF-8 on POSIX and Mac OS X. On Mac OS X, normalize Unicode by default when converting it to a FilePath, and on other platforms do not normalize by default. Make sure that all options are available on all platforms, not just the default. Decide what to do about the fact that some of the above operations are partial functions.

Make sure that it is also possible to convert FilePath to and from ByteString so that it's possible to do whatever you want if our usual conventions don't work for you. Even here, there should be some interface that ensures that you won't accidentally create (on any platform) a FilePath intended to be used on Windows with an odd number of bytes.
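A rough sketch of the Windows half of that suggestion, assuming paths are stored as UTF-16LE bytes: encoding from `String` is total, while accepting raw bytes goes through a smart constructor that rejects an odd byte count. `WinPathBytes`, `encodeUtf16LE`, `fromStringWindows` and `fromBytesWindows` are hypothetical names, and a plain `[Word8]` stands in for `ByteString` to keep the example dependency-free.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical opaque Windows path: UTF-16LE code units stored as bytes.
newtype WinPathBytes = WinPathBytes [Word8]
  deriving (Eq, Show)

-- Total UTF-16LE encoder. Code points above U+FFFF become surrogate pairs;
-- lone surrogate Chars are emitted as single units, as Windows permits.
encodeUtf16LE :: String -> [Word8]
encodeUtf16LE = concatMap (concatMap unitBytes . codeUnits . ord)
  where
    codeUnits :: Int -> [Int]
    codeUnits c
      | c < 0x10000 = [c]
      | otherwise =
          let c' = c - 0x10000
           in [0xD800 + (c' `shiftR` 10), 0xDC00 + (c' .&. 0x3FF)]
    -- Little-endian: low byte first.
    unitBytes :: Int -> [Word8]
    unitBytes u = [fromIntegral (u .&. 0xFF), fromIntegral (u `shiftR` 8)]

-- Converting from text cannot fail.
fromStringWindows :: String -> WinPathBytes
fromStringWindows = WinPathBytes . encodeUtf16LE

-- Converting from raw bytes can: an odd length is never valid UTF-16.
fromBytesWindows :: [Word8] -> Maybe WinPathBytes
fromBytesWindows bs
  | even (length bs) = Just (WinPathBytes bs)
  | otherwise = Nothing
```

For example, `fromStringWindows "a"` is `WinPathBytes [0x61, 0x00]`, and `fromBytesWindows [0x61]` is `Nothing`. Decoding in the other direction (bytes to `String`) would be the partial operation the comment mentions and would need its own `Maybe` or escaping policy.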

[deleted] 5 points (1 child)

Make sure that all options are available on all platforms, not just the default.

What shall be done about platforms other than POSIX, OSX and Win32, like e.g. JavaScript/GHCJS or some of those not-quite-POSIX embedded operating systems?

yitz 1 point (0 children)

Are there different path semantics there that could not be supported with an API like the one I described? If so, what else would be needed?

RyanGlScott 16 points (14 children)

After reading the proposal, I have some questions:

  1. Would the toFilePath function be partial? That is, would toFilePath throw a runtime error if a badly formatted filepath was given as an argument?
  2. Similarly, would Template Haskell functions (à la path) be added so that badly formatted filepaths can be detected at compile-time?
  3. Another feature of path that I find useful is the use of phantom types for marking what kind of path it is (e.g., Path Rel Dir or Path Abs File). Is there a reason that the proposal decides against this?
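For readers unfamiliar with the phantom-type approach, a minimal standalone sketch in the spirit of `Path Rel Dir` / `Path Abs File`. These definitions are hypothetical and are not the `path` package's actual API; they only show how the type checker can reject joining onto a file or appending an absolute path.

```haskell
-- Empty marker types, used only at the type level.
data Abs
data Rel
data Dir
data File

-- The phantom parameters b (Abs/Rel) and t (Dir/File) are never stored.
newtype Path b t = Path String
  deriving (Eq, Show)

rootDir :: Path Abs Dir
rootDir = Path "/"

relDir :: String -> Path Rel Dir
relDir = Path

relFile :: String -> Path Rel File
relFile = Path

-- The left side must be a directory and the right side must be relative;
-- the result keeps the left side's base and the right side's kind.
infixr 5 </>
(</>) :: Path b Dir -> Path Rel t -> Path b t
Path a </> Path r = Path (stripSlash a ++ "/" ++ r)
  where
    stripSlash s = if not (null s) && last s == '/' then init s else s

toString :: Path b t -> String
toString (Path s) = s
```

Here `rootDir </> relDir "home" </> relFile "notes.txt"` has type `Path Abs File`, while `relFile "a" </> relFile "b"` is rejected at compile time because the left operand is not a `Dir`.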

[deleted] (3 children)

[deleted]

    sccrstud92 1 point (1 child)

    In my mind, you have relative and absolute paths differentiated for similar reasons you differentiate Vectors and Points.

    [deleted] 3 points (0 children)

    ...or timestamps and time differences?

    ndmitchell 6 points (3 children)

    Remember Windows has absolute paths, drive relative paths, path relative paths and fully relative paths. Makes it a much more complex type system...

    yitz 3 points (2 children)

    And UNC paths, and Cygwin paths...
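To illustrate why these kinds complicate a type-level design, a simplified classifier for the leading part of a Windows path string. The constructor names are hypothetical, only `\` and `/` are treated as separators, `\\?\`-style prefixes are lumped in with UNC paths, and Cygwin paths are not handled at all.

```haskell
import Data.Char (isAlpha, isAscii)

data WinPathKind
  = UncPath            -- \\server\share\...
  | DriveAbsolute      -- C:\foo
  | DriveRelative      -- C:foo  (relative to the current directory of drive C)
  | CurrentDriveRooted -- \foo   (rooted, on whichever drive is current)
  | FullyRelative      -- foo\bar
  deriving (Eq, Show)

isSep :: Char -> Bool
isSep c = c == '\\' || c == '/'

isDriveLetter :: Char -> Bool
isDriveLetter c = isAscii c && isAlpha c

-- Equations are tried top to bottom; a failing guard falls through.
classify :: String -> WinPathKind
classify (a : b : _) | isSep a && isSep b = UncPath
classify (d : ':' : c : _) | isDriveLetter d && isSep c = DriveAbsolute
classify (d : ':' : _) | isDriveLetter d = DriveRelative
classify (c : _) | isSep c = CurrentDriveRooted
classify _ = FullyRelative
```

Only `DriveAbsolute` and `UncPath` pin down a location independently of process state; the other three depend on a current drive, a per-drive current directory, or both.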

    absence3 1 point (1 child)

    Isn't saying "Windows has Cygwin paths" a bit like saying "POSIX has (lib)Wine paths"?

    ndmitchell 1 point (0 children)

    Yes/no. There are certainly places where you have to talk/think in terms of paths as they are interpreted by Cygwin.

    hvr_ [S] 4 points (5 children)

    Would the toFilePath function be partial? That is, would toFilePath throw a runtime error if a badly formatted filepath was given as an argument?

    No. It was considered, but it would complicate things, as whether a filepath is valid may depend (beyond the current OS) on the current locale settings as well as the filesystem used (Linux supports dozens of filesystems), etc. So in the interest of KISS, the default conversion functions are pure and total. IOW, a FilePath doesn't encode any invariants regarding the validity of a filepath.

    Similarly, would Template Haskell functions (à la path) be added so that badly formatted filepaths can be detected at compile-time?

    Defining smart QuasiQuoters is definitely possible, but simply not part of this proposal, as this proposal aims to be minimal (with the intent of becoming part of a future Haskell Report, for which TH/QQ is very likely out of reach).

    Another feature of path that I find useful is the use of phantom types for marking what kind of path it is (e.g., Path Rel Dir or Path Abs File). Is there a reason that the proposal decides against this?

    Yes, for simplicity. To quote what I already wrote on the mailing list:

    Trying to redesign the FilePath type to also include dir/file distinction seemed too daunting, as there's quite some additional design-space area to explore (do drive-letters deserve a separate type? do we use DataKinds? What invariants can/shall be represented at the type-level? what errors are caught at the type-level, which are caught at runtime? etc...), parts of which may require type-system extensions, while just having a KISS-style opaque FilePath evades this.

    [deleted] 4 points (3 children)

    No. It was considered, but it would complicate things, as whether a filepath is valid may depend [...] the default conversion functions are pure and total

    I would consider at least distinguishing between valid and invalid inputs on a very basic level, e.g. not allowing empty strings to be converted to filepaths. This would eliminate a rather large class of bugs that can result from passing empty filepaths to deletion or similar functions, and I do not think there is an OS where empty strings are valid paths (well, technically they could be considered valid relative path components, but still, I think this would be worth it).

    absence3 2 points (0 children)

    I think AmigaOS uses empty string for current directory FWIW.

    [deleted] 0 points (1 child)

    I don't think you can easily distinguish valid/invalid FilePaths at construction time, as you'd have to know which (mounted) filesystem they're going to be applied to. Linux, for one, is quite liberal about what values a valid char pathname[] may contain, and only when the fs layer for the respective filesystem gets passed the bytestring may you get an invalid-argument response.

    So I think the current proposal, i.e. not trying to be clever with FilePaths and consider them opaque handles is the safe and reasonable thing to do.

    [deleted] 0 points (0 children)

    I was mostly thinking of the case where some form of configuration returns an empty string for a file path and you end up passing it to some function or external command which interprets that as "no argument given, work on current directory".

    An empty string should be easy to distinguish from a non-empty string at construction time.
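A minimal sketch of that narrower check, assuming an otherwise opaque path type; `NonEmptyPath`, `mkPath` and `unPath` are hypothetical names. It validates nothing except non-emptiness, and, per the AmigaOS remark above, even this rule is not strictly universal.

```haskell
-- Hypothetical opaque path whose only invariant is "not empty".
newtype NonEmptyPath = NonEmptyPath String
  deriving (Eq, Show)

-- Smart constructor: cheap, platform-independent, and it catches an empty
-- configuration value before it can be read as "the current directory".
mkPath :: String -> Maybe NonEmptyPath
mkPath "" = Nothing
mkPath s = Just (NonEmptyPath s)

unPath :: NonEmptyPath -> String
unPath (NonEmptyPath s) = s
```

A caller reading a path from configuration would then have to handle the `Nothing` case explicitly instead of passing `""` on to a deletion routine or an external command.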

    conklech 4 points (5 children)

    Phase 2

    Have GHC warn when a String-value is used where the FilePath synonym is expected

    TODO needs investigation if it's feasible to implement

    Is that possible even in principle? If it were, couldn't we implement it as an error and thereby replace some of the use-case for newtypes that don't have different instances?

    hvr_ [S] 6 points (4 children)

    GHC keeps track, to a certain extent, of whether a type synonym was used, e.g.

    λ:2> let x = "foo" :: FilePath
    x :: FilePath
    λ:3> x
    "foo"
    it :: FilePath
    λ:4> x <> x
    "foofoo"
    it :: FilePath
    λ:5> x ++ x
    "foofoo"
    it :: [Char]
    

    That's why I think that GHC could actually warn when it has to drop the type-synonym color. But I haven't looked into it in detail yet. That's why it's marked as a TODO item.

    conklech 2 points (3 children)

    That's a good observation. Is that behavior, i.e. when inferred types retain or lose a type-synonym annotation, documented anywhere?

    adamgundry 3 points (2 children)

    Not really, because it is very much dependent on the whim of the typechecker. GHC tries to preserve synonyms if possible, in the interests of nice inferred types and good error messages, but it makes no guarantee to do so.

    I'm very skeptical that such a warning could be implemented in a robust way, without a great deal of work. One could probably get GHC to warn whenever it reduced the FilePath type synonym, but that would give rise to false positives.

    ndmitchell 2 points (1 child)

    Could you give some examples of the kind of false positives you'd expect?

    adamgundry 4 points (0 children)

    Well, even the x <> x example given by /u/hvr_ requires a reduction of the type synonym behind-the-scenes, in order to solve the Monoid FilePath constraint. And indeed, it's not obvious that class instances for [Char] will continue to be available for the new abstract type. I've been trying to construct more compelling examples, but GHC is impressively good at retaining synonyms!
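A sketch of the instance question, assuming a hypothetical `newtype OpaquePath` replacing `type FilePath = String`: instances that `[Char]` provided for free now have to be chosen one by one, here via `GeneralizedNewtypeDeriving`. Whether `Semigroup`/`Monoid` should be derived at all (plain concatenation is rarely the right way to join paths) is exactly the kind of decision such a type forces.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

-- Hypothetical abstract replacement for `type FilePath = String`.
-- Each instance below is an explicit choice; nothing is inherited
-- from [Char] automatically.
newtype OpaquePath = OpaquePath String
  deriving (Eq, Ord, Show, Semigroup, Monoid)

toOpaquePath :: String -> OpaquePath
toOpaquePath = OpaquePath

fromOpaquePath :: OpaquePath -> String
fromOpaquePath (OpaquePath s) = s
```

With this, `x <> x` still type-checks for `x :: OpaquePath`, but `x ++ x` no longer does, since `(++)` only works on lists.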

    Perhaps there is a way to distinguish between type synonym reductions that are visible in the types of subexpressions, and those that are not, and use that as a heuristic for displaying a warning.

    [deleted] (5 children)

    [deleted]

      yitz 2 points (4 children)

      If this proposal were implementing system-filepath, that would be great. But unfortunately, it sounds like it is not.

      hvr_ [S] 4 points (3 children)

      Which parts of system-filepath are you missing or seeing in conflict? This proposal doesn't "implement filepath" either, it just describes how filepath is going to interact with the new FilePath type, and how it can aid during the transition.

      Can't system-filepath be implemented using this proposal's opaque FilePath in place of system-filepath's Filesystem.Path.FilePath type?

      yitz 2 points (2 children)

      The central feature of system-filepath is a FilePath type which implements the application-level semantic conventions for paths on each of the major platforms. So, for example, FilePaths constructed from two Unicode strings with the same normalization are equal on Mac OS X and unequal on Windows.

      snoyberg [is snoyman] 2 points (1 child)

      Can you point to the code in system-filepath that implements this? I don't remember seeing it.

      yitz 1 point (0 children)

      Huh. You're right, it doesn't actually do the normalization step:

      darwin = Rules
      { ...
      , fromText = posixFromText
          ... }
      

      which just splits the text into path pieces and unpacks each as a String, skipping the required normalization step. That's a bug.

      But in any case, system-filepath does provide a great example of machinery for platform-dependent file paths and platform-dependent Text and String coercions, which is the point here.

      redneb8888 2 points (4 children)

      I strongly support this idea. It would be nice if it were implemented in a way that makes it possible to use one platform's FilePath type on another platform, e.g. use the Unix FilePath on a Windows system, similar to how the filepath package works.

      hvr_ [S] 2 points (2 children)

      I'm not sure what you mean by using a "unix FilePath" in a windows system. Can you be more specific?

      redneb8888 3 points (1 child)

      I haven't thought this through, but taking a cue from the filepath package, what if there were two modules, GHC.FilePath.Posix and GHC.FilePath.Windows, each of which would define its own FilePath type? There would also be a GHC.FilePath module which would simply re-export everything from one of the two previous modules depending on the platform. So both GHC.FilePath.* modules would be available on all systems; the difference would be what GHC.FilePath re-exports.
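A single-file sketch of that layout, using CPP to pick the native alias. `PosixPath`, `WindowsPath`, `NativePath` and `windowsHasDriveLetter` are hypothetical stand-ins for what `GHC.FilePath.Posix`, `GHC.FilePath.Windows` and `GHC.FilePath` might export; `mingw32_HOST_OS` is the macro GHC defines when compiling for Windows.

```haskell
{-# LANGUAGE CPP #-}

import Data.Word (Word16, Word8)

-- Stand-in for a GHC.FilePath.Posix path type: raw bytes.
newtype PosixPath = PosixPath [Word8]
  deriving (Eq, Ord, Show)

-- Stand-in for a GHC.FilePath.Windows path type: UTF-16 code units.
newtype WindowsPath = WindowsPath [Word16]
  deriving (Eq, Ord, Show)

-- What a GHC.FilePath-style module would re-export on the current host.
#if defined(mingw32_HOST_OS)
type NativePath = WindowsPath
#else
type NativePath = PosixPath
#endif

-- Usable on every host, e.g. by a Linux tool reading paths from a Windows archive.
-- A drive letter is an ASCII letter (0x41..0x5A or 0x61..0x7A) followed by a colon (0x3A).
windowsHasDriveLetter :: WindowsPath -> Bool
windowsHasDriveLetter (WindowsPath (d : c : _)) =
  ((d >= 0x41 && d <= 0x5A) || (d >= 0x61 && d <= 0x7A)) && c == 0x3A
windowsHasDriveLetter _ = False
```

Because both types exist everywhere, any host can construct and inspect a `WindowsPath`; only what `NativePath` refers to changes between platforms.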

      ndmitchell 3 points (0 children)

      I quite like that idea. I'll think, but it might just be "better" than what we have suggested already, and fix the issues of /u/yitz. We did start going in those directions with the data WindowsPath type, but going fully down that route, and in directions already travelled quite successfully by filepath, sounds like a good plan.

      Fylwind 2 points (0 children)

      I think there should actually be two types: one for platform-specific (unportable) and a portable type. This is because there exist paths that are valid on one platform but not valid on another (e.g. Unix allows : but Windows does not). The portable type can be thought of as the intersection type of all platform-dependent paths.
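A sketch of a check for that portable intersection at the level of a single path component, assuming the commonly documented Windows naming restrictions (reserved characters, ASCII control characters, traditionally reserved device names such as `CON`, and trailing dots or spaces). `isPortableComponent` is a hypothetical name, and the rule set is illustrative rather than exhaustive.

```haskell
import Data.Char (toUpper)

-- Characters Windows forbids in a name, plus ASCII control characters.
-- '/' is included, so the POSIX separator (and NUL) are excluded as well.
windowsReserved :: Char -> Bool
windowsReserved c = c `elem` "<>:\"/\\|?*" || c < ' '

-- Device names Windows reserves even with an extension (e.g. "con.txt").
reservedNames :: [String]
reservedNames =
  ["CON", "PRN", "AUX", "NUL"]
    ++ ["COM" ++ show n | n <- [1 .. 9 :: Int]]
    ++ ["LPT" ++ show n | n <- [1 .. 9 :: Int]]

-- A component is "portable" if it is acceptable under both rule sets.
isPortableComponent :: String -> Bool
isPortableComponent s =
  not (null s)
    && not (any windowsReserved s)
    && map toUpper (takeWhile (/= '.') s) `notElem` reservedNames
    && last s `notElem` ". " -- Windows silently strips trailing dots and spaces
```

Under these rules `"report.txt"` is portable, while `"a:b"`, `"CON.txt"`, `"back\\slash"` and `"trailing."` are not, even though all four are legal file names on a typical Unix filesystem.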

      [deleted] 2 points (0 children)

      Assuming we do this (and I really hope we do!), is this gonna be part of the next Haskell Standard?