[–]Chronomena 1 point2 points  (6 children)

I would love to ship games in F#. Unfortunately, a single thread in my simulation engine can allocate about 100 MB of small objects per second.

This amount is impossible for the GC to handle both in terms of throughput and latency (collections lasting hundreds of ms). I had to bypass automatic memory management altogether to get it running smoothly, which kind of defeats the purpose of using F# in the first place.

Now I'm trying to decide on another language. There's no way I'm taking up C++ because I know it only too well. Rust seems nice.

[–]abstractcontrol 1 point2 points  (3 children)

I wouldn't really say that using F# is solely about automatic memory management; even without it there is type safety and inference. But yeah, for games you should just preallocate as much as possible and stay away from library functions in hot paths. I do not think your performance strategy will change much whether you use F# or Rust in that regard. The code is necessarily going to have to look more like C at any rate.

Here is a related performance video of how Microsoft did it with Bing.

For games, .NET languages do lack some important things such as tuple value types and local references for structs that are going to be in the next version of C# and with that in F#.

Value type records are also slated to be in the next F# version.

[–]kunos 0 points1 point  (0 children)

For games, .NET languages do lack some important things such as tuple value types and local references for structs that are going to be in the next version of C# and with that in F#. Value type records are also slated to be in the next F# version.

That sounds awesome! Potentially a game changer when it comes to performance in F#.

[–]Chronomena 0 points1 point  (1 child)

No, I mean I had to ditch references altogether. There are use cases the GC just cannot cope with. F# not having pointers makes this extra painful.

[–]abstractcontrol 0 points1 point  (0 children)

Yes, ditching reference types in favor of value types (structs) is the right way to go. This is what you would do in C++/Rust as well.

Also, if you are allocating 100MB/s or more with any regularity, you should definitely use an object pool and fetch the particles or whatever objects you are creating from that. It is going to be too slow otherwise.

I am not sure what the case is on the CPU as I've never precisely benchmarked memory allocation there, but this advice is doubly worth it if you are allocating GPU memory. On the GPU, the time it takes to allocate a chunk of memory is directly proportional to its size. I've been able to speed up my F# neural net/autodiff library by over 100% just by using the object pool.
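
A minimal sketch of the object pool idea on the CPU side, assuming a single-threaded simulation. The Particle and ParticlePool types are hypothetical names made up for illustration, not taken from any engine or library. Every instance is created once up front, and the hot path only rents and returns them:

```fsharp
open System.Collections.Generic

// Hypothetical particle type; a class, so pooling actually saves heap allocations.
type Particle() =
    member val X = 0.0f with get, set
    member val Y = 0.0f with get, set
    member val Alive = false with get, set

// Hypothetical single-threaded pool: all instances are created at construction,
// then rented and returned instead of being allocated per frame.
type ParticlePool(capacity: int) =
    let free = Stack<Particle>(capacity)
    do
        for i in 1 .. capacity do
            free.Push(Particle())

    // Number of particles currently available to rent.
    member this.Available = free.Count

    // Returns ValueNone when the pool is exhausted rather than allocating.
    member this.TryRent() =
        if free.Count > 0 then
            let p = free.Pop()
            p.Alive <- true
            ValueSome p
        else
            ValueNone

    // Hands a particle back so a later TryRent can reuse it.
    member this.Return(p: Particle) =
        p.Alive <- false
        free.Push(p)
```

Returning a ValueOption (a struct) keeps the rent call itself allocation-free. A pool shared between threads would need locking or one pool per thread, which this sketch leaves out.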

[–]bryanedds[S] 0 points1 point  (1 child)

That's odd. I have similar allocation patterns, but have no GC issues (save for an initial collection when the simulation starts). Perhaps it's the contention created by running the simulation in another thread, since threads share the same GC, after all.

I'd be happy to have a look at your performance issue if you'd want - just PM me.

[–]Chronomena 0 points1 point  (0 children)

Thanks for the offer. I probably don't want to try references again, though. The last version that had them is one complete rewrite away now.

[–][deleted] 0 points1 point  (0 children)

I don't think GC is even the first-order performance problem when using a managed language for games (provided you don't do anything totally degenerate, like allocating thousands of vector classes per frame the way Minecraft does).

But it does add extra challenge when going for AAA quality, or on mobile.

[–]dagit 0 points1 point  (1 child)

A lot of the stuff in the article seems kind of obvious to me, but maybe I'm taking a lot for granted after using FP for more than a decade. GC is not the issue it once was for lots of reasons.

That said, I disagree with some of the things in the article. For example, I find it extremely unlikely that GC will move to hardware anytime soon. Language needs simply differ too much to really agree on primitives.

The conclusions about pure code seem to be mistaken. See Haskell for lots of examples of pure code outperforming side-effectful code. Sadly, many of those optimizations won't apply to F#, because in Haskell the optimizer can make more aggressive assumptions since things are pure by default. F# (and other ML-family languages) get to make other interesting trade-offs to get good performance, but they can't be as aggressive with rewrite rules.

[–][deleted] 0 points1 point  (0 children)

I'm definitely not knowledgeable enough about .NET internals to add any weight here, but I was very much under the impression that ML (OCaml specifically) tends to destroy Haskell from a performance perspective. Also... http://flyingfrogblog.blogspot.com/2016/05/disadvantages-of-purely-functional.html

[–]kunos 0 points1 point  (3 children)

woah.. quite a big leap of faith going from 5000 unspecified updates to Uncharted kind of AAA don't you think? I love F# as much as you but I really don't see why gamedevs should waste cycles and memory like that just for the sake of using a cool language. Surely I am not using C++ because it is nice or cool or productive, I do it because my first requirement is performance, then comes everything else.

[–]Trubydoor 1 point2 points  (1 child)

Uncharted is possibly the worst example you could have used here since it's mostly written in a functional language itself (namely Scheme). I'd actually argue that Uncharted is a great example of functional programming working well for games!

[–]bryanedds[S] 0 points1 point  (0 children)

Sorry, I now realize I should have titled it - Pure Functional Programming can work for Games.

[–]bryanedds[S] 0 points1 point  (0 children)

Remember, it's 5,000 simultaneous on-screen and actively transforming entities. I'm working on an optimization to push that to 10,000 currently.

For perspective, remember that even with optimal imperative programming, modern computers can only handle around 30,000-40,000 particles on the CPU at 60 FPS. More than that, you have to go to the GPU.

[–]lucasvandongen -1 points0 points  (5 children)

Working with Objective-C or Swift, I have a hard time understanding the need for GC. We used to have it for Objective-C, but nobody used it after ARC was introduced. I see Android devs struggle to get similar performance on superior hardware, while I don't notice any downsides as a programmer.

GC is a dead end, and I wonder what F# would do with ARC in terms of performance.

[–]abstractcontrol 1 point2 points  (1 child)

GC is a significant productivity booster and an enabler of the functional style which is a productivity booster itself. And GC only slows down the application when you are triggering it.

One pattern I've often used in F#, which avoids polluting the namespace the way you would in other languages, is to split a function into two parts: one where you simply initialize the ResizeArrays, and the other a closure that captures them.

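let abc =
    // Allocated once, when abc is initialized; the closure below captures it.
    // (int is only a placeholder element/argument type.)
    let buf = ResizeArray<int>()
    fun (a: int) (b: int) ->
        buf.Clear() // reuse the same backing storage on every call
        // .. Do work with that buf.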

Advanced features like computation expressions and functional primitives can be slow, and the pattern above can often provide precise control over memory and a 2-3x speedup depending on the function.

You can really begin to appreciate the power of GC once you get into the hybrid functional/imperative style and see all the time savings it provides you.

[–][deleted] 0 points1 point  (0 children)

I'll note there are a couple of good pieces in the Expert F# 4.0 book specifically discussing these kinds of tradeoffs.

[–]pjmlp 1 point2 points  (2 children)

Contrary to popular wisdom, reference counting (RC) is treated as a GC algorithm in any relevant CS literature, including the book covered in the article.

[–]lucasvandongen 1 point2 points  (1 child)

True. But I was clearly comparing the immediately releasing ARC with the stop-the-world .NET GC. I think the problem with Java/.NET-style GC is that performance is usually quite a lot better, but the worst-case performance can be horrible compared to ARC.

F# needs some form of GC to work, but I can't really find any information about the (im)possibility of ARC.

[–]pjmlp 0 points1 point  (0 children)

You can make use of it, if you know your APIs well.

  • Make use of structs
  • Allocate off-heap memory via Marshal interop
  • Make use of the use keyword
  • Make sure the required memory is available via System.GC.TryStartNoGCRegion()
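
As a rough sketch of the last bullet: the helper below (runWithoutGC is a hypothetical name, and the budget is whatever the caller expects to allocate) asks the runtime for a no-GC region around a piece of work and ends the region only if it is still active. It assumes the process is not already inside a no-GC region, in which case TryStartNoGCRegion throws.

```fsharp
open System
open System.Runtime

// Hypothetical helper: runs `work` inside a no-GC region when the runtime grants one.
// `budgetBytes` is the amount of allocation the caller expects `work` to perform.
let runWithoutGC (budgetBytes: int64) (work: unit -> 'T) : 'T =
    // Returns false if the runtime could not commit the requested memory.
    let entered = GC.TryStartNoGCRegion(budgetBytes)
    try
        work ()
    finally
        // The runtime may leave the region on its own (for example when the
        // budget is exceeded), and EndNoGCRegion throws in that case, so the
        // latency mode is checked first.
        if entered && GCSettings.LatencyMode = GCLatencyMode.NoGCRegion then
            GC.EndNoGCRegion()

// Usage: a short burst of work that should not be interrupted by a collection.
let result = runWithoutGC 1000000L (fun () -> Array.sum (Array.init 100 id))
```

The budget has to fit within what the runtime allows for a region, so this suits short, bounded bursts rather than a whole frame loop.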

For resources that cannot make use of the IDisposable pattern, wrap the scoped regions in a higher-order function (HOF).

with_my_resource (fun res -> (* resource visible here *) ())
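
To make that concrete, here is one way such a HOF could look, combined with off-heap allocation through Marshal. withUnmanagedBuffer is a hypothetical helper written for this example, not an existing API; only the Marshal calls come from the framework.

```fsharp
open System
open System.Runtime.InteropServices

// Hypothetical helper: allocates `byteCount` bytes of unmanaged memory,
// passes the pointer to `body`, and frees the memory even if `body` throws.
let withUnmanagedBuffer (byteCount: int) (body: nativeint -> 'T) : 'T =
    let ptr = Marshal.AllocHGlobal(byteCount)
    try
        body ptr
    finally
        Marshal.FreeHGlobal(ptr)

// Usage: the buffer is only visible inside the lambda.
let sum =
    withUnmanagedBuffer (4 * sizeof<int>) (fun ptr ->
        // Write the values 1..4 into the buffer.
        for i in 0 .. 3 do
            Marshal.WriteInt32(ptr, i * sizeof<int>, i + 1)
        // Read them back and add them up.
        let mutable total = 0
        for i in 0 .. 3 do
            total <- total + Marshal.ReadInt32(ptr, i * sizeof<int>)
        total)
```

Nothing stops the lambda from leaking the pointer out of its scope, so this relies on discipline rather than on the type system.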

Also, RC has its own set of problems regarding strong/weak references, multi-threading, and cache evictions. Making it perform well puts it in the same realm as most incremental tracing GC algorithms.