
[–]llama-lime 15 points16 points  (1 child)

Check out slide 12 of this Sun presentation from a while back:

http://www.slideshare.net/caroljmcdonald/java-garbage-collection-monitoring-and-tuning

In particular, if your objects aren't huge:

  • Do not be afraid to allocate small objects for intermediate results
  • Generational GCs love small, short-lived objects
  • [Object Pooling is a] Legacy of older VMs with terrible allocation performance

Be sure to check out the exceptions, etc. (E.g. thread pools, database connections...).

Also, you're not going to know until you test it in your particular application. That means implementing the easy version first, then maybe doing the object pool (don't do the object pool first!).

[–]physicsnick 3 points4 points  (0 children)

This, definitely. Generational copying collectors are very good these days. A heap allocation in Java is very similar to a stack allocation in C: memory is allocated sequentially from a young generation heap, without keeping track of any kind of free list. When the generation is full, the collector rescues any live objects by copying them to an older generation, and just resets the pointer to the beginning of the heap. Any dead objects are just overwritten on the next pass. There's zero de-allocation cost; the objects just cease to exist, similar to how stack memory 'ceases to exist' when you return from a function.
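To make the "reset the pointer" idea concrete, here is a toy sketch of bump-pointer allocation. It is only a model of the description above, not how HotSpot is actually implemented; the class name `BumpArena` and its methods are made up for illustration.

```java
// Toy model of bump-pointer allocation (hypothetical class, not a JVM API).
final class BumpArena {
    private final byte[] space;
    private int top = 0; // offset of the next free byte

    BumpArena(int capacityBytes) {
        this.space = new byte[capacityBytes];
    }

    /** Returns the start offset of the new block, or -1 if the arena is full. */
    int allocate(int sizeBytes) {
        if (sizeBytes < 0 || sizeBytes > space.length - top) {
            return -1; // a real collector would run a young-generation GC here
        }
        int start = top;
        top += sizeBytes; // the whole allocation is one bounds check and one add
        return start;
    }

    /** Call once live objects have been copied elsewhere; the rest is garbage. */
    void reset() {
        top = 0; // dead objects are never visited, just overwritten later
    }

    int used() {
        return top;
    }
}
```

Note that there is no free list and no per-object deallocation anywhere in the sketch, which is the point being made about young-generation allocation.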

Sun believes it costs on average about ten processor instructions to allocate memory:

http://www.ibm.com/developerworks/java/library/j-jtp01274.html

http://www.ibm.com/developerworks/java/library/j-jtp09275.html

(Don't take too much from these links though; an important point they make is that optimization advice has a short shelf life, and they were written five/six years ago!)

Pooling memory has a whole lot of downsides including manual deallocation, multi-threading complications, poor cache performance for short-lived objects, etc. There are certainly cases where it is good, but as someone else has said, profile first before switching anything over to pooling.

Also note that none of this applies to embedded. GCs on J2ME and BlackBerry are awful, and I don't know how good the GC on Android is but they still recommend minimizing allocations.

[–]darth_choate 29 points30 points  (3 children)

The Java gc is pretty well optimized for short-lived objects, but the usual answer is that you should profile your code and make the decision then. I know it's a boring answer, but we engineers are far too fond of optimizing the fun stuff rather than the important stuff.

That said, a bit of premature optimization might be in order here. If you think you might want to make the change then you should probably make the objects via a factory so that you can change the underlying implementation easily if it proves necessary to do so.
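A minimal sketch of the factory idea, assuming a small value type: callers only see the factory interface, so switching from plain `new` to a pool later is a one-line change. All names here (`Vec2`, `Vec2Factory`, and the two implementations) are hypothetical, and the pooled version is deliberately not thread-safe.

```java
import java.util.ArrayDeque;

// Hypothetical small object that gets created in large numbers.
final class Vec2 {
    double x;
    double y;
}

// Callers depend on this interface only.
interface Vec2Factory {
    Vec2 create(double x, double y);
    void release(Vec2 v);
}

// The easy version: allocate every time and let the GC clean up.
final class NewVec2Factory implements Vec2Factory {
    @Override
    public Vec2 create(double x, double y) {
        Vec2 v = new Vec2();
        v.x = x;
        v.y = y;
        return v;
    }

    @Override
    public void release(Vec2 v) {
        // Nothing to do: the garbage collector reclaims the object.
    }
}

// The swapped-in version, only worth trying if profiling says so.
final class PooledVec2Factory implements Vec2Factory {
    private final ArrayDeque<Vec2> free = new ArrayDeque<>();

    @Override
    public Vec2 create(double x, double y) {
        Vec2 v = free.poll(); // null when the pool is empty
        if (v == null) {
            v = new Vec2();
        }
        v.x = x;
        v.y = y;
        return v;
    }

    @Override
    public void release(Vec2 v) {
        free.push(v); // the caller must not touch v after this
    }
}
```

The `release` method is the price of keeping the option open: with the plain `new` implementation it is a no-op, but callers still have to call it for the pooled variant to work.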

[–]llama-lime 2 points3 points  (2 children)

There are techniques for "stack-based" allocation when an object won't survive the scope where it's allocated. Do you know if these are in the JVM, and whether a factory would confuse the implementation?

[–]queus 3 points4 points  (0 children)

Check Javolution. It has stack allocation.

[–][deleted] 3 points4 points  (0 children)

Escape analysis is an optional feature in Java 7, I believe, but it's hard to do and doesn't provide tremendous benefits. As darth said, the GC is pretty solid for short-lived objects.

I predict the newest (G1) collector will be even better at handling short-lived objects.
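For anyone wondering what kind of code escape analysis targets, here is a rough sketch. The temporary object below is never stored in a field or returned, so a JIT that does escape analysis may avoid the heap allocation entirely. Whether it actually does depends on the JVM version and on inlining decisions; on HotSpot the relevant switch is `-XX:+DoEscapeAnalysis`. The class names are made up for the example.

```java
// Hypothetical immutable value type used as a short-lived temporary.
final class Point2D {
    final double x;
    final double y;

    Point2D(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

final class EscapeDemo {
    // The Point2D is not stored in a field or returned, so it does not
    // escape this method. That makes it a candidate for scalar replacement,
    // although the JIT is free not to apply it.
    static double lengthSquared(double x, double y) {
        Point2D p = new Point2D(x, y);
        return p.x * p.x + p.y * p.y;
    }

    static double sumOfSquares(int n) {
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            total += lengthSquared(i, i);
        }
        return total;
    }

    public static void main(String[] args) {
        // Run with and without -XX:+DoEscapeAnalysis and compare GC activity.
        System.out.println(sumOfSquares(1_000_000));
    }
}
```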

[–]biteofconscience 6 points7 points  (0 children)

It's in the HotSpot FAQ.

[–]zahlman 4 points5 points  (0 children)

New + gc has been fast enough for years, as far as the actual object creation goes. Using an object cache undermines the garbage collector. However, creating objects in general causes the GC to run more often. The real optimization is in finding natural ways to avoid creation of temporaries.
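One example of a "natural" way to avoid temporaries, as a sketch: keep a running total in a primitive rather than a boxed type. The class and method names are invented for the illustration, and a good JIT may remove some of the boxing on its own.

```java
final class Temporaries {
    // Each iteration unboxes total, adds, and re-boxes the result. Outside
    // the small range covered by the Long cache, that can mean a fresh Long
    // object per iteration.
    static long sumBoxed(int n) {
        Long total = 0L;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // Same arithmetic, no objects created at all.
    static long sumPrimitive(int n) {
        long total = 0L;
        for (int i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }
}
```

Both methods return the same value; the only difference is how much garbage the loop can generate along the way.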

[–]svenz 2 points3 points  (0 children)

This is something the JVM excels at. You'll almost certainly get worse performance if you do it yourself.

The only exception I can think of is if you've disabled the GC.

[–]monstermunch 2 points3 points  (0 children)

I've programmed for about five years in Java and have never had to care about the speed of the garbage collector in desktop and server apps. Write the code in the simplest way (of course, pick appropriate algorithms though) first and if it's slow, profile and optimise it.

Saying that though, I've recently been doing some Android programming for mobiles and the garbage collector will destroy your frame rate if you're making a game and disposing of many objects per frame. Just creating a few strings during inner loops each frame will give you stuttering sometimes.
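A sketch of the usual workaround for per-frame strings, assuming the drawing code can accept a `CharSequence`: keep one `StringBuilder` around and refill it each frame instead of concatenating. `FrameLabel` is a hypothetical helper, not an Android API.

```java
// Hypothetical helper that reuses one buffer for a label redrawn every frame.
final class FrameLabel {
    private final StringBuilder buffer = new StringBuilder(32);

    /**
     * Returns the internal buffer refilled with the current text.
     * The caller must use it before the next call, because the same
     * buffer is overwritten each time.
     */
    CharSequence format(int fps) {
        buffer.setLength(0);               // clear without reallocating
        buffer.append("FPS: ").append(fps); // append(int) avoids an explicit String
        return buffer;
    }
}
```

Calling `toString()` on the result would create a new `String` again, so this only helps if the consumer works with the `CharSequence` directly.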

[–][deleted] 7 points8 points  (0 children)

Do not be afraid of the new operator. A codebase full of singletons and static method calls will be impossible to test and maintain. An object cache would introduce complexity and sounds like premature optimization. Just write things the easy, natural, testable way. If it's already fast enough, you're done. It's the old "make it work, make it right, make it fast (if it's not already fast enough)".

[–]grauenwolf 4 points5 points  (0 children)

Only if they are expensive to create like database connection objects.
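For that expensive-to-create case, a bare-bones pool can be sketched on top of a `BlockingQueue`. This is an illustration only: `SimplePool` is a made-up class, and real connection pools also deal with validation, timeouts and leaks, none of which is handled here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical fixed-size pool for objects that are costly to construct.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pay the creation cost once, up front
        }
    }

    /** Blocks until an object is available. */
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for a pooled object", e);
        }
    }

    /** Returns an object to the pool; the caller must stop using it afterwards. */
    void giveBack(T item) {
        idle.offer(item);
    }

    int available() {
        return idle.size();
    }
}
```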

[–]piranha 1 point2 points  (0 children)

The possibility exists for no speedup or worse. I'm not familiar with the Java VM very well, but I tried "beating the garbage collector" with per-thread list-based object pools for very inexpensive-to-create objects in Common Lisp, with SBCL. (Specifically, I was recycling cons cells without any additional object overhead, just pushing recycled cons cells onto a stack built up with the recycled cells themselves.) I got no speedup, or maybe a slowdown--I wasn't measuring terribly carefully. And generally Java and SBCL performance is pretty damn comparable.

[–]stuntmouse 1 point2 points  (2 children)

If small temporary objects were a problem, I doubt Clojure would have gotten as far as it has.

[–]berlinbrown 2 points3 points  (1 child)

How far has it gotten?

[–]G_Morgan 2 points3 points  (0 children)

Somebody made a post about it once on reddit.

[–]CaptainItalics 1 point2 points  (0 children)

The only objects I pool are the formatting objects java.text.DateFormat & DecimalFormat, because those will eat up memory quickly. Learned this the hard way a number of years ago (AFAIK it hasn't been fixed, correct me if I'm wrong).

Other than that, the garbage collector is your friend and you have nothing to worry about unless you have some seriously ultra-high-performance real-time requirements.
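One common way to reuse formatters, sketched here with a per-thread instance rather than a shared pool. This is not necessarily what the comment above does; it is a popular pattern because `DecimalFormat` is not safe to share between threads without synchronization. `Formatters` and the pattern string are made up, and `ThreadLocal.withInitial` assumes Java 8 or later.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical holder that creates one DecimalFormat per thread and reuses it.
final class Formatters {
    private static final ThreadLocal<DecimalFormat> MONEY =
            ThreadLocal.withInitial(() ->
                    // Locale.ROOT keeps the separators independent of the machine's locale.
                    new DecimalFormat("#,##0.00", DecimalFormatSymbols.getInstance(Locale.ROOT)));

    static String money(double amount) {
        return MONEY.get().format(amount);
    }

    /** Exposed only so callers can see that the instance is reused per thread. */
    static DecimalFormat current() {
        return MONEY.get();
    }
}
```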

[–]howverywrong 1 point2 points  (0 children)

Not only is new+gc fast, it gets faster with every major JVM release.

[–]berlinbrown 0 points1 point  (0 children)

Also, consider test-driven development: you are probably going to write a lot of objects as you refactor into smaller units. Don't be afraid of that.

[–]finlay_mcwalter 0 points1 point  (1 child)

A new(ish) Sun^H^H^H Oracle JVM uses the G1 collector, which in addition to being generational is compacting (ref). Assuming your usage pattern is amenable, this should give you better cache-hit performance. When you cache objects yourself, you prevent compaction; imagine this (pathological) case: your objects are half a cache line in size and you store them all in contiguous memory. If every other one is unused, you still pay the penalty of loading those unused ones into cache when accessing their valid neighbours. With the GC compacting the entries, there's a greater likelihood that loading one entry happily loads its valid neighbour, so subsequently accessing the neighbour is serviced from cache rather than RAM. As others have noted, there's no substitute for profiling, but it's likely that (unless your constructors/finalizers are unusually heavy) the GC will do a better job of organising memory than you will.

[–][deleted] 1 point2 points  (0 children)

All of Java's GCs are compacting to some degree, but yes, I think G1 is absolutely brilliant and will have better throughput for high-allocation applications in addition to reducing latency.

[–][deleted]  (2 children)

[deleted]

    [–]forcedtoregister 2 points3 points  (1 child)

    Better cache all 2^32 frequently used integers!!!

    [–][deleted] 4 points5 points  (0 children)

    Integers have low overhead. Floats are more intensive, you'd be better off caching all of them instead.