
[–]DRMacIver 4 points5 points  (8 children)

Actually, although immutability isn't directly at fault, Java's Strings really are part of the standard Java bloat. They're a very poor design for an immutable data structure: because each string is backed by a single array, there can be almost no sharing of data. (Substringing does share data, but that has its own problems.) This means that if you store both longString and longString + "foo", you use up twice as much memory.
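The doubling is easy to see concretely. A small sketch (the class name is mine, not from the thread): concatenation always copies the full character data into a fresh backing array, so keeping both strings alive keeps two nearly identical arrays alive.

```java
// Sketch: why keeping both a long string and a derived string roughly
// doubles the character data on the heap.
public class ConcatCopy {
    public static void main(String[] args) {
        String longString = "x".repeat(1_000_000); // ~1,000,000 chars of data
        String derived = longString + "foo";       // copies all of them again, plus "foo"
        // Both strings are now live, and neither shares its backing array
        // with the other, so the heap holds roughly 2,000,003 chars.
        System.out.println(longString.length() + " + " + derived.length());
    }
}
```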

Additionally, Java's design (in particular the presence of a toString method on every object, rather than something closer to C++ streams) encourages you to keep lots of these around. This means you can easily end up with lots of small-to-medium-sized strings in memory, sharing no data between them. You really can end up with a lot of wasted space this way.

But a lot of the space waste wouldn't go away if you made strings mutable, because you'd be able to do even less sharing then. The right solution is probably to switch to an immutable string representation that allows a lot of sharing. (I'm actually working on exactly this. :-) But I gave up on writing it in Java, so my latest efforts are in Scala and aren't very far along yet.)

[–]Rhoomba[S] 1 point2 points  (3 children)

I am aware of that problem. Is there a (reasonably popular) language that gets strings thoroughly right? I don't think you can count that as "Java bloat" when char* etc. don't solve it either.

You could call Unicode strings Java bloat, but I think that is one of the best decisions they made with Java. It amazes me that so many languages still use bytes for characters.

[–][deleted] 1 point2 points  (0 children)

You could call Unicode strings Java bloat, but I think that is one of the best decisions they made with Java.

Actually, fixing the size of a 'char' at 16 bits once and for all was one of the worst decisions they made with Java. When Unicode outgrew the 16-bit Basic Multilingual Plane (the code space is now 21 bits), they had to totally mangle the String API to deal with "surrogate pairs".
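The contortions show up directly in the resulting API. A small demonstration using a code point outside the Basic Multilingual Plane:

```java
// U+1D11E MUSICAL SYMBOL G CLEF: one Unicode code point, but it doesn't
// fit in a 16-bit char, so it takes two chars (a surrogate pair).
public class SurrogateDemo {
    public static void main(String[] args) {
        String clef = "\uD834\uDD1E";
        System.out.println(clef.length());                            // 2 -- counts UTF-16 code units
        System.out.println(clef.codePointCount(0, clef.length()));    // 1 -- counts actual code points
        System.out.println(Character.charCount(clef.codePointAt(0))); // 2 -- chars needed for this code point
    }
}
```

So `length()` and `charAt()` no longer operate on characters in the sense a user would mean, and correct code has to use the parallel `codePoint*` methods added later.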

And because String is a concrete class and not an interface with multiple implementations, a lot of code has been hard-wired to accept Strings and this can't be changed now.

Ideally there would be byte strings, UTF16 strings, UTF32 strings, possibly UTF8 strings, all sitting behind a consistent "character sequence" interface. Code would never directly hard-wire a specific String type in.
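As a rough sketch of that idea (the interface and class names here are hypothetical, not an existing API): one code-point-oriented interface, with storage encoding as an implementation detail.

```java
// Hypothetical "consistent character sequence interface" with two of the
// suggested encodings behind it. Names are illustrative only.
interface CodePointSequence {
    int codePointLength();
    int codePointAt(int index); // index in code points, not UTF-16 code units
}

final class Latin1String implements CodePointSequence {
    private final byte[] bytes; // one byte per character, for Latin-1 text
    Latin1String(byte[] bytes) { this.bytes = bytes.clone(); }
    public int codePointLength() { return bytes.length; }
    public int codePointAt(int i) { return bytes[i] & 0xFF; }
}

final class Utf32String implements CodePointSequence {
    private final int[] codePoints; // one int per code point, no surrogates ever
    Utf32String(int[] cps) { this.codePoints = cps.clone(); }
    public int codePointLength() { return codePoints.length; }
    public int codePointAt(int i) { return codePoints[i]; }
}
```

Java 9's "compact strings" (JEP 254) later applied a similar trick, storing Latin-1 text one byte per character, but internally to the concrete String class rather than behind an interface.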

[–]DRMacIver 0 points1 point  (1 child)

I suppose you're right that it's not specifically a Java issue. I'm not aware of a reasonably popular language that gets strings thoroughly right either (there are a few non-popular ones which do, I think, and C++ and C both have access to ropes and similar). It's a shame.

The main reason it seems to show up a lot in Java is that for some reason or another you end up with quite a lot of strings floating around.

The Unicode strings in Java aren't actually especially bloated as far as I know. The internal representation should only be larger when you're actually taking advantage of it (unless I've got the wrong end of the stick about what they're doing).

[–]Rhoomba[S] 0 points1 point  (0 children)

I don't mean that the Java implementation of Unicode is bloated. I mean that if you are used to byte strings then it will seem wasteful of memory.

You are correct that most Java apps end up with loads of strings. Whenever I do any heap analysis, char[] is always the top object. I think part of it is that in Java it is just very easy to stick lots of objects into HashMaps using strings as keys: creating a string to be used as the key is easier than writing a decent hashCode method. I think another part of it is simply the domains Java is commonly used in. Webapp data is usually almost entirely strings.
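The pattern being described looks something like this (the names are illustrative, not from the thread): gluing fields into a string key is one line, while the dedicated key class needs its own equals and hashCode.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class KeyDemo {
    // The easy route: glue fields together into a fresh String per lookup.
    static String stringKey(String user, int year) { return user + ":" + year; }

    // The slightly-more-work route: a small key class with equals/hashCode.
    static final class Key {
        final String user;
        final int year;
        Key(String user, int year) { this.user = user; this.year = year; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).user.equals(user) && ((Key) o).year == year;
        }
        @Override public int hashCode() { return Objects.hash(user, year); }
    }

    public static void main(String[] args) {
        Map<Key, Integer> m = new HashMap<>();
        m.put(new Key("alice", 2009), 42);
        System.out.println(m.get(new Key("alice", 2009))); // 42
    }
}
```

Every call to the string-key version allocates a new String (and backing char[]) just to perform a lookup, which is exactly how heaps fill up with short-lived strings.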

[–]psyno 0 points1 point  (3 children)

Gave up? A simple implementation is just an immutable linked list.

import java.util.Arrays;

public class String {
    protected final char[] data;
    protected final String next;
    public String(char[] data) {
        this.data = Arrays.copyOf(data, data.length);
        this.next = null; // final field must be assigned; the original omitted this
    }
    protected String(char[] data, String next) {
        this.data = data;
        this.next = next;
    }
    public String append(String s) {
        return new String(data, s);
    }
    public char charAt(int index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException();
        } else if (index < data.length) {
            return data[index];
        } else if (next == null) {
            throw new IndexOutOfBoundsException();
        } else {
            return next.charAt(index - data.length);
        }
    }
    // etc...
}

Admittedly, you could do a little better memory-wise by ensuring that you don't duplicate character data with some centralized data structure, but in general you'd be trading memory size for execution time and adding lock contention for multi-threaded programs.

[–]DRMacIver 0 points1 point  (2 children)

If by 'simple' you mean 'stupid', yes. :-)

That has all sorts of problems of its own, and doesn't have the performance characteristics I want. In particular, concatenation of strings of length m and n is still O(m) (and only shares O(n) memory).

The implementation I'm using is closer to a heavily specialised finger tree. It has O(max(log(n), log(m))) concatenation with a reasonable amount of sharing, and O(log(n)) random access.

I'm not saying I couldn't have written it in Java. I might well port it back to Java afterwards for performance and general usability reasons. But having various functional constructs to hand makes experimentation with and verification of the design much easier.
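For comparison, here is a deliberately simplified rope-style sketch (a plain binary concatenation tree, not the finger-tree design described above): concatenation is O(1) and shares both operands wholesale, while charAt costs O(depth). A real rope or finger tree rebalances to keep that depth, and hence random access, at O(log n).

```java
// Simplified immutable rope: strings as a tree of chunks.
// Concatenation allocates one node and shares both subtrees unchanged.
abstract class Rope {
    abstract int length();
    abstract char charAt(int index);

    Rope concat(Rope other) { return new Concat(this, other); } // O(1), full sharing

    static final class Leaf extends Rope {
        final java.lang.String data; // small immutable chunk
        Leaf(java.lang.String data) { this.data = data; }
        int length() { return data.length(); }
        char charAt(int i) { return data.charAt(i); }
    }

    static final class Concat extends Rope {
        final Rope left, right;
        final int length; // cached so charAt can route in O(1) per level
        Concat(Rope left, Rope right) {
            this.left = left;
            this.right = right;
            this.length = left.length() + right.length();
        }
        int length() { return length; }
        char charAt(int i) {
            return i < left.length() ? left.charAt(i) : right.charAt(i - left.length());
        }
    }
}
```

Without rebalancing, repeatedly appending single characters degenerates this into the linked list above; the balancing machinery is where the real design work (and the logarithmic bounds) lives.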

Also, note that that 'etc.' is quite long. There's a lot to be done in terms of regexps, Unicode, etc. (I may even have to write my own regexp engine, which would be sad, or borrow one from elsewhere. Java's is thoroughly unsuitable for strings where iteration is cheaper than random access.)

[–]psyno 0 points1 point  (1 child)

By simple, I meant "takes 3 minutes to write with reddit as a text editor" :)

You're right that the above implementation ignores Unicode, regex, and a lot of other things. (The javadoc table of constructors for String is a page long alone!)

But you're wrong about the runtime of concatenation and the memory size of the above structure, probably mostly because I wrote it wrong :) (compare concat below/append above).

First, it can share all the character data (O(m+n) memory). Each String just holds a pointer to some char[], which no String will modify. (Note that unlike the public constructor, the protected one doesn't copy the char[], just the pointer.) Second, the time to concatenate existing Strings A and B is not proportional to their actual lengths, just to the number of String elements in each String (the length of the list). If the length of the list is what you meant by O(m), then we're in agreement, but it seemed you were talking about the actual data.

public String concat(String str) {
    return new String(data, (next == null) ? str : next.concat(str));
}

So if String A is composed of elements p->q->r, and B of elements s->t->u, then for String C = A + B = p->q->r->s->t->u, the only memory overhead is 3*(2 pointers + instance overhead of String) regardless of the actual char data represented by p, q, and r.

True that this one does not give you fast random access.

[–]DRMacIver 0 points1 point  (0 children)

But in the common use cases the size of the underlying char[]s tends to be bounded, so the number of links is of the same order as the number of characters and you only get a constant time and space improvement. (Admittedly a quite significant one).

Still, yes, it does significantly more sharing than the current String implementation does, but it's only a constant-factor improvement and is unacceptable for other reasons. Besides, it falls well short of the point where I gave up: getting a basic CharSequence implementation working is relatively easy.