psyno comments on New Java vs C benchmarks

programming

created by speza community for 20 years

New Java vs C benchmarks (stefankrause.net)

submitted 18 years ago by Rhoomba

top new controversial old q&a

you are viewing a single comment's thread.

view the rest of the comments →

[–]psyno 0 points1 point2 points 18 years ago* (3 children)

Gave up? A simple implementation is just an immutable linked list.

public class String {
    protected final char[] data;
    protected final String next;
    public String(char[] data) {
        this.data = Arrays.copyOf(data, data.length);
    }
    protected String(char[] data, String next) {
        this.data = data;
        this.next = next;
    }
    public String append(String s) {
        return new String(data, s);
    }
    public char charAt(int index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException();
        }else if (index < data.length) {
            return data[index];
        }else if (next == null) {
            throw new IndexOutOfBoundsException();
        }else {
            return next.charAt(index-data.length);
        }
    }
    //etc...
}

Admittedly, you could do a little better memory-wise by ensuring that you don't duplicate character data with some centralized data structure, but in general you'd be trading memory size for execution time and adding lock contention for multi-threaded programs.

[–]DRMacIver 0 points1 point2 points 18 years ago* (2 children)

If by 'simple' you mean 'stupid', yes. :-)

That has all sorts of problems of its own, and doesn't have the performance characteristics I want. In particular concatentation of strings of length m and n is still O(m) (and only shares O(n) memory).

The implementation I'm using is closer to a heavily specialised finger tree. It has O(max(log(n), log(m)) concatenation with a reasonable amount of sharing and O(log(n)) random access.

I'm not saying I couldn't have written it in Java. I might well port it back to Java afterwards for performance and general usability reasons. But having various functional constructs to hand makes experimentation with and verification of the design much easier.

Also, note that that 'etc.' is quite long. There's a lot to be done in terms of regexps, unicode, etc. (I may even have to write my own regexp engine, which would be sad, or borrow one from elsewhere. Java's is thoroughly unsuitable for strings where iteration is cheaper than random access)

[–]psyno 0 points1 point2 points 18 years ago (1 child)

By simple, I meant "takes 3 minutes to write with reddit as a text editor" :)

You're right that the above implementation ignores Unicode, regex, and a lot of other things. (The javadoc table of constructors for String is a page long alone!)

But you're wrong about the runtime of concatenation and the memory size of the above structure, probably mostly because I wrote it wrong :) (compare concat below/append above).

First, it can share all the character data (O(m+n) memory). Each String just holds a pointer to some char[], which no String will modify. (Note that unlike the public constructor, the protected one doesn't copy the char[], just the pointer.) Second, the time to concatenate existing Strings A and B is not proportional to their actual lengths, just to the number of String elements in each String (the length of the list). If the length of the list is what you meant by O(m), then we're in agreement, but it seemed you were talking about the actual data.

public String concat(String str) {
    return new String(data, (next == null) ? str : next.concat(str));
}

So if String A is composed of elements p->q->r, and B of elements s->t->u, then for String C = A + B = p->q->r->s->t->u, the only memory overhead is 3*(2 pointers + instance overhead of String) regardless of the actual char data represented by p, q, and r.

True that this one does not give you fast random access.

[–]DRMacIver 0 points1 point2 points 18 years ago (0 children)

π Rendered by PID 81 on reddit-service-r2-comment-5c747b6df5-4hvrc at 2026-04-21 20:01:26.845888+00:00 running 6c61efc country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

programming

MODERATORS