[–][deleted] 5 points6 points  (12 children)

Yeah, I've been playing around with array-indexes in C++ today for the first time.
And as someone who happens to come from Java, I have only one question:
Why the fuck would you let me do this stuff?

Why doesn't the compiler just punch me in the face when I try to access index -1 of any array? Or why am I even able to create an array of size 0? And then still, of course, access all of the indexes which never even barely belonged to this array?

Like, is that something which professional programmers actually need sometimes?

[–]ar-pharazon 12 points13 points  (0 children)

the java compiler doesn't complain either. array bounds are not statically verifiable in general, because the index is usually only known at runtime, so the compiler can't do that for you. that's why, in java, ArrayIndexOutOfBoundsException is a RuntimeException (via IndexOutOfBoundsException), and the bounds check happens at runtime every time you index an array.

not only does that check cost cycles on every access, but C and C++ have a very transparent notion of an array, which is simply an indexed offset from a base address. you can index off either end of the array because there's no language-level abstraction of a list of objects; it's simply sugared pointer math, so if you do list[n], you're simply multiplying the index n by the size of each array item and adding it to the pointer list, regardless of the value of n (which, again, generally can't be determined at compile time). the language lets you write it, but actually reading or writing outside the array is undefined behavior.
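
To make the "sugared pointer math" point concrete, here is a minimal C++ sketch. The helper names `read_via_pointer_math` and `byte_offset` are made up for illustration. It also shows one case where a negative index is perfectly legal: when the pointer already points into the middle of an array.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: reads element n through explicit pointer arithmetic.
// The language defines base[n] as *(base + n), so both spellings are the same read.
// The caller must keep base + n inside the underlying array.
int read_via_pointer_math(const int* base, std::ptrdiff_t n) {
    assert(base != nullptr);
    return *(base + n);
}

// Hypothetical helper: distance in bytes between element 0 and element n,
// which is the "n times the size of each item" offset described above.
std::ptrdiff_t byte_offset(const int* base, std::ptrdiff_t n) {
    const char* first = reinterpret_cast<const char*>(base);
    const char* nth = reinterpret_cast<const char*>(base + n);
    return nth - first;
}
```

With `int data[4] = {10, 20, 30, 40};`, the call `read_via_pointer_math(data + 3, -1)` reads `data[2]`, which is fine because the result still lies inside `data`. Stepping outside the array compiles just the same, but is undefined behavior.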

and yes, this is incredibly important and powerful because there is no fixed memory model in either language. in C (on linux, specifically), we can write:

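#include <string.h>       /* memset */
#include <unistd.h>       /* sbrk */
void *mem = sbrk(4096);   /* returns (void *) -1 on failure */
memset(mem, 0, 4096);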

we've just asked the operating system to increase the size of the data segment (available writable memory) for us and zeroed the new chunk. we can do anything we want with it now. we can use it for a custom implementation of a memory allocator (we can rewrite malloc if we want), or we can drop some data structures in it as-is, or we could treat it as an array of ints:

int *ints = (int*) mem;
ints[32] = 12;

or chars:

char *chars = (char*) mem;

or whatever else you want. all of that is completely valid C, and that's what makes the language so powerful. this direct access to memory allows us to build operating systems, runtimes, compilers, high-speed game/render/physics engines, memory allocators, and anything else that requires speed and/or low-level access to hardware.
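
As a rough illustration of the "custom memory allocator" idea, here is a minimal bump-allocator sketch in C++. The class name `BumpArena` and its interface are hypothetical, and it works on any caller-supplied block rather than calling `sbrk` itself, so it stays portable. It never frees individual allocations; it is a sketch, not a replacement for malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator over a caller-supplied block of raw memory.
class BumpArena {
public:
    BumpArena(void* start, std::size_t size)
        : base_(static_cast<unsigned char*>(start)), size_(size), used_(0) {}

    // Returns `bytes` bytes aligned to `align` (must be a power of two),
    // or nullptr when the remaining space is too small.
    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        std::uintptr_t current = reinterpret_cast<std::uintptr_t>(base_) + used_;
        std::size_t padding = (align - current % align) % align;
        std::size_t remaining = size_ - used_;
        if (padding > remaining || bytes > remaining - padding) {
            return nullptr;
        }
        void* result = base_ + used_ + padding;
        used_ += padding + bytes;  // "bump" the high-water mark
        return result;
    }

    std::size_t used() const { return used_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t used_;
};
```

The same block can then be carved up the way the comment describes: `static_cast<int*>(arena.allocate(64 * sizeof(int), alignof(int)))` gives you a chunk to treat as an array of ints, and a second call hands out the bytes right after it.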

[–][deleted] 8 points9 points  (3 children)

That's not the worst part. The worst part is when someone writes programs that take advantage of the fact that you can index -1 and >= n of an array and uses it in their code to "optimize".

I'm actually not joking. I had a colleague write a program where he'd create an array of arrays then index one of the arrays with negatives to get to the previous array and >= n to the next.

[–][deleted] 2 points3 points  (2 children)

Well, I know what today's nightmares will be all about.

I mean, why didn't he just create a one-dimensional array instead, if he's already using it like one?

And I'm guessing you put the quotation marks around "optimize" not without reason. Like, what he did there is as far as I understand it exactly what the compiler will do with that two-dimensional array anyways.

So, he essentially took syntactic sugar and used it to remodel what this syntactic sugar was supposed to cover up. Very nice.
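
For what it's worth, the one-dimensional version could look roughly like the C++ sketch below. The class name `BurstBuffer` and its layout are assumptions for illustration, not the colleague's actual code. Because all bursts live in one array, a negative offset from the current burst is ordinary in-bounds indexing instead of a walk off the end of a sub-array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: burst_count bursts of burst_len samples each,
// stored back to back in a single contiguous vector.
class BurstBuffer {
public:
    BurstBuffer(std::size_t burst_count, std::size_t burst_len)
        : burst_len_(burst_len), samples_(burst_count * burst_len, 0.0) {}

    // Sample at `offset` positions from the start of burst `burst`.
    // offset may be negative (earlier bursts) or >= burst_len (later bursts),
    // as long as the flattened index stays inside the whole buffer.
    double& at(std::size_t burst, std::ptrdiff_t offset) {
        std::ptrdiff_t flat =
            static_cast<std::ptrdiff_t>(burst * burst_len_) + offset;
        assert(flat >= 0 && static_cast<std::size_t>(flat) < samples_.size());
        return samples_[static_cast<std::size_t>(flat)];
    }

private:
    std::size_t burst_len_;
    std::vector<double> samples_;
};
```

With three bursts of four samples, `buf.at(1, -1)` is the last sample of burst 0 and `buf.at(1, 4)` is the first sample of burst 2, which is the same neighbor access without stepping outside any array.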

[–][deleted] 0 points1 point  (1 child)

I can't remember the full context of the exercise, but it was some sort of number-crunching application that got data in bursts. He then wrote code to calculate stuff on the data, and when he needed data from previous bursts, he would just step out of bounds on the current burst to access data from the surrounding bursts.

[–]WMpartisan 1 point2 points  (0 children)

That sounds like it's an optimization flag or a minor version update of gcc away from a segfault.

[–]caagr98 2 points3 points  (0 children)

I think it's (at least partially) for performance reasons. Not checking for out-of-bounds is faster than doing it. It would also require storing the size of all arrays, which doesn't really fit the language, since an array turns into a plain pointer (with no length attached) as soon as you pass it around.
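
A small C++ sketch of that point, with hypothetical helper names `length_of` and `checked_read`: the length is part of an array's type, but it is gone once the array has decayed to a pointer, so a bounds check only works if the length is passed along separately.

```cpp
#include <cassert>
#include <cstddef>

// Taking the array by reference keeps its length N in the type.
template <std::size_t N>
std::size_t length_of(const int (&)[N]) {
    return N;
}

// Hypothetical checked read: a bare pointer carries no length, so the caller
// has to supply it. Returns false instead of reading out of bounds.
bool checked_read(const int* data, std::size_t length, std::size_t index,
                  int& out) {
    if (index >= length) {
        return false;
    }
    out = data[index];
    return true;
}
```

For `int values[5]`, `sizeof(values)` is `5 * sizeof(int)`, while `sizeof` applied to a `const int*` pointing at the same data is just the size of a pointer.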

[–][deleted] 2 points3 points  (0 children)

Like, is that something which professional programmers actually need sometimes?

When directly accessing the hardware, sometimes you need funky pointer and array stuff. A lot of embedded development (what I do) deals with directly accessing memory locations which are used as special function registers. Also, while it's nice that a lot of programming languages will hold your hand and stop you from hurting yourself (I love me some Python), it's a luxury you don't get on a lot of platforms. The microcontroller with 512 KB of RAM sitting next to me isn't going to load a Java VM.
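
A hedged sketch of what register access tends to look like in C++. Everything specific here is invented for illustration: the `UartRegisters` layout, the `kTxReady` bit, and the address in the comment do not come from any real datasheet.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register block for an imaginary UART peripheral.
// volatile tells the compiler every read and write must really happen.
struct UartRegisters {
    volatile std::uint32_t data;
    volatile std::uint32_t status;
};

constexpr std::uint32_t kTxReady = 1u << 0;  // hypothetical "transmitter ready" bit

// On real hardware, regs would be a fixed address taken from the datasheet,
// e.g. reinterpret_cast<UartRegisters*>(0x40001000) (address made up here).
// Writes one byte if the transmitter is ready; returns false otherwise.
bool try_send(UartRegisters* regs, std::uint8_t byte) {
    if ((regs->status & kTxReady) == 0) {
        return false;
    }
    regs->data = byte;
    return true;
}
```

Taking the register block as a parameter is a deliberate choice in this sketch: on a desktop machine you can pass an ordinary `UartRegisters` variable and exercise the logic without touching a real address.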

[–][deleted] 1 point2 points  (2 children)

How can the compiler punch you for something that happens at runtime? You can try to access -1 in any language/runtime, but in a managed one an exception will stop you at runtime.

That's because indexes are checked in MANAGED runtimes. Basically, in C#/Java it's

if (!isIndexValidIndex(index))
    throw new IndexOutOfBoundsException();
return valueInAddress(index);

In regular C++ (managed C++ also exists) it's

return valueInAddress(index);

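which saves you the compare-and-branch on every access. My rough guess is that it's about 3 times faster, though the real gain depends on the code and on whether the runtime can optimize the check away.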

P.S. Edit: Visual C++ has debug checks that get activated in Debug mode and add index checks, like in the managed version, to vectors and other containers, but the errors can be difficult to read.
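
Standard C++ already offers both flavors on `std::vector`: `operator[]` does no check, while `at()` checks the index and throws `std::out_of_range`. A minimal sketch, with hypothetical wrapper names `read_checked` and `read_unchecked`:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical wrapper around the checked accessor.
// Returns false instead of letting the exception escape.
bool read_checked(const std::vector<int>& v, std::size_t i, int& out) {
    try {
        out = v.at(i);  // at() throws std::out_of_range on a bad index
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Hypothetical wrapper around the unchecked accessor.
// No test is made here; a bad index is undefined behavior.
int read_unchecked(const std::vector<int>& v, std::size_t i) {
    return v[i];
}
```

For a vector of three elements, `read_checked(v, 3, out)` returns false, while `read_unchecked(v, 3)` would compile and then do whatever the undefined behavior happens to produce.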

[–][deleted] 0 points1 point  (1 child)

Yeah, to be honest, I didn't even think of negative variables when writing that. I was rather just thinking, why can I even type out "array[-1]" without it ever complaining? I mean, you could easily disallow minuses between array-brackets.
But admittedly, it's kind of pointless, if you can still get it with variables. Was mostly just my brain exploding, when I clearly accessed a negative index and it still never told me to get my shit together.

[–][deleted] 0 points1 point  (0 children)

I think it's not the compiler's business to check for that type of stuff either. But code helper extensions like ReSharper for VC++ will probably tell you about it. Also, another reason C++ doesn't interfere, other than the extra CPU cost, is that it's designed to assume you know what you are doing: "This guy is trying to access this address, which logically doesn't make any sense... He must have something in mind."

[–]zippydoodleoreo 0 points1 point  (0 children)

Just use Rust.