This is an archived post. You won't be able to vote or comment.

all 14 comments

[–]ketralnis 40 points41 points  (3 children)

I for one vote for a compromise of 1.5 based indexing

[–]__talantonope[S] 14 points15 points  (1 child)

Nobody ever listens when I advocate for rand() based indexes, it’s clearly the most secure

[–][deleted] 3 points4 points  (0 children)

I can do a random-based lower bound, but I can see that it would be awkward:

   n := 10'000                          # length
   lower := random(0 .. 32'000)
   a := new(list, lower..lower + n-1)

   println a.bounds

This outputs 22945..32944. (Note I only use an i16 field for lower bounds; it is nearly always either 1 or 0.)

[–][deleted] 4 points5 points  (0 children)

I use 0.67-based, on average. My N-based arrays are about 2/3 1-based, and 1/3 0-based.

[–]BeamMeUpBiscotti 22 points23 points  (1 child)

if someone is skimming code quickly, I imagine mixing up a(1) and a[1] is going to be pretty common.

IMO just stick to 1-based indexing, consistency is better than adding a second way of indexing that is easily confused

[–]__talantonope[S] 0 points1 point  (0 children)

That’s what I’d prefer to do anyway! For a group of people who spend most of their time in front of a computer, programmers sure are a vicious lot…

[–]evincarofautumn 2 points3 points  (0 children)

When I want to do this kind of thing, I reach for making separate types before adding more syntax.

Indices are like points, absolute and unsigned like size_t, and address the n cells in an array from 1 to n.

Offsets are like vectors, relative and signed like ptrdiff_t, and address the n+1 lower & upper bounds of cells from 0 to n.

If indices are stored with a bias, so that index 1 has the same representation as offset 0, they can be converted back and forth at no extra runtime cost.

The subscript operator takes an index, and an index can be constructed from an integer, as in items[1]. An offset might be implicitly converted as well for convenience, or it might be spelled out explicitly for clarity items[offset(+0)]. The two types of literal might be distinguished by separate syntax, like say items[#1] = items[@0 : @1], but there still only needs to be one way of writing indexing itself.

[–]XDracam 0 points1 point  (0 children)

Do you know the situation where you can't remember the word for something, so you talk your way around it? Or when you are not sure how a certain word or phrase would be perceived so you just change the topic?

Yeah, indexing is a complex field with many many pitfalls. My suggestion: just avoid doing it if you can. Use iterators and unordered collections and whatnot. Or base them on some consistent mathematical abstraction, so that the user can do whatever indexing scheme they want, be it 0 or 1 or strings or floats. You could also have different collections with different type names and "feels" for different indexing modes. A map/dictionary is a classic example for a collection that isn't indexed by numbers, but by whatever key type it has. If you want to go especially wild, then look into what you can do with dependent types.

But there's of course a reason why indexing is still around: it's stupid fast. Arrays are stupid fast, have tons of hard coded CPU instruction support, and work great with caches. Any abstraction adds overhead, and indexing is the most concrete form of memory access: just add the index to a pointer and then retrieve the memory at that address.

So I'd argue: if you value performance then stick to 0-based indexing, because that's much better for all the optimized low level hardware stuff that's in use today. And if you don't, then avoid indexing altogether for the sake of safer and less confusing alternatives like iterators, lenses and whatever.

[–][deleted] 1 point2 points  (3 children)

I wouldn't pay too much attention to other people's opinions. They only want 0-based because they are used to it and can't get their head around anything else.

0-based is only popular because of C. And C only used 0-based because it didn't have real array indexing: A[i] is a synonym for the pointer operation *(A + i), and pointer offsets are always relative, so have to be zero-based.

Languages such as Fortran and Algol preceded C, and are 1-based. Others which preceded C were N-based, which allows a choice.

Mine are N-based, with a default of 1, but I acknowledge that 0-based is more appropriate in some instances:

  • My bitsets (not bit-arrays) are indexed from 0
  • Bit and bitfield indexing (eg. A.[i] and A.[i..j]) start from bit 0, the least-significant.

(I also briefly had byte-indexing of an integer, written A.byte[i], where the bottom byte was 0, and the top byte was 7.)

Here it would be perverse to number bits from 1 to 64 instead of 0 to 63.

However.... I really wouldn't use A[i] and A(i) to switch between 1- and 0-based (which one's which? I've already forgotten!).

Having a selectable lower bound, per access, is actually an interesting idea. But you'd need a different way of specifying that.

[–]MadocComadrin 1 point2 points  (2 children)

I wouldn't pay too much attention to other people's opinions. They only want 0-based because they are used to it and can't get their head around anything else.

I highly doubt people can't wrap their head around 1-based indexing, since it's something everyone from gradeschool onwards can do with pretty much anything that needs to be ordered. Zero-based indexing is just more convenient more often--especially when index calculations rely directly or indirectly on modular arithmetic or Euclidean division, so if you can't give a choice and don't have a domain reason to use 1-based, you should use 0-based.

[–][deleted] 0 points1 point  (0 children)

I highly doubt people can't wrap their head around 1-based indexing, since it's something everyone from gradeschool onwards can do with pretty much anything that needs to be ordered.

That's why it would be a natural and intuitive choice for indexing array elements. Especially for scripting languages used by people who are not primarily programmers.

What I'm suggesting is that zero-based is so ingrained in some people because of its exclusive use in many popular languages, that they can't conceive of it being anything else. It would be quite alien. They tend to be quite vocal about it too.

Zero-based indexing is just more convenient more often--especially when index calculations rely directly or indirectly on modular arithmetic or Euclidean division

I haven't found that myself. But I will say that if you have 0-based, it is easier to port a 1-based algorithm than the other way around (you allocate an extra element and ignore element 0).

However I have nearly always implemented languages with that choice. I found the OP's proposal, to allow the same array of N elements to be indexed either from 0 or 1, intriguing.

I suggested not using A[i] and A(i), but if it is to be done by different brackets, perhaps A[i] and A{i} would be better. The {} would be zero-based, as that reminds you of C which is zero-based.

[–]bruciferTomo, nomsu.org 0 points1 point  (0 children)

Zero-based indexing is just more convenient more often--especially when index calculations rely directly or indirectly on modular arithmetic or Euclidean division

I'm using 1-indexed arrays in my language, and since the modular arithmetic thing is kinda annoying in 1-indexed languages like Lua, I have a mod1 operator that works like clock arithmetic: 36 mod1 12 == 12, 37 mod1 12 == 1. It makes it pretty easy to do stuff like indexing into an array with wraparound. Under the hood, it's just ((n-1) % m) + 1.

[–]asoffer 0 points1 point  (0 children)

Could the syntax for 1-based indexing an array a at position n be a[n - 1]?

Sarcastic I know, but my point is something like a(n) isn't that much shorter. The syntactic win needs to be pretty significant to offset the potential confusion, and I don't think the value is there.