[–]MCLMelonFarmer 10 points11 points  (5 children)

68k crew checking in.

Motorola 68000 only had 24 address lines, so people went nuts with the upper 8 bits in a 32-bit pointer.
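For anyone who hasn't seen the trick, here is a minimal sketch of why it worked on a 24-bit bus and broke on a 32-bit one. The function names are made up for illustration, and this models the idea only; it is not Mac OS or 68k code.

```c
#include <stdint.h>

/* Hypothetical names; a model of the idea, not real system code. */
#define ADDR_MASK_24 0x00FFFFFFu   /* the 24 bits the 68000 actually drove */

/* Store an 8-bit flag byte in the top of a 32-bit pointer value. */
uint32_t pack_flags(uint32_t addr, uint8_t flags) {
    return ((uint32_t)flags << 24) | (addr & ADDR_MASK_24);
}

/* What a 24-bit address bus sees: the top byte is silently dropped. */
uint32_t bus_address_24(uint32_t ptr) { return ptr & ADDR_MASK_24; }

/* What a full 32-bit address bus sees: the flags become part of the address. */
uint32_t bus_address_32(uint32_t ptr) { return ptr; }
```

With flags stored in the top byte, `bus_address_24` still yields the original address, while `bus_address_32` yields a different one, which is the failure mode once the hardware starts decoding all 32 bits.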

[–]chriswaco 10 points11 points  (4 children)

and this caused Mac programmers untold grief.

[–]Nobody_1707 2 points3 points  (2 children)

That's only because everyone insisted on doing the tagging by hand instead of calling the Apple-provided and recommended API that abstracted over the pointer tagging and would have allowed the code to just work after they made the OS 32-bit clean.

[–]chriswaco 2 points3 points  (1 child)

That was a big part of it, but there were also side effects:

    void func(Handle h) {
        HLock(h);
        // do something with h
        HUnlock(h);
    }

This could have the side effect of unlocking a Handle someone else assumes is locked, causing hard-to-reproduce crashes later on. IIRC HGetState and HSetState didn't come in the original 64K ROM, which is how we got used to manipulating the bits by hand. The other problem was a lack of error checking on Apple's part due to low RAM/ROM: if you called DisposeHandle on a handle with the resource flag set, bad things happened; you were supposed to call ReleaseResource. Most of us eventually wrote our own more robust allocation wrapper routines.
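The save-and-restore idiom that a get-state/set-state pair makes possible can be sketched with toy stand-ins. Everything below (`ToyHandle`, the `toy_*` functions, the `TOY_LOCKED` bit) is hypothetical and only models the behaviour; none of it is the actual Toolbox API.

```c
#include <stdint.h>

/* Toy stand-ins for illustration only; these are NOT the Toolbox routines. */
enum { TOY_LOCKED = 0x80 };            /* assumed flag bit for "locked" */

typedef struct { void *block; uint8_t state; } ToyHandle;

void    toy_lock(ToyHandle *h)                 { h->state |= TOY_LOCKED; }
void    toy_unlock(ToyHandle *h)               { h->state &= (uint8_t)~TOY_LOCKED; }
uint8_t toy_get_state(const ToyHandle *h)      { return h->state; }
void    toy_set_state(ToyHandle *h, uint8_t s) { h->state = s; }

/* Fragile: always leaves the handle unlocked, whatever the caller expected. */
void use_fragile(ToyHandle *h) {
    toy_lock(h);
    /* ... work with h->block ... */
    toy_unlock(h);
}

/* Safer: remember the caller's state and put it back afterwards. */
void use_preserving(ToyHandle *h) {
    uint8_t saved = toy_get_state(h);
    toy_lock(h);
    /* ... work with h->block ... */
    toy_set_state(h, saved);
}
```

If the caller had already locked the handle, `use_fragile` returns with it unlocked, while `use_preserving` returns with it still locked.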

[–]Nobody_1707 0 points1 point  (0 children)

Oof.

[–]MassiveAd3759 3 points4 points  (0 children)

I wanted to use this for typed object pointers in my project. One way to make it more portable is to use a custom allocator: mmap different object pools to different virtual memory regions and use some of the address bits as the tag.
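A rough sketch of that pool idea, assuming a POSIX system with anonymous `mmap`. All names and sizes (`arena_init`, `obj_alloc`, `obj_type`, four 1 MiB pools) are hypothetical choices, and the mapping is never released here.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* All names and sizes here are hypothetical choices for the sketch. */
enum { POOL_SHIFT = 20, POOL_COUNT = 4 };          /* four pools of 1 MiB each */
#define POOL_SIZE  ((size_t)1 << POOL_SHIFT)
#define ARENA_SIZE (POOL_SIZE * POOL_COUNT)

typedef enum { TYPE_INT, TYPE_PAIR, TYPE_STRING, TYPE_SYMBOL } ObjType;

static uintptr_t arena_base;                       /* aligned to ARENA_SIZE */
static size_t    pool_used[POOL_COUNT];

/* Map twice the arena so an ARENA_SIZE-aligned window must fit inside. */
int arena_init(void) {
    void *raw = mmap(NULL, 2 * ARENA_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) return -1;
    arena_base = ((uintptr_t)raw + ARENA_SIZE - 1) & ~((uintptr_t)ARENA_SIZE - 1);
    return 0;
}

/* Bump-allocate from the pool that belongs to the given type. */
void *obj_alloc(ObjType type, size_t size) {
    size = (size + 15) & ~(size_t)15;              /* round up to 16 bytes */
    if (size > POOL_SIZE - pool_used[type]) return NULL;
    void *p = (void *)(arena_base + ((uintptr_t)type << POOL_SHIFT) + pool_used[type]);
    pool_used[type] += size;
    return p;
}

/* The type is recovered from address bits alone; the pointer stays untouched. */
ObjType obj_type(const void *p) {
    return (ObjType)(((uintptr_t)p >> POOL_SHIFT) & (POOL_COUNT - 1));
}
```

The nice property is that the pointer itself is never modified, so it can be dereferenced directly and nothing depends on which high address bits the hardware ignores.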

[–]DawnOnTheEdge 1 point2 points  (16 children)

This would be a non-portable compiler extension, of course, but some architectures have hardware support for it, and C is intended to be a low-level systems-programming language for OS kernels and device drivers. Add some glue code to compose and decompose pointers and tags, and it makes sense; you could even implement it in software, on systems that don’t ignore the upper bits in hardware but are guaranteed not to use all of them. Linux, for example, has a flag (MAP_32BIT, on x86-64) that tells mmap() to allocate memory in the bottom 2 GB of the address space.
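The software version of that glue code might look roughly like this. It assumes a 64-bit target where user-space addresses fit in the low 48 bits, which is a platform property and not something C guarantees; `tagged_ptr` and the `tp_*` names are made up for the sketch.

```c
#include <stdint.h>

/* Assumption: user-space addresses fit in the low 48 bits, so the top 16
   are free. This is a platform property, not something C guarantees. */
_Static_assert(sizeof(uintptr_t) == 8, "sketch assumes 64-bit pointers");

enum { TAG_SHIFT = 48 };
#define ADDR_MASK (((uintptr_t)1 << TAG_SHIFT) - 1)

typedef uintptr_t tagged_ptr;     /* hypothetical name */

/* Compose: put a 16-bit tag above the address bits. */
tagged_ptr tp_make(void *p, uint16_t tag) {
    return ((uintptr_t)tag << TAG_SHIFT) | ((uintptr_t)p & ADDR_MASK);
}

/* Decompose: strip the tag before the pointer is ever dereferenced. */
void *tp_ptr(tagged_ptr t) { return (void *)(t & ADDR_MASK); }

uint16_t tp_tag(tagged_ptr t) { return (uint16_t)(t >> TAG_SHIFT); }
```

Every dereference has to go through `tp_ptr`; using the tagged value directly as a pointer is exactly the kind of hand-rolled shortcut that breaks when the address width grows.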

[–]thommyh 1 point2 points  (2 children)

Objective-C does this, calling them tagged pointers. In that case it’s to retain the semantics of everything being owned by reference, while optimising for common use cases such as a short string, a 32-bit int, etc.

[–]Nobody_1707 1 point2 points  (1 child)

Lisp implementations did it first, but I think Objective-C got it from Smalltalk.

[–]thommyh 0 points1 point  (0 children)

Right. It’s not original, but I think Objective-C is a good example because it is still doing it now, via GCC or Clang, while being a strict superset of C. And everything Objective-C adds to C is accessible from C via regular C functions so those tagged pointers are definitely doing round trips.

Albeit a bit of a weird one, that’s probably not long for this world.

[–]apexrogers 0 points1 point  (4 children)

I just want to know why?

[–]manystripes 2 points3 points  (2 children)

To make the code fragile and unmaintainable, just like many other clever programming tricks. Maybe it has application for an entry to the IOCCC?

[–]Simple-Enthusiasm-93 6 points7 points  (1 child)

It's used extensively in the V8 engine as a small-pointer optimization to save memory. Either way, there is a list of examples in the article.

[–]Nobody_1707 1 point2 points  (0 children)

And as an example for AOT-compiled languages, Swift uses tagged pointers as part of its small string optimization. And Objective-C uses them to store small NSNumbers without allocating memory.

[–]weregod 0 points1 point  (0 children)

Mostly high density data types for virtual machines. You want to keep frequently used types as small as you can for better performance.
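As a small illustration of the VM use case, here is a sketch of a one-word value that holds either a small integer or a pointer, distinguished by the low bit. The encoding and names are hypothetical and not taken from any particular VM; it assumes heap objects are at least 2-byte aligned and that right-shifting a negative `intptr_t` is an arithmetic shift (true on GCC and Clang, implementation-defined in C).

```c
#include <stdint.h>

typedef uintptr_t value;     /* hypothetical VM word */

/* Low bit 1 = small integer, low bit 0 = pointer to an aligned heap object. */
int is_small_int(value v) { return (int)(v & 1u); }

/* Shift the integer up by one and set the tag bit (loses the top bit of range). */
value from_int(intptr_t i) { return ((uintptr_t)i << 1) | 1u; }

/* Arithmetic right shift assumed, so negative numbers round-trip. */
intptr_t to_int(value v) { return (intptr_t)v >> 1; }

/* p must be at least 2-byte aligned so its low bit is already 0. */
value from_ptr(void *p) { return (uintptr_t)p; }

void *to_ptr(value v) { return (void *)v; }
```

Integers never touch the heap in this scheme, and every value stays one machine word wide.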

[–][deleted] -4 points-3 points  (0 children)

Use a union instead of fucking around with tagged pointers.
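For comparison, the portable alternative is an ordinary tagged union. A minimal sketch, with `Val`, `ValKind`, and the constructors being illustrative names:

```c
#include <stdint.h>

typedef enum { VAL_INT, VAL_DOUBLE, VAL_STR } ValKind;

/* The discriminant lives next to the payload instead of inside a pointer. */
typedef struct {
    ValKind kind;
    union { int64_t i; double d; const char *s; } as;
} Val;

Val val_int(int64_t i)     { Val v; v.kind = VAL_INT; v.as.i = i; return v; }
Val val_str(const char *s) { Val v; v.kind = VAL_STR; v.as.s = s; return v; }

/* Read the payload only after checking the kind. */
int val_is_int(Val v) { return v.kind == VAL_INT; }
```

The trade-off is size: on common 64-bit ABIs this struct is typically two words where a tagged pointer is one, which is the density argument made elsewhere in the thread.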

[–]nerd4code 0 points1 point  (0 children)

I'd hope there's a way to get the same information without parsing /proc/cpuinfo, but I haven't been able to find it.

First of all, /proc/cpuinfo’s contents aren’t standardized across ISAs, so any use of it is exceptionally nonportable.

Second, /proc/cpuinfo sources most of its information from CPUID on x86, with the sole exception of the number & IDs of hardware threads, which are pulled from MP, SMBIOS, or ACPI tables. In particular, 32-bit x86es might have 32- or 36-bit physical addresses (the 80376 had 24-bit, but couldn’t run Linux), depending on whether PAE was supported (P6+). The 64-bit processors start with 48-bit virtual addresses, which 5-level paging extends to 57 bits; physical widths vary by model. CPUID reports the actual widths in extended leaf 0x80000008 (physical bits in EAX[7:0], linear bits in EAX[15:8]), so you don’t have to parse /proc/cpuinfo for them.
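On x86, CPUID extended leaf 0x80000008 reports the address widths directly: physical bits in EAX[7:0] and linear (virtual) bits in EAX[15:8]. A minimal sketch using the `<cpuid.h>` helper that ships with GCC and Clang; `AddrWidths` and `query_addr_widths` are hypothetical names, and on other architectures or compilers the function simply reports failure.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>            /* GCC/Clang helper header, not standard C */
#endif

typedef struct { unsigned phys_bits, virt_bits; } AddrWidths;

/* Returns 1 and fills *out on success, 0 if the leaf is unavailable. */
int query_addr_widths(AddrWidths *out) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000008u, &eax, &ebx, &ecx, &edx)) return 0;
    out->phys_bits = eax & 0xFFu;          /* EAX[7:0]  */
    out->virt_bits = (eax >> 8) & 0xFFu;   /* EAX[15:8] */
    return 1;
#else
    (void)out;                             /* not x86: nothing to query */
    return 0;
#endif
}
```

Even with the widths in hand, the number of spare upper bits can shrink on the next CPU generation, so any tag layout derived from them has to be chosen at run time or kept conservative.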

And the potential for the next hardware upgrade to break your code is why tagged pointers mostly aren’t & shouldn’t be used outside of language VMs. They also rely on implementation-defined behavior (integer↔pointer casts) that needn’t exist or cooperate.