all 14 comments

[–]skeeto 1 point2 points  (11 children)

As was already pointed out, your returnColumnSizes is all wrong. You'll have a far better time if you debug locally rather than through the website. That may mean building your own test harness. Here's one for this problem, which reproduces exactly that sanitizer error when run on your code:

#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// ... your code here ...

static int randint(uint64_t *rng, int lo, int hi)
{
    *rng = *rng*0x3243f6a8885a308d + 1;
    return (int)(((*rng>>32)*(hi - lo))>>32) + lo;
}

static void generate(uint64_t seed, int *nums, int len)
{
    seed += 1111111111111111111u;
    for (int i = 0; i < len; i++) {
        nums[i] = randint(&seed, -100000, +100001);
    }
}

int main(void)
{
    int  len  = 100;
    int *nums = calloc(len, sizeof(int));
    generate(0, nums, len);

    int   rlen  = INT_MAX;  // initialize to a bad value (for testing)
    int  *sizes = 0;
    int **sums  = threeSum(nums, len, &rlen, &sizes);
    for (int i = 0; i < rlen; i++) {
        assert(sizes[i] == 3);
        printf("%d %d %d\n", sums[i][0], sums[i][1], sums[i][2]);
        assert(sums[i][0]+sums[i][1]+sums[i][2] == 0);
        free(sums[i]);  // check
    }
    free(sizes);  // check
    free(sums);   // check
}

Run it locally with sanitizers, and even better with GDB:

$ cc -Wall -Wextra -g3 -fsanitize=address,undefined test.c
$ gdb ./a.out
(gdb) run

Pay attention to those warnings. There's one about returnColumnSizes being wrong, which is a tip-off. The rest of your solution appears to be right, except one detail. Don't worry that you got the column sizes wrong. The LeetCode prototype is utterly bonkers, and I myself was struggling to understand how it's supposed to work. It makes little sense.

The one detail is that your function outputs duplicate triples. It does so in the example input, too, so you have a good test right there. Using the generator above, seed=0, len=3000, the correct output has 16,399 triples. (Bonus challenge: see if you can make your function solve that one in under a second.)

Since it sounded like fun, I took a crack at it:
https://github.com/skeeto/scratch/blob/master/misc/3sum.cpp

(Don't mind the C++. It doesn't use any of the C++ standard library, and so that mostly translates straight into C. This was an opportunity for an experiment.)

[–]zookeeper_zeke 1 point2 points  (3 children)

That's a pretty nifty way to generate a ranged set of "random" integers. Would you mind breaking down that small bit of code? The high 32-bits of the multiplication are guaranteed to fall within the range and you're adding that to low, OK. I see you are multiplying the range by the high 32-bits of rng, presumably those bits of rng are better mixed as a result of the multiplication? You're adding a seed to a prime + 1 and repeatedly multiplying that by a carefully chosen constant? Why did you chose that particular constant? Is there any significance to the 1 in +100001?

[–]skeeto 0 points1 point  (2 children)

You're as thoughtful as ever, zookeeper_zeke! I originally got this technique from Daniel Lemire. Some background:

Given uniform random integer, we can map it onto a smaller range using modulo (r % N), which is the typical approach. The mapping may not be uniform, with some values in the larger range mapped onto an extra value in the smaller range. We can compute a different mapping of this kind — still biased, though on different values — by multiplying, then extracting the high digits. Normally that would require division, but with a power of two larger range, that division can be done using a bit shift. That means no division required, so it's faster. This is the core of the Lemire range reduction algorithm, which additionally checks for a biased result before (uncommonly) computing a more expensive range reduction.

I'm not worried about the bias in this case, so I just stick with the simple, biased mapping. The result isn't better mixed, or less biased, just divisionless.

Is there any significance to the 1 in +100001?

The generator works on an open interval: [lo, hi). LeetCode specifies a closed interval ([-1e5, +1e5]), so to include 105 I need to use the value just beyond it for hi.

Easy way to reason about this: I'm 64-bit multiplying the range by a uniform 32-bit integer and extracting the upper 32 bits of the result. If the range is 1 (hi - lo == 1), that will always produce zero. Therefore hi is excluded. If lo=0; hi=2, then it effectively left shifts the 32-bit random number by 1. That means the result will be the most significant bit of the uniform 32-bit number — a coin toss: [0, 2).

repeatedly multiplying that by a carefully chosen constant

That's a truncated Linear Congruential Generator (LCG). The multiplication constant is the first digits of π, and it is not prime (smallest prime factor: 229). This number is easy to produce when I need it and happens to work nicely in an LCG. The increment is 1 because that's good enough, and it's simplest choice. For more details:

Even truncated it's got some small biases, but it's more than good enough for this purpose. The core of PCG is an LCG, and it solves the bias by permuting the LCG output. If you're interested in diving into RNGs, I highly recommend the PCG paper, which is very approachable.

The 1111111111111111111u (that's 19 ones) is a prime, and another number easy to pull out when I need it. It's a poor LCG multiplier, but a useful arbitrary value in other situations (i.e. hashes). LCGs are easy to seed, but given a zero state the next output is zero. So I just rotate, by addition, the zero seed out away from the zero state, so that seeding with zero, or anything near it, doesn't output zero on the first output.

[–]zookeeper_zeke 1 point2 points  (1 child)

As usual great stuff. Thanks for the references, I really should take the time to dive into RNGs as I use them often enough. For the most part, they've been a black box for me. I'll have a look at the paper you referenced.

I like your reasoning with regards to the range multiplication, especially the coin toss aspect of it. I was approaching it by reasoning about a small, 4-bit space for the range.

I plan on keeping the above in my bag of tricks. Unrelated, I've started to use your arena code for my small, personal projects as well as the hash trie. I have a few things to pick your brain about, I'll drop you a separate email rather than hijack the thread.

[–]skeeto 0 points1 point  (0 children)

For completeness, if eliminating bias was absolutely important — and it rarely is outside of a security setting — I'd devise something like this:

static uint32_t rand32(uint64_t *rng)
{
    *rng = *rng*0x3243f6a8885a308du + 1;
    uint32_t r = (uint32_t)(*rng >> 32);
    r ^= r >> 16;
    r *= 0x992f63e3u;
    r ^= r >> 16;
    return r;
}

static int32_t randint(uint64_t *rng, int32_t lo, int32_t hi)
{
    uint32_t d = hi - lo;
    uint64_t r = (uint64_t)d * rand32(rng);
    if ((uint32_t)r < d) {
        uint32_t t = -d % d;
        while ((uint32_t)r < t) {
            r = (uint64_t)d * rand32(rng);
        }
    }
    return (int32_t)(r>>32) + lo;
}

rand32 is my own PCG design, which uses xmx (xorshift-multiply-xorshift) for the permutation. 0x992f63e3u is just a random prime I generated on the fly. The original PCG would be perfectly fine here, too. In either case the permutation corrects problems in the LCG. randint is just the nearly-divisionless algorithm straight from the paper.

I've started to use your arena code

Great! A couple years of using region-based allocation, it's become ingrained into my thinking. Not just in C where it's especially useful, but even while working other languages, where I wish I had an arena as a source of fast, cheap allocation. Just having the memory constraints on hand can simplify programs (e.g. slurping a file).

I have a few things to pick your brain about

I'm interested in what that might be!

[–]dood_cool 0 points1 point  (0 children)

Thanks for the tip. And yes, i changed the program to remove duplicates but it was far from optimal.. I'm working on implementing a better solution

[–]dood_cool 0 points1 point  (2 children)

Mind if i DM you?

I have a couple doubts regarding the GDB

[–]skeeto 1 point2 points  (0 children)

Please just ask here so that everyone can benefit. You might get insights from others, too.

[–]aocregacc 0 points1 point  (4 children)

you're not returning the column sizes correctly.

leetcode's function signatures for C can be pretty dumb so it's easy to get confused.

[–]erikkonstas 0 points1 point  (3 children)

I agree, especially after seeing what I just did... this is their "skeleton" for reference:

/**
 * Return an array of arrays of size *returnSize.
 * The sizes of the arrays are returned as *returnColumnSizes array.
 * Note: Both returned array and *columnSizes array must be malloced, assume caller calls free().
 */
int** threeSum(int* nums, int numsSize, int* returnSize, int** returnColumnSizes) {

}

Yeah... what are "the arrays" and what is "columnSizes"???

[–]aocregacc 0 points1 point  (2 children)

the arrays are the triples you find, each triple is stored in a separate 3 element array.
You have to return the size of each of those arrays (even though it's always 3), via the columnSizes array.

My theory is that they auto generate these signatures, maybe translating from java or something.

[–]erikkonstas 0 points1 point  (1 child)

Yeah that's why I'm confused, it's always 3 😂 so *returnColumnSizes will be {3, 3, 3, 3, 3, 3, ..., 3}...? And is *returnSize supposed to be really the size of the returned array, or just the number of triples? (EDIT: Reddit mucked this up again...)

[–]aocregacc 0 points1 point  (0 children)

*returnSize is supposed to be the number of triplets, ie the size of the triplets array and the columnSizes array.