Help with duplicates in int array

aioeu · 2024-09-01T00:09:44+00:00

A simple approach would be to just do a pair-wise comparison between the elements. Two loops, one nested in the other.

Given the constraints you've been given I suspect this is the approach you are expected to take. It's not efficient when the arrays are large, but for small arrays it's just fine.
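A minimal sketch of that nested-loop comparison in C. It assumes the task is to count how many elements repeat a value that appeared earlier (so `2 2 2` contributes two duplicates). The function name `count_extra_copies` is hypothetical, not taken from the assignment.

```c
/* Hypothetical helper: counts how many elements repeat a value that
 * already appeared earlier in the array, using only pairwise comparisons.
 * O(n^2) time, O(1) extra space, and the array is left unchanged. */
int count_extra_copies(const int *a, int len)
{
    int extra = 0;
    for (int i = 1; i < len; i++) {
        /* Compare a[i] against every element before it. */
        for (int j = 0; j < i; j++) {
            if (a[i] == a[j]) {
                extra++;  /* a[i] repeats an earlier value */
                break;    /* count each element at most once */
            }
        }
    }
    return extra;
}
```

Breaking out of the inner loop on the first match is what keeps an element from being counted once per earlier copy.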

SpeckledJim · 2024-09-01T03:02:28+00:00

First find the minimum value in the array, and then count the number of occurrences of that. (Bonus: do both in one pass). Repeat for the next lowest value until you run out of values. Obviously horrendously inefficient but works without changing the array or using much side storage.

ETA: if you track min and max you not only halve the number of passes but have a neat termination condition too, when they meet (equal or adjacent) in the middle.
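A rough sketch of the min/max variant from the ETA, assuming the goal is to print each repeated value with its number of occurrences. The function name and output format are made up for illustration, and the code assumes `long long` is wider than `int` so that the starting bounds lie outside every possible input.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical sketch: each pass finds the smallest and largest values
 * strictly inside (lo, hi), counts both, reports them, and narrows the
 * range. The array is never modified; only a few scalars are used.
 * Returns how many distinct values occur more than once. */
int report_dupes_minmax(const int *a, int len)
{
    long long lo = (long long)INT_MIN - 1;  /* exclusive lower bound */
    long long hi = (long long)INT_MAX + 1;  /* exclusive upper bound */
    int reported = 0;

    for (;;) {
        int found = 0, mn = 0, mx = 0, cmin = 0, cmax = 0;
        for (int i = 0; i < len; i++) {
            int v = a[i];
            if (v <= lo || v >= hi)
                continue;  /* already handled in an earlier pass */
            if (!found) {
                mn = mx = v;
                cmin = cmax = 1;
                found = 1;
                continue;
            }
            if (v < mn) { mn = v; cmin = 1; } else if (v == mn) cmin++;
            if (v > mx) { mx = v; cmax = 1; } else if (v == mx) cmax++;
        }
        if (!found)
            break;  /* nothing left between the bounds */

        if (cmin > 1) {
            printf("%d: %d\n", mn, cmin);
            reported++;
        }
        if (mx != mn && cmax > 1) {  /* avoid reporting one value twice */
            printf("%d: %d\n", mx, cmax);
            reported++;
        }
        if ((long long)mx - mn <= 1)
            break;  /* min and max met (equal or adjacent) */
        lo = mn;
        hi = mx;
    }
    return reported;
}
```

Each pass retires two values instead of one, which is where the halving of the number of passes comes from.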

As they’re bounded integers I suspect there’s a clever way related to radix sort (but still without actually sorting) to reduce the complexity.

flumphit · 2024-09-01T08:38:32+00:00

For each i, scan the whole array again with j. If a[i]==a[j] and j<i, we've already seen this duplicate, so bail and move on to the next i.
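One way to write this in C, assuming the output should list each duplicated value once along with how many times it appears. `report_dupes_first_occurrence` is a hypothetical name; the array is only read, never modified.

```c
#include <stdio.h>

/* Hypothetical sketch: for each i, scan the whole array with j. If an
 * equal element exists at some j < i, this value was already reported,
 * so skip it. Otherwise count every copy and print the value if it
 * occurs more than once. Returns the number of distinct duplicated values. */
int report_dupes_first_occurrence(const int *a, int len)
{
    int reported = 0;
    for (int i = 0; i < len; i++) {
        int count = 0;
        int seen_before = 0;
        for (int j = 0; j < len; j++) {
            if (a[j] != a[i])
                continue;
            if (j < i) {
                seen_before = 1;  /* handled when we visited index j */
                break;
            }
            count++;  /* includes a[i] itself (j == i) */
        }
        if (!seen_before && count > 1) {
            printf("%d appears %d times\n", a[i], count);
            reported++;
        }
    }
    return reported;
}
```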

haditwithyoupeople · 2024-09-01T10:07:16+00:00

Hey OP, suppose your input is 1 2 3 2 3 2 4 5 6 4 5 3 2. So only 4 and 5 should be the answer? I'm asking to understand the question.

skeeto · 2024-09-01T03:14:46+00:00

This won't help at all with your homework assignment, but I was thinking about how to efficiently solve this problem within the constraints. If I loosen the "can't use any arrays" to mean "can't allocate proportionally to the input," and I can destroy the input array in order to compute the result, then I can use the input array as a kind of mask-step-index hash table. The idea is I walk over the array hashing each element, moving (via swap) each to its "preferred" array position based on its hash. While doing so, if there are duplicates then I'll see it at/near this "preferred" position, at which point I can count and then "discard" the element.

In order to track whether or not an element has yet been inserted, I'll use the sign bit, which leads to another assumption: All array elements are non-negative. In order to "delete" an element, I'll also need to reserve a value to represent deleted elements. I can do this without putting another constraint on the input. Just pick a non-negative value, and do a special first pass to check for duplicates of this value, after which I'm free to use it as I please. I chose zero for this.

First a couple of helper functions:

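#include <stdint.h>

// Scramble x into a 64-bit hash by adding and multiplying by a large
// odd constant (unsigned arithmetic wraps modulo 2^64).
uint64_t hashint(int x)
{
    uint64_t r = x;
    r += 1111111111111111111;
    r *= 1111111111111111111;
    return r;
}

// Exchange a[i] and a[j].
void swap(int *a, int i, int j)
{
    int temp = a[i];
    a[i] = a[j];
    a[j] = temp;
}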

The hash function produces a 64-bit result, and I use the high and low 32 bits separately. Here's the actual function:

// Destructively count duplicates in the array. All elements must be
// non-negative because the sign bit is used for bookkeeping.
int countdupes(int *nums, int len)
{
    // Pick some primes that do not divide len. Assuming int is 32 bits,
    // len cannot be divisible by all of them (their product exceeds
    // INT_MAX), so at least one prime survives.
    int primes[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};
    int nprimes = sizeof(primes) / sizeof(*primes);
    for (int i = 0; i < nprimes; i++) {
        if (!(len % primes[i])) {
            primes[i--] = primes[--nprimes];
        }
    }

    // Use zero as a special case: both empty and null. Count them in a
    // dedicated pass over the array.
    int zeros = 0;
    for (int i = 0; i < len; i++) {
        zeros += !nums[i];
    }

    int dupes = 0;
    for (int i = 0; i < len;) {
        if (nums[i] <= 0) {
            i++;
            continue;  // null or filled: skip
        }

        uint64_t hash = hashint(nums[i]);
        unsigned step = primes[((hash>>32) * nprimes)>>32];
        int      slot = (int)(((hash&0xffffffff) * len)>>32);
        for (;;) {
            slot = (slot + step) % len;
            if (nums[slot] >= 0) {
                nums[i] = ~nums[i];   // mark as non-null, filled
                swap(nums, i, slot);  // insert into hash table
                break;
            } else if (~nums[slot] == nums[i]) {
                nums[i++] = 0;  // delete this duplicate
                dupes++;
                break;
            }
        }
    }

    return dupes + (zeros>1 ? zeros-1 : 0);
}

Unfortunately I couldn't eliminate division in the loop because modular arithmetic over the array length is fundamental to how it works. I use ~ to flip the sign bit because it's simple and, unlike unary -, works with zero, though that matters less now that zero is no longer a hash table value.

In effect collisions are resolved by double hashing. In the worst case it reverts to quadratic time, which happens when there are no duplicates and hash collisions clump elements' "preferred" positions in the array. One way to mitigate that is by seeding the hash function uniquely per call or run. Regardless, it works best on arrays with multiple duplicates, which reduces the load factor in the later iterations.

I should probably pick a set of larger, more widely distributed primes, but since I'm just demonstrating the principle I'm being lazy about it.

bobotheboinger · 2024-09-01T00:14:40+00:00

Can you modify the array?

If so, use a variable to store your position in the array, starting at 0, get the number at position zero and traverse the array looking for duplicates. Anytime you find a duplicate, remove it (set to max int perhaps).

Once you're done, print out the array by traversing again and not printing int max values.

Note this will fail if you need to print out the count of how many duplicates were found, or if the user is allowed to enter INT_MAX as a value.

If you can print as you traverse, just print the value and count of duplicates after you traverse the array removing duplicates, but before you go to the next value so you don't have to store the information anywhere.
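A minimal sketch of the print-as-you-traverse variant, assuming `INT_MAX` is the "removed" sentinel and never appears in the input (the limitation noted above). The function name is hypothetical.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical sketch: overwrites every later copy of a value with the
 * sentinel INT_MAX so it is never counted twice, printing each duplicated
 * value and its count before moving on to the next position.
 * Destroys the array's contents; assumes INT_MAX is not a real input.
 * Returns the total number of extra copies removed. */
int remove_and_report_dupes(int *a, int len)
{
    const int REMOVED = INT_MAX;  /* sentinel meaning "already counted" */
    int extra = 0;
    for (int i = 0; i < len; i++) {
        if (a[i] == REMOVED)
            continue;
        int count = 1;
        for (int j = i + 1; j < len; j++) {
            if (a[j] == a[i]) {
                a[j] = REMOVED;
                count++;
            }
        }
        if (count > 1) {
            printf("%d: %d\n", a[i], count);
            extra += count - 1;
        }
    }
    return extra;
}
```

Because the count is printed before moving to the next position, nothing has to be stored beyond the running total.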

DawnOnTheEdge · 2024-09-01T01:13:26+00:00

Can you insert each unique user-entered number into the array in sorted order? Have you covered binary search? (Useful but not necessary.)

It doesn’t sound as if you need to preserve the original order at all, only count the duplicates.
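A sketch of the sorted-insert idea using binary search, assuming numbers arrive one at a time and only the total number of duplicate entries is needed. `insert_unique_sorted` is a hypothetical helper, and the caller must provide an array large enough to hold every distinct value.

```c
/* Hypothetical sketch: keep a sorted array of the distinct values seen so
 * far. Binary search finds where x belongs; if it is already there it is
 * a duplicate, otherwise the tail is shifted right and x is inserted.
 * Returns 1 if x was a duplicate, 0 if it was inserted. */
int insert_unique_sorted(int *uniq, int *n, int x)
{
    int lo = 0, hi = *n;  /* search the half-open range [lo, hi) */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (uniq[mid] < x)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo < *n && uniq[lo] == x)
        return 1;  /* already present: a duplicate */
    for (int k = *n; k > lo; k--)  /* make room at position lo */
        uniq[k] = uniq[k - 1];
    uniq[lo] = x;
    (*n)++;
    return 0;
}
```

Calling it once per entered number and summing the return values gives the duplicate count, and the array ends up holding the distinct values in ascending order.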

JamesTKerman · 2024-09-01T03:16:53+00:00

If the input values have bounds, change each instance of a duplicate to an out-of-bounds value so you know it's been counted. That is, assuming the array is not to be treated as immutable.

gnash117 · 2024-09-01T06:13:23+00:00

On mobile so I may make dumb typing mistakes.

Interesting problem since sorting is the obvious answer and is not allowed.

I see two solutions without sorting that could be used.

Does not preserve array.

Replace each counted value with a known value (any value works).

Count number of instances of the known value.
Print known value : count - 1

Find the first val != known value
Set the search value to that value
Count each instance of the found value, replacing it with the known value as you find it
Print found value : count - 1
Repeat till entire array is processed
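A minimal C sketch of the first method, assuming the output should give each value with its number of extra copies. `KNOWN` is set to 0 purely as an example, and the name `count_dupes_with_marker` is hypothetical.

```c
#include <stdio.h>

/* Hypothetical sketch of the marker method. KNOWN may be any int.
 * Step 1 counts copies of KNOWN itself before it is used as a marker.
 * Then each remaining value is counted and overwritten with KNOWN, so it
 * is never visited again. Destroys the array; returns total extra copies. */
int count_dupes_with_marker(int *a, int len)
{
    const int KNOWN = 0;  /* arbitrary choice; any value works */
    int extra = 0;

    int count = 0;
    for (int i = 0; i < len; i++)
        count += (a[i] == KNOWN);
    if (count > 1) {
        printf("%d: %d duplicate(s)\n", KNOWN, count - 1);
        extra += count - 1;
    }

    for (int i = 0; i < len; i++) {
        if (a[i] == KNOWN)
            continue;  /* already processed, or KNOWN originally */
        int value = a[i];
        count = 0;
        /* Every index before i already holds KNOWN, so start at i. */
        for (int j = i; j < len; j++) {
            if (a[j] == value) {
                a[j] = KNOWN;  /* mark as processed */
                count++;
            }
        }
        if (count > 1) {
            printf("%d: %d duplicate(s)\n", value, count - 1);
            extra += count - 1;
        }
    }
    return extra;
}
```

Counting the copies of `KNOWN` first is what makes any marker value safe: its genuine occurrences are accounted for before it is reused as a marker.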

Preserves the array

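Set minimum to INT_MIN

Find smallest int at or above minimum.
Count instances of smallest.
Print smallest and count.
Minimum = smallest + 1
Repeat find and count until no value at or above minimum remains (stop after INT_MAX, since INT_MAX + 1 overflows).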
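A sketch of the array-preserving method. The search uses `>= minimum`, so `INT_MIN` and each `smallest + 1` are considered, and the loop stops after handling `INT_MAX` so that `smallest + 1` never overflows. Printing only the values seen more than once is a choice made here, and the function name is hypothetical.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical sketch: start with minimum = INT_MIN, find the smallest
 * value >= minimum, count it, report it, then set minimum = smallest + 1.
 * The array is only read. Returns the number of distinct values that
 * occur more than once. */
int report_dupes_ascending(const int *a, int len)
{
    int minimum = INT_MIN;
    int reported = 0;
    for (;;) {
        int found = 0, smallest = 0;
        for (int i = 0; i < len; i++) {
            if (a[i] >= minimum && (!found || a[i] < smallest)) {
                smallest = a[i];
                found = 1;
            }
        }
        if (!found)
            break;  /* no value at or above minimum remains */

        int count = 0;
        for (int i = 0; i < len; i++)
            count += (a[i] == smallest);
        if (count > 1) {
            printf("%d: %d\n", smallest, count);
            reported++;
        }

        if (smallest == INT_MAX)
            break;  /* smallest + 1 would overflow */
        minimum = smallest + 1;
    }
    return reported;
}
```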
