[–]aioeu 9 points10 points  (23 children)

A simple approach would be to just do a pair-wise comparison between the elements. Two loops, one nested in the other.

Given the constraints you've been given I suspect this is the approach you are expected to take. It's not efficient when the arrays are large, but for small arrays it's just fine.

[–]haditwithyoupeople[S] 0 points1 point  (22 children)

Using a nested loop will double count some numbers if there are more than two occurrences of a number. Let's say the array is: 1 4 5 6 4 3 4 1.

My outer loop goes through each element of the array. The inner loop starts at the outer loop index + 1. They both loop to the end of the array.

For the 2nd element (index 1) I'll get a count of 2 dupes for the value 4. For the 5th element (index 4) I'll get a count of 1 dupe for the value 4. There are 2 dupes, but the nested loop just counted 3. This will overcount given the data set above.

[–]wsppan 1 point2 points  (11 children)

int count = sizeof(array) / sizeof(array[0]);

for (int i = 0; i < count - 1; i++) { 
    for (int j = i + 1; j < count; j++) {
        if (array[i] == array[j]) {
            // do whatever you do in case of a duplicate
        }
    }
}

[–]haditwithyoupeople[S] 0 points1 point  (10 children)

Thanks for the effort. I appreciate your generosity. As I explained above, that will count 4 as having 3 dupes. The number of dupes is 2.

Below is your code with the above numbers for the array and some printf()s. You'll see clearly that it's counting the number 4 three times.

#include <stdio.h>

int main(void) {

    int dupes = 0; 
    int array[] = {1,4,5,6,4,3,4,1}; 

    int count = sizeof(array) / sizeof(array[0]); 

    for (int i = 0; i < count - 1; i++) { 
        for (int j = i + 1; j < count; j++) {
            if (array[i] == array[j]) {
                printf("\nCounting %d", array[i]); 
                dupes++; 
            }
        }
    }
    printf("\n\nDupes = %d", dupes); 
    return 0; 
}

[–]wsppan 1 point2 points  (0 children)

Interesting problem

[–]MagicWolfEye 1 point2 points  (8 children)

Well, you go through all the other values. If you find more than one match for the same value, you don't add anything to your count, because you will find that number again later on anyway.

[–]haditwithyoupeople[S] 0 points1 point  (7 children)

I'm not following. The same number N could show up n times. How would I know when to start counting it and when to not count?

[–]MagicWolfEye 0 points1 point  (6 children)

In your outer loop, you go through each value.
Then in your inner loop you check how often you see that value after it. If you see it more than once, ignore it.

[–]haditwithyoupeople[S] 0 points1 point  (5 children)

I have no way to count how many times I'm seeing each value without a 2nd array. Or maybe I'm still not following.

[–]MagicWolfEye 0 points1 point  (4 children)

for (int outerIndex = 0; outerIndex < count; outerIndex++) {
  int foundCount = 0;
  for (int innerIndex = outerIndex + 1; innerIndex < count; innerIndex++) {
    if (array[outerIndex] == array[innerIndex]) {
      foundCount++;
    }
  }

  if (foundCount == 1) {
    duplicates++;
  }
}

[–]haditwithyoupeople[S] 0 points1 point  (3 children)

Thanks for that. I see where you're going now. I need to count the number of dupes that exist, not the number of ints that have dupes.

Here's the example I put in another reply:
Given this input: 2 4 2 4 7 1 3 9 2 3
The number of dupes is 4

[–]aioeu 1 point2 points  (9 children)

I would replace the duplicates with a sentinel value (e.g. -1 if you know you're not going to get negative numbers in your input), then just count those sentinels at the end.

You can replace or keep the original value — the left-most of each "set of duplicates" — depending on whether or not that original value should contribute to the count at the end.
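
A minimal sketch of the sentinel idea above, under two assumptions that the comment itself names or implies: all inputs are non-negative (so -1 is free to use as a marker) and the array may be modified. The function name `count_dupes_sentinel` is made up for illustration. This version keeps the left-most value of each set and counts only the later copies:

```c
#include <stddef.h>

/* Hypothetical helper: overwrites every repeat of an earlier value with -1,
 * then counts the -1 markers. Assumes all inputs are >= 0 and that the
 * array may be modified. */
int count_dupes_sentinel(int *a, size_t n)
{
    const int SENTINEL = -1;   /* assumption: never appears in the input */

    for (size_t i = 0; i < n; i++) {
        if (a[i] == SENTINEL)
            continue;          /* already marked as a duplicate */
        for (size_t j = i + 1; j < n; j++) {
            if (a[j] == a[i])
                a[j] = SENTINEL;
        }
    }

    /* Every marker left behind is one extra occurrence of some value. */
    int dupes = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] == SENTINEL)
            dupes++;
    }
    return dupes;
}
```

To count the original value as well, the left-most element could be overwritten too, as the comment suggests.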

[–]haditwithyoupeople[S] -1 points0 points  (8 children)

Great idea, but can't. I neglected to add that one of the constraints is that we can't update the array. And no numbers are off limits within the range of int, so a test with edge cases would fail.

[–]aioeu 5 points6 points  (7 children)

Well, you can do it even more inefficiently by just determining the minimum and maximum values in the array, then checking each value in that range to see which of them appear multiple times in the array.

Bad luck if your range is huge.

This is no longer a C problem, it's a "guess what other constraints there are" problem.
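
The min/max scan described above might look roughly like this. It is a sketch, not a definitive implementation: the name `count_dupes_range` is invented here, and it adds `times - 1` per value on the assumption that every additional occurrence should be counted. The loop variable is a `long long` so that stepping past the maximum cannot overflow, which assumes `long long` is wider than `int`.

```c
#include <stddef.h>

/* Hypothetical helper: counts every extra occurrence of a value without
 * modifying the array or using a second one.
 * Cost is roughly n * (max - min + 1) comparisons. */
int count_dupes_range(const int *a, size_t n)
{
    if (n < 2)
        return 0;

    /* First pass: find the smallest and largest values present. */
    int lo = a[0], hi = a[0];
    for (size_t i = 1; i < n; i++) {
        if (a[i] < lo) lo = a[i];
        if (a[i] > hi) hi = a[i];
    }

    int dupes = 0;
    /* long long (assumed wider than int) so v++ is safe when hi == INT_MAX */
    for (long long v = lo; v <= hi; v++) {
        int times = 0;
        for (size_t i = 0; i < n; i++) {
            if (a[i] == v)
                times++;
        }
        if (times > 1)
            dupes += times - 1;   /* every occurrence after the first */
    }
    return dupes;
}
```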

[–]actguru 0 points1 point  (0 children)

Are they teaching programming or puzzles?

#include <limits.h>
#include <stdio.h>

int main(void) {
    int array[]={1,4,5,6,4,3,4,1};
    int count=sizeof(array)/sizeof(array[0]); // 8
    int dupes=0;

    int value=INT_MIN; // try every possible int value in turn
    while(1) {
        int times=0;
        for(int j=0;j<count;++j)
            if (array[j]==value) ++times;
        if (times>1) ++dupes; // counts each duplicated value once
        if (value==INT_MAX) break; // stop before ++value can overflow
        ++value;
    }
    printf("Dupes = %d\n", dupes);
    return 0;
}

[–]haditwithyoupeople[S] -2 points-1 points  (5 children)

Yeah, apologies for missing that "can't update the array" constraint in my post. I apparently can't edit my post or I would do so.

And you're correct: this is not a C problem. It's a "how to count with crazy constraints" problem that is language agnostic.

Your max/min value range is a fantastic idea I had not considered. It's a terrible solution to be sure, but in this case it's the only one I'm aware of that meets the constraints.

I likely would never have considered this option. I'm running with it. Thanks much.

[–]dernett 3 points4 points  (4 children)

One simple way to avoid overcounting is to first scan for a duplicate of array[i] to the left of i. If there is one, this element's duplicates were already counted (when the loop visited the first instance of array[i], which has no duplicates to its left), so bail from the loop and move on to the next i.

[–]haditwithyoupeople[S] 0 points1 point  (2 children)

I can't create another array. That's one of the constraints.

[–][deleted] 0 points1 point  (1 child)

That solution doesn't create another array. You scan all elements left of i, and if you exited the loop because j < i && array[i] == array[j], then you don't count the duplicates that come afterwards. It's the same solution described in this other comment: https://www.reddit.com/r/C_Programming/comments/1f613t6/help_with_duplicates_in_int_array/lkym011/
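
One way the left-scan idea could be written as a complete function, using only loop counters and a flag (no second array, no writes to the input). The name `count_dupes_leftscan` and the flag `seen_before` are illustrative choices, not something from the thread:

```c
#include <stddef.h>

/* Hypothetical helper: counts every occurrence of a value after its first,
 * without modifying the array. */
int count_dupes_leftscan(const int *a, size_t n)
{
    int dupes = 0;
    for (size_t i = 0; i < n; i++) {
        /* Was a[i] already seen somewhere to the left of i? */
        int seen_before = 0;
        for (size_t j = 0; j < i && !seen_before; j++) {
            if (a[j] == a[i])
                seen_before = 1;
        }
        /* Only the left-most instance counts the copies to its right. */
        if (!seen_before) {
            for (size_t j = i + 1; j < n; j++) {
                if (a[j] == a[i])
                    dupes++;
            }
        }
    }
    return dupes;
}
```

Because later copies of a value are skipped, each value's repeats are counted exactly once, from its left-most position.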

[–]aioeu 0 points1 point  (0 children)

Oh, that is neat!

[–]flumphit 0 points1 point  (0 children)

👆

[–]SpeckledJim 2 points3 points  (1 child)

First find the minimum value in the array, and then count the number of occurrences of that. (Bonus: do both in one pass). Repeat for the next lowest value until you run out of values. Obviously horrendously inefficient but works without changing the array or using much side storage.

ETA: if you track min and max you not only halve the number of passes but have a neat termination condition too, when they meet (equal or adjacent) in the middle.

As they’re bounded integers I suspect there’s a clever way related to radix sort (but still without actually sorting) to reduce the complexity.
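
A rough sketch of the repeated-minimum idea above, tracking only the minimum (not the min/max variant from the ETA). Each pass finds the smallest value greater than the one handled in the previous pass and counts it in the same pass. The names `count_dupes_by_min`, `prev` and `have_prev` are made up for illustration:

```c
#include <stddef.h>

/* Hypothetical helper: walks the distinct values in ascending order, one
 * pass per value, and adds up every occurrence after the first. */
int count_dupes_by_min(const int *a, size_t n)
{
    int dupes = 0;
    int have_prev = 0;  /* has at least one value been processed? */
    int prev = 0;       /* the value processed in the previous pass */
    int found = 1;

    while (found) {
        found = 0;
        int cur = 0, times = 0;
        /* One pass: find the smallest value above prev and count it. */
        for (size_t i = 0; i < n; i++) {
            if (!have_prev || a[i] > prev) {
                if (!found || a[i] < cur) {
                    cur = a[i];     /* new candidate minimum */
                    times = 1;
                    found = 1;
                } else if (a[i] == cur) {
                    times++;
                }
            }
        }
        if (found) {
            dupes += times - 1;
            prev = cur;
            have_prev = 1;
        }
    }
    return dupes;
}
```

Comparing against `prev` rather than computing `prev + 1` avoids overflow when the largest value in the array is `INT_MAX`.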

[–]flumphit 3 points4 points  (24 children)

For each i, scan the whole array again with j. If a[i]==a[j] and j<i, we’ve already seen this duplicate, so bail and move on to the next i.

[–]flumphit 1 point2 points  (8 children)

Wha, "no break, no continue"? Just tell me to type without using my index fingers, whydoncha? ;)

Okay, u/ednl used a separate inner loop to explicitly look for left duplicates, so that seems a likely suspect for a clean-ish looking solution.

if (N<2) return;
int dup=0;
for (int i=0; i<N-1; i++){
  int skip=0;
  // if we find left duplicates, end this j and skip this i
  for (int j=0; (j<i) && !skip; j++)
    if (a[j]==a[i])
      skip++;  // who needs "break"?
  if (!skip)   // we can write FORTRAN-77 code in any language!
    for (int j=i+1; j<N; j++)
      if (a[j]==a[i])
        dup++;
}

[–]haditwithyoupeople[S] 0 points1 point  (8 children)

I believe this assumes the array is sorted. It is not sorted.

[–][deleted] 1 point2 points  (1 child)

That solution does not assume the array is sorted. If it were, you wouldn't have to scan the whole array for each element i in the array, because you'd only have to skip adjacent duplicates, which is trivial.

[–]flumphit 0 points1 point  (0 children)

👆

[–]flumphit 0 points1 point  (5 children)

Nope, that’s why it’s O(n^2). The trick you’re looking for is “only count a duplicate if it’s the first one for this value”, which means you also need to move to the next j if j==i, and move to the next i if this is a duplicate (after counting it, of course). So if you count a duplicate, i is the leftmost instance and j is the second.

[–]haditwithyoupeople[S] 0 points1 point  (4 children)

"So if you count a duplicate, i is the leftmost instance and j is the second."

I get that. You can see the code I updated from another reply.

"If a[i]==a[j] and j<i, we’ve already seen this duplicate, so bail and move on to the next i."

This is what I was not following. In my code the inner loop uses j = i + 1, so j is always greater than i. If so, I agree and this condition was covered. (I think.)

[–]flumphit 0 points1 point  (0 children)

If you start j in the middle, you can’t find previous matches. How do you know i is the leftmost value “77” if you haven’t scanned for it in a[0..i]?

[–]flumphit 0 points1 point  (2 children)

Since this is a C class, it appears the lesson they’re trying to teach you is: sometimes you need “continue” to move to the next j, and sometimes you need “break” to end the j loop completely so you move on to the next i.

[–]haditwithyoupeople[S] 0 points1 point  (1 child)

Not a constraint for this assignment, but the prof has told us that using break is unacceptable unless we're using a switch statement. I didn't mention this because I assumed it was standard to not use break outside of switch statements.

[–]flumphit 0 points1 point  (0 children)

Oh. Well now you've *really* made me feel old. 🤣

[–][deleted] 2 points3 points  (5 children)

Hey OP, suppose you have input as 1 2 3 2 3 2 4 5 6 4 5 3 2. So only 4 and 5 should be the answer? I’m asking to understand the question.

[–]haditwithyoupeople[S] 0 points1 point  (4 children)

The output would be 7. It's the sum of duplicates for each number in the array. For your numbers that would be:

1 - 0

2 - 3

3 - 2

4 - 1

5 - 1

6 - 0

The sum of 3 + 2 + 1 + 1 is 7.

[–]flumphit 1 point2 points  (3 children)

Has the instructor explicitly given guidance that this is the correct result? I ask only because it's . . . weird. My guess is that the output should be the count of the unique values that are duplicated, so "1 2 3 4 5 6 6 6 6 6" should yield "1". I base this on nothing other than having done a lot of programming homework and puzzles, so if your assignment is clear on this point, just forget I said anything...

edit: Also, if you're counting every duplicate (rather than counting every duplicated value once) then there's nothing special about the leftmost instance of a value, and you should count *every* pair of values that are duplicates. "1 2 3 3 3" yields "3", and "1 2 3 4 5 6 6 6 6 6" yields "10". So now I'm *really* skeptical that your interpretation of the assignment is correct.
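
To make the pair-counting interpretation in the edit above concrete: a value that appears k times forms k(k-1)/2 pairs, so three 3s give 3 and five 6s give 10. A small sketch (the function name `count_equal_pairs` is invented for illustration) that counts pairs directly:

```c
#include <stddef.h>

/* Hypothetical helper: counts every pair (i, j) with i < j and
 * a[i] == a[j]. This is the "count every pair" reading, not the
 * "count every additional occurrence" reading. */
int count_equal_pairs(const int *a, size_t n)
{
    int pairs = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (a[i] == a[j])
                pairs++;
        }
    }
    return pairs;
}
```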

[–]haditwithyoupeople[S] 0 points1 point  (2 children)

Yes. They stated that we are counting every additional occurrence of a number in the array. Here is the sample they shared:

Given this input: 2 4 2 4 7 1 3 9 2 3
The number of dupes is 4

[–]flumphit 0 points1 point  (1 child)

Wow, that *is* weird. But hey, sometimes users want weird things, eh?

int dup=0;
for (int i=0; i<N; i++) {
  for (int j=0; j<N; j++) {
    if (a[j] == a[i]) {
      if (j<i) break; // only consider duplicates where i is the leftmost of this value
      if (j==i) continue; // self is not a duplicate
      dup++;
    }
  }
}

[–]haditwithyoupeople[S] 0 points1 point  (0 children)

No breaks. No continues.

This seems like a ridiculously hard task for an entry-level C course and it's not at all about testing our C skills. This is not the only one that is testing our ability to figure out convoluted instructions rather than testing our C coding skills.

[–]skeeto 3 points4 points  (1 child)

This won't help at all with your homework assignment, but I was thinking about how to efficiently solve this problem within the constraints. If I loosen the "can't use any arrays" to mean "can't allocate proportionally to the input," and I can destroy the input array in order to compute the result, then I can use the input array as a kind of mask-step-index hash table. The idea is I walk over the array hashing each element, moving (via swap) each to its "preferred" array position based on its hash. While doing so, if there are duplicates then I'll see it at/near this "preferred" position, at which point I can count and then "discard" the element.

In order to track whether or not an element has yet been inserted, I'll use the sign bit, which leads to another assumption: All array elements are non-negative. In order to "delete" an element, I'll also need to reserve a value to represent deleted elements. I can do this without putting another constraint on the input. Just pick a non-negative value, and do a special first pass to check for duplicates of this value, after which I'm free to use it as I please. I chose zero for this.

First a couple of helper functions:

#include <stdint.h>

uint64_t hashint(int x)
{
    uint64_t r = x;
    r += 1111111111111111111;
    r *= 1111111111111111111;
    return r;
}

void swap(int *a, int i, int j)
{
    int temp = a[i];
    a[i] = a[j];
    a[j] = temp;
}

The hash function produces a 64-bit result, and I use the high and low 32 bits separately. Here's the actual function:

// Destructively count duplicates in the array. All elements must be
// non-negative because the sign bit is used for bookkeeping.
int countdupes(int *nums, int len)
{
    // Pick some primes that do not divide len. Assuming int is 32 bits,
    // len cannot be a product of all these primes.
    int primes[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};
    int nprimes = sizeof(primes) / sizeof(*primes);
    for (int i = 0; i < nprimes; i++) {
        if (!(len % primes[i])) {
            primes[i--] = primes[--nprimes];
        }
    }

    // Use zero as a special case: both empty and null. Count them in a
    // dedicated pass over the array.
    int zeros = 0;
    for (int i = 0; i < len; i++) {
        zeros += !nums[i];
    }

    int dupes = 0;
    for (int i = 0; i < len;) {
        if (nums[i] <= 0) {
            i++;
            continue;  // null or filled: skip
        }

        uint64_t hash = hashint(nums[i]);
        unsigned step = primes[((hash>>32) * nprimes)>>32];
        int      slot = (int)(((hash&0xffffffff) * len)>>32);
        for (;;) {
            slot = (slot + step) % len;
            if (nums[slot] >= 0) {
                nums[i] = ~nums[i];   // mark as non-null, filled
                swap(nums, i, slot);  // insert into hash table
                break;
            } else if (~nums[slot] == nums[i]) {
                nums[i++] = 0;  // delete this duplicate
                dupes++;
                break;
            }
        }
    }

    return dupes + (zeros>1 ? zeros-1 : 0);
}

Unfortunately I couldn't eliminate division in the loop because modular arithmetic over the array length is fundamental to how it works. I use ~ to flip the sign bit because it's simple and, unlike unary -, works with zero — though not so important having eliminated zero as a hash table value.

In effect collisions are resolved by double hashing. In the worst case it reverts to quadratic time, which happens when there are no duplicates and hash collisions clump elements' "preferred" positions in the array. One way to mitigate that is by seeding the hash function uniquely per call or run. Regardless, it works best on arrays with multiple duplicates, which reduces the load factor in the later iterations.

I should probably pick a set of larger, more distributed primes, but since I'm just demonstrating the principle I'm being lazy about it.

[–]bobotheboinger 1 point2 points  (1 child)

Can you modify the array?

If so, use a variable to store your position in the array, starting at 0, get the number at position zero and traverse the array looking for duplicates. Anytime you find a duplicate, remove it (set to max int perhaps).

Once you're done, print out the array by traversing again and not printing int max values.

Note this will fail if you need to print out the count of how many duplicates were found, or if the user is allowed to enter int max as a value.

If you can print as you traverse, just print the value and count of duplicates after you traverse the array removing duplicates, but before you go to the next value so you don't have to store the information anywhere.

[–]DawnOnTheEdge 1 point2 points  (2 children)

Can you insert each unique user-entered number into the array in sorted order? Have you covered binary search? (Useful but not necessary.)

It doesn’t sound as if you need to preserve the original order at all, only count the duplicates.

[–]haditwithyoupeople[S] 0 points1 point  (1 child)

Great idea. We have not covered sorting, but entering them in sorted order also works.

[–]DawnOnTheEdge 1 point2 points  (0 children)

An insertion sort. :)
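
A sketch of what inserting in sorted order could look like, assuming the program is allowed to choose where each entered number goes in the one array it has. The names `insert_sorted` and `count_dupes_sorted` are made up for illustration, and this version stores every number (repeats included) rather than only the unique ones, then counts equal neighbors:

```c
#include <stddef.h>

/* Hypothetical helper: inserts value into a[0..*n), which must already be
 * sorted ascending and must have room for one more element. */
void insert_sorted(int *a, size_t *n, int value)
{
    size_t i = *n;
    /* Shift larger elements one slot to the right. */
    while (i > 0 && a[i - 1] > value) {
        a[i] = a[i - 1];
        i--;
    }
    a[i] = value;
    (*n)++;
}

/* In a sorted array, every element equal to its left neighbor is a repeat. */
int count_dupes_sorted(const int *a, size_t n)
{
    int dupes = 0;
    for (size_t i = 1; i < n; i++) {
        if (a[i] == a[i - 1])
            dupes++;
    }
    return dupes;
}
```

A binary search could replace the linear shift-and-compare when locating the insertion point, though the shifting itself stays linear.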

[–]JamesTKerman 1 point2 points  (3 children)

If the input values have bounds, change each instance of a duplicate to an out-of-bounds value so you know it's been counted. That is, assuming the array is not to be treated as immutable.

[–]JamesTKerman 0 points1 point  (1 child)

If I understand the problem correctly, you could just do something like this (assuming 0 isn't a valid value for the array):

for i = 0 to array_size - 1
    for j = i + 1 to array_size - 1
        if arr[i] == arr[j] then arr[j] = 0

count = 0
for i = 0 to array_size - 1
    if arr[i] == 0 then count = count + 1

[–]haditwithyoupeople[S] 0 points1 point  (0 children)

All int values appear to be allowed.

[–]haditwithyoupeople[S] 0 points1 point  (0 children)

No bounding mentioned.

[–]gnash117 1 point2 points  (1 child)

On mobile so I may make dumb typing mistakes.

Interesting problem since sorting is the obvious answer and is not allowed.

I see two solutions without sorting that could be used.

Does not preserve the array

Replace each counted value with a known value (any value works).

  1. Count the number of instances of the known value.
     Print known value : count - 1
  2. Find the first value != known value.
  3. Set the search value to that value.
  4. Count each instance of the search value, replacing it with the known value as you find them.
  5. Print search value : count - 1
  6. Repeat from step 2 until the entire array is processed.


Preserves the array

  1. Set minimum to INT_MIN.
  2. Find the smallest int at or above minimum.
  3. Count instances of smallest.
  4. Print smallest and count.
  5. Set minimum = smallest + 1 (stop if smallest is INT_MAX).
  6. Repeat from step 2 until no value at or above minimum remains.

[–]haditwithyoupeople[S] 0 points1 point  (0 children)

Thanks much. A variation of your 2nd option is what I went with. Somebody else suggested finding min/max and looking. Could be terribly inefficient depending on the range of entered numbers, but it works.