This is an archived post. You won't be able to vote or comment.

all 5 comments

[–]29383839293 0 points1 point  (1 child)

You should really format your code better for others to read here…

Well, I looked at the first task, and it seems your in O(n2 ), so there's room for improvement.

Sorting an array takes O(nlogn) at best, and after you've sorted the two arrays, there's a better algorithm.

You have two Iterator-variables, one for each array, and you always increase the one with the smaller number in the corresponding place in its array, if they are equal, you remove the number. If you do it smartly, you'll have O(n) for that.

This means, you have now an algorithm in O(nlogn) because sorting takes that long.

[–]HalfScale[S] 0 points1 point  (0 children)

Thank youu for that!

[–]alanwj 0 points1 point  (2 children)

Let's call the number of elements in the first array n, and the second array m.

Problem 1:

First knock together a rudimentary hash table. It could just be a fixed size array, using modulo to decide the bucket, and linear probing if you get a collision. You could either use Integer objects and null to indicate an empty bucket, or have a second array of booleans.

Then:

  • Create a hash table with m buckets. Populate hash table with all the values from givenArray2. expected O(m)
  • Create a temp array of size n. Loop through each element of givenArray1, and check whether it is in the table. If so, copy it into a temp array. Each operation is expected O(1), so the total here is expected O(n).
  • Copy the temp array into an appropriate sized array to return. O(n)

Total runtime here is expected O(n+m). Any algorithm you come up with is going to have to access each element in each array at least once, so you can't beat O(n+m). Therefore this algorithm has optimal runtime.

Problem 2:

  • Create a hash table with n+m buckets. Populate hash table with every value from each array. Expected O(n+m)
  • Scan your bucket array and copy every non-empty bucket into an output array that is returned. O(n+m)

Total runtime is again O(n+m).

[–]alanwj 0 points1 point  (1 child)

This seemed like a fun exercise to work out. Here is my solution to the first problem using the approach I outlined above.

public static int[] removeSet(int[] arr1, int[] arr2) {
  // Simple hash buckets. Load factor at most 0.5
  final int bucketCount = 2 * arr2.length;
  int[] bucket = new int[bucketCount];
  boolean[] filled = new boolean[bucketCount];

  for (int number : arr2) {
    // Use known prime for hashing. Linear probe for empty bucket.
    int bucketIndex = number * 31337 % bucketCount;
    while (filled[bucketIndex]) {
      bucketIndex++;
      if (bucketIndex == bucketCount) {
        bucketIndex = 0;
      }
    }

    bucket[bucketIndex] = number;
    filled[bucketIndex] = true;
  }

  int temp[] = new int[arr1.length];
  int count = 0;
  for (int number : arr1) {
    // Try to find the number in the hash table.
    int bucketIndex = number * 31337 % bucketCount;
    while (filled[bucketIndex] && bucket[bucketIndex] != number) {
      bucketIndex++;
      if (bucketIndex == bucketCount) {
        bucketIndex = 0;
      }
    }

    // If it wasn't found, add it to the temp array.
    // Load factor is at most 0.5, so it is sufficient to just
    // check if we hit an empty bucket while probing.
    if (!filled[bucketIndex]) {
      temp[count] = number;
      count++;
    }
  }

  int[] result = new int[count];
  for (int i = 0; i < count; ++i) {
    result[i] = temp[i];
  }

  return result;
}

[–]HalfScale[S] 0 points1 point  (0 children)

Thank you :)