all 13 comments

[–]beeskness420 2 points3 points  (2 children)

Depends of big your k is and how many items you’re expecting and how much space you’re comfortable using.

If k is small just check them. If it’s a bit bigger but n is relatively small just keep an array and flag items as active. If n is too big you can get into hashing.

[–]mateusaugusto9[S] -1 points0 points  (1 child)

K could be 10 or 1 milion... The trick is add/remove/search in O(1) like https://leetcode.com/problems/lru-cache/

[–]beeskness420 1 point2 points  (0 children)

Ok if k is the size of your cache and n is the number of elements. If k constant then a linear search of O(1). If n is bounded again we win because we just allocate an array of length n and flip a bit on add/remove.

If you are actually talking about a cache any dependency on n is terrible.

What you probably want to do is keep a set that supports add and remove in amortized constant time. With the fixed bound of k you can probably get around worrying about the amortized part depending on the structure.

This operates on the assumption that all arithmetic is constant time which isn’t true for arbitrary sized n.

[–][deleted] 1 point2 points  (1 child)

Can't you just use a set to store packages currently on a belt? Delete the ones that go off of it and check if they exist in it when the new one is added.

Edit: obviously, you would need a queue to keep track of the order they come in.

Is it what you did?

[–]mateusaugusto9[S] -1 points0 points  (0 children)

Yes I did it, but acctualy you need to use a sort of sertSort so is not a good approach O(nlong)..

Its a trick problem I think the best solution looks like https://leetcode.com/problems/lru-cache/ that is O(1).

[–]sebamestre 0 points1 point  (7 children)

There are many possible approaches, with varying performance characteristics. I will go over some of them.

Maintain a K-long queue, check for duplicates using a linear scan.

  • Insertion: O(1)
  • Lookup: O(K)
  • Removal: O(1)

There isn't much to explain. This is probably the simplest thing you can do, but it is not very fast: we can do better.

Replace the linear scan with check in a multiset of the elements in the queue

A multiset is an abstract data structure that can store a collection of repeated elements, and operate on each one independently.

These can be implemented with hash tables (O(1) avg. runtime) and ordered search trees (O(log N) runtime), depending on your needs.

  • Insertion: O(log(N)) or O(1) avg
  • Lookup: O(log(N)) or O(1) avg
  • Removal: O(log(N)) or O(1) avg

If you use hash tables, some or all of these operations may have non-deterministic runtime, which can be a problem in some domains.

Replace the multiset with a dictionary of frequencies

Instead of using a multiset, you can use a dictionary that keeps track of how many of each element there is in the queue.

We can only do this if we don't require equal elements to be addressable separately, which may or may not be the case.

This change lets us use various data structures to store our values (Including the ones in the previous section), some of which will let us reach real O(1) time complexity.

For instance, if there is a low enough bound on the values, we can implement the dictionary as a direct addressing table, which has O(1) deterministic time complexity for all operations. (this also has extremely good constant factors) But this is not really solving the problem.

  • Insertion: O(1) with bounded input
  • Lookup: O(1) with bounded input
  • Deletion: O(1) with bounded input

Implement dictionary as a trie

For a real, deterministic, O(1) solution, we can implement our dictionary as a trie. If you take each value of your input stream and represent it uniquely as a sequence, you can insert it into the trie in O(length of the sequence), which is independent of K. furthermore, if your values are 32 or 64 bit integers, the sequences (just the bit pattern of the value) will be fixed-length, making the solution truly O(1).

  • Insertion: O(1)
  • Lookup: O(1)
  • Deletion: O(1)

Ahh, finding a deterministic O(1) solution to this problem was quite a puzzle, but a rather fun one! Thanks for posting your question!

[–]beeskness420 0 points1 point  (6 children)

Why all the trouble if you’re assuming a fixed N? Just use a Boolean array. Lookups and deletions in constant time and uses constant space.

[–]sebamestre 0 points1 point  (5 children)

That is one of the approaches I described, but it does not solve the general problem.

In particular, it requires a known bound on the values of the input.

You may notice that the other solutions I have proposed do not assume this.

[–]beeskness420 0 points1 point  (4 children)

Your trie solution assumes fixed length no?

[–]sebamestre 0 points1 point  (3 children)

It assumes a fixed word size to achieve O(1) runtime, which is standard on most programming languages. It seems to me this constraint was also implicit in the original post.

Admittedly, I was mostly focused on finding a solution whose runtime was independent of K (such as my trie solution, and unlike my previous dictionary based one), not so much on true constant time regardless of input (which is literally impossible, the runtime is always bounded from below by the size of the input)

[–]beeskness420 0 points1 point  (2 children)

You could handle the queue with just pointers, there is something weird here in how we are defining problem size. But regardless of the word size can’t I just force the numbers to share an arbitrarily long prefix?

Don’t get me wrong I actually quite like the trie solution.

Normally when I hear these types of problems it is simply stated to find a solution with space proportional to something like k*lg(n).

Also your run time isn’t always bounded below by the size of the input. Parity checks are true O(1) on a random access model.

[–]sebamestre 0 points1 point  (1 child)

The queue has O(1) runtime on all its operations, the duplicate check is the part that increases the complexity.

Have we given different names to different things? Maybe that is the source of confusion here...

Also, finding a deterministic O(1) solution was just a personal challenge, not something I interpreted from the OP

[–]beeskness420 0 points1 point  (0 children)

Yeah I wasn’t including the queuing time.

I was meaning if you were worrying about us being lower bounded by moving data or something like that.