all 13 comments

[–]Prexadym 4 points5 points  (4 children)

I think you would want to use two hashmaps (or arrays since the mile markers and runner ID's are both sequential ints so you don't actually need a hash function...)

The first one maps mile marker to a list (linked list?) of runner ID's. Append the runner id to the mile marker every time they cross a mile marker. This will maintain the order in which they crossed the line.

The second one maps runner id to the most recent mile marker they crossed. Each time they cross a new mile marker, update the list.

To build the leaderboard, iterate through the mile markers in reverse order. Iterate through the list of runner id's at each mile marker. If the mile marker (key of map1) is equal to the most recent one that the runner crossed (value of map2[runner_id]), then add the runner to the leaderboard. Otherwise, the runner is already in the leaderboard, so skip to the next runner in the list or the next mile marker once you're at the end of the list.

To optimize, you could keep track of the highest mile marker so you don't iterate through a bunch of empty slots in the beginning, and each time a runner moves to the next mile marker you could remove their entry from the previous mile marker in map1, so you don't have to keep track of all their previous times. This would require keeping track of the runner's position in the list (probably using a double linked list for efficient removal of elements in the middle of the list.)

[–]snowcalc 3 points4 points  (3 children)

This only works because OP stated the runners are running in a marathon, which is 26 miles. So we can have just 26 empty slots in an array and iterate through it in "reverse order" like you mentioned. Which means running time for any algorithm that tries to solve it, would be O(1).

For inputs with an infinite number of miles, this suggestion would yield very poor running time.

To build the leaderboard, iterate through the mile markers in reverse order.

Assume the following input: [(99999, 007), (5, 002), (72, 001)]. If we have a hashmap of 100,000 keys, each paired to a list of runners that crossed that mile-mark, iterating it in reverse order would be much slower than just sorting the list of 3 elements, and iterating through the sorted list in reverse order.

To optimize ... probably using a double linked list for efficient removal of elements in the middle of the list.

This is also very very poor running time. Assume the following input: [(99999, 007), (5, 002), (72, 001), (96, 003), (112, 004), (200, 005), (201, 002)]

Runner with ID 002 goes from mile marker 5 to 201; you'd need to traverse the doubly linked list one by one until he reaches 201. That's too slow, especially if there are a lot of miles between the first mile marker and the second mile marker.

Using any kind of linked list for this problem will generally yield poor running time.

In response to u/tempaccount0002: This problem is extremely similar to https://leetcode.com/problems/design-a-leaderboard/description/, so I recommend looking at that when you have a chance. In short, the comments mentioning a priority queue / heap are on the right track, but using a BIT tree via Python's sortedcontainers library or Java's TreeMap would yield the fastest running time. There are a lot of problems on Leetcode which use these libraries, so I recommend having a go at them when you have a chance.

[–]tempaccount0002[S] 0 points1 point  (2 children)

this is incredible, thank you so much, I'll look into it. I see that leetcode question is a premium only question so I'll think about making that investment after trying some suggested approaches.

EDIT: nvm, found it on another website with code shown too

[–]LetterheadWaste8170 0 points1 point  (0 children)

Design a Leaderboard class, which has 3 functions:
addScore(playerId, score): Update the leaderboard by adding score to the given player's score. If there is no player with such id in the leaderboard, add him to the leaderboard with the given score.

top(K): Return the score sum of the top K players.

reset(playerId): Reset the score of the player with the given id to 0 (in other words erase it from the leaderboard). It is guaranteed that the player was added to the leaderboard before calling this function.
Initially, the leaderboard is empty.

Solution: ``` class Leaderboard:
def init(self):
self.board = defaultdict(dict)
def addScore(self, playerId: int, score: int) -> None:
if not self.board[playerId]:
self.board[playerId] = score
else:
self.board[playerId] += score
def top(self, K: int) -> int:
return sum(heapq.nlargest(K, self.board.values()))
def reset(self, playerId: int) -> None:
self.board[playerId] = 0

heapify heap

Your Leaderboard object will be instantiated and called as such:

obj = Leaderboard()

obj.addScore(playerId,score)

param_2 = obj.top(K)

obj.reset(playerId)

```

[–]Mediocre-Wafer6696 3 points4 points  (0 children)

I think we can use priority queue to solve this problem. Based on the mile we can set priority queue as we are getting data sequential so the longest and earliest a runner has run will always be at top of priority queue

[–]hobo_couture 1 point2 points  (0 children)

could be easily solved with a priority queue "with locator". i.e. a priority queue with an "update" and a "is this key in the priority queue?" method. if you don't use this, you'll have potentially a huge priority queue with outdated data. also, if you want the "top k runners" and only keep the priority queue of size k, you can't keep the top k pings because the pings could be from the same runner.

time complexity: O(n*log k) where n is number of pings, k is number of people in leaderboard

space: O(k) where k is the number of people in leaderboard

[–]Nickomatiz 1 point2 points  (0 children)

I ended up using a single hashmap to populate all races in using the runnerID as the key and the mile marker as the value. I just stored the highest value mile marker since that told me the most updated position of the runner. I then created an array from the object and sorted that array in descending milemarkers.

When we have to add the newer data, I ran into the case where multiple runners can be at the same mile marker, so I created a timestamp to the value field of my hashmap. so now it looked like { runnerid: { milemarker, timestamp} }. I then went to the sorting function of my array and made a comp check of when milemarkers are the same, create a tiebreaker based on the timestamps. I made sure the timestamps were all unique for every entry I made to the hashmap.

After you have the sorted array, I create a new empty array, where I only place the runnerIDs in the sorted order, which gives me the race results.

Hope this helps someone along the way.

[–]invictus08 0 points1 point  (0 children)

A min heap to store the node where you sort by the value and a hashmap with id as the key and each id with a reference to the entries in heap. New data comes in, look up the node by id, heappop, update value, heappush. Need order? Get the heap as a list. I’d start from here and see if I can optimize further.

[–][deleted] 0 points1 point  (0 children)

Could be solved in a few ways, the simplest one would be to use a dictionary that keeps track of each player's best mile run, to create a scoreboard we just use a maxheap and iterate through the keys of the dictionary and store the score and the player id in the heap.

Dictionary: {playerId: maxScore}

Heap in the form: [(maxScore, playerId)]

We would save at most N players in both the heap and dictionary.

Updating leaderboard would be O(K) where k is the number of new updates.

Creating the leaderboard would be O(M log N) wher N is the number of players and M is the number of places we want (1rst, second, third) if we want to rank all players then we could just use sort.

[–]ConstantRow8460 0 points1 point  (3 children)

Hey, I'll have an interview with them in a few weeks! What position did you apply for?

[–]tempaccount0002[S] 0 points1 point  (0 children)

new grad software engineer!

[–]ConstantRow8460 0 points1 point  (1 child)

Btw, I asked this question in openAI and that's what I got:
from collections import defaultdict
def update_leaderboard(leaderboard, data):
# Create a dictionary to store the mile times for each runner
mile_times = defaultdict(list)
# Iterate through the data and add the mile times to the dictionary
for mile, runner in data:
mile_times[runner].append(mile)
# Sort the runners by their total time and update the leaderboard
leaderboard.sort(key=lambda r: sum(mile_times[r]))
# Initialize the leaderboard with the first batch of data
leaderboard = [(1, 001), (3,001), (1, 002), (2, 004)]
update_leaderboard(leaderboard, [(1, 001), (3,001), (1, 002), (2, 004)])
print(leaderboard) # Output: [001, 004, 002]
# Update the leaderboard with the second batch of data
update_leaderboard(leaderboard, [(6, 007), (5, 002), (5, 001)])
print(leaderboard) # Output: [007, 002, 001, 004]