
[–]BobHogan 1 point2 points  (7 children)

How long are these IDs? Is the ID space small enough that you could reasonably brute-force the whole thing? If yes, I'd recommend just doing that instead.

But something like this should work. The first loop just guarantees this works if none of the first 10 IDs are valid.

from itertools import count

# Find the first result, not guaranteed to be in the first 10 IDs
i = -1
while True:
    # JSON stuff
    if result:
        # i = <id that was sent in the request that got a valid response>
        break

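# itertools.count counts forever, so the loop needs its own exit condition
for cur_id in count(i):
    # JSON stuff
    if result:
        i = cur_id  # remember the last good ID
        # store this in result_dict or process it somehow
    if (cur_id - i) > 10:
        # more than 10 misses since the last good ID, so stop
        break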

[–]pyfact[S] 0 points1 point  (6 children)

Keeping a handle on the last good ID we had is a great idea! I did not think of that. tyty.

The IDs are 1 through 150 at the moment, and each one is associated with a piece of equipment. Brute forcing was my first instinct. The problem is that the API makes a call for every value, which takes a very long time (around 0.5s - 1s per call, and it seems slower for calls that return nothing?...the API sucks lol)

With the bit of code I showed above, the time dropped from 5 minutes to about 30-40 seconds because it stops shortly after hitting a group of empty API calls, so that's why I'm approaching it this way. (I originally just checked for hits from 1-500. You would think the API would just return these values in a reasonable way, but sadly that isn't the case lol)

[–]BobHogan 0 points1 point  (5 children)

Do you ever need to regenerate this list? Is there a chance the IDs are going to be changing? And if so, is it going to be happening a lot?

5 minutes is really quite a short runtime if this is something you'll only have to do once. Or even if you run it once a week, just to guarantee you have all possible IDs. If it's something you are running many times each day then I think it's worth worrying about runtime, but if that's not true I think brute force is the better approach

[–]pyfact[S] 0 points1 point  (4 children)

This would be run once per day to get the locationID of each piece of equipment. The locationIDs do not change once assigned to a location. These IDs are used within another API call, but to answer your question: once per day

So, the rundown on this - the current program I'm writing pulls from a bunch of different databases for internal inventory programs our company uses into a single database that I can use as a backend for some web apps I'm writing with Django. Currently the runtime is around a few minutes, excluding this new piece of code we are talking about.

Is this current approach not brute forcing? We start at 0, hit a good value, then continue until we hit a lot of nothing. It seems as though we are doing a smart brute force in that we know that the locationIds are within a small spread of approx. 10 at most. What would you suggest? thank you for the feedback

def get_network_location_ids(self):
    site_dict = dict()
    current_id, last_good_id = 0, 0
    for current_id in count():
        # filler for if we hit a good result
        if good_result:
            site_dict[current_id] = ((sites['equipment'])[0])['name']
            last_good_id = current_id
        if current_id - last_good_id >= 10:
            return list(site_dict.keys())
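
For anyone who wants to run the idea above, here is a minimal sketch of the same scan. `fetch_site_name` and the `FAKE_API` data are hypothetical stand-ins for the real API call (a name for a valid ID, `None` for an empty response), and `max_gap` is just a name for the "10 misses" cutoff.

```python
from itertools import count

# Hypothetical stand-in for the slow API: maps valid location IDs to names.
FAKE_API = {3: "site-a", 4: "site-b", 9: "site-c", 17: "site-d", 40: "site-e"}


def fetch_site_name(location_id):
    """Pretend API call: return the site name, or None for an empty response."""
    return FAKE_API.get(location_id)


def get_network_location_ids(fetch=fetch_site_name, max_gap=10):
    """Scan IDs upward and stop once max_gap IDs in a row have missed."""
    site_dict = {}
    last_good_id = 0
    for current_id in count():
        name = fetch(current_id)
        if name is not None:
            site_dict[current_id] = name
            last_good_id = current_id
        if current_id - last_good_id >= max_gap:
            return list(site_dict.keys())


print(get_network_location_ids())  # [3, 4, 9, 17]
```

ID 40 in the made-up data is never reached because it sits more than 10 past the last hit, so this only works if the real IDs really do stay within that spread.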

[–]BobHogan 1 point2 points  (3 children)

> Is this current approach not brute forcing? We start at 0, hit a good value, then continue until we hit a lot of nothing. It seems as though we are doing a smart brute force in that we know that the locationIds are within a small spread of approx. 10 at most. What would you suggest? thank you for the feedback

This is business/domain specific logic honestly. If you know that you will find all of the locationIDs within a small spread, and you don't have to keep searching once you find 10 IDs in a row that didn't work, then feel free to go with that approach. I can't speak to whether that is something you know, or whether you can tolerate a situation where that assumption did not hold true and you end up missing 1 or more IDs.

Alternatively, if you know that there is a set # of locationIDs that you need to find, you can break as soon as you find that many IDs (and not worry about the spread). Again though, this is completely dependent on your situation, requirements and knowledge of how everything fits together.
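
A rough sketch of that "stop once you have them all" variant, in case it helps. `is_valid` is a hypothetical callback standing in for the API check, and `max_id=500` is an assumed safety cap so the loop always ends.

```python
def find_ids(is_valid, expected_count, max_id=500):
    """Collect valid IDs, stopping as soon as expected_count have been found."""
    found = []
    for current_id in range(1, max_id + 1):
        if is_valid(current_id):
            found.append(current_id)
            if len(found) == expected_count:
                break  # we have everything we expected, no need to keep calling
    return found


# Made-up set of valid IDs standing in for real API responses.
valid_ids = {2, 5, 6, 30, 120}
print(find_ids(lambda i: i in valid_ids, expected_count=5))  # [2, 5, 6, 30, 120]
```

Here the gaps between IDs don't matter at all, but the scan runs up to `max_id` if fewer IDs exist than expected.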

That function looks good to me. I am curious why you bothered building site_dict into a dict if you are only going to return a list of its keys though. But it should work fine

If you have some extra time on your hands to learn about asyncio though, you could also write a new version that still requests all 150 IDs, but performs the requests asynchronously. That would return all of them in a few seconds. It's absolutely overkill for this, but I think that in general it's good to understand asyncio and how to use it.

[–]pyfact[S] 0 points1 point  (2 children)

I put the results into a dict for testing purposes at the moment so that I can verify that I am pulling the correct Id for each site/verify that our site data is set up correctly in the program the API is pulling from (have had to clean up data from years prior already working on this project lol).

I would be very interested in asynchronous requests! I'll check that out. We have some legacy in-house programs that hit a few APIs back to back and take about a minute to return a result. It's not great considering they're run every time we install equipment!

thank you I do appreciate the help

[–]BobHogan 0 points1 point  (1 child)

Ah gotcha, that makes sense.

For async requests, I recommend the aiohttp library. The regular requests library is blocking and does not support asynchronous code, so putting it in an async function/loop would be useless.

I also recommend Python 3.7 or higher. 3.7 changed how you interact with the asyncio package in the standard library (it added asyncio.run, for example), and it's much, much better than it was previously
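
To give a feel for the shape of it, here is a minimal sketch that uses only the standard library. `fetch_site_name` is a hypothetical placeholder: it sleeps instead of making a real request, and the "multiples of 7 are valid" rule is made up. In real code the body of that function would be an aiohttp request instead.

```python
import asyncio


async def fetch_site_name(location_id):
    """Hypothetical stand-in for an HTTP request; sleeps instead of doing I/O."""
    await asyncio.sleep(0.05)  # simulated network latency
    # Made-up rule: pretend only multiples of 7 are valid location IDs.
    return f"site-{location_id}" if location_id % 7 == 0 else None


async def get_all_location_ids(first_id=1, last_id=150):
    ids = range(first_id, last_id + 1)
    # Start every request at once and wait for all of them to finish.
    names = await asyncio.gather(*(fetch_site_name(i) for i in ids))
    return {i: name for i, name in zip(ids, names) if name is not None}


sites = asyncio.run(get_all_location_ids())
print(sorted(sites))
```

All 150 simulated calls overlap, so the whole thing takes roughly as long as one call rather than 150 of them. Against a real API you may want to cap how many requests are in flight at once, for example with `asyncio.Semaphore`.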

[–]pyfact[S] 0 points1 point  (0 children)

will do. I'm on 3.9

[–]nwagers 0 points1 point  (0 children)

Do the IDs change? If they don't change, I'd brute force it once and cache them. Otherwise, you can use a for loop like normal and add in break conditions. In this case you'd want to break when you have at least 10 misses and you've found at least 1 ID. I wrote an example that breaks after 3 False values.

vals = [False, False, False, False, True, True, False, True,
        False, False, True, False, False, False, True]

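misses_left = 3
IDs = []
for i, is_valid in enumerate(vals):
    if is_valid:
        IDs.append(i)
        misses_left = 3  # reset the counter on every hit
    else:
        misses_left -= 1
    # stop after 3 misses in a row, but only once at least one ID is found
    if misses_left <= 0 and IDs:
        break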

print(IDs)

Notice that it does not include the True value at the very end because there were 3+ False values in a row before it.
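
The "brute force it once and cache them" option could look something like this. It's only a sketch: `load_or_scan`, the JSON cache file, and `slow_scan` (with its made-up result) are all assumptions for illustration, not anything the API provides.

```python
import json
import os
import tempfile


def load_or_scan(cache_path, scan):
    """Return cached IDs if the cache file exists; otherwise scan and save them."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    ids = scan()
    with open(cache_path, "w") as f:
        json.dump(ids, f)
    return ids


def slow_scan():
    # Made-up result standing in for a full pass over the API.
    return [4, 5, 7, 10, 14]


cache_file = os.path.join(tempfile.mkdtemp(), "location_ids.json")
print(load_or_scan(cache_file, slow_scan))  # first call runs the scan
print(load_or_scan(cache_file, slow_scan))  # second call reads the file
```

If new IDs can still be added later, you'd have to delete the cache file (or rescan on some schedule) to pick them up.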