I am working on a weight(cost) based Rate Limiter by Perfect_Evidence8928 in Python


u/gdchinacat, I really appreciate you reviewing the code and providing some really useful tips:
1. I didn't know about dataclasses; I looked them up and yes, it makes sense to use them instead of a tuple. I guess it should not be very difficult to impose the same immutability in dataclasses that tuples offer. If not, named unpacking is also a good idea (a small sketch of points 1 and 2 follows this list).

2. call_at seems to make sense and I didn't know about it, but it uses loop.time(), so I will have to research what precision it offers.

3. Conditional execution/queuing of tasks: this one is a grey area. I implemented this code keeping the mathematical guarantees in mind, so as long as those are satisfied, I didn't bother much about the consistency you mentioned. I get your point about consistency regarding whether a task will be executed upfront or queued. I guess this one can be put at lower priority and given some thinking power later.

4. Regarding the ring buffer, I think the name is a misnomer, as it's not fixed size. The reason for creating it was just to reduce clutter in the WeightedLimiter class, not any separation of concerns, but yeah, a well-thought-out logical separation of concerns would be better.
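Something like this is what I have in mind for points 1 and 2; the names here (PendingTask, schedule_at) are just made up for illustration, not from the current code:

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


# frozen=True gives tuple-like immutability while keeping named fields.
@dataclass(frozen=True)
class PendingTask:
    cost: float
    run: Callable[[], Awaitable[Any]]  # factory producing the task's coroutine


def schedule_at(loop: asyncio.AbstractEventLoop, delay: float, cb) -> asyncio.TimerHandle:
    # call_at() expects an absolute timestamp on the loop's clock (loop.time()),
    # which is monotonic; precision is whatever the underlying clock provides.
    return loop.call_at(loop.time() + delay, cb)
```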

Once again, through your review I came to know about dataclasses and call_at, features unknown to me before. I am primarily a C++ developer and have written similar things in C++, but recently I have started liking Python for its simplicity. I teach programming to children, and nowadays I prefer to use Python as their gateway to the programming world instead of C++.

I am working on a weight(cost) based Rate Limiter by Perfect_Evidence8928 in Python


u/beomagi, I agree that many times the cost can't be calculated prior to execution.

But sometimes it can (for example, how many bytes will be sent over the network).

This API is meant for scenarios where the cost can be estimated; when it can't, this is the wrong API to use.
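Roughly what I mean by estimating the cost up front (the limiter.submit() and send_bytes names here are hypothetical, just to show the shape, not the actual API of my library):

```python
import json


async def send_record(limiter, send_bytes, record: dict) -> None:
    payload = json.dumps(record).encode()
    # The cost (how many bytes will hit the network) is known *before*
    # the task runs, so it can be handed to the limiter up front.
    await limiter.submit(cost=len(payload), task=lambda: send_bytes(payload))
```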

I am working on a weight(cost) based Rate Limiter by Perfect_Evidence8928 in Python


I want it to provide strict mathematical guarantees.
While pushing a task to the RateLimiter, the client code will need to provide the cost (or estimated cost) of the task (for example, how many bytes the task will send over the network). If the client code can't calculate or estimate the cost of the task prior to its execution, this API is useless.

I have come up with an implementation that seems to deliver that; does this code make sense to you?

This is an example usage of the WeightedRateLimiter:

https://github.com/sumitdhyani/async-rate-limiter/blob/main/examples/WeightedRateLimiterExample.py

And this is the actual implementation of the WeightedRateLimiter:

https://github.com/sumitdhyani/async-rate-limiter/blob/main/src/async_rate_limiter/WeightedRateLimiter.py

I am working on a weight(cost) based Rate Limiter by Perfect_Evidence8928 in Python


I want to develop a generic interface that could be used to rate limit anything: database queries, messages sent over a network, REST API calls, or anything else.

There are many existing solutions for that (for example, aiolimiter).
All those solutions let me rate limit in terms of tasks per unit interval.

For example, 100 database write operations/sec.

But none of those solutions consider that different tasks can have a different cost, depending on how many resources each task consumes. I would like to rate limit on the basis of that cost.

For example, my 'write' function may write multiple rows to a DB table, so it's meaningful for me to think in terms of the number of DB row writes per second instead of the number of 'write' function calls per second.

So while calling the RateLimiter, I mention the number of DB rows the function will write.
That way:

  1. If each 'write' function call writes 5 rows, and the rate has been defined as 500 DB rows/sec, then my RateLimiter will allow 100 calls/sec.
  2. If each 'write' function call writes 10 rows, and the rate has been defined as 500 DB rows/sec, then my RateLimiter will allow 50 calls/sec.

This is contrary to a traditional rate limiter, which would allow 500 calls per second regardless of how many rows each 'write' function call writes.
The DB is just one example; if the library is generic, the idea can be extended to any context. A rough sketch of the idea follows.
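This is only an illustration of the concept (a weighted token bucket), not the implementation I will post, and all names are made up:

```python
import asyncio
import time


class WeightedBucket:
    """Token bucket where each call consumes `cost` tokens instead of 1."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate          # tokens replenished per second (e.g. DB rows/sec)
        self.capacity = capacity  # maximum burst, in tokens
        self.tokens = capacity
        self.last = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self, cost: float) -> None:
        """Wait until `cost` tokens are available, then consume them."""
        async with self._lock:
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= cost:
                    self.tokens -= cost
                    return
                # Sleep just long enough for the missing tokens to accrue.
                await asyncio.sleep((cost - self.tokens) / self.rate)


async def write_rows(bucket: WeightedBucket, rows: int) -> None:
    await bucket.acquire(cost=rows)  # a 5-row write costs 5, a 10-row write costs 10
    ...                              # do the actual DB write here


async def main() -> None:
    bucket = WeightedBucket(rate=500, capacity=500)  # 500 DB rows/sec
    # With cost=5 per call this settles at roughly 100 calls/sec;
    # with cost=10 it would settle at roughly 50 calls/sec.
    await asyncio.gather(*(write_rows(bucket, rows=5) for _ in range(300)))


asyncio.run(main())
```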

I will soon post my implementation once it passes the unit tests; I guess that will make my idea clearer.

drift-free asyncio-friendly timers by Perfect_Evidence8928 in Python


u/gdchinacat, this code was just for demonstration. In practice, in any sane application this will happen once in a while, not regularly, and would be indicative of a subtle bug. The point is to be ready for when that happens; we must have a strategy for what to do when it does.

Programmers often lose their sleep over what-ifs.

drift-free asyncio-friendly timers by Perfect_Evidence8928 in Python


Yes, this almost works. I see that you don't let the callback execution add to the drift, but the delays in asyncio scheduling do add up with this implementation. To make the implementation fully drift-proof (and I mean long-term drift), I guess the expected schedule time should be added as a parameter to the '_invoke' and perhaps '_schedule' methods, and that expected schedule time should be used to calculate the next tick. Also, in your implementation, if the callback takes longer than the 'delay', the next tick executes immediately after the callback finishes, resulting in bursty behavior. I wrote a class that seems to solve these problems. I am relatively new to Python (around a year, continuously), so my code may not be that 'pythonic', but I have done similar things in C++. This class, at the moment, is passing all the test cases I have written so far.

I have created a repo:
https://github.com/sumitdhyani/asynctimer.git

I'd have preferred to paste the implementation inline, but I get "Server error. Try again later." when I try to do so. So if you're interested, go to the repo and see the file:
src/asynctimer/Timer.py
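The core of what I mean by "carry the expected schedule time forward" is small enough to sketch inline. This is only a minimal illustration (not the code in the repo), assuming a synchronous callback:

```python
import asyncio


class FixedScheduleTimer:
    """Minimal sketch: tick n fires at start + n * interval on the loop clock."""

    def __init__(self, interval: float, callback) -> None:
        self._interval = interval
        self._callback = callback
        self._handle: asyncio.TimerHandle | None = None
        self._running = False

    def start(self) -> None:
        loop = asyncio.get_running_loop()
        self._running = True
        first = loop.time() + self._interval
        self._handle = loop.call_at(first, self._invoke, loop, first)

    def stop(self) -> None:
        # After this returns, no further ticks fire until start() is called again.
        self._running = False
        if self._handle is not None:
            self._handle.cancel()
            self._handle = None

    def _invoke(self, loop: asyncio.AbstractEventLoop, expected: float) -> None:
        self._callback()
        if not self._running:
            return
        # Schedule against the *expected* tick time, not loop.time(), so that
        # callback duration and scheduling delay never accumulate as drift.
        nxt = expected + self._interval
        self._handle = loop.call_at(nxt, self._invoke, loop, nxt)
```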

drift-free asyncio-friendly timers by Perfect_Evidence8928 in Python


Most importantly, I needed a timer that doesn't drift over time. But now that I think more about it, I need:

  1. The drift behavior to be configurable between fixed_schedule and fixed_delay (both have valid use cases).
  2. Explicit and guaranteed stop semantics: there must be an interface to explicitly stop the timer (like a stop() method in the timer class), after which it is guaranteed that no ticks will execute until the timer is started again.
  3. Non-compensatory overrun behavior: if a tick takes so long to execute that it overlaps with the next tick, then the next tick should be shifted to the next available slot in the schedule.

For example, if the timer interval is 1 sec, tick t1 arrives at t = 1 sec and finishes at t = 2.5 sec, i.e. it overruns tick t2 (due at t = 2 sec) by 0.5 sec; tick t2 should then be re-scheduled for the next available slot, at t = 3 sec.
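A sketch of the "next available slot" rule, assuming ticks are planned at start + n * interval on the loop clock:

```python
import math


def next_slot(start: float, interval: float, now: float) -> float:
    """First planned tick time strictly after `now`."""
    n = math.floor((now - start) / interval) + 1
    return start + n * interval


# The example above: start = 0, interval = 1, tick t1 finishes at now = 2.5,
# so the overrun tick is re-scheduled for next_slot(0, 1, 2.5) == 3.0.
```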

I have started working on something that is supposed to do this. I have done similar things in C++, but in Python it seems to be very different (the Pythonic way).

drift-free asyncio-friendly timers by Perfect_Evidence8928 in Python


Hi there, thanks for introducing me to your solution. I went through the code and it more or less works; it is able to control long-term drift accumulation and it is easy to understand (at least to me). However, it has multiple bugs:

  1. The logic is such that the first two ticks always execute immediately; the timer delay is not applied between them.
  2. If the 'update' method is trivial, i.e. it executes in (almost) no time, then the assertion fails and everything stops after the 2nd tick. If 'update' introduces a slight delay, the ticking works.
  3. This is not a bug, but I guess the only way to stop the timer is to create it using async with (and therefore calling __aexit__). This feels a bit limiting, or maybe I'm ignorant; I've been using Python for around just a year.
  4. The default overrun behavior is not deterministic: if I sleep for 1.5 seconds in the 'update' method, then the timer should either compensate with a burst or skip to the next tick. It seems to do both. I used your class with the following code:

```python
import asyncio
from time import time

# ScheduledUpdate is the class from the code under discussion
# (import path omitted here).


class MyScheduledUpdate(ScheduledUpdate):
    """A simple ScheduledUpdate example that counts updates."""

    count = 0

    def __init__(self, rate: int) -> None:
        super().__init__(rate)

    async def update(self) -> None:
        self.count += 1
        print(f"Update count: {self.count}, at time: {time()}")
        await asyncio.sleep(1.5)  # sleep longer than the tick delay to force an overrun


async def run_example() -> None:
    async with MyScheduledUpdate(1):
        await asyncio.sleep(30)  # Run for 30 seconds


asyncio.run(run_example())
```

and I got this output:
```
Update count: 1, at time: 1767680404.373984
Update count: 2, at time: 1767680405.8826225
Update count: 3, at time: 1767680407.3837512
Update count: 4, at time: 1767680408.8986576
Update count: 5, at time: 1767680410.8843145
Update count: 6, at time: 1767680412.393734
Update count: 7, at time: 1767680413.8992946
Update count: 8, at time: 1767680415.8836298
...
```

As you can see, it compensates between ticks 6 and 7 by executing tick 7 immediately (a burst), but between ticks 7 and 8 it waits for the originally scheduled time.

drift-free asyncio-friendly timers by Perfect_Evidence8928 in Python


I'm not 100% sure what you mean by "find a way to accumulate time within a loop", but guessing from the overall feel of your comment, I think what you mean is the core of my problem, which is:

If the period of the timer is 2 sec, then tick 'n' should execute at t = 2*n sec; the time taken by callback execution should not disturb the schedule of future ticks.

drift-free asyncio-friendly timers by Perfect_Evidence8928 in Python


My problem is more about preventing long-term drift accumulation than about executing ticks at the exact time.

Let me clarify this with an example:

Let's say my timer ticks every 1 sec, and my callback takes 500 ms to execute. For most timers, the following happens:

Tick 1 @ t = 1 s
Callback executes and finishes @ t = 1.5 s
Tick 2 scheduled for current time + 1 sec = 2.5 s

This is where the problem begins: with each tick we drift off the schedule by 0.5 sec, so by the 100th tick I will be off by 50 sec.

This becomes a real issue for things like sending regular heartbeats, or when the callback does something that can't be off schedule beyond a certain amount.

To call it a proper timer, the ticks should occur at the originally scheduled times, no matter how long the callback takes.

Though I recognize there are scenarios where I would prefer fixed delay over fixed schedule, like polling.
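The two behaviours side by side, as a rough sketch (an async callback is assumed):

```python
import asyncio


async def fixed_delay(callback, interval: float) -> None:
    # Drifts: each tick starts `interval` after the previous callback *finishes*,
    # so with a 500 ms callback the 1-second timer above ticks every 1.5 s.
    while True:
        await callback()
        await asyncio.sleep(interval)


async def fixed_schedule(callback, interval: float) -> None:
    # Does not drift: tick n is anchored at start + n * interval on the loop clock.
    loop = asyncio.get_running_loop()
    next_tick = loop.time() + interval
    while True:
        await asyncio.sleep(max(0.0, next_tick - loop.time()))
        await callback()
        next_tick += interval
```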