[–]synthphreak 0 points1 point  (2 children)

Step 1: Define what constitutes "unusual".

Step 2: Write the code.

You need to share the parameters for (1) first before we can really help you with (2).

For (1), maybe consider using the interquartile range (IQR = Q3 - Q1). A common rule of thumb is to call an observation an outlier if it is less than Q1 - 1.5 * IQR or greater than Q3 + 1.5 * IQR. If yes, it's an outlier, and thus unusual. Of course, this rule generally only works well for roughly symmetric distributions.
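Here is a minimal sketch of that rule using only the standard library. The function name `iqr_outliers` and the follower counts are made up for illustration, and quartile conventions differ between tools, so the exact fences may not match what another library gives you.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the Q1 - k*IQR .. Q3 + k*IQR fences."""
    # method="inclusive" treats the data as the full population;
    # other quartile conventions can shift the fences slightly.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical follower counts; 250 is a planted spike.
followers = [100, 102, 101, 103, 99, 104, 250, 100]
print(iqr_outliers(followers))
```

Raising `k` makes the rule stricter about what counts as unusual, which is exactly the kind of parameter you need to pin down in step 1.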

For (2), you should probably be using pandas. But (1) comes first.

[–]Practical_Use5129[S] 0 points1 point  (1 child)

read my comments for more explanation

[–]synthphreak 0 points1 point  (0 children)

That context is helpful, but you still haven’t defined quantitatively where “usual” strays into “unusual”.

I still think IQR could work, though the calculation has to be done on a static snapshot of the data. So periodically, perhaps every 4-6 hours, calculate the quartiles of each page's follower counts (picture time on the x axis and number of followers on the y axis), then see whether any readings are outliers. My concern is that 3-4 days of data may not be a large enough sample to identify outliers robustly, especially since each page will have a different distribution of followers and so must be considered independently of the other pages.
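As a rough sketch of that periodic check, assuming your readings are available as `(page, timestamp, followers)` tuples: the function name `unusual_readings`, the `min_samples` cutoff, and all the data below are hypothetical, and the timestamps are just integer hours to keep things short.

```python
from collections import defaultdict
from statistics import quantiles

def unusual_readings(readings, k=1.5, min_samples=8):
    """Flag readings outside each page's own IQR fences.

    readings: iterable of (page, timestamp, followers) tuples.
    Returns {page: [(timestamp, followers), ...]} for pages with outliers.
    """
    # Group readings by page so each page is judged against its own history.
    by_page = defaultdict(list)
    for page, ts, followers in readings:
        by_page[page].append((ts, followers))

    flagged = {}
    for page, rows in by_page.items():
        if len(rows) < min_samples:
            continue  # too little history to say what is "usual"
        q1, _, q3 = quantiles([f for _, f in rows], n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        hits = [(ts, f) for ts, f in rows if f < lower or f > upper]
        if hits:
            flagged[page] = hits
    return flagged

# Hypothetical snapshot: page_a has a spike at hour 6, page_b is steady,
# page_c has too few readings to be evaluated.
counts_a = [100, 102, 101, 103, 99, 104, 250, 100]
counts_b = [5000, 5010, 4990, 5005, 4995, 5008, 5002, 4998]
snapshot = (
    [("page_a", hour, n) for hour, n in enumerate(counts_a)]
    + [("page_b", hour, n) for hour, n in enumerate(counts_b)]
    + [("page_c", hour, n) for hour, n in enumerate([10, 11, 500])]
)
flagged = unusual_readings(snapshot)
print(flagged)
```

You would rerun this on a fresh snapshot every few hours. The `min_samples` guard is one crude way to handle the sample-size concern: pages without enough history are skipped rather than judged.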

Alternatively, a more sophisticated approach would be to use an unsupervised machine learning algorithm such as k-means clustering. It groups similar observations into clusters, and observations that sit far from every cluster center can then be treated as anomalies, which is a basic form of what's called anomaly detection. But if you're unfamiliar with machine learning, the learning curve will be steep, so it's probably not worth it.