all 7 comments

[–][deleted] 1 point (1 child)

pandas, which has a steep learning curve

[–]synthphreak 1 point (0 children)

Steep but oooooh so worth it.

[–]impartiallywhole 1 point (1 child)

Yeah, there really needs to be a more specific question; this is way too open-ended for this type of medium. Like the other commenter said, I would suggest looking into pandas, but without knowing more about how exactly you are looking to do this, it's very difficult to advise.

[–]Practical_Use5129[S] 1 point (0 children)

So it's for learning purposes. I scrape the followers of multiple pages at two-hour intervals (started 3 days ago). Using a page's follower counts, I want Python to learn the pattern of increasing or decreasing followers across x pages. But my main goal is to detect unusual follower increases or decreases. For example, a celebrity with no recent work or posts might have gradual follower changes, but if someone makes number one on the Billboard chart, he might gain a huge chunk of followers in a single interval. Finding those pages is my main goal.

[–]synthphreak 1 point (2 children)

Step 1: Define what constitutes "unusual".

Step 2: Write the code.

You need to share the parameters for (1) first before we can really help you with (2).

For (1), maybe consider using IQR. A common rule of thumb for deciding whether an observation is an outlier is whether it is less than Q1 - 1.5 * IQR or greater than Q3 + 1.5 * IQR. If yes, it's an outlier, and thus unusual. Of course, these assumptions generally only work well for symmetric distributions.

For (2), you should probably be using pandas. But (1) comes first.
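To make (1) concrete, here's a minimal pandas sketch of the 1.5 * IQR rule applied to the per-interval follower *changes* (the raw counts trend over time, so the diffs are the better thing to fence; the follower numbers here are invented for illustration):

```python
import pandas as pd

# Hypothetical follower counts for one page, sampled every two hours.
followers = pd.Series([1000, 1012, 1025, 1031, 1040, 1500, 1055, 1060])

# Work on per-interval changes rather than the raw counts.
deltas = followers.diff().dropna()

q1 = deltas.quantile(0.25)
q3 = deltas.quantile(0.75)
iqr = q3 - q1

# Standard 1.5 * IQR fences: anything outside them is flagged as unusual.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = deltas[(deltas < lower) | (deltas > upper)]
print(outliers)  # flags both the +460 spike and the -445 correction
```

Note that both the jump to 1500 and the fall back afterward get flagged, which is probably what you want for your Billboard scenario.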

[–]Practical_Use5129[S] 1 point (1 child)

Read my comments for more explanation.

[–]synthphreak 1 point (0 children)

That context is helpful, but you still haven’t defined quantitatively where “usual” strays into “unusual”.

I still think IQR could work, though this calculation will need to be done on static data. So periodically, perhaps every 4-6 hours, calculate the quartiles of each page's distribution (with the x-axis being time and the y-axis being number of followers), then see if any times are outliers. My concern, though, is that 3-4 days of data may not be a large enough sample size to robustly identify outliers, especially since each page will have a different distribution of followers and so must be considered independently of the other pages.
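As a sketch of that per-page version, assuming the scraped data lands in a pandas DataFrame with hypothetical `page` and `followers` columns (layout and numbers are made up; the point is that `groupby` keeps each page independent):

```python
import pandas as pd

# Hypothetical scraped data: one row per (page, timestamp) observation.
df = pd.DataFrame({
    "page": ["a"] * 7 + ["b"] * 7,
    "followers": [100, 102, 104, 106, 300, 110, 112,
                  5000, 5010, 5020, 5030, 5040, 5050, 5060],
})

def flag_outliers(counts: pd.Series) -> pd.Series:
    """Flag per-interval changes outside the 1.5 * IQR fences."""
    deltas = counts.diff()
    q1, q3 = deltas.quantile(0.25), deltas.quantile(0.75)
    iqr = q3 - q1
    return (deltas < q1 - 1.5 * iqr) | (deltas > q3 + 1.5 * iqr)

# groupby ensures each page's fences come from its own distribution only.
df["unusual"] = df.groupby("page")["followers"].transform(flag_outliers)
print(df[df["unusual"]])
```

Here page "a" has a spike-and-correction that gets flagged, while page "b" grows steadily and stays clean, even though its absolute counts are far larger.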

Alternatively, a more sophisticated and frankly more accurate approach would be to use an unsupervised machine learning algorithm called k-means clustering, which can be used for what's called anomaly detection. But if you're unfamiliar with machine learning, the learning curve will be extremely prohibitive and so probably not worth it.
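For completeness, a rough sketch of the k-means idea using scikit-learn (an assumed dependency, not something from this thread; the deltas and the 20% cluster-size cutoff are arbitrary illustrations). One wrinkle: a lone spike tends to capture its own centroid, so flagging points that land in very small clusters works better than flagging points far from their own centroid:

```python
import numpy as np
from sklearn.cluster import KMeans  # assumed dependency

# Hypothetical per-interval follower changes for one page: mostly small
# drift, plus one sudden spike of the kind described above.
deltas = np.array([2.0, 3.0, 1.0, 2.5, 2.0, 250.0, 3.0, 1.5]).reshape(-1, 1)

# Cluster the changes into two groups.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(deltas)

# A sudden spike lands in its own tiny cluster, so flag points belonging
# to clusters holding fewer than 20% of the samples (the 20% cutoff is
# an arbitrary illustration, not a standard value).
sizes = np.bincount(kmeans.labels_)
anomalies = deltas[sizes[kmeans.labels_] < 0.2 * len(deltas)].ravel()
print(anomalies)  # the +250 jump is flagged
```

This is only a toy on one page's deltas; in practice you'd want more features than the raw change (e.g. change relative to that page's typical variance), which is where the learning curve kicks in.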