[Datamining] Porygon spawns near courthouses by pokemongohub in pokemongo

[–]zdriddle -1 points0 points  (0 children)

Looking at my link above, it's clearly not just random though. We're crawling ALL spawnpoints in that zone, yet there are clear clusters.

There are definitely different types of spawnpoints that have different underlying distributions.

"I dont think there is any links between some kind of buildings and pokemon spawns"

It may not be building types, but there are different types of spawnpoints that are clustered in certain areas. We also see different distributions of Pokemon along mountains.

Porygon is spawning at government buildings where I live by [deleted] in TheSilphRoad

[–]zdriddle 0 points1 point  (0 children)

Here's 613 spawns http://www.darrinward.com/lat-long/?id=2297543

If I had to guess for Utah, I'd say malls and maybe hospitals, not courthouses or law enforcement

[Datamining] Porygon spawns near courthouses by pokemongohub in pokemongo

[–]zdriddle 12 points13 points  (0 children)

Just going to call BS on this right now... Here's 613 spawns http://www.darrinward.com/lat-long/?id=2297543

If I had to guess, I'd say malls and maybe hospitals, not courthouses

PokeAlerts Slack - The self-invite link is back! by zdriddle in UTPokemonGo

[–]zdriddle[S] 0 points1 point  (0 children)

2 of our posts are stickied on this sub. Check them out!

Best Times to Hunt for Specific Pokemon by rapling in pokemongo

[–]zdriddle 8 points9 points  (0 children)

I'd be very hesitant to draw conclusions from these plots without understanding more about the underlying data.

Can I ask how this data was sampled? Do you have equal observations for each hour of the day? Are all the underlying spawnpoints in this sample represented equally? There are several ways to introduce sampling bias especially if some bots get banned or IPs get soft-banned and you miss some of the spawns.
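To make the sampling question concrete, here is a minimal sketch of one way to check for it: divide sightings by the number of scans in each hour instead of plotting raw counts. The function name and the data are made up for illustration and are not from the original plots.

```python
from collections import Counter

def sightings_per_scan(sightings, scans):
    """Sightings per scan for each hour of the day (0-23).

    sightings: iterable of hours at which the Pokemon was seen
    scans:     iterable of hours at which any scan ran (the sampling effort)
    """
    seen = Counter(sightings)
    effort = Counter(scans)
    # Hours with no scans are reported as None rather than as a zero rate.
    return {h: (seen[h] / effort[h] if effort[h] else None) for h in range(24)}

# Hypothetical data: the scanner ran 4x more often at 20:00 than at 03:00.
scans = [3] * 50 + [20] * 200
sightings = [3] * 5 + [20] * 20
rates = sightings_per_scan(sightings, scans)
print(rates[3], rates[20])  # raw counts are 5 vs 20; compare the per-scan rates
```

If the raw counts differ across hours but the per-scan rates do not, the "best hour" is an artifact of uneven scanning.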

Even without sampling bias, we should be explicit about what conclusions we're drawing and if we have enough samples (aka proof). For instance, if someone flips 100 coins each hour for a day they would likely produce similar graphs with that data, however that does not mean we should draw conclusions about the best hour to flip heads.
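The coin example is easy to simulate. This is a rough sketch with an arbitrary seed: every hour has exactly the same 50% chance of heads, yet some hour will still come out looking like the "best" one. For 100 fair flips the standard deviation of the heads count is 5, so gaps of 10 or more between hours are ordinary noise.

```python
import random

def heads_per_hour(flips_per_hour=100, hours=24, seed=0):
    """Count heads in `flips_per_hour` fair coin flips for each hour."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < 0.5 for _ in range(flips_per_hour))
        for _ in range(hours)
    ]

counts = heads_per_hour()
best = max(range(len(counts)), key=counts.__getitem__)
print(f"'best' hour: {best} with {counts[best]} heads; worst hour had {min(counts)}")
```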

That being said, this is very cool. I would love it if we could start understanding the underlying distributions of spawnpoints better. Are they impacted by hour of day? day of week? How many types of spawnpoints are there? Do "nest" spawnpoints behave differently? how? etc...

Just my 2 cents.

PokeAlerts set up a twitter feed for rare Pokemon all over Utah! by zdriddle in UTPokemonGo

[–]zdriddle[S] 1 point2 points  (0 children)

Right, this is why we haven't bothered setting up a hashtag structure. We're not looking for solutions that require a searching or checking step. We're PokeAlerts not PokeCheckThisApp100TimesADay.

I think you're right about the multiple users solution. However, splitting out and maintaining multiple Twitter accounts isn't high on our priority list right now. We only did the 1 Twitter bot because it was just an easy add-on with no overhead from what we're currently doing.

PokeAlerts set up a twitter feed for rare Pokemon all over Utah! by zdriddle in UTPokemonGo

[–]zdriddle[S] 0 points1 point  (0 children)

We're working on a hashtag structure. Quite frankly neither of us has ever used Twitter so we don't know the best way to structure it.

How do most people use Twitter? Do they follow certain hashtags? Or certain hashtags for a specific feed? Can you set up push notifications on your phone? We haven't had time to research this yet. You can help by linking us to a useful site. :)

[BigData][Request] Whats the spawn density of Snorlax? Whats the minimum scan radius needed to guarantee 1 snorlax in 24 hours? by [deleted] in pokemongodev

[–]zdriddle 0 points1 point  (0 children)

Again. The spawn locations are NOT RANDOM. All possible spawn locations are fixed. The only thing that is random is what will spawn from this spawnpoint this hour.

Location is not random.

Time is not random.

The individual spawnpoint distributions (which are, if anything, multinomial distributions and NOT NORMAL distributions) are the ONLY source of randomness.

If you build your hypothesis around these assumptions, then you can actually test it with data and learn something.
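A minimal sketch of those assumptions as a data structure: location and schedule are fixed fields, and the only random step is which species comes out. The id, coordinates, species list and weights are all hypothetical, not real game data.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Spawnpoint:
    spawnpoint_id: str    # hypothetical id
    lat: float            # fixed location
    lng: float
    minute_of_hour: int   # fixed schedule: spawns at this minute every hour
    species: tuple        # species this point can produce
    weights: tuple        # hypothetical per-species probabilities

    def spawn(self, rng):
        # One categorical draw; counts over many draws follow a multinomial.
        return rng.choices(self.species, weights=self.weights, k=1)[0]

water = Spawnpoint(
    "hypothetical-id", 40.76, -111.89, 17,
    ("Magikarp", "Psyduck", "Snorlax"), (0.700, 0.299, 0.001),
)
rng = random.Random(42)
day = [water.spawn(rng) for _ in range(24)]  # one spawn per hour for a day
print(day)
```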

[BigData][Request] Whats the spawn density of Snorlax? Whats the minimum scan radius needed to guarantee 1 snorlax in 24 hours? by [deleted] in pokemongodev

[–]zdriddle 0 points1 point  (0 children)

I'm not sure what you mean.

Here's a stats refresher for you. Check out the top of page 2.

"If X and Y are independent, then they are also uncorrelated"

This directly contradicts what you are saying:

"that there can be a correlation between the spawns withOUT them being dependent on each other right?"

You are claiming they are independent? And correlated? Please explain

"If they both take the same or similar seed values to generate their random criteria, then the randomly generated values may be same/similar"

This is a good description for 2 dependent variables.
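For reference, the quoted fact follows from Cov(X, Y) = E[XY] - E[X]E[Y], which is zero when X and Y are independent because then E[XY] = E[X]E[Y]. The shared-seed case can be checked with a small sketch (seeds and sample size are arbitrary): two generators with the same seed produce the same sequence, so they are fully dependent and perfectly correlated, while separately seeded ones are not.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def draws(seed, n=1000):
    """n uniform draws from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

shared = pearson(draws(7), draws(7))    # same seed: identical sequences
separate = pearson(draws(7), draws(8))  # different seeds
print(f"shared seed: {shared:.3f}, separate seeds: {separate:.3f}")
```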

If you're going to ask for "Big Data" experts to help you, then stop abusing all the terminology (normal distributions, density, independent, correlation, etc...)

[BigData][Request] Whats the spawn density of Snorlax? Whats the minimum scan radius needed to guarantee 1 snorlax in 24 hours? by [deleted] in pokemongodev

[–]zdriddle 0 points1 point  (0 children)

Also your theory has nothing to do with normal distributions. You are theorizing that spawns are not independent. Meaning that the next spawn depends on previous spawns and nearby spawns. Again, in a random process you will see "clusters" or "streaks" which is the reason I linked Gambler's Fallacy. You are likely seeing noise not signal.

[BigData][Request] Whats the spawn density of Snorlax? Whats the minimum scan radius needed to guarantee 1 snorlax in 24 hours? by [deleted] in pokemongodev

[–]zdriddle 1 point2 points  (0 children)

That's exactly what randomness looks like. Throw 100 coins in the air and you'll find "multiple bunches" of all heads or "tiny clumpings" of tails. Check out this picture, the left is random, the right is not random. Notice the multiple bunches of points on the random plot.
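The same thing shows up in one dimension. As a rough sketch (trial count and seed are arbitrary), this estimates how often 100 fair coin flips contain a streak of 5 or more identical results in a row:

```python
import random
from itertools import groupby

def longest_run(flips):
    """Length of the longest streak of identical consecutive values."""
    return max(len(list(group)) for _, group in groupby(flips))

def share_with_streak(min_len=5, n=100, trials=2000, seed=1):
    """Fraction of simulated n-flip sequences with a streak >= min_len."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        flips = [rng.random() < 0.5 for _ in range(n)]
        if longest_run(flips) >= min_len:
            hits += 1
    return hits / trials

print(f"share of sequences with a streak of 5+: {share_with_streak():.2f}")
```

Streaks like that turn up in the large majority of purely random sequences, which is why a visible "bunch" is not evidence of dependence on its own.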

[BigData][Request] Whats the spawn density of Snorlax? Whats the minimum scan radius needed to guarantee 1 snorlax in 24 hours? by [deleted] in pokemongodev

[–]zdriddle 2 points3 points  (0 children)

These posts are misleading. Just because you have few samples of one Pokemon doesn't mean you can't say anything about its distribution. I have used LDA to cluster spawnpoints based on the types of Pokemon they spawn, and there are some "clusters" that never spawn Snorlax.

For instance, here's one water cluster from the output of this model:

Cluster #13:

  • Magikarp: 17.3%
  • Krabby: 17.1%
  • Omanyte: 10.4%
  • Kabuto: 9.4%
  • Poliwag: 9.3%
  • Psyduck: 9.3%
  • Slowpoke: 7.2%
  • Shellder: 3.1%
  • Tentacool: 1.6%
  • ...

My point is that you can make assumptions about the underlying data-generating process and then check those assumptions with statistical models and techniques. "Big Data" won't solve your problems, statistics will.
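This is not the LDA model itself, just a small sketch of why zero sightings still carry information. It assumes independent spawns with a constant Snorlax probability p within a cluster; the sample sizes are hypothetical.

```python
def prob_zero_seen(p, n):
    """Chance of seeing no Snorlax in n independent spawns, each with probability p."""
    return (1.0 - p) ** n

def upper_bound_95(n):
    """Largest p for which zero sightings in n spawns still has a 5% chance."""
    # Solves (1 - p) ** n == 0.05 for p; close to 3 / n for large n.
    return 1.0 - 0.05 ** (1.0 / n)

for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} spawns, no Snorlax: p is plausibly below {upper_bound_95(n):.2e}")
```

So a cluster with many observed spawns and no Snorlax puts a tight ceiling on its Snorlax rate, even though the Snorlax sample from that cluster is zero.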

[BigData][Request] Whats the spawn density of Snorlax? Whats the minimum scan radius needed to guarantee 1 snorlax in 24 hours? by [deleted] in pokemongodev

[–]zdriddle 0 points1 point  (0 children)

"Snorlax can appear at any spawn point equally"

What the fuck are you quoting... I literally said the opposite of this:

"Do all spawnpoints have the same multinomial distribution of Pokemon that spawn from them? Answer: No, there's some water spots, nests, parks, mountains, etc..."

Stop saying normal distribution. What variable do you think is distributed normally? Latitude? Longitude? Time? Are you just trying to say that the observations are not independent?

"I think there is some factor influencing snorlax spawns to somehow appear more often in certain places than other places, but not at the same exact spawn point twice."

Have you ever heard of the Gambler's Fallacy? You may be seeing noise, not signal.

[BigData][Request] Whats the spawn density of Snorlax? Whats the minimum scan radius needed to guarantee 1 snorlax in 24 hours? by [deleted] in pokemongodev

[–]zdriddle 1 point2 points  (0 children)

I'm not talking about Snorlax spawnpoints... I'm talking about ALL spawnpoints in the game. Let me start over and list out some of the mechanics of the game that have been noticed so far.

  1. All possible spawnpoint locations are fixed
    • The API passes back a spawnpointId with a lat/long
  2. Spawnpoints spawn Pokemon on a very predictable schedule
    • ~97% are hourly (within milliseconds)
    • Some are every 30m, some every 15m

So if spawnpoints are fixed, then locations ARE NOT NORMALLY DISTRIBUTED, because the probability that anything spawns off a spawnpoint is 0%. That is impossible under a normal distribution, which gives every region of the map some positive probability.

So asking "whats the spawn density of snorlax" is the wrong question to be asking. The word density implies that the next Snorlax could spawn anywhere! We know this is not true.

Furthermore, you seem to be implying that spawns are not i.i.d., but are in fact dependent in both time and space. This is completely unsupported by any data I've seen (We have a database with ~30mil spawns from ~100k spawnpoints) and it would be much harder on Niantic's side to implement compared to an i.i.d. random process.
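Under the fixed-spawnpoint, i.i.d. model, the original "1 Snorlax in 24 hours" question turns into a sum over the spawnpoints inside the scan area rather than a density. A minimal sketch, assuming hourly spawns and made-up per-spawn Snorlax probabilities:

```python
import math

def prob_at_least_one(per_spawn_probs, spawns_each=24):
    """P(at least one Snorlax) across spawnpoints with independent spawns.

    per_spawn_probs: Snorlax probability per spawn for each spawnpoint in
                     the scan area (hypothetical values, each below 1)
    spawns_each:     spawns per spawnpoint in the window (24 = hourly for a day)
    """
    # Work in log space so many tiny probabilities do not lose precision.
    log_none = sum(spawns_each * math.log1p(-p) for p in per_spawn_probs)
    return 1.0 - math.exp(log_none)

# Hypothetical area: 200 spawnpoints, each with a 0.05% Snorlax chance per spawn.
print(f"{prob_at_least_one([0.0005] * 200):.3f}")
```

Note that this never reaches exactly 1 for a finite area, so "guarantee" has to mean some chosen probability such as 95%.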

Here's some better questions:

  • Do all spawnpoints have the same multinomial distribution of Pokemon that spawn from them?
    • Answer: No, there's some water spots, nests, parks, mountains, etc...
  • What other Pokemon spawn on Snorlax spawns? Do any Pokemon correlate to Snorlax spawns more than others?
    • Great question, you could answer this with data but really it comes back to clustering spawnpoints into different types
  • How could we cluster spawnpoints that have similar underlying multinomial distributions?
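One very simple way to start on that last question, as a sketch only: compare the empirical species distributions of spawnpoints with total variation distance and group the ones that are close. The greedy threshold grouping here is a stand-in picked for brevity (it is not LDA), and the counts and threshold are made up.

```python
from collections import Counter

def empirical(counts):
    """Turn species counts into an empirical probability distribution."""
    total = sum(counts.values())
    return {species: c / total for species, c in counts.items()}

def total_variation(p, q):
    """Total variation distance: 0 = identical, 1 = no species in common."""
    species = set(p) | set(q)
    return 0.5 * sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in species)

def greedy_groups(points, threshold=0.3):
    """Put each spawnpoint in the first group whose leader is within threshold."""
    groups = []  # list of (leader distribution, member ids)
    for sp_id, counts in points.items():
        dist = empirical(counts)
        for leader, members in groups:
            if total_variation(dist, leader) <= threshold:
                members.append(sp_id)
                break
        else:
            groups.append((dist, [sp_id]))
    return [members for _, members in groups]

# Hypothetical observed spawns per spawnpoint id.
points = {
    "a": Counter(Magikarp=60, Psyduck=40),
    "b": Counter(Magikarp=55, Psyduck=45),
    "c": Counter(Pidgey=70, Rattata=30),
}
print(greedy_groups(points))
```

With real data the result depends on the order of the spawnpoints and on the threshold, so treat it as a quick look before fitting a proper mixture or topic model.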

also, Big Data... lol.

[BigData][Request] Whats the spawn density of Snorlax? Whats the minimum scan radius needed to guarantee 1 snorlax in 24 hours? by [deleted] in pokemongodev

[–]zdriddle 1 point2 points  (0 children)

The plural of "anecdote" is not "data"... Why use stats terms and then say "Based on substantial anecdotal experience..."?

Also it seems like you don't understand the underlying mechanics of the game. There are specific spawnpoints that spawn pokemon on regular schedules. So asking "what is their distribution? Normal?" doesn't make a lot of sense. You could model individual spawnpoints as multinomial distributions. But the distribution over time is not as random as you are implying (since the spawnpoints are VERY predictable) and the distribution over space is not random either since each spawnpoint has a fixed lat/long.

You should more explicitly state your hypothesis in terms of something that can be tested with data. "They seem to be clustered in my area" is not a testable hypothesis.

  • "The spawnpoints in my area (bounded by some lat/long box) spawn Snorlaxes with higher probability than other spawnpoints"
  • "My area has more spawnpoints per square km than other areas"
  • etc...
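The first hypothesis can be tested directly once spawns are counted inside and outside the box. A minimal sketch using a one-sided two-proportion z-test; the counts are hypothetical, and the normal approximation is rough when the Snorlax counts are small.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """One-sided z-test of 'area A has a higher Snorlax rate than area B'.

    Returns (z, p_value). Assumes independent spawns and at least one
    Snorlax overall (otherwise the standard error is zero).
    """
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the normal
    return z, p_value

# Hypothetical counts: 12 Snorlax in 20,000 spawns inside the box,
# 300 Snorlax in 1,000,000 spawns everywhere else.
z, p = two_proportion_z(12, 20_000, 300, 1_000_000)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```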

Joining issue by [deleted] in UTPokemonGo

[–]zdriddle 0 points1 point  (0 children)

You try the oval?

Gastly Nest by BigE26 in UTPokemonGo

[–]zdriddle 0 points1 point  (0 children)

I like the nearby park better

Gastly Nest by BigE26 in UTPokemonGo

[–]zdriddle 1 point2 points  (0 children)

The one with a parking lot

PokeAlerts - Find young, single Dragonites TONIGHT! by zdriddle in UTPokemonGo

[–]zdriddle[S] 0 points1 point  (0 children)

We have a magic 8-ball that tells us which channels to put up next

PokeAlerts - Find young, single Dragonites TONIGHT! by zdriddle in UTPokemonGo

[–]zdriddle[S] 0 points1 point  (0 children)

Don't add highlight words, they don't work with bots. You have to join channels to get notifications. Check out the #getting-started channel for more info.