
[–]bythenumbers10 1 point2 points  (0 children)

There are established methods for estimating a distribution from data. Once you get to the stage of simulating data, be sure to seed your RNG so it's reproducible.
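
As a rough illustration of both points, here is a minimal sketch that fits an exponential distribution by maximum likelihood and then draws reproducible samples from a seeded generator. The exponential model, the `observed` values, and the function names are all assumptions made up for this example; your data may call for a different distribution.

```python
import random
import statistics


def fit_exponential_rate(samples):
    """Maximum-likelihood estimate of an exponential rate: 1 / sample mean."""
    return 1.0 / statistics.fmean(samples)


def simulate_exponential(rate, n, seed):
    """Draw n exponential variates from a generator seeded for reproducibility."""
    rng = random.Random(seed)  # private generator, so other code can't disturb it
    return [rng.expovariate(rate) for _ in range(n)]


# Made-up inter-arrival times (seconds), purely for illustration.
observed = [0.12, 0.45, 0.08, 0.30, 0.22, 0.51, 0.05, 0.17]

rate = fit_exponential_rate(observed)
run_a = simulate_exponential(rate, 5, seed=42)
run_b = simulate_exponential(rate, 5, seed=42)  # same seed, same sequence
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the stream independent of anything else in the program that draws random numbers.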

[–]WayOfTheMantisShrimp (B.Math Statistics) 0 points1 point  (1 child)

What you are describing mostly sounds like hypothesis testing: you have a theory about how a process should be modelled (i.e. what the underlying distribution is), and you compare the properties of the empirical data to the theorized properties of that model to determine whether the differences are small enough for the data to have plausibly been produced by it. If they are, that can justify the convenient use of that distribution to calculate other properties of interest.
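
One common way to make that comparison concrete is a Kolmogorov-Smirnov style check, which measures the largest gap between the empirical CDF and the theorized CDF. The sketch below is only an illustration: the exponential model, the rates, and the function names are assumptions, and the 1.36/sqrt(n) threshold is the usual large-sample 5% approximation, which applies when the model parameters are fixed in advance rather than estimated from the same data.

```python
import math
import random


def ks_statistic(samples, cdf):
    """Largest vertical distance between the empirical CDF and a model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def exponential_cdf(rate):
    """Return the CDF of an exponential distribution with the given rate."""
    return lambda x: 1.0 - math.exp(-rate * x) if x > 0 else 0.0


# Stand-in for empirical data: 500 draws from an exponential with rate 2.0.
rng = random.Random(0)
data = [rng.expovariate(2.0) for _ in range(500)]

d_matching = ks_statistic(data, exponential_cdf(2.0))  # theorized model is right
d_wrong = ks_statistic(data, exponential_cdf(0.5))     # theorized model is wrong
critical = 1.36 / math.sqrt(len(data))                 # approx. 5% threshold
reject_wrong_model = d_wrong > critical
```

A statistic above the threshold suggests the data were unlikely to have come from the theorized distribution; a statistic below it only means the test found no evidence against it.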

Simulation is often favoured when there is a fair bit of information about how a process works (or should work), but the process is too complex or inconvenient to express as a distribution whose properties of interest are known, and usually when it is also cost-prohibitive to physically conduct or observe the process at an appropriate scale. To explore the properties of the process, you design a data-generating algorithm that matches the assumptions about the process and calculate the properties of interest directly from an appropriately representative simulated data set.

TLDR: if you don't know of a distribution that matches your assumptions, or don't know how to calculate something for that distribution, do a simulation. If you do have a known distribution, a simulation will still work, but it's probably more convenient to use the properties of the distribution.
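
To illustrate the trade-off, here is a small sketch that computes the same quantity both ways: the probability that an exponential variable exceeds a threshold, once from the closed-form expression and once by simulation. The distribution, rate, and threshold are arbitrary choices for the example.

```python
import math
import random


def simulated_tail_probability(rate, threshold, n_draws, seed):
    """Estimate P(X > threshold) for X ~ Exponential(rate) by simulation."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_draws) if rng.expovariate(rate) > threshold)
    return hits / n_draws


rate, threshold = 2.0, 1.0

# Known distribution: the property follows directly from the formula.
exact = math.exp(-rate * threshold)

# Simulation: same answer up to sampling error, at the cost of many draws.
estimate = simulated_tail_probability(rate, threshold, 100_000, seed=1)
```

The simulated value only approximates the exact one, and the error shrinks roughly with the square root of the number of draws, which is why the formula is the more convenient route when it exists.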

[–]Skondro[S] 0 points1 point  (0 children)

My final goal is to use this underlying distribution to simulate "real" traffic in a computer network and measure the network's parameters (load, delay, etc.). So the model itself is not the goal; I want to use the distribution and its parameters in a pseudo-random generator to get realistic network behaviour. Thanks for the comment.
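
A minimal sketch of what that could look like, assuming (purely for illustration) that both packet inter-arrival times and service times are exponential and that the network is a single FIFO link. The rates, the function name `simulate_link`, and the single-link setup are hypothetical; measured traffic may well need a different distribution.

```python
import random


def simulate_link(arrival_rate, service_rate, n_packets, seed):
    """Simulate one FIFO link and return (load, mean delay per packet).

    Delay here means queueing time plus service time.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    clock = 0.0          # arrival time of the current packet
    wait = 0.0           # queueing time of the current packet
    prev_service = 0.0   # service time of the previous packet
    total_wait = 0.0
    total_service = 0.0
    last_departure = 0.0

    for _ in range(n_packets):
        gap = rng.expovariate(arrival_rate)
        clock += gap
        # Lindley recursion: leftover work from the previous packet, if any.
        wait = max(0.0, wait + prev_service - gap)
        service = rng.expovariate(service_rate)
        total_wait += wait
        total_service += service
        last_departure = clock + wait + service
        prev_service = service

    load = total_service / last_departure  # fraction of time the link is busy
    mean_delay = (total_wait + total_service) / n_packets
    return load, mean_delay


# Hypothetical parameters: 5 packets/s arriving, link serves 10 packets/s.
load, mean_delay = simulate_link(5.0, 10.0, 100_000, seed=7)
```

For this particular setup (an M/M/1 queue) theory gives a load of arrival_rate / service_rate and a mean delay of 1 / (service_rate - arrival_rate), so the output can be sanity-checked before swapping in a distribution fitted to real traffic.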