Saturday, May 13, 2017

probability - Estimate the number of elements by random sampling with replacement



Setup:





  1. $N$ numbered balls are in a bag. $N$ is unknown.

  2. Pick a ball uniformly at random, record its number, replace it, shuffle.

  3. After $M$ samples, of which we noticed $R$ repeated numbers, how can we estimate the value of $N$?



Clearly, after we've covered the set, only repeats will appear. However, there is a vanishingly small probability we've just missed one.



Simplification:



Assume we know $N


Possibly related:





Application:



Estimating the number of webcomics when I'm pressing "random," assuming random is uniform over the comics.


Answer



Chance of getting a repeat = (M-R)/N which we are going to call a "failure". Call this probability (M-R)/N = (1-P). Now the number of successful draws we have had (without getting a repeat) is M-R. Call this K. Now we just have a negative binomial distribution with R failures occurring.




The mean of the neg. binomial distribution is the number of successes before the Rth failure. In this case M-R. So M-R = pR/(1-p) with p being defined above.
$$
\ M-R = (1 - (M-R)/N))R / (M-R)/N
$$
$$
\ = (R - (M-R)R/N) * (N / (M-R)) = NR/(M-R) - R = M - R
$$
$$
\ = M = NR/(M-R) = M^2 - RM = NR --> N = (M^2/R) - M

$$


No comments:

Post a Comment

analysis - Injection, making bijection

I have injection $f \colon A \rightarrow B$ and I want to get bijection. Can I just resting codomain to $f(A)$? I know that every function i...