probability - Estimate the number of elements by random sampling with replacement

Saturday, May 13, 2017

probability - Estimate the number of elements by random sampling with replacement

Setup:

$N$ numbered balls are in a bag. $N$ is unknown.

Pick a ball uniformly at random, record its number, replace it, shuffle.

After $M$ samples, of which we noticed $R$ repeated numbers, how can we estimate the value of $N$ ?

Clearly, after we've covered the set, only repeats will appear. However, there is a vanishingly small probability we've just missed one.

Simplification:

Assume we know $N

Possibly related:

Application:

Estimating the number of webcomics when I'm pressing "random," assuming random is uniform over the comics.

Answer

Chance of getting a repeat = (M-R)/N which we are going to call a "failure". Call this probability (M-R)/N = (1-P). Now the number of successful draws we have had (without getting a repeat) is M-R. Call this K. Now we just have a negative binomial distribution with R failures occurring.

The mean of the neg. binomial distribution is the number of successes before the Rth failure. In this case M-R. So M-R = pR/(1-p) with p being defined above.

$\ M-R = (1 - (M-R)/N))R / (M-R)/N$

$\ = (R - (M-R)R/N) * (N / (M-R)) = NR/(M-R) - R = M - R$

$\ = M = NR/(M-R) = M^2 - RM = NR --> N = (M^2/R) - M$

Blog

Saturday, May 13, 2017