Setup:
- $N$ numbered balls are in a bag. $N$ is unknown.
- Pick a ball uniformly at random, record its number, replace it, shuffle.
- After $M$ samples, of which we noticed $R$ repeated numbers, how can we estimate the value of $N$?
Clearly, after we've covered the set, only repeats will appear. However, there is a vanishingly small probability we've just missed one.
Simplification:
Assume we know $N Estimating the number of webcomics when I'm pressing "random," assuming random is uniform over the comics.Possibly related:
Application:
Answer
Chance of getting a repeat = (M-R)/N which we are going to call a "failure". Call this probability (M-R)/N = (1-P). Now the number of successful draws we have had (without getting a repeat) is M-R. Call this K. Now we just have a negative binomial distribution with R failures occurring.
The mean of the neg. binomial distribution is the number of successes before the Rth failure. In this case M-R. So M-R = pR/(1-p) with p being defined above.
$$
\ M-R = (1 - (M-R)/N))R / (M-R)/N
$$
$$
\ = (R - (M-R)R/N) * (N / (M-R)) = NR/(M-R) - R = M - R
$$
$$
\ = M = NR/(M-R) = M^2 - RM = NR --> N = (M^2/R) - M
$$
No comments:
Post a Comment