Setup:
- N numbered balls are in a bag. N is unknown.
- Pick a ball uniformly at random, record its number, replace it, shuffle.
- After M samples, of which we noticed R repeated numbers, how can we estimate the value of N?
Clearly, after we've covered the set, only repeats will appear. However, there is a vanishingly small probability we've just missed one.
Simplification:
Assume we know $N
Possibly related:
Application:
Estimating the number of webcomics when I'm pressing "random," assuming random is uniform over the comics.
Answer
Chance of getting a repeat = (M-R)/N which we are going to call a "failure". Call this probability (M-R)/N = (1-P). Now the number of successful draws we have had (without getting a repeat) is M-R. Call this K. Now we just have a negative binomial distribution with R failures occurring.
The mean of the neg. binomial distribution is the number of successes before the Rth failure. In this case M-R. So M-R = pR/(1-p) with p being defined above.
M−R=(1−(M−R)/N))R/(M−R)/N
=(R−(M−R)R/N)∗(N/(M−R))=NR/(M−R)−R=M−R
=M=NR/(M−R)=M2−RM=NR−−>N=(M2/R)−M
No comments:
Post a Comment