The following algorithm is given in wiki for solving reservoir sampling with O(n^2).
Used for solving ques like this,
You must pick a subset S’ of k numbers from S such that the probability of each element of S occurring in S’ is equal (i.e., each is selected with probability k/n). You may make only one pass over the numbers. What if n is unknown?
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i = 1 to k
R[i] := S[i]
// replace elements with gradually decreasing probability
for i = k+1 to n
j := random(1, i) // important: inclusive range
if j <= k
R[j] := S[i]
The explaination given below is as follow:
The algorithm creates a "reservoir" array of size k and populates it with the first k items of S. It then iterates through the remaining elements of S until S is exhausted. At the ith element of S, the algorithm generates a random number j between 1 and i. If j is less than or equal to k, the jth element of the reservoir array is replaced with the ith element of S. In effect, for all i, the ith element of S is chosen to be included in the reservoir with probability k/i. Similarly, at each iteration the jth element of the reservoir array is chosen to be replaced with probability 1/k * k/i, which simplifies to 1/i. It can be shown that when the algorithm has finished executing, each item in S has equal probability (i.e. k/length(S)) of being chosen for the reservoir.
-
The part i understand is that i have 2 arrays,
- array s[] of size n(where n can be unknown or very large). array r[] of size k, since k elements have to be chosen(also k < n).
Now, first we fill the first 'k' elements from array s[1..n] into array r[1..k]. This obviously fills array r[1..k] completely. then, starting from index k+1 to n then starting from i=k+1 to n, we choose a random number between 1 to i, and if j
Please can someone explain the probability part here.This part specially...
Similarly, at each iteration the jth element of the reservoir array is chosen to be replaced with probability 1/k * k/i, which simplifies to 1/i. It can be shown that when the algorithm has finished executing, each item in S has equal probability (i.e. k/length(S)) of being chosen for the reservoir. To see this, consider the following proof by induction.
via Chebli Mohamed
Aucun commentaire:
Enregistrer un commentaire