# saush

## How to hire the best engineers – part 2

Posted in Ruby by sausheong on April 5, 2010

In my previous blog post, I came to the conclusion that

If you’re a hiring manager with a 100 candidates, just interview a sample pool of 7 candidates, choose the best and then interview the rest one at a time until you reach one that better than the best in the sample pool. You will have 90% chance of choosing someone in the top 25% of those 100 candidates.

I mentioned that there are 2 worst case scenarios for this hiring strategy. Firstly, if coincidentally we selected the 7 worst candidates out of 100, then the next one we choose will be taken regardless how good or bad it is. Secondly, if the best candidate is already in the sample pool, then we’ll be going through the rest of the population without finding the best. Let’s talk about these scenarios now.

The first is easily resolved. The probabilities are already considered and calculated in our simulation so we don’t need to worry about that any more. The second is a bit tricky. To find out the failure probability, we tweak the simulation a bit again. This time round instead of getting every sample pool size, we focus on the best results i.e. sample pool size of 7.

require 'rubygems'
require 'faster_csv'

population_size = 100
sample_size = 0..population_size-1
iteration_size = 100000
top = (population_size-25)..(population_size-1)
size = 7
population = []
frequencies = []

iteration_size.times do |iteration|
population = (0..population_size-1).to_a.sort_by {rand}
sample = population.slice(0..size-1)
rest_of_population = population[size..population_size-1]
best_sample = sample.sort.last
rest_of_population.each_with_index do |p, i|
if p > best_sample
frequencies << i
break
end
end
end

puts "Success probability : #{frequencies.size.to_f*100.0/iteration_size.to_f}%"



Notice we’re essentially counting the times we succeed over the iterations we made. The result for this is a success probability of 93%!

Let’s look at some basic laws of probability now. Let’s say A is the event that the strategy works (the sample pool does not contain the best candidate), so the probability of A is P(A) and this value is 0.93. Let’s say B is the event that the the candidate we find out if this strategy is within the best quartile(regardless if A succeeds or not), so the probability of B is P(B). We don’t know the value of P(B) and we can’t really find P(B) since A and B are not independent. What we really want is  . What we do know however is P(B|A) which is the probability of B given A happens. This value is 0.9063 (from the previous post),

Using the rule of multiplication:

which is 84.3% probability that the sample pool does not contain the best candidate and we get the top quartile of the total candidate population.

Next, we want to find out, given that we have a high probability of getting the correct candidates in, how many candidates will we have to interview before we find one that is better than the best in the sample pool? Naturally if we have to interview a lot of candidates before finding one, the strategy isn’t practical. Let’s tweak the simulation again, this time to count the successful ones and get a histogram of the number of interviews before we hit jackpot.

What kind of histogram do you think we will get? An initial thought was that it would be a normal distribution, which is not great news for this strategy, because a histogram peaks around the mean, which means the number of candidates we need to interview is around 40+.

require 'rubygems'
require 'faster_csv'

population_size = 100
sample_size = 0..population_size-1
iteration_size = 100000
top = (population_size-25)..(population_size-1)
size = 7
population = []
frequencies = []

iteration_size.times do |iteration|
population = (0..population_size-1).to_a.sort_by {rand}
sample = population.slice(0..size-1)
rest_of_population = population[size..population_size-1]
best_sample = sample.sort.last
rest_of_population.each_with_index do |p, i|
if p > best_sample
frequencies << i
break
end
end
end

FasterCSV.open('optimal_duration.csv', 'w') do |csv|
rest_of_population = population[size..population_size-1]
rest_of_population.size.times do |i|
count_array = frequencies.find_all{|f| f == i}
csv << [i+1, count_array.size.to_f/frequencies.size.to_f]
end
end



The output is a CSV file called optimal_duration.csv. Charting the data as a histogram, surprisingly we get power law graph like this:

This is good news for or strategy! Looking at our dataset, the probability of the first candidate in the rest of the candidate population to be the one is 13.58% while the probability of getting our man (or woman) in the first 10 candidates we interview is a whopping 63.25%!

So have we nailed the question if the strategy is good one? No, there is still one last question unanswered and this is the breaker. Stay tuned for part 3! (For those who are reading this, an exercise — tell me what you think is the breaker question?)