Garena, where I work now, is in expansion mode, and I have been hiring engineers, sysadmins and so on to feed the development frenzy of platform revamps and product roadmaps. A problem I face when hiring engineers is that we're **not** the only company doing so. This is especially true now that many companies have issued their annual bonuses (or lack thereof), and the ranks of the dissatisfied have joined the swelling exodus from many tech companies. In other words, mass musical chairs between tech companies and engineers.

Needless to say, this makes hiring a challenge. The good news is that there are plenty of candidates. The bad news is that it is hard to secure a correctly skilled engineer with the right mindset for a growing startup. At the same time, identification and confirmation need to be swift, because if you are slow even with a loosely-fitting candidate, you can lose him/her within a day or two. This made me wonder — what is the best way to go through a large list of candidates and successfully pick the best engineer, or at least someone in the top percentile of the list?

In Why Flip a Coin: The Art and Science of Good Decisions, H.W. Lewis wrote about a similar (though stricter) problem involving dating. Instead of choosing candidates, the book talks about choosing a wife, and instead of conducting interviews, dating. The difference is that in the book you can only date one person at a time, while in my situation I can obviously interview more than one candidate. Nonetheless the problems are pretty much the same, since if I interview too many candidates and take too long to decide, they will be snapped up by other companies. Not to mention that I will probably foam at the mouth and die of interview overdose before that.

In the book, Lewis suggested this strategy — say we are choosing from a pool of 20 candidates. Instead of interviewing each and every one of them, we randomly choose and interview 4 candidates and note the best of that sample pool of 4. Armed with the best candidate from the sample pool as a benchmark, we go through the rest of the candidates one by one until we hit one who is better, then hire that candidate.
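Concretely, the strategy can be sketched as a small Ruby method. This is a sketch of my own, not from the book; `scores` stands in for interview evaluations, with higher being better:

```ruby
# Lewis's strategy: interview the first `sample_size` candidates purely
# as a benchmark, then hire the first later candidate who beats the
# best score seen in the sample. Returns nil if nobody later does.
def hire(scores, sample_size)
  benchmark = scores.first(sample_size).max || -Float::INFINITY
  scores.drop(sample_size).find { |s| s > benchmark }
end

hire([3, 7, 2, 9, 5, 8], 2)  # benchmark is 7; the first later score above 7 is 9
```

Note the failure mode baked into the return value: if the best candidate happens to fall inside the sample pool, no later candidate ever beats the benchmark and the method returns `nil` — exactly the second worst-case scenario discussed below.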

As you might guess, this strategy is probabilistic and doesn't guarantee the best candidate. In fact, there are two worst-case scenarios. First, if we happen to choose the worst 4 candidates of the lot as the sample pool, and the first candidate we interview outside the sample pool is the 5th worst, then we end up hiring the 5th worst candidate. Not good.

Conversely, if the best candidate is in the sample pool, then we run the risk of doing all 20 interviews and still losing the best candidate because it took too long. Bad again. So is this a good strategy? And what combination of population pool (total number of candidates) and sample pool makes the most of it? Let's be good engineers and do another Monte Carlo simulation to find out.

Let's start with a population pool of 20 candidates, then iterate through sample pool sizes of 0 to 19. For each sample pool size, we find the probability that the candidate we choose is the best candidate in the population.

Actually we already know the probability when the sample pool is 0 or 19. When the sample pool is 0, we choose the first candidate we interview (since there is no one to compare against!), so the probability is 1/20, i.e. 5%. Similarly, with a sample pool of 19 we are forced to choose the last candidate, and that probability is also 1/20, i.e. 5%. Here's the Ruby code to simulate this. We run it through 100,000 iterations to make the probability as accurate as possible, then save the results into a csv file called *optimal.csv*.

```ruby
require 'rubygems'
require 'faster_csv'

population_size = 20
sample_size = 0..population_size-1
iteration_size = 100000

FasterCSV.open('optimal.csv', 'w') do |csv|
  sample_size.each do |size|
    is_best_choice_count = 0
    iteration_size.times do
      # create the population and randomize it
      population = (0..population_size-1).to_a.sort_by { rand }
      # get the sample pool (slice(0, size) so that size 0 gives an empty pool)
      sample = population.slice(0, size)
      rest_of_population = population[size..population_size-1]
      # the best of the sample pool (-1 when the pool is empty, so the
      # first remaining candidate always beats it)
      best_sample = sample.sort.last || -1
      # find the candidate chosen by this strategy
      best_next = rest_of_population.find { |i| i > best_sample }
      best_population = population.sort.last
      # count how many times the chosen candidate is the best
      is_best_choice_count += 1 if best_next == best_population
    end
    best_probability = is_best_choice_count.to_f / iteration_size.to_f
    csv << [size, best_probability]
  end
end
```

The code is quite self-explanatory (especially with all the in-code comments) so I won't go into details. The results, after opening *optimal.csv* in Excel and charting it, are in the line chart below. As you can see, if you choose 4 candidates as the sample pool, you have roughly a 1-in-3 chance of choosing the best candidate. The best odds come from a sample pool of 7 candidates, which gives around a 38.5% probability of choosing the best candidate. Doesn't look good.
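As a sanity check on the simulation (my own addition, not part of the original analysis): the best-candidate case is the classic secretary problem, which has a closed-form answer we can compute without any Monte Carlo runs:

```ruby
# Exact probability that the "sample k, then take the first candidate
# better than the sample's best" strategy picks the overall best of n.
# Classic secretary-problem result:
#   P(k) = (k/n) * sum over i = k+1..n of 1/(i-1)   for k >= 1
# and P(0) = 1/n (with no sample you simply take the first candidate).
def best_candidate_probability(n, k)
  return 1.0 / n if k.zero?
  (k.to_f / n) * ((k + 1)..n).sum { |i| 1.0 / (i - 1) }
end

(0..19).each do |k|
  printf("sample %2d -> %.4f\n", k, best_candidate_probability(20, k))
end
```

For n = 20 the optimum lands at k = 7 with P ≈ 0.384, matching the ~38.5% from the simulation; in general the optimal sample pool is roughly n/e of the population, which is where the 1/3-ish ceiling comes from.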

But to be honest, I don't really need the candidate to be the 'best' (such evaluations are subjective anyway). Let's say I want a candidate in the top quartile (top 25%). What are my odds then? Here's the revised code for that simulation.

```ruby
require 'rubygems'
require 'faster_csv'

population_size = 20
sample_size = 0..population_size-1
iteration_size = 100000
# the top quartile: the 5 highest ranks out of the 20
top = (population_size-5)..(population_size-1)

FasterCSV.open('optimal.csv', 'w') do |csv|
  sample_size.each do |size|
    is_best_choice_count = 0
    is_top_choice_count = 0
    iteration_size.times do
      population = (0..population_size-1).to_a.sort_by { rand }
      sample = population.slice(0, size)
      rest_of_population = population[size..population_size-1]
      best_sample = sample.sort.last || -1
      best_next = rest_of_population.find { |i| i > best_sample }
      best_population = population.sort.last
      top_population = population.sort[top]
      is_best_choice_count += 1 if best_next == best_population
      is_top_choice_count += 1 if top_population.include? best_next
    end
    best_probability = is_best_choice_count.to_f / iteration_size.to_f
    top_probability = is_top_choice_count.to_f / iteration_size.to_f
    csv << [size, best_probability, top_probability]
  end
end
```

The *optimal.csv* file now has a new column with the probability of landing a top quartile (top 5) candidate. The new line chart is shown below, with the results of the previous simulation included for comparison.

Things look brighter now. The optimal sample pool size is 4 (though for practical purposes 3 is good enough, since the difference between them is small), and the probability of choosing a top quartile candidate shoots up to 72.7%. Pretty good! Now, this is with 20 candidates. What about a larger pool? How will this strategy hold up with, say, a population of 100 candidates?

As you can see, this strategy doesn't work for getting the best out of a large pool (the sample pool gets too large and the probability of success stays too low), and it does worse than in a smaller population pool. However, if we want the top quartile or so (i.e. being less picky), we only need a sample pool of 7 candidates and we get a probability of 90.63% of getting what we want. These are amazing odds! It means that if you're a hiring manager with 100 candidates, you don't need to kill yourself trying to interview everyone. Just interview a sample pool of 7 candidates, note the best, and then interview the rest one at a time until you reach one who is better than the best in the sample pool. You will have a 90% chance of choosing someone in the top 25% of those 100 candidates (which is probably what you want anyway)!
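If you want to rerun these numbers yourself, the same simulation condenses into one parameterized method. This is a modern-Ruby sketch of my own (no FasterCSV, and fewer iterations than the original 100,000 to keep it quick), not the listing that produced the charts:

```ruby
# Probability that the sampling strategy lands a candidate in the top
# `top_fraction` of a population of `population_size`, estimated by
# Monte Carlo. Candidates are represented by their ranks 0..n-1.
def top_fraction_probability(population_size, sample_size, top_fraction,
                             iterations = 20_000)
  # ranks at or above the cutoff count as "good enough"
  cutoff = population_size - (population_size * top_fraction).ceil
  hits = 0
  iterations.times do
    population = (0...population_size).to_a.shuffle
    benchmark = population.first(sample_size).max || -1
    pick = population.drop(sample_size).find { |r| r > benchmark }
    hits += 1 if pick && pick >= cutoff
  end
  hits.to_f / iterations
end

# top_fraction_probability(100, 7, 0.25) should land near the 90% figure,
# and top_fraction_probability(20, 4, 0.25) near the 72.7% figure above
# (both within Monte Carlo noise).
```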

There is a flip side to the problem, which I think has been tackled with some success by game theorists. That is — similarly to the dating issue, getting the best candidate depends quite a bit on what you can offer. Also, all the dating, interviewing and looks may be deceiving :-), hence the happy engagements survive over time only when both parties are happy with what they get out of each other.

Absolutely agree. However, I'm only considering how to discover the best (or top quartile) engineers from a large pool of candidates. Attracting the right candidate to join is another matter altogether, as is retaining him.

Mr. Chang. I take my hat off to you. I’ve never seen anyone use math/science for hiring. The search firms should take a read at your blog. :-)

Interesting exercise, but to be the devil’s advocate, maybe your model is a bit too one-dimensional:

1) Within those 100 candidates the quality may vary, but so may salary (or other) expectations. Your top 1-out-of-7 guy/gal may not be willing to join the company…

2) It also assumes that there's an easy way of ranking someone's capability, while in reality people have various strengths and weaknesses that are pretty difficult to quantify.

3) Also, you'd need to weigh in short-term usefulness vs. long-term potential.

I’d take a completely different approach to hiring in case you have too many applicants:

Give them an automatic open-book test as a pre-screening for a F2F interview. You’ll probably cut down the number of applicants by at least two thirds or even a lot more if you create an extensive test, and you’ll probably see a few not even willing to go through it. And for the ones who go through the whole lot and pass – ask them to bring along some of the code they wrote…

Andrew, it's one-dimensional because I was only considering that one dimension :) My base assumption is that I have an objective means of evaluating a candidate (up to a certain degree of accuracy, of course) and can condense that evaluation into a single number. I also did not consider that more than one candidate can have the same value of that metric. The model is pretty simplistic, but it's a starting point to show that to find the best candidates you don't need to interview each and every one — there is a more probabilistic and pragmatic approach to doing interviews.

And of course, it’s the geeky thing to do.

Joining discussion again…

Well, I'm also proud to be a geek; however, applying statistics where there is potential to influence someone's life does not seem to be a very clean solution. In statistical terms you could improve on your otherwise elegant methodology by adding multivariate analysis, t-tests, etc., but when hiring people, I personally will read every single resume that is sent my way. In this case, I believe in a simple rule: if I ask for CVs, I read them all. Then I make a short list. There is a chance that I will lose some very good candidates, but like in an ethical fishing practice, the fish has a choice of its own: take a bite and end up on my plate, or swim away free. What I'm trying to say is that we need to be very careful when using statistics in human decision-making processes — if we fear dehumanization, of course.

Hi Jacek,

Viewing CVs in the tens is ok, in the hundreds will be tough, and in the thousands is probably impossible. Similarly, doing interviews is an order of magnitude more effortful than reading CVs, and the problem is that if you have to interview 100 people at a high throughput of 5 candidates per day, you will only finish interviewing in about a month (during which you will not have any capacity to do anything else). By that time the best candidates will have joined other companies that offered them a good job earlier, unless your offer is so compelling that they can wait for you.

What I’m proposing here is not to remove the human elements of hiring people, but to suggest a more pragmatic method in organizing (and not conducting) the interviews. The conduct and evaluation of individuals still needs to be done. In a way, this strategy in organizing interviews has a more ‘human touch’ to it as we will have more time to do the interviews properly instead of rushing to clear the numbers.

My next post will go into some of the problems we face in the strategy and also comparing it with a standard strategy.


