Why We Ended Up Using HackerRank: A Tale of Vaping Job Applicants and Failed Turing Tests with ChatGPT

Disclaimer: All the information presented in this article is based on my personal opinions and experience and does not reflect those of my past, present, or future employers. This is also not a comment on HackerRank as a platform - we've had a great experience with them, and I'd recommend them to anyone looking for a platform to administer coding tests.

Oh, the joys of modern recruitment - especially at a small tech startup with finite resources and time! Do you remember the good ol' days when a fair assessment could separate the best candidates from the pack? Neither do I. Recruiting in tech has become a minefield of buzzwords, inflated resumes, and candidates who think that vaping during an interview is a "power move". In my last two years as the Head of Engineering at Behaviour Lab, I've seen it all. We've grown the team from 3 engineers to 10 and have interviewed hundreds of candidates. We've tried everything from the traditional resume screening to the dreaded whiteboard coding interviews and "take-home" assignments. This is the story of how we ended up using a platform like HackerRank and why we're sticking with it (for now).

What's on our wishlist when we hire engineers?

At the heart of any recruitment process should be a clear understanding of what values and skills you're looking for. I appreciate that these will vary greatly from company to company, between small startups and huge corporations, and between different roles. Within our engineering team, we generally look for the following:

Problem-solving: The ability to break down a problem into smaller chunks and solve them one by one.
Learning Potential: The ability to learn new technologies and skills quickly. We're not looking for someone who knows everything but for someone who has the tenacity to figure almost everything out with the right resources. After all, we're a startup, and we're constantly evolving, and I want the team to do the same.
Communication skills: Clear, concise, and effective communication is important everywhere, but at a startup, there's a lot less margin for error. So we need to ensure we're not coding in isolation and are solving the right problems.
Teamwork: Every team is made up of a diverse group of people - those with niche expertise and skills, and those who amplify the capabilities of others through collaboration. Each one is useless without the other.
Attention to detail: Engineering is inherently detail-oriented, and even minor oversights can lead to significant repercussions if the team doesn't take those details seriously.
Ownership: With countless competing priorities and a constantly evolving landscape, no one has the time to follow up on every single task. We need people who take ownership of a problem and see it through to the end when they commit.
Depth of Knowledge: Especially for senior roles, we're not just looking for someone who can write code (or prompt engineer it) but for someone who understands the underlying problems and the intricacies of the tools they're using to solve them.

The BIG problems with the recruitment process

Recruiting is never a straightforward journey, and there are always challenges along the way. But there are a few problems that I've seen come up time and time again that I think are worth highlighting:

There is no shortage of interesting problems to work on. I don't know who needs to hear this, but this can't be the only thing you're selling as an employer. With more opportunities out there for candidates than ever before, the best candidates have their pick of the litter. They're looking for more than just interesting problems to work on. They want a company that aligns with their values and a team with which they can gel. The recruitment process offers the first impression that candidates get of your company, and it's crucial to make it count.
You can't capture everything in a single assessment. The harsh reality that both recruiters and candidates might not want to accept is that one test can't capture everything. The recruitment process is a funnel, and with hundreds of applicants for every role, it's rare to compare them all on one test and find a single standout.
The most useful tests are often orthogonal. If we accept the premise that the recruitment process is a funnel, then we must also accept that there's more than one test. Repeatedly testing candidates on the same areas is redundant. However, the challenge is that the most meaningful tests can sometimes evaluate completely different skills. For instance, a coding test can show coding proficiency but might not reflect a candidate's teamwork or communication abilities. This means that sometimes candidates who excel in one area but lack in another can progress far into the recruitment process before their shortcomings become apparent.
The process is time-consuming and expensive. Especially in tech, the competition for the best candidates is fierce, and the data you receive can often be noisy. Non-technical recruiters may struggle to assess a candidate's quality and might rely on proxies like years of experience or familiarity with specific technologies. However, this approach can be gamed and doesn't necessarily indicate a candidate's ability.
The process is stressful for candidates. Recruitment is stressful for everyone involved, especially for candidates. They're often juggling multiple roles and interviews simultaneously. Many even have to take time off from their current jobs for interviews. This places a significant burden on them, and it's essential to make the process as smooth as possible.

ChatGPT, Turing Tests and Vape Nation

Ahh yes, ChatGPT and LLMs - the real reason why you might have accidentally slipped into this rant about why recruitment has become so very interesting lately. These tools have transformed our workflows and can produce content that sounds like Shakespeare on Red Bull and cocaine. But there's a dark side to them. They're so good at what they do that they can make anyone look like a genius, just enough to mask a candidate's true skills at first glance. The first impression isn't all that matters when you're bringing someone onto a team - you're trying to build a long-lasting professional relationship with every engineer you hire - but that first glance is now more effective than the face masks in the Mission: Impossible series.

Every step of the interview process has become a Turing Test. We've seen candidates copy-paste solutions from ChatGPT into the editor for the take-home assessment (which we can see, since it records how you got to your answer...). We've seen candidates start sharing their screen and accidentally show us that the last 10 minutes of what they said were read off a ChatGPT chat window word for word (and then they try to play it off like it was a joke...). We've even seen one candidate, an experienced data engineer with close to 10 years in the industry working with NoSQL and Postgres, artificially hallucinate, with ChatGPT's help, that "NoSQL databases were the best relational database".

If LLMs were truly as sufficient as some candidates treat them in the interview process, we wouldn't need to hire any new engineers at all. The industry still needs software engineers who can solve real problems - just with a more powerful toolset. The way that some engineers are becoming entirely dependent on these tools is as concerning as someone using Heelys for so long that they forgot how to walk (I can't believe these were ever a thing... please tell me someone remembers them??).

I've interviewed candidates who were self-proclaimed Python experts and had worked with the language for 10 years, but they couldn't check if a number was even or odd without an LLM. Others couldn't tell a class from a function and didn't know how to use a print statement. These are not exaggerations! We've had another candidate get so angry at the suggestion that we were asking them to solve a question without ChatGPT that they started shouting during the interview! Even if any of these candidates were able to get through the early stages of the interview process (and one of them did), they were immediately caught out in the system design interview and we had to end the interview early. As if those stories weren't wild enough, enter my personal favourite: the vaping interviewee. Nothing quite screams "I'm the engineer you want" like a cloud of watermelon-scented smoke rising past your webcam. Okay fine, that one wasn't ChatGPT's fault, but it's still hard not to mention...
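For a sense of scale, the even-or-odd check mentioned above fits in a single line of Python. This is only an illustrative sketch of that kind of task: the function name is_even is made up for this example and is not taken from any real assessment.

```python
def is_even(n: int) -> bool:
    """Return True when n is divisible by 2 (hypothetical helper for illustration)."""
    # The remainder after dividing by 2 is 0 for even integers,
    # including zero and negative numbers.
    return n % 2 == 0


if __name__ == "__main__":
    for value in (4, 7, 0, -2):
        print(value, "is even" if is_even(value) else "is odd")
```

No libraries, no tricks, just the modulo operator, which is roughly the level of fundamentals these stories are about.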

It's still not right though...

I have mixed feelings about coding tests. They're not perfect, but major tech companies use them for a reason. I remember dreading them when I was at university and often feeling that they were a poor reflection of my capabilities. Everyone knows they're flawed but they don't get used just because companies are lazy or want to see you squirm.

They test relatively well for computer science fundamentals. They're (relatively) respectful of candidates' time - candidates can take a test on HackerRank whenever it suits them, which makes it easier to manage around a work schedule and other interviews. They're also time-limited, which means candidates cannot spend an infinite amount of time on the test and are forced to make decisions about what to prioritize and what to leave out. Back when we were using an open-ended project on GitHub, we asked candidates to respect the time limit, but there was no way to enforce it, and many spent far longer on these tests than they should have.

To make sure the tests are fair (you'd be surprised how few companies do this...), we've also made sure that every engineer in our team has taken the test (even if they joined long before it was used) just to make sure that we're setting the right expectations and not giving tests that are impossible to finish.

The recruitment process we built

After two years of recruiting candidates, hundreds of interviews and tens of external recruiters, we've settled on a process that works for us. It's far from perfect, but we're always trying to learn how to make it better. Here's how it works:

First Chat (25-35 mins): A first conversation with a mix of technical and non-technical questions. This is a chance for us to get to know the candidate and for them to get to know us. We're looking for candidates who are a good fit for the team and the company, who communicate well, and who can explain technical concepts clearly and concisely. We're not looking for candidates who know everything, but for candidates who can learn quickly and have the tenacity to figure things out.
Take-Home Assessment (80-140 mins): Although this is controversial, of the hundreds of candidates we've interviewed, only a very small percentage were resistant to take-home assessments (fewer than 20 out of ~1000). Our second step is a take-home assessment specifically designed for each role we're hiring for. This used to be an open-ended question on GitHub, but now it's administered through HackerRank. We don't use their pre-made questions and rely instead on questions we've designed specifically to assess what we know is important in the roles we hire for. We're looking for candidates who can solve problems in a structured way and communicate their ideas clearly, and it's important to us that candidates can do a test in a proctored environment like this (surprise surprise... we can see when you copy-paste in the entire solution at once, and yes, we did warn you). We're not interested in finding someone who gets every single question right or finds the perfect solution to every problem. We review the tests holistically and keep an open mind, whereas most companies leave things purely numeric at this stage.
System Design Interview (90-120 mins): The system design interview is a chance for us to get a feel for how the candidate thinks about a complex problem that the team already worked on internally a few years ago. We're looking for candidates who can break down a problem into smaller chunks and solve them one by one, and who listen to the requirements and try to understand the problem before they jump in. This interview is an opportunity for a candidate to demonstrate their skill, and while we might ask them why they made certain decisions, the problem is intentionally broad enough that they have an opportunity to show us what they know and put their best foot forward.
Culture Fit Interview (30-45 mins): This interview is an informal chat with the CEO and is a final sanity check for both sides to make sure that we're a good fit for each other. Here we're looking to see how comfortable people are with ambiguity and how they deal with it.

Conclusion

I'd love to give every CV the love and attention it deserves - but between that, supporting the team we have, growing our products and serving our clients, you can guess which wins every time. As a small team without dedicated in-house recruiters to filter candidates and manage the process, we share these responsibilities across the team, and resources are finite.

So why HackerRank? Because amidst the chaos of ChatGPT impersonators and vaping interviewees, we needed some semblance of order. We needed a way to sift alluvial placer gold from sand without taking too much of the team's time while also respecting the time and love many put into their applications. Is it perfect? Far from it. But it's a tool, and in the wild west of modern tech recruitment, I'll take all the help I can get.