The Leetcode Interview Is Over. Most Companies Haven't Noticed.
Algorithm puzzles test exactly the work AI now does for free. Here's what a technical interview should measure instead, and what mine looks like.
The classic technical interview asks a candidate to invert a binary tree on a whiteboard, from memory, under pressure, while a stranger watches. We ran this ritual for twenty years. It was never a great test, and everyone quietly knew it, but it had one honest defense: writing correct algorithmic code under constraints was at least adjacent to the job.
That defense is gone. The exact skill leetcode measures, producing known solutions to well-defined puzzles quickly, is now the single most automated part of software engineering. A model does it faster than your best candidate on their best day. When your interview tests the one thing the machine fully absorbed, passing it tells you almost nothing about whether someone can do the job that's left.
How the puzzle interview actually broke
It broke twice, in opposite directions, and the combination is fatal.
First, it broke as a measurement. The job changed. An engineer's day is now spent directing AI tools, reviewing generated code, and making judgment calls about architecture, trade-offs, and what not to build. Recalling the optimal solution to a graph problem correlates with that work about as well as spelling bees correlate with writing novels. You're grading a skill the role no longer exercises.
Second, it broke as a filter. Remote interviews plus capable models mean the puzzle can be solved by something other than the person on the call, and detection is a losing arms race I have no interest in fighting. But notice what the cheating panic obscures: even a completely honest leetcode pass has stopped predicting performance. The dishonest pass is just a louder version of the same emptiness.
There's a quieter cost, too. The engineers I most want to hire, the ones with a decade of shipped systems and taste to show for it, increasingly refuse to grind puzzle prep for the privilege of interviewing. The process filters them out before it ever sees them, and keeps the people with the most spare evenings. That is precisely backwards.
What the interview needs to measure now
Strip the job to what humans still uniquely contribute and the list is short: judgment about what to build, the ability to evaluate work they didn't type, and clear reasoning when the problem doesn't match any pattern in the training data. So test those. Directly.
- Can they review? Reviewing code you didn't write used to be maybe a tenth of the job. It's now closer to half. Almost no interview process tests it at all.
- Can they direct? Given powerful tools, do they decompose the problem well, give the tools the right context, and notice when the output is confidently wrong?
- Can they decide? Trade-offs, sequencing, what to leave out. The expensive mistakes in software were never syntax errors.
- Can they go deep? When something breaks two layers below the abstraction, do they have the foundations to descend, or do they only operate at the level the tools hand them?
The work sample I actually run
When I vet engineers for a client, the technical exercise is boring by design. No puzzles, no tricks. I take a small, realistic problem, the kind of thing that would be a two-day ticket, and I ask the candidate to make real progress on it in an hour, with AI tools not just allowed but expected. Then I watch how they work, because the process is the product.
The strong candidates all do a version of the same dance. They interrogate the problem before touching the tools. They give the model context in deliberate, structured pieces. When it produces something plausible, they slow down instead of speeding up, because plausible is exactly when the traps appear. One candidate recently caught a generated migration that would have silently dropped rows on a table with a nullable foreign key. Nothing in the code looked wrong. He just knew where that class of bug likes to live. That single moment told me more than any whiteboard session in my career.
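For readers who haven't been bitten by this class of bug, here is a minimal sketch of one common way it happens. It is not the candidate's actual migration, which I'm not reproducing here. The tables (customers, orders, orders_v2), the columns, and the run_migration helper are all hypothetical, and I'm assuming the culprit is an inner join in a backfill, which is only one of several ways rows with a NULL foreign key can vanish.

```python
import sqlite3


def run_migration(join_kind: str) -> int:
    """Backfill a hypothetical orders_v2 table; return how many rows survive.

    join_kind is "INNER" or "LEFT". Everything here is illustrative.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),  -- nullable: guest orders
            total_cents INTEGER NOT NULL
        );
        INSERT INTO customers VALUES (1, 'EU'), (2, 'US');
        -- Three source rows; order 12 has no customer.
        INSERT INTO orders VALUES (10, 1, 500), (11, 2, 700), (12, NULL, 900);
        CREATE TABLE orders_v2 (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            region TEXT,
            total_cents INTEGER NOT NULL
        );
    """)
    # The backfill denormalizes region onto each order. With an inner join,
    # a NULL customer_id matches no customer, so that order is never copied.
    conn.execute(f"""
        INSERT INTO orders_v2 (id, customer_id, region, total_cents)
        SELECT o.id, o.customer_id, c.region, o.total_cents
        FROM orders AS o
        {join_kind} JOIN customers AS c ON c.id = o.customer_id
    """)
    (count,) = conn.execute("SELECT COUNT(*) FROM orders_v2").fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    print("inner join keeps", run_migration("INNER"), "of 3 rows")
    print("left join keeps", run_migration("LEFT"), "of 3 rows")
```

Both versions run without an error or a warning, and both look reasonable in review. Only a row count before and after, or a reviewer who already distrusts joins on nullable columns, catches the difference.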
The weak candidates ship the first answer that runs. They aren't stupid and they aren't lazy. They've simply never been asked to be responsible for output at this volume, and it shows immediately. You cannot detect this from a resume, a puzzle score, or a pleasant conversation. You have to watch the work.
The part that isn't technical
After the exercise we talk about it, and this conversation carries as much weight as the code. Why this approach? What would break at ten times the load? What did the model get wrong, and how did you know? What would you do with another day? Candidates who own their reasoning can defend it, adjust it, and tell me where they were guessing. That honesty about the edge of their own knowledge is the strongest hiring signal I know, and it's completely invisible to an autograder.
Companies keep running leetcode because it scales, it feels objective, and changing an interview loop is organizational surgery nobody volunteers for. I understand all three reasons. They're also how you end up with a team optimized for an era that ended. The interview is a specification of what you value; engineers read it and sort themselves accordingly. Specify puzzle recall and watch who shows up. Specify judgment and taste, and watch who shows up instead.
I run this vetting as a service: CTO-led screens and AI-era work samples, so every candidate who reaches you has already shown judgment on real work.
