A year ago I spent an embarrassing amount of time comparing models.
GPT vs Claude.
Claude vs Gemini.
Gemini vs open-source.
Context windows, benchmarks, reasoning scores, latency comparisons. I treated model selection like it was the most important decision in the entire stack.
Lately I'm starting to think I had it backwards.
I've watched teams get incredible results from models that weren't considered "the best," while other teams struggle despite having access to state-of-the-art systems. The difference rarely comes down to how capable the model is. It usually comes down to how the work is structured around the model.
The best implementations I've seen have clear inputs, clear outputs, defined review steps, and tight feedback loops. The worst implementations tend to treat the model like a magical black box that should somehow solve an entire business problem on its own.
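To make that concrete, here is a rough sketch of what I mean by clear inputs, clear outputs, a review step, and a feedback loop. Everything in it is made up for illustration: `TicketInput`, `Draft`, `run_step`, the stub model, and the 40-character rule are all hypothetical, and the "model" is just a plain function so the example runs without any API. Treat it as one possible shape, not a recommended implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TicketInput:
    """Clear input: exactly what the step is allowed to see."""
    ticket_id: str
    text: str


@dataclass
class Draft:
    """Clear output: exactly what the step must produce."""
    ticket_id: str
    summary: str


# Hypothetical signatures: a model maps a prompt to text; a reviewer
# returns None to accept a draft, or a feedback string to reject it.
Model = Callable[[str], str]
Reviewer = Callable[[Draft], Optional[str]]


def build_prompt(item: TicketInput, feedback: Optional[str]) -> str:
    prompt = f"Summarize this ticket in one sentence:\n{item.text}"
    if feedback:
        prompt += f"\nReviewer feedback on the previous draft: {feedback}"
    return prompt


def run_step(item: TicketInput, model: Model, review: Reviewer,
             max_rounds: int = 3) -> Draft:
    """Run the model, review the draft, and feed rejections back in."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        draft = Draft(item.ticket_id, model(build_prompt(item, feedback)).strip())
        feedback = review(draft)
        if feedback is None:
            return draft
    raise RuntimeError(f"{item.ticket_id}: no accepted draft after {max_rounds} rounds")


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; it only tightens up after feedback."""
    if "Reviewer feedback" in prompt:
        return "Login fails after password reset."
    return "The user says login fails after resetting their password, twice."


def length_review(draft: Draft) -> Optional[str]:
    """Defined review step: an arbitrary 40-character limit for the demo."""
    return None if len(draft.summary) <= 40 else "Keep it under 40 characters."


if __name__ == "__main__":
    ticket = TicketInput("T-1", "I reset my password and now I cannot log in.")
    print(run_step(ticket, stub_model, length_review))
```

The point of the sketch is that `run_step` doesn't care which model sits behind the `Model` function. The input type, the output type, the reviewer, and the retry limit are where the design decisions live, and they stay the same when the model is swapped.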
The more AI becomes a commodity, the more valuable process design seems to become. Two companies can use the exact same model and end up with completely different outcomes because one designed a better workflow around it.
If you're building production AI systems, I'm curious whether you've noticed the same thing, or whether you still see model selection as the primary factor.
