No, you won't be vibe coding your way to production. Not if you prioritise quality, safety, security, and long-term maintainability at scale.
Recently coined by OpenAI co-founder Andrej Karpathy, "vibe coding" describes an AI-coding approach where developers focus on iterative prompt refinement to generate desired output, with minimal concern for the LLM-generated code implementation.
At Canva, our assessment, based on extensive and ongoing evaluation of AI coding assistants, is that these tools must be carefully supervised by skilled engineers, particularly for production tasks. Engineers need to guide, assess, correct, and ultimately own the output as if they had written every line themselves. Our experimentation consistently reveals errors in tool-generated code ranging from superficial (style inconsistencies) to dangerous (incorrect, insecure, or non-performant code).
Our engineering culture is built on code ownership and peer review. Rather than challenging these principles, our adoption of AI coding assistants has reinforced their importance. We've implemented a strict "human in the loop" approach that maintains rigorous peer review and meaningful code ownership of AI-generated code.
Vibe coding presents significant risks for production engineering:
- Short-term: Introduction of defects and security vulnerabilities
- Medium to long-term: Compromised maintainability, increased technical debt, and reduced system understandability
From a cultural perspective, vibe coding directly undermines peer review processes. Generating vast amounts of code from single prompts is effectively a denial-of-service (DoS) attack on reviewers, overwhelming their capacity for meaningful assessment.
Currently we see one narrow use case where vibe coding is exciting: spikes, proofs of concept, and prototypes. These are always throwaway code. LLM-assisted generation offers enormous value in rapidly testing and validating ideas with implementations we will ultimately discard.
With rapidly expanding LLM capabilities and context windows, we continuously reassess our trust in LLM output. However, we maintain that skilled engineers play a critical role in guiding, assessing, and owning tool output as an immutable principle of sound software engineering.
Why Use Expert-in-the-Loop for LLM Coding
Building useful Knowledge Graphs will long be a Humans + AI endeavor. A recent paper lays out how best to implement automation, the specific human roles, and how these are combined. The paper, "From human experts to machines: An LLM supported approach to ontology and knowledge graph construction", provides clear lessons. These include:
🔍 Automate KG construction with targeted human oversight: Use LLMs to automate repetitive tasks like entity extraction and relationship mapping. Human experts should step in at two key points: early, to define scope and competency questions (CQs), and later, to review and fine-tune LLM outputs, focusing on complex areas where LLMs may misinterpret data. Combining automation with human-in-the-loop review preserves accuracy while saving time.
❓ Guide ontology development with well-crafted Competency Questions (CQs): CQs define what the Knowledge Graph (KG) must answer, like "What preprocessing techniques were used?" Experts should create CQs to ensure domain relevance, and review LLM-generated CQs for completeness. Once validated, these CQs guide the ontology’s structure, reducing errors in later stages.
🧑‍⚖️ Use LLMs to evaluate outputs, with humans as quality gatekeepers: LLMs can assess KG accuracy by comparing answers to ground truth data, with humans reviewing outputs that score below a set threshold (e.g., 6/10). This setup allows LLMs to handle initial quality control while humans focus only on edge cases, improving efficiency and ensuring quality.
🌱 Leverage reusable ontologies and refine with human expertise: Start by using pre-built ontologies like PROV-O to structure the KG, then refine it with domain-specific details. Humans should guide this refinement process, ensuring that the KG remains accurate and relevant to the domain’s nuances, particularly in specialized terms and relationships.
⚙️ Optimize prompt engineering with iterative feedback: Prompts for LLMs should be carefully structured, starting simple and iterating based on feedback. Use in-context examples to reduce variability and improve consistency. Human experts should refine these prompts to ensure they lead to accurate entity and relationship extraction, combining automation with expert oversight for best results.
These lessons provide solid foundations for optimally applying human and machine capabilities to the very important task of building robust and useful ontologies.
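The "humans as quality gatekeepers" step above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `ScoredAnswer` type, the `route_for_review` helper, and the sample answers and scores are all hypothetical, and the threshold of 6 simply mirrors the "6/10" example in the post.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: scores are integers from 0 to 10 produced by an LLM judge
# comparing each answer against ground truth data.
REVIEW_THRESHOLD = 6


@dataclass
class ScoredAnswer:
    question: str
    answer: str
    score: int


def route_for_review(
    items: List[ScoredAnswer], threshold: int = REVIEW_THRESHOLD
) -> Tuple[List[ScoredAnswer], List[ScoredAnswer]]:
    """Split answers into an auto-accepted list and a human-review queue."""
    accepted = [item for item in items if item.score >= threshold]
    needs_review = [item for item in items if item.score < threshold]
    return accepted, needs_review


# Hypothetical sample data for illustration only.
scored = [
    ScoredAnswer("What preprocessing techniques were used?", "Tokenisation", 9),
    ScoredAnswer("Which dataset was used?", "Unclear", 4),
]
accepted, needs_review = route_for_review(scored)
```

Only the items in `needs_review` would reach a human expert, which is how the LLM handles first-pass quality control while people concentrate on the low-scoring edge cases.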
-
How to Use AI to Build Things (Even If You’re Not Technical) - #4
LLMs can write working code, but that code becomes far more useful when we reshape it to match how we think as #clinicians. That means refactoring not just for clarity, but to align with our clinical mental models.
Take a creatinine clearance calculation. Instead of leaving it as one big block, break it into steps that reflect how we think:
- calc_ideal_body_weight() and calc_adjusted_body_weight()
- select_weight_to_use()
- choose_crcl_formula(sex)
- calculate_crcl(): combines all steps into one reusable abstraction
Now the code mirrors our clinical workflow, making it easier to understand, validate, and explain. There's a technical bonus too :)
- Reusability: use functions across tools
- Readability: clearer for you and collaborators who are less technical
- Composability: build complex workflows from simple parts
If the LLM gives you raw code, ask it to refactor into functions. Then inject your domain knowledge to structure it in a way that makes sense to you.
#HealthcareOnLinkedIn #VibeCoding #AI
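A rough sketch of that decomposition might look like the following. The function names come from the post; everything else is an assumption made for illustration: the Devine formula for ideal body weight, the 0.4 correction factor for adjusted body weight, the 120% cut-off in `select_weight_to_use`, and the Cockcroft-Gault equation as the formula returned by `choose_crcl_formula`. The post does not say which conventions its author uses, so treat the specifics as placeholders to be checked against local practice.

```python
from typing import Callable


def calc_ideal_body_weight(sex: str, height_cm: float) -> float:
    """Ideal body weight in kg (Devine formula, assumed here)."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    inches_over_5ft = max(height_cm / 2.54 - 60.0, 0.0)
    base = 50.0 if sex == "male" else 45.5
    return base + 2.3 * inches_over_5ft


def calc_adjusted_body_weight(ideal_kg: float, actual_kg: float) -> float:
    """Adjusted body weight in kg, using an assumed 0.4 correction factor."""
    return ideal_kg + 0.4 * (actual_kg - ideal_kg)


def select_weight_to_use(ideal_kg: float, actual_kg: float,
                         obesity_ratio: float = 1.2) -> float:
    """Pick the weight for dosing; the 120% cut-off is an assumed convention."""
    if actual_kg < ideal_kg:
        return actual_kg
    if actual_kg >= obesity_ratio * ideal_kg:
        return calc_adjusted_body_weight(ideal_kg, actual_kg)
    return ideal_kg


def choose_crcl_formula(sex: str) -> Callable[[float, float, float], float]:
    """Return a Cockcroft-Gault function with the sex factor already applied."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    factor = 0.85 if sex == "female" else 1.0

    def cockcroft_gault(age_years: float, weight_kg: float,
                        scr_mg_dl: float) -> float:
        # Result is in mL/min; serum creatinine is expected in mg/dL.
        return factor * (140 - age_years) * weight_kg / (72 * scr_mg_dl)

    return cockcroft_gault


def calculate_crcl(sex: str, age_years: float, height_cm: float,
                   actual_kg: float, scr_mg_dl: float) -> float:
    """Combine the steps above into one reusable calculation."""
    ideal = calc_ideal_body_weight(sex, height_cm)
    weight = select_weight_to_use(ideal, actual_kg)
    formula = choose_crcl_formula(sex)
    return formula(age_years, weight, scr_mg_dl)
```

Because each clinical decision sits in its own function, a reviewer can check the weight-selection rule or the sex factor in isolation instead of tracing one long block.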
-
"Will AI coding assistants replace AI engineers in 5 years?" ⬇️ My friend Drazen Zaric asked me this question over coffee, and it got me thinking about the future of AI engineering—and every other job. Here's what I learned from 10+ years in AI/ML: > 𝗟𝗟𝗠𝘀 𝗮𝗹𝗼𝗻𝗲 𝗰𝗮𝗻'𝘁 𝘀𝗼𝗹𝘃𝗲 𝗰𝗼𝗺𝗽𝗹𝗲𝘅 𝗽𝗿𝗼𝗯𝗹𝗲𝗺𝘀. They need the right context and expert human guidance. When I use Cursor for Python (my expertise), I code 10x faster. But with Rust (where I'm less expert)? It actually slows me down. > 𝗧𝗵𝗲 𝗿𝗲𝗮𝗹 𝗴𝗮𝗺𝗲 𝗶𝘀𝗻'𝘁 (𝗮𝗻𝗱 𝗻𝗲𝘃𝗲𝗿 𝘄𝗮𝘀) 𝗮𝘁 𝘁𝗵𝗲 𝗰𝗼𝗱𝗶𝗻𝗴 𝗹𝗲𝘃𝗲𝗹 𝗮𝗻𝘆𝗺𝗼𝗿𝗲 It's about knowing WHAT to build and HOW systems work end-to-end. Companies need people who can: • Design the right solution architecture • Provide high-quality context to AI tools • Filter and refine AI outputs effectively • Understand the full stack from infrastructure to business logic > 𝗧𝗵𝗲 𝘄𝗶𝗻𝗻𝗲𝗿𝘀 𝘄𝗼𝗻'𝘁 𝗯𝗲 𝘁𝗵𝗼𝘀𝗲 𝘄𝗮𝗶𝘁𝗶𝗻𝗴 𝗳𝗼𝗿 𝗔𝗜 𝘁𝗼 𝗱𝗼 𝗲𝘃𝗲𝗿𝘆𝘁𝗵𝗶𝗻𝗴. They'll be the experts who can accelerate their work 10x by combining deep system understanding with AI assistance. 10 years ago, knowing Python was enough for a data science job. Today, that's just the entry ticket. The value is in understanding how to orchestrate complex systems—from Kubernetes clusters to agentic workflows. > 𝗕𝗼𝘁𝘁𝗼𝗺 𝗹𝗶𝗻𝗲 Human expertise + LLMs = acceleration. Human expertise alone = slow progress. LLMs alone = endless loops and compounding errors. What's your experience using AI tools in your domain of expertise vs. areas where you're still learning? --- Follow Pau Labarta Bajo for more thoughtful posts
-
LLMs Won’t Level Product - They’ll Widen the Gap
In an IEEE study, LLMs beat humans on coverage by over 40% - but still produced fewer acceptable user stories.
A new IEEE study tested 10 state-of-the-art LLMs in an interview-based, real-world style requirements process: generating and evaluating agile user stories.
The good:
- High coverage: Models captured 73–96% of the “ground truth” requirements, far exceeding human students.
- Strong structural quality: LLMs excelled in language clarity and internal consistency.
- Well-formed baselines: Useful for getting an initial set of clean, syntactically sound user stories on the page. For example, Claude 3.5 Sonnet showed only 2.20% defective stories in AQUSA checks.
- Quality checks: With a clear evaluation framework, top models matched or exceeded human–human agreement when assessing story quality.
The bad:
- Lower diversity and creativity: Humans explored far more of the requirements space; students’ average diversity was ~98.6% versus much lower for models.
- Weaker rationale and problem framing: Many models struggled to make the “why” explicit, scoring notably lower on Rationale Clarity.
- Fewer stories passed acceptance quality checks: Common defects included vague rationales and “and/and/and” stories that broke atomicity.
- Quality variability: Even strong models produced a notable share of unacceptable stories compared with ground truth and students.
This reinforces what I’ve said for yonks. Although the study didn’t test experts using LLMs directly, the patterns make the likely effect clear: to a non-domain expert, AI looks magical. In the hands of a domain expert, it’s powerful.
Give an LLM to someone without deep product sense, and it flattens their output: polished but narrow and locked to common patterns.
Give it to a product expert, and it accelerates them, turning their contextual judgement, creativity, stakeholder insight, business acumen, and product sense into more complete, higher-quality outcomes faster.
LLMs are not a leveler, they’re an amplifier, and they will widen the gap.
-
It's disappointing to me how many people are downplaying programming expertise and education with the rise of LLMs.
A simple thought experiment: We've had the ability to outsource software engineering since the early 2000s by offshoring the work. Ask yourself this simple question. With offshoring, would it be more successful if the onsite lead knew how to code well? Or would it make no difference?
Sure, large sections of my code base are written by Claude Code, but the reason I'm able to use it well is that I can still code relatively well without it. I still end up refactoring large chunks of what comes out of the LLMs, sometimes to fix functionality, other times to condense the codebase so that the LLMs are more effective when they iterate or add new features.
My expertise as a programmer comes into play in the precision with which I prompt the LLM, in how I steer the LLM as the code base becomes more complex (either by prompting or by manual refactoring), and in how I make suggestions on how to make the code faster. There is a huge difference between asking the LLM to make your code faster and telling it to cache large model objects with an LRU cache, both in the number of tokens spent and in the code performance afterwards.
And for the students out there: please don't use the LLMs for all your course work. Use them for some, but please don't use them in your data structures and algorithms course. The point of the course work isn't to get an A, but to train yourself to see complex problems and break them down into their constituent pieces. It is much the same reason I make my kids do math by hand even though graphing calculators with computer algebra solvers exist. (Only after they reach a point of expertise do the advanced calculators come out.)
At the end of the day, education is about you learning the fundamentals so you can use the tools better.
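To make the LRU-cache instruction from the post concrete, here is a minimal sketch using Python's standard `functools.lru_cache`. The `load_model` function, the model name, and the fake "weights" are hypothetical stand-ins for an expensive model load; the post does not show the author's actual code.

```python
from functools import lru_cache

# Counts how many times the expensive load actually runs.
LOAD_COUNT = 0


@lru_cache(maxsize=2)  # keep at most two loaded models in memory
def load_model(name: str) -> dict:
    """Hypothetical stand-in for loading a large model object from disk."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"name": name, "weights": [0.0] * 1000}


first = load_model("sentiment-large")   # cache miss: performs the load
second = load_model("sentiment-large")  # cache hit: returns the same object
```

The second call returns the cached object without reloading it, which is the kind of specific, targeted change the post contrasts with a generic "make my code faster" prompt.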
-
I use Claude Code on production codebases enough that I hit the Max limits. "Vibe coding" does not describe my work. A deep understanding of software engineering and computer systems is required to make the calls that keep a complex codebase healthy and keep my company’s engineering org able to maintain our production apps and services.
LLMs get many details right, but it is also the norm for a few things to be wrong or not aligned with how we think about software engineering. It takes an expert eye to spot which 1 out of 10 outputs needs rework, or is simply wrong. A novice who trusts the LLM’s capabilities more than their own judgment will believe all 10.
This is an excerpt from a memo on AI agents I shared with our CRO Joe Ryan:
> LLMs accept imprecision. You can leave out details of your problem and solution and LLMs will fill in the blanks. They’ll often be wrong, but you will get something working end to end, which is valuable to iterate on. But you need to be able to spot gaps and mistakes in your prompts because the LLM will not reliably identify them.
> LLMs create imprecision. You need to be able to spot mistakes in the LLM’s outputs, and the LLM cannot always check its own work. You need to already have a vision for the end state and the direction it lies in, and use the LLM to automate getting there faster.
> Experts who understand a problem and are looking to accelerate solving it will be amplified in positive directions, scaling themselves. Novices who trust LLMs will be amplified in negative directions, becoming confident in wrong solutions.
The frontier of what it means to be an expert will change. Experts will need to know how to apply AI and the boundaries of its capabilities. An expert software engineer will need the dexterity to wield a coding agent well. That dexterity will come from experience, intuition, and talent.
A senior skill will be getting codebases, teams, and companies to work productively with agents. It has always been a senior skill to set organizations up for success and then achieve it.
Typing source code is mostly dead. We’ll still edit a few lines here and there. Reading, and more importantly understanding, source code is very much alive. We’ll do more of this as code is written faster.
The art and science of software engineering are blossoming again. This is not a renaissance; software engineering was never dead and is not being reborn. “Vibe coding” is different. It is something new being born. The dominant change, though, is that the industry and discipline of software engineering are evolving more than they have since the internet, if not since the beginning.
-
LLMs are brilliant strategists, terrible operators?
It’s tempting to use LLMs that are great at research and coding to orchestrate and execute business processes. That temptation hides a category error. LLMs are language tools, excellent at writing, reasoning, synthesis and explanation. Business processes are control systems: deterministic, auditable, state-safe.
When LLMs run processes directly, you get:
- Implicit logic hidden in prompts
- Non-deterministic behavior
- Poor auditability
- Silent drift across model versions
That’s not a tooling problem, it’s a structural mismatch.
The right pattern:
- LLMs propose and explain
- Deterministic logic decides and validates
- Systems execute
Perhaps the future isn’t “LLMs running businesses.” It’s deterministic systems that consult LLMs, but never give them the keys.
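One way to picture the propose / decide / execute split is the sketch below. It is illustrative only: the refund scenario, the `Proposal` type, the `propose_refund` stub (standing in for a real LLM call), and the limits in `validate` are all hypothetical and not taken from the post.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Proposal:
    action: str
    amount: float
    rationale: str


def propose_refund(ticket_text: str) -> Proposal:
    """Stub for an LLM call: it only proposes and explains, never acts."""
    return Proposal("refund", 250.0, "Customer reports a duplicate charge.")


# Deterministic, auditable rules live in code, not in a prompt.
ALLOWED_ACTIONS = {"refund"}
MAX_REFUND = 100.0


def validate(proposal: Proposal) -> Tuple[bool, str]:
    """Decide with plain rules whether a proposal may be executed."""
    if proposal.action not in ALLOWED_ACTIONS:
        return False, "action not allowed"
    if not 0 < proposal.amount <= MAX_REFUND:
        return False, "amount outside approved range"
    return True, "ok"


def execute(proposal: Proposal, audit_log: List[str]) -> None:
    """Stand-in for the system of record; here it only writes to the log."""
    audit_log.append(f"executed {proposal.action} of {proposal.amount:.2f}")


def handle(ticket_text: str, audit_log: List[str]) -> bool:
    proposal = propose_refund(ticket_text)
    approved, reason = validate(proposal)
    if approved:
        execute(proposal, audit_log)
    else:
        audit_log.append(f"rejected {proposal.action}: {reason}")
    return approved
```

In this sketch the stubbed proposal of 250.0 exceeds the hypothetical 100.0 limit, so `handle` records a rejection and nothing is executed. The model's output never reaches `execute` without passing the deterministic check.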
-
9 months ago Color partnered with University of California, San Francisco to bring world-class cancer expertise to everyone. To achieve this, Color designed a novel architecture that safely integrates GenAI into clinical practice. We call it a “Large Language Expert” (LLE). We developed the LLE architecture to overcome challenges that prevent the use of GenAI in the clinic (hallucinations, clinician trust & control, clear logic flows, etc.).
The big idea
The Color team’s big idea was that while LLMs enable a huge step forward for some tasks, they are the wrong tool to embed clinical decision logic. Clinical decision logic is complex, constantly evolving and non-convergent (i.e. each system makes variations). This led us to the LLE: the merging of machine learning with expert systems.
How it works
Rather than treating domain logic as “knowledge” that should be assimilated into a model, we created a framework to formally structure logic in a way that can be unambiguously understood by LLMs, and then use LLMs as a reasoning runtime. This enables applications that are reliable, but benefit from the flexibility of LLMs (e.g. dealing with the mess of unstructured health records).
Machine Learning meets Expert Systems
We discovered that LLMs and Expert Systems are fantastically complementary. Expert Systems never worked because they forced a rigid structure that tried to turn all concepts into code. Historical attempts turned into an ocean-boiling mess that became too complex to manage. Instead, LLMs allow us to create a “bubble of consistency” specific to the task at hand. The world can stay messy now that we have an amazing toolset (LLMs) to map the messy world to specific criteria that drive a given task.
But isn't this RAG?
No - here’s why... In RAG, you inject information into the context of LLM prompts that is absent or under-represented at training-time. In an LLE application, rather than operating on raw content, we pre-process it through a pipeline that translates content into LLM-optimized, structured chains of logic. In a way, the LLE architecture makes RAG declarative. We toyed with calling it DRAG, but thought LLE sounded better :).
Fantastic Results
We applied the LLE architecture to build guideline-based cancer treatment workflows. This approach yielded important results:
- Accuracy: The accuracy of the system across complex tasks was amazing (>95% accuracy across >12,000 decision factors).
- Clinician Experience & Trust: It created a transparent and controllable experience for clinicians. A task that normally takes over an hour goes down to ~10 minutes.
- Development Efficiency: We did this work with a small team that was able to rapidly build an efficient and reliable system. Updates and improvements were simple and straightforward to make compared to a black-box machine learning system.
Read More
Blog post: https://lnkd.in/gQbyfhH7
White paper: https://lnkd.in/gf6v6gY7
-
Here's how Apple's key paper on the "Illusion of Thinking" is relevant to vibe coding: it underscores the critical importance of keeping human developers in the loop. LLMs are exceptional servants, capable of generating code at incredible scale and sophistication. They can reproduce complex language patterns with remarkable accuracy, but they lack the capacity for deep thinking or true engineering judgment. This is precisely where smart human developers become indispensable. The human provides the critical thinking and architectural vision; the AI executes that vision with speed and precision. This partnership produces genuinely high-quality code. Both extremes in the vibe coding debate—that AI cannot code at all, or that AI will completely replace human developers—are equally untenable. The future lies in this powerful synthesis.
