Grep Beats Vector Databases for Coding Agents

This title was summarized by AI from the post below.

// Is Grep All You Need? //

Pay attention to this, AI devs. (bookmark it)

The paper's authors find that grep-style text search, when wrapped in the right agent harness, matches or beats embedding-based retrieval on coding-agent tasks.

Are vector databases even needed where this is all going? It might be that what coding agents needed was not better embeddings but better harness design around primitive tools. If you operate a coding-agent stack that depends on a vector DB, it might be time to re-evaluate.

My personal experience has been that agentic search, if done right, is more than good enough for a lot of use cases. But you also have to understand how to properly index and structure information for the agents to take advantage of it.

At scale, vector databases do shine, so take that into account as well. A hybrid approach often works best, but that's something we haven't figured out really well yet.

Elvis S. 1w

Paper: https://arxiv.org/abs/2605.15184 Learn to build effective AI agents in our academy: https://academy.dair.ai/

Wawan Cenggoro

AI & Data Science Advisor (VP-level) at Lintasarta (Indosat Ooredoo Hutchison Group) | NVIDIA Certified AI Instructor & Advisor | Lead Data Scientist | AI Consultant, Engineer & Researcher | World’s Top 2% Scientist 2023

In the end, it's just a search/retrieval problem. Vector search is just one candidate algorithm among many algorithms we can implement. Each algorithm has its own strengths and weaknesses.

Shruti Roy 1w

Isn’t that why multi indexing RAGs are the answer?

Awais Naeem 1w

A developer asking 'find the function that validates user permissions' won't match with grep on 'validate' or 'permissions' if the function is named 'checkAccess.' Embeddings handle semantic variation. The paper's finding is for coding-agent tasks, not all retrieval tasks. The hybrid approach (grep for exact, embeddings for semantic) is the pragmatic path. The 'haven't figured out really well' is honest.
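The mismatch described in this comment can be sketched in a few lines. This is a minimal illustration, not anything taken from the paper: the source snippet, the `grep_lines` helper, and the follow-up search terms are all hypothetical. It shows the literal query missing `checkAccess`, and a second round with related terms finding it. Query expansion of this kind is one possible workaround; embeddings are another.

```python
import re

# Hypothetical source file contents, invented for illustration.
SOURCE = """\
def checkAccess(user, resource):
    return resource in user.allowed

def formatName(user):
    return user.name.title()
"""

def grep_lines(pattern: str, text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs matching the regex, case-insensitively."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        (n, line)
        for n, line in enumerate(text.splitlines(), start=1)
        if rx.search(line)
    ]

# Literal terms from the developer's question find nothing.
first_try = grep_lines(r"validate|permission", SOURCE)

# A second round with related terms locates the function. The terms are
# supplied by hand here; in an agent harness the model would propose them.
second_try = grep_lines(r"access|auth|allowed", SOURCE)

print(first_try)
print(second_try)
```

Whether an agent reliably comes up with the right related terms is exactly the open question the comment raises.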

Selvaraj Y 3d

We will see endless discoveries like this. I think the fundamental problem lies in how we choose to solve these problems. Attempts like these are a must for us to rationalize our choices, so that is great. The majority of data-related problems pre-date the current explosion, but what was lost was an understanding of the reasons behind such structural shifts. We never shape data and process. Instead, we structure processing and fit data to efficient structures. AI is forcing us to look back at practices we may have abandoned. So, every time you see someone claim this structure is better than that one, the question that is often silently ignored is: for which processing problem? We never stopped to ask: how does the AI actually use it? And what was the core limitation that forced us to morph and invent these retrieval methods in the first place? Maybe asking these questions will force a structured evolution that bridges AI's gaps systematically, rather than relying on trial and error or hearsay, just as it did in other domains, such as relational versus NoSQL, or row-based versus columnar storage.

Stefan Neubig 3d

I agree that grep-style search can be very effective for coding-agent harnesses and local agentic search. But I don’t think this should be framed as a novel thing. Even the paper itself is more careful: it evaluates long-memory conversational QA, where literal evidence spans matter a lot. In RAG, hybrid retrieval combining vectors and BM25 or other lexical methods has been a known pattern for years - so I'd say yes, we actually figured that part out :) Grep is a great fit for local, literal tasks like coding, where symbols, paths, errors and function names matter. But for RAG at scale, especially in domain-heavy systems, the baseline is rarely pure vector search.
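As a rough sketch of what "hybrid retrieval" can mean in practice, the snippet below merges a lexical ranking and a vector ranking with reciprocal rank fusion (RRF). The comment does not name a fusion method, so RRF is an assumption chosen because it is simple and widely used; the file names and both input rankings are made up, and `k = 60` is just the conventional default.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists for one query.
lexical = ["auth.py", "roles.py", "utils.py"]   # e.g. a BM25 ordering
dense = ["roles.py", "session.py", "auth.py"]   # e.g. an embedding ordering

fused = reciprocal_rank_fusion([lexical, dense])
print(fused)
```

Documents that both retrievers rank well rise to the top, while a document found by only one retriever still stays in the list.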

More Relevant Posts

YuAn Chang
1w
In the era of AI agents, the key advancement may no longer lie in building more sophisticated search engines, but in developing agents that can utilize search tools more effectively.
Vadim Liavitski
2w
Here's the AI-coding failure mode no one talks about: the tests pass for the wrong reason.

AI agents now write decent-looking tests. They pass. Coverage is green. Then prod breaks and you find this:

// "loading=true renders Spinner instead of label, disables press"
expect(queryByText('Go')).toBeNull();
fireEvent.press(button);
expect(onPress).not.toHaveBeenCalled();

100% line coverage. Both assertions pass. Both for the wrong reason:

A missing label ≠ a Spinner rendered.
fireEvent.press in React Native Testing Library bypasses the disabled prop. The handler IS called; the test just happened to be set up so it didn't matter.

Drop the Spinner from the source: test passes. Remove the disabled handling: test passes. Both regressions ship.

Coverage measures execution, not detection. And AI is especially good at producing tests that execute lines without asserting their behavior.

So I added a skill to claude-agentic-flow called /verify-tests that closes this gap empirically. It runs mutation testing on the changed source files. An external tool (Stryker / mutmut / PIT / Stryker.NET) mutates your code one operator at a time. Every mutant the tests don't catch is a real test gap, surfaced as file:line:column with a concrete suggested assertion.

Design choices that mattered:
→ Scopes to git diff: full-repo mutation runs are hours; the diff is minutes.
→ Gates on the diff, not absolute score: no punishing devs for legacy code.
→ Realistic target 75–85%: chasing 100% produces noise tests, not signal.
→ Tool-agnostic: same skill, four ecosystems.

The skill is one markdown file. Drop it in .claude/skills/, invoke after writing tests, fix the survivors.

🔗 File - https://lnkd.in/dAee9SQ2
⭐ Star repo if you find it useful!

Have you caught AI writing "fake green" tests in your projects yet?

#AI #SoftwareTesting #MutationTesting #DeveloperTools #Claude
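The "coverage measures execution, not detection" point can be shown with a hand-rolled example. This is only a sketch of the idea behind mutation testing: it does not use Stryker, mutmut, or the /verify-tests skill, and every name in it (`discounted_cents`, the mutant, both tests) is hypothetical. Real tools generate the mutants automatically.

```python
def discounted_cents(price_cents: int, is_member: bool) -> int:
    """Original code: members get 10% off."""
    if is_member:
        return price_cents - price_cents // 10
    return price_cents

def mutant_discounted_cents(price_cents: int, is_member: bool) -> int:
    """Hand-written mutant: the '-' operator is flipped to '+'."""
    if is_member:
        return price_cents + price_cents // 10
    return price_cents

def weak_test(fn) -> bool:
    # Executes every line (full line coverage) but barely checks behaviour.
    return fn(1000, True) > 0 and fn(1000, False) > 0

def strong_test(fn) -> bool:
    # Asserts the actual result of each branch.
    return fn(1000, True) == 900 and fn(1000, False) == 1000

# A mutant "survives" when the test still passes against the mutated code.
print("weak test, mutant survives:", weak_test(mutant_discounted_cents))
print("strong test, mutant survives:", strong_test(mutant_discounted_cents))
```

Both tests pass on the original function and both reach full line coverage, but only the second one notices the flipped operator.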
Damon McMillan
1w
We just dropped new research on AI coding agents.

A controlled experiment: 1,650 agent coding sessions, 3 frontier models, and 16,050 individual code outputs, to find out what actually makes an agent follow its instructions.

The surprise: most of what people stress about didn't matter in the data.
- The size of the agent's instruction file (CLAUDE.md) had no measurable impact between 25 and 500 lines.
- Where the rule sits in the file had no measurable impact.
- Splitting instructions across multiple files (CLAUDE.md, AGENTS.md, nested files) had no measurable impact.

What did matter: Agents drift the more code they generate. Around 5.6% lower odds of compliance per code output, with steeper drift on more demanding tasks. And on bigger codebases, compliance ran lower throughout.

Translation:
- Static instruction files alone aren't enough for rules that have to hold.
- Critical rules need automated guardrails during and after the work (hooks, linters, CI checks).
- Context that grows with your codebase needs to live outside the instruction file, retrievable on demand.

That last one is why we built greppable.ai.

Paper: https://lnkd.in/gddqFjiZ
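To get a feel for what "5.6% lower odds per code output" could add up to, here is a back-of-the-envelope sketch. It rests on two assumptions that are not in the post: that the effect compounds multiplicatively on the odds (as it would in a logistic model), and a hypothetical starting compliance of 95%. The paper's actual model may differ.

```python
def compliance_after(
    n_outputs: int,
    baseline_p: float = 0.95,   # hypothetical starting compliance probability
    odds_ratio: float = 0.944,  # 5.6% lower odds per code output
) -> float:
    """Compliance probability after n outputs, assuming odds compound."""
    baseline_odds = baseline_p / (1.0 - baseline_p)
    odds = baseline_odds * odds_ratio ** n_outputs
    return odds / (1.0 + odds)

for n in (0, 10, 25, 50):
    print(n, round(compliance_after(n), 3))
```

Under these assumptions the drop is modest over the first ten outputs (roughly 95% to 91%) and becomes much more visible over long sessions, which is consistent with the post's argument for automated guardrails.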
Roshen Sanjay Nair
1w Edited
How can we autonomously improve LLM systems on problems humans are actively working on?

We could improve the model itself, or we can also enable the model to meta-learn how to improve the harness around it. Excited to have contributed to this very interesting research paper, Meta-Harness: End-to-End Optimization of Model Harnesses.

In this paper, we study this problem of how to automatically improve the scaffolding around language models over long horizons, where credit assignment spans past code, traces, and evaluation signals. The shift explored is toward meta-learning-driven harness optimization: moving beyond static, human-designed pipelines and instead enabling agents to learn how to structure their own execution and evaluation loops.

Concretely, the agent:
• has access to its full history of experience stored in a filesystem (which can grow very large)
• selectively retrieves relevant signals across past runs
• forms hypotheses about which aspects of the harness are most important
• iteratively refines the harness to improve end performance.

This creates a self-improving loop over the harness itself, not just the model. We observe state-of-the-art performance on Terminal-Bench 2.0, along with improvements on text classification and math reasoning benchmarks (details in the paper), suggesting that a substantial portion of performance gains can come from optimizing the harness, not only from scaling or tuning the underlying model.

Links
Website: https://lnkd.in/gkx8Zrue
Paper: https://lnkd.in/gQQuADBs
GitHub: https://lnkd.in/gABe_3g9

It has been encouraging to see strong interest in our paper across the AI community. A few examples of articles & posts about our work from LinkedIn, X, and research channels:
• Junyang Lin (former Qwen lead) discussed & reposted: https://lnkd.in/gq9QessA
• https://lnkd.in/g_H96GjB
• https://lnkd.in/gV-r_6G8
• https://lnkd.in/gJqNxWqH
• https://lnkd.in/ghWuvUf5
• https://lnkd.in/g4wevF3d

Grateful to be part of this wonderful research and to work with an excellent team. Special thanks to Yoonho Lee for his mentorship and guidance throughout the project as part of Chelsea Finn’s IRIS Lab.

Simon Hamblin
3w Edited
Researchers recently published a full source-code breakdown of how Claude Code actually works. They found that 1.6% of the codebase is AI decision logic. The other 98.4% is operational infrastructure. What makes Claude Code great is the system, not the AI.

As a vibe-coder, digging into Claude Code made me realise the same principle applied to how I was using AI at work. I had been building projects with one big instruction document trying to cover everything. It worked, but as complexity grew, rules buried halfway down a wall of text got lost and the AI would forget things. Breaking it into specific skill files fixed that.

Project instructions should focus on three things: who the AI should be, the workflow it should follow, and a list of the skills it can draw on. Then you create the skills: each one is a focused set of instructions saved as a .txt file in your project, and it only gets read when relevant. The AI selects which skills to use. You stay in control by confirming each step before the next starts.

That is the attempt to mimic the 98.4%. You are not prompting harder. You are building a system. If your work has a process, even a complex one, AI can follow it with a proper workflow.

Hope this was useful. Message me if you want help setting one up.

Source: https://lnkd.in/gYpPGctP

Pascal Biese
1w
No embedding model. No vector index. Just grep.

A new paper shows that letting an AI agent search a raw corpus with basic terminal tools - grep, file reads, shell commands - substantially outperforms conventional retrieval systems on multiple benchmarks.

The setup is called Direct Corpus Interaction (DCI). Instead of compressing all corpus access into a single top-k similarity search, the agent interacts with documents directly, the same way a developer would navigate a codebase from the command line.

Why does this work? Standard retrievers force everything through one narrow step: query in, ranked list out. Exact lexical constraints, multi-step hypothesis refinement, and combining weak clues across documents all get bottlenecked by that single retrieval call. Evidence filtered out early is gone forever - no amount of downstream reasoning recovers it.

DCI sidesteps this entirely. The agent can grep for an exact string, read surrounding context, refine its hypothesis, and search again - all without any offline indexing or embedding infrastructure.

On several BRIGHT and BEIR datasets, this approach outperformed strong sparse, dense, and reranking baselines. On BrowseComp-Plus and multi-hop QA, it achieved strong accuracy with zero reliance on semantic retrieval.

No precomputed embeddings. No reranker pipeline. No index maintenance. Just a richer interface changes what's retrievable.

↓ Want to keep up? Join my newsletter with 50k+ readers and be the first to learn about the latest AI research: llmwatch.com 💡
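The "grep, read surrounding context, refine, search again" loop can be sketched as follows. This is a toy under stated assumptions, not the paper's DCI implementation: an in-memory dict stands in for the file system, the `grep` and `read_context` helpers are hypothetical, and the three steps are scripted by hand where a real agent would choose them.

```python
import re

# Hypothetical two-file corpus, invented for illustration.
CORPUS = {
    "logs/app.log": "INFO start\nERROR E1042 raised in load_profile\nINFO stop\n",
    "src/profile.py": "def load_profile(uid):\n    row = db.get(uid)\n    return row['name']\n",
}

def grep(pattern: str) -> list[tuple[str, int, str]]:
    """Return (path, line_number, line) for every line matching the regex."""
    rx = re.compile(pattern)
    return [
        (path, n, line)
        for path, text in sorted(CORPUS.items())
        for n, line in enumerate(text.splitlines(), start=1)
        if rx.search(line)
    ]

def read_context(path: str, line_no: int, radius: int = 1) -> list[str]:
    """Return the lines within `radius` of a 1-based line number."""
    lines = CORPUS[path].splitlines()
    start = max(0, line_no - 1 - radius)
    return lines[start : line_no + radius]

# Step 1: exact-string search for an error code.
hits = grep(r"E1042")

# Step 2: refine. Take the function name from the hit and find its definition.
func = re.search(r"raised in (\w+)", hits[0][2]).group(1)
definition = grep(rf"def {func}\b")

# Step 3: read the lines around the definition.
context = read_context(*definition[0][:2], radius=2)
print("\n".join(context))
```

Each step uses what the previous one returned, which is the part a single top-k retrieval call cannot do.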

James A. Rolfsen
1w
Since the day it was created, Claude Code has implemented a different and simpler pattern for searching files locally on your machine. The grander strategy here extends beyond just “grep” and enables an entirely different family of multi-step functions for exploring files. And none of this requires creating an embedding index, which is integral to traditional RAG systems. From my standpoint, this is valuable to read not because it completely replaces RAG systems, but because it is the ultimate complement to traditional, vector-based semantic search. Each of these strategies has its own trade-offs. But a system that uses both is basically unstoppable. It’s worth a bookmark! 😁 #embeddings #vectors #search #agents #RAG #AI #claudecode

Subhash Nair - MBA (Finance,Strategy), B.Eng.(Computer), Lean 6 σ
1w
What happens when you rush to adopt a nascent technology corporate-wide, tightly coupling your stack to a provider, only to discover that grep and a shell command would have outperformed the entire pipeline? Adopting technology while it’s still in flux usually leads to high technical debt. If you built your entire stack around a specific vector provider's API in 2023, you’re now finding that the "state of the art" has shifted toward agentic reasoning and raw text interaction. When you bake a specific provider's embedding logic into your core architecture, switching to a "just use grep" approach requires a massive de-coupling effort - one that most corporate teams are too bogged down to execute. #TechnicalDebt #SoftwareArchitecture #GenerativeAI

Vinod Sharma
1w
Interesting direction from the paper: retrieval may be the wrong abstraction for strong agents.

Traditional RAG:
Corpus → Retriever → Top-k chunks → LLM reasons

Proposed shift (“Direct Corpus Interaction”):
Corpus ↔ Agent ↔ tools (grep/bash/files/scripts)

Instead of one-shot retrieval, the agent iteratively explores the corpus using tools, refining its search as it learns more. This closely mirrors Claude Code-style agents: they don’t depend on a single retrieval step, but repeatedly search, inspect files, and update hypotheses as they navigate a codebase.

OURS GLOBAL

2w
I Tried Running Claude Code for Free — It Turned Into a Much Bigger Discovery I thought it would take 20 minutes. Install Claude Code, plug in something free, done. That was the plan. Instead, I spent hours figuring out why things weren’t working — CLI issues, environment variables not loading, APIs behaving differently than expected. At one point, I wasn’t even sure if the idea itself would work. But then something clicked. And once it did, I realized I hadn’t just set up Claude Code for free… I had unlocked a completely different way to use AI tools. Read more: https://lnkd.in/gFyNZ6vm

