
When Should You Normalize Embeddings? New Research Challenges Conventional Wisdom

Researchers from Nara Institute of Science and Technology have published findings that fundamentally question how we handle embedding magnitudes in retrieval systems and other similarity-based tasks.

The core insight: cosine similarity and dot product are just two points on a spectrum. By controlling normalization independently on each side of the comparison, the authors expose two intermediate variants, QNorm (query normalized, document unnormalized) and DNorm (document normalized, query unnormalized), that significantly outperform both standard approaches.

The results are striking: on out-of-domain retrieval benchmarks, unilateral normalization achieves up to 72% relative improvement over cosine similarity. On downstream RAG tasks, these gains translate to 24% better accuracy. The pattern holds across BERT-based encoders and LLM-based retrievers alike.

Under the hood: document magnitude directly affects ranking at inference time, since larger magnitudes can signal relevance strength. Query magnitude operates differently: it modulates gradient flow during training by acting as a per-example temperature parameter in the softmax distribution. This asymmetry explains why fixing one side while preserving the other creates a more stable optimization landscape.

The paper introduces a practical decision framework based on functional symmetry, that is, whether the task treats both inputs as interchangeable. Asymmetric tasks (retrieval, recommendation, few-shot classification) benefit from unilateral normalization. Symmetric tasks (semantic textual similarity, CLIP with bidirectional loss, knowledge graph completion) still prefer cosine.

For practitioners uncertain about task symmetry, learnable normalization parameters provide a data-driven solution. The Fisher Information Matrix condition number, computed on pretrained weights before training, predicts whether to preserve query or document magnitude.

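
The four scoring variants described above can be illustrated with a minimal NumPy sketch. This is not the paper's code: the helper names (`l2_normalize`, `similarity`, `softmax`), the array shapes, and the random data are all assumptions made for illustration. The sketch also checks one identity that follows directly from the definitions: a DNorm score equals the query norm times the cosine score, so a softmax over DNorm scores is a softmax over cosine scores with temperature 1/||q||, which is consistent with the "per-example temperature" description.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm (eps guards against zero vectors)."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def similarity(queries, docs, normalize_query, normalize_doc):
    """Score every query against every document.

    (True, True)   -> cosine similarity
    (False, False) -> dot product
    (True, False)  -> QNorm: query normalized, document magnitude kept
    (False, True)  -> DNorm: document normalized, query magnitude kept
    """
    q = l2_normalize(queries) if normalize_query else queries
    d = l2_normalize(docs) if normalize_doc else docs
    return q @ d.T  # shape: (num_queries, num_docs)


def softmax(scores):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Illustrative random embeddings (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
queries = rng.normal(size=(2, 8))
docs = rng.normal(size=(5, 8))

cosine = similarity(queries, docs, normalize_query=True, normalize_doc=True)
dot = similarity(queries, docs, normalize_query=False, normalize_doc=False)
qnorm = similarity(queries, docs, normalize_query=True, normalize_doc=False)
dnorm = similarity(queries, docs, normalize_query=False, normalize_doc=True)

# DNorm = ||q|| * cosine, so the query norm acts as an inverse temperature
# when the scores are fed into a softmax.
query_norms = np.linalg.norm(queries, axis=-1, keepdims=True)
temperature = 1.0 / query_norms
dnorm_probs = softmax(dnorm)
cosine_probs_with_temperature = softmax(cosine / temperature)
```

Because dividing or multiplying all of a query's scores by the same positive number does not change their order, QNorm ranks documents like the dot product and DNorm ranks them like cosine for any fixed set of embeddings. Any difference between the variants therefore has to come from the embeddings that training produces under each scoring function.
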
The framework extends beyond retrieval. On few-shot classification, DNorm beats both cosine similarity and the Euclidean distance baseline of Prototypical Networks by 5% accuracy. On collaborative filtering, unilateral variants outperform cosine by 38% on MovieLens-100K.

This challenges a decade of inherited conventions where similarity functions were chosen by community default rather than task structure. The code and comprehensive benchmark results across six task families are available in the paper.


Kuldeep Singh Sidhu’s Post
