Blog on Thamme Gowda

https://gowda.ai/posts/

From O(N) to O(log N): A Faster BPE Training Algorithm, Buried and Rediscovered
Mon, 30 Mar 2026 20:30:00 +0000
https://gowda.ai/posts/2026/03/faster-bpe-learn/
I wrote a fast BPE training algorithm in 2020, buried it in a Python codebase, and forgot about it. Five years later, I rewrote it in C++ and benchmarked it: up to 11× faster than SentencePiece. The trick? A max-heap with lazy deletion instead of periodic linear scans.

Building a Jinja2 Template Engine from Scratch in C++
Tue, 10 Mar 2026 12:00:00 +0000
https://gowda.ai/posts/2026/03/parsing-tutorial-jinja/
A tutorial on building a Jinja2 template engine in C++ for rendering LLM chat templates. Covers the lexer, recursive descent parser, and tree-walking evaluator, with real examples from HuggingFace model templates.

I Let Two AI Agents Race to Modernize pigz
Sat, 07 Mar 2026 20:20:00 +0000
https://gowda.ai/posts/2026/03/pigzpp-with-agents/
I gave Claude Opus 4.6 and GPT 5.4 the same task: rewrite pigz in modern C++23 as a thread-safe library. One agent did a clean-room rewrite; the other wrapped the legacy code. The winner went on to beat pigz by up to 1.8x in compression and 2.4x in decompression.

Sequence Transduction: Generalization and Challenges
Tue, 04 May 2021 10:20:00 +0000
https://gowda.ai/posts/2021/05/nmt-generalization-n-challenges/
Sequence-to-sequence transduction is a general problem of which many other problems are special cases. I also highlight some challenges of this general problem.

Many-to-English Machine Translation Tools, Data, and Pretrained Models
Sun, 25 Apr 2021 10:20:00 +0000
https://gowda.ai/posts/2021/04/mtdata-nlcodec-rtg-many-english/
We present useful tools for machine translation research: MTData, NLCodec, and RTG. We demonstrate their usefulness by creating a multilingual neural machine translation model capable of translating from 500 source languages to English. We make this multilingual model readily downloadable and usable as a service, or as a parent model for transfer learning to even lower-resource languages.

Macro-Average: Rare Types Are Important Too
Thu, 11 Mar 2021 10:20:00 +0000
https://gowda.ai/posts/2021/03/macroavg-rare-types-important/
We explore the simple type-based classifier metric MacroF1 and study its applicability to MT evaluation. We find that MacroF1 is competitive on direct assessment and outperforms other metrics in indicating downstream cross-lingual information retrieval task performance.

Finding the Optimal Vocabulary for Neural Machine Translation
Sun, 01 Nov 2020 10:20:00 +0000
https://gowda.ai/posts/2020/11/2020-optimal-vocab-nmt/
We cast neural machine translation (NMT) as a classification task in an autoregressive setting and analyze the limitations of both the classification and the autoregression components. Classifiers are known to perform better when trained on balanced class distributions. Since the Zipfian nature of language causes class imbalance, we explore its effect on NMT.