<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://jluebeck.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://jluebeck.github.io/" rel="alternate" type="text/html" /><updated>2025-08-28T09:04:50-07:00</updated><id>https://jluebeck.github.io/feed.xml</id><title type="html">About</title><subtitle>Detecting and reconstructing focal amplifications in cancer genomes</subtitle><author><name>Jens Luebeck</name><email>jluebeck(at)ucsd.edu</email></author><entry><title type="html">Maximum entropy solver for the “Wordle” problem</title><link href="https://jluebeck.github.io/posts/WordleSolver" rel="alternate" type="text/html" title="Maximum entropy solver for the “Wordle” problem" /><published>2022-01-10T00:00:00-08:00</published><updated>2022-01-10T00:00:00-08:00</updated><id>https://jluebeck.github.io/posts/blog-post</id><content type="html" xml:base="https://jluebeck.github.io/posts/WordleSolver"><![CDATA[<h1 id="maximum-entropy-solver-for-the-wordle-problem">Maximum entropy solver for the “Wordle” problem.</h1>

<p>If you don’t know what Wordle is, take a quick break to familiarize yourself over at <a href="https://www.powerlanguage.co.uk/wordle">https://www.powerlanguage.co.uk/wordle/</a>.</p>

<p><strong>If you just want to see the solver itself, see the link at the bottom of the page.</strong></p>

<p>This simple and elegantly designed game captures just the right amount of randomness with strategy, making it an addictive pandemic-era hobby, <a href="https://www.nytimes.com/2022/01/03/technology/wordle-word-game-creator.html">with a neat backstory</a>. In this word-guessing game, feedback is given on the basis of the identity and locations of letters in each guessed word, making it an elimination problem to identify the correct word. This game is very similar to the game <a href="https://en.wikipedia.org/wiki/Mastermind_(board_game)">Mastermind</a> but with words.</p>

<p>The rules are simple: if the player guesses a letter correctly and in the correct position, it is marked green. If the player guesses a letter correctly but in the wrong position, it is marked yellow. If the player guesses a letter that is not in the word (or guesses it more times than it appears in the word), it is marked grey. Players have six turns to identify the “secret” word.</p>
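<p>As a minimal sketch of these rules, the function below scores a guess against a known answer. The name <code class="language-plaintext highlighter-rouge">feedback</code> and the digit encoding (<code class="language-plaintext highlighter-rouge">2</code> = green, <code class="language-plaintext highlighter-rouge">1</code> = yellow, <code class="language-plaintext highlighter-rouge">0</code> = grey) are assumptions made for illustration, not necessarily what the solver itself uses.</p>

```python
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """Score a guess: 2 = green, 1 = yellow, 0 = grey (assumed encoding)."""
    marks = ["0"] * len(guess)
    # Answer letters that were not matched in place can still earn a yellow.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    # First pass: exact matches are green.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            marks[i] = "2"
    # Second pass: a letter is yellow only while unmatched copies remain,
    # so guessing a letter too many times yields grey for the extras.
    for i, g in enumerate(guess):
        if marks[i] == "0" and unmatched[g] > 0:
            marks[i] = "1"
            unmatched[g] -= 1
    return "".join(marks)


print(feedback("SOARE", "WATER"))  # 00111
print(feedback("EERIE", "WATER"))  # 10100 (only one E is marked yellow)
```

<p>The two-pass structure is what handles repeated letters: greens are claimed first, and yellows are handed out only for the copies that are left over.</p>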

<p align="center">
  <img src="https://raw.githubusercontent.com/jluebeck/jluebeck.github.io/master/images/example_wordle.png" alt="Example Wordle" />
</p>

<p>Alright, so what? The game actually presents an interesting computational challenge: can a player always pick the right combination of words to win the game in six turns or fewer?
From a cursory check of <a href="https://twitter.com/search?q=wordle%20solver&amp;src=typed_query">Twitter</a>, there are dozens of people who have developed or are developing their own solvers for this game.</p>

<p>Wordle basically uses a Scrabble dictionary as the basis for the words the user can guess. However, the developers decided that many of these words are too obscure (e.g. <code class="language-plaintext highlighter-rouge">VOZHD</code>) to be answers in the game, and thus draw the answers from a reduced set of 2315 words which are simple enough to be in the common lexicon.</p>

<p>Disappointingly, this reduced word list from which the answer may be drawn is available in the source code, and it appears that the game simply iterates over the list in ordered fashion, making it possible to immediately see which word will be selected the next day. :( While this doesn’t ruin the game really, it’s better to protect things that are not supposed to be known by a player in order to keep the game fun even for those who are curious enough to look at the source code. It’s the same reason magicians use curtains - without them the magic is gone.</p>

<p>Source-code cheating aside, the computational challenge still stands - how would you pick words so that you maximize your chances of winning the game?</p>

<p>One of the more basic strategies is to make picks based on (positional) letter frequencies - e.g. start with <code class="language-plaintext highlighter-rouge">AROSE</code> or similar words which contain a combination of the most frequent letters in all five letter words. This may get a user pretty far, but struggles on certain words which contain more infrequently used letters of the alphabet. The approach also struggles to reduce wordsets that are highly similar, such as</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>WATER
HATER
LATER
MATER
RATER
</code></pre></div></div>
<p>etc…</p>
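<p>To illustrate why positional letter frequencies stall on a set like this, here is a small sketch. The function name <code class="language-plaintext highlighter-rouge">positional_frequency_score</code> and the scoring rule (sum of per-position letter counts) are assumptions chosen for illustration.</p>

```python
from collections import Counter


def positional_frequency_score(word, candidates):
    """Sum, over positions, of how many candidates share the word's letter there."""
    counts = [Counter(c[i] for c in candidates) for i in range(len(word))]
    return sum(counts[i][ch] for i, ch in enumerate(word))


candidates = ["WATER", "HATER", "LATER", "MATER", "RATER"]
scores = {w: positional_frequency_score(w, candidates) for w in candidates}
print(scores)  # every word scores 21
```

<p>Every word gets the same score, so this heuristic has no basis for preferring one guess over another and can only try the candidates one at a time.</p>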

<p>A second strategy is to guess words which eliminate as many words as possible on each guess. There are multiple ways to accomplish this. One naive approach is to look at which words have any overlap with other words, and pick ones so that the overlapping and non-overlapping words are split as equally as possible, <em>a la</em> binary search.</p>

<p>This idea is in theory quite good. A more refined version of this idea goes as follows. Instead of a binary overlap/no overlap strategy, what if the method attempts to maximize the entropy of the possible ways clues could be returned from the guess?</p>

<p>That is, for a given guess and a set of candidate words which may still be the answer based on prior feedback, evaluate the feedback that would be returned if each candidate were the answer. This results in a collection of feedback strings, one between the guess and each of the candidates, e.g. <code class="language-plaintext highlighter-rouge">10010</code> or <code class="language-plaintext highlighter-rouge">22001</code> or <code class="language-plaintext highlighter-rouge">00000</code>, etc. Candidates which produce the same feedback string fall into the same bin, and the entropy is computed on the number of candidates assigned to each bin for the current guess. The best guess maximizes this entropy in order to give the greatest chance of reducing the candidate set by as much as possible.
This way, whichever feedback is returned by Wordle (the “oracle”), the remaining set of words is likely to be as small as possible. If the bins did not have maximum entropy (words not spread as evenly as possible), then the feedback would be more likely to land on a larger bin than a smaller one, and thus would not eliminate as many words.</p>
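<p>The binning and entropy computation described above can be sketched roughly as follows. The helper names, the digit encoding of feedback (<code class="language-plaintext highlighter-rouge">2</code> = green, <code class="language-plaintext highlighter-rouge">1</code> = yellow, <code class="language-plaintext highlighter-rouge">0</code> = grey) and the use of a base-2 logarithm are assumptions for illustration and may differ from the actual solver.</p>

```python
import math
from collections import Counter


def feedback(guess, answer):
    """Score a guess: 2 = green, 1 = yellow, 0 = grey (assumed encoding)."""
    marks = ["0"] * len(guess)
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            marks[i] = "2"
    for i, g in enumerate(guess):
        if marks[i] == "0" and unmatched[g] > 0:
            marks[i] = "1"
            unmatched[g] -= 1
    return "".join(marks)


def guess_entropy(guess, candidates):
    """Shannon entropy (bits) of the feedback bins the guess induces."""
    bins = Counter(feedback(guess, c) for c in candidates)
    n = len(candidates)
    return -sum((k / n) * math.log2(k / n) for k in bins.values())


def best_guess(guesses, candidates):
    """Pick the allowed guess whose feedback bins have maximum entropy."""
    return max(guesses, key=lambda g: guess_entropy(g, candidates))


candidates = ["WATER", "HATER", "LATER", "MATER"]
print(guess_entropy("WHELM", candidates))  # four bins of one word each: 2 bits
print(guess_entropy("WATER", candidates))  # bins of size 1 and 3: about 0.81 bits
print(best_guess(["WATER", "WHELM"], candidates))  # WHELM
```

<p>Note that <code class="language-plaintext highlighter-rouge">guesses</code> and <code class="language-plaintext highlighter-rouge">candidates</code> are kept separate on purpose: a guess does not have to be a possible answer to be the most informative choice.</p>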

<p>A diagram of how the method works is shown below:</p>

<p align="center">
  <img src="https://raw.githubusercontent.com/jluebeck/jluebeck.github.io/master/images/WordleSolver_v2.png" alt="Wordle solver schematic" width="800" align="center" />
</p>

<p>Here’s what the entropy distribution looks like for the top 15 hits in the full Wordle set. I found the maximum-entropy initial word for default Wordle is <code class="language-plaintext highlighter-rouge">SOARE</code> while for expanded wordlists it is <code class="language-plaintext highlighter-rouge">TARES</code>.</p>

<p align="center">
  <img src="https://raw.githubusercontent.com/jluebeck/jluebeck.github.io/master/images/entropies_full.png" alt="Wordle full set entropies" width="500" align="center" />
</p>

<p>Ties introduced by this maximum entropy method can be resolved with a few heuristics.</p>
<ol>
  <li>Check the non-overlap bin <code class="language-plaintext highlighter-rouge">00000</code> and pick the guess which minimizes its size.</li>
  <li>Pick a guess which is also a candidate (important when the remaining set is very small and few guesses are left).</li>
  <li>Maximize the entropy of the letters in the guess.</li>
  <li>Consider the positional frequencies of the guess letters in the remaining possible answers and maximize the total positional frequency.</li>
</ol>
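<p>One way to sketch these tie-breakers is as a single sort key. Applying them in the listed order, and the exact definitions of “letter entropy” and “total positional frequency” used here, are my assumptions for illustration; all function names are hypothetical.</p>

```python
import math
from collections import Counter


def feedback(guess, answer):
    """Score a guess: 2 = green, 1 = yellow, 0 = grey (assumed encoding)."""
    marks = ["0"] * len(guess)
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            marks[i] = "2"
    for i, g in enumerate(guess):
        if marks[i] == "0" and unmatched[g] > 0:
            marks[i] = "1"
            unmatched[g] -= 1
    return "".join(marks)


def letter_entropy(word):
    """Entropy of the letters within one word; repeated letters lower it."""
    n = len(word)
    return -sum((k / n) * math.log2(k / n) for k in Counter(word).values())


def positional_frequency(word, candidates):
    """Total count of candidates sharing the word's letter at each position."""
    counts = [Counter(c[i] for c in candidates) for i in range(len(word))]
    return sum(counts[i][ch] for i, ch in enumerate(word))


def tie_break_key(guess, candidates):
    all_grey = "0" * len(guess)
    grey_bin = sum(1 for c in candidates if feedback(guess, c) == all_grey)
    return (
        -grey_bin,                                # 1. smaller 00000 bin wins
        guess in candidates,                      # 2. prefer possible answers
        letter_entropy(guess),                    # 3. prefer varied letters
        positional_frequency(guess, candidates),  # 4. prefer common positions
    )


def break_ties(tied_guesses, candidates):
    """Choose among guesses that tied on feedback entropy."""
    return max(tied_guesses, key=lambda g: tie_break_key(g, candidates))


candidates = ["LATER", "HATER"]
# Both guesses split the two candidates evenly, so they tie on entropy.
print(break_ties(["LUCKY", "LATER"], candidates))  # LATER
```

<p>Here <code class="language-plaintext highlighter-rouge">LATER</code> wins on the first rule alone: it leaves nothing in the <code class="language-plaintext highlighter-rouge">00000</code> bin, and as a bonus it might be the answer.</p>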

<p>So, how well does this strategy perform?</p>

<p align="center">
  <img src="https://raw.githubusercontent.com/jluebeck/jluebeck.github.io/master/images/WordleResults_full.png" alt="Wordle solver results" width="800" />
</p>

<p>On the default Wordle answer set, <strong>the strategy always guesses the correct answer within six turns</strong> (<strong>100% win rate</strong>), and uses the starting word <code class="language-plaintext highlighter-rouge">SOARE</code>. However, since the Wordle developers use a reduced answer space
to make the game easier, how well does it do if the answer can be any of the 12927 words in the Wordle dictionary? How about for the Scrabble five-letter
words? In those two cases, starting with <code class="language-plaintext highlighter-rouge">TARES</code>, this strategy gets <strong><mark>99.67%</mark></strong> and <strong><mark>99.71%</mark></strong> of the words correct within six turns, respectively.</p>
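<p>A win rate like this can be measured by playing the strategy against every possible answer. The sketch below does so on a tiny made-up word list, so it does not reproduce the numbers above; the function names, the feedback encoding and the rule of guessing the last remaining candidate directly are assumptions for illustration.</p>

```python
import math
from collections import Counter


def feedback(guess, answer):
    """Score a guess: 2 = green, 1 = yellow, 0 = grey (assumed encoding)."""
    marks = ["0"] * len(guess)
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            marks[i] = "2"
    for i, g in enumerate(guess):
        if marks[i] == "0" and unmatched[g] > 0:
            marks[i] = "1"
            unmatched[g] -= 1
    return "".join(marks)


def guess_entropy(guess, candidates):
    """Shannon entropy (bits) of the feedback bins the guess induces."""
    bins = Counter(feedback(guess, c) for c in candidates)
    n = len(candidates)
    return -sum((k / n) * math.log2(k / n) for k in bins.values())


def solve(answer, guesses, candidates, max_turns=6):
    """Return the number of turns used, or None if the game is lost."""
    remaining = list(candidates)
    for turn in range(1, max_turns + 1):
        if len(remaining) == 1:
            guess = remaining[0]  # only one word left, so guess it
        else:
            guess = max(guesses, key=lambda g: guess_entropy(g, remaining))
        if guess == answer:
            return turn
        # Keep only the candidates consistent with the observed feedback.
        pattern = feedback(guess, answer)
        remaining = [c for c in remaining if feedback(guess, c) == pattern]
    return None


answers = ["CATER", "EATER", "HATER", "LATER", "WATER"]  # toy answer set
guesses = answers + ["ELCHI"]                            # toy allowed guesses
turns = [solve(a, guesses, answers) for a in answers]
win_rate = sum(t is not None for t in turns) / len(answers)
print(turns, win_rate)  # [2, 2, 2, 2, 2] 1.0
```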

<p>One of the most revealing things about this method is that sometimes when the player is getting close to having an answer, it is better to take a step back to a guess which uses fewer correct letters, but which reduces the remaining search space by a larger amount. For instance, in our <code class="language-plaintext highlighter-rouge">WATER</code> example, if one knew <code class="language-plaintext highlighter-rouge">-ATER</code>, then the maximum entropy answer actually backs off and picks something like <code class="language-plaintext highlighter-rouge">ELCHI</code>, which eliminates <code class="language-plaintext highlighter-rouge">EATER</code>, <code class="language-plaintext highlighter-rouge">LATER</code>, <code class="language-plaintext highlighter-rouge">CATER</code> and <code class="language-plaintext highlighter-rouge">HATER</code> all in one go!</p>
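<p>To make the <code class="language-plaintext highlighter-rouge">-ATER</code> example concrete, the sketch below bins a handful of remaining candidates by the feedback each guess would produce. The candidate list is a made-up subset, and the function names and feedback encoding (<code class="language-plaintext highlighter-rouge">2</code> = green, <code class="language-plaintext highlighter-rouge">1</code> = yellow, <code class="language-plaintext highlighter-rouge">0</code> = grey) are assumptions for illustration.</p>

```python
from collections import Counter, defaultdict


def feedback(guess, answer):
    """Score a guess: 2 = green, 1 = yellow, 0 = grey (assumed encoding)."""
    marks = ["0"] * len(guess)
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            marks[i] = "2"
    for i, g in enumerate(guess):
        if marks[i] == "0" and unmatched[g] > 0:
            marks[i] = "1"
            unmatched[g] -= 1
    return "".join(marks)


def partition(guess, candidates):
    """Group candidates by the feedback string the guess would receive."""
    bins = defaultdict(list)
    for c in candidates:
        bins[feedback(guess, c)].append(c)
    return dict(bins)


remaining = ["CATER", "EATER", "HATER", "LATER", "WATER"]
print(partition("WATER", remaining))
# {'02222': ['CATER', 'EATER', 'HATER', 'LATER'], '22222': ['WATER']}
print(partition("ELCHI", remaining))
# {'10100': ['CATER'], '20000': ['EATER'], '10010': ['HATER'],
#  '11000': ['LATER'], '10000': ['WATER']}
```

<p>Guessing one of the candidates leaves four words in a single bin unless it happens to be right, while <code class="language-plaintext highlighter-rouge">ELCHI</code> puts every candidate in its own bin, so the following guess is guaranteed to be correct.</p>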

<p>There is also a fun variation of this, where the game is “evil” and picks a different word after you make your guess, making it as hard as possible to get the right answer. This version is actually extraordinarily fun and I recommend checking out one implementation of it <a href="https://qntm.org/files/wordle/">here</a>.</p>
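<p>A rough sketch of how such an “evil” opponent could work is below. It assumes the opponent simply answers with whichever feedback keeps the largest set of words alive; the linked implementation may use a different rule, and the function names and feedback encoding are hypothetical.</p>

```python
from collections import Counter, defaultdict


def feedback(guess, answer):
    """Score a guess: 2 = green, 1 = yellow, 0 = grey (assumed encoding)."""
    marks = ["0"] * len(guess)
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            marks[i] = "2"
    for i, g in enumerate(guess):
        if marks[i] == "0" and unmatched[g] > 0:
            marks[i] = "1"
            unmatched[g] -= 1
    return "".join(marks)


def adversarial_response(guess, candidates):
    """Return the feedback (and surviving words) of the largest bin."""
    bins = defaultdict(list)
    for c in candidates:
        bins[feedback(guess, c)].append(c)
    # Assumed rule: keep the biggest bin; ties go to the first one seen.
    pattern = max(bins, key=lambda p: len(bins[p]))
    return pattern, bins[pattern]


words = ["CATER", "EATER", "HATER", "LATER", "WATER"]
print(adversarial_response("WATER", words))
# ('02222', ['CATER', 'EATER', 'HATER', 'LATER'])
```

<p>Against an opponent like this, spreading the candidates evenly across bins matters even more, since the largest bin is always the one that survives.</p>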

<h3 id="wordlesolver">WordleSolver</h3>
<p><strong><a href="https://wordle-solver.herokuapp.com/">Here is a link to the web-app which runs this method.</a></strong></p>

<p>The source code for the solver is also available here: <a href="https://github.com/jluebeck/WordleSolver">https://github.com/jluebeck/WordleSolver</a></p>

<h4 id="acknowledgments">Acknowledgments:</h4>
<p>I’d like to thank Ben Pullman for good discussions about this problem.</p>

<hr />]]></content><author><name>Jens Luebeck</name><email>jluebeck(at)ucsd.edu</email></author><category term="games" /><category term="algorithms" /><summary type="html"><![CDATA[Maximum entropy solver for the “Wordle” problem.]]></summary></entry><entry><title type="html">AA Quickstart</title><link href="https://jluebeck.github.io/posts/AA_quickstart" rel="alternate" type="text/html" title="AA Quickstart" /><published>2020-12-16T00:00:00-08:00</published><updated>2020-12-16T00:00:00-08:00</updated><id>https://jluebeck.github.io/posts/blog-post</id><content type="html" xml:base="https://jluebeck.github.io/posts/AA_quickstart"><![CDATA[<h1 id="quickstart-guide-for-aa-available-on-github">Quickstart guide for AA available on GitHub.</h1>

<p>For information on getting started with AmpliconArchitect, please check out my <a href="https://github.com/jluebeck/PrepareAA/blob/master/GUIDE.md">quickstart guide</a>, designed to help new users get going with AA in a manner
which follows our known best practices and addresses frequently asked questions.</p>

<hr />]]></content><author><name>Jens Luebeck</name><email>jluebeck(at)ucsd.edu</email></author><category term="AmpliconArchitect" /><category term="ecDNA" /><category term="extrachromosomal dna" /><category term="AA" /><summary type="html"><![CDATA[Quickstart guide for AA available on GitHub.]]></summary></entry><entry><title type="html">AmpliconReconstructor</title><link href="https://jluebeck.github.io/posts/AR_post" rel="alternate" type="text/html" title="AmpliconReconstructor" /><published>2020-09-14T00:00:00-07:00</published><updated>2020-09-14T00:00:00-07:00</updated><id>https://jluebeck.github.io/posts/blog-post</id><content type="html" xml:base="https://jluebeck.github.io/posts/AR_post"><![CDATA[<h1 id="ampliconreconstructor-paper-out-in-nature-communications">AmpliconReconstructor paper out in <em>Nature Communications</em></h1>

<p>Check out the blog post at the <a href="https://cancercommunity.nature.com/posts/ampliconreconstructor-revealing-focally-amplified-rearrangements-in-cancer">Nature Cancer Community</a></p>

<hr />]]></content><author><name>Jens Luebeck</name><email>jluebeck(at)ucsd.edu</email></author><category term="AmpliconReconstructor" /><category term="Bionano" /><category term="ecDNA" /><category term="extrachromosomal dna" /><summary type="html"><![CDATA[AmpliconReconstructor paper out in Nature Communications]]></summary></entry></feed>