Why this matters
Wordle looks simple, but from an algorithmic standpoint it is a hard search problem with a huge discrete action space and delayed rewards. Most existing solvers rely on information theory and handcrafted heuristics; little prior work tests whether deep reinforcement learning (DRL) can learn to play Wordle directly from feedback. This project explored how far an Advantage Actor–Critic (A2C) agent can get toward solving Wordle as the rules, word length, and dictionary size vary.
What we built
As a team, we implemented a Wordle environment and an A2C training pipeline in PyTorch Lightning. We built both character-level and word-level agents for 4-, 5-, and 6-letter variants of the game. The models use compact state encodings that track remaining turns, previously used letters, and per-letter status (green / yellow / gray), and they output either per-character probabilities or a word-level policy over the allowed vocabulary. We designed custom reward functions that score newly revealed greens, yellows, and grays, added entropy regularization to encourage exploration, and introduced dictionary reduction to shrink the effective action space for word-level agents.
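To make the feedback and reward design concrete, here is a minimal sketch of the standard two-pass Wordle feedback rule together with a shaped reward that scores only newly revealed letters. The function names and the reward weights (1.0 / 0.5 / 0.1) are illustrative assumptions, not the project's exact values.

```python
WORD_LEN = 5

def feedback(guess: str, answer: str) -> list[str]:
    """Per-position status: 'g' (green), 'y' (yellow), '-' (gray)."""
    status = ["-"] * WORD_LEN
    remaining = list(answer)
    for i, (g, a) in enumerate(zip(guess, answer)):  # mark greens first
        if g == a:
            status[i] = "g"
            remaining.remove(g)
    for i, g in enumerate(guess):  # then yellows against leftover letters
        if status[i] == "-" and g in remaining:
            status[i] = "y"
            remaining.remove(g)
    return status

def shaped_reward(status: list[str], guess: str, seen: set[str]) -> float:
    """Score only letters not already revealed on earlier turns."""
    reward = 0.0
    for s, g in zip(status, guess):
        if g in seen:
            continue
        seen.add(g)
        reward += {"g": 1.0, "y": 0.5, "-": 0.1}[s]
    return reward

status = feedback("crane", "cigar")          # ['g', 'y', 'y', '-', '-']
first = shaped_reward(status, "crane", set())
```

Tracking already-seen letters in `seen` is what makes the reward score *unique* discoveries: repeating a known letter on a later turn earns nothing, which discourages the agent from farming the same feedback.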
Key findings
On the full action space (thousands of possible words), the character-level A2C agent was the only model that achieved a reasonable win rate across 4-, 5-, and 6-letter games, though it often needed many turns and did not always play valid English words. Word-level models struggled on the unrestricted dictionary, getting stuck in local minima, but their performance improved significantly when we restricted the action space to smaller subsets of words. Overall, the project showed both the promise and the limitations of off-the-shelf DRL methods on combinatorial word games like Wordle, and highlighted clear directions for future work in reward shaping, curriculum over action-space size, and better exploration.
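One simple way to realize this kind of action-space restriction for a word-level agent is to mask the policy logits so that only a chosen subset of the dictionary can ever be sampled. The sketch below is an illustrative assumption about the mechanism, not the project's actual code; the helper name and the toy 8-word vocabulary are made up.

```python
import numpy as np

def masked_policy(logits: np.ndarray, allowed: np.ndarray) -> np.ndarray:
    """Softmax over `logits`, with disallowed words forced to probability 0."""
    masked = np.where(allowed, logits, -np.inf)
    z = masked - masked.max()          # stabilize before exponentiation
    e = np.exp(z)                      # exp(-inf) == 0, so masked words vanish
    return e / e.sum()

logits = np.zeros(8)                    # uniform scores over an 8-word toy vocab
allowed = np.zeros(8, dtype=bool)
allowed[:3] = True                      # reduced dictionary: first 3 words only
probs = masked_policy(logits, allowed)  # all mass lands on the allowed words
```

Shrinking `allowed` is the knob for a curriculum over action-space size: start training on a small subset, then grow it as the policy improves.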