Blog posts


NMT with xFormers: Part 2

7 minute read


In the last post I explored subword tokenization, data prep, and the training loop for a transformer-based NMT model. While I had promised the next post would begin implementing decoding strategies, I wanted to explore efficiency gains to be had by digging into batch samplers. I’ll start by describing the problem we’re trying to solve.

NMT with xFormers: Part 1

8 minute read


In this post I explore the APIs of the wonderful new xFormers library while revisiting torchtext which has undergone a great deal of maturation since I last used it. This serves as an exploration and hopefully some motivation to develop tooling around the ideas explored here in hopes of developing a principled neural machine translation library that leverages optimized tooling under the hood. Let’s jump in.


Kantorovich’s Inequality

2 minute read


Kantorovich’s inequality is an inequality used in convergence analysis of the method of steepest descent. Specifically, we concern ourselves with the optimization problem