Blog posts

2022

NMT with xFormers: Part 2

7 minute read

Published:

In the last post I explored subword tokenization, data preparation, and the training loop for a transformer-based NMT model. Although I had promised that the next post would begin implementing decoding strategies, I first wanted to explore the efficiency gains to be had from batch samplers. I’ll start by describing the problem we’re trying to solve.

NMT with xFormers: Part 1

8 minute read

Published:

In this post I explore the APIs of the wonderful new xFormers library while revisiting torchtext, which has matured a great deal since I last used it. This serves as an exploration and, hopefully, as motivation to build tooling around these ideas, with the goal of a principled neural machine translation library that leverages optimized tooling under the hood. Let’s jump in.

2021

Kantorovich’s Inequality

2 minute read

Published:

Kantorovich’s inequality is an inequality used in convergence analysis of the method of steepest descent. Specifically, we concern ourselves with the optimization problem