NMT with xFormers: Part 2

In the last post, I explored subword tokenization, data prep, and the training loop for a transformer-based NMT model. Although I had promised that the next post would begin implementing decoding strategies, I first wanted to explore the efficiency gains to be had by digging into batch samplers. I’ll start by describing the problem we’re trying to solve.

NMT with xFormers: Part 1

In this post, I explore the APIs of the wonderful new xFormers library while revisiting torchtext, which has matured a great deal since I last used it. The post serves as an exploration and, hopefully, as motivation to build tooling around these ideas, with the goal of a principled neural machine translation library that leverages optimized tooling under the hood. Let’s jump in.

Kantorovich’s Inequality

Kantorovich’s inequality is used in the convergence analysis of the method of steepest descent. Specifically, we concern ourselves with the optimization problem
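
As a sketch of the usual textbook setting (an assumption here: a quadratic objective with a symmetric positive definite matrix; the symbols $Q$, $b$, $\lambda_{\min}$, and $\lambda_{\max}$ are illustrative and not taken from the post itself), the problem and the inequality can be written as:

```latex
% Assumed setting: Q is symmetric positive definite, with smallest and
% largest eigenvalues \lambda_{\min} and \lambda_{\max}.
\[
  \min_{x \in \mathbb{R}^n} \; f(x) = \tfrac{1}{2}\, x^{\top} Q x - b^{\top} x
\]
% Kantorovich's inequality: for every nonzero x in R^n,
\[
  \frac{(x^{\top} x)^2}{(x^{\top} Q x)\,(x^{\top} Q^{-1} x)}
  \;\ge\;
  \frac{4\, \lambda_{\min} \lambda_{\max}}{(\lambda_{\min} + \lambda_{\max})^2}
\]
```

The right-hand side depends only on the extreme eigenvalues of $Q$, which is why the bound ties the convergence rate of steepest descent to the condition number $\lambda_{\max} / \lambda_{\min}$.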

Introduction to Scala

Internal course, The MITRE Corporation, MITRE Institute, 2017

An introductory professional course covering the Scala programming language.

Introduction to Reactive Microservices

Internal course, The MITRE Corporation, MITRE Institute, 2019

An intensive professional course covering the basics of domain-driven design, the Reactive Manifesto, and microservice design.