MTMA22 Day 2: Hardening the API

After yesterday’s productive day of fighting with setuptools and cmake, we got a kind-of-awkward-to-use pymarian translator working. Our outstanding TODOs were to test the speed (to make sure the bindings don’t add terrible latency) and to clean up some of the APIs.

I started the day by benchmarking the existing OPUS MT models using the Marian command-line interfaces. For the Python driver, we wrote a script which tries to approximate the internal batching done by Marian:

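import sys
import itertools

from pymarian import Translator

# The Marian command-line options arrive as a single string in argv[1].
marian = Translator(sys.argv[1])

def batch(s, batch_size=1600):
    """Yield successive lists of up to batch_size items from the iterable s."""
    it = iter(s)
    while True:
        chunk = list(itertools.islice(it, batch_size))
        if not chunk:
            return
        yield chunk

# Read stripped lines from stdin and translate one newline-joined chunk at a time.
for b in batch(map(str.strip, sys.stdin)):
    print(marian.translate("\n".join(b)))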

We then compare the speed and translations on WMT20 en-de for both drivers with a tiny microbenchmarking script:

#!/usr/bin/env bash

function translate_native {
   sacrebleu -t wmt20 -l en-de --echo src | \
   spm_encode --model MTMA/source.spm | \
   ../build/temp.linux-x86_64-3.8/PyMarian/marian-decoder --quiet \
     -c MTMA/decoder.yml -b4 --mini-batch 16 --maxi-batch 100 -d 0 1 2 3 \
   | spm_decode --model MTMA/target.spm > translations_native.txt
}

function translate_pybind {
   sacrebleu -t wmt20 -l en-de --echo src | \
   spm_encode --model MTMA/source.spm | \
   python test_translate.py '--config MTMA/decoder.yml -b4 --mini-batch 16 --quiet --maxi-batch 100 -d 0 1 2 3' \
   | spm_decode --model  MTMA/target.spm > translations_pybind.txt
}

echo -n "Native: "
time translate_native
echo
echo -n "Pybind: "
time translate_pybind

diff translations_native.txt translations_pybind.txt

and find that the speeds are reasonably close and the translations are identical. This is a great start to the work day. :-)

The next thing to address was the overloaded pymarian.Translator.translate issue described in the last post. We added an overload to the underlying C++ that, for now, reuses the str-in, str-out method by joining its inputs on newlines and splitting the output on newlines. This is a bit of a hack, but it gives us the shape of the API we want and a direction for refactoring Marian in the future. It required learning a bit about overloading in pybind11, but that turned out to be surprisingly clean, so no big deal there.
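
The join-and-split trick is easy to picture in Python, even though the real overload lives in C++. Here is a minimal sketch, assuming a hypothetical translate_str function that stands in for the existing str-in, str-out method (it just uppercases its input); none of these names come from pymarian itself.

```python
from typing import List, Union


def translate_str(text: str) -> str:
    """Hypothetical stand-in for the str-in, str-out method: uppercases each line."""
    return "\n".join(line.upper() for line in text.split("\n"))


def translate(source: Union[str, List[str]]) -> Union[str, List[str]]:
    """Accept one string or a list of sentences and return the same shape."""
    if isinstance(source, str):
        return translate_str(source)
    if not source:
        # Avoid turning an empty list into a single empty sentence.
        return []
    # List case: join on newline, reuse the string path, split again.
    # This assumes no input sentence contains a newline of its own.
    return translate_str("\n".join(source)).split("\n")
```

The obvious weak spot is the newline assumption: a sentence with an embedded newline would come back as two sentences, which is part of why this is a hack rather than the final design.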

We also began replacing the cobbled-together interface between setuptools and cmake with scikit-build, which transparently handles the touchpoints between the two and compiles cmake extensions with clean option forwarding. Like most packaging nonsense, this took the longest to figure out and likely still requires some debugging.

Meanwhile, Liling Tan and Alex Muzio began working on a thin shim between Hugging Face’s interfaces for seq2seq and a mocked-up pymarian API. There is some discontinuity due to the types expected by the various libraries at inference time: HF transformers expects a dictionary containing indices (among other things), whereas pymarian expects a subword-segmented string. More thought needs to go into this, but we might just work around it by taking advantage of duck-typing. Another option is to simply create a transformers.Pipeline which only handles inference and has a more-or-less str-in, str-out interface. Otherwise this shim code looks nice.

Our next step is to solidify packaging, begin working on CI, and hopefully publish some artifacts. Stay tuned. :-)
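
To make the type mismatch in the shim concrete, here is a rough sketch of a str-in, str-out wrapper that also accepts an HF-style dictionary of indices. Everything in it is an assumption for illustration: FakeTranslator, StringPipeline, and the toy vocabulary are hypothetical and are not the actual shim or the pymarian API.

```python
from typing import Dict, List

# Toy vocabulary (hypothetical); a real model ships its own subword vocabulary.
ID_TO_PIECE = {0: "Hel", 1: "lo", 2: "world", 3: "</s>"}


class FakeTranslator:
    """Hypothetical stand-in for a pymarian-style translator (str in, str out)."""

    def translate(self, text: str) -> str:
        return text  # identity "translation", enough to show the data flow


class StringPipeline:
    """Hypothetical shim: accepts a segmented string or a dict with "input_ids"."""

    def __init__(self, translator, id_to_piece: Dict[int, str]):
        self.translator = translator
        self.id_to_piece = id_to_piece

    def ids_to_segmented(self, input_ids: List[int]) -> str:
        # Map indices back to subword pieces, dropping the end-of-sentence marker.
        pieces = [self.id_to_piece[i] for i in input_ids]
        return " ".join(p for p in pieces if p != "</s>")

    def __call__(self, inputs):
        if isinstance(inputs, str):
            return self.translator.translate(inputs)
        # Otherwise assume a mapping holding a batch of index lists.
        lines = [self.ids_to_segmented(ids) for ids in inputs["input_ids"]]
        return self.translator.translate("\n".join(lines)).split("\n")


pipe = StringPipeline(FakeTranslator(), ID_TO_PIECE)
print(pipe("Hel lo world"))
print(pipe({"input_ids": [[0, 1, 2, 3]]}))
```

Converting indices back to pieces only to have the translator re-index them is wasteful, which is one reason a dedicated inference-only pipeline may end up being the cleaner option.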