MTMA22 Day 2: Hardening the API
Published:
After yesterday’s productive day of fighting with setuptools and cmake, we were able to get a kind-of-awkward-to-use pymarian translator working. Our outstanding TODOs were to test the speed (to ensure we aren’t incurring terrible latency by adding bindings) and to clean up some of the APIs.
I started the day by benchmarking the existing OPUS MT models using the Marian command-line interfaces. We wrote a script which tries to approximate the internal batching done by Marian for the Python driver:
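import sys
import itertools

from pymarian import Translator

# The Marian command-line options arrive as a single string in argv[1].
marian = Translator(sys.argv[1])


def batch(s, batch_size=1600):
    """Yield successive lists of at most batch_size items from s."""
    it = iter(s)
    while True:
        chunk = list(itertools.islice(it, batch_size))
        if not chunk:
            return
        yield chunk


# Translate stdin one batch at a time, joining each batch on newlines.
for b in batch(map(str.strip, sys.stdin)):
    print(marian.translate("\n".join(b)))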
We then compare the speed and translations on WMT20 en-de for both the native decoder and the Python bindings with a tiny microbenchmarking script:
#!/usr/bin/env bash
# bash rather than sh: the function keyword and timing a function are bashisms.

# Translate the WMT20 en-de source with the native marian-decoder binary.
function translate_native {
    sacrebleu -t wmt20 -l en-de --echo src | \
        spm_encode --model MTMA/source.spm | \
        ../build/temp.linux-x86_64-3.8/PyMarian/marian-decoder --quiet \
            -c MTMA/decoder.yml -b4 --mini-batch 16 --maxi-batch 100 -d 0 1 2 3 \
        | spm_decode --model MTMA/target.spm > translations_native.txt
}

# Translate the same input through the Python bindings.
function translate_pybind {
    sacrebleu -t wmt20 -l en-de --echo src | \
        spm_encode --model MTMA/source.spm | \
        python test_translate.py '--config MTMA/decoder.yml -b4 --mini-batch 16 --quiet --maxi-batch 100 -d 0 1 2 3' \
        | spm_decode --model MTMA/target.spm > translations_pybind.txt
}

echo -n "Native: "
time translate_native
echo
echo -n "Pybind: "
time translate_pybind

# Name both files explicitly so stray .txt files cannot break the comparison.
diff translations_native.txt translations_pybind.txt
and find that the speeds were reasonably close and the translations were identical. This is a great start to the work day. :-)
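When a bare diff is too coarse, the same equality check can be done in Python. This is only a minimal sketch, not part of our benchmark: first_mismatch is a hypothetical helper name, and it assumes the two inputs are iterables of lines such as open file objects.

```python
from itertools import zip_longest
from typing import Iterable, Optional, Tuple


def first_mismatch(
    native: Iterable[str], pybind: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (1-based line number, native line, pybind line) for the first
    differing line, or None when both inputs are identical.

    If one input is shorter, its missing line is reported as None.
    """
    for lineno, (a, b) in enumerate(zip_longest(native, pybind), start=1):
        if a != b:
            return lineno, a, b
    return None
```

Passing the open file objects for translations_native.txt and translations_pybind.txt should return None when the two outputs are identical, which is what the diff above reported.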
The next thing to address was the overloaded pymarian.Translator.translate issue described in the last post. We added an overloaded method to the underlying C++ which currently reuses the str-in, str-out method by joining on newline and splitting on newline. This is a bit of a hack, but it gives us the shape of the API we want and a direction for refactoring Marian in the future. This required learning a bit about overloading in Pybind11, but it’s surprisingly clean, so no big deal there.
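The join-and-split idea can be illustrated in plain Python. The real overload lives in the C++ bindings, so this is only a sketch of the behaviour: translate_any is a hypothetical helper, and fake_translate is a stand-in for the actual translator.

```python
from typing import Callable, List, Union


def translate_any(
    translate_str: Callable[[str], str],
    text: Union[str, List[str]],
) -> Union[str, List[str]]:
    """Accept either a single string or a list of strings.

    Hypothetical helper: the list case reuses the str-in, str-out function
    by joining the inputs on newline and splitting the result on newline.
    """
    if isinstance(text, str):
        return translate_str(text)
    if not text:
        # Nothing to translate; avoid turning [] into [""].
        return []
    joined = "\n".join(text)
    return translate_str(joined).split("\n")


def fake_translate(text: str) -> str:
    """Stand-in for the real translator: upper-cases each line."""
    return "\n".join(line.upper() for line in text.split("\n"))
```

With this stand-in, translate_any(fake_translate, ["guten tag", "hallo"]) returns a list of two strings. The hack only holds while no input sentence contains a newline itself, which is one reason it is a stopgap.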
We also began simplifying the cobbled-together interface between setuptools and cmake with scikit-build, which transparently handles the touchpoints and compiles cmake extensions with clean option forwarding. Like most packaging nonsense, this took the longest to figure out and likely still requires some debugging.
Meanwhile, Liling Tan and Alex Muzio began working on a thin shim between Hugging Face’s interfaces for seq2seq and a mocked-up pymarian API. There is some discontinuity due to the types expected by the various libraries at inference time: HF transformers expects a dictionary containing indices (among other things), whereas pymarian expects a subword-segmented string. More thought needs to go into this, but we might just work around it by taking advantage of duck typing. Another option is to simply create a transformers.Pipeline which only handles inference and has a more-or-less str-in, str-out interface. Otherwise this shim code looks nice. Our next step is to solidify packaging, begin working on CI, and hopefully publish some artifacts. Stay tuned. :-)