Neural Machine Translation

Thang Luong, Kyunghyun Cho and Christopher D. Manning


Neural Machine Translation (NMT) is a simple new architecture for getting machines to learn to translate. Despite being relatively new (Kalchbrenner and Blunsom, 2013; Cho et al., 2014; Sutskever et al., 2014), NMT has already shown promising results, achieving state-of-the-art performance for various language pairs (Luong et al., 2015a; Jean et al., 2015; Luong et al., 2015b; Sennrich et al., 2016; Luong and Manning, 2016). While many of these NMT papers were presented to the ACL community, research and practice of NMT are still at an early stage. This tutorial is a great opportunity for the whole machine translation and natural language processing community to learn more about a very promising new approach to MT. The tutorial has four parts.

In the first part, we start with an overview of MT approaches, including (a) traditional methods that have been dominant over the past twenty years and (b) recent hybrid models that use neural network components. From these, we motivate why an end-to-end approach like neural machine translation is needed. The second part introduces a basic instance of NMT. We start with a discussion of recurrent neural networks, including the back-propagation-through-time algorithm and stochastic gradient descent optimizers, as these are the foundation on which NMT builds. We then describe in detail the basic sequence-to-sequence architecture of NMT (Cho et al., 2014; Sutskever et al., 2014), the maximum likelihood training approach, and a simple beam-search decoder for producing translations.
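The beam-search decoder mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tutorial's own implementation: it assumes a hypothetical `step_fn` that returns the next-token distribution p(y_t | y_<t, x) for a given target prefix (in NMT this would be the decoder RNN conditioned on the encoded source sentence), and it scores each hypothesis by its summed log-probability, log p(y | x) = Σ_t log p(y_t | y_<t, x), the same quantity that maximum likelihood training maximizes on reference translations. The toy table and its probabilities are made up for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical decoder interface: maps the target prefix generated so far to a
# distribution over the next token, p(y_t | y_<t, x). In a real NMT system this
# would be the decoder RNN conditioned on the encoded source sentence x.
StepFn = Callable[[Tuple[str, ...]], Dict[str, float]]
Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens, summed log-probability)


def beam_search(step_fn: StepFn, beam_size: int = 3, max_len: int = 10,
                eos: str = "</s>") -> List[Hypothesis]:
    """Return hypotheses sorted by log p(y | x), best first."""
    beam: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Extend every live hypothesis by every token with nonzero probability.
        candidates: List[Hypothesis] = []
        for prefix, score in beam:
            for token, prob in step_fn(prefix).items():
                if prob > 0.0:
                    candidates.append((prefix + (token,), score + math.log(prob)))
        # Keep only the beam_size best extensions; completed ones leave the beam.
        candidates.sort(key=lambda h: h[1], reverse=True)
        beam = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beam.append((prefix, score))
        if not beam:
            break
    # If nothing produced </s> within max_len, return the unfinished beam.
    return sorted(finished or beam, key=lambda h: h[1], reverse=True)


# Toy next-token table with made-up probabilities (illustration only).
TOY_TABLE = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"cat": 0.5, "dog": 0.5},
    ("a",): {"cat": 0.9, "</s>": 0.1},
}


def toy_model(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Any prefix not in the table ends the sentence.
    return TOY_TABLE.get(prefix, {"</s>": 1.0})
```

With `beam_search(toy_model, beam_size=2)` the best hypothesis is "a cat </s>" with probability 0.4 × 0.9 = 0.36, whereas `beam_size=1`, which reduces to greedy decoding, commits to the locally more probable first word "the" and ends with probability 0.3. In this simple variant a finished hypothesis leaves the beam, so the beam shrinks as translations end; practical decoders often keep the beam full and normalize scores by length.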

The third part of our tutorial describes techniques for building state-of-the-art NMT systems. We start with approaches that extend the vocabulary coverage of NMT (Luong et al., 2015a; Jean et al., 2015; Chitnis and DeNero, 2015). We then introduce the idea of jointly learning both translations and alignments through an attention mechanism (Bahdanau et al., 2015); other variants of attention (Luong et al., 2015b; Tu et al., 2016) are discussed too. Next, we describe a recent trend in NMT, translating at the sub-word level (Chung et al., 2016; Luong and Manning, 2016; Sennrich et al., 2016), so that language variation can be handled effectively. We then give tips on training and testing NMT systems, such as batching and ensembling. In the final part of the tutorial, we briefly describe promising approaches, such as (a) how to combine multiple tasks to help translation (Dong et al., 2015; Luong et al., 2016; Firat et al., 2016; Zoph and Knight, 2016) and (b) how to utilize monolingual corpora (Sennrich et al., 2016). Lastly, we conclude with challenges that remain to be solved for future NMT.
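To make the attention idea concrete, here is a minimal pure-Python sketch of global attention with a dot-product score, one of the scoring functions compared by Luong et al. (2015b); Bahdanau et al. (2015) instead score each source position with a small feed-forward network. At each target step t, the decoder state h_t is compared with every encoder state h_s, the scores are normalized with a softmax into alignment weights a_t(s), and the context vector c_t = Σ_s a_t(s) h_s is their weighted average. The function names and plain-list vectors are illustrative assumptions; a real system would use batched tensor operations and feed c_t, together with h_t, into the prediction of the next word.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state: Vector,
                  encoder_states: List[Vector]) -> Tuple[List[float], Vector]:
    """Global attention with a dot-product score (illustrative sketch)."""
    # score(h_t, h_s) = h_t . h_s for every source position s
    scores = [sum(d * e for d, e in zip(decoder_state, h_s))
              for h_s in encoder_states]
    weights = softmax(scores)  # alignment weights a_t(s), summing to 1
    dim = len(decoder_state)
    # Context vector c_t = sum_s a_t(s) * h_s
    context = [sum(w * h_s[i] for w, h_s in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


# Example: the decoder state [2, 0] attends mostly to source positions 0 and 2.
weights, context = dot_attention([2.0, 0.0],
                                 [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

The weights can be read as a soft alignment between the current target word and the source words, which is what allows the model to learn translations and alignments jointly.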

PS: we would also like to acknowledge the very first paper by Forcada and Ñeco (1997) on sequence-to-sequence models for translation!


Tutorial Content

  1. Introduction - 30mins (Chris Manning)
    • Traditional approaches: statistical MT.
    • Hybrid approaches: neural network components.
    • End-to-end approaches: neural machine translation.
  2. Basic NMT - 60mins (Kyunghyun Cho)
    • Recurrent language modeling.
    • Training: maximum likelihood estimation with backpropagation through time.
    • Conditional recurrent language modeling: Encoder-Decoder.
    • Decoding strategies.
  3. Coffee Break - 30mins
  4. Advanced NMT - 60mins (Thang Luong)
    • Extending the vocabulary coverage.
    • Learning alignment: attention mechanism.
    • Handling language variations: character-level translation.
    • Tips & tricks: batching, ensembling.
  5. Closing - 30mins (Chris Manning)
    • Multi-task learning.
    • Unsupervised learning with monolingual data.
    • Future of NMT.
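For the ensembling tip, one common recipe is to run several independently trained models in lockstep and, at every decoding step, combine their next-token distributions. The sketch below takes a simple arithmetic mean of probabilities (averaging log-probabilities is another option used in practice); representing each distribution as a Python dictionary from token to probability, and the two example models, are assumptions made for readability.

```python
from typing import Dict, List

Distribution = Dict[str, float]  # token -> p(y_t | y_<t, x) from one model


def ensemble_step(distributions: List[Distribution]) -> Distribution:
    """Average several models' next-token distributions (uniform weights)."""
    if not distributions:
        raise ValueError("need at least one model distribution")
    vocab = set()
    for dist in distributions:
        vocab.update(dist)
    n = len(distributions)
    # A token a model does not list is treated as having probability 0 there.
    return {tok: sum(d.get(tok, 0.0) for d in distributions) / n
            for tok in vocab}


# Example with two hypothetical models that disagree on the next French word.
model_a = {"chat": 0.7, "chien": 0.3}
model_b = {"chat": 0.4, "chien": 0.6}
combined = ensemble_step([model_a, model_b])
best = max(combined, key=combined.get)
```

The combined distribution can then drive greedy or beam-search decoding exactly as a single model's distribution would.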

For updated information and materials, please refer to our tutorial website.

About the presenters 

Thang Luong
Stanford University

Thang Luong is currently a 5th-year PhD student in the Stanford NLP group under Prof. Christopher Manning. He has published papers in a range of NLP-related areas, such as digital libraries, machine translation, speech recognition, parsing, psycholinguistics, and word embedding learning. Recently, his main interest has shifted towards deep learning with sequence-to-sequence models for various NLP problems, especially neural machine translation. He has built neural machine translation systems that achieved state-of-the-art results on academic benchmarks, both at Google and at Stanford.

Kyunghyun Cho
Courant Institute of Mathematical Sciences and Center for Data Science, New York University

Kyunghyun Cho is an assistant professor in the Department of Computer Science and the Center for Data Science at New York University. He has worked on deep learning for natural language processing, language translation, image captioning, and a variety of other subjects, including the core methods of deep learning. He completed a post-doctoral fellowship at the Montreal Institute for Learning Algorithms. He earned a Ph.D. and an M.Sc. (with distinction) from the Aalto University School of Science, and a B.Sc. in Computer Science from the Korea Advanced Institute of Science and Technology.

Christopher Manning
Stanford University

Christopher Manning is a professor of computer science and linguistics at Stanford University. He works on software that can intelligently process, understand, and generate human language material. He is a leader in applying deep learning to natural language processing, including work on tree-recursive neural networks, sentiment analysis, neural network dependency parsing, the GloVe model of word vectors, neural machine translation, and deep language understanding. Manning is an ACM Fellow, an AAAI Fellow, and an ACL Fellow, and has coauthored leading textbooks on statistical natural language processing and information retrieval. He is a member of the Stanford NLP group (@stanfordnlp).

References

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. ICLR.

Rohan Chitnis and John DeNero. 2015. Variable-Length Word Encodings for Neural Translation Models. EMNLP.

Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP.

Junyoung Chung, Kyunghyun Cho, and Yoshua Bengio. 2016. A Character-level Decoder without Explicit Segmentation for Neural Machine Translation. ACL.

Daxiang Dong, Hua Wu, Wei He, Dianhai Yu, and Haifeng Wang. 2015. Multi-task learning for multiple language translation. ACL.

Orhan Firat, Kyunghyun Cho, and Yoshua Bengio. 2016. Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism. NAACL.

Mikel L. Forcada and Ramón Ñeco. 1997. Recursive hetero-associative memories for translation. In Biological and Artificial Computation: From Neuroscience to Technology, pages 453–462. Springer.

Sébastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. 2015. On using very large target vocabulary for neural machine translation. ACL.

Nal Kalchbrenner and Phil Blunsom. 2013. Recurrent continuous translation models. EMNLP.

Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, and Wojciech Zaremba. 2015a. Addressing the rare word problem in neural machine translation. ACL.

Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015b. Effective approaches to attention-based neural machine translation. EMNLP.

Minh-Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Łukasz Kaiser. 2016. Multi-task Sequence to Sequence Learning. ICLR.

Minh-Thang Luong and Christopher D. Manning. 2016. Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models. ACL.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Improving neural machine translation models with monolingual data. ACL.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. ACL.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. NIPS.

Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. 2016. Modeling Coverage for Neural Machine Translation. ACL.

Barret Zoph and Kevin Knight. 2016. Multi-source neural translation. NAACL.