Semantic Representations of Word Senses and Concepts
José Camacho-Collados, Ignacio Iacobacci, Roberto Navigli and Mohammad Taher Pilehvar
Representing the semantics of linguistic items in a machine-interpretable form has been a major goal of Natural Language Processing since its earliest days. Among the range of different linguistic items, words have attracted the most research attention. However, word representations have an important limitation: they conflate different meanings of a word into a single vector. Representations of word senses have the potential to overcome this inherent limitation. Indeed, the representation of individual word senses and concepts has recently gained in popularity, with several experimental results showing that a considerable performance improvement can be achieved across different NLP applications upon moving from the word level to the deeper sense and concept levels. Another interesting point regarding the representation of concepts and word senses is that these models can be seamlessly applied to other linguistic items, such as words, phrases and sentences.
This tutorial will first provide a brief overview of the recent literature concerning word representation (both count-based and neural network-based). It will then describe the advantages of moving from the word level to the deeper level of word senses and concepts, providing an extensive review of state-of-the-art systems. Approaches covered will include not only those which draw upon knowledge resources such as WordNet, Wikipedia, BabelNet or Freebase as reference, but also the so-called multi-prototype approaches, which learn sense distinctions by using different clustering techniques. Our tutorial will discuss the advantages and potential limitations of all approaches, showing their most successful applications to date. We will conclude by presenting current open problems and lines of future work.
1. Semantic Representation: Foundations (25 minutes)
This session provides the necessary background for semantic representation. We will briefly cover the traditional vector space model (Turney and Pantel, 2010), followed by the more recent approaches based on neural networks (Mikolov et al., 2013). We then explain why semantic representations are needed at the deeper word sense level, focusing on the main limitation of word-based approaches, namely their inherent ambiguity. Finally, we show how sense-based representations overcome this limitation, hence providing improvements across several tasks.
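To make the ambiguity problem concrete, the following minimal sketch builds count-based co-occurrence vectors from a toy corpus. The corpus, the stopword list and the window size are invented for illustration and are not taken from any of the cited works. Because both senses of "bank" feed the same vector, that single vector ends up related to both financial and river contexts.

```python
from collections import Counter, defaultdict
import math

# Toy corpus (invented for illustration): "bank" occurs in two senses.
CORPUS = [
    "the bank approved the loan",
    "she deposited money at the bank",
    "they fished from the river bank",
    "the river bank was muddy",
]
# Hypothetical stopword list, only large enough for this corpus.
STOPWORDS = {"the", "at", "from", "was", "she", "they"}


def cooccurrence_vectors(sentences, window=2):
    """Map each content word to counts of content words within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = [t for t in sentence.split() if t not in STOPWORDS]
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = cooccurrence_vectors(CORPUS)
bank = vectors["bank"]

# The single "bank" vector mixes financial and river contexts.
print(sorted(bank.items()))
print("bank ~ money:", round(cosine(bank, vectors["money"]), 3))
print("bank ~ river:", round(cosine(bank, vectors["river"]), 3))
```

A word-level model has no way to separate the two groups of context words here; a sense-level model would assign them to different vectors.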
2. Knowledge-based representations (65 minutes)
We start this session by briefly introducing some of the most popular lexical knowledge resources that provide sense inventories and that have been used by different sense representation techniques. We put emphasis on WordNet (Miller et al., 1990), the de facto standard sense inventory in the community, and Wikipedia, the largest collaboratively constructed resource of its type, both of which have been extensively used by many researchers in the area. We discuss the advantages each of these resources provides and show how they are usually viewed as semantic networks and exploited for representation purposes.
Then, we provide an in-depth review of different techniques that learn representations for individual concepts in a target sense inventory. We cover all the existing approaches that model concepts in WordNet (Pilehvar and Navigli, 2015), articles in Wikipedia (Hassan and Mihalcea, 2011), or concepts in larger sense inventories such as BabelNet (Iacobacci et al., 2015; Camacho-Collados et al., 2015) or Freebase (Bordes et al., 2013). We will also cover some approaches that make use of additional external corpora (or word representations learned on the basis of statistical clues) besides the target knowledge resource (Chen et al., 2014; Chen et al., 2015; Johansson and Nieto Piña, 2015; Jauhar et al., 2015; Rothe and Schütze, 2015). We discuss the advantages of these knowledge-based representations and focus on how neural network-based learning has played a role in this area in the past few years.
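As one illustration of how a semantic network can be exploited for representation purposes, the sketch below runs a personalized random walk (Personalized PageRank by power iteration) from a concept node and uses the resulting distribution over nodes as that concept's vector. The graph, node names and parameter values are hypothetical, and this is only one graph-based option, not a reimplementation of any system cited above.

```python
# Hypothetical toy semantic network: nodes are concepts, edges are relations.
# The two senses of "bank" are separate nodes with separate neighbourhoods.
GRAPH = {
    "bank#finance": ["money", "loan"],
    "money": ["bank#finance", "loan"],
    "loan": ["bank#finance", "money"],
    "bank#river": ["river", "shore"],
    "river": ["bank#river", "shore"],
    "shore": ["bank#river", "river"],
}


def personalized_pagerank(graph, seed, damping=0.85, iterations=30):
    """Return a probability distribution over nodes for walks restarting at `seed`."""
    rank = {node: 0.0 for node in graph}
    rank[seed] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        nxt[seed] = 1.0 - damping  # restart mass always returns to the seed
        for node, neighbours in graph.items():
            share = damping * rank[node] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        rank = nxt
    return rank


finance_vector = personalized_pagerank(GRAPH, "bank#finance")
river_vector = personalized_pagerank(GRAPH, "bank#river")

for node in GRAPH:
    print(f"{node:13s} {finance_vector[node]:.3f} {river_vector[node]:.3f}")
```

Each sense receives its own vector, with mass concentrated on the part of the network that is relevant to that sense.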
3. Multi-prototype representations (45 minutes)
In this session we cover the so-called multi-prototype techniques that learn multiple representations per word, each corresponding to a specific meaning of the word. We will illustrate how these approaches leverage clustering algorithms to partition the contexts of a word into clusters, one for each of its meanings (Reisinger and Mooney, 2010; Huang et al., 2012; Neelakantan et al., 2014; Tian et al., 2014; Guo et al., 2014; Wu and Giles, 2015; Liu et al., 2015; Vu and Parker, 2016; Šuster et al., 2016).
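The general idea of clustering contexts can be sketched as follows. The code groups bag-of-words contexts of an ambiguous word with a simple k-means-style loop over cosine similarity and treats each cluster centroid as one sense prototype. The contexts, the number of clusters and the deterministic initialization from the first k contexts are assumptions made for this example; the cited systems differ in their clustering algorithms and features.

```python
from collections import Counter
import math

# Hypothetical contexts in which the target word "bank" was observed.
CONTEXTS = [
    ["loan", "money", "approved"],
    ["river", "water", "muddy"],
    ["money", "deposit", "loan"],
    ["river", "fishing", "water"],
    ["loan", "interest", "money"],
    ["water", "muddy", "fishing"],
]


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_contexts(contexts, k, iterations=10):
    """Assign each context to one of k clusters; return assignments and centroids."""
    vectors = [Counter(context) for context in contexts]
    # Assumption: initialise centroids with the first k contexts for determinism.
    centroids = [Counter(v) for v in vectors[:k]]
    assignment = [None] * len(vectors)
    for _ in range(iterations):
        new_assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Recompute each centroid as the sum of its member contexts.
        centroids = [Counter() for _ in range(k)]
        for v, c in zip(vectors, assignment):
            centroids[c].update(v)
    return assignment, centroids


assignment, prototypes = cluster_contexts(CONTEXTS, k=2)
print(assignment)
for index, prototype in enumerate(prototypes):
    print(index, prototype.most_common(3))
```

Each resulting prototype plays the role of one representation of the word; no external sense inventory is involved, which is the main contrast with the knowledge-based techniques.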
4. Advantages and limitations of knowledge-based and multi-prototype approaches (10 minutes)
This session reviews some of the advantages and limitations of the knowledge-based and multi-prototype techniques, describing the applications for which they are suitable and mentioning some issues such as the knowledge acquisition bottleneck.
5. Applications (20 minutes)
This session focuses on different applications of sense representations. We briefly mention some of the main applications and tasks to which sense representations can be applied. Sense representations may be used in virtually every task in which word representations have traditionally been applied. Examples of such tasks include automatic thesaurus generation (Crouch, 1988; Curran and Moens, 2002), information extraction (Laender et al., 2002), semantic role labelling (Erk, 2007; Pennacchiotti et al., 2008), word similarity (Deerwester et al., 1990; Turney et al., 2003; Radinsky et al., 2011; Mikolov et al., 2013) and word clustering (Pantel and Lin, 2002). We will compare the performance of word and sense representations, discussing the advantages and limitations of each approach. Moreover, we will show how sense representations can also be applied to a wide variety of additional tasks such as entity linking and word sense disambiguation (Navigli, 2009; Chen et al., 2014; Camacho-Collados et al., 2015b; Rothe and Schütze, 2015), sense clustering (Snow et al., 2007; Camacho-Collados et al., 2015a), alignment of lexical resources (Niemann and Gurevych, 2011; Navigli and Ponzetto, 2012; Pilehvar and Navigli, 2014), taxonomy learning (Espinosa-Anke et al., 2016), lexical substitution (McCarthy and Navigli, 2009), or sense-based semantic similarity (Budanitsky and Hirst, 2006; Pilehvar et al., 2013; Iacobacci et al., 2015), to name a few.
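As a small example of applying sense representations to a word-level task, the sketch below scores word similarity by comparing all sense pairs of the two words and keeping the closest pair. This "closest senses" strategy is one common choice and is shown here under stated assumptions: the sense labels and three-dimensional vectors are invented, and real systems use vectors learned by the techniques described in the previous sessions.

```python
import math
from itertools import product

# Hypothetical sense inventory with made-up 3-dimensional sense vectors.
SENSE_VECTORS = {
    "bank": {"bank#finance": [0.9, 0.1, 0.0], "bank#river": [0.0, 0.2, 0.9]},
    "money": {"money#currency": [0.8, 0.3, 0.1]},
    "shore": {"shore#land": [0.1, 0.1, 0.95]},
}


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def closest_senses(word1, word2):
    """Return (similarity, sense1, sense2) for the most similar pair of senses."""
    pairs = product(SENSE_VECTORS[word1].items(), SENSE_VECTORS[word2].items())
    return max((cosine(v1, v2), s1, s2) for (s1, v1), (s2, v2) in pairs)


for other in ("money", "shore"):
    score, sense1, sense2 = closest_senses("bank", other)
    print(f"bank vs {other}: {score:.3f} via {sense1} / {sense2}")
```

Besides a similarity score, the procedure also returns which senses were selected, so the comparison implicitly disambiguates the two words with respect to each other.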
6. Open Problems and Future Work in Semantic Representation (15 minutes)
This last session provides a summary of possible directions of future work on semantic sense representation. We discuss various problems associated with the current representation approaches and propose lines of research in order to effectively apply sense representations in natural language understanding tasks.
José Camacho-Collados is a Google Doctoral Fellow and PhD student at the Sapienza University of Rome (http://wwwusers.di.uniroma1.it/~collados/), working under the supervision of Prof. Roberto Navigli. His research focuses on Natural Language Processing and on the area of lexical semantics in particular. He has developed NASARI, a novel semantic vector representation for concepts and named entities (http://lcl.uniroma1.it/nasari/), which led to two publications at NAACL and ACL 2015. José will co-organize a SemEval 2017 task on multilingual semantic similarity. His educational background includes an Erasmus Mundus Master in Natural Language Processing and Human Language Technology and a 5-year BSc degree in Mathematics.
Ignacio Iacobacci is a PhD student at the Sapienza University of Rome (https://iiacobac.wordpress.com/), working under the supervision of Prof. Roberto Navigli. His research interests lie in the fields of Machine Learning, Natural Language Processing and Neural Networks. He is currently working on Word Sense Disambiguation and Distributional Semantics. At ACL 2015 Ignacio presented SensEmbed, a novel approach to word and relational similarity that exploits semantic knowledge to model arbitrary word senses in a large sense inventory. His background includes an MSc in Computer Science and 8 years as a developer, including 4 years as a Machine Learning and NLP specialist.
Roberto Navigli is an Associate Professor in the Department of Computer Science at La Sapienza University of Rome and a member of the Linguistic Computing Laboratory (http://wwwusers.di.uniroma1.it/~navigli/). His research interests lie in the field of Natural Language Processing, including Word Sense Disambiguation and Induction, Ontology Learning, Knowledge Representation and Acquisition, and multilinguality. In 2007 he received a Ph.D. in Computer Science from La Sapienza and was awarded the Marco Cadoli 2007 AI*IA national prize for the Best Ph.D. Thesis in Artificial Intelligence. In 2013 he received the Marco Somalvico AI*IA prize, awarded every two years to the best young Italian researcher in Artificial Intelligence. He is the creator and founder of BabelNet (http://www.babelnet.org), both a multilingual encyclopedic dictionary and a semantic network, and its related project Babelfy (http://www.babelfy.org), a state-of-the-art multilingual disambiguation and entity linking system. He is also the Principal Investigator of MultiJEDI (http://multijedi.org/), a 1.3M euro 5-year Starting Grant funded by the European Research Council, and is responsible for the Sapienza unit in LIDER, an EU project on content analytics and language technologies. He is also the Co-PI of "Language Understanding cum Knowledge Yield" (LUcKY), a Google Focused Research Award on Natural Language Understanding.
Mohammad Taher Pilehvar is a Research Associate in the Language Technology Lab of the University of Cambridge (http://www.pilevar.com/taher/), where he is currently working on NLP in the biomedical domain. Taher completed his PhD in 2015 under the supervision of Prof. Roberto Navigli. Taher’s research lies in lexical semantics, mainly focusing on semantic representation, semantic similarity, and Word Sense Disambiguation. He has co-organized two SemEval tasks and has authored multiple conference and journal papers on semantic representation and similarity in top-tier venues. He is the first author of a paper on semantic similarity that was nominated for the best paper award at ACL 2013.
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O., 2013. Translating Embeddings for Modeling Multi-relational Data. In: Advances in Neural Information Processing Systems 26 (NIPS 2013). pp. 2787–2795.
Budanitsky, A., Hirst, G., 2006. Evaluating WordNet-based measures of Lexical Semantic Relatedness. Computational Linguistics 32 (1), 13–47.
Camacho-Collados, J., Pilehvar, M. T., Navigli, R., 2015a. NASARI: a Novel Approach to a Semantically-Aware Representation of Items. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2015). Denver, Colorado. pp. 567–577.
Camacho-Collados, J., Pilehvar, M. T., Navigli, R., 2015b. A Unified Multilingual Semantic Representation of Concepts. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Beijing, China, pp. 741–751.
Chen, T., Xu, R., He, Y., Wang, X., 2015. Improving Distributed Representation of Word Sense via WordNet Gloss Composition and Context Clustering. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Short Papers). Beijing, China. pp. 15–20.
Chen, X., Liu, Z., Sun, M., 2014. A unified model for word sense representation and disambiguation. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. EMNLP '14. Doha, Qatar. pp. 1025–1035.
Crouch, C. J., 1988. A cluster-based approach to thesaurus construction. In: Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '88. pp. 309–320.
Curran, J. R., Moens, M., 2002. Improvements in automatic thesaurus extraction. In: Proceedings of the ACL-02 Workshop on Unsupervised Lexical Acquisition - Volume 9. ULA '02. pp. 59–66.
Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., Harshman, R. A., 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41 (6), 391–407.
Erk, K., 2007. A simple, similarity-based model for selectional preferences. In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic.
Espinosa-Anke, L., Saggion, H., Ronzano, F., Navigli, R., 2016. ExTaSem! Extending, Taxonomizing and Semantifying Domain Terminologies. In: Proceedings of AAAI. Phoenix, USA. pp. 2594–2600.
Guo, J., Che, W., Wang, H., Liu, T., 2014. Learning Sense-specific Word Embeddings By Exploiting Bilingual Resources. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. Dublin, Ireland. pp. 497–507.
Hassan, S., Mihalcea, R., 2011. Semantic relatedness using salient semantic analysis. In: Proceedings of AAAI. pp. 884–889.
Huang, E. H., Socher, R., Manning, C. D., Ng, A. Y., 2012. Improving Word Representations via Global Context and Multiple Word Prototypes. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1. Jeju Island, Korea. pp. 873–882.
Iacobacci, I., Pilehvar, M. T., Navigli, R., 2015. SensEmbed: Learning Sense Embeddings for Word and Relational Similarity. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Beijing, China. pp. 95–105.
Jauhar, S. K., Dyer, C., Hovy, E., 2015. Ontologically Grounded Multi-sense Representation Learning for Semantic Vector Space Models. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-2015. pp. 683–693.
Johansson, R., Nieto Piña, L., 2015. Embedding a Semantic Network in a Word Space. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2015). Denver, Colorado. pp. 1428–1433.
Laender, A. H. F., Ribeiro-Neto, B. A., da Silva, A. S., Teixeira, J. S., 2002. A brief survey of web data extraction tools. SIGMOD Rec. 31 (2), 84–93.
Liu, Y., Liu, Z., Chua, T., Sun, M., 2015. Topical Word Embeddings. In: Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI'15). Austin, Texas.
McCarthy, D., Navigli, R., 2009. The English lexical substitution task. Language Resources and Evaluation 43 (2), 139–159.
Mikolov, T., Chen, K., Corrado, G., Dean, J., 2013. Efficient estimation of word representations in vector space. CoRR abs/1301.3781. URL http://arxiv.org/abs/1301.3781
Miller, G. A., Beckwith, R., Fellbaum, C. D., Gross, D., Miller, K., 1990. WordNet: an online lexical database. International Journal of Lexicography 3 (4), 235–244.
Navigli, R., 2009. Word Sense Disambiguation: A survey. ACM Computing Surveys 41 (2), 1–69.
Navigli, R., Ponzetto, S. P., 2012. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence 193, 217–250.
Neelakantan, A., Shankar, J., Passos, A., McCallum, A., 2014. Efficient Non-parametric Estimation of Multiple Embeddings per Word in Vector Space. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. EMNLP '14. Doha, Qatar. pp. 457–465.
Niemann, E., Gurevych, I., 2011. The people's web meets linguistic knowledge: automatic sense alignment of Wikipedia and WordNet. In: Proceedings of the Ninth International Conference on Computational Semantics. pp. 205–214.
Pantel, P., Lin, D., 2002. Discovering word senses from text. In: Proceedings of KDD. pp. 613–619.
Pennacchiotti, M., De Cao, D., Basili, R., Croce, D., Roth, M., 2008. Automatic induction of FrameNet lexical units. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. EMNLP '08. pp. 457–465.
Pilehvar, M. T., Jurgens, D., Navigli, R., 2013. Align, Disambiguate and Walk: a Unified Approach for Measuring Semantic Similarity. In: Proceedings of ACL. pp. 1341–1351.
Pilehvar, M. T., Navigli, R., 2014. A robust approach to aligning heterogeneous lexical resources. In: Proceedings of ACL. pp. 468–478.
Pilehvar, M. T., Navigli, R., 2015. From senses to texts: An all-in-one graph-based approach for measuring semantic similarity. Artificial Intelligence 228, 95–128.
Radinsky, K., Agichtein, E., Gabrilovich, E., Markovitch, S., 2011. A word at a time: Computing word relatedness using temporal semantic analysis. In: Proceedings of the 20th International Conference on World Wide Web. WWW '11. pp. 337–346.
Reisinger, J., Mooney, R. J., 2010. Multi-Prototype Vector-Space Models of Word Meaning. In: Proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-2010). Los Angeles, California. pp. 109–117.
Rothe, S., Schütze, H., 2015. AutoExtend: Extending word embeddings to embeddings for synsets and lexemes. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Beijing, China. pp. 1793–1803.
Snow, R., Prakash, S., Jurafsky, D., Ng, A. Y., 2007. Learning to merge word senses. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL). Prague, Czech Republic, pp. 1005–1014.
Šuster, S., Titov, I., van Noord, G., 2016. Bilingual Learning of Multi-sense Embeddings with Discrete Autoencoders. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics. San Diego, USA. pp. 1346–1356.
Tian, F., Dai, H., Bian, J., Gao, B., Zhang, R., Chen, E., Liu, T., 2014. A Probabilistic Model for Learning Multi-Prototype Word Embeddings. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. Dublin, Ireland. pp. 151–160.
Turney, P. D., Littman, M. L., Bigham, J., Shnayder, V., 2003. Combining independent modules to solve multiple-choice synonym and analogy problems. In: Proceedings of Recent Advances in Natural Language Processing. Borovets, Bulgaria, pp. 482–489.
Turney, P. D., Pantel, P., 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research 37, 141–188.
Vu, T., Parker, D. S., 2016. K-Embeddings: Learning Conceptual Embeddings for Words using Context. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics. San Diego, USA. pp. 1262–1267.
Wu, Z., Giles, C. L., 2015. Sense-aware Semantic Analysis: A Multi-Prototype Word Representation Model Using Wikipedia. In: Proceedings of AAAI. Austin, Texas.