Representation learning of natural language and its application to language understanding and generation
Gong, Hongyu
Description
- Title
- Representation learning of natural language and its application to language understanding and generation
- Author(s)
- Gong, Hongyu
- Issue Date
- 2020-04-15
- Director of Research (if dissertation) or Advisor (if thesis)
- Bhat, Suma
- Doctoral Committee Chair(s)
- Bhat, Suma
- Committee Member(s)
- Viswanath, Pramod
- Srikant, Rayadurgam
- Hwu, Wen-mei
- Fanti, Giulia
- Department of Study
- Electrical & Computer Engineering
- Discipline
- Electrical & Computer Engineering
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Date of Ingest
- 2020-08-26T23:54:33Z
- Keyword(s)
- Natural Language Processing
- Representation Learning
- Language Understanding
- Language Generation
- Abstract
- How to properly represent language is a crucial and fundamental problem in Natural Language Processing (NLP). Language representation learning aims to encode rich information, such as the syntax and semantics of a language, into dense vectors, which facilitates the modeling, manipulation, and analysis of natural language in computational linguistics. Existing algorithms use corpus statistics such as word co-occurrences to learn general-purpose language representations. Recent advances in generic representation integrate richer information, such as contextualized features, from unlabeled text corpora. In this dissertation, we continue this line of research to incorporate rich knowledge into generic embeddings. We show that word representations can be enriched with various kinds of information, including temporal and spatial variations as well as syntactic functionalities, and that text representations can be refined with topical knowledge. Moreover, we develop insight into the geometry of pre-trained representations and connect it to semantic understanding tasks such as identifying idiomatic word usage. Beyond generic representation, task-dependent representation is also extensively studied in downstream applications, where representations are trained to encode domain information from labeled datasets. This dissertation leverages the capability of neural network models to integrate task-specific supervision into language representations. We introduce new deep learning models and algorithms to train representations with the external knowledge in annotated data. We show that the learned representations can assist in various downstream tasks in language understanding, such as text classification, and in language generation, such as text style transfer.
- Graduation Semester
- 2020-05
- Type of Resource
- Thesis
- Permalink
- http://hdl.handle.net/2142/108110
- Copyright and License Information
- Copyright 2020 Hongyu Gong