The following paragraph is from page 331 of the textbook *Natural Language Processing* by Jacob Eisenstein. It mentions a certain type of task called downstream tasks, but provides no further examples or details about them.
> Learning algorithms like perceptron and conditional random fields often perform better with discrete feature vectors. A simple way to obtain discrete representations from distributional statistics is by clustering, so that words in the same cluster have similar distributional statistics. This can help in downstream tasks, by sharing features between all words in the same cluster. However, there is an obvious tradeoff: if the number of clusters is too small, the words in each cluster will not have much in common; if the number of clusters is too large, then the learner will not see enough examples from each cluster to generalize.
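As a hypothetical illustration of the clustering idea in the quoted paragraph (not from the book), one could cluster word co-occurrence vectors and use the resulting cluster IDs as discrete, shareable features. The toy corpus and the tiny k-means below are invented stand-ins for real distributional statistics and a real clustering algorithm:

```python
# Sketch: discrete features via clustering of distributional statistics.
# Corpus, window size, and k are all invented for illustration.
import numpy as np

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat and a dog played".split(),
]

vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Co-occurrence counts within a +/-1 word window: each word's row is
# its distributional vector.
counts = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in (i - 1, i + 1):
            if 0 <= j < len(sent):
                counts[idx[w], idx[sent[j]]] += 1

def kmeans(X, k, iters=20, seed=0):
    """Tiny k-means; returns a cluster ID per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(0)
    return labels

labels = kmeans(counts, k=3)
# Cluster IDs are the discrete features a downstream learner could share
# between words; e.g. "cat" and "dog" occur in similar contexts and so
# tend to land in the same cluster.
word2cluster = {w: int(labels[idx[w]]) for w in vocab}
print(word2cluster)
```

The tradeoff the book mentions shows up directly in the choice of `k`: with `k` close to the vocabulary size, each cluster is nearly a single word and nothing is shared.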
Which tasks in artificial intelligence or NLP are called downstream tasks?
1 Answer
In the context of self-supervised learning (which is also used in NLP), a downstream task is the task that you actually want to solve. This definition makes sense if you're familiar with transfer learning or self-supervised learning. In transfer learning, you first pre-train a model on some "general" dataset (e.g. ImageNet), which does not represent the task that you want to solve, but allows the model to learn some "general" features. Then you fine-tune this pre-trained model on the dataset that represents the actual problem you want to solve. This latter task/problem is what would be called, in the context of self-supervised learning, a downstream task. In this answer, I mention these downstream tasks.
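The pre-train-then-fine-tune pattern described above can be sketched in a few lines. This is a hypothetical illustration with invented data and a plain logistic regression standing in for a real model; the point is only that the downstream stage starts from the pre-trained parameters rather than from scratch:

```python
# Sketch of the transfer-learning pattern: pre-train on a large "general"
# dataset, then fine-tune the same parameters on the (smaller) downstream
# dataset you actually care about. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, w=None, steps=200, lr=0.5):
    """Logistic regression by gradient descent; passing w carries
    pre-trained parameters into fine-tuning."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

# Stage 1: "pre-training" on a large auxiliary task (a related labeling).
X_pre = rng.normal(size=(500, 5))
y_pre = (X_pre[:, 0] + X_pre[:, 1] > 0).astype(float)
w = train_logreg(X_pre, y_pre)

# Stage 2: fine-tuning on a small downstream dataset.
X_down = rng.normal(size=(20, 5))
y_down = (X_down[:, 0] > 0).astype(float)
w = train_logreg(X_down, y_down, w=w, steps=50)
```

With only 20 downstream examples, starting from the pre-trained `w` is what makes the second stage viable; the same two-stage shape applies when the "model" is a neural network and the pre-training task is language modeling.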
In the same book that you quote, the author also writes (section 14.6.2, Extrinsic evaluations, p. 339 of the book):

> Word representations contribute to downstream tasks like sequence labeling and document classification by enabling generalization across words. The use of distributed representations as features is a form of semi-supervised learning, in which performance on a supervised learning problem is augmented by learning distributed representations from unlabeled data (Miller et al., 2004; Koo et al., 2008; Turian et al., 2010). These pre-trained word representations can be used as features in a linear prediction model, or as the input layer in a neural network, such as a Bi-LSTM tagging model (§ 7.6). Word representations can be evaluated by the performance of the downstream systems that consume them: for example, GloVe embeddings are convincingly better than Latent Semantic Analysis as features in the downstream task of named entity recognition (Pennington et al., 2014). Unfortunately, extrinsic and intrinsic evaluations do not always point in the same direction, and the best word representations for one downstream task may perform poorly on another task (Schnabel et al., 2015).
>
> When word representations are updated from labeled data in the downstream task, they are said to be fine-tuned.
So, to me, having read this section of the book, it seems that the author uses the term "downstream task" as it is used in self-supervised learning. Examples of downstream tasks are thus:
- sequence labeling
- document classification
- named entity recognition
Tasks like training a model to learn word embeddings are not downstream tasks: they are not the ultimate tasks you want to solve, but are solved in order to solve other tasks (i.e. the downstream tasks).
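The book passage above describes pre-trained word vectors being consumed as features by a downstream classifier. Here is a hypothetical, self-contained sketch of that pipeline; the embedding values are invented stand-ins for vectors like GloVe, and the "downstream task" is a toy sentiment classifier:

```python
# Sketch: pre-trained word vectors as features for a downstream
# document classifier. Embedding values are invented for illustration.
import numpy as np

embeddings = {  # stand-ins for pre-trained (e.g. GloVe) vectors
    "good": np.array([1.0, 0.2]),
    "great": np.array([0.9, 0.1]),
    "bad": np.array([-1.0, 0.3]),
    "awful": np.array([-0.8, 0.2]),
}

def featurize(doc):
    """Average of word vectors: words with similar vectors yield
    similar features, which is what enables generalization."""
    return np.mean([embeddings[w] for w in doc.split()], axis=0)

# Downstream task: fit a linear sentiment classifier on two tiny "documents".
X = np.stack([featurize(d) for d in ["good great", "bad awful"]])
y = np.array([1.0, 0.0])
w = np.linalg.lstsq(np.c_[X, np.ones(2)], y, rcond=None)[0]

def predict(doc):
    f = featurize(doc)
    return int(f @ w[:2] + w[2] > 0.5)

print(predict("great"), predict("awful"))
```

Because the classifier sees embedding features rather than word identities, its decision transfers to any word whose pre-trained vector lies near the training words, which is the generalization the quoted passage refers to.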
- Are there any upstream tasks, as opposed to downstream tasks? – b.g., Oct 23, 2022