
![Slide 2: What is Juman++ [Morita+ EMNLP2015]. Example input: 外国人参政権. Candidate segmentation 外国 (foreign) / 人参 (carrot) / 政権 (regime): p(外国,人参,政権) = 0.252 x 10^-8. Candidate segmentation 外国 (foreign) / 人 (person) / 参政 (vote) / 権 (right): p(外国,人,参政,権) = 0.761 x 10^-7. Transition probabilities shown on the lattice: 0.399, 0.00003, 0.002, 0.0021, 0.909. Main idea: use a Recurrent Neural Network Language Model to consider semantic plausibility in addition to the usual model score.](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fnlp2018export-180314025133%2f75%2fJuman-v2-A-Practical-and-Modern-Morphological-Analyzer-2-2048.jpg&f=jpg&w=240)
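
The two path probabilities quoted on the slide can be compared directly. The snippet below is a minimal sketch that uses only those two values; the variable names are illustrative and are not taken from Juman++ itself.

```python
import math

# Path probabilities quoted on the slide for two segmentations of 外国人参政権.
p_carrot = 0.252e-8   # 外国 / 人参 / 政権   (foreign / carrot / regime)
p_vote = 0.761e-7     # 外国 / 人 / 参政 / 権 (foreign / person / vote / right)

# Compare in log space, which is the usual choice for very small probabilities.
log_gap = math.log(p_vote) - math.log(p_carrot)
ratio = math.exp(log_gap)

# Pick the segmentation with the higher language-model probability.
best = "外国/人/参政/権" if p_vote > p_carrot else "外国/人参/政権"
print(f"preferred: {best}, ratio = {ratio:.1f}")
```

Under these numbers the semantically plausible reading (foreign / person / vote / right) is roughly 30 times more probable than the "carrot regime" reading, which is the effect the slide attributes to the language model.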











![Slide 14: Quiz: Where is the bottleneck? 1. Dictionary lookup/lattice construction: trie lookup, dictionary access. 2. Feature computation: hashing, many conditionals. 3. Score computation: Score += weights[feature & (length - 1)]; 4. Output: formatting, string concatenation.](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fnlp2018export-180314025133%2f75%2fJuman-v2-A-Practical-and-Modern-Morphological-Analyzer-14-2048.jpg&f=jpg&w=240)
![Slide 15: Quiz: Where is the bottleneck? 1. Dictionary lookup/lattice construction: trie lookup. 2. Feature computation: hashing, many conditionals. 3. Score computation: Score += weights[feature & (length - 1)]; 4. Output: formatting, string concatenation. Answer: array access (not the sum) was taking ~80% of all time. Reason: L2 cache/dTLB misses.](/image.pl?url=https%3a%2f%2fimage.slidesharecdn.com%2fnlp2018export-180314025133%2f75%2fJuman-v2-A-Practical-and-Modern-Morphological-Analyzer-15-2048.jpg&f=jpg&w=240)
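
The score-computation line on the slide, `Score += weights[feature & (length - 1)]`, can be sketched as follows. This is an illustration in Python, not the Juman++ source: the function name, the toy weight table, and the feature hashes are all assumptions. The mask trick only works when the table size is a power of two, which the sketch checks explicitly.

```python
def score_features(weights, feature_hashes):
    """Sum hashed feature weights, indexing the table with a bit mask.

    Hypothetical helper for illustration; assumes len(weights) is a power of two,
    so that `h & (length - 1)` equals `h % length` for non-negative h.
    """
    length = len(weights)
    if length == 0 or (length & (length - 1)) != 0:
        raise ValueError("weight table size must be a power of two")
    mask = length - 1
    score = 0.0
    for h in feature_hashes:
        # Hashed indices are scattered across the whole table; with a large
        # table, this lookup is the memory access the slide identifies as costly.
        score += weights[h & mask]
    return score


# Toy weight table of size 8 and three made-up feature hashes.
weights = [0.5, -1.0, 2.0, 0.25, 0.0, 1.5, -0.5, 3.0]
feature_hashes = [10, 17, 255]   # masked indices: 2, 1, 7
total = score_features(weights, feature_hashes)
print(total)
```

The arithmetic itself is trivial. As the slide notes, the time goes into the array access, because hashed indices land at effectively random positions and each lookup in a large table can miss the L2 cache and the data TLB.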












Juman++ v2 is a fast and accurate morphological analyzer for Japanese. It combines a linear model with a recurrent neural network language model to segment text into words and assign part-of-speech tags. Version 2 improves speed by optimizing the dictionary representation and by reducing the search space through "global beam" pruning. It achieves state-of-the-art accuracy on standard datasets while using a smaller model than prior versions. Future work will focus on improving performance on informal language and on integration with other Japanese language resources.























