Improved Automated Labeling of Mathematical Exercises in Japanese
Authors
- Taisei YAMAUCHI
- Ryosuke NAKAMOTO
- Yiling DAI
- Kyosuke TAKAMI
- Brendan Flanagan
- Hiroaki OGATA
Abstract
This study aims to improve the prediction quality of automatic labeling of learning materials. Labeling learning materials poses two open problems: achieving fully automated labeling and reducing the manual labor needed to assign labels to materials. These labels are used to analyze students' learning patterns, trace their knowledge, and recommend exercises to them. Because manually assigning several labels to a large number of learning materials is too burdensome, an automatic, algorithm-based labeling system is desirable. However, classification based on word embeddings has often yielded low accuracy for mathematics learning materials, whose texts are short. In this research, we conceived and implemented an improved approach that calculates the similarity of the n-grams of sentences using Jaccard coefficients, weights these similarities to create a vector representation, and uses that vector to predict the label of an exercise. We compared the accuracy and F score of the weighted n-gram similarity model with those of a state-of-the-art word embedding model and found that the n-gram approach was superior on both measures. Furthermore, we plotted the vectors obtained from each model in two-dimensional coordinates and observed that the n-gram model produced more flexible predictions regardless of where a vector was located. These results suggest that weighted n-gram similarity is effective for classifying materials that contain only a small amount of text.
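To make the pipeline described above concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the n-gram unit, the weights, or how a label is read off the vector, so several assumptions are made here. The sketch uses character n-grams, since Japanese text has no spaces between words. It uses hypothetical weights w_n for n = 1, 2, 3, combined as s(x, y) = sum over n of w_n * J(G_n(x), G_n(y)), where J is the Jaccard coefficient and G_n is the set of n-grams. The vector for an exercise holds its weighted similarity to each labeled reference exercise, and a nearest-neighbour rule picks the label of the most similar reference. All function names, weights, and example labels are hypothetical.

```python
def char_ngrams(text: str, n: int) -> set[str]:
    """Set of character n-grams of `text` (empty if the text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard coefficient |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def weighted_similarity(x: str, y: str, weights: dict[int, float]) -> float:
    """Weighted sum of per-n Jaccard similarities (hypothetical weighting scheme)."""
    return sum(w * jaccard(char_ngrams(x, n), char_ngrams(y, n))
               for n, w in weights.items())


def similarity_vector(x: str, references: list[str],
                      weights: dict[int, float]) -> list[float]:
    """Represent `x` by its weighted similarity to every labeled reference exercise."""
    return [weighted_similarity(x, r, weights) for r in references]


def predict_label(x: str, references: list[str], labels: list[str],
                  weights: dict[int, float]) -> str:
    """Assumed decision rule: the label whose best-matching reference is most similar."""
    best: dict[str, float] = {}
    for sim, label in zip(similarity_vector(x, references, weights), labels):
        best[label] = max(best.get(label, 0.0), sim)
    return max(best, key=best.get)


if __name__ == "__main__":
    # Hypothetical labeled reference exercises.
    refs = ["二次方程式を解きなさい", "三角形の面積を求めなさい", "次の二次関数のグラフをかきなさい"]
    labels = ["quadratic equations", "geometry", "quadratic functions"]
    w = {1: 0.2, 2: 0.3, 3: 0.5}  # hypothetical weights for n = 1, 2, 3
    print(predict_label("次の二次方程式を解きなさい", refs, labels, w))
    # expected: quadratic equations
```

Taking the maximum similarity per label, rather than the sum, keeps labels that have many reference exercises from dominating. Whether the original model uses this rule or a trained classifier on top of the vector is not stated in the abstract.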