Deokhyung Kang

Ph.D. Student, POSTECH


I’m a Ph.D. student in the Natural Language Processing (NLP) group, working with Prof. Gary Geunbae Lee at POSTECH, South Korea. My primary research interest lies in multilingual language processing, with a special focus on semantic parsing. I aim to expand semantic parsing systems across multiple languages while maintaining their reasoning capabilities.

Beyond semantic parsing, my research interests include multilingual NLP, question answering, and information retrieval. I am particularly fascinated by the challenges of enabling robust reasoning across diverse linguistic landscapes.

Previously, I completed my B.S.E. in Computer Science and Engineering at POSTECH.

news

Feb 25, 2025 Released a new preprint! “Retrieval-Augmented Fine-Tuning With Preference Optimization For Visual Program Generation” explores generating Ladder Programs using LLMs. Check it out here!
Sep 21, 2024 Our paper, “Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing,” has been accepted at EMNLP 2024, the first EMNLP paper of my Ph.D.!
Feb 20, 2024 Our paper, “Denoising Table-Text Retrieval for Open-Domain Question Answering,” has been accepted to LREC-COLING 2024.

selected publications

  1. “Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing” (EMNLP 2024). Recent efforts have aimed to utilize multilingual pretrained language models (mPLMs) to extend semantic parsing (SP) across multiple languages without requiring extensive annotations. However, achieving zero-shot cross-lingual transfer for SP remains challenging, leading to a performance gap between source and target languages. In this study, we propose Cross-Lingual Back-Parsing (CBP), a novel data augmentation methodology designed to enhance cross-lingual transfer for SP. Leveraging the representation geometry of the mPLMs, CBP synthesizes target language utterances from source meaning representations. Our methodology effectively performs cross-lingual data augmentation in challenging zero-resource settings, by utilizing only labeled data in the source language and monolingual corpora. Extensive experiments on two cross-language SP benchmarks (Mschema2QA and Xspider) demonstrate that CBP brings substantial gains in the target language. Further analysis of the synthesized utterances shows that our method successfully generates target language utterances with high slot value alignment rates while preserving semantic integrity. Our codes and data are publicly available at https://github.com/deokhk/CBP. A toy sketch of the augmentation loop appears after this list.

  2. “Denoising Table-Text Retrieval for Open-Domain Question Answering” (LREC-COLING 2024). In table-text open-domain question answering, a retriever system retrieves relevant evidence from tables and text to answer questions. Previous studies in table-text open-domain question answering have two common challenges: firstly, their retrievers can be affected by false-positive labels in training datasets; secondly, they may struggle to provide appropriate evidence for questions that require reasoning across the table. To address these issues, we propose Denoised Table-Text Retriever (DoTTeR). Our approach involves utilizing a denoised training dataset with fewer false positive labels by discarding instances with lower question-relevance scores measured through a false positive detection model. Subsequently, we integrate table-level ranking information into the retriever to assist in finding evidence for questions that demand reasoning across the table. To encode this ranking information, we fine-tune a rank-aware column encoder to identify minimum and maximum values within a column. Experimental results demonstrate that DoTTeR significantly outperforms strong baselines on both retrieval recall and downstream QA tasks. Our code is available at https://github.com/deokhk/DoTTeR. A toy sketch of the denoising step also appears after this list.
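To make the CBP recipe from the first paper concrete, below is a minimal, hypothetical Python sketch of the augmentation loop it describes: the meaning representations come from source-language annotations, and a synthesizer produces target-language utterances to pair with them. The `Example` class and `synthesize_utterance` stub are illustrative assumptions, not the paper's components (there, the synthesizer is built on the mPLM's representation geometry and monolingual corpora); the actual implementation is in the linked repository.

```python
# Illustrative sketch only; the real implementation is at
# https://github.com/deokhk/CBP. `synthesize_utterance` is a hypothetical
# stand-in for the mPLM-based utterance synthesizer described in the paper.
from dataclasses import dataclass

@dataclass
class Example:
    utterance: str  # natural-language question
    mr: str         # meaning representation (e.g., SQL)
    lang: str       # language code of the utterance

def synthesize_utterance(mr: str, target_lang: str) -> str:
    """Hypothetical generator that produces a target-language utterance
    conditioned on a meaning representation (placeholder output here)."""
    return f"[{target_lang} utterance for: {mr}]"

def back_parse_augment(source_data: list[Example], target_lang: str) -> list[Example]:
    """Core CBP idea: reuse the *source* meaning representations, but pair
    them with synthesized target-language utterances, yielding silver
    training data without any target-language annotations."""
    return [
        Example(synthesize_utterance(ex.mr, target_lang), ex.mr, target_lang)
        for ex in source_data
    ]

if __name__ == "__main__":
    english = [Example("How many rivers are in Texas?",
                       "SELECT COUNT(*) FROM river WHERE state = 'texas'", "en")]
    # Silver German training pairs derived from English annotations only.
    for ex in back_parse_augment(english, "de"):
        print(ex.lang, "|", ex.utterance, "=>", ex.mr)
```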
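Similarly, here is a minimal, hypothetical sketch of DoTTeR's denoising step as described in the second abstract: score each labeled positive with a question-relevance model and discard instances that fall below a threshold. The `fp_detector_score` function is a crude token-overlap stand-in for the paper's trained false-positive detection model, and the 0.3 threshold is an arbitrary assumption; the real pipeline (including the rank-aware column encoder) lives in the linked repository.

```python
# Illustrative sketch only; the real implementation is at
# https://github.com/deokhk/DoTTeR. `fp_detector_score` is a hypothetical
# stand-in for the false-positive detection model from the paper.

def fp_detector_score(question: str, evidence: str) -> float:
    """Hypothetical question-relevance scorer in [0, 1]; here just the
    fraction of question tokens that also appear in the evidence."""
    q = set(question.lower().split())
    e = set(evidence.lower().split())
    return len(q & e) / max(len(q), 1)

def denoise(training_set: list[dict], threshold: float = 0.3) -> list[dict]:
    """Keep only instances whose labeled positive evidence scores above the
    relevance threshold, discarding likely false-positive labels."""
    return [ex for ex in training_set
            if fp_detector_score(ex["question"], ex["positive"]) >= threshold]

if __name__ == "__main__":
    data = [
        {"question": "Which team won the 2010 final?",
         "positive": "The 2010 final was won by Spain, the national team."},
        {"question": "Which team won the 2010 final?",
         "positive": "Stadium capacity statistics by year."},  # likely false positive
    ]
    print(len(denoise(data)), "of", len(data), "instances kept")
```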
