Entities can be expressed in diverse formats, such as texts, images, or column names and cell values in tables. While existing entity linking (EL) models work well in per-modality configurations, such as text-only EL, visual grounding, or schema linking, it is more challenging to design a unified model for diverse modality configurations. To bring various modality configurations together, we constructed a benchmark for diverse-modal EL (DMEL) from existing EL datasets, covering all three modalities: text, image, and table. To approach the DMEL task, we proposed a generative diverse-modal model (GDMM) following a multimodal-encoder-decoder paradigm. Pre-training GDMM with rich corpora builds a solid foundation for DMEL without storing the entire KB for inference. Fine-tuning GDMM builds a stronger DMEL baseline, outperforming state-of-the-art task-specific EL models by 8.51 F1 score on average. Additionally, extensive error analyses are conducted to highlight the challenges of DMEL, facilitating future research on this task.
Sijia Wang, Alexander Hanbo Li, Henghui Zhu, Sheng Zhang, Pramuditha Perera, Chung-Wei Hang, Jie Ma, William Yang Wang, Zhiguo Wang, Vittorio Castelli, Bing Xiang, and Patrick Ng. 2023. Benchmarking Diverse-Modal Entity Linking with Generative Models. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7841–7857, Toronto, Canada. Association for Computational Linguistics.
@inproceedings{wang-etal-2023-benchmarking,
    title = "Benchmarking Diverse-Modal Entity Linking with Generative Models",
    author = "Wang, Sijia and
      Li, Alexander Hanbo and
      Zhu, Henghui and
      Zhang, Sheng and
      Perera, Pramuditha and
      Hang, Chung-Wei and
      Ma, Jie and
      Wang, William Yang and
      Wang, Zhiguo and
      Castelli, Vittorio and
      Xiang, Bing and
      Ng, Patrick",
    editor = "Rogers, Anna and
      Boyd-Graber, Jordan and
      Okazaki, Naoaki",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.497/",
    doi = "10.18653/v1/2023.findings-acl.497",
    pages = "7841--7857",
    abstract = "Entities can be expressed in diverse formats, such as texts, images, or column names and cell values in tables. While existing entity linking (EL) models work well on per modality configuration, such as text-only EL, visual grounding or schema linking, it is more challenging to design a unified model for diverse modality configurations. To bring various modality configurations together, we constructed a benchmark for diverse-modal EL (DMEL) from existing EL datasets, covering all three modalities including text, image and table. To approach the DMEL task, we proposed a generative diverse-modal model (GDMM) following a multimodal-encoder-decoder paradigm. Pre-training GDMM with rich corpora builds a solid foundation for DMEL without storing the entire KB for inference. Fine-tuning GDMM builds a stronger DMEL baseline, outperforming state-of-the-art task-specific EL models by 8.51 F1 score on average. Additionally, extensive error analyses are conducted to highlight the challenge of DMEL, facilitating future researches on this task."
}
%0 Conference Proceedings
%T Benchmarking Diverse-Modal Entity Linking with Generative Models
%A Wang, Sijia
%A Li, Alexander Hanbo
%A Zhu, Henghui
%A Zhang, Sheng
%A Perera, Pramuditha
%A Hang, Chung-Wei
%A Ma, Jie
%A Wang, William Yang
%A Wang, Zhiguo
%A Castelli, Vittorio
%A Xiang, Bing
%A Ng, Patrick
%Y Rogers, Anna
%Y Boyd-Graber, Jordan
%Y Okazaki, Naoaki
%S Findings of the Association for Computational Linguistics: ACL 2023
%D 2023
%8 July
%I Association for Computational Linguistics
%C Toronto, Canada
%F wang-etal-2023-benchmarking
%X Entities can be expressed in diverse formats, such as texts, images, or column names and cell values in tables. While existing entity linking (EL) models work well on per modality configuration, such as text-only EL, visual grounding or schema linking, it is more challenging to design a unified model for diverse modality configurations. To bring various modality configurations together, we constructed a benchmark for diverse-modal EL (DMEL) from existing EL datasets, covering all three modalities including text, image and table. To approach the DMEL task, we proposed a generative diverse-modal model (GDMM) following a multimodal-encoder-decoder paradigm. Pre-training GDMM with rich corpora builds a solid foundation for DMEL without storing the entire KB for inference. Fine-tuning GDMM builds a stronger DMEL baseline, outperforming state-of-the-art task-specific EL models by 8.51 F1 score on average. Additionally, extensive error analyses are conducted to highlight the challenge of DMEL, facilitating future researches on this task.
%R 10.18653/v1/2023.findings-acl.497
%U https://aclanthology.org/2023.findings-acl.497/
%U https://doi.org/10.18653/v1/2023.findings-acl.497
%P 7841-7857