GN-TRVN: A Benchmark for Vietnamese Table Markdown Retrieval Task
Published in ICISN, 2025
Information retrieval data often comes as plain text rather than semi-structured text such as HTML and Markdown, which makes retrieving richly formatted content such as tables non-trivial. In this paper, we tackle this challenge by introducing a new dataset, GreenNode Table Retrieval VN (GN-TRVN). Compared to ViQuAD2.0, it is collected from a massive corpus, covers a wider range of topics, and has longer contexts. To evaluate the effectiveness of the proposed dataset, we introduce two model versions, M3-GN-VN and M3-GN-VN-Mixed, obtained by fine-tuning the M3-Embedding model on this dataset. Experimental results show that our models consistently outperform the baselines, including the base model, on most evaluation criteria across several datasets, including VieQuADRetrieval, ZacLegalTextRetrieval, and GN-TRVN. Overall, we release a more comprehensive dataset and two model versions that improve performance on Vietnamese Markdown table retrieval.
Recommended citation: Pham, B.L., Hoang, Q.V., Luu, Q.T., Vo, T.T. (2026). GN-TRVN: A Benchmark for Vietnamese Table Markdown Retrieval Task. In: Thi Dieu Linh, N., Yu, S., Selamat, A., Tran, D.T. (eds) Proceedings of the Fifth International Conference on Intelligent Systems and Networks. ICISN 2025. Lecture Notes in Networks and Systems, vol 1596. Springer, Singapore. https://doi.org/10.1007/978-981-95-1746-6_17
