Margin-aware Unsupervised Domain Adaptation for Cross-lingual Text Labeling

Abstract

Unsupervised domain adaptation addresses the problem of leveraging labeled data in a source domain to learn a well-performing model in a target domain where labels are unavailable. In this paper, we follow recent theoretical work and adopt the Margin Disparity Discrepancy (MDD) unsupervised domain adaptation algorithm to solve cross-lingual text labeling problems. Experiments on cross-lingual document classification and named entity recognition (NER) demonstrate that the proposed domain adaptation approach achieves significant improvements over state-of-the-art results. We further improve MDD by efficiently optimizing the margin loss on the source domain via Virtual Adversarial Training (VAT). This bridges the gap between the theoretical results and the loss function actually used in the original MDD work, and hence boosts performance considerably. Our numerical results also indicate that VAT can generally improve the generalization performance on both domains for different domain adaptation approaches.
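To make the VAT component concrete, below is a minimal PyTorch sketch of the standard VAT regularizer (Miyato et al., 2018): it finds the local input perturbation the model is most sensitive to via one step of power iteration, then penalizes the divergence between predictions on clean and perturbed inputs. This is an illustration under assumptions, not the paper's implementation; `model` is assumed to be a differentiable classifier returning logits, `x` a continuous input (for text, this would typically be the embedding layer output), and the hyperparameter names `xi`, `eps`, and `n_power` are hypothetical.

```python
import torch
import torch.nn.functional as F

def vat_loss(model, x, xi=1e-6, eps=1.0, n_power=1):
    """Virtual Adversarial Training smoothness loss (sketch).

    Penalizes the KL divergence between the model's predictions on x
    and on x plus its most sensitive local perturbation.
    """
    with torch.no_grad():
        # Clean predictions serve as a fixed target distribution.
        pred = F.softmax(model(x), dim=1)

    # Start power iteration from a random unit-norm direction.
    d = torch.randn_like(x)
    d = d / d.flatten(1).norm(dim=1).view(-1, *[1] * (x.dim() - 1))

    for _ in range(n_power):
        d.requires_grad_()
        pred_hat = model(x + xi * d)
        adv_kl = F.kl_div(
            F.log_softmax(pred_hat, dim=1), pred, reduction="batchmean"
        )
        # Gradient w.r.t. d approximates the dominant eigenvector of
        # the local Hessian, i.e. the most sensitive direction.
        grad = torch.autograd.grad(adv_kl, d)[0]
        d = grad / grad.flatten(1).norm(dim=1).view(-1, *[1] * (x.dim() - 1))
        d = d.detach()

    # Final adversarial perturbation and the VAT loss term.
    pred_hat = model(x + eps * d)
    return F.kl_div(F.log_softmax(pred_hat, dim=1), pred, reduction="batchmean")
```

In training, this term would be added to the supervised (or, here, margin) loss with a weighting coefficient; because it needs no labels, it can be applied on both source and target inputs.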

Publication
Findings of the Association for Computational Linguistics: EMNLP 2020
Dejiao Zhang
Senior Applied Scientist