A Novel Transformer Pre-training Objective and a Novel Fine-Tuning Method for Abstractive Summarization
Pre-training Transformer has been widely used in many NLP tasks including document summarization. Researchers designed many different self-supervised objectives for their pre-training transformer models, then based on the seq2seq model to fine tune on these pre-trained Transformer models for downstream tasks. However, most researchers designed their self-supervised objectives for all NLP tasks, the ability of self-supervised objectives for a specific task such as abstractive document summary hasn’t been largely explored. This article designed a novel self-supervised objective MSLM (Mask Summary Language Model) for document summarization. MSLM uses labeled document summary corpus for pre-training, where some words have been removed/masked from the summary. The source text concatenates the masked summary as the input, while the output is the summary with the original words masked. The objective is to predict the masked words from the summary. We first pre-trained on three variants of MSLM that remove nouns, verbs, and all the other words from the summary respectively. We found that removing nouns from the summary obtained the best ROUGE score on the downstream abstractive document summarization task. Then, inspired by BERT (Devlin et al., 2018) and Roberta (Liu et al., 2019), we pre-trained the concatenation of MLM (Mask Language Model that first been proposed in BERT) and our best MSLM variant, we found that fine-tuning the model that pre-trained on the concatenation of MLM and MSLM obtained higher ROUGE score than the model that pre-trained on MLM only.