RoBERTa (Robustly optimized BERT approach) is a project that aggregates improvements to the original BERT pretraining recipe into a single configuration.

Thanks to longer training and a larger dataset, RoBERTa shows substantial improvement over BERT (Large) results. It is trained on longer sequences with an improved pretraining procedure, which leads to better downstream performance. RoBERTa also drops the next sentence prediction objective used in the BERT configuration.

RoBERTa builds on BERT’s language masking strategy, wherein the system learns to predict intentionally hidden sections of text within otherwise unannotated language examples.
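
The masking idea described above can be illustrated with a minimal sketch. This is not RoBERTa's actual preprocessing code: the whitespace tokenization, the `mask_tokens` helper, the 15% masking rate, and the `<mask>` placeholder string are all assumptions chosen for illustration, and real pretraining pipelines operate on subword tokens and apply additional replacement rules.

```python
import random

MASK_TOKEN = "<mask>"  # assumed placeholder string for hidden tokens


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Build one masked-language-modeling example from unlabelled tokens.

    Returns (masked, labels). labels[i] holds the original token at every
    masked position and None everywhere else, so the training targets come
    from the text itself rather than from manual annotation.
    """
    if not tokens:
        return [], []
    rng = random.Random(seed)
    # Mask at least one position so every example has a prediction target.
    n_to_mask = max(1, round(len(tokens) * mask_prob))
    positions = set(rng.sample(range(len(tokens)), n_to_mask))
    masked = [MASK_TOKEN if i in positions else tok
              for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None
              for i, tok in enumerate(tokens)]
    return masked, labels


# Hypothetical usage: whitespace splitting stands in for a real tokenizer.
tokens = "the model learns to predict hidden words from context".split()
masked, labels = mask_tokens(tokens, seed=0)
print(masked)
print(labels)
```

Calling `mask_tokens` again with a different seed hides different positions in the same sentence, which is one simple way to see how many distinct training examples a single unannotated sentence can provide.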

Project Background

  • Project: RoBERTa
  • Authors: Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
  • Initial Release: 2019
  • Type: Machine Learning
  • Contains: BERT (Large) architecture
  • Language: Python
  • GitHub: /roberta with 14.3k stars


  • Filling masks
  • Reducing dependency on data labelling, which consumes a lot of time and resources
  • Pronoun disambiguation
  • Realigning word-level tokenization