DistilBERT
DistilBERT is a distilled version of BERT that retains about 97% of BERT's language-understanding performance while being roughly 40% smaller and 60% faster at inference. Unlike BERT, it does not use token-type (segment) embeddings.
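The size difference is easy to check with the Hugging Face transformers library. The sketch below is illustrative only; it assumes the public "bert-base-uncased" and "distilbert-base-uncased" checkpoints and simply compares parameter counts.

```python
# Minimal sketch: compare BERT-base and DistilBERT parameter counts.
# Assumes the public Hugging Face checkpoints named below are available.
from transformers import AutoModel

bert = AutoModel.from_pretrained("bert-base-uncased")
distilbert = AutoModel.from_pretrained("distilbert-base-uncased")

# DistilBERT (~66M parameters) vs. BERT-base (~110M parameters).
print(f"BERT-base parameters:  {bert.num_parameters():,}")
print(f"DistilBERT parameters: {distilbert.num_parameters():,}")
```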
DistilBERT uses a technique called 'distillation', in which a smaller 'student' network is trained to approximate the behaviour of a larger 'teacher' network (here, Google's BERT). The idea is to obtain a compact general-purpose model that can then be fine-tuned into task-specific models.
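At the core of distillation is a soft-target loss: the student is trained to match the teacher's temperature-softened output distribution. The PyTorch sketch below shows only this term and uses illustrative names and a placeholder temperature; the actual DistilBERT training objective also combines it with the masked language modelling loss and a cosine embedding loss on the hidden states.

```python
# Sketch of the soft-target distillation loss: temperature-scaled KL divergence
# between the teacher's and student's output distributions. Names and the
# temperature value are illustrative, not the exact DistilBERT training code.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with the temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence pushes the student toward the teacher's distribution;
    # scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2
```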
Project Background
- Project: DistilBERT
- Authors: Victor Sanh, Julien Chaumond, and Thomas Wolf
- Initial Release: 2019
- Type: Transfer Learning
- GitHub: /distilbert.rst (53.3k stars, 9 contributors)
- Twitter: None
Applications
- Leverage the inductive biases learned by larger models during pretraining
- Prediction models with faster inference speed (see the sketch after this list)
- Knowledge distillation to reduce model size
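As a quick illustration of the faster-inference use case, the sketch below runs a distilled model through the transformers pipeline API. It assumes the public "distilbert-base-uncased-finetuned-sst-2-english" sentiment checkpoint; any other DistilBERT fine-tune would work the same way.

```python
# Sketch: sentiment classification with a distilled model via the pipeline API.
# Assumes the public DistilBERT SST-2 checkpoint named below.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

result = classifier("DistilBERT keeps most of BERT's accuracy at a fraction of the cost.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```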