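The team also used two techniques to speed up training. The first is the common batch-size warmup. The second, inspired by Microsoft's Phi family of models, reuses the weights of the existing, well-performing ModernBERT-base model: the base model's weights are "tiled" to expand them into the larger model, which gives a better weight initialization.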
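To make the two ideas concrete, here is a minimal sketch in plain Python. It is not the ModernBERT team's implementation: the function names `batch_size_at` and `tile_weights`, the linear ramp, and the wrap-around tiling pattern are all assumptions chosen for illustration, since the text above does not specify the schedule or how the tiling is laid out.

```python
def batch_size_at(tokens_seen, start_bs, final_bs, warmup_tokens):
    """Hypothetical batch-size warmup: ramp linearly from start_bs to final_bs.

    The linear shape is an assumption; the source only says warmup is used.
    """
    if tokens_seen >= warmup_tokens:
        return final_bs
    frac = tokens_seen / warmup_tokens
    return int(start_bs + frac * (final_bs - start_bs))


def tile_weights(small, out_rows, out_cols):
    """Hypothetical tiling: fill a larger matrix by repeating a smaller one.

    Wrap-around indexing (i % rows, j % cols) is an assumed tiling pattern.
    """
    rows, cols = len(small), len(small[0])
    return [
        [small[i % rows][j % cols] for j in range(out_cols)]
        for i in range(out_rows)
    ]


if __name__ == "__main__":
    # Batch size grows as more tokens are consumed, then stays at final_bs.
    for seen in (0, 50, 100, 200):
        print(seen, batch_size_at(seen, start_bs=8, final_bs=64, warmup_tokens=100))

    # A 2x2 "base" matrix expanded to 3x5 for a larger layer.
    base = [[1, 2], [3, 4]]
    for row in tile_weights(base, 3, 5):
        print(row)
```

In this sketch the larger matrix starts from a repeated copy of trained values rather than from random noise, which is the intuition behind using tiling as an initialization. A real model would apply a comparable expansion per parameter tensor, and may treat layers of different shapes differently.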
Hugging Face, Nvidia, Johns Hopkins University, along with Answer.AI and LightOn, announced a successor to the encoder-only ...
AI research institutes Answer.AI and LightOn have developed ModernBERT, an improved version of Google's natural language processing model BERT ... Decoder-only models can perform similarly to ...