espnet2.asr.encoder.beats_encoder.init_bert_params
espnet2.asr.encoder.beats_encoder.init_bert_params(module)
Initialize the weights specific to the BERT model.
This function overrides the default PyTorch weight initialization for several layer types, including linear, embedding, and multi-head attention layers. Weights are drawn from a normal distribution with mean 0.0 and standard deviation 0.02.
- Parameters:
  - module (nn.Module) – The PyTorch module (e.g., Linear, Embedding, MultiheadAttention) whose weights are to be initialized.
Notes
- For linear layers, weights are initialized with a normal distribution, and biases are set to zero.
- For embedding layers, weights are also initialized with a normal distribution, and padding indices (if any) are set to zero.
- For multi-head attention layers, the weights for query, key, and value projections are initialized using the same normal distribution.
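The notes above can be condensed into a minimal sketch of the initialization logic. This is an illustrative reimplementation, not the ESPnet source: the function name `init_bert_params_sketch` is hypothetical, and the real function additionally special-cases the BEATs `MultiheadAttention` class (re-initializing its `q_proj`, `k_proj`, and `v_proj` weights the same way).

```python
import torch.nn as nn


def init_bert_params_sketch(module):
    """Hedged sketch: BERT-style init, N(0, 0.02) weights, zeroed biases."""
    if isinstance(module, nn.Linear):
        # Linear layers: normal weights, zero biases.
        module.weight.data.normal_(mean=0.0, std=0.02)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, nn.Embedding):
        # Embedding layers: normal weights, zero the padding row if present.
        module.weight.data.normal_(mean=0.0, std=0.02)
        if module.padding_idx is not None:
            module.weight.data[module.padding_idx].zero_()
    # The real init_bert_params also handles the BEATs MultiheadAttention,
    # applying the same normal init to its q/k/v projection weights.


layer = nn.Linear(128, 64)
init_bert_params_sketch(layer)
print(abs(layer.weight.data.mean().item()) < 0.01)  # mean near 0
print(bool((layer.bias.data == 0).all()))  # biases zeroed
```

A typical usage pattern is `model.apply(init_bert_params)`, which walks every submodule and lets the function decide per-module whether to re-initialize it.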
Examples
>>> import torch.nn as nn
>>> linear_layer = nn.Linear(10, 5)
>>> init_bert_params(linear_layer)
>>> assert abs(linear_layer.weight.data.mean()) < 0.05  # mean should be near 0
>>> assert (linear_layer.bias.data == 0).all()  # biases are zeroed
>>> embedding_layer = nn.Embedding(10, 5, padding_idx=0)
>>> init_bert_params(embedding_layer)
>>> assert embedding_layer.weight.data[0].sum() == 0.0  # padding row is zeroed
>>> # MultiheadAttention here refers to the BEATs encoder's attention class,
>>> # whose q/k/v projection weights are re-initialized the same way.
>>> attention_layer = MultiheadAttention(embed_dim=5, num_heads=2)
>>> init_bert_params(attention_layer)
>>> assert abs(attention_layer.q_proj.weight.data.mean()) < 0.05  # mean should be near 0