espnet2.gan_svs.pits.ying_decoder.YingDecoder
class espnet2.gan_svs.pits.ying_decoder.YingDecoder(hidden_channels, kernel_size, dilation_rate, n_layers, yin_start, yin_scope, yin_shift_range, gin_channels=0)
Bases: Module
YingDecoder is a neural network module for decoding yin signals in the
context of GAN-based singing voice synthesis.
This module takes a yin target signal and processes it through several convolutional layers to generate a predicted yin signal. It supports shifting the input signal within a specified range to improve the robustness of the output.
in_channels
Number of input channels for the first convolution.
- Type: int
out_channels
Number of output channels for the final convolution.
- Type: int
hidden_channels
Number of hidden channels in the convolutional layers.
- Type: int
kernel_size
Size of the convolutional kernel.
- Type: int
dilation_rate
Dilation rate for the convolutional layers.
- Type: int
n_layers
Number of convolutional layers.
- Type: int
gin_channels
Number of global conditioning channels.
- Type: int
yin_start
Start point of the yin target signal.
- Type: int
yin_scope
Scope of the yin target signal.
- Type: int
yin_shift_range
Maximum number of frames to shift the yin target signal.
- Type: int
Parameters:
- hidden_channels (int) – Number of hidden channels.
- kernel_size (int) – Size of the convolutional kernel.
- dilation_rate (int) – Dilation rate of the convolutional layers.
- n_layers (int) – Number of convolutional layers.
- yin_start (int) – Start point of the yin target signal.
- yin_scope (int) – Scope of the yin target signal.
- yin_shift_range (int) – Maximum number of frames to shift the yin target signal.
- gin_channels (int, optional) – Number of global conditioning channels. Defaults to 0.
########### Examples
Create an instance of the YingDecoder:
>>> import torch
>>> decoder = YingDecoder(hidden_channels=64, kernel_size=3, dilation_rate=1,
...                       n_layers=4, yin_start=10, yin_scope=20,
...                       yin_shift_range=5)
Perform a forward pass (the tensor shapes are illustrative; the input must be
large enough to cover yin_start + yin_scope + yin_shift_range):
>>> z_yin = torch.randn(2, 40, 64)
>>> yin_gt = torch.randn(2, 40, 64)
>>> z_mask = torch.ones(2, 1, 64)
>>> predicted, shifted_gt, cropped_gt, cropped_yin, scope_shift = decoder(
...     z_yin=z_yin, yin_gt=yin_gt, z_mask=z_mask
... )
- Raises: ValueError – If the input tensors do not have the expected shapes.
NOTE
The input tensors should be of appropriate shapes as defined in the arguments to ensure correct processing through the network.
Initialize the YingDecoder module.
- Parameters:
- hidden_channels (int) – Number of hidden channels.
- kernel_size (int) – Size of the convolutional kernel.
- dilation_rate (int) – Dilation rate of the convolutional layers.
- n_layers (int) – Number of convolutional layers.
- yin_start (int) – Start point of the yin target signal.
- yin_scope (int) – Scope of the yin target signal.
- yin_shift_range (int) – Maximum number of frames to shift the yin target signal.
- gin_channels (int, optional) – Number of global conditioning channels. Defaults to 0.
crop_scope(x, yin_start, scope_shift)
Crop the input tensor to a specified scope based on the yin start position.
This method extracts a segment of length yin_scope from the input tensor x, starting at yin_start and offset by scope_shift, where the offset can differ for each element of the batch.
- Parameters:
- x (torch.Tensor) – Input tensor of shape [B, C, T], where B is the batch size, C is the number of channels, and T is the length of the sequence.
- yin_start (int) – Starting point of the yin target signal.
- scope_shift (torch.Tensor) – Shift tensor of shape [B], indicating the shift to apply for each element in the batch.
- Returns: Cropped tensor of shape [B, C, yin_scope], containing the segments extracted from the input tensor based on the specified parameters.
- Return type: torch.Tensor
########### Examples
>>> import torch
>>> decoder = YingDecoder(hidden_channels=64, kernel_size=3,
... dilation_rate=1, n_layers=2,
... yin_start=10, yin_scope=5,
... yin_shift_range=2)
>>> x = torch.randn(2, 3, 20) # Batch of 2, 3 channels, sequence length 20
>>> yin_start = 10
>>> scope_shift = torch.tensor([1, -1]) # Example shifts for batch
>>> cropped_tensor = decoder.crop_scope(x, yin_start, scope_shift)
>>> print(cropped_tensor.shape) # Should output: torch.Size([2, 3, 5])
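Conceptually, the crop is a per-sample slice of length yin_scope starting at yin_start plus that sample's shift. A minimal standalone sketch of the documented behaviour (a hypothetical helper written for illustration, not the actual ESPnet implementation) might look like this:
import torch

def crop_scope_sketch(x, yin_start, yin_scope, scope_shift):
    # x: [B, C, T]; scope_shift: [B] integer shifts, one per batch element.
    # Returns a tensor of shape [B, C, yin_scope], as documented above.
    crops = []
    for i in range(x.size(0)):
        start = yin_start + int(scope_shift[i])
        crops.append(x[i, :, start:start + yin_scope])
    return torch.stack(crops, dim=0)

With the example above (x of shape [2, 3, 20], yin_start=10, scope_shift=[1, -1], yin_scope=5), this returns a tensor of shape [2, 3, 5].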
forward(z_yin, yin_gt, z_mask, g=None)
Forward pass of the decoder.
This method applies a per-sample scope shift, crops the latent yin sequence and the ground-truth yin sequence accordingly, and decodes the cropped latent into a predicted yin sequence. The shifted and cropped ground-truth sequences are returned together with the prediction so that they can serve as training references.
- Parameters:
- z_yin (torch.Tensor) – The input yin note sequence of shape (B, C, T_yin).
- yin_gt (torch.Tensor) – The ground truth yin note sequence of shape (B, C, T_yin).
- z_mask (torch.Tensor) – The mask tensor of shape (B, 1, T_yin).
- g (torch.Tensor, optional) – The global conditioning tensor of shape (B, gin_channels, 1). Defaults to None.
- Returns: A tuple containing:
  - torch.Tensor: The predicted yin note sequence of shape (B, C, T_yin).
  - torch.Tensor: The shifted ground truth yin note sequence of shape (B, C, T_yin).
  - torch.Tensor: The cropped ground truth yin note sequence of shape (B, C, T_yin).
  - torch.Tensor: The cropped input yin note sequence of shape (B, C, T_yin).
  - torch.Tensor: The scope shift tensor of shape (B,).
- Return type: tuple
########### Examples
>>> decoder = YingDecoder(hidden_channels=64, kernel_size=3,
... dilation_rate=1, n_layers=4,
... yin_start=0, yin_scope=10,
... yin_shift_range=5)
>>> z_yin = torch.randn(8, 10, 20) # Example input tensor
>>> yin_gt = torch.randn(8, 10, 20) # Example ground truth tensor
>>> z_mask = torch.ones(8, 1, 20) # Example mask tensor
>>> output = decoder(z_yin, yin_gt, z_mask)
>>> print([o.shape for o in output]) # Check shapes of output tensors
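The five returned tensors follow the order documented above; unpacking them into descriptively named variables (the names are illustrative, not part of the API) keeps downstream code readable:
>>> yin_hat, yin_gt_shifted, yin_gt_crop, z_yin_crop, scope_shift = decoder(
...     z_yin, yin_gt, z_mask
... )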
infer(z_yin, z_mask, g=None)
Generate a yin prediction from the latent yin representation.
This method crops the input yin sequence to the configured scope and decodes it into a predicted yin signal. Unlike forward(), no ground-truth yin sequence is required.
- Parameters:
- z_yin (torch.Tensor) – The input yin note sequence of shape (B, C, T_yin).
- z_mask (torch.Tensor) – The mask tensor of shape (B, 1, T_yin).
- g (torch.Tensor, optional) – The global conditioning tensor of shape (B, gin_channels, 1). Defaults to None.
- Returns: The predicted yin note sequence cropped to the configured scope.
- Return type: torch.Tensor
########### Examples
Create an instance of YingDecoder:
>>> decoder = YingDecoder(hidden_channels=64, kernel_size=3, dilation_rate=1,
...                       n_layers=4, yin_start=10, yin_scope=20,
...                       yin_shift_range=5)
Perform inference (the tensor shapes are illustrative):
>>> z_yin = torch.randn(2, 40, 64)
>>> z_mask = torch.ones(2, 1, 64)
>>> predictions = decoder.infer(z_yin, z_mask)
- Raises: ValueError – If the dimensions of the input tensors do not match the expected shapes.
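For a side-by-side view of the two entry points, a hedged usage sketch (the import path comes from the module name above; tensor shapes are illustrative):
import torch
from espnet2.gan_svs.pits.ying_decoder import YingDecoder

decoder = YingDecoder(hidden_channels=64, kernel_size=3, dilation_rate=1,
                      n_layers=4, yin_start=10, yin_scope=20,
                      yin_shift_range=5)
z_yin = torch.randn(2, 40, 64)   # latent yin representation
yin_gt = torch.randn(2, 40, 64)  # ground-truth yin target
z_mask = torch.ones(2, 1, 64)    # frame mask

# Training-time call: prediction plus shifted/cropped references and the shift.
outputs = decoder(z_yin, yin_gt, z_mask)

# Inference-time call: only the latent yin and the mask are required.
yin_hat = decoder.infer(z_yin, z_mask)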