espnet2.gan_svs.pits.ying_decoder.YingDecoder
class espnet2.gan_svs.pits.ying_decoder.YingDecoder(hidden_channels, kernel_size, dilation_rate, n_layers, yin_start, yin_scope, yin_shift_range, gin_channels=0)
Bases: Module
YingDecoder is a neural network module for decoding yin signals in the
context of GAN-based singing voice synthesis.
This module takes a yin target signal and processes it through several convolutional layers to generate a predicted yin signal. It supports shifting the input signal within a specified range to improve the robustness of the output.
in_channels
Number of input channels for the first convolution.
- Type: int
out_channels
Number of output channels for the final convolution.
- Type: int
hidden_channels
Number of hidden channels in the convolutional layers.
- Type: int
kernel_size
Size of the convolutional kernel.
- Type: int
dilation_rate
Dilation rate for the convolutional layers.
- Type: int
n_layers
Number of convolutional layers.
- Type: int
gin_channels
Number of global conditioning channels.
- Type: int
yin_start
Start point of the yin target signal.
- Type: int
yin_scope
Scope of the yin target signal.
- Type: int
yin_shift_range
Maximum number of frames to shift the yin target signal.
- Type: int
Parameters:
- hidden_channels (int) – Number of hidden channels.
- kernel_size (int) – Size of the convolutional kernel.
- dilation_rate (int) – Dilation rate of the convolutional layers.
- n_layers (int) – Number of convolutional layers.
- yin_start (int) – Start point of the yin target signal.
- yin_scope (int) – Scope of the yin target signal.
- yin_shift_range (int) – Maximum number of frames to shift the yin target signal.
- gin_channels (int, optional) – Number of global conditioning channels. Defaults to 0.
########### Examples
Create an instance of the YingDecoder:
>>> import torch
>>> decoder = YingDecoder(hidden_channels=64, kernel_size=3, dilation_rate=1,
...                       n_layers=4, yin_start=10, yin_scope=20,
...                       yin_shift_range=5)
Perform a forward pass (the tensor shapes are illustrative; the input must be
large enough to cover yin_start + yin_scope + yin_shift_range):
>>> z_yin = torch.randn(2, 40, 64)
>>> yin_gt = torch.randn(2, 40, 64)
>>> z_mask = torch.ones(2, 1, 64)
>>> predicted, shifted_gt, cropped_gt, cropped_yin, scope_shift = decoder(
...     z_yin=z_yin, yin_gt=yin_gt, z_mask=z_mask
... )
- Raises: ValueError – If the input tensors do not have the expected shapes.
NOTE
The input tensors should be of appropriate shapes as defined in the arguments to ensure correct processing through the network.
Initialize the YingDecoder module.
- Parameters:
- hidden_channels (int) – Number of hidden channels.
- kernel_size (int) – Size of the convolutional kernel.
- dilation_rate (int) – Dilation rate of the convolutional layers.
- n_layers (int) – Number of convolutional layers.
- yin_start (int) – Start point of the yin target signal.
- yin_scope (int) – Scope of the yin target signal.
- yin_shift_range (int) – Maximum number of frames to shift the yin target signal.
- gin_channels (int, optional) – Number of global conditioning channels. Defaults to 0.
crop_scope(x, yin_start, scope_shift)
Crop the input tensor to a specified scope based on the yin start position.
This method extracts a segment of length yin_scope from the input tensor x, starting at yin_start and offset by scope_shift, where the offset can differ for each element of the batch.
- Parameters:
- x (torch.Tensor) – Input tensor of shape [B, C, T], where B is the batch size, C is the number of channels, and T is the length of the sequence.
- yin_start (int) – Starting point of the yin target signal.
- scope_shift (torch.Tensor) – Shift tensor of shape [B], indicating the shift to apply for each element in the batch.
- Returns: Cropped tensor of shape [B, C, yin_scope], containing the segments extracted from the input tensor based on the specified parameters.
- Return type: torch.Tensor
########### Examples
>>> import torch
>>> decoder = YingDecoder(hidden_channels=64, kernel_size=3,
... dilation_rate=1, n_layers=2,
... yin_start=10, yin_scope=5,
... yin_shift_range=2)
>>> x = torch.randn(2, 3, 20) # Batch of 2, 3 channels, sequence length 20
>>> yin_start = 10
>>> scope_shift = torch.tensor([1, -1]) # Example shifts for batch
>>> cropped_tensor = decoder.crop_scope(x, yin_start, scope_shift)
>>> print(cropped_tensor.shape) # Should output: torch.Size([2, 3, 5])
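Conceptually, the crop is a per-sample slice of length yin_scope starting at yin_start plus that sample's shift. A minimal standalone sketch of the documented behaviour (a hypothetical helper written for illustration, not the actual ESPnet implementation) might look like this:
import torch

def crop_scope_sketch(x, yin_start, yin_scope, scope_shift):
    # x: [B, C, T]; scope_shift: [B] integer shifts, one per batch element.
    # Returns a tensor of shape [B, C, yin_scope], as documented above.
    crops = []
    for i in range(x.size(0)):
        start = yin_start + int(scope_shift[i])
        crops.append(x[i, :, start:start + yin_scope])
    return torch.stack(crops, dim=0)

With the example above (x of shape [2, 3, 20], yin_start=10, scope_shift=[1, -1], yin_scope=5), this returns a tensor of shape [2, 3, 5].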
forward(z_yin, yin_gt, z_mask, g=None)
Forward pass of the decoder.
This method applies a per-sample scope shift, crops the latent yin sequence and the ground-truth yin sequence accordingly, and decodes the cropped latent into a predicted yin sequence. The shifted and cropped ground-truth sequences are returned together with the prediction so that they can serve as training references.
- Parameters:
- z_yin (torch.Tensor) – The input yin note sequence of shape (B, C, T_yin).
- yin_gt (torch.Tensor) – The ground truth yin note sequence of shape (B, C, T_yin).
- z_mask (torch.Tensor) – The mask tensor of shape (B, 1, T_yin).
- g (torch.Tensor, optional) – The global conditioning tensor of shape (B, gin_channels, 1). Defaults to None.
- Returns: A tuple containing:
  - torch.Tensor: The predicted yin note sequence of shape (B, C, T_yin).
  - torch.Tensor: The shifted ground truth yin note sequence of shape (B, C, T_yin).
  - torch.Tensor: The cropped ground truth yin note sequence of shape (B, C, T_yin).
  - torch.Tensor: The cropped input yin note sequence of shape (B, C, T_yin).
  - torch.Tensor: The scope shift tensor of shape (B,).
- Return type: tuple
########### Examples
>>> decoder = YingDecoder(hidden_channels=64, kernel_size=3,
... dilation_rate=1, n_layers=4,
... yin_start=0, yin_scope=10,
... yin_shift_range=5)
>>> z_yin = torch.randn(8, 10, 20) # Example input tensor
>>> yin_gt = torch.randn(8, 10, 20) # Example ground truth tensor
>>> z_mask = torch.ones(8, 1, 20) # Example mask tensor
>>> output = decoder(z_yin, yin_gt, z_mask)
>>> print([o.shape for o in output]) # Check shapes of output tensors
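The five returned tensors follow the order documented above; unpacking them into descriptively named variables (the names are illustrative, not part of the API) keeps downstream code readable:
>>> yin_hat, yin_gt_shifted, yin_gt_crop, z_yin_crop, scope_shift = decoder(
...     z_yin, yin_gt, z_mask
... )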
infer(z_yin, z_mask, g=None)
Generate a yin prediction from the latent yin representation.
This method crops the input yin sequence to the configured scope and decodes it into a predicted yin signal. Unlike forward(), no ground-truth yin sequence is required.
- Parameters:
- z_yin (torch.Tensor) – The input yin note sequence of shape (B, C, T_yin).
- z_mask (torch.Tensor) – The mask tensor of shape (B, 1, T_yin).
- g (torch.Tensor, optional) – The global conditioning tensor of shape (B, gin_channels, 1). Defaults to None.
- Returns: The predicted yin note sequence cropped to the configured scope.
- Return type: torch.Tensor
########### Examples
Create an instance of YingDecoder:
>>> decoder = YingDecoder(hidden_channels=64, kernel_size=3, dilation_rate=1,
...                       n_layers=4, yin_start=10, yin_scope=20,
...                       yin_shift_range=5)
Perform inference (the tensor shapes are illustrative):
>>> z_yin = torch.randn(2, 40, 64)
>>> z_mask = torch.ones(2, 1, 64)
>>> predictions = decoder.infer(z_yin, z_mask)
- Raises: ValueError – If the dimensions of the input tensors do not match the expected shapes.
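For a side-by-side view of the two entry points, a hedged usage sketch (the import path comes from the module name above; tensor shapes are illustrative):
import torch
from espnet2.gan_svs.pits.ying_decoder import YingDecoder

decoder = YingDecoder(hidden_channels=64, kernel_size=3, dilation_rate=1,
                      n_layers=4, yin_start=10, yin_scope=20,
                      yin_shift_range=5)
z_yin = torch.randn(2, 40, 64)   # latent yin representation
yin_gt = torch.randn(2, 40, 64)  # ground-truth yin target
z_mask = torch.ones(2, 1, 64)    # frame mask

# Training-time call: prediction plus shifted/cropped references and the shift.
outputs = decoder(z_yin, yin_gt, z_mask)

# Inference-time call: only the latent yin and the mask are required.
yin_hat = decoder.infer(z_yin, z_mask)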