espnet2.diar.layers.tcn_nomask.TemporalConvNet
class espnet2.diar.layers.tcn_nomask.TemporalConvNet(N, B, H, P, X, R, norm_type='gLN', causal=False)
Bases: Module
Temporal Convolutional Network for speech separation.
This class implements the Temporal Convolutional Network (TCN) proposed by Luo et al. in “Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation”. The TCN processes time-series data and is used here in the context of speech separation tasks.
network
A sequential container comprising layer normalization, a bottleneck convolution, and multiple temporal blocks.
- Parameters:
- N (int) – Number of filters in the autoencoder.
- B (int) – Number of channels in the bottleneck 1x1 convolution block.
- H (int) – Number of channels in convolutional blocks.
- P (int) – Kernel size in convolutional blocks.
- X (int) – Number of convolutional blocks in each repeat.
- R (int) – Number of repeats of the block structure.
- norm_type (str) – Normalization type, can be ‘BN’, ‘gLN’, or ‘cLN’.
- causal (bool) – If True, applies causal convolutions; otherwise, applies non-causal convolutions.
- Returns: bottleneck_feature – A tensor of shape [M, B, K], where M is the batch size, B is the number of bottleneck channels, and K is the input sequence length.
- Return type: torch.Tensor
####### Examples
>>> model = TemporalConvNet(N=64, B=16, H=32, P=3, X=4, R=2)
>>> mixture_w = torch.randn(10, 64, 100) # Batch of 10, 64 channels, length 100
>>> output = model(mixture_w)
>>> output.shape
torch.Size([10, 16, 100]) # Output shape after processing
NOTE
The output length will remain consistent with the input length if appropriate padding is used.
- Raises: ValueError – If norm_type is not one of ‘BN’, ‘gLN’, or ‘cLN’.
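Because each repeat stacks X dilated convolutions, the temporal context seen by one output frame grows quickly with X and R. A hypothetical helper (not part of espnet2) can estimate that receptive field, assuming dilations of 1, 2, 4, …, 2^(X-1) per repeat as in Conv-TasNet:

```python
def tcn_receptive_field(P: int, X: int, R: int) -> int:
    """Estimate how many input frames influence one output frame.

    Assumes each of the R repeats contains X conv blocks with kernel
    size P and dilations 1, 2, 4, ..., 2**(X-1) (Conv-TasNet layout).
    """
    field = 1
    for _ in range(R):           # R repeats of the block structure
        for x in range(X):       # X dilated blocks per repeat
            field += (P - 1) * 2 ** x
    return field

# With the example configuration P=3, X=4, R=2:
print(tcn_receptive_field(3, 4, 2))  # 61 frames
```

This is only a sketch of the standard dilated-TCN arithmetic; the exact context additionally depends on the padding used by each block.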
forward(mixture_w)
Forward pass for the TemporalConvNet.
This method processes the input mixture of waveforms through the temporal convolutional network to produce bottleneck features.
- Parameters:
- mixture_w (torch.Tensor) – Input tensor of shape [M, N, K], where M is the batch size, N is the number of filters, and K is the sequence length.
- Returns: Output tensor of shape [M, B, K], where B is the number of channels in the bottleneck layer.
- Return type: torch.Tensor
####### Examples
>>> model = TemporalConvNet(N=64, B=32, H=128, P=3, X=4, R=2)
>>> mixture = torch.randn(16, 64, 100) # Batch of 16, 64 filters, 100 length
>>> output = model(mixture)
>>> print(output.shape)
torch.Size([16, 32, 100]) # Expected output shape
NOTE
Ensure that the input tensor is properly shaped as specified to avoid dimension errors during processing.
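One way to catch such dimension errors early is to validate the input shape against the model configuration before calling forward. The helper below is hypothetical (not part of espnet2); it checks an [M, N, K] shape tuple and derives the expected [M, B, K] output shape:

```python
def check_tcn_shapes(input_shape, N, B):
    """Validate an [M, N, K] input shape and return the expected
    [M, B, K] output shape.

    Hypothetical sanity check, not part of espnet2: N and B are the
    filter and bottleneck-channel counts passed to TemporalConvNet.
    """
    if len(input_shape) != 3:
        raise ValueError(f"expected 3-D input [M, N, K], got {input_shape}")
    M, n, K = input_shape
    if n != N:
        raise ValueError(f"expected {N} filters on dim 1, got {n}")
    return (M, B, K)

# Matches the forward() example: batch 16, 64 filters, length 100.
print(check_tcn_shapes((16, 64, 100), N=64, B=32))  # (16, 32, 100)
```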