espnet2.diar.layers.tcn_nomask.TemporalConvNet
class espnet2.diar.layers.tcn_nomask.TemporalConvNet(N, B, H, P, X, R, norm_type='gLN', causal=False)
Bases: Module
Temporal Convolutional Network for speech separation.
This class implements the Temporal Convolutional Network (TCN) proposed by Luo et al. in “Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation”. The TCN processes time-series data and is used here in the context of speech separation tasks.
network
A sequential container comprising layer normalization, a bottleneck convolution, and multiple temporal blocks.
- Parameters:
- N (int) – Number of filters in the autoencoder.
- B (int) – Number of channels in the bottleneck 1x1 convolution block.
- H (int) – Number of channels in convolutional blocks.
- P (int) – Kernel size in convolutional blocks.
- X (int) – Number of convolutional blocks in each repeat.
- R (int) – Number of repeats of the block structure.
- norm_type (str) – Normalization type, can be ‘BN’, ‘gLN’, or ‘cLN’.
- causal (bool) – If True, applies causal convolutions; otherwise, applies non-causal convolutions.
- Returns: bottleneck_feature – A tensor of shape [M, B, K], where M is the batch size, B is the number of bottleneck channels, and K is the input sequence length.
- Return type: torch.Tensor
####### Examples
>>> model = TemporalConvNet(N=64, B=16, H=32, P=3, X=4, R=2)
>>> mixture_w = torch.randn(10, 64, 100) # Batch of 10, 64 channels, length 100
>>> output = model(mixture_w)
>>> output.shape
torch.Size([10, 16, 100]) # Output shape after processing
NOTE
The output length will remain consistent with the input length if appropriate padding is used.
- Raises: ValueError – If norm_type is not one of ‘BN’, ‘gLN’, or ‘cLN’.
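Because each repeat stacks X dilated convolutions, the temporal context seen by one output frame grows quickly with X and R. A hypothetical helper (not part of espnet2) can estimate that receptive field, assuming dilations of 1, 2, 4, …, 2^(X-1) per repeat as in Conv-TasNet:

```python
def tcn_receptive_field(P: int, X: int, R: int) -> int:
    """Estimate how many input frames influence one output frame.

    Assumes each of the R repeats contains X conv blocks with kernel
    size P and dilations 1, 2, 4, ..., 2**(X-1) (Conv-TasNet layout).
    """
    field = 1
    for _ in range(R):           # R repeats of the block structure
        for x in range(X):       # X dilated blocks per repeat
            field += (P - 1) * 2 ** x
    return field

# With the example configuration P=3, X=4, R=2:
print(tcn_receptive_field(3, 4, 2))  # 61 frames
```

This is only a sketch of the standard dilated-TCN arithmetic; the exact context additionally depends on the padding used by each block.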
forward(mixture_w)
Forward pass for the TemporalConvNet.
This method processes the input mixture of waveforms through the temporal convolutional network to produce bottleneck features.
- Parameters:
- mixture_w (torch.Tensor) – Input tensor of shape [M, N, K], where M is the batch size, N is the number of filters, and K is the sequence length.
- Returns: Output tensor of shape [M, B, K], where B is the number of channels in the bottleneck layer.
- Return type: torch.Tensor
####### Examples
>>> model = TemporalConvNet(N=64, B=32, H=128, P=3, X=4, R=2)
>>> mixture = torch.randn(16, 64, 100) # Batch of 16, 64 filters, 100 length
>>> output = model(mixture)
>>> print(output.shape)
torch.Size([16, 32, 100]) # Expected output shape
NOTE
Ensure that the input tensor is properly shaped as specified to avoid dimension errors during processing.
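One way to catch such dimension errors early is to validate the input shape against the model configuration before calling forward. The helper below is hypothetical (not part of espnet2); it checks an [M, N, K] shape tuple and derives the expected [M, B, K] output shape:

```python
def check_tcn_shapes(input_shape, N, B):
    """Validate an [M, N, K] input shape and return the expected
    [M, B, K] output shape.

    Hypothetical sanity check, not part of espnet2: N and B are the
    filter and bottleneck-channel counts passed to TemporalConvNet.
    """
    if len(input_shape) != 3:
        raise ValueError(f"expected 3-D input [M, N, K], got {input_shape}")
    M, n, K = input_shape
    if n != N:
        raise ValueError(f"expected {N} filters on dim 1, got {n}")
    return (M, B, K)

# Matches the forward() example: batch 16, 64 filters, length 100.
print(check_tcn_shapes((16, 64, 100), N=64, B=32))  # (16, 32, 100)
```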