espnet2.diar.layers.tcn_nomask.DepthwiseSeparableConv
class espnet2.diar.layers.tcn_nomask.DepthwiseSeparableConv(in_channels, out_channels, kernel_size, stride, padding, dilation, norm_type='gLN', causal=False)
Bases: Module
Depthwise Separable Convolution Layer.
This class implements a depthwise separable convolution, which consists of a depthwise convolution followed by a pointwise (1x1) convolution. The depthwise convolution applies a single filter per input channel, while the pointwise convolution mixes the outputs of the depthwise layer across channels.
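The decomposition can be sketched with two plain nn.Conv1d layers (a minimal illustration of the idea, not the exact internal layout of this module):
>>> import torch
>>> import torch.nn as nn
>>> # Depthwise: one filter per input channel (groups == in_channels)
>>> depthwise = nn.Conv1d(64, 64, kernel_size=3, padding=1, groups=64)
>>> # Pointwise: 1x1 convolution mixes information across channels
>>> pointwise = nn.Conv1d(64, 128, kernel_size=1)
>>> x = torch.randn(8, 64, 100)
>>> pointwise(depthwise(x)).shape
torch.Size([8, 128, 100])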
- Parameters:
- in_channels (int) – Number of input channels.
- out_channels (int) – Number of output channels.
- kernel_size (int) – Size of the convolutional kernel.
- stride (int) – Stride of the convolution.
- padding (int) – Padding added to both sides of the input.
- dilation (int) – Dilation factor for the convolution.
- norm_type (str) – Type of normalization to use (‘gLN’, ‘cLN’, ‘BN’).
- causal (bool) – If True, applies causal convolution (output at time t does not depend on future time steps).
net
Sequential container of depthwise and pointwise convolutions along with normalization and activation.
Type: nn.Sequential
Returns: Output tensor after applying the depthwise separable convolution.
Return type: result (Tensor)
Examples
>>> model = DepthwiseSeparableConv(in_channels=64, out_channels=128,
... kernel_size=3, stride=1, padding=1,
... dilation=1, norm_type='gLN', causal=False)
>>> input_tensor = torch.randn(32, 64, 100) # Batch of 32, 64 channels, 100 length
>>> output_tensor = model(input_tensor)
>>> output_tensor.shape
torch.Size([32, 128, 100]) # Output will have 128 channels
NOTE
The depthwise convolution is performed with groups set to in_channels to achieve depthwise separability. If causal is set to True, a Chomp layer is used to ensure the output length matches the input length.
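For the causal case, a standard recipe is to pad the depthwise convolution by (kernel_size - 1) * dilation and then trim the same number of trailing frames, so that output frame t depends only on inputs up to t. The Chomp1d below is a simplified stand-in for the module's chomp layer, shown for illustration only:
>>> import torch
>>> import torch.nn as nn
>>> class Chomp1d(nn.Module):
...     def __init__(self, chomp_size):
...         super().__init__()
...         self.chomp_size = chomp_size
...     def forward(self, x):
...         # Drop the trailing frames introduced by the padding
...         return x[:, :, :-self.chomp_size].contiguous()
>>> kernel_size, dilation = 3, 2
>>> pad = (kernel_size - 1) * dilation  # left context needed for causality
>>> conv = nn.Conv1d(64, 64, kernel_size, padding=pad,
...                  dilation=dilation, groups=64)
>>> x = torch.randn(8, 64, 100)
>>> Chomp1d(pad)(conv(x)).shape  # input length is preserved
torch.Size([8, 64, 100])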
- Raises: ValueError – If an unsupported normalization type is specified.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
forward(x)
Forward pass for the DepthwiseSeparableConv layer.
This method passes the input tensor x through the internal sequential net of depthwise and pointwise convolutions (with normalization and activation). The expected input shape is [M, N, K], where M is the batch size, N is the number of input channels (in_channels), and K is the sequence length. The output will be of shape [M, B, K], where B is the number of output channels (out_channels).
- Parameters: x – A tensor of shape [M, N, K], representing the input features, where M is the batch size, N is the number of input channels, and K is the length of the sequence.
- Returns: A tensor of shape [M, B, K], representing the output of the depthwise separable convolution.
- Return type: result (Tensor)
Examples
>>> model = DepthwiseSeparableConv(in_channels=64, out_channels=32,
...                                kernel_size=3, stride=1, padding=1,
...                                dilation=1, norm_type='gLN', causal=False)
>>> x = torch.randn(10, 64, 100) # Batch of 10, 64 channels, length 100
>>> output = model(x)
>>> print(output.shape)
torch.Size([10, 32, 100]) # Output shape matches [M, B, K]
NOTE
Ensure that the input tensor x is correctly shaped according to the specifications above to avoid dimension mismatch errors.