espnet2.tasks.asr_transducer.ASRTransducerTask

class espnet2.tasks.asr_transducer.ASRTransducerTask

Bases: AbsTask

ASR Transducer Task definition.

This class implements the ASR (Automatic Speech Recognition) Transducer task, which includes functionalities for building models, processing data, and handling task-specific arguments.

num_optimizers

Number of optimizers used for training.

Type: int

class_choices_list

List of available class choices for frontend, specaug, normalization, and decoder.

Type: List[ClassChoices]

trainer

Trainer class used for managing training processes.

Type: Trainer

add_task_arguments(parser: argparse.ArgumentParser): Adds ASR Transducer task arguments to the provided argument parser.

build_collate_fn(args: argparse.Namespace, train: bool) -> Callable: Builds a collate function for batching data.

build_preprocess_fn(args: argparse.Namespace, train: bool) -> Optional[Callable]: Builds a pre-processing function for input data.

required_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]: Returns the required data names based on task mode.

optional_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]: Returns the optional data names based on task mode.

build_model(args: argparse.Namespace) -> ESPnetASRTransducerModel: Builds and returns the ASR Transducer model based on the provided arguments.

Examples:

To add task arguments:

>>> parser = argparse.ArgumentParser()
>>> ASRTransducerTask.add_task_arguments(parser)

To build a model:

>>> args = parser.parse_args()
>>> model = ASRTransducerTask.build_model(args)

Raises:
- RuntimeError – If the token_list is not of type str or list.
- NotImplementedError – If the initialization is not supported.

NOTE: The class relies on various components, such as the frontend, specaug, normalization, and decoder, which can be customized through the task arguments.

classmethod add_task_arguments(parser: ArgumentParser)

Add Transducer task arguments.

This method is responsible for adding command-line arguments specific to the ASR Transducer task to the provided argument parser. The arguments include configuration options for the model, encoder, joint network, preprocessing, and data augmentation.

Parameters:
- cls – ASRTransducerTask object.
- parser – Transducer arguments parser.
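As a rough illustration of what adding a task argument group involves, the sketch below uses plain argparse to register three of the options named on this page (--token_list, --input_size, --init). The function name `add_transducer_arguments`, the help strings, and the option types are assumptions for illustration; the actual method registers many more options and may define these differently.

```python
import argparse


def add_transducer_arguments(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical, simplified mirror of a task-argument group.

    Only a few options named on this page are shown; types and help
    strings are assumptions, not the library's definitions.
    """
    group = parser.add_argument_group(description="Task related")
    group.add_argument("--token_list", type=str, default=None,
                       help="Path to a token list file (assumed: one token per line)")
    group.add_argument("--input_size", type=int, default=None,
                       help="Input feature dimension (None: left to the frontend)")
    group.add_argument("--init", type=str, default=None,
                       help="Parameter initialization method")
    return parser


# The parser is modified in place; returning it only makes chaining convenient.
parser = add_transducer_arguments(argparse.ArgumentParser())
args = parser.parse_args(["--token_list", "tokens.txt", "--input_size", "80"])
print(args.token_list, args.input_size, args.init)  # tokens.txt 80 None
```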

Examples:

To add task arguments to an argument parser:

>>> import argparse
>>> from espnet2.tasks.asr_transducer import ASRTransducerTask
>>> parser = argparse.ArgumentParser()
>>> ASRTransducerTask.add_task_arguments(parser)
>>> args = parser.parse_args()

NOTE: The method modifies the parser in place by adding a group of arguments related to the ASR Transducer task, such as --token_list, --input_size, --init, and others.

classmethod build_collate_fn(args: Namespace, train: bool) → Callable[[Collection[Tuple[str, Dict[str, ndarray]]]], Tuple[List[str], Dict[str, Tensor]]]

Build collate function.

This method constructs a collate function that is used to combine multiple samples into a mini-batch during training or evaluation.

Parameters:
- cls – ASRTransducerTask object.
- args – Task arguments containing configurations for the collate function.
- train – A boolean indicating whether the function is for training mode or not.
Returns: A collate function that takes a collection of tuples, where each tuple contains a string and a dictionary of NumPy arrays, and returns a tuple containing a list of strings and a dictionary of PyTorch tensors.
Return type: Callable
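The sketch below shows, under stated assumptions, what such a collate function typically does: it pads each field across the batch and records the original lengths under a `<key>_lengths` entry. The padding values (0.0 for floats, -1 for integers) and the `_lengths` naming are modeled on ESPnet's `CommonCollateFn`, which this task is assumed to use. The helper `pad_collate` is hypothetical and returns NumPy arrays instead of PyTorch tensors to keep the example dependency-light.

```python
from typing import Dict, List, Tuple

import numpy as np


def pad_collate(
    batch: List[Tuple[str, Dict[str, np.ndarray]]],
    float_pad_value: float = 0.0,
    int_pad_value: int = -1,
) -> Tuple[List[str], Dict[str, np.ndarray]]:
    """Hypothetical collate: pad variable-length arrays and record their lengths."""
    uttids = [uttid for uttid, _ in batch]
    output: Dict[str, np.ndarray] = {}
    for key in batch[0][1]:
        arrays = [data[key] for _, data in batch]
        lengths = np.array([a.shape[0] for a in arrays], dtype=np.int64)
        # Integer fields (token IDs) use int_pad_value; float fields (audio) use float_pad_value.
        pad_value = int_pad_value if arrays[0].dtype.kind == "i" else float_pad_value
        padded = np.full(
            (len(arrays), int(lengths.max())) + arrays[0].shape[1:],
            pad_value,
            dtype=arrays[0].dtype,
        )
        for i, a in enumerate(arrays):
            padded[i, : a.shape[0]] = a  # copy the real values; the tail stays padded
        output[key] = padded
        output[key + "_lengths"] = lengths
    return uttids, output


data = [
    ("utt1", {"speech": np.ones(4, dtype=np.float32), "text": np.array([3, 5], dtype=np.int64)}),
    ("utt2", {"speech": np.ones(2, dtype=np.float32), "text": np.array([7], dtype=np.int64)}),
]
uttids, batch = pad_collate(data)
print(uttids)  # ['utt1', 'utt2']
print(batch["text"].tolist())  # [[3, 5], [7, -1]]
```

Keeping the lengths next to the padded arrays lets a model mask out the padded frames and tokens later.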

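Examples:

>>> collate_fn = ASRTransducerTask.build_collate_fn(args, train=True)
>>> # data: a list of (utterance ID, {"speech": ndarray, "text": ndarray}) pairs
>>> batch = collate_fn(data)
>>> print(batch)
(['utt1', 'utt2'], {'speech': tensor(...), 'speech_lengths': tensor(...), 'text': tensor(...), 'text_lengths': tensor(...)})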

classmethod build_model(args: Namespace) → ESPnetASRTransducerModel

Builds the ASR Transducer model based on provided arguments.

This method constructs an instance of the ESPnetASRTransducerModel by configuring the frontend, data augmentation, normalization, encoder, decoder, and joint network based on the parameters specified in the args argument.

Parameters:
- cls – ASRTransducerTask object.
- args – Task arguments containing configurations for model components.
Returns: An instance of the ASR Transducer model configured as per the specified arguments.
Return type: ESPnetASRTransducerModel
Raises:
- RuntimeError – If token_list is neither a string nor a list.
- NotImplementedError – If a model initialization method is requested (initialization is not currently supported).

Examples:

>>> import argparse
>>> args = argparse.Namespace()
>>> args.token_list = "path/to/token_list.txt"
>>> args.input_size = None
>>> args.specaug = "specaug"
>>> # The frontend, normalization, encoder, decoder, and joint network options must also be set.
>>> model = ASRTransducerTask.build_model(args)
>>> print(model)

NOTE: If token_list is a string, it is treated as a path and the tokens are read from that file; if it is a list, it is used directly. The method logs the vocabulary size and initializes the model components specified in the arguments.
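The token_list handling described in the note above can be sketched as a small standalone helper. `load_token_list` is a hypothetical name, and the one-token-per-line file format and the example tokens are assumptions; the actual method may apply additional processing.

```python
import os
import tempfile
from typing import List, Union


def load_token_list(token_list: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper mirroring the documented str/list handling."""
    if isinstance(token_list, str):
        # A string is treated as a path to a file with one token per line (assumed format).
        with open(token_list, encoding="utf-8") as f:
            return [line.rstrip() for line in f]
    if isinstance(token_list, list):
        # A list is used as-is (copied so the caller's list is not aliased).
        return list(token_list)
    raise RuntimeError("token_list must be str or list")


# Write an example token file, load it, and derive the vocabulary size.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
    f.write("<blank>\na\nb\n<sos/eos>\n")
    path = f.name

tokens = load_token_list(path)
os.remove(path)
vocab_size = len(tokens)
print(vocab_size)  # 4
```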

classmethod build_preprocess_fn(args: Namespace, train: bool) → Callable[[str, Dict[str, array]], Dict[str, ndarray]] | None

Build pre-processing function.

This method constructs a pre-processing function based on the provided arguments. If preprocessing is enabled, it utilizes the CommonPreprocessor to handle various preprocessing tasks such as tokenization, noise application, and volume normalization.

Parameters:
- cls – ASRTransducerTask object.
- args – Task arguments containing configurations for preprocessing.
- train – A boolean indicating whether the function is for training mode.
Returns: A callable pre-processing function that takes a string and a dictionary of numpy arrays as input and returns a dictionary of numpy arrays, or None if preprocessing is not enabled.
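A minimal sketch of the "return a preprocessor or None" pattern described above follows. It is not ESPnet's CommonPreprocessor: `build_toy_preprocess_fn` is hypothetical, and a character-level lookup stands in for a real tokenizer such as BPE. The `use_preprocessor` and `token_list` attribute names come from this page; everything else is assumed.

```python
from argparse import Namespace
from typing import Callable, Dict, Optional

import numpy as np


def build_toy_preprocess_fn(args: Namespace, train: bool) -> Optional[Callable]:
    """Hypothetical sketch: build a preprocessor only when it is enabled."""
    if not args.use_preprocessor:
        # Preprocessing disabled: the caller receives None and uses the data as-is.
        return None
    token2id = {token: i for i, token in enumerate(args.token_list)}
    unk_id = token2id["<unk>"]

    def preprocess(uid: str, data: Dict) -> Dict:
        # `uid` (utterance ID) and `train` are unused in this toy version; a real
        # preprocessor may use `train` to enable training-only steps such as noise.
        out = dict(data)
        if "text" in out:
            # Character-level lookup stands in for a real tokenizer; unknown chars map to <unk>.
            out["text"] = np.array(
                [token2id.get(ch, unk_id) for ch in out["text"]], dtype=np.int64
            )
        return out

    return preprocess


args = Namespace(use_preprocessor=True, token_list=["<blank>", "<unk>", "a", "b"])
preprocess_fn = build_toy_preprocess_fn(args, train=True)
result = preprocess_fn("utt1", {"speech": np.zeros(3, dtype=np.float32), "text": "abc"})
print(result["text"].tolist())  # [2, 3, 1]
```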

Examples:

>>> import numpy as np
>>> from argparse import Namespace
>>> args = Namespace(use_preprocessor=True, token_type='bpe', ...)  # remaining options elided
>>> preprocess_fn = ASRTransducerTask.build_preprocess_fn(args, train=True)
>>> result = preprocess_fn("utt1", {"speech": np.array([0.1, 0.2, 0.3]), "text": "sample text"})

NOTE: This function is intended for preparing input data for the ASR model during training or inference.

class_choices_list

*: List[[ClassChoices](../train/ClassChoices.md#espnet2.train.class_choices.ClassChoices)]* *= four ClassChoices objects (frontend, specaug, normalization, and decoder)*
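To make the idea of a class-choice entry concrete, here is a hypothetical, heavily simplified registry that maps a command-line name to a component class. `SimpleClassChoices`, `ToyFrontendA`, and `ToyFrontendB` are invented for this sketch; ESPnet's real ClassChoices has a different and richer interface (it is assumed to also handle per-component configuration options).

```python
import argparse
from typing import Dict, Optional, Type


class SimpleClassChoices:
    """Hypothetical, minimal name -> class registry (not ESPnet's ClassChoices)."""

    def __init__(self, name: str, classes: Dict[str, Type], default: Optional[str] = None):
        self.name = name
        # Keys are normalized to lower case so lookups are case-insensitive.
        self.classes = {key.lower(): klass for key, klass in classes.items()}
        self.default = default

    def add_arguments(self, parser: argparse.ArgumentParser) -> None:
        # Expose one option, e.g. --frontend, whose value selects a registered class.
        parser.add_argument(
            f"--{self.name}",
            type=str.lower,
            default=self.default,
            choices=sorted(self.classes),
        )

    def get_class(self, name: str) -> Type:
        return self.classes[name.lower()]


class ToyFrontendA:  # hypothetical placeholder component
    pass


class ToyFrontendB:  # hypothetical placeholder component
    pass


frontend_choices = SimpleClassChoices(
    "frontend", {"toy_a": ToyFrontendA, "toy_b": ToyFrontendB}, default="toy_a"
)
parser = argparse.ArgumentParser()
frontend_choices.add_arguments(parser)
args = parser.parse_args(["--frontend", "TOY_B"])
frontend = frontend_choices.get_class(args.frontend)()
print(type(frontend).__name__)  # ToyFrontendB
```

Keeping one registry per component type is what lets a task swap, say, the frontend from the command line without touching model-building code.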

num_optimizers

*: int* *= 1*

classmethod optional_data_names(train: bool = True, inference: bool = False) → Tuple[str, ...]

Optional data depending on task mode.

This method returns a tuple of optional data names that may be used during the training or inference modes of the ASR Transducer Task.

Parameters:
- cls – ASRTransducerTask object.
- train – A boolean indicating whether the task is in training mode.
- inference – A boolean indicating whether the task is in inference mode.
Returns: A tuple containing the optional task data names.
Return type: Tuple[str, ...]

Examples:

>>> ASRTransducerTask.optional_data_names(train=True)
()
>>> ASRTransducerTask.optional_data_names(inference=True)
()

NOTE: The default implementation returns an empty tuple, indicating that there are no optional data names. Subclasses may override this method to provide specific optional data names.

classmethod required_data_names(train: bool = True, inference: bool = False) → Tuple[str, ...]

Required data depending on task mode.

This method returns the names of the required data based on whether the task is in training or inference mode.

Parameters:
- cls – ASRTransducerTask object.
- train – A boolean indicating if the task is in training mode.
- inference – A boolean indicating if the task is in inference mode.
Returns: A tuple containing the required task data names:
- If not in inference mode, returns ("speech", "text").
- If in inference mode, returns ("speech",).
Return type: Tuple[str, ...]

Examples:

>>> ASRTransducerTask.required_data_names(train=True, inference=False)
('speech', 'text')
>>> ASRTransducerTask.required_data_names(train=False, inference=True)
('speech',)
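The documented behavior of both data-name methods fits in a few lines. This standalone sketch mirrors the return values listed above; the module-level function form is an illustration, not the class's actual code.

```python
from typing import Tuple


def required_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]:
    # Training and validation need audio plus transcripts; inference needs only audio.
    if not inference:
        return ("speech", "text")
    return ("speech",)


def optional_data_names(train: bool = True, inference: bool = False) -> Tuple[str, ...]:
    # Per the documented default, no optional data names are used.
    return ()


print(required_data_names())                # ('speech', 'text')
print(required_data_names(inference=True))  # ('speech',)
```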

trainer

alias of Trainer