Python package

make_g2p

The easiest way to create a transducer programmatically is to use the g2p.make_g2p function.

To use it, first import the function:

from g2p import make_g2p

Then, call it with two arguments, in_lang and out_lang. Both must be strings that match the name of an existing mapping.

>>> transducer = make_g2p("dan", "eng-arpabet")
>>> transducer("hej").output_string
'HH EH Y'

A valid path must exist between in_lang and out_lang for this to work. If you've edited a mapping or added a custom mapping, you must update g2p to include it by running: g2p update
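To illustrate what a valid path between two mappings means, here is a minimal sketch that treats mappings as nodes in a directed graph and searches for a chain of conversions. The graph contents, the TOY_NETWORK name and the find_path helper are all hypothetical, chosen for illustration; this is not g2p's internal code.

```python
from collections import deque

# Hypothetical mapping graph for illustration only: each key is a mapping name
# and each value lists the mappings it can be converted into directly.
TOY_NETWORK = {
    "dan": ["dan-ipa"],
    "dan-ipa": ["eng-ipa"],
    "eng-ipa": ["eng-arpabet"],
    "eng-arpabet": [],
    "isolated": [],
}


def find_path(network, in_lang, out_lang):
    """Return the shortest chain of mapping names from in_lang to out_lang, or None."""
    if in_lang not in network or out_lang not in network:
        return None
    queue = deque([[in_lang]])
    seen = {in_lang}
    # Breadth-first search: the first path that reaches out_lang is a shortest one.
    while queue:
        path = queue.popleft()
        if path[-1] == out_lang:
            return path
        for neighbour in network[path[-1]]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


print(find_path(TOY_NETWORK, "dan", "eng-arpabet"))
print(find_path(TOY_NETWORK, "dan", "isolated"))
```

In this toy graph the first call finds a chain of three conversions, while the second returns None because nothing leads to the "isolated" node. That second case is the kind of situation in which no transducer can be built.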

make_tokenizer

Basic usage for the language-aware tokenizer:

from g2p import make_tokenizer
tokenizer = make_tokenizer("dan")
for token in tokenizer.tokenize_text("Åh, hvordan har du det, Åbenrå?"):
    if token.is_word:
        word = token.text
    else:
        interword_punctuation_and_spaces = token.text

Selecting the right tokenizer language matters, because it determines how punctuation-like letters are handled. For example, : and ' are punctuation in English, but they are part of the word tokens in Kanien'kéha (moh):

>>> list(make_tokenizer("moh").tokenize_text("Kanien'kéha"))
[{'text': "Kanien'kéha", 'is_word': True}]
>>> list(make_tokenizer("eng").tokenize_text("Kanien'kéha"))
[{'text': 'Kanien', 'is_word': True}, {'text': "'", 'is_word': False}, {'text': 'kéha', 'is_word': True}]
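The effect of the language choice can be approximated with a small regex-based sketch. It is a simplified illustration using only the standard library, not g2p's tokenizer: the EXTRA_WORD_CHARS table and the toy_tokenize function are hypothetical, and the sketch assumes accented letters are precomposed characters.

```python
import re

# Hypothetical per-language sets of punctuation-like characters that count as
# letters; illustrative only, not g2p's actual data.
EXTRA_WORD_CHARS = {
    "moh": "':",
    "eng": "",
}


def toy_tokenize(text, lang):
    """Split text into word and non-word tokens, as dicts with text and is_word."""
    extra = re.escape(EXTRA_WORD_CHARS.get(lang, ""))
    # \w covers Unicode letters, digits and underscore; `extra` adds the
    # language-specific characters to the word character class.
    word_pattern = re.compile(rf"[\w{extra}]+")
    tokens = []
    position = 0
    for match in word_pattern.finditer(text):
        if match.start() > position:
            # Anything between two word matches is punctuation or spacing.
            tokens.append({"text": text[position:match.start()], "is_word": False})
        tokens.append({"text": match.group(), "is_word": True})
        position = match.end()
    if position < len(text):
        tokens.append({"text": text[position:], "is_word": False})
    return tokens


print(toy_tokenize("Kanien'kéha", "moh"))
print(toy_tokenize("Kanien'kéha", "eng"))
```

With "moh" the apostrophe belongs to the word class, so the whole string stays one word token. With "eng" it falls outside the class and the string splits into three tokens.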

A look under the hood

A Mapping object is a list of defined rules. A Rule has the following permitted fields:

g2p.mappings.Rule

Bases: BaseModel

rule_input: str = Field(alias='in')

The character(s) to convert

rule_output: str = Field(alias='out')

What to convert the 'in' characters to

context_before: str = ''

The context before 'in' required for the rule to apply

context_after: str = ''

The context after 'in' required for the rule to apply

prevent_feeding: bool = False

Whether to prevent the rule from feeding other rules

comment: Optional[str] = None

An optional comment about the rule.

export_to_dict(exclude=None, exclude_none=True, exclude_defaults=True, by_alias=True)

All the options for exporting are tedious to keep track of, so this helper function exports the rule to a dict using the defaults shown in its signature.
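To make the fields concrete, the following sketch models a rule as a plain dataclass and shows how the context fields restrict where a replacement applies, along with a compact export in the spirit of export_to_dict. ToyRule is a hypothetical stand-in, not g2p's Rule class: it treats the input and contexts as literal strings, does not model prevent_feeding, and its export logic is only an assumption about what the default arguments imply.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyRule:
    """Simplified stand-in for a rule; field names mirror the documented ones."""

    rule_input: str
    rule_output: str
    context_before: str = ""
    context_after: str = ""
    prevent_feeding: bool = False  # kept for shape only, not modeled here
    comment: Optional[str] = None

    def apply(self, text):
        """Replace rule_input with rule_output wherever both contexts match."""
        # Assumption: input and contexts are literal strings in this sketch.
        pattern = re.escape(self.rule_input)
        if self.context_before:
            pattern = f"(?<={re.escape(self.context_before)})" + pattern
        if self.context_after:
            pattern += f"(?={re.escape(self.context_after)})"
        # A function replacement keeps backslashes in rule_output literal.
        return re.sub(pattern, lambda _match: self.rule_output, text)

    def export_to_dict(self):
        """Export with 'in'/'out' aliases, dropping defaults and None values."""
        exported = {"in": self.rule_input, "out": self.rule_output}
        defaults = {
            "context_before": "",
            "context_after": "",
            "prevent_feeding": False,
            "comment": None,
        }
        for name, default in defaults.items():
            value = getattr(self, name)
            if value != default:
                exported[name] = value
        return exported


rule = ToyRule("a", "b", context_after="c")
print(rule.apply("ac ad"))
print(rule.export_to_dict())
```

Here only the "a" that is followed by "c" is rewritten, and the exported dict contains just the aliased in and out keys plus the one non-default field.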