Python package¶
make_g2p
¶
The easiest way to create a transducer programmatically is to use the g2p.make_g2p
function.
To use it, first import the function:
from g2p import make_g2p
Then, call it with an argument for in_lang
and out_lang
. Both must be strings equal to the name of a particular mapping.
>>> transducer = make_g2p("dan", "eng-arpabet")
>>> transducer("hej").output_string
'HH EH Y'
There must be a valid path between the in_lang
and out_lang
in order for this to work. If you've edited a mapping or added a custom mapping, you must update g2p to include it: g2p update
make_tokenizer
¶
Basic usage for the language-aware tokenizer:
from g2p import make_tokenizer
tokenizer = make_tokenizer("dan")
for token in tokenizer.tokenize_text("Åh, hvordan har du det, Åbenrå?"):
if token.is_word
word = token.text
else:
interword_punctuation_and_spaces = token.text
Note that selecting the tokenizer language is important to make sure punctuation-like letters are handled correctly. For example :
and '
are punctuation in English but they will be part of the word tokens in Kanien'kéha (moh):
>>> list(make_tokenizer("moh").tokenize_text("Kanien'kéha"))
[{'text': "Kanien'kéha", 'is_word': True}]
>>> list(make_tokenizer("eng").tokenize_text("Kanien'kéha"))
[{'text': 'Kanien', 'is_word': True}, {'text': "'", 'is_word': False}, {'text': 'kéha', 'is_word': True}]
A look under the hood¶
A Mapping object is a list of defined rules. A Rule
has the following permitted fields:
g2p.mappings.Rule
¶
Bases: BaseModel
rule_input: str = Field(alias='in')
class-attribute
instance-attribute
¶
The character(s) to convert
rule_output: str = Field(alias='out')
class-attribute
instance-attribute
¶
What to convert the 'in' characters to
context_before: str = ''
class-attribute
instance-attribute
¶
The context before 'in' required for the rule to apply
context_after: str = ''
class-attribute
instance-attribute
¶
The context after 'in' required for the rule to apply
prevent_feeding: bool = False
class-attribute
instance-attribute
¶
Whether to prevent the rule from feeding other rules
comment: Optional[str] = None
class-attribute
instance-attribute
¶
An optional comment about the rule.
export_to_dict(exclude=None, exclude_none=True, exclude_defaults=True, by_alias=True)
¶
All the options for exporting are tedious to keep track of so this is a helper function