Python package

The easiest way to create a transducer programmatically is to use the g2p.make_g2p function.

To use it, first import the function:

from g2p import make_g2p

Then, call it with arguments for in_lang and out_lang. Each must be a string matching the name of an existing mapping.

>>> transducer = make_g2p('dan', 'eng-arpabet')
>>> transducer('hej').output_string
'HH EH Y'

There must be a valid path between the in_lang and out_lang for this to work. If you've edited a mapping or added a custom mapping, you must update g2p to include it by running ``g2p update``.

A look under the hood

A Mapping object is a list of defined rules.

g2p.mappings.Mapping

Class for lookup tables

@param as_is: bool = True Determines whether rules are sorted or left as written. If True, g2p rules in the mapping are evaluated in the order they are written. If False, rules are sorted by decreasing length of their input.

.. deprecated:: 0.6
    use ``rule_ordering`` instead

@param case_sensitive: bool = True Whether rules are case sensitive. If False, all rules and conversion input are lowercased.

@param escape_special: bool = False Escape special characters in rules

@param norm_form: str = "NFD" Unicode normalization form to follow. NFC | NFKC | NFD | NFKD | none
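
As an illustration of these normalization forms (using only the standard library, not g2p itself), NFD decomposes precomposed characters into base plus combining marks, while NFC recomposes them:

```python
import unicodedata

composed = "\u00e9"                                  # 'é' as a single NFC code point
decomposed = unicodedata.normalize("NFD", composed)  # 'e' + combining acute accent

print(len(composed), len(decomposed))                        # 1 2
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```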

@param out_delimiter: str = "" Separate output transformations with a delimiter

@param reverse: bool = False Reverse all mappings

@param rule_ordering: str = "as-written" Affects in what order the rules are applied.

If set to ``"as-written"``, rules are applied from top-to-bottom in the order that they
are written in the source file
(previously this was accomplished with ``as_is=True``).

If set to ``"apply-longest-first"``, rules are first sorted such that rules with the longest
input are applied first. Sorting the rules like this prevents shorter rules
from taking part in feeding relations
(previously this was accomplished with ``as_is=False``).
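
A minimal sketch (not g2p's actual implementation) of how the two orderings can change the result: applied as written, the shorter rule rewrites the "a" before the longer rule ever sees "ab":

```python
import re

def apply_rules(text, rules):
    # Apply each (in, out) rule in sequence over the whole string.
    for rule_in, rule_out in rules:
        text = re.sub(rule_in, rule_out, text)
    return text

rules = [("a", "x"), ("ab", "y")]

# As written: "a" -> "x" fires first, so "ab" -> "y" never matches.
print(apply_rules("ab", rules))  # xb

# Longest input first: "ab" -> "y" is applied before "a" -> "x".
ordered = sorted(rules, key=lambda r: len(r[0]), reverse=True)
print(apply_rules("ab", ordered))  # y
```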

@param prevent_feeding: bool = False Convert each rule's output into an intermediary form so that it cannot be matched (fed) by subsequent rules

@param type: str = None Type of mapping, either "mapping" (rules), "unidecode" (magical Unicode guessing) or "lexicon" (lookup in an aligned lexicon).

@param alignments: str = None A string specifying a file from which to load alignments when type = "lexicon".

add_abbreviations(abbs, mappings)

Return abbreviated forms, given a list of abbreviations.

For example, the rule {'in': 'a', 'out': 'b', 'context_before': 'V', 'context_after': ''} combined with the abbreviation {'abbreviation': 'V', 'stands_for': ['a', 'b', 'c']} expands to {'in': 'a', 'out': 'b', 'context_before': 'a|b|c', 'context_after': ''}.
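
The expansion can be sketched as follows (expand_abbreviations is a hypothetical helper for illustration, not g2p's API):

```python
def expand_abbreviations(rule, abbreviations):
    # Replace an abbreviation used as a context with an alternation of
    # the characters it stands for.
    expanded = dict(rule)
    for key in ("context_before", "context_after"):
        if expanded.get(key) in abbreviations:
            expanded[key] = "|".join(abbreviations[expanded[key]])
    return expanded

rule = {"in": "a", "out": "b", "context_before": "V", "context_after": ""}
abbs = {"V": ["a", "b", "c"]}
print(expand_abbreviations(rule, abbs))
# {'in': 'a', 'out': 'b', 'context_before': 'a|b|c', 'context_after': ''}
```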

config_to_file(output_path=os.path.join(GEN_DIR, 'config.yaml'), mapping_type='json')

Write config to file

deduplicate()

Remove duplicate rules found in self, keeping the first copy found.

extend(mapping)

Add all the rules from mapping into self, effectively merging the two mappings.

Caveat: if self and mapping have contradictory rules, which one will "win" is unspecified, and may depend on mapping configuration options.

find_mapping_by_id(map_id) staticmethod

Find the mapping with a given ID

index(item)

Find the location of an item in self

inventory(in_or_out='in')

Return just inputs or outputs as inventory of mapping

mapping_to_file(output_path=GEN_DIR, file_type='json')

Write mapping to file

mapping_to_stream(out_stream, file_type='json')

Write mapping to a stream

mapping_type(name) staticmethod

Return the type of a mapping given its name

plain_mapping(skip_empty_contexts=False)

Return the plain mapping for displaying or saving to disk.

Parameters:

    skip_empty_contexts (bool, default False): when set, filter out empty context_before/after

process_kwargs(mapping)

Apply kwargs in the order they are provided (keyword arguments preserve their order as of Python 3.6).

process_loaded_config(config)

For a mapping loaded from a file, take the keyword arguments and supply them to the Mapping, and get any abbreviations data.

reverse_mappings(mapping)

Reverse the mapping

rule_to_regex(rule)

Turn an input string (and its context) from an input/output pair into a regular expression pattern.

The 'in' key is the match. The 'context_after' key creates a lookahead. The 'context_before' key creates a lookbehind.

Parameters:

    rule (dict, required): a dictionary containing 'in', 'out', 'context_before', and 'context_after' keys

Raises:

    Exception: raised when unsupported regex characters or symbols exist in the rule

Returns:

    Union[Pattern, None]: a compiled regex pattern (re.Pattern), or None if the input is null
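
The lookbehind/lookahead behaviour can be sketched as follows (a simplified illustration under stated assumptions, not the actual implementation, which also handles escaping and explicit indices):

```python
import re

def rule_to_regex_sketch(rule):
    # 'context_before' becomes a lookbehind, 'in' is the match itself,
    # and 'context_after' becomes a lookahead.
    if not rule.get("in"):
        return None
    pattern = ""
    if rule.get("context_before"):
        pattern += f"(?<={rule['context_before']})"
    pattern += rule["in"]
    if rule.get("context_after"):
        pattern += f"(?={rule['context_after']})"
    return re.compile(pattern)

pat = rule_to_regex_sketch(
    {"in": "a", "out": "b", "context_before": "c", "context_after": "t"}
)
print(pat.pattern)                # (?<=c)a(?=t)
print(pat.search("cat").group())  # a
print(pat.search("bat"))          # None
```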

wants_rules_sorted()

Returns whether the rules will be sorted prior to finalizing the mapping.

Returns:

    bool: True if the rules should be sorted

A Transducer object is initialized with a Mapping object and when called, applies each rule of the Mapping in sequence on the input to produce the resulting output.

g2p.transducer.Transducer

This is the fundamental class for performing conversions in the g2p library.

Each Transducer must be initialized with a Mapping object. The Transducer object can then be called to apply the rules from Mapping on a given input.

Attributes:

    mapping (Mapping): formatted input/output pairs using the g2p.mappings.Mapping class

in_lang: str property

Input language node name

out_lang: str property

Output language node name

transducers: List[Transducer] property

Sequence of underlying Transducer objects

__call__(to_convert, index=False, debugger=False)

The basic method to transduce an input. A proxy for self.apply_rules.

Parameters:

    to_convert (str, required): the string to convert

Returns:

    TransductionGraph: an object with nodes representing input and output characters, and edges representing the indices of the transformation

change_character(tg, character, index_to_change)

Change character at index_to_change in TransductionGraph output to character

Parameters:

    tg (TransductionGraph, required): the current TransductionGraph
    character (str, required): the character to change to
    index_to_change (int, required): index of the character to change

delete_character(tg, index_to_delete, ahh)

Delete character at index_to_delete in TransductionGraph output

Parameters:

    tg (TransductionGraph, required): the current TransductionGraph
    index_to_delete (int, required): index of the character to delete
    ahh (int, required): current value of i in the calling loop

get_longest_and_shortest(in_string_or_matches, out_string_or_matches)

Given two strings or match lists, determine the longest and shortest. If the input is longer than the output, the process is deletion; if the output is longer than the input, the process is insertion; if they are the same length, the process is basic.

Parameters:

    in_string_or_matches (str | List, required): the input string or match list
    out_string_or_matches (str | List, required): the output string or match list

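
The length comparison above amounts to the following (classify_process is a hypothetical name used only for this sketch):

```python
def classify_process(in_seq, out_seq):
    # Longer input: characters must be deleted; longer output: inserted;
    # equal length: a one-to-one ("basic") transformation.
    if len(in_seq) > len(out_seq):
        return "delete"
    if len(out_seq) > len(in_seq):
        return "insert"
    return "basic"

print(classify_process("aa", "b"))  # delete
print(classify_process("a", "bb"))  # insert
print(classify_process("a", "b"))   # basic
```
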
get_match_groups(tg, start_end, io, diff_from_input, out_string, output_start)

Take the inputs to explicit-index matching and create groups of input and output matches, grouped by their explicit indices.

For example, applying a rule that is defined: a{1}b{2} → b{2}a{1} on the input "ab"
will return inputs, outputs where:

inputs = {'1': [{'index': 0, 'string': 'a'}], '2': [{'index': 1, 'string': 'b'}] }
outputs = {'1': [{'index': 0, 'string': 'b'}], '2': [{'index': 1, 'string': 'a'}] }

This allows input match groups to be iterated through in sequence regardless of their
character sequence.

Parameters:

    tg (TransductionGraph, required): the graph holding information about the transduction
    start_end (Tuple[int, int], required): a tuple containing the start and end of the input match
    io (List, required): an input/output rule
    diff_from_input (DefaultDict, required): a dictionary containing the single-character distance from a given character index to its input
    out_string (str, required): the raw output string
    output_start (int, required): the diff-offset start of the match with respect to the output

Returns:

    inputs (dict): dictionary containing matches grouped by explicit index
    outputs (dict): dictionary containing matches grouped by explicit index

insert_character(tg, character_to_insert, index_to_insert_character)

Insert character at index_to_insert_character in TransductionGraph output

Parameters:

    tg (TransductionGraph, required): the current TransductionGraph
    character_to_insert (str, required): the character to insert
    index_to_insert_character (int, required): index at which to insert the character

resolve_intermediate_chars(output_string)

Go through all characters and resolve any intermediate characters from the Supplementary Private Use Area to their mapped equivalents.
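
The idea behind intermediate characters can be sketched like this (an illustration of the prevent_feeding mechanism, not g2p's actual code): each rule first writes a placeholder from the Supplementary Private Use Area so that later rules cannot match its output, and a final pass resolves the placeholders:

```python
PUA_BASE = 0xF0000  # start of Supplementary Private Use Area-A

def apply_with_intermediates(text, rules):
    resolution = {}
    for i, (rule_in, rule_out) in enumerate(rules):
        placeholder = chr(PUA_BASE + i)
        resolution[placeholder] = rule_out
        text = text.replace(rule_in, placeholder)
    # Resolve intermediate characters back to their mapped equivalents.
    for placeholder, real_out in resolution.items():
        text = text.replace(placeholder, real_out)
    return text

# Without intermediates, "a" -> "b" would feed "b" -> "c", yielding "cc";
# with them, each input character is rewritten exactly once.
print(apply_with_intermediates("ab", [("a", "b"), ("b", "c")]))  # bc
```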

update_explicit_indices(tg, match, start_end, io, diff_from_input, diff_from_output, out_string)

Takes an arbitrary number of input & output strings and their corresponding index offsets. It then zips them up according to the provided indexing notation.

Example

A rule that turns the sequence k\u0313 into 'k would have a default indexing of k -> ' and \u0313 -> k. It might be desired, though, to show that k -> k and \u0313 -> ', with their indices transposed. For this, the Mapping could be given the following: [{'in': 'k{1}\u0313{2}', 'out': "'{2}k{1}"}]. Indices are found with r'(?<={)\d+(?=})' and characters are found with r'[^0-9{}]+(?={\d+})'.
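
Applying the two patterns quoted above to the rule input confirms how indices and characters are extracted:

```python
import re

rule_in = "k{1}\u0313{2}"
indices = re.findall(r"(?<={)\d+(?=})", rule_in)
characters = re.findall(r"[^0-9{}]+(?={\d+})", rule_in)

print(indices)     # ['1', '2']
print(characters)  # ['k', '\u0313']
```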
