Python package

The easiest way to create a transducer programmatically is to use the g2p.make_g2p function.

To use it, first import the function:

from g2p import make_g2p

Then, call it with arguments for in_lang and out_lang. Each must be a string matching the name of an existing mapping.

>>> transducer = make_g2p('dan', 'eng-arpabet')
>>> transducer('hej').output_string
'HH EH Y'

There must be a valid path between the in_lang and out_lang for this to work. If you've edited a mapping or added a custom mapping, you must update g2p to include it by running ``g2p update``.

A look under the hood

A Mapping object is a list of defined rules.

g2p.mappings.Mapping

Class for lookup tables

@param as_is: bool = True Determines whether rules are sorted or left as written. If True, g2p rules in the mapping are evaluated in the order they are written. If False, rules are sorted by decreasing length of their input.

.. deprecated:: 0.6
    use ``rule_ordering`` instead

@param case_sensitive: bool = True Whether rules are case sensitive. If False, all rules and conversion input are lowercased.

@param escape_special: bool = False Escape special characters in rules

@param norm_form: str = "NFD" Unicode normalization form to follow. NFC | NFKC | NFD | NFKD | none
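
As an illustration of these normalization forms (using only the standard library, not g2p itself), NFD decomposes precomposed characters into base plus combining marks, while NFC recomposes them:

```python
import unicodedata

composed = "\u00e9"                                  # 'é' as a single NFC code point
decomposed = unicodedata.normalize("NFD", composed)  # 'e' + combining acute accent

print(len(composed), len(decomposed))                        # 1 2
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```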

@param out_delimiter: str = "" Separate output transformations with a delimiter

@param reverse: bool = False Reverse all mappings

@param rule_ordering: str = "as-written" Affects in what order the rules are applied.

If set to ``"as-written"``, rules are applied from top-to-bottom in the order that they
are written in the source file
(previously this was accomplished with ``as_is=True``).

If set to ``"apply-longest-first"``, rules are first sorted such that rules with the longest
input are applied first. Sorting the rules like this prevents shorter rules
from taking part in feeding relations
(previously this was accomplished with ``as_is=False``).
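
A minimal sketch (not g2p's actual implementation) of how the two orderings can change the result: applied as written, the shorter rule rewrites the "a" before the longer rule ever sees "ab":

```python
import re

def apply_rules(text, rules):
    # Apply each (in, out) rule in sequence over the whole string.
    for rule_in, rule_out in rules:
        text = re.sub(rule_in, rule_out, text)
    return text

rules = [("a", "x"), ("ab", "y")]

# As written: "a" -> "x" fires first, so "ab" -> "y" never matches.
print(apply_rules("ab", rules))  # xb

# Longest input first: "ab" -> "y" is applied before "a" -> "x".
ordered = sorted(rules, key=lambda r: len(r[0]), reverse=True)
print(apply_rules("ab", ordered))  # y
```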

@param prevent_feeding: bool = False Convert each rule's output into an intermediary form so that it cannot be matched (fed) by subsequent rules

@param type: str = None Type of mapping, either "mapping" (rules), "unidecode" (magical Unicode guessing) or "lexicon" (lookup in an aligned lexicon).

@param alignments: str = None A string specifying a file from which to load alignments when type = "lexicon".

add_abbreviations(abbs, mappings)

Return abbreviated forms, given a list of abbreviations.

For example, the rule {'in': 'a', 'out': 'b', 'context_before': 'V', 'context_after': ''} combined with the abbreviation {'abbreviation': 'V', 'stands_for': ['a', 'b', 'c']} expands to {'in': 'a', 'out': 'b', 'context_before': 'a|b|c', 'context_after': ''}.
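
The expansion can be sketched as follows (expand_abbreviations is a hypothetical helper for illustration, not g2p's API):

```python
def expand_abbreviations(rule, abbreviations):
    # Replace an abbreviation used as a context with an alternation of
    # the characters it stands for.
    expanded = dict(rule)
    for key in ("context_before", "context_after"):
        if expanded.get(key) in abbreviations:
            expanded[key] = "|".join(abbreviations[expanded[key]])
    return expanded

rule = {"in": "a", "out": "b", "context_before": "V", "context_after": ""}
abbs = {"V": ["a", "b", "c"]}
print(expand_abbreviations(rule, abbs))
# {'in': 'a', 'out': 'b', 'context_before': 'a|b|c', 'context_after': ''}
```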

config_to_file(output_path=os.path.join(GEN_DIR, 'config.yaml'), mapping_type='json')

Write config to file

deduplicate()

Remove duplicate rules found in self, keeping the first copy found.

extend(mapping)

Add all the rules from mapping into self, effectively merging the two mappings.

Caveat: if self and mapping have contradictory rules, which one will "win" is unspecified, and may depend on mapping configuration options.

find_mapping_by_id(map_id) staticmethod

Find the mapping with a given ID

index(item)

Find the location of an item in self

inventory(in_or_out='in')

Return just inputs or outputs as inventory of mapping

mapping_to_file(output_path=GEN_DIR, file_type='json')

Write mapping to file

mapping_to_stream(out_stream, file_type='json')

Write mapping to a stream

mapping_type(name) staticmethod

Return the type of a mapping given its name

plain_mapping(skip_empty_contexts=False)

Return the plain mapping for displaying or saving to disk.

Parameters:

    skip_empty_contexts (bool, default False): when set, filter out empty context_before/after

process_kwargs(mapping)

Apply kwargs in the order they are provided (keyword arguments preserve their order as of Python 3.6).

process_loaded_config(config)

For a mapping loaded from a file, take the keyword arguments and supply them to the Mapping, and get any abbreviations data.

reverse_mappings(mapping)

Reverse the mapping

rule_to_regex(rule)

Turn an input string (and its context) from an input/output pair into a regular expression pattern.

The 'in' key is the match. The 'context_after' key creates a lookahead. The 'context_before' key creates a lookbehind.

Parameters:

    rule (dict, required): a dictionary containing 'in', 'out', 'context_before', and 'context_after' keys

Raises:

    Exception: raised when unsupported regex characters or symbols exist in the rule

Returns:

    Union[Pattern, None]: a compiled regex pattern (re.Pattern), or None if the input is null
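
The lookbehind/lookahead behaviour can be sketched as follows (a simplified illustration under stated assumptions, not the actual implementation, which also handles escaping and explicit indices):

```python
import re

def rule_to_regex_sketch(rule):
    # 'context_before' becomes a lookbehind, 'in' is the match itself,
    # and 'context_after' becomes a lookahead.
    if not rule.get("in"):
        return None
    pattern = ""
    if rule.get("context_before"):
        pattern += f"(?<={rule['context_before']})"
    pattern += rule["in"]
    if rule.get("context_after"):
        pattern += f"(?={rule['context_after']})"
    return re.compile(pattern)

pat = rule_to_regex_sketch(
    {"in": "a", "out": "b", "context_before": "c", "context_after": "t"}
)
print(pat.pattern)                # (?<=c)a(?=t)
print(pat.search("cat").group())  # a
print(pat.search("bat"))          # None
```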

wants_rules_sorted()

Returns whether the rules will be sorted prior to finalizing the mapping.

Returns:

    bool: True if the rules should be sorted

A Transducer object is initialized with a Mapping object and when called, applies each rule of the Mapping in sequence on the input to produce the resulting output.

g2p.transducer.Transducer

This is the fundamental class for performing conversions in the g2p library.

Each Transducer must be initialized with a Mapping object. The Transducer object can then be called to apply the rules from Mapping on a given input.

Attributes:

    mapping (Mapping): formatted input/output pairs using the g2p.mappings.Mapping class

in_lang: str property

Input language node name

out_lang: str property

Output language node name

transducers: List[Transducer] property

Sequence of underlying Transducer objects

__call__(to_convert, index=False, debugger=False)

The basic method to transduce an input. A proxy for self.apply_rules.

Parameters:

    to_convert (str, required): the string to convert

Returns:

    TransductionGraph: an object with nodes representing input and output characters, and edges representing the indices of the transformation

change_character(tg, character, index_to_change)

Change character at index_to_change in TransductionGraph output to character

Parameters:

    tg (TransductionGraph, required): the current TransductionGraph
    character (str, required): the character to change to
    index_to_change (int, required): index of the character to change

delete_character(tg, index_to_delete, ahh)

Delete character at index_to_delete in TransductionGraph output

Parameters:

    tg (TransductionGraph, required): the current TransductionGraph
    index_to_delete (int, required): index of the character to delete
    ahh (int, required): current value of i in the calling loop

get_longest_and_shortest(in_string_or_matches, out_string_or_matches)

Given two strings or match lists, determine the longest and shortest. If the input is longer than the output, the process is deletion; if the output is longer than the input, the process is insertion; if they are the same length, the process is basic.

Parameters:

    in_string_or_matches (str | List, required): the input string or match list
    out_string_or_matches (str | List, required): the output string or match list

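
The length comparison above amounts to the following (classify_process is a hypothetical name used only for this sketch):

```python
def classify_process(in_seq, out_seq):
    # Longer input: characters must be deleted; longer output: inserted;
    # equal length: a one-to-one ("basic") transformation.
    if len(in_seq) > len(out_seq):
        return "delete"
    if len(out_seq) > len(in_seq):
        return "insert"
    return "basic"

print(classify_process("aa", "b"))  # delete
print(classify_process("a", "bb"))  # insert
print(classify_process("a", "b"))   # basic
```
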
get_match_groups(tg, start_end, io, diff_from_input, out_string, output_start)

Take the inputs to explicit-index matching and create groups of input and output matches, grouped by their explicit indices.

For example, applying a rule that is defined: a{1}b{2} → b{2}a{1} on the input "ab"
will return inputs, outputs where:

inputs = {'1': [{'index': 0, 'string': 'a'}], '2': [{'index': 1, 'string': 'b'}] }
outputs = {'1': [{'index': 0, 'string': 'b'}], '2': [{'index': 1, 'string': 'a'}] }

This allows input match groups to be iterated through in sequence regardless of their
character sequence.

Parameters:

    tg (TransductionGraph, required): the graph holding information about the transduction
    start_end (Tuple[int, int], required): a tuple containing the start and end of the input match
    io (List, required): an input/output rule
    diff_from_input (DefaultDict, required): a dictionary containing the single-character distance from a given character index to its input
    out_string (str, required): the raw output string
    output_start (int, required): the diff-offset start of the match with respect to the output

Returns:

    inputs (dict): dictionary containing matches grouped by explicit index
    outputs (dict): dictionary containing matches grouped by explicit index

insert_character(tg, character_to_insert, index_to_insert_character)

Insert character at index_to_insert_character in TransductionGraph output

Parameters:

    tg (TransductionGraph, required): the current TransductionGraph
    character_to_insert (str, required): the character to insert
    index_to_insert_character (int, required): index at which to insert the character

resolve_intermediate_chars(output_string)

Go through all characters and resolve any intermediate characters from the Supplementary Private Use Area to their mapped equivalents.
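
The idea behind intermediate characters can be sketched like this (an illustration of the prevent_feeding mechanism, not g2p's actual code): each rule first writes a placeholder from the Supplementary Private Use Area so that later rules cannot match its output, and a final pass resolves the placeholders:

```python
PUA_BASE = 0xF0000  # start of Supplementary Private Use Area-A

def apply_with_intermediates(text, rules):
    resolution = {}
    for i, (rule_in, rule_out) in enumerate(rules):
        placeholder = chr(PUA_BASE + i)
        resolution[placeholder] = rule_out
        text = text.replace(rule_in, placeholder)
    # Resolve intermediate characters back to their mapped equivalents.
    for placeholder, real_out in resolution.items():
        text = text.replace(placeholder, real_out)
    return text

# Without intermediates, "a" -> "b" would feed "b" -> "c", yielding "cc";
# with them, each input character is rewritten exactly once.
print(apply_with_intermediates("ab", [("a", "b"), ("b", "c")]))  # bc
```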

update_explicit_indices(tg, match, start_end, io, diff_from_input, diff_from_output, out_string)

Takes an arbitrary number of input & output strings and their corresponding index offsets. It then zips them up according to the provided indexing notation.

Example

A rule that turns the sequence k\u0313 into 'k would have a default indexing of k -> ' and \u0313 -> k. It might be desired, though, to show that k -> k and \u0313 -> ', with their indices transposed. For this, the Mapping could be given the following: [{'in': 'k{1}\u0313{2}', 'out': "'{2}k{1}"}]. Indices are found with r'(?<={)\d+(?=})' and characters are found with r'[^0-9{}]+(?={\d+})'.
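
Applying the two patterns quoted above to the rule input confirms how indices and characters are extracted:

```python
import re

rule_in = "k{1}\u0313{2}"
indices = re.findall(r"(?<={)\d+(?=})", rule_in)
characters = re.findall(r"[^0-9{}]+(?={\d+})", rule_in)

print(indices)     # ['1', '2']
print(characters)  # ['k', '\u0313']
```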
