Python package¶
The easiest way to create a transducer programmatically is to use the g2p.make_g2p function.
To use it, first import the function:
from g2p import make_g2p
Then, call it with an argument for in_lang and out_lang. Both must be strings equal to the name of a particular mapping.
>>> transducer = make_g2p('dan', 'eng-arpabet')
>>> transducer('hej').output_string
'HH EH Y'
There must be a valid path between in_lang and out_lang for this to work. If you've edited a mapping or added a custom mapping, you must update g2p to include it by running g2p update.
A look under the hood¶
A Mapping object is a list of defined rules.
g2p.mappings.Mapping
¶
Class for lookup tables
@param as_is: bool = True
Affects whether or not rules are sorted or left as is.
Please use rule_ordering instead.
If True, evaluate g2p rules in the mapping in the order they are written.
If False, rules will be reverse sorted by length.
.. deprecated:: 0.6
use ``rule_ordering`` instead
@param case_sensitive: bool = True If False, lowercase all rules and conversion input
@param escape_special: bool = False Escape special characters in rules
@param norm_form: str = "NFD" Normalization standard to follow. NFC | NFKC | NFD | NFKD | none
@param out_delimiter: str = "" Separate output transformations with a delimiter
@param reverse: bool = False Reverse all mappings
@param rule_ordering: str = "as-written" Affects in what order the rules are applied.
If set to ``"as-written"``, rules are applied from top-to-bottom in the order that they
are written in the source file
(previously this was accomplished with ``as_is=True``).
If set to ``"apply-longest-first"``, rules are first sorted such that rules with the longest
input are applied first. Sorting the rules like this prevents shorter rules
from taking part in feeding relations
(previously this was accomplished with ``as_is=False``).
@param prevent_feeding: bool = False Converts each rule into an intermediary form
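The two rule_ordering values can be illustrated with a minimal sketch. This is not g2p's implementation: the helper order_rules and the sample rules are hypothetical, and the only assumption is that "apply-longest-first" sorts rules by the length of their input, as described above.

```python
def order_rules(rules, rule_ordering="as-written"):
    """Return rules in the order they would be applied (illustrative only)."""
    if rule_ordering == "apply-longest-first":
        # Longest input first; sorted() is stable, so ties keep their written order.
        return sorted(rules, key=lambda rule: len(rule["in"]), reverse=True)
    # "as-written": keep the order from the source file.
    return list(rules)


# Hypothetical rules, written with the shorter input first.
rules = [
    {"in": "a", "out": "b"},
    {"in": "ab", "out": "c"},
]

print([rule["in"] for rule in order_rules(rules)])
print([rule["in"] for rule in order_rules(rules, "apply-longest-first")])
```

With "apply-longest-first", the rule for "ab" is tried before the rule for "a", so the shorter rule cannot consume the "a" that the longer rule needs.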
add_abbreviations(abbs, mappings)
¶
Return abbreviated forms, given a list of abbreviations.
For example, given the rule {'in': 'a', 'out': 'b', 'context_before': 'V', 'context_after': ''} and the abbreviation {'abbreviation': 'V', 'stands_for': ['a','b','c']}, the result is {'in': 'a', 'out': 'b', 'context_before': 'a|b|c', 'context_after': ''}.
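The expansion in this example can be sketched in plain Python. The function expand_abbreviations below is a hypothetical stand-in, not the library's add_abbreviations; it assumes abbreviations only appear in the two context keys and that a plain string replacement is sufficient.

```python
def expand_abbreviations(abbs, rules):
    """Replace each abbreviation in a rule's contexts with its alternatives."""
    expanded = []
    for rule in rules:
        new_rule = dict(rule)  # copy so the original rule is left untouched
        for key in ("context_before", "context_after"):
            for abb in abbs:
                # 'V' standing for ['a', 'b', 'c'] becomes the alternation 'a|b|c'.
                alternatives = "|".join(abb["stands_for"])
                new_rule[key] = new_rule[key].replace(abb["abbreviation"], alternatives)
        expanded.append(new_rule)
    return expanded


abbs = [{"abbreviation": "V", "stands_for": ["a", "b", "c"]}]
rules = [{"in": "a", "out": "b", "context_before": "V", "context_after": ""}]
print(expand_abbreviations(abbs, rules))
```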
config_to_file(output_path=os.path.join(GEN_DIR, 'config.yaml'), mapping_type='json')
¶
Write config to file
deduplicate()
¶
Remove duplicate rules found in self, keeping the first copy found.
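As a rough illustration of "keeping the first copy found", the sketch below deduplicates a list of rule dictionaries. The function name and the rule format are assumptions for illustration; the library's own method works on the Mapping itself.

```python
def deduplicate_rules(rules):
    """Return rules with later duplicates removed, preserving order."""
    seen = set()
    unique = []
    for rule in rules:
        # Dicts are not hashable, so use a sorted tuple of items as the key.
        key = tuple(sorted(rule.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rule)
    return unique


rules = [
    {"in": "a", "out": "b"},
    {"in": "c", "out": "d"},
    {"in": "a", "out": "b"},  # duplicate of the first rule
]
print(deduplicate_rules(rules))
```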
extend(mapping)
¶
Add all the rules from mapping into self, effectively merging two mappings
Caveat: if self and mapping have contradictory rules, which one will "win" is unspecified, and may depend on mapping configuration options.
find_mapping_by_id(map_id)
staticmethod
¶
Find the mapping with a given ID
inventory(in_or_out='in')
¶
Return just inputs or outputs as inventory of mapping
mapping_to_file(output_path=GEN_DIR, file_type='json')
¶
Write mapping to file
mapping_to_stream(out_stream, file_type='json')
¶
Write mapping to a stream
plain_mapping(skip_empty_contexts=False)
¶
Return the plain mapping for displaying or saving to disk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
skip_empty_contexts | bool | when set, filter out empty context_before/after | False |
process_kwargs(mapping)
¶
Apply kwargs in the order they are provided. kwargs are ordered as of Python 3.6.
process_loaded_config(config)
¶
For a mapping loaded from a file, take the keyword arguments and supply them to the Mapping, and get any abbreviations data.
reverse_mappings(mapping)
¶
Reverse the mapping
rule_to_regex(rule)
¶
Turns an input string (and the context) from an input/output pair into a regular expression pattern.
The 'in' key is the match. The 'context_after' key creates a lookahead. The 'context_before' key creates a lookbehind.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
rule | dict | A dictionary containing 'in', 'out', 'context_before', and 'context_after' keys | required |
Raises:
Type | Description |
---|---|
Exception | This is raised when unsupported regex characters or symbols exist in the rule |
Returns:
Name | Type | Description |
---|---|---|
Pattern | Union[Pattern, None] | returns a regex pattern (re.Pattern) |
None | Union[Pattern, None] | if input is null |
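The relationship between the rule keys and the resulting pattern can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's rule_to_regex: it does no escaping or validation, and the name rule_to_regex_sketch is hypothetical.

```python
import re


def rule_to_regex_sketch(rule):
    """Build a pattern from a rule: lookbehind + match + lookahead."""
    if not rule["in"]:
        return None  # nothing to match
    pattern = rule["in"]
    before = rule.get("context_before")
    after = rule.get("context_after")
    if before:
        # 'context_before' becomes a lookbehind.
        pattern = f"(?<={before})" + pattern
    if after:
        # 'context_after' becomes a lookahead.
        pattern = pattern + f"(?={after})"
    return re.compile(pattern)


rule = {"in": "a", "out": "b", "context_before": "a|b|c", "context_after": ""}
pattern = rule_to_regex_sketch(rule)
converted = pattern.sub(rule["out"], "ca da")
print(converted)
```

Only the "a" preceded by "c" is rewritten; the "a" preceded by "d" is outside the context and is left alone. Note that Python's re module requires lookbehind alternatives to have the same width.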
wants_rules_sorted()
¶
Returns whether the rules will be sorted prior to finalizing the mapping.
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if the rules should be sorted. |
A Transducer object is initialized with a Mapping object and when called, applies each rule of the Mapping in sequence on the input to produce the resulting output.
g2p.transducer.Transducer
¶
This is the fundamental class for performing conversions in the g2p library.
Each Transducer must be initialized with a Mapping object. The Transducer object can then be called to apply the rules from Mapping on a given input.
Attributes:
Name | Type | Description |
---|---|---|
mapping | Mapping | Formatted input/output pairs using the g2p.mappings.Mapping class. |
__call__(to_convert, index=False, debugger=False)
¶
The basic method to transduce an input. A proxy for self.apply_rules.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
to_convert | str | The string to convert. | required |
Returns:
Name | Type | Description |
---|---|---|
TransductionGraph | | Returns an object with all the nodes representing input and output characters and their corresponding edges representing the indices of the transformation. |
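The idea of applying each rule of a Mapping in sequence can be sketched without the library. The function below is a hypothetical simplification: it returns a plain string rather than a TransductionGraph, tracks no indices, and treats each rule's 'in' value as a regular expression.

```python
import re


def apply_rules(rules, to_convert):
    """Apply each rule in order; later rules see the output of earlier ones."""
    output = to_convert
    for rule in rules:
        output = re.sub(rule["in"], rule["out"], output)
    return output


# Hypothetical rules in a feeding relation: the output of the first
# rule ("b") is matched again by the second rule.
rules = [
    {"in": "a", "out": "b"},
    {"in": "b", "out": "c"},
]
print(apply_rules(rules, "ab"))
```

Because the rules run one after another, "ab" first becomes "bb" and then "cc". This is the feeding behaviour that the rule_ordering and prevent_feeding options are meant to control.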
change_character(tg, character, index_to_change)
¶
Change character at index_to_change in TransductionGraph output to character
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tg | TransductionGraph | the current Transduction Graph | required |
character | str | the character to change to | required |
index_to_change | int | index of character to change | required |
delete_character(tg, index_to_delete, ahh)
¶
Delete character at index_to_delete in TransductionGraph output
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tg | TransductionGraph | the current Transduction Graph | required |
index_to_delete | int | index of character to delete | required |
ahh | int | current value of i in calling loop | required |
get_longest_and_shortest(in_string_or_matches, out_string_or_matches)
¶
Given two strings or match lists determine the longest and shortest. If the input is longer than the output, the process is to delete, if the output is longer than the input, the process is to insert. If the input and output are the same length, the process is basic.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
in_string_or_matches | str | List | input string | required |
out_string_or_matches | str | List | output string | required |
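The length comparison described above can be sketched as a small helper. The function name classify_process and its string return values are hypothetical; the library's method may return its result in a different shape.

```python
def classify_process(in_string_or_matches, out_string_or_matches):
    """Name the process implied by comparing input and output lengths."""
    in_len = len(in_string_or_matches)
    out_len = len(out_string_or_matches)
    if in_len > out_len:
        return "delete"  # input is longer: characters must be removed
    if in_len < out_len:
        return "insert"  # output is longer: characters must be added
    return "basic"       # same length: one-to-one replacement


print(classify_process("abc", "a"))
print(classify_process("a", "abc"))
print(classify_process("ab", "cd"))
```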
get_match_groups(tg, start_end, io, diff_from_input, out_string, output_start)
¶
Take the inputs to explicit indices matching and create groups of Input and Output matches that are grouped by their explicit indices.
For example, applying a rule that is defined: a{1}b{2} → b{2}a{1} on the input "ab"
will return inputs, outputs where:
inputs = {'1': [{'index': 0, 'string': 'a'}], '2': [{'index': 1, 'string': 'b'}] }
outputs = {'1': [{'index': 0, 'string': 'b'}], '2': [{'index': 1, 'string': 'a'}] }
This allows input match groups to be iterated through in sequence regardless of their character sequence.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tg | TransductionGraph | the graph holding information about the transduction | required |
start_end | Tuple(int, int) | a tuple containing the start and end of the input match | required |
io | List | an input/output rule | required |
diff_from_input | DefaultDict | A dictionary containing the single character distance from a given character index to its input | required |
out_string | str | the raw output string | required |
output_start | int | the diff-offset start of the match with respect to the output | required |
Returns:
Name | Type | Description |
---|---|---|
inputs | dict | dictionary containing matches grouped by explicit index match |
outputs | dict | dictionary containing matches grouped by explicit index match |
insert_character(tg, character_to_insert, index_to_insert_character)
¶
Insert character at index_to_insert_character in TransductionGraph output
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tg | TransductionGraph | the current Transduction Graph | required |
character_to_insert | str | the character to insert | required |
index_to_insert_character | int | index of character to insert | required |
resolve_intermediate_chars(output_string)
¶
Go through all chars and resolve any intermediate characters from the Supplementary Private Use Area to their mapped equivalents.
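One way to picture intermediate characters is the sketch below, which rewrites each rule's output to a placeholder first and resolves the placeholders at the end. It is an assumption-laden illustration, not the library's code: the function name, the placeholder table, and the choice of U+F0000 (the start of Supplementary Private Use Area-A) as the first placeholder are all hypothetical.

```python
import re

# Assumed starting code point for placeholders (hypothetical choice).
PLACEHOLDER_START = 0xF0000


def apply_without_feeding(rules, text):
    """Apply rules via placeholders so one rule's output cannot feed another."""
    intermediates = {}
    for offset, rule in enumerate(rules):
        placeholder = chr(PLACEHOLDER_START + offset)
        intermediates[placeholder] = rule["out"]
        # Write the placeholder instead of the real output.
        text = re.sub(rule["in"], placeholder, text)
    # Resolve every placeholder to its mapped equivalent.
    return "".join(intermediates.get(char, char) for char in text)


rules = [
    {"in": "a", "out": "b"},
    {"in": "b", "out": "c"},
]
print(apply_without_feeding(rules, "ab"))
```

Here "ab" becomes "bc": the "b" produced by the first rule is hidden behind a placeholder while the second rule runs, so it is not rewritten to "c".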
update_explicit_indices(tg, match, start_end, io, diff_from_input, diff_from_output, out_string)
¶
Takes an arbitrary number of input & output strings and their corresponding index offsets. It then zips them up according to the provided indexing notation.
Example
A rule that turns a sequence of k̓ to 'k would have a default indexing of k -> ' and ̓ -> k. It might be desired, though, to show that k -> k and ̓ -> ' and that their indices were transposed. For this, the Mapping could be given the following: [{'in': 'k{1}̓{2}', 'out': "'{2}k{1}"}]. Indices are found with r'(?<={)\d+(?=})' and characters are found with r'[^0-9{}]+(?={\d+})'.
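The two regular expressions above are enough to pair input and output characters by their explicit index. The sketch below is a minimal illustration with hypothetical helper names (group_by_index, align); the braces are escaped for readability, which matches the same text as the unescaped patterns, and the combining comma above is written as \u0313.

```python
import re

# Same patterns as in the text, with the literal braces escaped.
INDEX_PATTERN = re.compile(r"(?<=\{)\d+(?=\})")
CHAR_PATTERN = re.compile(r"[^0-9{}]+(?=\{\d+\})")


def group_by_index(side):
    """Map each explicit index to the characters it is attached to."""
    indices = INDEX_PATTERN.findall(side)
    chars = CHAR_PATTERN.findall(side)
    return dict(zip(indices, chars))


def align(rule):
    """Pair input and output characters that share an explicit index."""
    inputs = group_by_index(rule["in"])
    outputs = group_by_index(rule["out"])
    return [(inputs[index], outputs[index]) for index in sorted(inputs, key=int)]


rule = {"in": "k{1}\u0313{2}", "out": "'{2}k{1}"}
pairs = align(rule)
print(pairs)
```

The result pairs k with k and the combining mark with the apostrophe, which is the transposed alignment the example describes.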