Skip to content

Migrating from g2p 1.x

The g2p 2.0 release introduces a number of improvements and changes which, unfortunately, are incompatible with mappings and Python code written for the previous version. We have tried to describe them here with the changes you will need to make to your code and data.

Mapping configurations have changed (for the better)

The configurations for mappings (which you'll find in g2p/mappings/langs/*/config-g2p.yaml) are now validated with a YAML Schema. If you use an editor like Visual Studio Code with the YAML extension, the names of fields will be autocompleted and some warnings will be shown for possible values. This also works with GNU Emacs using Eglot or lsp-mode and any other editor that supports the Language Server Protocol and/or SchemaStore. Some varieties of VIM are known to work, for instance.

In order for this magic to work, we needed to give the configuration files a somewhat more meaningful name than config.yaml, so they must now be called config-g2p.yaml. In addition some fields have changed names to reflect the fact that they refer to files and not the actual rules themselves:

  • mapping is now rules_path
  • abbreviations is now abbreviations_path

The mappings themselves should be compatible with the previous version, please let us know if you encounter any problems.

Submodules of g2p must be imported explicitly

Previously, when you called import g2p, it imported absolutely everything, which caused the command-line interface (and probably your program too) to start up very, very slowly.

If you simply use the public and documented make_g2p and make_tokenizer APIs, this will not change anything, but if you relied on internal classes and functions from g2p.mappings, g2p.transducer, etc, then you can no longer depend on them being also accessible in the top-level g2p package. For example, you will need to make this sort of change:

- from g2p import Mapping, Transducer
+ from g2p.mappings import Mapping
+ from g2p.transducer import Transducer

NOTE These are not public APIs, and are subject to further changes. This guide is provided as a courtesy to anyone who may have been using them and should not be construed as public API documentation.

Mappings and rules use properties to access their fields

Along the same lines, access to the internal structure of rule-based mappings has changed considerably (and for the better) due to the use of Pydantic. This means, however, that you can no longer treat them as the simple dictionaries that they used to be, since they are no longer that. Instead, use properties, which correspond to the names used in config-g2p.yaml.

For example, you can access the case_sensitive flag using the property of the same name (note also that you can no longer construct a Mapping by simply passing the name of the file):

mapping = Mapping.load_from_file("path/to/some/config-g2p.yaml")
print("Case sensitive?", mapping.case_sensitive)

To iterate over the rules in a mapping, you now use the rules property instead of the mapping_data field. The rules themselves now also use properties for access, which do not entirely correspond to the names used in the JSON/YAML definition, because in, for example, is a reserved word in Python. So for instance you would make this change:

- for rule in mapping["mapping_data"]:
-     print("Rule maps", rule["in"], "to", rule["out"])
+ for rule in mapping.rules:
+     print("Rule maps", rule.rule_input, "to", rule.rule_output)

NOTE: These are not public APIs, and are subject to further changes. This guide is provided as a courtesy to anyone who may have been using them and should not be construed as public API documentation.

Some CLI commands no longer exist

Several commands for the g2p command-line have been removed:

  • run
  • routes
  • shell

To run the g2p API server and G2P Studio, you can use:

python run_studio.py

It does not seem that any equivalents of routes or shell exist.

Comments