In this tutorial, we will learn about the TOML which is Tom's Obvious Minimal Language. It is a reasonably new configuration file format that is widely used by the Python community. We will discuss the syntax of TOML, use tomli and tomllib to parse TOML documents and use tomli_w to write data structure as TOML.
Use TOML as a Configuration Format
TOML, which stands for Tom's Obvious Minimal Language, is a configuration file format named after its creator, Tom Preston-Werner. Its design objective was to create a format that is effortlessly parseable into data structures across a broad range of programming languages.
Configurations and Configuration Files
Configuration plays a crucial role in virtually every application or system, enabling the flexibility to modify settings and behavior without altering the source code. Configurations are employed in various scenarios, such as specifying essential details for connecting to external services like databases or cloud storage. Additionally, configuration settings facilitate the customization of user experiences within a project, empowering users to tailor the application according to their preferences.
Utilizing a configuration file in your project offers several benefits, including the separation of code from its settings. It prompts you to carefully identify which aspects of your system should be genuinely configurable, providing a means to assign meaningful names to values that would otherwise be considered "magic" within your source code. Let's consider the usage of a configuration file in a hypothetical tic-tac-toe game for now.
While the standard tic-tac-toe game is traditionally played on a three-by-three grid, the configurability of the board size is uncertain. The existing logic may not be compatible with different board sizes. However, it could still be beneficial to include the board size value in the configuration file. This approach serves two purposes: providing a meaningful name to the value and making it visible, even if it remains unchanged for the standard game.
The project URL holds significant importance when deploying an application. While it may not be subject to change for the average user, power users might require the ability to redeploy the game on a different server.
To provide clarity and accommodate diverse use cases, it can be beneficial to organize the configuration file accordingly. One popular approach is to separate the configuration into multiple files, each addressing a distinct concern. Alternatively, you can group related configuration values for better organization. Here is a possible reorganization of your hypothetical configuration file:
There are various methods exist for specifying configurations, depending on the operating system and specific requirements. For instance, Windows has traditionally employed INI files, which bear resemblance to the configuration file example provided earlier. On Unix systems, plain-text configuration files are commonly used, although the specific format may vary across different services. Regardless of the format, the emphasis is typically placed on maintaining human readability and simplicity for ease of understanding and modification.
As time has progressed, an increasing number of applications have adopted well-defined formats such as XML, JSON, or YAML for their configuration requirements. These formats were originally developed as data interchange or serialization formats, primarily intended for computer communication purposes. However, due to their structured nature and widespread support across programming languages, they have found utility in representing configuration settings as well. Their adoption simplifies parsing, manipulation, and interoperability between different systems and programming languages, making them popular choices for configuration file formats.
TOML - Tom's Obvious Minimal Language
TOML is a relatively recent format, with its initial specification (version 0.1.0) being released in 2013. From its inception, TOML has prioritized being a minimal and human-readable configuration file format. The goals outlined on the TOML website are as follows:
"TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics. TOML is designed to map unambiguously to a hash table. TOML should be easy to parse into data structures in a wide variety of languages. (Source, emphasis added)"
Let's define the above file in the TOML.
TOML draws inspiration from traditional configuration files in its syntax. However, its significant advantage over Windows INI files and UNIX configuration files lies in its well-defined specification. Unlike its counterparts, TOML has a comprehensive specification that precisely outlines the allowed elements in a TOML document and provides clear guidelines on how different values should be interpreted. This specification has evolved and reached stability and maturity, culminating in version 1.0.0 in early 2021.
In contrast to TOML, the INI format lacks a formal specification. Instead, it comprises numerous variants and dialects, typically defined by specific implementations. Python, for instance, includes support for reading INI files in its standard library through ConfigParser. Although ConfigParser is fairly flexible, it may not support all variations of INI files.
Another notable distinction between TOML and many traditional formats is that TOML assigns types to its values. In the provided example, "blue" is interpreted as a string, while 3 is considered a number. One potential criticism of TOML is that human authors need to be conscious of these types when writing TOML documents. In simpler formats, such responsibility is typically handled by the programmer who parses the configuration file
TOML Schema Validation
At present, TOML does not incorporate a schema language that allows for specifying required and optional fields within a TOML document. Various proposals for adding this capability have been put forward, but it remains uncertain if any of them will be accepted and implemented in the near future.
When dealing with more complex TOML documents, the approach of relying solely on the TOML format may not scale effectively. Additionally, if the goal is to provide informative error messages, it requires additional effort. A superior alternative is to leverage a library like pydantic, which utilizes type annotations to perform data validation during runtime. By using pydantic, one can benefit from its precise and helpful built-in error messages. This approach not only simplifies data validation but also enhances the scalability and usability of handling TOML configurations.
More about TOML: Key-Value Pair
TOML store the value in the key-value format. TOML values have different types. A value must be one of the following types.
Note - TOML embraces comments using the same syntax as Python. By utilizing the hash symbol (#), you can designate the remainder of a line as a comment. Introducing comments in your configuration files serves the purpose of enhancing clarity and facilitating comprehension for both yourself and your users.
The key-value pairs are basic building blocks in a TOML document. We mention them with key=value syntax where key is separated from the value with an equal sign. Following is an example of TOML document with one key-value pair.
In TOML, keys are consistently interpreted as strings, irrespective of whether quotation marks surround them or not. Let's consider the following example to illustrate this point.
In TOML, a key like "12" is considered valid, but it will be interpreted as a string rather than a number. Typically, it is recommended to use bare keys whenever possible. Bare keys consist of ASCII letters, numbers, underscores, and dashes. These keys can be written without quotation marks, as demonstrated in the examples provided earlier. By using bare keys, you can enhance the readability and simplicity of your TOML configuration files.
TOML documents must be encoded in UTF-8 Unicode, offering considerable flexibility in representing values. While there are restrictions on bare keys, you can still utilize Unicode characters when defining your keys. However, using Unicode keys necessitates enclosing them in quotation marks. It's important to note that employing Unicode keys comes with an added requirement of using quotation marks to delimit them.
In TOML, dots (.) have a specific significance when used in keys. While you can include dots in unquoted keys, doing so will result in the grouping of values. The dotted key will be split at each dot, creating nested groupings within the data structure. Let's consider the following example to illustrate this behavior.
In the given example, there are two keys that include dots. As both keys start with "match_x", they will be grouped together under a section named "match_x". The keys "symbol" and "color" will be nested within this section, allowing for a structured organization of related data.
String, Numbers, and Boolean
TOML uses a similar data type as Python. The only difference is Boolean values are lowercase: true and false. In TOML, it is common to use double quotation marks (") for defining strings. Within strings, you can utilize backslashes to escape special characters. For instance, "\u03c0 is less than four" includes the escape sequence \u03c0, which represents the Unicode character with codepoint U+03c0, corresponding to the Greek letter π. When interpreted, this string will be rendered as "π is less than four".
In TOML, an alternative way to specify strings is by using single quotation marks ('). These single-quoted strings are known as literal strings and exhibit behavior similar to raw strings in Python. In a literal string, no characters are escaped or interpreted, meaning that '\u03c0 is the Unicode codepoint of π' will treat the initial characters '\u03c0' as literals, rather than interpreting them as a Unicode escape sequence.
In TOML, strings can also be specified using triple quotation marks (""" or '''). Triple-quoted strings provide the capability to write strings spanning multiple lines, resembling multiline strings in Python. This syntax allows for a more convenient representation of lengthy or multiline text within a TOML document.
In basic strings of TOML, control characters, including literal newlines, are not permitted. However, you can use the escape sequence \n to represent a newline within a basic string. If you require formatting your strings over multiple lines, you must use either a multiline string or triple-quoted literal strings. The latter, in addition to being multiline, is the only way to include a single quotation mark inside a literal string. For instance, you can represent π as '''Use '\u03c0' to represent π'''.
In TOML, numbers can be either integers or floating-point numbers. Integers represent whole numbers and can be specified using plain numeric characters. Similar to Python, you have the option to use underscores to improve readability when representing larger numbers.
Floating-point numbers in TOML are used to represent decimal numbers and consist of an integer part, a decimal point, and a fractional part. They can also employ scientific notation to represent extremely small or large numbers. Additionally, TOML supports special float values such as infinity and not a number (NaN), allowing for more comprehensive representation of numerical data.
In TOML, non-negative integer values have the flexibility to be represented in hexadecimal, octal, or binary formats by utilizing specific prefixes. A value starting with "0x" represents a hexadecimal value, such as 0xffff00. Similarly, a value starting with "0o" denotes an octal representation, and a value starting with "0b" indicates a binary representation, as in 0b00101010. This feature allows for versatile ways of expressing integer values in different number systems within a TOML document.
Compare TOML Types and Python Types
The mapping between TOML's data types and Python's data type for basic libraries like tomli and tomllib is quite natural. Following is the conversion table of tomllib.
Python offers various data types, some of which are built-in, while others are part of the datetime module in the standard library.
Using only standard types in Python has its limitations. When representing a TOML document, relying solely on standard types restricts the ability to capture additional information such as comments or indentation. Moreover, the Python representation may not distinguish between values defined within a regular table and those within an inline table.
In many scenarios, the meta-information may be irrelevant, and its absence doesn't cause any loss. However, there are cases where preserving this information becomes crucial. For instance, when inserting a table into an existing TOML file, it's essential to retain all the comments instead of having them disappear.
The load() and loads() method provides the one parameter that helps to customize the TOML parsing. The parameter is passed to the parse_float to specify how the floating-point number should be parsed. The default implementation in Python satisfies the requirement of utilizing 64-bit floats, which generally provide precision for approximately 16 significant digits.
Create New TOML Document
As we have discussed above, we can read and write the TOML document using tomli and tomli_w. There is some limitation in tomli_n when it comes to formatting in the result TOML files. We will explore how to format TOML document to make them easier to use for users. We can also use the tomlkit that allows to play with the TOML documents.
Format and Style TOML Documents
Generally, whitespaces are ignored in TOML files. This facility allows to manage configuration files well organized, readable, and intuitive. Additionally, a hash symbol (#) marks the rest of the line as a comment.
While there isn't a strict style guide for TOML (Tom's Obvious, Minimal Language) documents like PEP 8 is for Python code, the TOML specification does provide some recommendations while allowing flexibility for personal preferences. The specification offers guidance on various aspects of TOML formatting, but it doesn't enforce a rigid set of rules for every aspect of document styling. This allows users to choose their own preferred formatting conventions within the boundaries defined by the specification.
TOML provides a high degree of flexibility in certain features. For instance, you have the freedom to define tables in any order you prefer. Even if a sub-table appears before its parent table, it is still valid because table names are fully qualified. Additionally, TOML disregards whitespace around keys, meaning that the headers [nested.table] and [ nested . table] are interpreted as starting the same nested table. This flexibility allows for more convenient organization and formatting of TOML documents.
The purpose of the TOML kit was to work with Poetry object. Poetry manipulates the pyproject.toml file as part of its dependency management. Poetry must maintain the style and comments within the file.
To build the TOML document from scratch with tomlkit, we need to install tomlkit in a virtual environment.
Let's see the following example.
In TOML, you can utilize functions like loads() and dumps() as well as load() and dump() to read from and write to TOML files, respectively, just as before. However, in the current implementation, these functions preserve not only the data itself but also the string types, indentations, comments, and alignments present in the TOML document. This ensures that the structure and formatting of the original TOML file are maintained when reading and writing TOML data. Now, we will create the document from scratch.
To create a TOML document in the general case, you typically begin by invoking the document() function to create a TOML document instance. This instance serves as the container for your TOML data. Subsequently, you can use the .add() method to include various objects within this document. These objects can include comments, newlines, key-value pairs, and tables, among others. By using .add(), you can progressively build and populate your TOML document with the desired data elements.
This tutorial included the TOML format and the ways that you can use it in Python. We have also explored some of the TOML feature that makes TOML more useful, a flexible and convenient format for configuration files.