Javatpoint Logo

mbrtoc32() in C/C++

In this article, you will learn about the mbrtoc32() function in C++ with its syntax, parameters, and examples.

A multibyte character sequence in C/C++ can be converted to a single 32-bit character of type char32_t using the mbrtoc32() function, which is declared in <uchar.h> (<cuchar> in C++) and was introduced in C11 and C++11. This function is especially helpful when working with character encodings like UTF-8, which may use several bytes (up to four) to represent a single character.

Syntax:

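It has the following syntax (declared in <uchar.h>; in C++ it is also available as std::mbrtoc32 in <cuchar>):

size_t mbrtoc32(char32_t* pc32, const char* s, size_t n, mbstate_t* ps);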

Parameters

  • pc32: A pointer to the char32_t object where the converted character is stored. If it is a null pointer, the result is not stored.
  • s: A pointer to the multibyte character sequence to be converted. If s is a null pointer, the call behaves like mbrtoc32(NULL, "", 1, ps), which returns the state to the initial conversion state.
  • n: The maximum number of bytes of s that the function may examine.
  • ps: A pointer to the mbstate_t conversion state object. This state carries information between calls when a multibyte character is processed across several calls to mbrtoc32().

Return Value

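  • 0: the bytes at s form the null character. The null character is stored in *pc32 (unless pc32 is a null pointer).
  • A value between 1 and n: the number of bytes that completed a valid multibyte character. The converted character is stored in *pc32.
  • static_cast<size_t>(-2): the n bytes form an incomplete but so far valid multibyte character. Nothing is stored; the partial character is remembered in *ps until more bytes are supplied.
  • static_cast<size_t>(-3): an additional char32_t belonging to a previously converted multibyte character was stored, and no bytes were consumed. This is uncommon with char32_t, because one char32_t can hold any Unicode code point.
  • static_cast<size_t>(-1): s points to an invalid multibyte sequence (an encoding error). Nothing is stored, and errno is set to EILSEQ.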

Key Points of mbrtoc32():

Some key points about mbrtoc32() are as follows:

1. Locale Dependency

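The behavior of mbrtoc32() depends on the current C locale, specifically its LC_CTYPE category, which is set with setlocale(). Different locales can use different multibyte encodings, so the same bytes may convert differently or fail to convert. A program runs in the "C" locale until it calls setlocale().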

2. Error Handling

When an invalid multibyte sequence is encountered, the function returns static_cast<size_t>(-1), stores nothing in *pc32, and sets errno to EILSEQ. After such an error the conversion state is unspecified, so it should be reset before the state object is used again.

3. Stateful Conversion

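The mbstate_t object pointed to by ps carries the conversion state between calls. Each call updates it, so a multibyte character that is split across several input buffers can be completed by a later call that uses the same state object.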

4. UTF-8 and Unicode Support

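When the current locale uses UTF-8, mbrtoc32() translates each UTF-8 multibyte sequence into the matching Unicode code point stored in a char32_t (provided the implementation uses UTF-32 for char32_t, which it indicates by defining the macro __STDC_UTF_32__). Because one char32_t can hold any code point, characters outside the Basic Multilingual Plane (BMP), such as most emoji, need only one char32_t, whereas char16_t would need a surrogate pair.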

5. Multibyte Character Handling

Because a single character may occupy several bytes, the function reports how many bytes it consumed for each converted character. The caller uses this count to find where the next character starts in the input.

Example:

Let us take an example to demonstrate the mbrtoc32() function in C++:

Output:

The String is: 
The Length is: 0
32-bit character = 0

Explanation:

1. Header Files

The code includes the required header files: <cstdio> (printf() and perror()), <cstdlib> (exit() and MB_CUR_MAX), <iostream> (std::cout), <uchar.h> (mbrtoc32()), and <wchar.h> (mbstate_t).

2. Namespace

The using namespace std; statement brings the entire std namespace into scope, so standard identifiers can be used without the std:: prefix.

3. Variable Declarations

  • char32_t hold;: It declares a variable hold of type char32_t to store the converted character.
  • char str[] = "";: It declares a character array str initialized with an empty string.
  • mbstate_t arr{};: It declares the mbstate_t variable arr and value-initializes it with {}, which puts it in the initial conversion state.

4. Function Call - mbrtoc32

  • len = mbrtoc32(&hold, str, MB_CUR_MAX, &arr);: It calls mbrtoc32() to convert the first multibyte character of str to a char32_t. MB_CUR_MAX is the maximum number of bytes in a multibyte character in the current locale, so it is used as the byte limit n. The converted character is stored in hold, and the return value (the number of bytes consumed, or an error code) is stored in len.

5. Error checking

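It checks whether the conversion failed. Because mbrtoc32() returns size_t, which is unsigned, the reliable test is to compare the result with static_cast<size_t>(-1); a check such as len < 0 only works if the result has been stored in a signed variable, and is always false for a size_t. If the conversion failed, perror() prints an error message and the program exits.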

6. Output

It uses std::cout to print the string str and the value returned by mbrtoc32(), which is stored in len.

printf() prints the numeric value of the 32-bit character in octal format.

7. Return Statement

return 0; indicates that the program has executed successfully.

Because str is an empty string, it contains only the terminating null byte. mbrtoc32() therefore converts the null character: it returns 0, which is printed as the length, and stores U+0000 in hold, so the character value is 0 as well. To watch an actual conversion, supply a non-empty multibyte sequence instead, for example a UTF-8 string after switching to a UTF-8 locale with setlocale().






