mbsrtowcs() Function in C/C++

In this article, you will learn about the mbsrtowcs() function in C/C++ with examples.

In C/C++, the mbsrtowcs() function converts a multibyte character string into a wide-character string. It is part of the Standard C Library (declared in <wchar.h>, or <cwchar> in C++) and helps developers work with locale-dependent character encodings when internationalizing and localizing software.

Function and Purpose:

The main goal of the mbsrtowcs() function is to transform a null-terminated string of multibyte characters into a string of wide characters. The multibyte encoding is the one selected by the LC_CTYPE category of the current locale (for example, UTF-8), and the encoding of wchar_t is implementation-defined (typically UTF-32 on Linux and UTF-16 on Windows). This makes the function useful for applications that receive text in the locale's encoding and need to process it one wide character at a time, particularly when different systems or platforms use different character encodings.

Syntax:

It has the following syntax:

size_t mbsrtowcs(wchar_t *dest, const char **src, size_t len, mbstate_t *ps);

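Parameters: The function takes the following four parameters:

dest: A pointer to the array in which the converted wide characters are stored. If it is a null pointer, nothing is stored and the function only counts the wide characters.

src: A pointer to a pointer to the first multibyte character to be converted. When dest is not null, the function updates *src: it becomes a null pointer if the terminating null character was converted; otherwise it points just past the last multibyte character that was converted.

len: The maximum number of wide characters to store in dest.

ps: A pointer to the conversion state object (mbstate_t). If it is a null pointer, the function uses its own internal state object.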

Return Value: The function returns one of the following values:

On success, mbsrtowcs() returns the number of wide characters written to dest, not counting the terminating wide null character.
If the destination pointer (dest) is null, the return value is the number of wide characters that the conversion would produce, again not counting the terminating null. In this case len is ignored.
If an invalid multibyte sequence is encountered, errno is set to EILSEQ and (size_t)-1 is returned.

Program:

Let's take an example to illustrate the use of the mbsrtowcs() function. The code is plain C, which also compiles as C++.

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main() {
    setlocale(LC_ALL, ""); // Set the locale according to the system
    const char *mbstr = "Hello, こんにちは"; // Multibyte string (UTF-8)
    wchar_t wcstr[20]; // Destination wide-character string
    // Convert multibyte to wide-character string
    size_t result = mbsrtowcs(wcstr, &mbstr, 20, NULL);
    if (result != (size_t)-1) {
        wprintf(L"Wide character string: %ls\n", wcstr);
        wprintf(L"Number of wide characters written: %zu\n", result);
    } else {
        perror("mbsrtowcs");
    }
    return 0;
}

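Output:

On a system whose locale uses UTF-8, the program prints:

Wide character string: Hello, こんにちは
Number of wide characters written: 12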

Explanation:

The code above shows how to use the mbsrtowcs() function to convert a UTF-8 encoded multibyte string into a wide-character string. First, setlocale(LC_ALL, "") sets the program's locale according to the environment, which determines the multibyte encoding that mbsrtowcs() expects.
Next, the source multibyte string mbstr is initialized with the text "Hello, こんにちは", and the destination wide-character array wcstr is declared with room for 20 elements.
The conversion is carried out by the mbsrtowcs() function, which receives the destination array, the address of the source pointer, the maximum number of wide characters to store (20), and a null pointer for the conversion state, so the function uses its internal state. The return value, stored in result, is the number of wide characters written to wcstr, not counting the terminating null. After a complete conversion, the function sets mbstr to a null pointer.
A conditional statement compares result with (size_t)-1 to determine whether the conversion succeeded. On success, the program prints the converted string with wprintf() and the %ls format specifier, followed by the number of wide characters written. On failure, perror("mbsrtowcs") prints an error message.
Overall, this example demonstrates how to convert a multibyte string to a wide-character string in C with mbsrtowcs() and how to handle a conversion error.

Program 2:

Let's take another example to illustrate the use of the mbsrtowcs() function.

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main() {
    setlocale(LC_ALL, ""); // Set the locale according to the system

    const char *mbstr = "Bonjour le monde"; // Multibyte string (ASCII)
    wchar_t wcstr[50]; // Destination wide-character string

    // Convert multibyte to wide-character string
    size_t result = mbsrtowcs(wcstr, &mbstr, 50, NULL);

    if (result != (size_t)-1) {
        wprintf(L"Wide character string: %ls\n", wcstr);
        wprintf(L"Number of wide characters written: %zu\n", result);
    } else {
        perror("mbsrtowcs");
    }

    return 0;
}

Output:

Wide character string: Bonjour le monde
Number of wide characters written: 16

Explanation:

This program shows a simpler case in which the source multibyte string mbstr contains the text "Bonjour le monde", made up only of ASCII characters. Like the previous example, it sets the locale according to the system and declares a destination wide-character array (wcstr), this time with room for 50 elements.
The mbsrtowcs() function converts the multibyte string mbstr into the wide-character string wcstr, storing at most 50 wide characters. The result variable holds the return value of the conversion.
After that, the program checks that result is not (size_t)-1. On success, it prints the converted wide-character string (wcstr) and the number of wide characters written. If the conversion fails, perror("mbsrtowcs") prints an error message.

Crucial Things to Remember:

Locale Setting: The behavior of mbsrtowcs() depends on the LC_CTYPE category of the program's current locale. Call setlocale() before converting so that the multibyte encoding matches the input; a program that never calls it runs in the default "C" locale.

Buffer Size: Make sure the destination buffer has room for the converted wide characters plus the terminating wide null character. If the limit len is reached first, the function stops without writing a terminator.

State Handling: The mbstate_t object keeps track of the conversion state across function calls. This matters when a string is converted in several calls, especially with encodings that use shift states. Zero-initialize the object before the first call.

Conclusion:

In conclusion, the mbsrtowcs() function is a useful tool for handling character encoding conversions in C/C++ programs. It supports internationalization by letting software accept text in the locale's multibyte encoding and process it as wide characters. Using it correctly means setting the locale, sizing the destination buffer with room for the terminator, and checking the return value for (size_t)-1.
