mbrlen() function in C/C++

In this article, you will learn about the mbrlen() function in C++ with its syntax, parameters, and examples.

The mbrlen() function is used for multibyte character processing. It is declared in the <wchar.h> header in C and the <cwchar> header in C++. The function determines how many bytes make up the next multibyte character in a sequence of multibyte characters.

Purpose:

The main purpose of the mbrlen() function is to find the number of bytes needed to complete the next multibyte character in a given multibyte character string. This makes it easier to step through and parse multibyte character sequences one character at a time.

Syntax:

It has the following syntax:

size_t mbrlen(const char* s, size_t n, mbstate_t* ps);

Parameters

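s: A pointer to the first byte of the multibyte character sequence to examine.
n: The maximum number of bytes of s that may be examined.
ps: A pointer to an mbstate_t object that tracks the conversion state. If ps is a null pointer, the function uses its own internal state object.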

Return Value

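The function returns the number of bytes that make up the next multibyte character in the sequence if that character is valid and complete.
The function returns 0 if the next multibyte character is the null character, which marks the end of the sequence.
The function returns static_cast<size_t>(-1) and sets errno to EILSEQ if the bytes do not form a valid multibyte character.
The function returns static_cast<size_t>(-2) if the next n bytes do not form a full multibyte character.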
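The return values can be told apart with a few comparisons. The following is a minimal sketch, not a definitive implementation: the helper names describe_mbrlen_result and probe are hypothetical, and the sketch also covers static_cast<size_t>(-1), which the C standard specifies for an invalid byte sequence.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: turns an mbrlen() result into a readable label.
std::string describe_mbrlen_result(std::size_t result)
{
    if (result == static_cast<std::size_t>(-2))
        return "incomplete";      // the examined bytes end inside a character
    if (result == static_cast<std::size_t>(-1))
        return "invalid";         // encoding error; errno is set to EILSEQ
    if (result == 0)
        return "null character";  // the next character is '\0'
    return std::to_string(result) + " byte(s)";
}

// Hypothetical helper: examines at most n bytes of s, starting from a
// fresh (initial) conversion state.
std::string probe(const char* s, std::size_t n)
{
    std::mbstate_t state = std::mbstate_t();
    return describe_mbrlen_result(std::mbrlen(s, n, &state));
}
```

Comparing against static_cast<std::size_t>(-1) and static_cast<std::size_t>(-2) keeps the check in the function's own return type, which avoids relying on a narrowing conversion to int.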

Multibyte Character Encoding

Internationalized software frequently stores text in multibyte character encodings such as UTF-8, Shift JIS, or EUC-JP. These encodings cover a wide range of characters from different languages and scripts. A single character may span several bytes, so decoding it requires special handling.

A character encoding is a mapping from characters to their binary representations. Some character sets, such as ASCII, use a single byte for each character. A single byte is insufficient for languages such as Chinese, Japanese, or Korean, which have very large character sets. Multibyte character encodings represent such characters with several bytes.
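To make the idea of "several bytes per character" concrete, the sketch below looks at UTF-8 specifically, where the first byte of a character announces how many bytes the character occupies. The function name utf8_sequence_length is hypothetical, and the sketch only inspects the lead byte; it does not validate the continuation bytes and is not a replacement for mbrlen().

```cpp
// Hypothetical helper: returns the number of bytes in a UTF-8 character
// whose first byte is `lead`, or 0 if `lead` cannot start a character.
int utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)
        return 1;  // ASCII range: one byte per character
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;  // for example, many accented Latin letters
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;  // for example, most Chinese, Japanese, and Korean characters
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;  // characters outside the Basic Multilingual Plane
    return 0;      // continuation byte or a byte that is never valid as a lead
}
```

For instance, the letter A (byte 0x41) yields 1, while a character that starts with the byte 0xE2 yields 3.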

Use Cases

Processing strings: The mbrlen() function is used to find the length of each multibyte character while processing multibyte character strings.

Internationalization and Localization: In programs that must handle multiple languages and character sets, the mbrlen() function helps avoid splitting a multibyte character in the middle when text is scanned or truncated.
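The string-processing use case can be sketched as a loop that advances one character at a time. This is a minimal illustration under stated assumptions: count_multibyte_chars is a hypothetical name, the string is null-terminated, and the result depends on whichever locale is currently active.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical helper: counts the characters (not bytes) in a
// null-terminated multibyte string under the current locale.
// Returns -1 if an invalid or incomplete sequence is found.
long count_multibyte_chars(const char* s)
{
    std::mbstate_t state = std::mbstate_t();
    std::size_t remaining = std::strlen(s);  // bytes before the terminator
    long count = 0;

    while (remaining > 0) {
        std::size_t len = std::mbrlen(s, remaining, &state);
        if (len == static_cast<std::size_t>(-1) ||
            len == static_cast<std::size_t>(-2))
            return -1;   // invalid or truncated character
        if (len == 0)
            break;       // null character; not expected before the terminator
        s += len;        // skip the whole character
        remaining -= len;
        ++count;
    }
    return count;
}
```

For plain ASCII text the count equals the byte length. With a UTF-8 locale selected through setlocale() (assuming one is installed), a two-byte character such as "\xc3\x9f" would be expected to count as a single character.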

Example:

Let us take an example to illustrate the use of the mbrlen() function in C++:

#include <bits/stdc++.h>
using namespace std;

// Reports how many bytes the first multibyte character of str occupies,
// examining no more than num bytes
void check_(const char* str, size_t num)
{
	// Multibyte conversion state, set to the initial state
	mbstate_t ps = mbstate_t();
	// mbrlen() returns size_t; keeping that type lets the special
	// values (size_t)-1 and (size_t)-2 compare correctly
	size_t return_V = mbrlen(str, num, &ps);
	if (return_V == static_cast<size_t>(-2))
		cout << "Next " << num << " byte(s) do not"
			<< " represent a complete"
			<< " multibyte character" << endl;

	else if (return_V == static_cast<size_t>(-1))
		cout << "Next " << num << " byte(s) do not "
			<< "represent a valid multibyte character" << endl;
	else
		cout << "Next " << num << " byte(s) of \""
			<< str << "\" hold a " << return_V << "-byte"
			<< " multibyte character" << endl;
}

int main()
{
	// Select a UTF-8 locale; the locale name is platform-dependent
	setlocale(LC_ALL, "en_US.utf8");
	char str[] = "";
	// test the first 1 byte
	check_(str, 1);
	// test the first 3 bytes
	check_(str, 3);
	return 0;
}

Output:

Next 1 byte(s) of "" hold a 0-byte multibyte character
Next 3 byte(s) of "" hold a 0-byte multibyte character

Explanation:

1. Headers and Namespace

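The code includes <bits/stdc++.h>, a GCC-specific convenience header that pulls in the standard headers the program needs, including <cwchar> (which declares mbrlen()), <clocale>, and <iostream>. It also brings the std namespace into scope.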

2. check_ Function

This function takes a multibyte character string (str) and the number of bytes to examine (num).
A multibyte conversion state (the mbstate_t object ps) is initialized.
The mbrlen() function then determines the size of the multibyte character that starts at the beginning of the string.

The function then checks mbrlen()'s return value:

If the return value is -2 (as a size_t), the next num bytes do not form a complete multibyte character.
If the return value is -1 (as a size_t), the next num bytes do not represent a valid multibyte character.
Otherwise, it prints the multibyte character's size in bytes.

3. main function

setlocale() sets the locale to "en_US.utf8", and an empty character array str is defined.

The check_ function is called twice:

First with num = 1, to examine only the first byte.
Then with num = 3, to examine up to the first three bytes.

4. Output Explanation

Because str is an empty string (""), its first byte is the terminating null character. mbrlen() therefore returns 0 in both calls, and the program reports a 0-byte multibyte character each time.
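An empty string only exercises the final branch of check_. To reach the "incomplete" case, a non-empty input and a UTF-8 locale are needed. The sketch below is one way to set that up; the helper names and the list of locale names are assumptions, and which locales are installed varies from system to system.

```cpp
#include <clocale>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: tries a few common UTF-8 locale names and reports
// whether one of them could be selected. The names are assumptions.
bool select_utf8_locale()
{
    const char* candidates[] = { "en_US.utf8", "en_US.UTF-8", "C.UTF-8" };
    for (const char* name : candidates)
        if (std::setlocale(LC_ALL, name) != nullptr)
            return true;
    return false;
}

// Hypothetical helper: length of the first character of s, examining at
// most n bytes and starting from a fresh conversion state.
std::size_t first_char_length(const char* s, std::size_t n)
{
    std::mbstate_t state = std::mbstate_t();
    return std::mbrlen(s, n, &state);
}
```

If select_utf8_locale() succeeds, first_char_length("\xc3\x9f", 1) is expected to return static_cast<std::size_t>(-2), because the single byte 0xC3 is only the first half of a two-byte character, while first_char_length("\xc3\x9f", 3) is expected to return 2.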
