Program to convert ASCII to Unicode in C++

Two popular character encoding systems used in programming are ASCII and Unicode. ASCII can represent only 128 characters using 7 bits, whereas Unicode defines well over 100,000 characters using code points ranging from 0 to 0x10FFFF. When processing or displaying text in C++ that may go beyond the ASCII range, it is sometimes helpful to translate ASCII character codes to their corresponding Unicode code points. This post describes a basic C++ program that converts an ASCII code entered by the user into the corresponding Unicode character. We map the ASCII values directly to Unicode code points, which works for the standard ASCII range of 0-127. The complete code example shows how this conversion can be done with just a few lines of C++, providing a building block for more robust Unicode handling in applications.

What is the ASCII Code?

The character encoding system known as ASCII (American Standard Code for Information Interchange) uses seven bits to encode 128 characters. It was based on the English alphabet when it was first created in the 1960s.

The character set in ASCII encodes:

Both capital and lowercase letters in English (A-Z, a-z).
Numbers 0 through 9.
Symbols for punctuation.
Control codes: line feed, carriage return, etc.
Special symbols such as !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~.

Each character corresponds to a 7-bit binary number from 0000000 to 1111111, or equivalently a decimal value between 0 and 127. As an illustration:

Binary 1000001, or decimal 65, corresponds to 'A',
While binary 1000010, or decimal 66, corresponds to 'B'.

The first 32 ASCII codes (0-31 decimal) are reserved for non-printable control characters like null, tab, line feed, carriage return, etc. Codes 32-126 represent printable characters like letters, digits, and punctuation. Code 127 is reserved for the delete (DEL) control character.

The ASCII standard only uses 7 bits for each character, but most modern systems store each character in 8 bits and set the highest bit to 0. This allows ASCII to be used alongside other encodings in 8-bit environments.

What is Unicode?

Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. Unicode assigns each character a unique number, regardless of platform, application, or language.

Some key points about Unicode:

Unicode enables text processing, storage, and transport independently of language and platform.
The Unicode standard has room for over 1 million characters. It includes characters of all major languages in the world.
Unicode uses a 21-bit coding space containing 1,114,112 code points (0 to 0x10FFFF). Excluding the 2,048 surrogate code points reserved for UTF-16 leaves 1,112,064 code points that can be assigned to characters.
The code space is divided into 17 planes, each with 65,536 (= 2^16) code points. The first plane (U+0000 to U+FFFF) is called the Basic Multilingual Plane (BMP) and contains characters for almost all modern languages.
Unicode also specifies algorithms for bidirectional text, collation, and normalization to facilitate internationalization.
The Unicode Consortium, a non-profit organization, maintains the Unicode standard. Major companies and organizations participate in developing Unicode standards.
Unicode is device and platform independent. A code point identifies the same character on every device, although its exact appearance depends on the fonts available.
Unicode is backwards compatible with ASCII. The first 128 Unicode code points correspond to the ASCII characters.

What is the ASCII Table of Characters?

The ASCII table is a character encoding standard representing 128 characters using 7-bit binary numbers. ASCII is an abbreviation that stands for American Standard Code for Information Interchange.

The ASCII table includes:

Uppercase and lowercase English letters
Numeric digits
Punctuation marks
Control codes
Special characters

Each ASCII character is mapped to a decimal number between 0 and 127. This allows the characters to be encoded using binary numbers from 0000000 to 1111111.

The first 32 ASCII codes (0-31) are reserved for non-printable control function characters like null, tab, line feed, carriage return, etc.

Codes 32 to 47 represent the space character and various punctuation symbols.
Codes 48 to 57 represent the numeric digits 0 to 9.
Codes 65 to 90 are the uppercase letters A to Z.
Codes 97 to 122 are the lowercase letters a to z.

The remaining codes (58-64, 91-96, and 123-126) are additional symbols, and code 127 is the DEL control character. Below is the full ASCII standard table showing each character mapped to its decimal and hex code value:

Decimal	Hex	Character
0	00	NUL (null)
1	01	SOH (start of heading)
2	02	STX (start of text)
3	03	ETX (end of text)
4	04	EOT (end of transmission)
5	05	ENQ (enquiry)
6	06	ACK (acknowledge)
7	07	BEL (bell)
8	08	BS (backspace)
9	09	TAB (horizontal tab)
10	0A	LF (newline)
11	0B	VT (vertical tab)
12	0C	FF (form feed)
13	0D	CR (carriage return)
14	0E	SO (shift out)
15	0F	SI (shift in)
16	10	DLE (data link escape)
17	11	DC1 (device control 1)
18	12	DC2 (device control 2)
19	13	DC3 (device control 3)
20	14	DC4 (device control 4)
21	15	NAK (negative acknowledge)
22	16	SYN (synchronous idle)
23	17	ETB (end of transmission block)
24	18	CAN (cancel)
25	19	EM (end of medium)
26	1A	SUB (substitute)
27	1B	ESC (escape)
28	1C	FS (file separator)
29	1D	GS (group separator)
30	1E	RS (record separator)
31	1F	US (unit separator)
32	20	(space)
33	21	!
34	22	"
35	23	#
36	24	$
37	25	%
38	26	&
39	27	'
40	28	(
41	29	)
42	2A	*
43	2B	+
44	2C	,
45	2D	-
46	2E	.
47	2F	/
48	30	0
49	31	1
50	32	2
51	33	3
52	34	4
53	35	5
54	36	6
55	37	7
56	38	8
57	39	9
58	3A	:
59	3B	;
60	3C	<
61	3D	=
62	3E	>
63	3F	?
64	40	@
65	41	A
66	42	B
67	43	C
68	44	D
69	45	E
70	46	F
71	47	G
72	48	H
73	49	I
74	4A	J
75	4B	K
76	4C	L
77	4D	M
78	4E	N
79	4F	O
80	50	P
81	51	Q
82	52	R
83	53	S
84	54	T
85	55	U
86	56	V
87	57	W
88	58	X
89	59	Y
90	5A	Z
91	5B	[
92	5C	\
93	5D	]
94	5E	^
95	5F	_
96	60	`
97	61	a
98	62	b
99	63	c
100	64	d
101	65	e
102	66	f
103	67	g
104	68	h
105	69	i
106	6A	j
107	6B	k
108	6C	l
109	6D	m
110	6E	n
111	6F	o
112	70	p
113	71	q
114	72	r
115	73	s
116	74	t
117	75	u
118	76	v
119	77	w
120	78	x
121	79	y
122	7A	z
123	7B	{
124	7C	|
125	7D	}
126	7E	~
127	7F	DEL

The table covers the full 128-character ASCII set: control codes, printable characters, punctuation, and special symbols. It provides the decimal and hex values representing each character in the ASCII encoding standard.

C++ Implementation

Get the decimal value of the ASCII character that needs to be converted. For example, 'A' has a decimal value of 65.
For ASCII values between 0 and 127, simply assign the ASCII decimal value directly to the Unicode code point. It works because Unicode is backwards compatible with ASCII and maintains the same values for the first 128 characters.
So for 'A' with ASCII value 65, the equivalent Unicode code point value is also 65.
To convert this to an actual character, cast the integer code point to char or wchar_t in C++.

For example:

int unicode = 65;
wchar_t unicodeChar = static_cast<wchar_t>(unicode); // unicodeChar contains L'A'

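This copies the ASCII value into a wchar_t variable, where, on common platforms, the same number is interpreted as a Unicode code point.
Standard ASCII has no values above 127. Bytes in the range 128-255 belong to "extended ASCII" code pages such as ISO-8859-1 or Windows-1252, so lookup tables or switch statements are required to map them to the appropriate Unicode code points for the code page in use.
Library functions such as mbstowcs (C standard library) or MultiByteToWideChar (Windows API) can also convert narrow, multibyte text to wide characters.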

So, in summary, for the ASCII range 0-127, simply assign or cast the ASCII decimal value as the Unicode code point. For extended ASCII values above 127, use a mapping mechanism to get the equivalent Unicode code point. Cast the resulting integer code point to wchar_t or char to get the character.

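#include <iostream>

int main() {
    std::cout << "Enter an ASCII code (0-127): ";
    int asciiCode;

    // Reject non-numeric input and values outside the 7-bit ASCII range
    if (!(std::cin >> asciiCode) || asciiCode < 0 || asciiCode > 127) {
        std::cerr << "Invalid ASCII code" << std::endl;
        return 1;
    }

    // Convert ASCII to Unicode: the first 128 code points are identical
    int unicode = asciiCode;

    // Print the equivalent Unicode character. A plain char is enough for
    // 0-127 and avoids mixing std::cout and std::wcout on the same stdout
    std::cout << "Unicode character: " << static_cast<char>(unicode) << std::endl;

    return 0;
}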

Output:

Enter an ASCII code (0-127): 65
Unicode character: A
