AWK Command

The awk command is used for text processing in Linux. The sed command is also used for text processing, but it has some limitations, so awk is a handy alternative. It provides powerful control over the data.

Awk is a powerful scripting language for text processing. It can search and replace text, and sort, validate, and index databases.

It is one of the most widely used tools among programmers, who write small but effective programs in the form of statements that define text patterns and the actions to take on them.

It acts as a filter in Linux. On Linux, awk is often provided by gawk (GNU awk).

The AWK command is a domain-specific language designed for text processing and typically used as a data extraction and reporting tool. It is a data-driven scripting language consisting of a set of actions to be taken against streams of textual data, either run directly on files or used as part of a pipeline, for the purpose of extracting or transforming text, such as producing formatted reports.

The language makes extensive use of the string datatype, associative arrays, and regular expressions. Although AWK has a limited intended application domain and was designed especially to support one-liner programs, the language is Turing-complete, and even the early Bell Labs users of AWK often wrote well-structured AWK programs.

How did AWK get its name?

The command is named after the initials of the three people who wrote the original version in 1977: Alfred Aho, Peter Weinberger, and Brian Kernighan, all from AT&T Bell Laboratories.

Features of AWK command

Various features of the Awk command are as follows:

It scans a file line by line.
It splits a file into multiple fields.
It compares the input text or a segment of a text file.
It performs various actions on a file, such as searching for a specified text.
It formats the output lines.
It performs arithmetic and string operations.
It applies conditions and loops.
It transforms files and data into a specified structure.
It produces formatted reports.

Syntax:

The Awk command is used as follows:

awk [options] 'pattern {action}' input-file

The options can be:

-f program-file: Reads the AWK program source from the given file instead of from the command line.
-F fs: Uses fs as the input field separator.
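
The two options can be tried with a minimal sketch such as the one below. The file names marks.csv and second.awk and their contents are hypothetical, chosen only for illustration.

```shell
# Hypothetical sample data: a name and a mark, separated by a comma.
printf 'Sam,75\nDaniel,80\n' > marks.csv

# -F sets the input field separator; here each line is split on commas.
awk -F ',' '{ print $1 }' marks.csv
# Sam
# Daniel

# -f reads the AWK program from a file instead of the command line.
printf '{ print $2 }\n' > second.awk
awk -F ',' -f second.awk marks.csv
# 75
# 80
```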

AWK program structure

An AWK program is a series of pattern-action pairs, written as:

condition {action}
condition {action}
...

Here the condition is typically an expression and the action is a series of commands. The input is split into records; by default, records are separated by newline characters, so the input is split into lines. The program tests each record against each of the conditions in turn and runs the action for each condition that is true. Either the condition or the action may be omitted. The condition defaults to matching every record, and the default action is to print the record. This is the same pattern-action structure as sed.

In addition to the common arithmetic and logical operators, AWK expressions include the tilde operator (~), which matches a regular expression against a string. As convenient syntactic sugar, /regexp/ without the tilde operator matches against the current record. This syntax derives from sed, which in turn inherited it from the ed editor, where / is used for searching. The use of slashes as delimiters for regular expressions was later adopted by Perl and ECMAScript and is now common. The tilde operator was also adopted by Perl.
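
The three forms described above (condition with action, condition only, action only) can be sketched as follows. The sample file student.txt is an assumption: it reuses the student data created later in this article.

```shell
# Sample data: student name and stream, separated by a space.
printf 'Sam CS\nDaniel IT\nJohn IT\nArya IT\nMike ECE\nHelena ECE\n' > student.txt

# Condition and action: print the name of every ECE student.
awk '$2 == "ECE" { print $1 }' student.txt
# Mike
# Helena

# Condition only: the default action prints the whole record.
awk '$2 == "CS"' student.txt
# Sam CS

# Action only: the action runs for every record.
awk '{ print $2 }' student.txt
# CS
# IT
# IT
# IT
# ECE
# ECE
```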
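
As an illustration of the tilde operator and the bare /regexp/ form, here is a small sketch. The sample file student.txt and the chosen patterns are assumptions for the example only.

```shell
# Sample data: student name and stream, separated by a space.
printf 'Sam CS\nDaniel IT\nJohn IT\nArya IT\nMike ECE\nHelena ECE\n' > student.txt

# Tilde operator: match a regular expression against one field.
awk '$1 ~ /^A/' student.txt
# Arya IT

# Bare /regexp/: matched against the whole current record.
awk '/ECE/' student.txt
# Mike ECE
# Helena ECE

# The same match written out with the tilde operator and $0.
awk '$0 ~ /ECE/' student.txt
# Mike ECE
# Helena ECE
```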

Implementations and versions

AWK was originally written in 1977 and distributed with Version 7 Unix. In 1985 its authors started expanding the language, most significantly by adding user-defined functions. The language is described in the book "The AWK Programming Language", published in 1988, and its implementation was made available in releases of UNIX System V.

To avoid confusion with the incompatible older version, this version was sometimes called nawk or "new awk". This implementation was released under a free software license in 1996 and is still maintained by Brian Kernighan.
Old Unix versions, such as UNIX/32V, included awkcc, which converted AWK into C.

BWK awk, also called nawk, refers to the version by Brian Kernighan. It has been dubbed the "One True AWK" because the term was used in association with the book that originally described the language and because Kernighan was one of the original authors of AWK. FreeBSD refers to this version as one-true-awk.
This version also has features that are not in the book, such as ENVIRON and tolower. It is used by illumos, macOS, OpenBSD, NetBSD, FreeBSD, and Android. Arnold Robbins and Brian Kernighan are the main contributors to the source repository of nawk.
GNU awk, or gawk, is another free software implementation, and one that has made significant progress in implementing TCP/IP networking, internationalization, and localization. It was written before the original implementation became freely available.
It includes its own debugger, and its profiler lets the user make measured performance improvements to a script. It also lets the user extend functionality with shared libraries. Some Linux distributions ship gawk as the default AWK implementation.
mawk is a very fast AWK implementation by Mike Brennan, based on a bytecode interpreter.
libmawk is a fork of mawk that allows applications to embed multiple parallel instances of the awk interpreter.
awka is another translator of AWK scripts into C code.
tawk is an AWK compiler for Windows, OS/2, DOS, and Solaris, previously sold by Thompson Automation Software.
Jawk is a project to implement AWK in Java, hosted on SourceForge.
xgawk is a fork of gawk that extends gawk with dynamically loadable libraries. The XMLgawk extension was integrated into the official 4.1.0 release of GNU AWK.
QSEAWK is an embedded AWK interpreter implementation included in the QSE library, which provides an embedding API for C and C++.
libfawk is a very small, function-only, reentrant, embeddable interpreter written in C.
CLAWK by Michael Parker provides an AWK implementation in Common Lisp, based on the regular expression library of the same author.
BusyBox includes an AWK implementation written by Dmitry Zakharov. It is a very small implementation suitable for embedded systems.

How to define an AWK script?

To define an awk script, use the awk command followed by the program in curly braces {}, enclosed in single quotation marks '', as follows:
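
The original command is not shown in the text, so the following is only a minimal sketch of such a script. The message text is a hypothetical choice, and two lines of input are piped in so that the example runs without a keyboard.

```shell
# The action in braces runs once for every line of input.
# Two sample lines are piped in here; typed at a terminal without the
# pipe, awk would read from the keyboard until CTRL+D is pressed.
printf 'first line\nsecond line\n' | awk '{ print "Welcome to awk" }'
# Welcome to awk
# Welcome to awk
```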

The above command runs its action once for every line of input typed at the terminal. Press CTRL+D to end the input and terminate the program.

AWK Command Examples

To better understand the Awk command, have a look at the below example:

Let's create some data on which to apply the various awk operations. Consider student data from different streams.

To create data, execute the cat command as follows:

cat > student.txt
Sam CS
Daniel IT
John IT
Arya IT
Mike ECE
Helena ECE

Press CTRL+D to end the input and save the file.

The student data has been created, and we will run the awk command on this data.

Example1: List students with the specified pattern.

Consider the below command:
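
The command itself is missing from the text. A minimal sketch, assuming the pattern of interest is the stream name IT, could look like this:

```shell
# Recreate the student data so the example is self-contained.
printf 'Sam CS\nDaniel IT\nJohn IT\nArya IT\nMike ECE\nHelena ECE\n' > student.txt

# Print every line that matches the pattern /IT/.
awk '/IT/ { print }' student.txt
# Daniel IT
# John IT
# Arya IT
```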


Example2: Default behaviour of awk command.

If we do not specify the pattern, it will show all of the content of the file.

Consider the below command:
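
The command is not shown in the text. One way to write it, as a sketch, is an action with no pattern in front of it:

```shell
# Recreate the student data so the example is self-contained.
printf 'Sam CS\nDaniel IT\nJohn IT\nArya IT\nMike ECE\nHelena ECE\n' > student.txt

# No pattern is given, so the print action runs for every line.
awk '{ print }' student.txt
# Sam CS
# Daniel IT
# John IT
# Arya IT
# Mike ECE
# Helena ECE
```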

We have not specified any pattern in the above command, so it will display all lines of the file.


Example3: Print the specified column.

If we specify a column number in this command, it will print only that column of each line. Consider the below command:
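
The command is missing from the text. A sketch that selects columns 1 and 5, matching the description that follows, might be:

```shell
# Recreate the student data so the example is self-contained.
printf 'Sam CS\nDaniel IT\nJohn IT\nArya IT\nMike ECE\nHelena ECE\n' > student.txt

# $1 is the first field and $5 the fifth; the comma inserts the
# output field separator (a space by default) between them.
# This file has only two columns, so $5 is empty and each output
# line holds the name followed by a single space.
awk '{ print $1, $5 }' student.txt
```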

The above command will print columns 1 and 5. If column 5 does not exist in the file, it will print only column 1.


Consider the below command:
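
The command is not shown in the text. A sketch that lists columns 1 and 2, as the following description states, could be:

```shell
# Recreate the student data so the example is self-contained.
printf 'Sam CS\nDaniel IT\nJohn IT\nArya IT\nMike ECE\nHelena ECE\n' > student.txt

# Print the first and second fields of every line.
awk '{ print $1, $2 }' student.txt
# Sam CS
# Daniel IT
# John IT
# Arya IT
# Mike ECE
# Helena ECE
```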

The above command will list columns 1 and 2.

Built-in variables in AWK command

The awk command supports many built-in variables. The field variables $1, $2, and so on hold the individual fields of the current line, and $0 holds the whole line.

NR: It holds the number of input lines read so far. The awk command performs the action once for each line. These lines are called records.

NF: It holds the number of fields in the current record.

FS: It holds the input field separator, the character used to divide each input line into fields.

OFS: It is used to store the output field separator. It separates the output fields.

ORS: It is used to store the output record separator. It separates the output records. Awk prints the content of ORS automatically at the end of every print statement.
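
To see several of these variables working together, here is a minimal sketch. The file marks.csv and the " | " separator are hypothetical choices for the illustration.

```shell
# Hypothetical comma-separated sample data.
printf 'Sam,75,100\nDaniel,80,100\n' > marks.csv

# The BEGIN block runs before any input is read: FS tells awk to split
# input on commas, and OFS is placed between the printed values.
# NR is the current record number and NF the number of fields in it.
awk 'BEGIN { FS = ","; OFS = " | " } { print NR, $1, NF }' marks.csv
# 1 | Sam | 3
# 2 | Daniel | 3
```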

Example4: Print the output and display the line number.

To display the line number in output, use the NR variable with the Awk command as follows:
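
The command is missing from the text. A sketch that prefixes each line with its number might look like this:

```shell
# Recreate the student data so the example is self-contained.
printf 'Sam CS\nDaniel IT\nJohn IT\nArya IT\nMike ECE\nHelena ECE\n' > student.txt

# NR is the number of the current line; $0 is the whole line.
awk '{ print NR, $0 }' student.txt
# 1 Sam CS
# 2 Daniel IT
# 3 John IT
# 4 Arya IT
# 5 Mike ECE
# 6 Helena ECE
```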


Example5: Print the last field of each line.

To display the last field of each line, use the NF variable with the Awk command as follows:
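
The command is not shown in the text. Because NF holds the number of fields, $NF refers to the last field, so a sketch could be:

```shell
# Recreate the student data so the example is self-contained.
printf 'Sam CS\nDaniel IT\nJohn IT\nArya IT\nMike ECE\nHelena ECE\n' > student.txt

# NF is 2 for every line of this file, so $NF is the same as $2.
awk '{ print $NF }' student.txt
# CS
# IT
# IT
# IT
# ECE
# ECE
```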


Example6: Separate the output in the specified format.

To separate the output records by a character such as a hyphen (-) or a colon (:), set the ORS variable as follows:

The above command will separate the output records by the chosen character instead of a newline.
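
The command is missing from the text. A minimal sketch, assuming a hyphen is the chosen separator, might be:

```shell
# Recreate the student data so the example is self-contained.
printf 'Sam CS\nDaniel IT\nJohn IT\nArya IT\nMike ECE\nHelena ECE\n' > student.txt

# ORS replaces the newline that print normally writes after each
# record, so all records end up on one line, each followed by "-".
awk 'BEGIN { ORS = "-" } { print $0 }' student.txt
# Sam CS-Daniel IT-John IT-Arya IT-Mike ECE-Helena ECE-
```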

Example7: Print the square of the numbers from 1 to 8.

To print the squares of the numbers from 1 to 8, execute the below command:

The above command will print the squares of the numbers from 1 to 8. Consider the below output:
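
The command is not shown in the text. One sketch that yields lines of the form "square of 1 is 1" uses a for loop inside a BEGIN block, which runs without reading any input file:

```shell
# The loop counts i from 1 to 8; the commas in print insert the
# default output field separator (a space) between the values.
awk 'BEGIN { for (i = 1; i <= 8; i++) print "square of", i, "is", i * i }'
```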

square of 1 is 1
square of 2 is 4
square of 3 is 9
square of 4 is 16
square of 5 is 25
square of 6 is 36
square of 7 is 49
square of 8 is 64

Example8: Calculate the sum of a particular column.

Let's create some data on which to apply the sum operation on a column. To create the student marks data, execute the cat command as follows:

cat > StudentMarks
Name, Marks, Max marks
Sam,75,100
Daniel,80,100
John,74,100
Arya,85,100
Mike,70,100
Helena,74,100

Press CTRL+D to save the file. We have successfully created the StudentMarks data. We can check it by executing the cat command on the file (cat StudentMarks).

To calculate the sum of the third column of the created data, execute the below command:
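
The command is missing from the text. A sketch that sums the third column is shown below; the variable name total and the NR > 1 condition that skips the header line are choices made for this illustration.

```shell
# Recreate the marks data so the example is self-contained.
printf 'Name, Marks, Max marks\nSam,75,100\nDaniel,80,100\nJohn,74,100\nArya,85,100\nMike,70,100\nHelena,74,100\n' > StudentMarks

# -F ',' splits each line on commas. For every line after the header,
# the third field is added to total; the END block prints the sum
# once all input has been read.
awk -F ',' 'NR > 1 { total += $3 } END { print total }' StudentMarks
# 600
```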
