strtok() function in C

The C string manipulation function "strtok()" is used to split a string into tokens or smaller strings based on a designated delimiter. The "string.h" header file contains the function's declaration.

Syntax:

The syntax for the strtok() function is as follows:

char *strtok(char *str, const char *delim);

The "str" parameter points to the string to be tokenized. The "delim" parameter points to a string containing the delimiter characters used to split str into smaller tokens. The function returns a pointer to the next token, or NULL when no tokens remain.

Example:

An example code snippet:

#include <stdio.h>
#include <string.h>
int main() {
   char str[] = "Delhi,Hyderabad,Noida";
   char *token;
   /* get the first token */
   token = strtok(str, ",");
   /* loop through the string to extract all other tokens */
   while(token != NULL) {
      printf("%s\n", token);
      token = strtok(NULL, ",");
   }

   return 0;
}

Output:

Delhi
Hyderabad
Noida

Explanation:

In the above example, the comma (,) is used as a delimiter to separate the str string into smaller tokens. The strtok() function is first invoked with str as its first argument. The function returns a pointer to the first token, which is kept in the token variable. The loop then keeps calling strtok() with NULL as the first argument and the delimiter as the second argument until it returns NULL, which means all tokens have been extracted. The NULL first argument tells strtok() to pick up where the previous call left off.
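
To make concrete what the returned pointers refer to, here is a minimal sketch. The helper name split_in_place and its parameters are hypothetical, introduced only for illustration. It stores each token pointer in an array so that the caller can check that the tokens point into the original buffer rather than into copies.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits buf in place and stores up to max token
 * pointers in out. Returns the number of tokens stored. */
size_t split_in_place(char *buf, const char *delims, char *out[], size_t max) {
    size_t n = 0;
    char *tok = strtok(buf, delims);   /* first call: pass the buffer */
    while (tok != NULL && n < max) {
        out[n++] = tok;                /* tok points inside buf, no copy is made */
        tok = strtok(NULL, delims);    /* later calls: pass NULL to continue */
    }
    return n;
}
```

If buf holds "Delhi,Hyderabad,Noida" and the delimiter is ",", the first stored pointer is buf itself, and buf[5], which held the first comma, is '\0' afterwards.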

Some other points to note about the strtok() function:

The function alters the input string: strtok() overwrites the delimiter character that ends each token with a null character ('\0'). For this reason, the input must be a modifiable array and not a string literal.
Only the first call to strtok() takes the string itself: to continue tokenizing the same string, subsequent calls should pass a NULL pointer as their first argument. This works because strtok() keeps an internal pointer to the position where the previous call stopped.
When there are no more tokens left, the function returns NULL: strtok() returns a NULL pointer after all the tokens have been extracted from the input string. This can be used as the signal to stop processing tokens.
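The strtok() function is not thread-safe because it keeps its position in internal state shared by all callers. It should not be used in multi-threaded programs without proper synchronization, and for the same reason one tokenization cannot be nested inside another. Where it is available, a reentrant variant such as the POSIX strtok_r() function avoids this problem.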
Multiple delimiters can be handled by passing a string containing every delimiter character as the second argument to strtok(). Each character in that string is treated as a separate delimiter.
The function can also be used to determine how many tokens are present: pass the string on the first call and NULL on every later call, and increment a counter for each non-NULL token returned.
Empty token handling: strtok() treats a run of consecutive delimiter characters as a single delimiter and never returns an empty token, so empty fields (as in "a,,b") are skipped.
When using strtok() with dynamically allocated strings, save a pointer to the start of the original allocation so you can free it later. The tokens returned by strtok() point into that same buffer, so they must not be passed to free() individually or used after the buffer has been freed.
For complex tokenization tasks, strtok() might not be the best option: you might need a more capable parsing method, such as regular expressions, a parser generator, or a hand-written parser.
Use caution with non-ASCII characters: strtok() compares individual bytes, so if the input string is encoded in a multibyte character set and a delimiter is itself a multibyte character, strtok() may split the string in the middle of a character. You might need to employ a different tokenization method in this situation.
Substrings can be extracted using strtok(): in addition to tokenizing a whole string, strtok() can be used to pull a single substring out of a larger string. Pass the character or characters that bound the substring as the delimiter string and use the returned pointer. Note that the delimiter argument is a set of single characters, not a multi-character separator. If none of those characters appears in the input, strtok() returns the whole string as one token.
Tokenizing streams is possible with strtok(): although strtok() operates on strings, input from a stream (like a file or a network connection) can be tokenized by first reading it into a buffer and then passing that buffer to strtok().
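
The notes on multiple delimiters, token counting, empty tokens and dynamically allocated strings can be combined in one short sketch. The helper name count_tokens is hypothetical. It tokenizes a heap copy of the input so that the caller's string is left untouched, and it frees the pointer returned by malloc(), never a token.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens in input without modifying it.
 * strtok() writes '\0' into the buffer it scans, so a heap copy is used. */
size_t count_tokens(const char *input, const char *delims) {
    size_t len = strlen(input) + 1;
    char *copy = malloc(len);          /* keep this pointer so it can be freed */
    if (copy == NULL) {
        return 0;                      /* allocation failed: report no tokens */
    }
    memcpy(copy, input, len);

    size_t count = 0;
    /* The first call passes the buffer, later calls pass NULL. */
    for (char *tok = strtok(copy, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        count++;                       /* tok points inside copy */
    }

    free(copy);                        /* free the allocation itself, not a token */
    return count;
}
```

For example, count_tokens("a,,b;; c", ",; ") returns 3, because each of the three delimiter characters splits the string and runs of delimiters produce no empty tokens.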
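
The thread-safety note can be illustrated with strtok_r(), which is specified by POSIX rather than by ISO C, so this sketch assumes a platform that provides it. The helper name count_fields and the "key=value;key=value" input format are hypothetical. Each scan keeps its position in a caller-supplied pointer, which is what allows one tokenization to run inside another.

```c
#define _POSIX_C_SOURCE 200809L  /* asks POSIX systems to declare strtok_r */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the fields in records such as "a=1;b=2".
 * Records are separated by ';' and fields inside a record by '='. */
size_t count_fields(char *buf) {
    size_t fields = 0;
    char *outer_state;   /* saved position of the outer (record) scan */
    char *inner_state;   /* saved position of the inner (field) scan */

    for (char *rec = strtok_r(buf, ";", &outer_state); rec != NULL;
         rec = strtok_r(NULL, ";", &outer_state)) {
        for (char *fld = strtok_r(rec, "=", &inner_state); fld != NULL;
             fld = strtok_r(NULL, "=", &inner_state)) {
            fields++;
        }
    }
    return fields;
}
```

With plain strtok(), the inner loop would overwrite the single saved position and the outer loop would stop after the first record.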
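
The note on streams can be sketched as follows, using fgets() to fill a buffer one line at a time. The helper name count_words and the 256-byte buffer size are assumptions made for illustration. A line longer than the buffer would be split across two reads, so a word sitting on that boundary would be counted twice.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts whitespace-separated words read from a stream.
 * Assumes that no line is longer than the buffer. */
size_t count_words(FILE *in) {
    char line[256];
    size_t words = 0;

    /* Read one line into the buffer, then tokenize that buffer. */
    while (fgets(line, sizeof line, in) != NULL) {
        for (char *tok = strtok(line, " \t\r\n"); tok != NULL;
             tok = strtok(NULL, " \t\r\n")) {
            words++;
        }
    }
    return words;
}
```

Calling count_words(stdin) would count the words typed or piped into the program. Blank lines contribute nothing, because a line made up only of delimiters yields no tokens.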