CSV file management using C++

In this article, we will discuss CSV file management in C++ with its characteristics, uses, and several examples.

What is CSV?

Comma-Separated Values (CSV) is a simple file format for storing tabular data, such as data exported from databases and spreadsheets. A CSV file is plain text: each line represents one row of data, and the values within a row are separated by commas.

Some key characteristics of the CSV format include:

Plain text format: CSV files contain only text (commonly ASCII or UTF-8), so almost any program can read them.
Comma-delimited values: Commas (or other delimiters like tabs or pipes) separate each field or value per row.
The first row often contains header values: The first line usually contains column names to represent metadata.
Literal values need to be quoted: Any values containing commas, line breaks, or quotes must be wrapped in double quotes.
Interpreted data types: All data types have to be inferred since CSV has no data schema.
Cross-platform portability: CSVs can exchange tabular data easily between programs and platforms.
Compact file sizes: The absence of bulky syntax or tags results in smaller files compared to formats such as XML.
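
The quoting rule in the list above can be sketched in C++ as follows. This is a minimal illustration, not a complete CSV implementation: the helper names quoteField and splitLine are hypothetical, embedded double quotes are assumed to be escaped by doubling them, and a record is assumed to fit on a single line.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: wraps a field in double quotes when it contains a
// comma, a double quote, or a line break, doubling any embedded quotes.
std::string quoteField(const std::string& field) {
    if (field.find_first_of(",\"\r\n") == std::string::npos) {
        return field;  // nothing special, write the field as-is
    }
    std::string quoted = "\"";
    for (char c : field) {
        if (c == '"') quoted += '"';  // escape a quote by doubling it
        quoted += c;
    }
    quoted += '"';
    return quoted;
}

// Hypothetical helper: splits one CSV line into fields, honoring quotes.
// Assumes the record does not span multiple lines.
std::vector<std::string> splitLine(const std::string& line) {
    std::vector<std::string> fields;
    std::string current;
    bool inQuotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (inQuotes) {
            if (c == '"' && i + 1 < line.size() && line[i + 1] == '"') {
                current += '"';  // doubled quote inside a quoted field
                ++i;
            } else if (c == '"') {
                inQuotes = false;  // closing quote
            } else {
                current += c;
            }
        } else if (c == '"') {
            inQuotes = true;  // opening quote
        } else if (c == ',') {
            fields.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    fields.push_back(current);  // last field (may be empty)
    return fields;
}
```

With these helpers, a value such as New York, NY is written as a single quoted field and comes back as one field when the line is split again.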

Why is a CSV File used?

Here are some of the main reasons why CSV (Comma Separated Values) files are commonly used:

Simplicity: CSV is a very simple file format that is easy to understand and work with. It requires minimal formatting compared to other data files.
Portability: CSV files can be opened by almost any application. It is compatible with many databases, spreadsheets, programming languages, etc. It makes data exchange easy.
Editability: Basic text editors can easily view and edit CSV data manually. It is useful for managing small datasets.
Size: The structure of CSV files makes them lightweight and compact compared to other data formats. It is easy to transfer and store.
Volumes: CSV can effectively handle large datasets with millions of rows, where size would be prohibitive in programs like Excel.
Output: Many programs include built-in options to export tabular data into CSV format for interoperability.
Import: At the same time, CSV data can be easily imported into various analytical tools, spreadsheet programs and databases for analysis.
Brevity: CSV is focused on data and contains no metadata bloat, leading to space savings.

Managing Records in a CSV File with C++

CSV (comma-separated values) files are a popular format for storing and exchanging tabular data. While CSV files are simple, managing the records from a C++ program requires some care. Here, we look at functions to safely add, update and remove records from a CSV.

Opening the CSV

To read a CSV file, we first open it with C++'s ifstream and then read it line by line with std::getline(). Each line can then be split into its comma-delimited fields; a CSV parsing library like CSV.h can simplify this step.

A CSV file can be opened in three ways: for reading, for writing, or for both reading and writing.

The key points to remember:

Use ifstream to open the file for reading input.
Use ofstream to open files for writing output.
Use fstream to open in read/write mode.
Pass the filename in double quotes as a parameter.
For fstream, specify ios::in and ios::out access mode.
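
The three ways of opening a file listed above might look like the following sketch. The function names writeHeader, readFirstLine, and appendRow are hypothetical and exist only for illustration; the file path is supplied by the caller.

```cpp
#include <fstream>
#include <string>

// Writes a header line, creating the file or truncating an existing one.
bool writeHeader(const std::string& path) {
    std::ofstream out(path);  // ofstream: write-only, truncates by default
    if (!out.is_open()) return false;
    out << "Name,Age,City\n";
    return true;  // the destructor flushes and closes the file
}

// Reads the first line of the file (empty string if it cannot be opened).
std::string readFirstLine(const std::string& path) {
    std::ifstream in(path);  // ifstream: read-only
    std::string line;
    if (in.is_open()) std::getline(in, line);
    return line;
}

// Opens an existing file for both reading and writing and adds a row at the end.
bool appendRow(const std::string& path, const std::string& row) {
    // With ios::in | ios::out the open fails if the file does not exist.
    std::fstream file(path, std::ios::in | std::ios::out);
    if (!file.is_open()) return false;
    file.seekp(0, std::ios::end);  // move the write position to the end
    file << row << '\n';
    return true;
}
```

Checking is_open() after each open is a deliberate choice here, because a stream that failed to open silently ignores later reads and writes.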

Create Operation

Using the create operation, we can add a new record (row) to an existing CSV file.

For example, consider a CSV file 'data.csv' with the following contents:

Name,Age,City
John,30,New York
Jenny,25,India

If we want to add a new record, follow these steps:

Open the CSV file using an input file stream and parse it into rows.
Create a new row as a vector of strings.
Append this new row to the rows vector.
Finally, save the updated rows back to the CSV file.

Example:

The following program adds a new record to data.csv using these steps.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main() {

  ifstream infile("data.csv");
  if (!infile.is_open()) {
    cerr << "Could not open data.csv" << endl;
    return 1;
  }
  vector<vector<string>> rows;

  // Read CSV into rows
  string line;
  while (getline(infile, line)) {
    vector<string> row;

    string field;
    stringstream ss(line);

    // Split the line on commas (quoted fields are not handled)
    while (getline(ss, field, ',')) {
      row.push_back(field);
    }

    rows.push_back(row);
  }
  infile.close();  // release the file before reopening it for writing

  // Create a new record
  vector<string> newRecord {"Sam", "35", "Boston"};

  // Add new record
  rows.push_back(newRecord);

  // Write updated CSV (this truncates the existing file)
  ofstream outfile("data.csv");

  for (const auto& row : rows) {
    for (size_t i = 0; i < row.size(); ++i) {
      if (i > 0) outfile << ",";  // comma between fields, not after the last one
      outfile << row[i];
    }
    outfile << "\n";
  }

  outfile.close();

  return 0;
}

Output:

After the program runs, data.csv contains:

Name,Age,City
John,30,New York
Jenny,25,India
Sam,35,Boston
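
Rewriting the whole file is not the only way to add a row. As an alternative sketch, the file can be opened in append mode so that only the new record is written. The helper name appendRecord is hypothetical, and the sketch assumes that the file already ends with a newline and that no field needs quoting.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: appends one record to the end of a CSV file.
// Assumes the file already ends with a newline and no field needs quoting.
bool appendRecord(const std::string& path, const std::vector<std::string>& record) {
    std::ofstream out(path, std::ios::app);  // app: every write goes to the end
    if (!out.is_open()) return false;
    for (std::size_t i = 0; i < record.size(); ++i) {
        if (i > 0) out << ',';  // separator between fields only
        out << record[i];
    }
    out << '\n';
    out.flush();        // push the data to the file so errors surface here
    return out.good();  // false if any write failed
}
```

This avoids reading the existing rows at all, which matters more as the file grows.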

Read a particular record:

To read a particular record from a CSV file in C++, open the file with an ifstream, read it line by line, and use an istringstream to split each line into fields. Compare the relevant field with the value being searched for; when the record is found, process or print it. Close the file once the search is finished.

Example:

The following program searches example.csv for a record by name.

#include <fstream>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

void readRecord(const std::string& filename, const std::string& targetName) {
    try {
        std::ifstream csvfile(filename);

        if (!csvfile.is_open()) {
            throw std::runtime_error("Error: File not found.");
        }

        std::string line;
        bool recordFound = false;

        // Read existing data from the CSV file
        while (std::getline(csvfile, line)) {
            std::istringstream iss(line);
            std::string name, age, city;

            std::getline(iss, name, ',');
            std::getline(iss, age, ',');
            std::getline(iss, city, ',');

            if (name == targetName) {
                // Print the found record
                std::cout << "Record found: " << name << " " << age << " " << city << std::endl;
                recordFound = true;
                break;  // Stop searching after finding the record
            }
        }

        csvfile.close();

        if (!recordFound) {
            std::cout << "Record not found." << std::endl;
        }
    } catch (const std::exception& e) {
        std::cerr << e.what() << std::endl;
    }
}

int main() {
    // Example usage
    readRecord("example.csv", "Alice");

    return 0;
}

Output:

Record found: Alice 30 London

Write in CSV File:

To write a CSV file in C++, create an ofstream object, which opens the file in write mode. Use the << operator to write the data, placing a comma between values so that each one lands in its own column, and end each row with a newline. Finally, close the file with the close() method. The following short program shows how it works:

Example:

#include <iostream>
#include <fstream>

int main() {
    // Data to be written to the CSV file
    const char* data[][3] = {
        {"Name", "Age", "City"},
        {"John", "25", "New York"},
        {"Alice", "30", "London"},
        {"Bob", "22", "Paris"}
    };

    // Open a CSV file for writing
    std::ofstream csvFile("example.csv");

    // Write data to the CSV file
    for (const auto& row : data) {
        for (int i = 0; i < 3; ++i) {
            csvFile << row[i];
            if (i < 2) {
                csvFile << ","; // Add a comma except for the last field in a row
            }
        }
        csvFile << "\n";
    }

    // Close the CSV file
    csvFile.close();

    std::cout << "Data written to CSV file successfully." << std::endl;

    return 0;
}

Output:

Data written to CSV file successfully.

The file example.csv then contains:

Name,Age,City
John,25,New York
Alice,30,London
Bob,22,Paris

Update a Record:

Here is one way to update a record in a CSV file:

Open the CSV file.
Read the contents of the file line by line and store them in memory. This gives us a representation of the CSV data that we can manipulate.
Scan the loaded data to find the row we want to update, identifying it by a unique value such as an ID or name column.
Once the target row is found, update the desired column values, for example the phone number or email address field.
Overwrite the existing CSV file with the modified data from memory, so that the file matches the updated data.
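
The search-and-modify steps above can be sketched on rows that are already in memory. This is an illustrative helper only: the name updateRow and the Row alias are hypothetical, the columns are assumed to be Name, Age, and City in that order, and row 0 is assumed to be the header.

```cpp
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Hypothetical helper: finds the first row whose Name column (index 0) equals
// targetName and overwrites its Age and City columns (indexes 1 and 2).
// Row 0 is assumed to be the header and is skipped.
bool updateRow(std::vector<Row>& rows, const std::string& targetName,
               int newAge, const std::string& newCity) {
    for (std::size_t i = 1; i < rows.size(); ++i) {
        Row& row = rows[i];
        if (row.size() >= 3 && row[0] == targetName) {
            row[1] = std::to_string(newAge);
            row[2] = newCity;
            return true;
        }
    }
    return false;  // no matching record
}
```

After a successful call, the rows would be written back to the file to complete the update.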

Example:

The following program updates a record. Instead of holding the whole file in memory, it streams each row to a temporary file and then renames that file over the original.

#include <cstdio>
#include <fstream>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

void updateRecord(const std::string& filename, const std::string& targetName, int newAge, const std::string& newCity) {
    try {
        std::ifstream inFile(filename);
        std::ofstream outFile("temp.csv");  // Create a temporary file

        if (!inFile.is_open() || !outFile.is_open()) {
            throw std::runtime_error("Error opening files.");
        }

        std::string line;
        bool recordUpdated = false;

        // Read existing data from the CSV file
        while (std::getline(inFile, line)) {
            std::istringstream iss(line);
            std::string name, age, city;

            std::getline(iss, name, ',');
            std::getline(iss, age, ',');
            std::getline(iss, city, ',');

            if (name == targetName) {
                // Update the record
                outFile << name << "," << newAge << "," << newCity << "\n";
                recordUpdated = true;
            } else {
                outFile << line << "\n";  // Write non-matching records to the temporary file
            }
        }

        inFile.close();
        outFile.close();

        // Put the temporary file in place of the original one
        if (recordUpdated) {
            std::remove(filename.c_str());
            std::rename("temp.csv", filename.c_str());
            std::cout << "Record updated successfully." << std::endl;
        } else {
            std::cout << "Record not found." << std::endl;
            std::remove("temp.csv");  // Remove the temporary file if no record is updated
        }
    } catch (const std::exception& e) {
        std::cerr << e.what() << std::endl;
    }
}

int main() {
    // Example usage
    updateRecord("example.csv", "Alice", 32, "Manchester");

    return 0;
}

Output:

Record updated successfully.

After updating Alice's record to age 32 and city "Manchester", example.csv contains:

Name,Age,City
John,25,New York
Alice,32,Manchester
Bob,22,Paris

Delete a Record:

Here are the steps to delete a record from a CSV file in a simple way:

Open the CSV file and read the contents into a data structure (like a vector of vectors).
Search through the data to identify the record we want to delete.
Remove the record from the data structure.
After that, open the CSV file in write mode.
Write the updated data from the structure back to the CSV file.
Close the files.

The main points to understand:

Bring CSV files into memory for easy manipulation.
The updated memory structure gets written back to the CSV file.
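
The in-memory steps above can be sketched as follows. The names loadRows and deleteByName and the Row alias are hypothetical, the record is identified by its first field, and the comma split is naive, so quoted fields containing commas are not supported.

```cpp
#include <algorithm>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Reads every line of the file into a vector of rows (naive comma split).
std::vector<Row> loadRows(const std::string& path) {
    std::vector<Row> rows;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        Row row;
        std::istringstream ss(line);
        std::string field;
        while (std::getline(ss, field, ',')) row.push_back(field);
        rows.push_back(row);
    }
    return rows;
}

// Removes every row whose first field equals targetName, then rewrites the
// file. Returns the number of rows removed.
std::size_t deleteByName(const std::string& path, const std::string& targetName) {
    std::vector<Row> rows = loadRows(path);
    const std::size_t before = rows.size();

    // erase-remove idiom: shift the kept rows forward, then trim the tail
    rows.erase(std::remove_if(rows.begin(), rows.end(),
                              [&](const Row& r) {
                                  return !r.empty() && r[0] == targetName;
                              }),
               rows.end());

    if (rows.size() == before) return 0;  // nothing matched; leave the file untouched

    std::ofstream out(path);  // truncates the original file
    for (const Row& r : rows) {
        for (std::size_t i = 0; i < r.size(); ++i) {
            if (i > 0) out << ',';
            out << r[i];
        }
        out << '\n';
    }
    return before - rows.size();
}
```

Holding all rows in memory keeps the code short, but it is only practical when the file fits comfortably in RAM.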

Example:

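The following program deletes a record. Rather than loading the file into memory, it copies every non-matching row to a temporary file and then renames that file over the original.

#include <cstdio>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>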

void deleteRecord(const std::string& filename, const std::string& targetName) {
    std::ifstream inFile(filename);
    std::ofstream outFile("temp.csv");  // Create a temporary file

    if (!inFile.is_open() || !outFile.is_open()) {
        std::cerr << "Error opening files." << std::endl;
        return;
    }

    std::string line;
    bool recordFound = false;

    // Read existing data from the CSV file
    while (std::getline(inFile, line)) {
        std::istringstream iss(line);
        std::string name;
        std::getline(iss, name, ',');

        if (name != targetName) {
            outFile << line << "\n";  // Write non-matching records to the temporary file
        } else {
            recordFound = true;
        }
    }

    inFile.close();
    outFile.close();

    // Put the temporary file in place of the original one.
    if (recordFound) {
        std::remove(filename.c_str());
        std::rename("temp.csv", filename.c_str());
        std::cout << "Record deleted successfully." << std::endl;
    } else {
        std::cout << "Record not found." << std::endl;
        std::remove("temp.csv");  // Remove the temporary file if no record is deleted
    }
}

int main() {
    // Example usage
    deleteRecord("example.csv", "Alice");

    return 0;
}

In this code:

The CSV file is read line by line.
The first field of each line is compared with the target name.
Every non-matching line is copied to a temporary file, while the matching record is skipped.
The temporary file then replaces the original CSV file.

Output:

Before Deletion:

Name,Age,City
John,25,New York
Alice,30,London
Bob,22,Paris

After deleting Alice's record in the CSV file:

Name,Age,City
John,25,New York
Bob,22,Paris

Optimizing CSV File Processing for Large Datasets:

Here are some ways to optimize performance when working with large CSV files in C++:

1. Buffered I/O

Use buffered streams like 'fstream' instead of unbuffered input/output.
Buffering reduces the number of system calls and improves disk I/O efficiency.

2. Parallel Processing

Process CSV files across multiple threads using standard parallel algorithms.
Each thread handles a subset of rows independently.
Merge outputs from threads.
Parallelism utilizes multicore architecture.
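
As a rough sketch of the chunk-and-merge idea above, the rows can be divided among tasks started with std::async. The function names sumAges and parallelSumAges are hypothetical, the Age column is assumed to be the second field, and the lines are assumed to be loaded into memory already. On some toolchains, thread support must be enabled at link time.

```cpp
#include <algorithm>
#include <cstddef>
#include <exception>
#include <future>
#include <sstream>
#include <string>
#include <vector>

// Sums the Age column (index 1) for lines in [begin, end). Lines whose second
// field is not a number, such as the header, are skipped.
long sumAges(const std::vector<std::string>& lines, std::size_t begin, std::size_t end) {
    long total = 0;
    for (std::size_t i = begin; i < end; ++i) {
        std::istringstream ss(lines[i]);
        std::string name, age;
        std::getline(ss, name, ',');
        std::getline(ss, age, ',');
        try {
            total += std::stol(age);
        } catch (const std::exception&) {
            // not a number: ignore this line
        }
    }
    return total;
}

// Splits the rows into roughly equal chunks, runs one task per chunk, and
// merges the partial sums.
long parallelSumAges(const std::vector<std::string>& lines, std::size_t workers) {
    workers = std::max<std::size_t>(1, workers);
    const std::size_t chunk = (lines.size() + workers - 1) / workers;
    std::vector<std::future<long>> tasks;
    for (std::size_t begin = 0; begin < lines.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, lines.size());
        tasks.push_back(std::async(std::launch::async, [&lines, begin, end] {
            return sumAges(lines, begin, end);
        }));
    }
    long total = 0;
    for (auto& task : tasks) total += task.get();  // merge step
    return total;
}
```

Each task reads only its own range of the shared vector, so no locking is needed; the only synchronization is waiting for the futures.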

3. Compression

Use compression algorithms like gzip while writing CSV.
Compact size reduces I/O time.
Leverage multi-core hardware accelerated compression libraries.

4. Data Formatting

Pre-allocate vectors to size for parsed CSV data instead of dynamic growth.
Call reserve() to set the capacity in advance and minimize re-allocations.
Variable length data like strings add parsing overhead.
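
A small sketch of the pre-allocation idea above: count the delimiters first so each vector is sized once. The function names splitReserved and parseAll are hypothetical, and quoted fields are not handled.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Splits a line on commas, reserving room for all fields before parsing.
std::vector<std::string> splitReserved(const std::string& line) {
    std::vector<std::string> fields;
    // Number of fields = number of commas + 1, so one allocation suffices.
    const auto commas = std::count(line.begin(), line.end(), ',');
    fields.reserve(static_cast<std::size_t>(commas) + 1);

    std::size_t start = 0;
    while (true) {
        const std::size_t comma = line.find(',', start);
        if (comma == std::string::npos) {
            fields.push_back(line.substr(start));  // last field
            break;
        }
        fields.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return fields;
}

// Parses lines that are already in memory; the row count is known up front.
std::vector<std::vector<std::string>> parseAll(const std::vector<std::string>& lines) {
    std::vector<std::vector<std::string>> rows;
    rows.reserve(lines.size());  // avoid regrowth while pushing rows
    for (const std::string& line : lines) {
        rows.push_back(splitReserved(line));
    }
    return rows;
}
```

Whether the extra counting pass pays off depends on the data, so it is worth profiling before adopting it.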

5. Additional Points

Use memory-mapped files for random access without parsing.
Batch database inserts for multiple rows together.
Profile to identify bottlenecks - I/O, parsing, processing.

Combining buffering, compression, parallelism, and reduced allocations/copies can significantly speed up large CSV processing.

Handling Exceptions and Errors when Processing CSV Files

Here are key strategies for handling errors and exceptions when working with CSV files:

Common Errors and Exceptions:

File Not Found:

Check for file existence before opening.
Provide informative error messages if not found.

Invalid Format:

Use try...catch blocks to catch parsing errors.
Validate file structure and data types.
Consider using libraries that handle common format issues.

Data Errors:

Validate data types and ranges.
Handle missing or inconsistent values appropriately (e.g., fill with defaults, flag for review).
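
The structure and data checks above might be sketched as a per-row validator. The function name validateRow is hypothetical, rows are assumed to have the shape Name,Age,City, and the accepted age range of 0 to 150 is an arbitrary assumption for illustration.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical validator for rows shaped like Name,Age,City.
// Returns an empty string when the row is valid, else a description of the problem.
std::string validateRow(const std::vector<std::string>& row) {
    if (row.size() != 3) {
        return "expected 3 fields, got " + std::to_string(row.size());
    }
    if (row[0].empty()) {
        return "name is missing";
    }
    try {
        std::size_t used = 0;
        const int age = std::stoi(row[1], &used);  // throws on non-numeric input
        if (used != row[1].size()) {
            return "age has trailing characters";
        }
        if (age < 0 || age > 150) {  // range chosen only for this sketch
            return "age out of range";
        }
    } catch (const std::invalid_argument&) {
        return "age is not a number";
    } catch (const std::out_of_range&) {
        return "age is too large";
    }
    return "";
}
```

Returning a message instead of throwing lets the caller decide whether to skip the row, substitute a default, or flag it for review.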

Permissions Issues:

Ensure your program has the necessary read/write permissions.

Encoding Errors:

C++ file streams read raw bytes, so confirm the file's encoding (e.g., UTF-8) and account for a leading byte-order mark if one is present.

Best Practices:

Use try...catch Blocks:

Enclose file opening and operations within try...catch blocks to handle potential errors gracefully.

Provide Informative Error Messages:

Include clear error messages with helpful context for users or developers.

Validate Data:

Check for valid data types, ranges, and consistency.

Log Errors:

Record errors for debugging and monitoring purposes.

Consider Data Validation Libraries:

Use a dedicated CSV parsing library inside the program, or external tools such as pandas (a Python library) or csvlint, for advanced validation and error handling.

Test Thoroughly:

Test code with various input files, including those with potential errors, to ensure robustness.

Example:

#include <fstream>
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

int main() {
    try {
        std::ifstream csvfile("data.csv");

        if (!csvfile.is_open()) {
            throw std::runtime_error("Error: File not found.");
        }

        std::string line;
        while (std::getline(csvfile, line)) {
            std::istringstream iss(line);
            std::string field;

            while (std::getline(iss, field, ',')) {
                // Process each field of the current row
                std::cout << field << " ";
            }

            // Process row data
            std::cout << std::endl;
        }

        csvfile.close();
    } catch (const std::exception& e) {
        std::cerr << e.what() << std::endl;
    }

    return 0;
}

Output (assuming data.csv contains the same rows that were written to example.csv earlier):

Name Age City 
John 25 New York 
Alice 30 London 
Bob 22 Paris

Additional Tips:

Write Robust Code: Anticipate potential errors and design code to handle them gracefully.
Consider User Experience: Provide clear feedback and guide users through error resolution.
Use Appropriate Data Structures: Choose structures that align with CSV data for efficient processing and error handling.
