Python - Combine all CSV Files in Folder
In this tutorial, we will demonstrate different Python-based methods for merging or combining numerous CSV data into a single file (this method also applies to text and other types of files). There will be a bonus lesson on how to quickly merge numerous CSV files for Linux. Finally, you can combine thousands of files with complete control over the imported data by converting the all-CSV files into the Pandas DataFrame and afterwards marking each row with the CSV file that it came from.
How to use Python to combine many identical CSV files Please take note that we presumptively presume that all files contain the same columns and information. Concatenating the CSV files in the Downloads folder using a short code sample.
Five dummy sales files that CSV format that I'll be using in the next tutorial step may be found in this folder. Each file has precisely the same number of columns and exactly 1 million rows. For instance, the sales_recors_n1.csv file has the following format when opened.
The listdir() method inside the os Python module can be used to list the files' contents after you have downloaded them into a certain directory on your laptop.
In order to use this method, you must provide the directory's exact path, as seen below:
Take note of the Python list that this method produces, which contains the all files inside the sales csv directory. This is advantageous since it enables iterative file reading with the object.
The first step's objective is to use Python to combine five CSV files into a single dataset with 5 million rows. I'll utilise the os or pandas packages to accomplish it. The complete Python script is as follows to accomplish that
Let me now walk you through the code's action step by step.
When we print file list's contents, we get the following:
['/Users/anbento/Documents/sales/sales_csv/sales_records_n5.csv', '/Users/anbento/Documents/sales/sales_csv/sales_records_n4.csv', '/Users/anbento/Documents/sales/sales_csv/sales_records_n1.csv', '/Users/anbento/Documents/sales/sales_csv/sales_records_n3.csv', '/Users/anbento/Documents/sales/sales_csv/sales_records_n2.csv']
All of the files are now parse-ready and have their paths associated.
Although the list is unordered, keep in mind that if the sequence in which the files were combined is crucial to then, we will need to utilise the sorted () function when iterating through to the list.
3. we now start by making an empty csv list. we then use pandas' read csv () method to continuously read any CSV file in the file list and add those datasets to the csv list.
It should be noted that when a CSV file is parsed using read csv (), the datasets are automatically transformed to pandas DFs, therefore csv_list now contains 5 distinct pandas DFs. However, we are also attaching the following command in addition to parsing the files.
This adds a new column with the title of the source CSV file to each DF so that, when files are combined, it is clear which file is which.
The files are eventually combined into a singular file pandas DF by applying concat() method into it. A new ordered index would be created for csv merged if ignore index is set to True.
The final step is to transform csv merged from a pandas DF situated in the identical directory. To accomplish this, use the to csv() command. When index=False is used, it indicates that the index-containing column must not be added.
['sales_records_full.csv', 'sales_records_n1.csv', 'sales_records_n2.csv', 'sales_records_n3.csv', 'sales_records_n4.csv', 'sales_records_n5.csv']
It just only 7 lines of code plus a few seconds of run-time, with the exception of the package imports, to get this outcome.