Reading a CSV File

Introduction

In this lab, we will learn how to read CSV (Comma-Separated Values) files in Java. CSV is a common file format used to store tabular data such as spreadsheets or database exports. Each line in a CSV file represents a row of data, with columns separated by commas.

We will explore three different approaches to reading CSV files in Java:

  • Using the BufferedReader class from the java.io package
  • Using the Scanner class from the java.util package
  • Using the OpenCSV library, a popular third-party library for CSV processing

By the end of this lab, you will be able to choose the most appropriate method for reading CSV files in your Java applications based on your specific requirements.

Create a Sample CSV File and Project Structure

Before we start reading CSV files, let's ensure our project is properly set up. In this step, we will examine the structure of our CSV file and create our main Java class.

Understanding CSV Files

A CSV (Comma-Separated Values) file stores tabular data in plain text. Each line represents a row, and columns are separated by commas. CSV files are widely used for data exchange because of their simplicity and compatibility with many applications like Excel, Google Sheets, and database systems.

Examining Our Sample CSV File

Our lab environment already includes a sample CSV file at ~/project/sample.csv. Let's first take a look at its contents:

cat ~/project/sample.csv

You should see the following output:

name,age,city
John,25,New York
Alice,30,Los Angeles
Bob,28,Chicago
Eve,22,Boston

This CSV file contains a header row followed by four rows of data, each recording a person's name, age, and city.

Creating Our Java Class

Now, let's create a new Java class named CSVReaderDemo.java in the src directory that we'll use throughout this lab.

In VSCode, click on the Explorer icon in the sidebar, navigate to the ~/project/src directory, right-click on it, and select "New File". Name the file CSVReaderDemo.java.

Add the following basic structure to the file:

public class CSVReaderDemo {
    public static void main(String[] args) {
        System.out.println("CSV Reader Demo");

        // We will add CSV reading code here in the next steps
    }
}

Let's compile and run our Java class to verify everything is set up correctly:

cd ~/project
javac -d . src/CSVReaderDemo.java
java CSVReaderDemo

You should see the output:

CSV Reader Demo

Great! Now we have our project structure ready. In the next steps, we will implement different methods to read our CSV file.

Reading CSV Files Using BufferedReader

In this step, we will implement our first approach to reading CSV files using the BufferedReader class from the java.io package. This is a common and straightforward method for reading text files in Java.

Understanding BufferedReader

BufferedReader is a class that reads text from a character-input stream, buffering characters to provide efficient reading of characters, arrays, and lines. The buffer size can be specified, or the default size can be used.
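As a minimal sketch of the two constructor forms described above, the following reads lines from an in-memory string rather than a file. The class name BufferSizeDemo, the helper countLines, and the 16 KB buffer size are illustrative assumptions and not part of the lab project.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class BufferSizeDemo {
    // Counts lines using a BufferedReader with an explicit buffer size (in chars).
    static int countLines(Reader source, int bufferSize) {
        try (BufferedReader br = new BufferedReader(source, bufferSize)) {
            int count = 0;
            while (br.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "name,age,city\nJohn,25,New York\nAlice,30,Los Angeles\n";

        // Default buffer size: new BufferedReader(reader)
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            System.out.println("First line: " + br.readLine());
        }

        // Explicit buffer size: new BufferedReader(reader, size)
        System.out.println("Lines: " + countLines(new StringReader(text), 16 * 1024));
    }
}
```

For a small file such as sample.csv the default buffer is more than enough; an explicit size is mainly worth trying when profiling very large inputs.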

Implementing CSV Reading with BufferedReader

Let's update our CSVReaderDemo.java file to read the CSV file using BufferedReader. Replace the entire content of the file with the following code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CSVReaderDemo {
    public static void main(String[] args) {
        System.out.println("Reading CSV using BufferedReader");

        // Path to our CSV file
        String csvFile = "sample.csv";

        // Lists to store our data
        List<List<String>> data = new ArrayList<>();

        // Try-with-resources to ensure the reader gets closed automatically
        try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
            String line;

            // Read each line from the file
            while ((line = br.readLine()) != null) {
                // Split the line by comma and convert to a List
                String[] values = line.split(",");
                List<String> lineData = Arrays.asList(values);

                // Add the line data to our main list
                data.add(lineData);
            }

            // Print the data we read
            System.out.println("\nData read from CSV file:");
            for (int i = 0; i < data.size(); i++) {
                List<String> row = data.get(i);
                System.out.println("Row " + i + ": " + String.join(", ", row));
            }

        } catch (IOException e) {
            System.err.println("Error reading the CSV file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Let's compile and run our updated code:

cd ~/project
javac -d . src/CSVReaderDemo.java
java CSVReaderDemo

You should see output similar to this:

Reading CSV using BufferedReader

Data read from CSV file:
Row 0: name, age, city
Row 1: John, 25, New York
Row 2: Alice, 30, Los Angeles
Row 3: Bob, 28, Chicago
Row 4: Eve, 22, Boston

Code Explanation

  1. We import necessary Java classes for file I/O operations and data structures.
  2. We define the path to our CSV file (sample.csv).
  3. We create a List<List<String>> to store our CSV data as a two-dimensional list.
  4. We use a try-with-resources block to automatically close the BufferedReader after use.
  5. We read each line from the file with br.readLine().
  6. For each line, we split it by commas using line.split(",") and convert it to a List.
  7. We add each row to our main list of data.
  8. Finally, we print the data to verify we read it correctly.

The BufferedReader approach is simple and efficient for reading text files, including CSV files. However, because our code reads one line at a time and splits on every comma, it misparses more complex CSV formatting, such as quoted fields that contain commas or newlines.
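The limitation can be seen without touching any file. The sketch below applies the same line.split(",") call to two hand-written lines; the class name SplitLimitsDemo and both helper methods are hypothetical names used only for illustration. It also shows a related pitfall of String.split: trailing empty fields are dropped unless a negative limit is passed.

```java
import java.util.Arrays;

class SplitLimitsDemo {
    // Naive split: the same call used in the lab code.
    static String[] naiveSplit(String line) {
        return line.split(",");
    }

    // A negative limit tells split to keep trailing empty fields.
    static String[] splitKeepingEmpty(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        // The quoted description is cut in two, and the quotes stay in the data.
        String quoted = "\"Laptop\",\"High-performance laptop, with SSD\",999.99";
        System.out.println(Arrays.toString(naiveSplit(quoted)));
        System.out.println(naiveSplit(quoted).length);          // 4 instead of 3

        // Trailing empty fields are silently dropped by split(",").
        String trailing = "Eve,22,,";
        System.out.println(naiveSplit(trailing).length);        // 2
        System.out.println(splitKeepingEmpty(trailing).length); // 4
    }
}
```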

In the next step, we will explore another approach using the Scanner class.

Reading CSV Files Using Scanner

In this step, we will implement our second approach to reading CSV files using the Scanner class from the java.util package. The Scanner class provides a convenient way to read formatted input from various sources.

Understanding Scanner

The Scanner class breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types using the various next methods.
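To make the delimiter behavior concrete, here is a small sketch that tokenizes one row of our sample data twice: once with the default whitespace delimiter and once with a comma. The class ScannerTokensDemo and its tokens helper are assumed names for this illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerTokensDemo {
    // Collects every token in the text; a null delimiter keeps Scanner's default.
    static List<String> tokens(String text, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            if (delimiterRegex != null) {
                scanner.useDelimiter(delimiterRegex);
            }
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String row = "John,25,New York";

        // Default delimiter (whitespace) splits inside "New York" and ignores commas.
        System.out.println(tokens(row, null)); // [John,25,New, York]

        // A comma delimiter yields the three CSV fields.
        System.out.println(tokens(row, ","));  // [John, 25, New York]
    }
}
```

The lab code below takes a different route: it uses Scanner only to read whole lines and leaves the comma splitting to String.split.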

Implementing CSV Reading with Scanner

Let's update our CSVReaderDemo.java file to read the CSV file using Scanner. Replace the entire content of the file with the following code:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

public class CSVReaderDemo {
    public static void main(String[] args) {
        System.out.println("Reading CSV using Scanner");

        // Path to our CSV file
        String csvFile = "sample.csv";

        // Lists to store our data
        List<List<String>> data = new ArrayList<>();

        try (Scanner scanner = new Scanner(new File(csvFile))) {
            // Read each line from the file
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();

                // Split the line by comma and convert to a List
                String[] values = line.split(",");
                List<String> lineData = Arrays.asList(values);

                // Add the line data to our main list
                data.add(lineData);
            }

            // Print the data we read
            System.out.println("\nData read from CSV file:");
            for (int i = 0; i < data.size(); i++) {
                List<String> row = data.get(i);
                System.out.println("Row " + i + ": " + String.join(", ", row));
            }

        } catch (FileNotFoundException e) {
            System.err.println("CSV file not found: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Let's compile and run our updated code:

cd ~/project
javac -d . src/CSVReaderDemo.java
java CSVReaderDemo

You should see output similar to this:

Reading CSV using Scanner

Data read from CSV file:
Row 0: name, age, city
Row 1: John, 25, New York
Row 2: Alice, 30, Los Angeles
Row 3: Bob, 28, Chicago
Row 4: Eve, 22, Boston

Code Explanation

  1. We import necessary Java classes for file operations, Scanner, and data structures.
  2. We define the path to our CSV file (sample.csv).
  3. We create a List<List<String>> to store our CSV data as a two-dimensional list.
  4. We use a try-with-resources block to automatically close the Scanner after use.
  5. We read each line from the file with scanner.nextLine() as long as scanner.hasNextLine() returns true.
  6. For each line, we split it by commas using line.split(",") and convert it to a List.
  7. We add each row to our main list of data.
  8. Finally, we print the data to verify we read it correctly.

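The Scanner approach is similar to the BufferedReader approach but provides convenience methods such as nextInt() and nextDouble() for parsing different types of data. However, because our code still splits each line on commas, it has the same limitations as the BufferedReader version when dealing with quoted fields and other complex CSV formatting.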
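As a sketch of those type-aware methods, the following converts rows shaped like sample.csv into typed objects by calling nextInt() for the age column. The TypedRowsDemo and Person classes are hypothetical, and the code assumes every data row has exactly the name, age, city layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class TypedRowsDemo {
    // Hypothetical holder for one data row of sample.csv.
    static class Person {
        final String name;
        final int age;
        final String city;

        Person(String name, int age, String city) {
            this.name = name;
            this.age = age;
            this.city = city;
        }
    }

    // Parses CSV text with a header line followed by name,age,city rows.
    static List<Person> parse(String csvText) {
        List<Person> people = new ArrayList<>();
        try (Scanner lines = new Scanner(csvText)) {
            if (lines.hasNextLine()) {
                lines.nextLine(); // skip the header row
            }
            while (lines.hasNextLine()) {
                String line = lines.nextLine();
                if (line.trim().isEmpty()) {
                    continue; // ignore blank lines
                }
                // A second Scanner tokenizes a single row on commas.
                try (Scanner fields = new Scanner(line).useDelimiter(",")) {
                    people.add(new Person(fields.next(), fields.nextInt(), fields.next()));
                }
            }
        }
        return people;
    }

    public static void main(String[] args) {
        String csv = "name,age,city\nJohn,25,New York\nAlice,30,Los Angeles\n";
        for (Person p : parse(csv)) {
            System.out.println(p.name + " (" + p.age + ") - " + p.city);
        }
    }
}
```

A row with a missing field or a non-numeric age makes the Scanner throw a NoSuchElementException or InputMismatchException, so real input would need validation around these calls.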

In the next step, we will explore a more robust approach using the OpenCSV library, which handles complex CSV formatting more effectively.

Reading CSV Files Using OpenCSV Library

In this step, we will implement our third approach to reading CSV files using the OpenCSV library. OpenCSV is a third-party library that provides robust CSV parsing capabilities, handling complex scenarios like fields containing commas or newlines enclosed in quotes.

Understanding OpenCSV

OpenCSV is a CSV parser library for Java that supports all the basic CSV-format variations. Unlike the previous approaches, OpenCSV properly handles quoted fields containing commas, line breaks, and other special characters that would otherwise break simple splitting by commas.
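To illustrate what handling quoted fields involves, here is a deliberately small, standard-library-only sketch of a quote-aware field splitter. It is not OpenCSV's implementation and covers far fewer cases; the class QuotedFieldSketch and the method parseRecord are hypothetical names. It assumes double quotes as the quote character and a doubled quote ("") as the escape for a literal quote.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedFieldSketch {
    // Splits one CSV record into fields, honoring double-quoted fields.
    // Inside quotes, commas are literal and "" stands for one quote character.
    static List<String> parseRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < record.length() && record.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas and newlines are kept as data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);        // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String record = "\"Laptop\",\"High-performance laptop, with SSD\",999.99";
        for (String field : parseRecord(record)) {
            System.out.println(field);
        }
    }
}
```

Even this sketch only works when the whole record is passed in as one string. A reader that hands over one physical line at a time would still break on quoted newlines, which is one reason a dedicated library is preferable.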

Setting Up OpenCSV

First, let's download the OpenCSV library and its dependencies:

cd ~/project
mkdir -p lib
curl -L -o lib/opencsv-5.7.1.jar https://repo1.maven.org/maven2/com/opencsv/opencsv/5.7.1/opencsv-5.7.1.jar
curl -L -o lib/commons-lang3-3.12.0.jar https://repo1.maven.org/maven2/org/apache/commons/commons-lang3/3.12.0/commons-lang3-3.12.0.jar
curl -L -o lib/commons-text-1.10.0.jar https://repo1.maven.org/maven2/org/apache/commons/commons-text/1.10.0/commons-text-1.10.0.jar
curl -L -o lib/commons-beanutils-1.9.4.jar https://repo1.maven.org/maven2/commons-beanutils/commons-beanutils/1.9.4/commons-beanutils-1.9.4.jar
curl -L -o lib/commons-collections-3.2.2.jar https://repo1.maven.org/maven2/commons-collections/commons-collections/3.2.2/commons-collections-3.2.2.jar
curl -L -o lib/commons-logging-1.2.jar https://repo1.maven.org/maven2/commons-logging/commons-logging/1.2/commons-logging-1.2.jar

Creating a More Complex CSV File

Let's create a more complex CSV file that includes quoted fields with commas:

echo 'name,description,price
"Laptop","High-performance laptop, with SSD",999.99
"Smartphone","Latest model, with dual camera",499.99
"Headphones","Noise-canceling, wireless",149.99' > ~/project/products.csv

Implementing CSV Reading with OpenCSV

Now, let's update our CSVReaderDemo.java file to read the CSV file using OpenCSV. Replace the entire content of the file with the following code:

import com.opencsv.CSVReader;
import com.opencsv.exceptions.CsvValidationException;
import java.io.FileReader;
import java.io.IOException;

public class CSVReaderDemo {
    public static void main(String[] args) {
        System.out.println("Reading CSV using OpenCSV");

        // Path to our CSV file with complex data
        String csvFile = "products.csv";

        try (CSVReader reader = new CSVReader(new FileReader(csvFile))) {
            // Read and print the header
            String[] header = reader.readNext();
            if (header != null) {
                System.out.println("\nHeader: " + String.join(", ", header));
            }

            // Read and print each line
            String[] nextLine;
            int rowNumber = 1;

            System.out.println("\nData read from CSV file:");
            while ((nextLine = reader.readNext()) != null) {
                System.out.println("Row " + rowNumber + ":");
                for (int i = 0; i < nextLine.length; i++) {
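                    // Guard against rows that have more fields than the header
                    String label = i < header.length ? header[i] : "column " + (i + 1);
                    System.out.println("  " + label + ": " + nextLine[i]);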
                }
                rowNumber++;
                System.out.println();
            }

        } catch (IOException | CsvValidationException e) {
            System.err.println("Error reading the CSV file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

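Let's compile and run our updated code, adding the downloaded JAR files to the classpath. The : separator used below applies to Linux and macOS; on Windows, use ; instead: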

cd ~/project
javac -cp ".:lib/*" -d . src/CSVReaderDemo.java
java -cp ".:lib/*" CSVReaderDemo

You should see output similar to this:

Reading CSV using OpenCSV

Header: name, description, price

Data read from CSV file:
Row 1:
  name: Laptop
  description: High-performance laptop, with SSD
  price: 999.99

Row 2:
  name: Smartphone
  description: Latest model, with dual camera
  price: 499.99

Row 3:
  name: Headphones
  description: Noise-canceling, wireless
  price: 149.99

Code Explanation

  1. We import necessary classes from the OpenCSV library and Java I/O.
  2. We define the path to our CSV file (products.csv).
  3. We create a CSVReader object to read the CSV file.
  4. We read the header row with reader.readNext() and store it for later use.
  5. We then read each subsequent row with reader.readNext() in a loop until there are no more rows.
  6. For each row, we print each field along with its corresponding header.
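The pairing of fields with headers in step 6 can also be captured in a map, which is handy when later code wants to look up a value by column name. The sketch below uses only the standard library and plain String arrays like those returned by readNext(); the class HeaderMappingSketch, the method toMap, and the fallback key "columnN" are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HeaderMappingSketch {
    // Pairs each value with its column name, preserving column order.
    // Rows may be shorter or longer than the header.
    static Map<String, String> toMap(String[] header, String[] row) {
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < row.length; i++) {
            // Fall back to a positional name when the row has extra fields.
            String key = i < header.length ? header[i] : "column" + (i + 1);
            result.put(key, row[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] header = {"name", "description", "price"};
        String[] row = {"Laptop", "High-performance laptop, with SSD", "999.99"};
        toMap(header, row).forEach((k, v) -> System.out.println("  " + k + ": " + v));
    }
}
```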

The OpenCSV library handles the complex CSV formatting automatically, correctly parsing fields with commas enclosed in quotes. This makes it ideal for real-world CSV files that may contain complex data.

Advantages of OpenCSV

OpenCSV offers several advantages over the basic approaches:

  1. It correctly handles quoted fields containing commas, newlines, and other special characters.
  2. It provides built-in support for reading into beans (Java objects).
  3. It supports advanced features like custom separators, quote characters, and escape characters.
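  4. Its CSVReader reads one record at a time through readNext(), so large CSV files can be processed without loading the whole file into memory.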

For most real-world applications dealing with CSV files, using a dedicated library like OpenCSV is the recommended approach.

Summary

In this lab, we explored three different approaches to reading CSV files in Java:

  1. Using BufferedReader: A simple approach using the standard Java I/O library. It works well for basic CSV files but has limitations when dealing with complex CSV formatting.
  2. Using Scanner: Another approach using the standard Java utility library. Like BufferedReader, it is suitable for simple CSV files but lacks support for complex CSV formatting.
  3. Using OpenCSV: A robust approach using a third-party library specifically designed for CSV processing. It handles complex CSV formatting, including quoted fields containing commas, newlines, and other special characters.

Each approach has its strengths and use cases:

  • BufferedReader and Scanner are good choices for simple CSV files when you want to avoid external dependencies.
  • OpenCSV is the best choice for real-world applications dealing with potentially complex CSV files.

By understanding these different approaches, you can choose the most appropriate method based on your specific requirements and the complexity of your CSV data.

CSV files are widely used in data processing, data exchange, and data integration scenarios. The ability to read and process CSV files is a valuable skill for Java developers, particularly in data-focused applications and integrations with other systems.