1. What is Spring Batch?
Spring Batch is a lightweight, comprehensive framework for batch processing that's part of the Spring ecosystem. It's designed to build robust and scalable batch applications for both enterprise and small-scale use. Batch applications are programs that process large volumes of data without human interaction, often running on a schedule. Think of tasks like processing daily transaction reports, generating monthly statements, or migrating data.
2. How Spring Batch Works
Spring Batch provides reusable functions that are essential for processing large datasets, including logging, transaction management, restartability, and skip functionality. The framework follows a common pattern called ETL (Extract, Transform, Load).
- Extract: An `ItemReader` reads data from a source, like a database or a file.
- Transform: An optional `ItemProcessor` modifies or filters the data.
- Load: An `ItemWriter` writes the processed data to a destination.
The core of a Spring Batch application is a `Job`. A `Job` is made up of one or more `Step`s, and each `Step` is a self-contained sequence of the read-process-write operation.
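To make that structure concrete, here is a minimal sketch of a single-step job, assuming the reader, processor, and writer are already defined as beans elsewhere (builder-factory style, matching the full example later in this article; the names `copyStep` and `copyJob` are illustrative):

```java
@Bean
public Step copyStep(StepBuilderFactory steps,
                     ItemReader<String> reader,
                     ItemProcessor<String, String> processor,
                     ItemWriter<String> writer) {
    // A chunk-oriented step: read and process items one at a time,
    // then write them in chunks of 100
    return steps.get("copyStep")
            .<String, String>chunk(100)
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}

@Bean
public Job copyJob(JobBuilderFactory jobs, Step copyStep) {
    // A Job is an ordered sequence of Steps
    return jobs.get("copyJob")
            .start(copyStep)
            .build();
}
```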
3. When to Use It and When Not to Use It
Use Spring Batch when:
- You need to process large volumes of data periodically.
- You need to perform scheduled data migrations.
- You require restartability and skip-on-error functionality for your batch jobs.
- You need a centralized way to manage and monitor batch jobs.
Don't use Spring Batch when:
- You're building real-time, user-facing applications that require immediate responses.
- Your application is a typical web application serving synchronous requests.
- The data volume is very small and a simple script or a direct database operation would suffice.
4. Key Features
- Restartability: Jobs can be restarted from where they left off if they fail.
- Chunk-based processing: Processes data in chunks, which is highly efficient and minimizes memory usage.
- Declarative I/O: Provides pre-built `ItemReader` and `ItemWriter` classes for common data sources (e.g., files, databases).
- Scalability: Supports various scaling models, from multi-threaded steps to parallel processing.
- Transaction Management: Manages transactions to ensure data integrity during processing.
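Restartability, for instance, hinges on job identity: a `JobInstance` is the combination of a job name and its identifying `JobParameters`, and restarting a failed execution with the same parameters resumes that instance. A minimal launch sketch, assuming an injected `JobLauncher` and the `importEmployeeJob` defined later in this article (the parameter name is illustrative):

```java
JobExecution launch(JobLauncher jobLauncher, Job importEmployeeJob) throws Exception {
    JobParameters params = new JobParametersBuilder()
            .addString("inputFile", "employee.csv") // illustrative identifying parameter
            .toJobParameters();
    // Re-running with the same identifying parameters after a failure
    // restarts the same JobInstance instead of creating a new one
    return jobLauncher.run(importEmployeeJob, params);
}
```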
5. What are Job, Step, ItemReader, ItemProcessor, ItemWriter
These are the core components of a Spring Batch application.
- `Job`: The highest-level abstraction, encapsulating the entire batch process. It's composed of one or more `Step`s.
- `Step`: A self-contained, independent phase of a `Job`. Most `Step`s follow the read-process-write chunk-oriented model.
- `ItemReader`: Reads data from a source one item at a time. Examples: reading a line from a file or a row from a database.
- `ItemProcessor`: (Optional) Processes or transforms an item read by the `ItemReader`. This is where business logic is applied.
- `ItemWriter`: Writes a chunk of processed items to a destination.
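For reference, these three contracts are small interfaces. They are shown here slightly simplified (Spring Batch 4.x signatures, where the writer receives the whole chunk as a `List`):

```java
public interface ItemReader<T> {
    // Returns the next item, or null once the input is exhausted
    T read() throws Exception;
}

public interface ItemProcessor<I, O> {
    // Transforms an item; returning null filters it out of the chunk
    O process(I item) throws Exception;
}

public interface ItemWriter<T> {
    // Receives an entire chunk at once, enabling batched output
    void write(List<? extends T> items) throws Exception;
}
```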
6. What are FlatFileItemReader, LineTokenizer, LineMapper
These components are specifically used for reading data from flat files, like CSVs or fixed-width files.
- `FlatFileItemReader`: A specific implementation of `ItemReader` for reading from flat files. It reads one line at a time.
- `LineTokenizer`: A helper interface used by the `FlatFileItemReader`. It takes a single line of text and tokenizes it, splitting it into a `FieldSet` of named tokens. `DelimitedLineTokenizer` is a common implementation for comma-separated files.
- `LineMapper`: An interface that maps a raw line to a domain object. The common `DefaultLineMapper` implementation delegates to a `LineTokenizer` to split the line, then to a `FieldSetMapper` to populate a Java object from the resulting fields.
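A quick sketch of the tokenizing step in isolation (the column names are chosen for illustration):

```java
DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();
tokenizer.setNames("id", "firstName", "lastName");

// One raw CSV line becomes a FieldSet with named, typed access
FieldSet fields = tokenizer.tokenize("1,John,Doe");
long id = fields.readLong("id");                    // 1
String firstName = fields.readString("firstName");  // "John"
```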
7. FlatFileItemWriter, JdbcCursorItemWriter
- `FlatFileItemWriter`: An implementation of `ItemWriter` for writing data to a flat file. It takes a chunk of objects and writes each one as a line to the file.
- `JdbcCursorItemWriter`: There is no `JdbcCursorItemWriter`. The component for writing to a database is typically a `JdbcBatchItemWriter`, which uses a `PreparedStatement` to perform batch inserts or updates and is highly performant. A `JdbcCursorItemReader` is used for reading from a database using a cursor. Both are sketched below.
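A sketch of both components named above, assuming the `Employee` POJO from the example in the next section and a `dataSource` in scope:

```java
// Writing Employees out as comma-separated lines
FlatFileItemWriter<Employee> fileWriter = new FlatFileItemWriter<>();
fileWriter.setResource(new FileSystemResource("output/employees.csv"));
BeanWrapperFieldExtractor<Employee> extractor = new BeanWrapperFieldExtractor<>();
extractor.setNames(new String[] {"id", "firstName", "lastName"});
DelimitedLineAggregator<Employee> aggregator = new DelimitedLineAggregator<>();
aggregator.setFieldExtractor(extractor);
fileWriter.setLineAggregator(aggregator);

// Reading Employees from a database with a cursor
JdbcCursorItemReader<Employee> dbReader = new JdbcCursorItemReader<>();
dbReader.setDataSource(dataSource);
dbReader.setSql("SELECT id, first_name, last_name FROM employee");
dbReader.setRowMapper(new BeanPropertyRowMapper<>(Employee.class));
```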
8. Example of Reading Employee Data from CSV and Writing to a Database
Prerequisites
- Dependencies: You'll need `spring-boot-starter-batch` and `spring-boot-starter-data-jpa` (or `spring-boot-starter-jdbc`), along with a database driver (e.g., `h2`, `mysql`, `postgresql`).
Step 1: Create the Employee Model and Entity
Create a simple Java POJO that represents the employee data. Use `@Entity` for JPA mapping.
```java
import javax.persistence.Entity;
import javax.persistence.Id;
import java.util.Date;
@Entity
public class Employee {
@Id
private Long id;
private String firstName;
private String lastName;
private Date hireDate;
// Getters and Setters
}
```
Step 2: Create the Batch Configuration Class
Use `@EnableBatchProcessing` to enable Spring Batch's infrastructure.
```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.BeanPropertyItemSqlParameterSourceProvider;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.propertyeditors.CustomDateEditor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import javax.sql.DataSource;
import java.text.SimpleDateFormat;
import java.util.Collections;
import java.util.Date;
@Configuration
@EnableBatchProcessing
public class BatchConfig {
@Autowired
public JobBuilderFactory jobBuilderFactory;
@Autowired
public StepBuilderFactory stepBuilderFactory;
@Autowired
public DataSource dataSource;
// 1. ItemReader
@Bean
public FlatFileItemReader<Employee> reader() {
FlatFileItemReader<Employee> reader = new FlatFileItemReader<>();
reader.setResource(new ClassPathResource("employee.csv"));
reader.setLineMapper(lineMapper());
return reader;
}
@Bean
public LineMapper<Employee> lineMapper() {
DefaultLineMapper<Employee> lineMapper = new DefaultLineMapper<>();
DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
lineTokenizer.setDelimiter(",");
lineTokenizer.setNames("id", "firstName", "lastName", "hireDate");
BeanWrapperFieldSetMapper<Employee> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
fieldSetMapper.setTargetType(Employee.class);
// Register a date editor: BeanWrapperFieldSetMapper cannot convert
// a string like "2023-01-15" to java.util.Date on its own
fieldSetMapper.setCustomEditors(Collections.singletonMap(Date.class,
        new CustomDateEditor(new SimpleDateFormat("yyyy-MM-dd"), false)));
lineMapper.setLineTokenizer(lineTokenizer);
lineMapper.setFieldSetMapper(fieldSetMapper);
return lineMapper;
}
// 2. ItemProcessor (Optional)
@Bean
public ItemProcessor<Employee, Employee> processor() {
return new EmployeeProcessor();
}
// 3. ItemWriter
@Bean
public JdbcBatchItemWriter<Employee> writer() {
JdbcBatchItemWriter<Employee> writer = new JdbcBatchItemWriter<>();
writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>());
writer.setSql("INSERT INTO employee (id, first_name, last_name, hire_date) VALUES (:id, :firstName, :lastName, :hireDate)");
writer.setDataSource(dataSource);
return writer;
}
// Job and Step
@Bean
// JobCompletionNotificationListener is a custom JobExecutionListener bean (not shown here)
public Job importEmployeeJob(JobCompletionNotificationListener listener, Step step1) {
return jobBuilderFactory.get("importEmployeeJob")
.listener(listener)
.flow(step1)
.end()
.build();
}
@Bean
public Step step1(ItemReader<Employee> reader, ItemWriter<Employee> writer, ItemProcessor<Employee, Employee> processor) {
return stepBuilderFactory.get("step1")
.<Employee, Employee>chunk(10) // Process 10 items at a time
.reader(reader)
.processor(processor)
.writer(writer)
.build();
}
}
```
Step 3: Create the Processor Class
The processor can be used to add business logic, like capitalizing names. Returning `null` from `process` filters the item out, so it never reaches the writer.
```java
import org.springframework.batch.item.ItemProcessor;
public class EmployeeProcessor implements ItemProcessor<Employee, Employee> {
@Override
public Employee process(final Employee employee) {
final String firstName = employee.getFirstName().toUpperCase();
final String lastName = employee.getLastName().toUpperCase();
final Employee transformedEmployee = new Employee();
transformedEmployee.setId(employee.getId());
transformedEmployee.setFirstName(firstName);
transformedEmployee.setLastName(lastName);
transformedEmployee.setHireDate(employee.getHireDate());
return transformedEmployee;
}
}
```
Step 4: Create the employee.csv File
Create a `src/main/resources/employee.csv` file with your data.
```
1,John,Doe,2023-01-15
2,Jane,Smith,2022-05-20
3,Peter,Jones,2024-03-10
```
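One prerequisite the example glosses over: the `employee` table must exist before the `JdbcBatchItemWriter` runs. With the JPA starter and an embedded database, Hibernate can generate it from the `@Entity`; otherwise a `src/main/resources/schema.sql` along these lines (the column types here are assumptions) lets Spring Boot create it at startup:

```sql
CREATE TABLE employee (
    id         BIGINT PRIMARY KEY,
    first_name VARCHAR(100),
    last_name  VARCHAR(100),
    hire_date  DATE
);
```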
9. How the Read-Process-Write Loop Works
The core of Spring Batch's efficiency lies in its chunk-oriented processing, where `ItemReader`, `ItemProcessor`, and `ItemWriter` interact in a specific, performance-optimized way.
- Read an Item: The `ItemReader` reads a single item from the data source (e.g., a single row from a database, one line from a CSV file).
- Process the Item: The item is immediately passed to the `ItemProcessor` (if one is configured), where any business logic, transformation, or filtering is applied.
- Accumulate Items: Instead of writing the item right away, the framework holds the processed item in memory.
- Repeat until Chunk Size is Reached: Steps 1-3 are repeated. The `ItemReader` continues reading, and the `ItemProcessor` continues processing, accumulating items in an internal list.
- Write the Chunk: Once the number of processed items in memory reaches the configured chunk size, the entire list of items (the "chunk") is passed to the `ItemWriter`, which then writes all items in that chunk to the destination in a single, batched operation.
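Conceptually, each chunk runs a loop like the following simplified sketch (fault tolerance and transaction plumbing omitted):

```java
<I, O> void processOneChunk(ItemReader<I> reader, ItemProcessor<I, O> processor,
                            ItemWriter<O> writer, int chunkSize) throws Exception {
    List<O> items = new ArrayList<>();
    for (int i = 0; i < chunkSize; i++) {
        I item = reader.read();
        if (item == null) break;              // input exhausted
        O processed = processor.process(item);
        if (processed != null) {              // null means "filter this item out"
            items.add(processed);
        }
    }
    writer.write(items);                      // one batched write per chunk
}
```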
This behavior is highly efficient because it minimizes the number of I/O operations. For example, instead of executing 100 separate `INSERT` statements for 100 items, the `JdbcBatchItemWriter` can perform a single, highly performant batch insert of all 100 items at once. This reduces the overhead associated with establishing connections and managing transactions for each individual item.
The transaction boundary in a chunk-oriented step is around the entire chunk. The transaction is committed only after the `ItemReader` has finished reading, the `ItemProcessor` has finished processing, and the `ItemWriter` has successfully written all the items in the chunk. If any part of this process fails, the entire chunk's work is rolled back, providing restartability and data integrity.
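The skip functionality mentioned earlier plugs into this same chunk machinery. A sketch of a fault-tolerant step that tolerates up to five malformed CSV lines (the exception type and limit are chosen for illustration):

```java
@Bean
public Step faultTolerantStep(ItemReader<Employee> reader, ItemWriter<Employee> writer) {
    return stepBuilderFactory.get("faultTolerantStep")
            .<Employee, Employee>chunk(10)
            .reader(reader)
            .writer(writer)
            .faultTolerant()
            .skip(FlatFileParseException.class) // skip lines that fail to parse
            .skipLimit(5)                       // fail the step after 5 skips
            .build();
}
```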
Note: A `Job` consists of multiple `Step`s, and each `Step` consists of a Reader, an optional Processor, and a Writer, or alternatively a single `Tasklet`.