Prerequisites: basic knowledge of Java, Spring Batch, and Spring Data JPA.
You can find the code used in this blog in this GitHub repository.
Spring Batch is a framework in the Spring ecosystem that simplifies the development of batch processing applications. Batch processing means handling large volumes of data in bulk, often on a schedule.
Here are the main points in simpler terms:
- Making Things Easier: It helps developers deal with the complexity of batch processing by providing ready-to-use tools and components.
- Keeping Things in Order: Manages transactions, ensuring that the data remains consistent and reliable, even if something goes wrong during processing.
- Handling Big Tasks: Can handle large amounts of data efficiently by allowing tasks to be done in parallel, breaking them into smaller pieces, and processing them one chunk at a time.
- Organized Work: Batch processing is divided into jobs, which are made up of steps. A job is like the overall task, and steps are the smaller parts of that task.
- Read, Do Something, Write: Follows a simple approach of reading data from a source, doing some work on it, and then writing it to a destination. This is how it processes the data.
- Dealing with Problems: Provides ways to deal with issues like retries for failed operations and skipping records that can’t be processed correctly.
- Custom Actions: Allows developers to customize and add their own code at various points in the batch processing, like before or after a step or job.
In practical terms, Spring Batch is commonly used when you have a lot of data to process regularly, such as when you’re moving data between systems, transforming it, or doing periodic tasks like generating reports. It makes the whole process more manageable and less error-prone.
The most challenging part of Spring Batch is writing unit and integration tests, essentially because of the configuration complexity.
Let’s take the example of a batch that will do the following:
- Watch a folder for newly received files
- Read data from a flat file when it is added to the folder
- Process the data
- Save the data in a database
For this example we will of course be using Spring Batch 5, Spring Boot 3, Java 17, MapStruct for object mapping, and Maven as a build tool.
Here is what our pom.xml will look like:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>3.2.2</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>dev.sabri</groupId>
<artifactId>FolderMonitor</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>FolderMonitor</name>
<description>FolderMonitor</description>
<properties>
<java.version>17</java.version>
<map.struct.version>1.5.5.Final</map.struct.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.mapstruct</groupId>
<artifactId>mapstruct</artifactId>
<version>${map.struct.version}</version>
</dependency>
<dependency>
<groupId>org.mapstruct</groupId>
<artifactId>mapstruct-processor</artifactId>
<version>${map.struct.version}</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework.batch</groupId>
<artifactId>spring-batch-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<configuration>
<excludes>
<exclude>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
</exclude>
</excludes>
</configuration>
</plugin>
</plugins>
</build>
</project>
Now let’s create our entity object:
@Entity
@Data
@NoArgsConstructor
@AllArgsConstructor
@ToString
public class Visitors {
@Id
private Long id;
private String firstName;
private String lastName;
private String emailAddress;
private String phoneNumber;
private String address;
private String visitDate;
}
We will need a DTO object that will later be mapped to our entity. We will use a record:
@Builder
public record VisitorsDto(
Long id,
String firstName,
String lastName,
String emailAddress,
String phoneNumber,
String address,
String visitDate) {
}
The MapStruct mapper interface will look like this:
@Mapper(componentModel = "spring")
public interface VisitorsMapper {
Visitors toVisitors(VisitorsDto visitorsDto);
}
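At build time MapStruct generates an implementation of this interface and, thanks to componentModel = "spring", registers it as a Spring bean. Conceptually, the generated class looks something like this (a simplified sketch; the actual generated code differs slightly):

@Component
public class VisitorsMapperImpl implements VisitorsMapper {
    @Override
    public Visitors toVisitors(VisitorsDto visitorsDto) {
        if (visitorsDto == null) {
            return null;
        }
        // Field-by-field copy derived from the matching property names
        return new Visitors(
                visitorsDto.id(),
                visitorsDto.firstName(),
                visitorsDto.lastName(),
                visitorsDto.emailAddress(),
                visitorsDto.phoneNumber(),
                visitorsDto.address(),
                visitorsDto.visitDate());
    }
}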
The next step is to create a JpaRepository to handle database access; it’s a simple JpaRepository with no custom queries:
public interface VisitorsRepository extends JpaRepository<Visitors, Long> {
}
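Even though this batch needs no custom queries, Spring Data JPA would let us add derived queries later simply by declaring a method. A hypothetical example:

public interface VisitorsRepository extends JpaRepository<Visitors, Long> {
    // Hypothetical derived query: Spring Data generates the implementation
    // from the method name, no SQL required
    List<Visitors> findByLastName(String lastName);
}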
A standard Spring Batch job is a combination of one or more steps.
Each step contains either a tasklet, or an ItemReader, an ItemProcessor, and an ItemWriter.
Let’s start by defining our ItemReader; its main objective is to read the data from our flat file:
@Bean
@StepScope
public FlatFileItemReader<VisitorsDto> flatFileItemReader(@Value("#{jobParameters['inputFile']}") final String visitorsFile) throws IOException {
val flatFileItemReader = new FlatFileItemReader<VisitorsDto>();
flatFileItemReader.setName("VISITORS_READER");
flatFileItemReader.setLinesToSkip(1);
flatFileItemReader.setLineMapper(lineMapper());
flatFileItemReader.setStrict(false);
flatFileItemReader.setResource(new FileSystemResource(visitorsFile));
return flatFileItemReader;
}
An important part of the reading process is mapping the data fields from our file to our Java object.
In this case we will use a custom FieldSetMapper:
public class VisitorsFieldSetMapper implements FieldSetMapper<VisitorsDto> {
@Override
public VisitorsDto mapFieldSet(final FieldSet fieldSet) throws BindException {
return VisitorsDto.builder()
.id(fieldSet.readLong("id"))
.firstName(fieldSet.readString("firstName"))
.lastName(fieldSet.readString("lastName"))
.emailAddress(fieldSet.readString("emailAddress"))
.phoneNumber(fieldSet.readString("phoneNumber"))
.address(fieldSet.readString("address"))
.visitDate(fieldSet.readString("visitDate"))
.build();
}
}
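One nice property of a dedicated FieldSetMapper is that it can be unit tested in complete isolation, without any Spring context. A minimal sketch, building a FieldSet by hand with the same column names the tokenizer will use:

@Test
void mapFieldSet_buildsDtoFromTokens() throws Exception {
    // DefaultFieldSetFactory lets us create a FieldSet from raw tokens
    val fieldSet = new DefaultFieldSetFactory().create(
            new String[]{"1", "John", "Doe", "john.doe@example.com", "(555) 123-4567", "123 Main St", "2024-02-05"},
            new String[]{"id", "firstName", "lastName", "emailAddress", "phoneNumber", "address", "visitDate"});
    val dto = new VisitorsFieldSetMapper().mapFieldSet(fieldSet);
    assertThat(dto.id()).isEqualTo(1L);
    assertThat(dto.firstName()).isEqualTo("John");
}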
After we read the data from the file, it needs to be processed according to our needs. In this use case, the processor maps a VisitorsDto object to our Visitors entity:
@Component
public class VisitorsItemProcessor implements ItemProcessor<VisitorsDto, Visitors> {
private final VisitorsMapper visitorsMapper;
public VisitorsItemProcessor(VisitorsMapper visitorsMapper) {
this.visitorsMapper = visitorsMapper;
}
@Override
public Visitors process(final VisitorsDto visitorsDto) throws Exception {
return visitorsMapper.toVisitors(visitorsDto);
}
}
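A useful property of ItemProcessor worth knowing: returning null from process() filters the item out, so it never reaches the writer. A hypothetical variation of our processor that skips records without an email address:

@Override
public Visitors process(final VisitorsDto visitorsDto) {
    // Hypothetical filter: a null return tells Spring Batch to drop the item
    if (visitorsDto.emailAddress() == null || visitorsDto.emailAddress().isBlank()) {
        return null;
    }
    return visitorsMapper.toVisitors(visitorsDto);
}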
Once our object is mapped, it’s time to insert the data into our H2 database using the ItemWriter, as follows:
@Component
public class VisitorsItemWriter implements ItemWriter<Visitors> {
private final VisitorsRepository visitorsRepository;
public VisitorsItemWriter(VisitorsRepository visitorsRepository) {
this.visitorsRepository = visitorsRepository;
}
@Override
public void write(Chunk<? extends Visitors> chunk) throws Exception {
visitorsRepository.saveAll(chunk.getItems());
}
}
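Note that for this simple save-all case, Spring Batch also ships a ready-made RepositoryItemWriter that could replace our custom writer. A sketch of the equivalent bean:

@Bean
public RepositoryItemWriter<Visitors> visitorsRepositoryItemWriter(final VisitorsRepository visitorsRepository) {
    // Delegates each item of the chunk to visitorsRepository.save(...)
    return new RepositoryItemWriterBuilder<Visitors>()
            .repository(visitorsRepository)
            .methodName("save")
            .build();
}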
Here is what our full job configuration will look like:
@Configuration
public class VisitorsBatchConfig {
@Bean
public Job importVistorsJob(final JobRepository jobRepository, final PlatformTransactionManager transactionManager, final VisitorsRepository visitorsRepository, final VisitorsMapper visitorsMapper) throws IOException {
return new JobBuilder("importVisitorsJob", jobRepository)
.start(importVisitorsStep(jobRepository, transactionManager, visitorsRepository, visitorsMapper))
.build();
}
@Bean
public Step importVisitorsStep(final JobRepository jobRepository, final PlatformTransactionManager transactionManager, final VisitorsRepository visitorsRepository, final VisitorsMapper visitorsMapper) throws IOException {
return new StepBuilder("importVisitorsStep", jobRepository)
.<VisitorsDto, Visitors>chunk(100, transactionManager)
.reader(flatFileItemReader(null)) // null is a placeholder: the step-scoped proxy resolves the real 'inputFile' job parameter at runtime
.processor(itemProcessor(visitorsMapper))
.writer(visitorsItemWriter(visitorsRepository))
.build();
}
@Bean
public ItemProcessor<VisitorsDto, Visitors> itemProcessor(final VisitorsMapper visitorsMapper) {
return new VisitorsItemProcessor(visitorsMapper);
}
@Bean
public VisitorsItemWriter visitorsItemWriter(final VisitorsRepository visitorsRepository) {
return new VisitorsItemWriter(visitorsRepository);
}
@Bean
@StepScope
public FlatFileItemReader<VisitorsDto> flatFileItemReader(@Value("#{jobParameters['inputFile']}") final String visitorsFile) throws IOException {
val flatFileItemReader = new FlatFileItemReader<VisitorsDto>();
flatFileItemReader.setName("VISITORS_READER");
flatFileItemReader.setLinesToSkip(1);
flatFileItemReader.setLineMapper(lineMapper());
flatFileItemReader.setStrict(false);
flatFileItemReader.setResource(new FileSystemResource(visitorsFile));
return flatFileItemReader;
}
@Bean
public LineMapper<VisitorsDto> lineMapper() {
val defaultLineMapper = new DefaultLineMapper<VisitorsDto>();
val lineTokenizer = new DelimitedLineTokenizer();
lineTokenizer.setDelimiter(",");
lineTokenizer.setNames("id", "firstName", "lastName", "emailAddress", "phoneNumber", "address", "visitDate");
lineTokenizer.setStrict(false); // tolerate lines with fewer tokens than column names
defaultLineMapper.setLineTokenizer(lineTokenizer);
defaultLineMapper.setFieldSetMapper(new VisitorsFieldSetMapper());
return defaultLineMapper;
}
}
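The introduction mentioned retries and skipped records, and they map directly onto this configuration. If we wanted the step to tolerate a few malformed lines instead of failing the whole job, here is a sketch of the change (not part of the final code):

return new StepBuilder("importVisitorsStep", jobRepository)
        .<VisitorsDto, Visitors>chunk(100, transactionManager)
        .reader(flatFileItemReader(null))
        .processor(itemProcessor(visitorsMapper))
        .writer(visitorsItemWriter(visitorsRepository))
        .faultTolerant()
        // Skip up to 10 lines that the reader cannot parse
        .skip(FlatFileParseException.class)
        .skipLimit(10)
        .build();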
To wrap up our batch, we need to set up a watch service that looks for new files arriving in our folder.
This is configured in our main Spring Boot application class:
@SpringBootApplication
@EnableBatchProcessing
@EnableScheduling
@Slf4j
public class FolderMonitorApplication {
private final JobLauncher jobLauncher;
private final Job job;
public FolderMonitorApplication(JobLauncher jobLauncher, Job job) {
this.jobLauncher = jobLauncher;
this.job = job;
}
public static void main(String[] args) {
new SpringApplicationBuilder(FolderMonitorApplication.class)
.web(WebApplicationType.SERVLET)
.run(args)
.registerShutdownHook();
}
public void run(final String inputFile) throws Exception {
val jobParameters = new JobParametersBuilder()
.addDate("timestamp", Calendar.getInstance().getTime())
.addString("inputFile", inputFile)
.toJobParameters();
val jobExecution = jobLauncher.run(job, jobParameters);
while (jobExecution.isRunning()) {
log.info("..................");
}
}
@Scheduled(fixedRate = 2000)
public void runJob() {
// Note: watchService.take() blocks until an event arrives, so this scheduled
// method effectively starts once and then loops forever handling events
val path = Paths.get("/home/sabri/Work");
WatchKey key;
WatchService watchService = null;
try {
watchService = FileSystems.getDefault().newWatchService();
path.register(watchService, StandardWatchEventKinds.ENTRY_CREATE);
while ((key = watchService.take()) != null) {
for (WatchEvent<?> event : key.pollEvents()) {
log.info(
"Event kind:" + event.kind()
+ ". File affected: " + event.context() + ".");
if (event.kind().name().equals("ENTRY_CREATE")) {
run(path + "/" + event.context().toString());
}
}
key.reset();
}
} catch (Exception e) {
log.error(e.getMessage(), e);
}
}
}
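One detail worth noting: the default JobLauncher runs jobs synchronously, so jobLauncher.run(...) only returns once the job has finished and the isRunning() loop above never actually spins. If we wanted that polling to be meaningful, we could launch jobs in the background instead; a sketch using Spring Batch 5’s TaskExecutorJobLauncher:

@Bean
public JobLauncher asyncJobLauncher(final JobRepository jobRepository) throws Exception {
    val jobLauncher = new TaskExecutorJobLauncher();
    jobLauncher.setJobRepository(jobRepository);
    // Each job runs on its own thread, so run(...) returns immediately
    jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}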
Now that our batch is ready, let’s start it and drop a file into our folder
/home/sabri/Work. Here is what the logs look like:
2024-02-05T22:58:34.100+01:00 INFO 6931 --- [ scheduling-1] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=importVisitorsJob]] completed with the following parameters: [{'inputFile':'{value=/home/sabri/Work/visitors.csv, type=class java.lang.String, identifying=true}','timestamp':'{value=Mon Feb 05 22:58:34 CET 2024, type=class java.util.Date, identifying=true}'}] and the following status: [COMPLETED] in 75ms
2024-02-05T22:58:53.023+01:00 INFO 6931 --- [ scheduling-1] d.s.f.FolderMonitorApplication : Event kind:ENTRY_CREATE. File affected: .~lock.visitors.csv#.
2024-02-05T22:58:53.029+01:00 INFO 6931 --- [ scheduling-1] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=importVisitorsJob]] launched with the following parameters: [{'inputFile':'{value=/home/sabri/Work/.~lock.visitors.csv#, type=class java.lang.String, identifying=true}','timestamp':'{value=Mon Feb 05 22:58:53 CET 2024, type=class java.util.Date, identifying=true}'}]
2024-02-05T22:58:53.031+01:00 INFO 6931 --- [ scheduling-1] o.s.batch.core.job.SimpleStepHandler : Executing step: [importVisitorsStep]
2024-02-05T22:58:53.034+01:00 INFO 6931 --- [ scheduling-1] o.s.batch.core.step.AbstractStep : Step: [importVisitorsStep] executed in 2ms
2024-02-05T22:58:53.035+01:00 INFO 6931 --- [ scheduling-1] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=importVisitorsJob]] completed with the following parameters: [{'inputFile':'{value=/home/sabri/Work/.~lock.visitors.csv#, type=class java.lang.String, identifying=true}','timestamp':'{value=Mon Feb 05 22:58:53 CET 2024, type=class java.util.Date, identifying=true}'}] and the following status: [COMPLETED] in 5ms
2024-02-05T23:00:32.457+01:00 INFO 6931 --- [ scheduling-1] d.s.f.FolderMonitorApplication : Event kind:ENTRY_CREATE. File affected: Visitors2.csv.
2024-02-05T23:00:32.464+01:00 INFO 6931 --- [ scheduling-1] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=importVisitorsJob]] launched with the following parameters: [{'inputFile':'{value=/home/sabri/Work/Visitors2.csv, type=class java.lang.String, identifying=true}','timestamp':'{value=Mon Feb 05 23:00:32 CET 2024, type=class java.util.Date, identifying=true}'}]
2024-02-05T23:00:32.468+01:00 INFO 6931 --- [ scheduling-1] o.s.batch.core.job.SimpleStepHandler : Executing step: [importVisitorsStep]
VisitorsDto[id=1, firstName=John, lastName=Doe, emailAddress=john.doe@example.com, phoneNumber=(555) 123-4567, address=123 Main St, visitDate=2024-02-05]
VisitorsDto[id=2, firstName=Jane, lastName=Smith, emailAddress=jane.smith@example.com, phoneNumber=(555) 987-6543, address=456 Oak St, visitDate=2024-02-06]
VisitorsDto[id=3, firstName=Michael, lastName=Johnson, emailAddress=michael.j@example.com, phoneNumber=(555) 555-5555, address=789 Pine St, visitDate=2024-02-07]
VisitorsDto[id=4, firstName=Alice, lastName=Williams, emailAddress=alice.w@example.com, phoneNumber=(555) 456-7890, address=101 Elm St, visitDate=2024-02-08]
VisitorsDto[id=5, firstName=David, lastName=Miller, emailAddress=david.m@example.com, phoneNumber=(555) 111-2222, address=202 Birch St, visitDate=2024-02-09]
Now that our job is running fine, the first part is done. Let’s move on and write a test for this batch.
As I said before, the most complicated part is the configuration. This is what it will look like:
@Configuration
@EnableBatchProcessing
@EnableJpaRepositories(basePackages = {"dev.sabri.foldermonitor.repositories"})
@EntityScan(basePackages = {"dev.sabri.foldermonitor.domain"})
@ComponentScan(basePackages = {"dev.sabri.foldermonitor.mapper"})
public class VisitorsBatchTestConfig {
@Bean
@Primary
public DataSource dataSource() {
return new EmbeddedDatabaseBuilder()
.setType(EmbeddedDatabaseType.H2)
.addScript("/org/springframework/batch/core/schema-h2.sql")
.build();
}
@Bean
public PlatformTransactionManager transactionManager() {
return new DataSourceTransactionManager(dataSource());
}
@Bean("entityManagerFactory")
public LocalContainerEntityManagerFactoryBean entityManagerFactoryBean() {
val localContainerEntityManagerFactoryBean = new LocalContainerEntityManagerFactoryBean();
localContainerEntityManagerFactoryBean.setDataSource(dataSource());
localContainerEntityManagerFactoryBean.setPackagesToScan("dev.sabri.foldermonitor.domain");
localContainerEntityManagerFactoryBean.setPersistenceUnitName("visitors");
val hibernateJpaVendorAdapter = new HibernateJpaVendorAdapter();
localContainerEntityManagerFactoryBean.setJpaVendorAdapter(hibernateJpaVendorAdapter);
return localContainerEntityManagerFactoryBean;
}
@Bean
public JobRepository jobRepository() throws Exception {
val jobrepositoryFactoryBean = new JobRepositoryFactoryBean();
jobrepositoryFactoryBean.setDataSource(dataSource());
jobrepositoryFactoryBean.setTransactionManager(transactionManager());
jobrepositoryFactoryBean.afterPropertiesSet();
return jobrepositoryFactoryBean.getObject();
}
}
- @EnableJpaRepositories is used to tell my batch where to find all my repositories.
- @EntityScan is used to identify my entities.
- @ComponentScan is used mainly here for my mapper, so that the bean is detected at runtime by my test.
Now let’s write a simple test class:
@SpringBatchTest
@SpringJUnitConfig(classes = {VisitorsBatchConfig.class, VisitorsBatchTestConfig.class})
class VisitorsBatchIntegrationTests {
public static final String INPUT_FILE = "visitors.csv";
@Autowired
private JobLauncherTestUtils jobLauncherTestUtils;
@Autowired
private JobRepositoryTestUtils jobRepositoryTestUtils;
@AfterEach
public void cleanUp() {
jobRepositoryTestUtils.removeJobExecutions();
}
private JobParameters defaultJobParameters() {
val paramsBuilder = new JobParametersBuilder();
paramsBuilder.addString("inputFile", INPUT_FILE);
paramsBuilder.addDate("timestamp", Calendar.getInstance().getTime());
return paramsBuilder.toJobParameters();
}
@Test
void givenVisitorsFlatFile_whenJobExecuted_thenSuccess(@Autowired Job job) throws Exception {
// given
this.jobLauncherTestUtils.setJob(job);
// when
val jobExecution = jobLauncherTestUtils.launchJob(defaultJobParameters());
val actualJobInstance = jobExecution.getJobInstance();
val actualJobExitStatus = jobExecution.getExitStatus();
// then
assertThat(actualJobInstance.getJobName()).isEqualTo("importVisitorsJob");
assertThat(actualJobExitStatus.getExitCode()).isEqualTo(ExitStatus.COMPLETED.getExitCode());
}
}
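JobLauncherTestUtils can also exercise a single step in isolation, which becomes handy once a job grows to several steps; a minimal sketch reusing the same parameters:

@Test
void givenVisitorsFlatFile_whenStepExecuted_thenSuccess(@Autowired Job job) {
    this.jobLauncherTestUtils.setJob(job);
    // Runs only the named step, not the whole job
    val jobExecution = jobLauncherTestUtils.launchStep("importVisitorsStep", defaultJobParameters());
    assertThat(jobExecution.getExitStatus().getExitCode()).isEqualTo(ExitStatus.COMPLETED.getExitCode());
}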
As you can see in the test results below, the test runs fine.
Test Results
Here is a sample of the data used in this Spring Batch example:
visitor_id,first_name,last_name,email_address,phone_number,address,visit_date
1,John,Doe,john.doe@example.com,(555) 123-4567,123 Main St,2024-02-05
2,Jane,Smith,jane.smith@example.com,(555) 987-6543,456 Oak St,2024-02-06
3,Michael,Johnson,michael.j@example.com,(555) 555-5555,789 Pine St,2024-02-07
4,Alice,Williams,alice.w@example.com,(555) 456-7890,101 Elm St,2024-02-08
5,David,Miller,david.m@example.com,(555) 111-2222,202 Birch St,2024-02-09
6,Susan,Jones,susan.j@example.com,(555) 333-4444,303 Cedar St,2024-02-10
7,Robert,Smith,robert.s@example.com,(555) 555-1234,505 Maple St,2024-02-11
8,Emily,Davis,emily.d@example.com,(555) 876-5432,707 Oak St,2024-02-12
9,Christopher,Clark,chris.c@example.com,(555) 234-5678,909 Pine St,2024-02-13
10,Emma,Johnson,emma.j@example.com,(555) 432-1098,111 Walnut St,2024-02-14
11,William,Martin,william.m@example.com,(555) 567-8901,222 Birch St,2024-02-15
12,Olivia,Anderson,olivia.a@example.com,(555) 789-0123,333 Elm St,2024-02-16
13,Mason,White,mason.w@example.com,(555) 321-0987,444 Oak St,2024-02-17
14,Sophia,Taylor,sophia.t@example.com,(555) 654-3210,555 Maple St,2024-02-18
15,Aiden,Brown,aiden.b@example.com,(555) 876-5432,666 Pine St,2024-02-19
And finally, that’s a job done.
We have a fully operational Spring Batch application, with tests that are configured and running fine.
Take your time to go through the article. Your feedback and comments are valuable to me; they contribute to improving my work, and I truly appreciate each one of them.