İbrahim Gündüz

Originally published at ibrahimgunduz34.Medium

Batch Processing with Spring Batch and Multiple Data Sources

Overview

Batch processing is a crucial component in most business domains and is widely adopted across the industry. In this context, Spring Batch provides a lightweight yet powerful solution for handling large datasets, regardless of the workflow's complexity. In this article, we'll explore Spring Batch through a practical example. While this is not an exhaustive course on the framework, I'm confident you'll discover several important insights that are often overlooked in typical tutorials.

Let’s get started!

Example Use Case

E-Commerce: Supplier Data Synchronization

In the e-commerce domain, companies may offer products from various suppliers, each of whom might provide data in different formats, such as XML or CSV. In this example, we’ll demonstrate a simple implementation using Spring Batch to illustrate this use case. However, keep in mind that Spring Batch offers much more than what we’ll cover here, so be sure to check out the official documentation for a deeper understanding.

Sample Data

Although the embedded GitHub Gist renders the content as a nicely formatted table, the sample data is simply a CSV file containing five products with their attributes.
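Purely for illustration, the header row and one sample record could look like this; the column names match the reader configuration we'll build later, while the values themselves are made up:

index,name,description,brand,category,price,currency,stock,ean,color,size,availability,internalId
1,Trail Runner X,Lightweight trail running shoe,Acme,Footwear,89.99,EUR,120,4006381333931,Blue,42,in_stock,ACM-0001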

Project Structure

Although we implement our batch processor using core Spring components, the same solution would be significantly easier to build with Spring Batch on Spring Boot. Spring Boot simplifies the setup by automatically configuring most of the required infrastructure — such as data sources, transaction managers, and core Spring Batch components. However, this article focuses on demonstrating the fundamental concepts using plain Spring, providing deeper insight into how things work under the hood.

Let’s begin by defining the project structure:

mockserver/
src/
└── org/
    └── example/
        └── productbatch/
            ├── config/
            ├── model/
            ├── processor/
            ├── reader/
            ├── runner/
            ├── tasklet/
            └── writer/

To keep the project organized, each package fulfills a distinct responsibility within the batch processing workflow. Let's walk through the structure and see what each part does.

  • mockserver: This is a simple mock server that serves a CSV file for testing purposes. It is configured to run on port 3000 and returns the CSV content when accessed via http://localhost:3000/catalogs/download.

  • config: This package holds the configuration classes that set up components used throughout the application, such as Hibernate and Spring Batch.

  • model: This package contains the data model classes used in the application. It includes AcmeProduct, which maps the structure of the CSV file, and Product, the domain model used for persistence and business logic.

  • processor: This package contains the ItemProcessor implementation responsible for processing the data read from the CSV file. It handles necessary transformations and validations before the data is passed to the writer.

  • reader: This package contains the ItemReader implementation responsible for reading CSV data and mapping it to the application’s internal data model, based on the provider’s structure.

  • runner: This package contains the runner classes responsible for executing the batch processes. These classes serve as entry points and are triggered by the Spring Scheduler.

  • tasklet: This package contains Tasklet implementations that handle single-step tasks such as downloading files, cleaning folders, and similar operations.

  • writer: This package contains the ItemWriter implementation responsible for persisting data that has been processed by the ItemProcessor.

Dependencies

Next, let’s define our project dependencies and required plugins by creating a pom.xml file as shown below.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>spring-batch</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.release>17</maven.compiler.release>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework</groupId>
                <artifactId>spring-framework-bom</artifactId>
                <version>6.2.7</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-context</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-jdbc</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-tx</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-core</artifactId>
            <version>5.2.2</version>
        </dependency>
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <version>42.7.6</version>
        </dependency>
        <dependency>
            <groupId>com.h2database</groupId>
            <artifactId>h2</artifactId>
            <version>2.3.232</version>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.5.18</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.14.0</version>
                <configuration>
                    <encoding>${project.build.sourceEncoding}</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

Configuration

Although we are not using the spring-boot-starter-batch variant of Spring Batch, the spring-batch-core module still provides a default configuration through the DefaultBatchConfiguration class. This helps manage the essential bean definitions for core components such as JobRepository and JobLauncher.

However, we still need to define beans like DataSource and TransactionManager so that Spring Batch can persist job and step metadata in a database.

Since our goal is to synchronize the downloaded CSV data into a PostgreSQL database, we need one set of DataSource and TransactionManager beans for persisting the business data, and another for Spring Batch to use internally. Fortunately, this is a well-supported use case: the DefaultBatchConfiguration class looks specifically for beans named dataSource and transactionManager in the application context.

Therefore, the beans intended for Spring Batch must be explicitly named as shown below.

package org.example.productbatch.config;

import org.springframework.batch.core.configuration.support.DefaultBatchConfiguration;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;
import org.springframework.transaction.PlatformTransactionManager;

import javax.sql.DataSource;

@Configuration
public class SpringBatchConfig extends DefaultBatchConfiguration {
    @Bean("dataSource")
    public DataSource batchDataSource() {
        return new EmbeddedDatabaseBuilder()
                .setType(EmbeddedDatabaseType.H2)
                .addScript("classpath:org/springframework/batch/core/schema-drop-h2.sql")
                .addScript("classpath:org/springframework/batch/core/schema-h2.sql")
                .build();
    }

    @Bean("transactionManager")
    public PlatformTransactionManager batchTransactionManager(@Qualifier("dataSource") DataSource dataSource) {
        return new DataSourceTransactionManager(dataSource);
    }
}

For better clarity, you can check how DefaultBatchConfiguration retrieves the PlatformTransactionManager and DataSource beans from the application context: it resolves them by name in its protected getDataSource() and getTransactionManager() methods, which you can find in the Spring Batch source code.
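Those protected methods also double as an extension point: instead of relying on the reserved bean names, a subclass can override them directly. Here is a minimal sketch of that approach using the same embedded H2 setup (this variant is not part of the demo project):

package org.example.productbatch.config;

import org.springframework.batch.core.configuration.support.DefaultBatchConfiguration;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseBuilder;
import org.springframework.jdbc.datasource.embedded.EmbeddedDatabaseType;
import org.springframework.transaction.PlatformTransactionManager;

import javax.sql.DataSource;

@Configuration
public class SpringBatchOverrideConfig extends DefaultBatchConfiguration {
    // Build the embedded metadata database once and reuse it from both accessors.
    private final DataSource batchDataSource = new EmbeddedDatabaseBuilder()
            .setType(EmbeddedDatabaseType.H2)
            .addScript("classpath:org/springframework/batch/core/schema-h2.sql")
            .build();

    @Override
    protected DataSource getDataSource() {
        return batchDataSource;
    }

    @Override
    protected PlatformTransactionManager getTransactionManager() {
        return new DataSourceTransactionManager(batchDataSource);
    }
}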

As the next step, we’ll create a configuration class to set up a PostgreSQL connection using JDBC, allowing Spring Batch to access it. As previously explained, we should avoid using the bean names transactionManager and dataSource for this configuration, since these names are reserved for the beans used internally by Spring Batch.

package org.example.productbatch.config;

import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.env.Environment;
import org.springframework.jdbc.datasource.DataSourceTransactionManager;
import org.springframework.jdbc.datasource.DriverManagerDataSource;
import org.springframework.transaction.PlatformTransactionManager;

import javax.sql.DataSource;

@Configuration
public class JdbcConfig {
    private final Environment environment;

    public JdbcConfig(Environment environment) {
        this.environment = environment;
    }

    @Bean
    public DataSource domainDataSource() {
        DriverManagerDataSource dataSource = new DriverManagerDataSource();
        dataSource.setDriverClassName(environment.getProperty("spring.datasource.driver-class-name"));
        dataSource.setUrl(environment.getProperty("spring.datasource.url"));
        dataSource.setUsername(environment.getProperty("spring.datasource.username"));
        dataSource.setPassword(environment.getProperty("spring.datasource.password"));
        return dataSource;
    }


    @Bean
    public PlatformTransactionManager domainTransactionManager(@Qualifier("domainDataSource") DataSource dataSource) {
        return new DataSourceTransactionManager(dataSource);
    }
}

Next, create a configuration file named application.properties under the project's resources folder (it must end up at the root of the classpath, since that is where we'll load it from) like the one below:

# PostgreSQL
spring.datasource.url=jdbc:postgresql://localhost:5432/catalog_db
spring.datasource.username=postgres
spring.datasource.password=postgres
spring.datasource.driver-class-name=org.postgresql.Driver

# Acme Provider Config
acme.catalog.url=http://localhost:3000/catalogs/download
acme.catalog.cron=*/30 * * * * *

project.download_path=downloads

Although we’ve already used some of these properties in the JdbcConfig class, we’ll explore where the remaining ones are used in the upcoming sections.

To serve as the entry point for our application, we’ll now define a main class, as shown below:

package org.example.productbatch;

import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.PropertySource;
import org.springframework.scheduling.annotation.EnableScheduling;

@ComponentScan("org.example.productbatch")
@PropertySource("classpath:application.properties")
@EnableScheduling
public class Application {
    public static void main(String[] args) {
        new AnnotationConfigApplicationContext(Application.class);
    }
}

To enable loading external configuration values such as database credentials or the URL of the product feed provider, we use the @PropertySource annotation to load the application.properties file into the application context.

We also add the @EnableScheduling annotation to keep the application process running and to enable support for scheduled tasks. This will allow us to trigger our batch processing jobs at predefined intervals.

Since this application is intended for demonstration purposes, we won’t define a custom task scheduler bean. As a result, Spring will fall back to its default implementation, which uses a single-threaded executor. This is sufficient for our needs in this context.
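If the scheduling needs ever grow beyond that, a pool-backed scheduler bean could be registered instead. Here is a minimal sketch (the pool size of 4 is an arbitrary illustrative choice):

@Bean
public TaskScheduler taskScheduler() {
    // Replaces the single-threaded default that @EnableScheduling falls back to.
    ThreadPoolTaskScheduler scheduler = new ThreadPoolTaskScheduler();
    scheduler.setPoolSize(4); // allow up to four scheduled tasks to run in parallel
    scheduler.setThreadNamePrefix("batch-scheduler-");
    return scheduler;
}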

Building A New Job

So far, we’ve built the foundational structure of the application and set up all necessary configurations. In this section, we’ll begin by creating the essential components required for our batch job.

Since we need to download and process a CSV file from a remote server — in our case, a mock server — the job we’re about to create will consist of two steps.

FileDownloaderTasklet Component

In Spring Batch terminology, a unit of work that performs a single task is called a Tasklet. To handle the file download from the remote server, we’ll create a Tasklet implementation named FileDownloaderTasklet.

package org.example.productbatch.tasklet;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

@Component
@StepScope
public class FileDownloaderTasklet implements Tasklet {
    private final String targetFilename;
    private final String downloadUrl;
    private final HttpClient httpClient = HttpClient.newHttpClient();
    private final String downloadPath;

    public FileDownloaderTasklet(@Value("${acme.catalog.url}") String downloadUrl,
                                 @Value("${project.download_path}") String downloadPath,
                                 @Value("#{jobParameters['TARGET_FILENAME']}") String targetFilename) {
        this.downloadUrl = downloadUrl;
        this.downloadPath = downloadPath;
        this.targetFilename = targetFilename;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
        Path path = Paths.get(downloadPath, targetFilename).toAbsolutePath();
        Files.createDirectories(path.getParent());

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(downloadUrl))
                .header("Accept", "text/csv")
                .build();

        HttpResponse<String> response = httpClient.send(request, HttpResponse.BodyHandlers.ofString());

        if (response.statusCode() != 200) {
            throw new IllegalStateException("Failed to download file");
        }
        Files.writeString(path, response.body());

        return RepeatStatus.FINISHED;

    }
}

The FileDownloaderTasklet class contains a few noteworthy aspects worth clarifying before moving on. As seen in the code above, it performs a simple operation — calling a remote endpoint and saving the response to a local file. Since this tasklet is not specific to any particular data provider, its behavior is fully parameterized: the download folder, the endpoint URL, and the target filename are all passed as parameters.

While values from application.properties (such as the download folder and endpoint URL) can be injected using the @Value("${...}") expression, the target filename is a job parameter that will be provided at runtime when the job is launched. To inject it using @Value("#{jobParameters['TARGET_FILENAME']}"), we need to defer the creation of this bean until the job is actually running. This is achieved by annotating the tasklet with @StepScope, which ensures the bean is initialized within the scope of the step execution, where job parameters are accessible.

Creating Models

During processing, Spring Batch allows mapping the parsed data to a desired target type. Since each product supplier may use a different data structure, we define a separate product class to represent each supplier’s format. In addition, we also maintain a domain-specific product model that aligns with our internal business requirements.

Supplier product model:

package org.example.productbatch.model;

import java.math.BigDecimal;

public class AcmeProduct {
    private String index;
    private String name;
    private String description;
    private String brand;
    private String category;
    private BigDecimal price;
    private String currency;
    private Integer stock;
    private String ean;
    private String color;
    private String size;
    private String availability;
    private String internalId;

    public AcmeProduct() {}

    public AcmeProduct(String index, String name, String description, String brand, String category, BigDecimal price, String currency, Integer stock, String ean, String color, String size, String availability, String internalId) {
        this.index = index;
        this.name = name;
        this.description = description;
        this.brand = brand;
        this.category = category;
        this.price = price;
        this.currency = currency;
        this.stock = stock;
        this.ean = ean;
        this.color = color;
        this.size = size;
        this.availability = availability;
        this.internalId = internalId;
    }

    public String getIndex() {
        return index;
    }

    public String getName() {
        return name;
    }

    public String getDescription() {
        return description;
    }

    public String getBrand() {
        return brand;
    }

    public String getCategory() {
        return category;
    }

    public BigDecimal getPrice() {
        return price;
    }

    public String getCurrency() {
        return currency;
    }

    public Integer getStock() {
        return stock;
    }

    public String getEan() {
        return ean;
    }

    public String getColor() {
        return color;
    }

    public String getSize() {
        return size;
    }

    public String getAvailability() {
        return availability;
    }

    public String getInternalId() {
        return internalId;
    }

    public void setIndex(String index) {
        this.index = index;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void setDescription(String description) {
        this.description = description;
    }

    public void setBrand(String brand) {
        this.brand = brand;
    }

    public void setCategory(String category) {
        this.category = category;
    }

    public void setPrice(BigDecimal price) {
        this.price = price;
    }

    public void setCurrency(String currency) {
        this.currency = currency;
    }

    public void setStock(Integer stock) {
        this.stock = stock;
    }

    public void setEan(String ean) {
        this.ean = ean;
    }

    public void setColor(String color) {
        this.color = color;
    }

    public void setSize(String size) {
        this.size = size;
    }

    public void setAvailability(String availability) {
        this.availability = availability;
    }

    public void setInternalId(String internalId) {
        this.internalId = internalId;
    }
}

Domain Product Model:

package org.example.productbatch.model;

import java.math.BigDecimal;

public class Product {
    private String name;

    private String description;

    private BigDecimal price;

    private String currency;

    public Product() {
    }

    public Product(String name, String description, BigDecimal price, String currency) {
        this.name = name;
        this.description = description;
        this.price = price;
        this.currency = currency;
    }

    public String getName() {
        return name;
    }

    public String getDescription() {
        return description;
    }

    public BigDecimal getPrice() {
        return price;
    }

    public String getCurrency() {
        return currency;
    }
}

Csv File Reader Component

Spring Batch provides a built-in component called FlatFileItemReader, which enables reading and parsing structured text files. It can be easily configured using the FlatFileItemReaderBuilder to support various delimiters and column mappings. Additionally, it allows mapping each row to a target POJO using different mapping strategies. In this example, we’ll use the BeanWrapperFieldSetMapper, which is automatically configured by the FlatFileItemReaderBuilder.

package org.example.productbatch.reader;

import org.example.productbatch.model.AcmeProduct;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemStream;
import org.springframework.batch.item.ItemStreamException;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.io.FileSystemResource;
import org.springframework.stereotype.Component;

import java.nio.file.Paths;

@Component
@StepScope
public class AcmeProductsCsvFileReader implements ItemReader<AcmeProduct>, ItemStream {
    private final FlatFileItemReader<AcmeProduct> delegate;

    public AcmeProductsCsvFileReader(@Value("#{jobParameters['TARGET_FILENAME']}") String targetFilename,
                                     @Value("${project.download_path}") String downloadPath) {
        String fileName = Paths.get(downloadPath, targetFilename).toAbsolutePath().toString();
        this.delegate = createDelegate(fileName);
    }

    @Override
    public void open(ExecutionContext executionContext) throws ItemStreamException {
        this.delegate.open(executionContext);
    }

    @Override
    public void update(ExecutionContext executionContext) throws ItemStreamException {
        // Persist the reader's position so a failed job can restart where it left off.
        this.delegate.update(executionContext);
    }

    @Override
    public void close() throws ItemStreamException {
        // Release the underlying file handle when the step finishes.
        this.delegate.close();
    }

    @Override
    public AcmeProduct read() throws Exception {
        return delegate.read();
    }

    private FlatFileItemReader<AcmeProduct> createDelegate(String fileName) {
        return new FlatFileItemReaderBuilder<AcmeProduct>()
                .name("acmeProductReader")
                .resource(new FileSystemResource(fileName))
                .encoding("UTF-8")
                .linesToSkip(1)
                .delimited()
                .delimiter(",")
                .quoteCharacter('"')
                .names("index", "name", "description", "brand", "category",
                        "price", "currency", "stock", "ean", "color", "size",
                        "availability", "internalId")
                .targetType(AcmeProduct.class)
                .strict(false)
                .build();
    }
}

Keep in mind that when using BeanWrapperFieldSetMapper as your mapping strategy, the target class must have public setter methods for all mapped fields. This is because the mapper uses Java reflection to call setters based on the column names in the file.

If these setters are missing, Spring Batch will fail to map the fields and throw a NotWritablePropertyException or a similar binding error during parsing.

If you prefer to use immutable objects without setters, you can create a custom implementation of FieldSetMapper that instantiates the target object using a constructor. However, this requires manually reading each field from the FieldSet and passing them to the constructor, which involves additional effort.
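For example, a constructor-based mapper for the AcmeProduct type could look like the sketch below (it is not part of the demo project); you would then replace .targetType(AcmeProduct.class) with .fieldSetMapper(new AcmeProductFieldSetMapper()) in the builder:

package org.example.productbatch.reader;

import org.example.productbatch.model.AcmeProduct;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;

public class AcmeProductFieldSetMapper implements FieldSetMapper<AcmeProduct> {
    @Override
    public AcmeProduct mapFieldSet(FieldSet fieldSet) {
        // Read each column by name and hand the values to the constructor,
        // so the target type no longer needs public setters.
        return new AcmeProduct(
                fieldSet.readString("index"),
                fieldSet.readString("name"),
                fieldSet.readString("description"),
                fieldSet.readString("brand"),
                fieldSet.readString("category"),
                fieldSet.readBigDecimal("price"),
                fieldSet.readString("currency"),
                fieldSet.readInt("stock"),
                fieldSet.readString("ean"),
                fieldSet.readString("color"),
                fieldSet.readString("size"),
                fieldSet.readString("availability"),
                fieldSet.readString("internalId"));
    }
}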

Processor Component

In this section, we'll implement an ItemProcessor that filters the supplier data, passing on only the products marked as "in_stock".

package org.example.productbatch.processor;

import org.example.productbatch.model.AcmeProduct;
import org.example.productbatch.model.Product;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;

@Component
public class AcmeProductProcessor implements ItemProcessor<AcmeProduct, Product> {
    @Override
    public Product process(AcmeProduct item) throws Exception {
        // Compare against the literal first so a missing availability value
        // cannot cause a NullPointerException.
        if (!"in_stock".equals(item.getAvailability())) {
            return null; // Skip products that are not in stock
        }

        return new Product(
                item.getName(),
                item.getDescription(),
                item.getPrice(),
                item.getCurrency()
        );
    }
}

As demonstrated above, returning null from the ItemProcessor tells Spring Batch to skip processing the current item, so it won’t be handed off to the ItemWriter.

Writer Component

To persist the processed data into the PostgreSQL database in batches, we use Spring Batch's built-in JdbcBatchItemWriter. In our custom ItemWriter implementation (named AcmeProductJpaWriter in the demo, although it writes through plain JDBC rather than JPA), we configure the writer using its builder and delegate the write operation to it. The writer uses a parameterized SQL query to insert data, allowing the chunked data produced by the AcmeProductProcessor to be written efficiently.

package org.example.productbatch.writer;

import org.example.productbatch.model.Product;
import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.database.JdbcBatchItemWriter;
import org.springframework.batch.item.database.builder.JdbcBatchItemWriterBuilder;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Component;

import javax.sql.DataSource;

@Component
public class AcmeProductJpaWriter implements ItemWriter<Product> {
    private final JdbcBatchItemWriter<Product> delegate;

    public AcmeProductJpaWriter(@Qualifier("domainDataSource") DataSource dataSource) {
        this.delegate = new JdbcBatchItemWriterBuilder<Product>()
                .dataSource(dataSource)
                .sql("""
                INSERT INTO catalog_products (name, description, price, currency)
                VALUES (:name, :description, :price, :currency)
                """)
                .beanMapped()
                .build();

        try {
            this.delegate.afterPropertiesSet();
        } catch (Exception e) {
            throw new RuntimeException("JdbcBatchItemWriter init failed", e);
        }
    }

    @Override
    public void write(Chunk<? extends Product> chunk) throws Exception {
        delegate.write(chunk);
    }
}

As shown in the code above, we use the @Qualifier annotation to explicitly indicate that the domain DataSource should be used, as there are multiple DataSource beans defined in the application context.
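One thing the writer takes for granted is that the catalog_products table already exists in PostgreSQL. See the demo repository's setup instructions for the actual schema; purely as an illustration, a minimal DDL could look like this (the column types are assumptions):

-- Illustrative DDL only; see the demo repository for the actual schema.
CREATE TABLE IF NOT EXISTS catalog_products (
    id          BIGSERIAL PRIMARY KEY,
    name        VARCHAR(255) NOT NULL,
    description TEXT,
    price       NUMERIC(12, 2),
    currency    VARCHAR(3)
);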

Job Configuration

The job configuration is where we define the overall flow to download and process the supplier data. We’ll create each component of the configuration step by step.

Since the first task is to download the file from the supplier, we start by defining a step that uses the FileDownloaderTasklet component. This tasklet retrieves the file from the specified location using the job parameters provided at runtime:

@Bean(name = "acmeProductsDownloadStep")
public Step acmeProductsStep(FileDownloaderTasklet fileDownloaderTasklet) {
    return new StepBuilder("acmeProductsDownloadStep", jobRepository)
            .tasklet(fileDownloaderTasklet, transactionManager)
            .build();
}

Next, we define the step responsible for reading, processing, and persisting the data. This step uses our custom ItemReader, ItemProcessor, and ItemWriter implementations to handle each phase of the batch workflow. The chunk size of 10 means items are read and processed one at a time and then written in groups of ten, with each chunk wrapped in a transaction. One subtlety worth noting: the transaction manager passed to chunk() below is the batch one, bound to the embedded H2 database, while the PostgreSQL writes go through the separate domain DataSource and commit on their own connection. A chunk rollback therefore won't undo the business writes; if you need them to participate in the chunk transaction, pass domainTransactionManager to the chunk step instead.

@Bean(name = "acmeProductsProcessStep")
public Step acmeProductsProcessStep(AcmeProductsCsvFileReader csvReader,
                                    AcmeProductProcessor processor,
                                    AcmeProductJpaWriter jpaWriter) {
    return new StepBuilder("acmeProductsProcessStep", jobRepository)
            .<AcmeProduct, Product>chunk(10, transactionManager)
            .reader(csvReader)
            .processor(processor)
            .writer(jpaWriter)
            .build();
}

Finally, we define the Job component, which is responsible for executing these steps in the specified order.

@Bean(name = "acmeProductsJob")
public Job acmeProductsJob(@Qualifier("acmeProductsDownloadStep") Step acmeProductsStep,
                           @Qualifier("acmeProductsProcessStep") Step acmeProductsProcessStep) {
    return new JobBuilder("acmeProducts", jobRepository)
            .incrementer(new RunIdIncrementer())
            .start(acmeProductsStep)
            .next(acmeProductsProcessStep)
            .build();
}

As a quick recap:

In Spring Batch, a Job represents the top-level container for a batch process. It encapsulates the complete batch workflow, which is composed of one or more sequential Step components. Each Step performs a distinct task, such as reading data, processing it, and writing the output.

In the configuration shown above, a job named "acmeProducts" is defined using a JobBuilder. It is composed of two steps:

  • acmeProductsDownloadStep: Downloads the file from the supplier.

  • acmeProductsProcessStep: Reads, processes, and persists the data into the database.

The job is configured with a RunIdIncrementer, which can contribute a unique run.id parameter on each run. This matters when rerunning the same job repeatedly, because Spring Batch requires a fresh set of identifying parameters to treat an execution as a new job instance. One subtlety: a plain JobLauncher.run() call does not apply the incrementer by itself; in this example, uniqueness actually comes from the timestamped TARGET_FILENAME parameter generated by the runner we'll see shortly.
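If you do want the incrementer itself to produce that unique parameter, the launch code has to ask for it explicitly. A sketch of what that could look like in the runner (assuming a JobExplorer is injected alongside the launcher, which DefaultBatchConfiguration exposes as a bean):

// Sketch: let the job's RunIdIncrementer contribute a unique run.id parameter.
JobParameters parameters = new JobParametersBuilder(jobExplorer)
        .addString("TARGET_FILENAME", targetFilename)
        .getNextJobParameters(acmeProductsJob) // consults the configured incrementer
        .toJobParameters();
jobLauncher.run(acmeProductsJob, parameters);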

Finally, the start() and next() methods are used to define the execution sequence of the steps, ensuring they run in the intended order.
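Beyond a linear start()/next() sequence, the same builders also support conditional transitions between steps. The sketch below (not used in this demo) simply makes the default failure behavior explicit to show the syntax:

// Sketch: end the job as failed if the download fails; otherwise continue processing.
@Bean
public Job acmeProductsFlowJob(@Qualifier("acmeProductsDownloadStep") Step downloadStep,
                               @Qualifier("acmeProductsProcessStep") Step processStep) {
    return new JobBuilder("acmeProductsFlow", jobRepository)
            .start(downloadStep)
            .on("FAILED").fail()
            .from(downloadStep).on("*").to(processStep)
            .end()
            .build();
}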

Now that we’ve explained each component of the configuration, here is the full class that ties everything together.

package org.example.productbatch.config;

import org.example.productbatch.model.AcmeProduct;
import org.example.productbatch.model.Product;
import org.example.productbatch.processor.AcmeProductProcessor;
import org.example.productbatch.reader.AcmeProductsCsvFileReader;
import org.example.productbatch.tasklet.FileDownloaderTasklet;
import org.example.productbatch.writer.AcmeProductJpaWriter;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class AcmeProductsJobConfig {
    private final PlatformTransactionManager transactionManager;
    private final JobRepository jobRepository;

    public AcmeProductsJobConfig(@Qualifier("transactionManager") PlatformTransactionManager transactionManager,
                                 JobRepository jobRepository) {
        this.transactionManager = transactionManager;
        this.jobRepository = jobRepository;
    }

    @Bean(name = "acmeProductsJob")
    public Job acmeProductsJob(@Qualifier("acmeProductsDownloadStep") Step acmeProductsStep,
                               @Qualifier("acmeProductsProcessStep") Step acmeProductsProcessStep) {
        return new JobBuilder("acmeProducts", jobRepository)
                .incrementer(new RunIdIncrementer())
                .start(acmeProductsStep)
                .next(acmeProductsProcessStep)
                .build();
    }

    @Bean(name = "acmeProductsDownloadStep")
    public Step acmeProductsStep(FileDownloaderTasklet fileDownloaderTasklet) {
        return new StepBuilder("acmeProductsDownloadStep", jobRepository)
                .tasklet(fileDownloaderTasklet, transactionManager)
                .build();
    }

    @Bean(name = "acmeProductsProcessStep")
    public Step acmeProductsProcessStep(AcmeProductsCsvFileReader csvReader,
                                        AcmeProductProcessor processor,
                                        AcmeProductJpaWriter jpaWriter) {
        return new StepBuilder("acmeProductsProcessStep", jobRepository)
                .<AcmeProduct, Product>chunk(10, transactionManager)
                .reader(csvReader)
                .processor(processor)
                .writer(jpaWriter)
                .build();
    }
}

Job Runner

Finally, this is the last component we define in our example. :)

While not a standard part of Spring Batch itself, this class serves as the scheduled entry point for launching the job. The @Scheduled annotation triggers the run() method based on a cron expression defined in the configuration, allowing the job to be executed periodically.

package org.example.productbatch.runner;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.JobParametersInvalidException;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException;
import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException;
import org.springframework.batch.core.repository.JobRestartException;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class AcmeProductsJobRunner {
    private final JobLauncher jobLauncher;
    private final Job acmeProductsJob;

    public AcmeProductsJobRunner(JobLauncher jobLauncher,
                                 @Qualifier("acmeProductsJob") Job acmeProductsJob) {
        this.jobLauncher = jobLauncher;
        this.acmeProductsJob = acmeProductsJob;
    }

    @Scheduled(cron = "${acme.catalog.cron}")
    public void run() throws JobInstanceAlreadyCompleteException, JobExecutionAlreadyRunningException, JobParametersInvalidException, JobRestartException {
        String targetFilename = String.format("acme-products-%d.csv", System.currentTimeMillis());
        JobParameters parameters = new JobParametersBuilder()
                .addString("TARGET_FILENAME", targetFilename)
                .toJobParameters();
        jobLauncher.run(acmeProductsJob, parameters);
    }
}

Inside the method, we dynamically generate a filename as a job parameter and pass it to the JobLauncher to execute the acmeProductsJob that we previously configured.

Conclusion

Thank you for taking the time to read this lengthy article — I hope you found it helpful. The full demo project is available in the GitHub repository below, with setup instructions included in the README.

Spring Batch Code Example
