Issue
I'm developing a Spring Batch application.
Although I'm getting more and more comfortable with it, I came across something that is confusing me.
Please take a look at this step configuration.
@Bean
@Qualifier(value = "processNonExportedMbfsOperationsStep")
public Step processNonExportedMbfsOperationsStep() {
    return stepBuilderFactory
            .get("processNonExportedMbfsOperationsStep")
            .allowStartIfComplete(false)
            .<MbfsEntity, CsvOutputLineDto>chunk(Integer.parseInt(chunkSize))
            .reader(processNonExportedMbfsOperationsItemReader)
            .processor(processNonExportedMbfsOperationItemProcessor)
            .writer(processNonExportedMbfsOperationsCompositeItemWriter)
            .faultTolerant()
            .retry(DataAccessException.class)
            .retryLimit(3)
            .build();
}
As you can see, it's a pretty standard step.
My confusion is related to the chunk size (50) and the reader (processNonExportedMbfsOperationsItemReader).
Here is the reader code:
@PersistenceContext
@Qualifier(value = "mbfsEntityManager")
private EntityManager mbfsEntityManager;

@Bean
public JpaPagingItemReader<MbfsEntity> processNonExportedMbfsOperationsItemReader() {
    JpaNativeQueryProvider<MbfsEntity> queryProvider = new JpaNativeQueryProvider<>();
    queryProvider.setSqlQuery(buildQuery());
    queryProvider.setEntityClass(MbfsEntity.class);

    return new JpaPagingItemReaderBuilder<MbfsEntity>()
            .name("processNonExportedMbfsOperationsItemReader")
            .entityManagerFactory(mbfsEntityManager.getEntityManagerFactory())
            .pageSize(Integer.parseInt(chunkSize))
            .queryProvider(queryProvider)
            .build();
}
The reader is of type JpaPagingItemReader since I have thousands of records to fetch from the DB.
So here is where the confusion starts. I would have expected the JpaPagingItemReaderBuilder to use the chunk size defined in the step configuration as the value of the JpaPagingItemReader's pageSize property.
But clearly that's not the case, and I don't know how to make sense of it.
Should I set the step chunk size to 1 and the page size to the value I want, like 50?
What am I missing? Thank you for your time!
Solution
In a chunk-oriented step with no processor, the difference between the page size of the reader and the chunk size is that
- the page size of the reader controls how many items are fetched per query from the DB,
- the chunk size controls how many items are passed to the writer in one invocation of its write method (see the sketch below).
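To make the two settings easier to compare side by side, here is a minimal sketch of a self-contained step. The bean names, the JPQL query, and the values 50 and 100 are assumptions for illustration only, not taken from your code:

@Bean
public Step exampleStep(StepBuilderFactory stepBuilderFactory,
                        EntityManagerFactory entityManagerFactory) {
    // page size 50: each paging query the reader issues fetches at most 50 rows
    JpaPagingItemReader<MbfsEntity> reader = new JpaPagingItemReaderBuilder<MbfsEntity>()
            .name("exampleReader")
            .entityManagerFactory(entityManagerFactory)
            .queryString("select e from MbfsEntity e")
            .pageSize(50)
            .build();

    // chunk size 100: the writer's write method receives up to 100 items per call,
    // so the reader has to fetch two pages to fill one chunk here
    return stepBuilderFactory
            .get("exampleStep")
            .<MbfsEntity, MbfsEntity>chunk(100)
            .reader(reader)
            .writer(items -> {
                // write the whole chunk in one transaction
            })
            .build();
}

With these example values the reader runs two queries per chunk; with equal values it runs exactly one query per chunk.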
It depends on what your writer does, but 1 is most likely not a good chunk size. You can start by setting the chunk size equal to the page size and then optimize by trying and measuring the performance of different settings.
If the step contains a processor that returns null for some items, i.e. drops them, then it gets more complicated. The number of items passed to the writer is then only bounded above by the chunk size. The reason is that the chunks are formed before the items of the chunk are passed to the processor, which may drop them.
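For example, a processor along these lines would drop items; the filter condition and the helper method are made up for illustration and are not part of your code:

@Bean
public ItemProcessor<MbfsEntity, CsvOutputLineDto> filteringItemProcessor() {
    return entity -> {
        if (entity.isAlreadyExported()) {  // hypothetical accessor, for illustration only
            return null;                   // returning null drops the item from the chunk
        }
        return mapToCsvLine(entity);       // hypothetical mapping helper
    };
}

With a chunk size of 50, the writer then receives at most 50 items per call, and fewer whenever the processor filtered out some items of that chunk.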
Please also have a look at this section of the reference documentation: https://docs.spring.io/spring-batch/docs/current/reference/html/step.html#chunkOrientedProcessing
Answered By - Henning