Issue
I have written a function that scans files (pictures) from two lists and checks whether a file is in both lists.
The code below works as expected, but for large sets it takes some time. So I tried to do this in parallel with coroutines. But with sets of 100 sample files, the program was always slower than without coroutines.
The code:
import java.io.File
import java.nio.file.Files

private fun doJob() {
    val filesToCompare = File("C:\\Users\\Tobias\\Desktop\\Test").walk().filter { it.isFile }.toList()
    val allFiles = File("\\\\myserver\\Photos\\photo").walk().filter { it.isFile }.toList()
    var i = 0 // total number of duplicates found across all files
    println("Files to scan: ${filesToCompare.size}")
    filesToCompare.forEach { file ->
        var multipleDuplicate = 0 // duplicates found for this particular file
        var s = "This file is a duplicate"
        s += "\n${file.absolutePath}"
        allFiles.forEach { possibleDuplicate ->
            if (file != possibleDuplicate) { // only needed when both lists are the same
                // Only files whose name contains this file's name get a byte comparison
                if (possibleDuplicate.nameWithoutExtension.contains(file.nameWithoutExtension)) {
                    try {
                        // Files.mismatch returns -1 when both files have identical content
                        if (Files.mismatch(file.toPath(), possibleDuplicate.toPath()) == -1L) {
                            s += "\n${possibleDuplicate.absolutePath}"
                            i++
                            multipleDuplicate++
                            println(s)
                        }
                    } catch (e: Exception) {
                        println(e.message)
                    }
                }
            }
        }
        if (multipleDuplicate > 1) {
            println("This file has $multipleDuplicate duplicate(s)")
        }
    }
    println("Files scanned: ${filesToCompare.size}")
    println("Total number of duplicates found: $i")
}
How did I try to add coroutines?
I wrapped the body of the first forEach in launch { ... }. The idea was that a coroutine starts for each file, so that the inner loops run concurrently. I expected the program to run faster, but in fact it took about the same time or longer.
How can I make this code run faster in parallel?
Solution
Running each inner loop in a coroutine seems to be a decent approach. The problem might lie in the dispatcher you were using. If you used runBlocking and launch without a context argument, you were using a single thread to run all your coroutines.
Since there is mostly blocking IO here, you could instead use Dispatchers.IO to launch your coroutines, so that they are dispatched on multiple threads. The parallelism should be automatically limited to 64 threads (or the number of CPU cores, if that is larger), but if your memory can't handle that, you can also use Dispatchers.IO.limitedParallelism(n) to reduce the number of threads.
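As a rough illustration of this suggestion, here is a minimal sketch that starts one coroutine per file on Dispatchers.IO. It assumes kotlinx.coroutines is on the classpath and Java 12+ for Files.mismatch. The helper names findDuplicates, findAllDuplicates and doJobInParallel are hypothetical and do not appear in the question, and collecting the results into a map instead of printing from inside the loops is a design choice of this sketch, not something the question's code does.

```kotlin
import java.io.File
import java.nio.file.Files
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical helper: the question's inner loop, returning matches instead of printing them.
fun findDuplicates(file: File, allFiles: List<File>): List<File> =
    allFiles.filter { candidate ->
        candidate != file &&
            candidate.nameWithoutExtension.contains(file.nameWithoutExtension) &&
            // Files.mismatch returns -1 for identical content; IO errors count as "no match"
            runCatching { Files.mismatch(file.toPath(), candidate.toPath()) == -1L }
                .getOrDefault(false)
    }

// Hypothetical helper: one coroutine per file, run on the given dispatcher.
// Pass Dispatchers.IO.limitedParallelism(n) to use fewer threads.
suspend fun findAllDuplicates(
    filesToCompare: List<File>,
    allFiles: List<File>,
    dispatcher: CoroutineDispatcher = Dispatchers.IO,
): Map<File, List<File>> = withContext(dispatcher) {
    filesToCompare
        .map { file -> async { file to findDuplicates(file, allFiles) } }
        .awaitAll()
        .toMap()
}

// Hypothetical entry point using the paths from the question.
fun doJobInParallel() = runBlocking {
    val filesToCompare = File("C:\\Users\\Tobias\\Desktop\\Test").walk().filter { it.isFile }.toList()
    val allFiles = File("\\\\myserver\\Photos\\photo").walk().filter { it.isFile }.toList()
    val duplicates = findAllDuplicates(filesToCompare, allFiles)
    duplicates.filterValues { it.isNotEmpty() }.forEach { (file, matches) ->
        println("This file is a duplicate\n${file.absolutePath}")
        matches.forEach { println(it.absolutePath) }
    }
    println("Files scanned: ${filesToCompare.size}")
    println("Total number of duplicates found: ${duplicates.values.sumOf { it.size }}")
}
```

Because each coroutine returns its own result, no shared counter or string is modified from several threads at once, which the question's code would need to guard against if its loop body were simply wrapped in launch(Dispatchers.IO).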
Answered By - Joffrey