Scalability of MPI on one machine
I have run into a scalability problem with an MPI solver on a single machine. I want to compute twenty matrix summations, and I do the work in two ways:
1. Launch 20 MPI tasks. Each task computes one matrix summation in a single round.
2. Launch 2 MPI tasks. Each task computes one matrix summation per round, for ten rounds of calculation. The tasks are synchronized between rounds.
In both cases I record the CPU time of the matrix summation for each round. The CPU time does not include memory allocation or release, and it does not include matrix creation.
I expected the CPU time per round to be the same in both cases, since each task computes exactly one matrix summation per round and there is no communication between tasks, i.e. the calculation of each task is independent. That would mean the scalability of the solver should be perfect.
However, the CPU time per round in the first case (20 MPI tasks) is more than ten times the CPU time per round in the second case (2 MPI tasks). What could be the reason? Is it related to the memory cache?
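
For reference, the measurement described above can be sketched roughly as follows. This is a minimal, single-process illustration in plain Python, not the actual solver. The names `make_matrix`, `sum_matrices` and `run_rounds` are hypothetical, and the `barrier` argument is a stand-in for whatever synchronization the MPI tasks use between rounds (with mpi4py, assuming it is installed, one could pass `MPI.COMM_WORLD.Barrier`). Allocation and matrix creation are kept outside the timed region, as in the post.

```python
import time


def make_matrix(n, value):
    """Create an n x n matrix (list of lists) filled with `value`."""
    return [[value] * n for _ in range(n)]


def sum_matrices(a, b, out):
    """Element-wise out = a + b, written into a preallocated matrix."""
    for row_a, row_b, row_out in zip(a, b, out):
        for j in range(len(row_out)):
            row_out[j] = row_a[j] + row_b[j]


def run_rounds(n, rounds, barrier=lambda: None):
    """Run `rounds` matrix summations and record CPU time per round.

    `barrier` is a placeholder for the inter-task synchronization;
    the default does nothing, so this runs as a single process.
    """
    # Allocation and matrix creation happen outside the timed region.
    a = make_matrix(n, 1.0)
    b = make_matrix(n, 2.0)
    out = make_matrix(n, 0.0)

    cpu_times = []
    for _ in range(rounds):
        barrier()  # synchronize tasks between rounds
        start = time.process_time()  # CPU time of this process only
        sum_matrices(a, b, out)
        cpu_times.append(time.process_time() - start)
    return out, cpu_times


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    _, cpu_times = run_rounds(n=200, rounds=10)
    for i, t in enumerate(cpu_times):
        print(f"round {i}: {t:.6f} s CPU")
```

Running the same loop with 2 tasks and with 20 tasks and comparing the per-round values is the comparison described in the post.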