Systems that must implement MPI_Allreduce with point-to-point communication
will probably show a cost for MPI_Allreduce that grows as log(p), where p is
the number of processes. Systems that can use either a special network or
shared memory may have faster reductions with different scaling.
Some of these optimizations (in particular, special networks) apply only to
MPI_COMM_WORLD or a communicator that contains the same processes as
MPI_COMM_WORLD.
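To see where the log(p) comes from, here is a small simulation of recursive
doubling. This is one common way to build an allreduce from point-to-point
messages. The actual algorithm a given MPI library uses is implementation
specific, and it may also depend on message size. The sketch assumes a sum
reduction and a power-of-two number of processes. The function name
recursive_doubling_allreduce is hypothetical, and the code does not call MPI.
It only models which rank exchanges data with which in each round.

```python
import math


def recursive_doubling_allreduce(values):
    """Simulate a sum-allreduce over len(values) processes by recursive doubling.

    Hypothetical helper for illustration (not an MPI API). Assumes the number
    of processes p is a power of two. Returns the value every rank ends up
    with, plus the number of communication rounds used.
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes p is a power of two")
    buf = list(values)  # buf[r] is the partial sum currently held by rank r
    rounds = 0
    mask = 1
    while mask < p:
        # Each round, rank r exchanges its partial sum with rank r XOR mask
        # (one send/receive pair per rank) and adds what it received.
        # The comprehension reads the old buf before the name is rebound.
        buf = [buf[r] + buf[r ^ mask] for r in range(p)]
        mask <<= 1
        rounds += 1
    return buf, rounds


for p in (2, 8, 64, 1024):
    result, rounds = recursive_doubling_allreduce(list(range(p)))
    assert all(x == sum(range(p)) for x in result)
    print(p, rounds, math.log2(p))
```

The partner distance doubles in every round, so the number of rounds equals
log2(p). Doubling the number of processes adds one more round of messages,
which gives the logarithmic growth described above.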