What is MPI?
- MPI: Message Passing Interface
- Doc: https://www.mpi-forum.org/
Vocabulary
- Process - Each instance of the code is an MPI process, typically 1 per physical core or node
- Rank - Each process has a rank, an integer that acts as its ID within a communicator
- Communicator - A group of processes that we assign a name to; this might be a group containing every process you have, or just a few working on a specific task
Basic API
| Function | Introduction |
| --- | --- |
| MPI_Init | Initialises the MPI environment; it can be passed two arguments (pointers to argc and argv), but for now just pass NULL for both |
| MPI_Comm_rank | Tells you the rank of the calling process within a communicator group |
| MPI_COMM_WORLD | The default communicator group; it contains all of the processes currently initialised |
| MPI_Finalize | Ignore the horrible American spelling here; this ends the MPI environment |
| MPI_Comm_size | Returns the number of processes in a specific communicator |
| MPI_Get_processor_name | Returns the name of the actual piece of hardware that is currently running the process |
Communication
- Collective communication - All of the processes within a specific group receive a copy of the communication
- Point-to-point communication - There is a single sender and a single receiver
To use MPI you need to #include <mpi.h>.
Example
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

const int MAX_STRING = 100;

int main(void) {
    int comm_size;
    int rank;
    char greeting[MAX_STRING];

    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    if (rank != 0) {
        /* Every non-zero rank sends its greeting to rank 0 */
        sprintf(greeting, "Greetings from processor %s, process %d of %d!",
                processor_name, rank, comm_size);
        MPI_Send(greeting, strlen(greeting) + 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    } else {
        /* Rank 0 receives and prints a greeting from every other rank */
        for (int i = 1; i < comm_size; i++) {
            MPI_Recv(greeting, MAX_STRING, MPI_CHAR, i, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("%s\n", greeting);
        }
    }

    MPI_Finalize();
    return 0;
}
Blocking
If an MPI_Recv call is made, the process waits until it has received that message before continuing.
Non-blocking
Code can continue execution after calling a particular MPI function, before the communication has necessarily completed.
| Function | Description |
| --- | --- |
| MPI_Ssend | A guaranteed-blocking (synchronous) version of MPI_Send; it takes exactly the same parameters as MPI_Send |
| MPI_Send | Sends a message to another process; it will only block if the message is too large to be buffered |
| MPI_Isend | A non-blocking version of MPI_Send that is guaranteed not to block; it takes an extra MPI_Request parameter used to check for completion later |
| MPI_Recv | Receives a message from another process; guaranteed to block until the message has arrived |
| MPI_Irecv | A non-blocking version of MPI_Recv; it does not block, and takes an MPI_Request parameter in place of the MPI_Status |
Status
- MPI_STATUS_IGNORE - Pass this when you do not need any information about the received message; otherwise pass a pointer to an MPI_Status struct, whose fields are:
- MPI_SOURCE - Which process sent this message?
- MPI_TAG - What was the tag for this message?
- MPI_ERROR - Was there a detectable error with this message? Wrong type etc.
Sending
- MPI_Test - A non-blocking check: it sets its second parameter (a flag) to true (1) if the operation identified by the MPI_Request in its first parameter has completed, and fills in the MPI_Status in its third parameter
Receiving
- MPI_Waitall - If you have multiple requests as an array, this provides a single blocking wait until they are all finished
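A rough sketch of how these calls fit together, assuming MPI_Init has already been called, the program runs with at least two processes, and the payload value 42 is arbitrary (MPI_Wait is also used here to complete the send):
int value = 0, rank, done = 0;
MPI_Request request;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0) {
    value = 42;                                        /* arbitrary example payload */
    MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
    MPI_Wait(&request, MPI_STATUS_IGNORE);             /* block until the send completes */
} else if (rank == 1) {
    MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
    while (!done) {
        MPI_Test(&request, &done, MPI_STATUS_IGNORE);  /* non-blocking completion check */
        /* ... useful work could be done here while the message is in flight ... */
    }
    printf("Rank 1 received %d\n", value);
}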
Cost of MPI messages
There are two major factors that affect the time it takes to send a message.
Message size
Message size works a bit like the length of a train: the track is fixed, so the more carriages the train has, the longer it takes for the whole train to arrive.
Physical connection
The physical connection determines the latency, which we measure in microseconds (μs), one millionth of a second.
Equation
Packet size = Number of elements × Variable size
Time cost = Number of messages × (Latency + Packet size / Bandwidth)
In other words, each message pays the latency once, plus the time to push its bytes through the link.
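As a rough worked example (the numbers here are illustrative assumptions, not measurements): sending 1,000 MPI_INT elements of 4 bytes each gives a packet size of 4,000 bytes. With an assumed latency of 2 μs and a bandwidth of 1 GB/s (about 1,000 bytes per μs), one message costs roughly 2 μs + 4,000 / 1,000 μs = 6 μs, and sending 10 such messages costs about 60 μs.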
Summary
This is just an estimate; many factors can affect the actual time cost:
- The number of switches in the path
- The physical distance of the interconnection
- The method of packaging the message for transmission
Collective communication
Collective communication - All of the processes within a specific group receive a copy of the communication
MPI_Bcast
This is used to transmit a message from one process to all of the others in a communicator.
| parameter | description |
| --- | --- |
| void* | What variable holds the data you are sending (and receiving, on the other ranks)? |
| int | How many elements of data is this? |
| MPI_Datatype | What MPI_Datatype is the data? |
| int | What is the rank of the process sending (broadcasting) the message? |
| MPI_Comm | What communicator are you sending to? |
Example:
int value = 12345;
int broadcast_process = 0;   /* rank that holds the value to broadcast */
MPI_Bcast(&value, 1, MPI_INT, broadcast_process, MPI_COMM_WORLD);
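A slightly fuller sketch of the usual pattern, assuming MPI_Init has been called; the value 12345 and root rank 0 are arbitrary choices. Only the root sets the value, and every process holds a copy after the call:
int rank, value = 0;
int broadcast_process = 0;                     /* assumed root rank */
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == broadcast_process) {
    value = 12345;                             /* only the root knows the value beforehand */
}
MPI_Bcast(&value, 1, MPI_INT, broadcast_process, MPI_COMM_WORLD);
printf("[Process %d] value = %d\n", rank, value);   /* every rank now prints 12345 */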
MPI_Scatter
MPI_Scatter - A method for splitting and sending an array across a set of processes
Total data
| parameters | description |
| --- | --- |
| void* | What variable holds the full data you are sending? |
| int | How many elements are sent to each process? |
| MPI_Datatype | What MPI_Datatype is the full data? |
Received data
| parameters | description |
| --- | --- |
| void* | Where will this process's portion of the data be stored? |
| int | How many elements does each process receive? |
| MPI_Datatype | What MPI_Datatype is the received data? |
Sender data
| parameters | description |
| --- | --- |
| int | What is the rank of the process sending (scattering) the data? |
| MPI_Comm | What communicator is this using? |
Example:
#include <stdio.h>
int rank;
int broadcast_root = 0;
int values[] = {1, 2, 3, 4, 5, 6, 7, 8, 9};
int local_values[2];
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
/* Assuming 4 processes: each receives 2 elements, so only the first 8 values are used */
MPI_Scatter(values, 2, MPI_INT, local_values, 2, MPI_INT, broadcast_root, MPI_COMM_WORLD);
printf("[Process: %d] ", rank);
for (int i = 0; i < 2; i++) {
    printf("local value %d is %d\n", i, local_values[i]);
}
MPI_Scatterv
MPI_Scatterv - An extension of MPI_Scatter that allows you to specify how many elements each process receives
Total data
| parameters | description |
| --- | --- |
| void* | What variable holds the full data you are sending? |
| int* | How many elements does each process receive? (one count per process) |
| int* | Where in the full data does each process's portion start? (displacements) |
| MPI_Datatype | What MPI_Datatype is the full data? |
Received data
| parameters | description |
| --- | --- |
| void* | Where will this process's portion of the data be stored? |
| int | How many elements does this process receive? |
| MPI_Datatype | What MPI_Datatype is the received data? |
Sender data
| parameters | description |
| --- | --- |
| int | What is the rank of the process sending (scattering) the data? |
| MPI_Comm | What communicator is this using? |
Example:
int broadcast_root = 0;
int displacements[] = {0, 3, 6, 9};
int elements_per_process[] = {3, 3, 3, 2};
int rank;
int values[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11};
int local_values[3];
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
/* Assuming 4 processes: ranks 0-2 receive 3 elements each, rank 3 receives 2 */
MPI_Scatterv(values, elements_per_process, displacements, MPI_INT,
             local_values, elements_per_process[rank], MPI_INT,
             broadcast_root, MPI_COMM_WORLD);
printf("[Process: %d] ", rank);
for (int i = 0; i < elements_per_process[rank]; i++) {
    printf("local value %d is %d\n", i, local_values[i]);
}
MPI_Gather
MPI_Gather - A method for combining the smaller arrays held by each process into a single array on one process
MPI_Gatherv - An extension of MPI_Gather that allows you to receive different sized messages from each process
Sent data
| parameters | description |
| --- | --- |
| void* | What variable holds the local data this process is sending? |
| int | How many elements is the local data? |
| MPI_Datatype | What MPI_Datatype is the local data? |
Gathered data
| parameters | description |
| --- | --- |
| void* | Where is the gathered data stored? (only meaningful on the root) |
| int | How many elements are received from each process? |
| MPI_Datatype | What MPI_Datatype is the gathered data? |
Receiver data
| parameters | description |
| --- | --- |
| int | What is the rank of the process receiving (gathering) the data? |
| MPI_Comm | What communicator is this using? |
Example:
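A minimal sketch of MPI_Gather, assuming the program runs with 4 processes, uses rank 0 as the gathering root, and fills the local arrays with made-up values:
int rank;
int local_values[2];
int gathered_values[8];                  /* only meaningful on the root (rank 0) */
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
local_values[0] = rank * 2;              /* arbitrary example data */
local_values[1] = rank * 2 + 1;
MPI_Gather(local_values, 2, MPI_INT, gathered_values, 2, MPI_INT, 0, MPI_COMM_WORLD);
if (rank == 0) {
    for (int i = 0; i < 8; i++)
        printf("%d ", gathered_values[i]);
    printf("\n");
}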
MPI_Allgather
MPI_Allgather - An extension of MPI_Gather that gives each process a copy of the complete array
Unlike MPI_Gather there is no root rank: every process receives a copy of the complete gathered array.
Sent data
| parameter | description |
| --- | --- |
| void* | What variable holds the local data you are sending? |
| int | How many elements is the local data? |
| MPI_Datatype | What MPI_Datatype is the local data? |
Gathered data
| parameter | description |
| --- | --- |
| void* | Where is the data you receive stored? |
| int | How many elements are received from each process? |
| MPI_Datatype | What datatype will that data be? |
Communicator
| parameter | description |
| --- | --- |
| MPI_Comm | What communicator is this using? |
MPI_Allgatherv - An extension of MPI_Allgather that allows you to receive different sized messages from each process
As with MPI_Allgather, there is no root rank.
Sent data
| parameter | description |
| --- | --- |
| void* | What variable holds the local data you are sending? |
| int | How many elements is the local data? |
| MPI_Datatype | What MPI_Datatype is the local data? |
Gathered data
| parameter | description |
| --- | --- |
| void* | Where is the data you receive stored? |
| int* | How many elements does each process contribute? (one count per process) |
| int* | Where does each process's portion start in the gathered array? (displacements) |
| MPI_Datatype | What datatype will that data be? |
Communicator
| parameter | description |
| --- | --- |
| MPI_Comm | What communicator is this using? |
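A minimal sketch of MPI_Allgather, assuming 4 processes and one made-up value per process; after the call every rank holds the same 4-element array:
int rank;
int local_value;
int all_values[4];                       /* one slot per (assumed) process */
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
local_value = rank * 10;                 /* arbitrary example data */
MPI_Allgather(&local_value, 1, MPI_INT, all_values, 1, MPI_INT, MPI_COMM_WORLD);
printf("[Process %d] %d %d %d %d\n", rank, all_values[0], all_values[1], all_values[2], all_values[3]);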
MPI_Reduce
MPI_Reduce - A method that applies an operation element-wise across the arrays supplied by each process, combining the values at each index into a single result array on the root process
| parameter | description |
| --- | --- |
| void* | What variable holds the local data you are sending? |
| void* | Where is the reduced result stored? (on the root) |
| int | How many elements is the local data? |
| MPI_Datatype | What MPI_Datatype is the data? |
| MPI_Op | What type of operation will be applied? |
| int | What is the rank of the process receiving the result? |
| MPI_Comm | What communicator is this using? |

| Operation | Description |
| --- | --- |
| MPI_MIN | The minimum element at each index |
| MPI_MAX | The maximum element at each index |
| MPI_MINLOC | The minimum element at each index and the rank of the process holding it |
| MPI_MAXLOC | The maximum element at each index and the rank of the process holding it |
| MPI_SUM | The sum at each index |
| MPI_PROD | The product at each index |
| MPI_LAND | The logical AND at each index |
| MPI_LOR | The logical OR at each index |
| MPI_BAND | The bitwise AND at each index |
| MPI_BOR | The bitwise OR at each index |
Example
MPI_Reduce(local_values, values, 2, MPI_INT, MPI_SUM, root_rank, MPI_COMM_WORLD);
MPI_Allreduce - The same as MPI_Reduce, except the final reduced array is distributed to every process, so there is no root parameter
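A minimal sketch contrasting the two, assuming each process holds two made-up local values:
int rank;
int local_values[2];
int reduced[2];
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
local_values[0] = rank;                  /* arbitrary example data */
local_values[1] = rank * 2;
/* Only rank 0 ends up with the element-wise sums */
MPI_Reduce(local_values, reduced, 2, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
/* Every rank ends up with the element-wise sums */
MPI_Allreduce(local_values, reduced, 2, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
printf("[Process %d] sums are %d and %d\n", rank, reduced[0], reduced[1]);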
MPI_Sendrecv
MPI_Sendrecv - This method functions as a combination of MPI_Send and MPI_Recv; both processes call the same line of code.
Sender parameters
| parameter | description |
| --- | --- |
| void* | What variable holds the data you are sending? |
| int | How many elements is the sent data? |
| MPI_Datatype | What MPI_Datatype is the sent data? |
| int | Which process will receive the data? |
| int | What tag is used for the sent data? |
Receiver parameters
| parameter | description |
| --- | --- |
| void* | Where is the data you receive stored? |
| int | How many elements of data will you receive? |
| MPI_Datatype | What datatype will that data be? |
| int | What is the rank of the process sending the message? |
| int | What tag is expected for the received message? |
Shared parameters
| parameter | description |
| --- | --- |
| MPI_Comm | What communicator is this using? |
| MPI_Status | Where is the status of the received message stored? (or MPI_STATUS_IGNORE) |
Example:
int recv_value;                                   /* separate receive buffer */
int partner = (rank == 0) ? 1 : 0;                /* assuming exactly two processes */
MPI_Sendrecv(&result, 1, MPI_INT, partner, 0,
             &recv_value, 1, MPI_INT, partner, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
For comparison, a one-directional transfer written with separate calls looks like this:
if (rank == 0) {
    MPI_Send(&result, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else {
    MPI_Recv(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}
MPI_Sendrecv_replace
MPI_Sendrecv_replace - This method uses a single variable for both the send and the receive; the received value overwrites the sent one, which is handy for exchanging a value between two processes
Example
MPI_Sendrecv_replace(&result, 1, MPI_INT, partner, 0, partner, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);  /* partner as in the MPI_Sendrecv example above */
MPI Communicators
MPI_Comm_split
| parameter | description |
| --- | --- |
| MPI_Comm | What is the communicator you are splitting? |
| int | What is the 'color' of your group? Processes that pass the same color end up in the same new communicator |
| int | What key is used to order the ranks within the new communicator? |
| MPI_Comm | Where is the new communicator stored? |
Example
#include <mpi.h>
int global_rank, sub_rank, color, local_key, comm_size;
MPI_Comm sub_communicator;
MPI_Comm_rank(MPI_COMM_WORLD, &global_rank);
MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
if (global_rank % 2 == 0) {
    /* Even ranks form one group, keeping their original ordering */
    color = 1;
    local_key = global_rank;
} else {
    /* Odd ranks form a second group */
    color = 2;
    local_key = global_rank - (comm_size / 2);
}
MPI_Comm_split(MPI_COMM_WORLD, color, local_key, &sub_communicator);
MPI_Comm_rank(sub_communicator, &sub_rank);
printf("My MPI_COMM_WORLD rank is %d, but in comm %d my rank is %d\n", global_rank, color, sub_rank);
MPI_Comm_free
MPI_Comm_free - A method to remove a communicator from memory
MPI_Comm_free(&sub_communicator);
OpenMP and MPI
The two can be combined: MPI distributes the work across processes, and OpenMP threads then share each process's portion of the loop.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(void) {
    MPI_Init(NULL, NULL);
    int rank;
    int global_values[8];
    int local_values[2];
    int global_calculationed[8];
    int local_calculationed[2];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* load_global_values is a placeholder for whatever fills global_values */
        load_global_values("Filename.txt", global_values);
        printf("[Process %d] The global values are loaded.\n", rank);
    }
    /* Assuming 4 processes: each receives 2 of the 8 values */
    MPI_Scatter(global_values, 2, MPI_INT, local_values, 2, MPI_INT, 0, MPI_COMM_WORLD);
    /* OpenMP splits this loop across the threads of each MPI process */
    #pragma omp parallel for shared(local_values, local_calculationed)
    for (int i = 0; i < 2; i++) {
        /* calculation is a placeholder for the per-element work */
        local_calculationed[i] = calculation(local_values[i]);
        printf("[Process %d] %d results in %d\n", rank, local_values[i], local_calculationed[i]);
    }
    MPI_Gather(local_calculationed, 2, MPI_INT, global_calculationed, 2, MPI_INT, 0, MPI_COMM_WORLD);
    if (rank == 0) {
        printf("[Process %d] Calculated values are ", rank);
        for (int i = 0; i < 8; i++)
            printf("%d ", global_calculationed[i]);
        printf("\n");
    }
    MPI_Finalize();
    return 0;
}
MPI_Wtime
MPI_Wtime - Returns the wall-clock time in seconds since some arbitrary point in the past; call it before and after a piece of work and take the difference to time that work on the calling process
double start, end, time_taken;
start = MPI_Wtime();
/* ... the work being timed ... */
end = MPI_Wtime();
time_taken = end - start;
printf("[Process %d] The time taken is %.2fs.\n", rank, time_taken);
MPI_Probe
MPI_Probe - Checks for an incoming message from a given source with a given tag without actually receiving it; the MPI_Status it fills in can then be inspected (for example with MPI_Get_count) to work out how big a buffer is needed before calling MPI_Recv
| parameter | description |
| --- | --- |
| int | Which process is sending the data? |
| int | What tag is used for the sent data? |
| MPI_Comm | What communicator is this using? |
| MPI_Status | Where is the information about the pending message stored? |
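A minimal sketch of the usual probe-then-receive pattern, assuming rank 1 is waiting for a message of unknown length from rank 0 with tag 0 (and that <stdlib.h> is included for malloc):
MPI_Status status;
int count;
MPI_Probe(0, 0, MPI_COMM_WORLD, &status);        /* learn about the pending message without receiving it */
MPI_Get_count(&status, MPI_INT, &count);         /* how many MPI_INT elements does it contain? */
int *buffer = malloc(count * sizeof(int));       /* size the receive buffer to fit */
MPI_Recv(buffer, count, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("Received %d integers\n", count);
free(buffer);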