Thanks to ISL, we AlphaZ now have two schedulers. Multi-dimensional farkas scheduler by Paul Feautrier, and PLuTo scheduling by Uday Bondhugula. For details of the schedulers, refer to the corresponding articles.
@article{feautrier1992some, title={Some efficient solutions to the affine scheduling problem. part II. multidimensional time}, author={Feautrier, P.}, journal={International Journal of Parallel Programming}, volume={21}, number={6}, pages={389--420}, year={1992}, publisher={Springer} }
@article{bondhugula2008practical, title={A practical automatic polyhedral parallelizer and locality optimizer}, author={Bondhugula, U. and Hartono, A. and Ramanujam, J. and Sadayappan, P.}, journal={ACM SIGPLAN Notices}, volume={43}, number={6}, pages={101--113}, year={2008}, publisher={ACM} }
Schedulers take PRDG (Polyhedral Reduced Dependence Graph) as inputs and returns affine functions for each node in the PRDG. Following commands are related to scheduling.
Lets take a simple program, matrix multiply, as an example. Matrix multiplication has a 3D iteration space, and the only dependence is along the k-axis for accumulating.
affine matrix_product {P, Q, R|P>1 && Q>1 && R>1} given float A {i,k| 0<=i<P && 0<=k<Q}; float B {k,j| 0<=k<Q && 0<=j<R}; returns float C {i,j,k| 0<=i<P && 0<=j<R && k==Q+1}; using // Using an accumulator locally float temp_C {i,j,k|0<=i<P && 0<=j<R && 0<=k<=Q}; through temp_C[i,j,k] = case // For the computation of the value of temp_C at (i,j,k), // we need the value of temp_C at (i,j,k-1), A at (i,k) and B at (k,j) {|k>0} : temp_C[i,j,k-1] + A[i,k-1]*B[k-1,j]; {|k==0} : 0; // Initialization of the accumulator esac; C[i,j,k] = temp_C[i,j,k-1]; .
The following scripts uses FarkasMD scheduling to find schedules for the above program.
# Load Program and store as 'prog' prog = ReadAlphabets("../../testcases/matrix_product/matrix_product.ab"); # Define a variable 'system' to store the system name system = "matrix_product"; prdg = BuildPRDG(prog, system, 1); schedules = FarkasMDScheduler(prdg); setSchedule(prog, system, schedules); listSpaceTimeMaps(prog, system);The output should look like the following:
C (i,j,k->k,j,i) temp_C (i,j,k->k,j,i) ===SpaceTime Maps=== SpaceTimeMap: C (i,j,k->k,j,i) [SEQUENTIAL, SEQUENTIAL, SEQUENTIAL] SpaceTimeMap: temp_C (i,j,k->k,j,i) [SEQUENTIAL, SEQUENTIAL, SEQUENTIAL]
Since Farkas scheduler tries to satisfy as much dependencies as possible in the outer dimensions, the k-axis has became the outer most dimension in the resulting schedule.
The following is a similar script for matrix multiply, but uses PLuTo scheduling.
# Load Program and store as 'prog' prog = ReadAlphabets("../../testcases/matrix_product/matrix_product.ab"); # Define a variable 'system' to store the system name system = "matrix_product"; prdg = BuildPRDG(prog, system, 1); schedules = PlutoScheduler(prdg); setSchedule(prog, system, schedules); listSpaceTimeMaps(prog, system);The output should look like the following:
C (i,j,k->i,j,k) temp_C (i,j,k->i,j,k) ===SpaceTime Maps=== SpaceTimeMap: C (i,j,k->i,j,k) [SEQUENTIAL, SEQUENTIAL, SEQUENTIAL] SpaceTimeMap: temp_C (i,j,k->i,j,k) [SEQUENTIAL, SEQUENTIAL, SEQUENTIAL]
Note that now the k-axis is the innermost dimension of the resulting schedules. Because PLuTo scheduler have a different cost function, the resulting schedule can be different.
Here is another example to illustrate the difference between Farkas and PLuTo scheduling.
The alphabets program below is a Jacobi-style stencil computation for 1D data.
Because of the dependence (t,i->t,i+1)
, this program is not tilable with the identity schedule.
affine jacobi_1d {TSTEPS,N | TSTEPS > 2 && N > 5} given double A {i|0<=i<N}; returns double B {t,i|0<=i<N && t==TSTEPS}; using double temp_B {t,i|0<=i<N && 0<=t<TSTEPS}; through temp_B[t,i] = case {|t == 0} : A[i]; {|t > 0 && 1<=i<N-1} : (temp_B[t-1,i-1] + temp_B[t-1,i] + temp_B[t-1,i+1])*0.33333; {|t > 0 && 0==i } || {|t > 0 && i==N-1 } : temp_B[t-1,i]; esac; B[t,i] = temp_B[t-1,i]; .PLuTo Scheduling, which tries to find tilable shcedules, gives the following. Note that the second dimension of the schedules are shifted by
t
. This is an instance of “skewing”, and this schedule is tilable.
temp_B (t,i->t,i+t) B (t,i->t,i+t) ===SpaceTime Maps=== SpaceTimeMap: temp_B (t,i->t,i+t) [SEQUENTIAL, SEQUENTIAL] SpaceTimeMap: B (t,i->t,i+t) [SEQUENTIAL, SEQUENTIAL]
Farkas Scheduling does not try to produce tilable schedule, and thus gives the following schedule.
temp_B (t,i->t,i) B (t,i->t,i) ===SpaceTime Maps=== SpaceTimeMap: temp_B (t,i->t,i) [SEQUENTIAL, SEQUENTIAL] SpaceTimeMap: B (t,i->t,i) [SEQUENTIAL, SEQUENTIAL]