Since I do not use so often BLAS library for matrix-matrix multiplication, when I have to multiply two matrices with some rectangular shape or with additional operation I always get confused. So I decided to write a simple guide to c/z-gemm in fortran.
1) Simplest case two square complex matrices: A(N,N) and B(N,N)
and I want to store ther result in C(N,N)
the call to cgemm will be
SUBROUTINE CGEMM ( TRANSA, TRANSB, N, N, N, ALPHA, A, LDA, B, LDA, BETA, C, LDC )
where LDA=LDB=LDC=N and TRANSA(B) can be an operation on the matrix A(B)
‘N’ = use the A matrix as it is
‘T’ = transpose op(A) = AT
‘C’ = hermitian op(A) = AH
in this case because all the matrices are squared all the indexes remain the same.
2) Now a more complex case A(N,M), B(M,N) and C(N,N) with M=5 and N=3 as in the figure
SUBROUTINE SGEMM ('N', 'N', N, N, M, 1.0, A, N, B, M, 0.0, C, N )
we can also multiply B for A and get a 5×5 matrix as result
SUBROUTINE SGEMM ('N', 'N', M, M, N, 1.0, B, M, A, N, 0.0, C, M )
3) Another possibility is to use operations different from ‘N’, for example the transpose ‘T’ of the hermitian ‘C’, for example this two codes are equivalent but the second is faster and use less memory:
call SGEMM('n', 'n', M, M, N, 1.0, transpose(A), M, transpose(B), N, 0.0, C, M)
call SGEMM('t', 't', M, M, N, 1.0, A, N, B, M, 0.0, C, M)
notice that the LDA and LDB specify the entry dimension of the matrix A and B, therefore in the second case the entry dimension is the first dimension of the original matrices A and B, while in the first example it corresponds to the one of transpose(A) and transpose(B).
Other guides are available here:
This is a great write-up. Thank you for spending some time to describe all of this out for folks. It really is a great help!