Background on GEMM
A simplified viewpoint
Consider a simplified version of GEMM with conformable real linear operators \((\mathbf{A},\mathbf{B},\mathbf{C})\):
\[\displaystyle\mathbf{C} = \alpha \cdot\,\operatorname{op}(\mathbf{A})\, \cdot \,\operatorname{op}(\mathbf{B}) + \,\beta \cdot \mathbf{C}.\]
Here, \(\operatorname{op}(\cdot)\) can return its argument either unchanged or transposed. Its action on \(\mathbf{A}\) and \(\mathbf{B}\) is determined by contextual information in the form of flags. The flag for \(\mathbf{A}\) is traditionally called \(\text{“}\texttt{opA}\text{”}\) and is interpreted as
\[\begin{split}\operatorname{op}(\mathbf{A}) = \begin{cases} \mathbf{A} & \text{ if } \texttt{opA} \texttt{ == NoTrans} \\ \mathbf{A}^T & \text{ if } \texttt{opA} \texttt{ == Trans} \end{cases}.\end{split}\]
The flag for \(\mathbf{B}\) is traditionally named \(\text{“}\texttt{opB}\text{”}\) and is interpreted similarly.
An accurate description
The GEMM API accepts dimensions \((m, n, k)\), pointers \((A, B, C)\), and executes
(1)\[\displaystyle\operatorname{mat}(C) = \alpha \cdot\, \underbrace{\operatorname{op}(\operatorname{mat}(A))}_{m \times k}\, \cdot \,\underbrace{\operatorname{op}(\operatorname{mat}(B))}_{k \times n} + \,\beta \cdot\underbrace{\operatorname{mat}(C)}_{m \times n},\]
where \(\operatorname{mat}(\cdot)\) accepts a pointer and returns a matrix based on the following contextual information:
We use the \(\text{“}\operatorname{mat}\text{”}\) operator only to help with exposition.
No such operator appears in the GEMM API.
For reference, here is a standard function signature for a version of GEMM that requires all three matrices
in Eq. 1 to have a common numerical type, T.
template <typename T>
gemm(
blas::Layout ell, blas::Op opA, blas::Op opB, int m, int n, int k,
T alpha, const T* A, int lda, const T* B, int ldb, T beta, T* C, int ldc
)
A complete explanation of how \(\operatorname{mat}\) extracts submatrices from this contextual information is given below.
Details on \(\operatorname{mat}(\cdot)\)
The semantics of \(\operatorname{mat}\) can be understood by focusing on \(\mathbf{A} = \operatorname{mat}(A)\). First, there is the matter of the dimensions. These are inferred from \((m, k)\) and from \(\texttt{opA}\) in the way indicated by Eq. 1.
If \(\texttt{opA} \texttt{ == NoTrans}\), then \(\mathbf{A}\) is \(m \times k\).
If \(\texttt{opA} \texttt{ == Trans }\), then \(\mathbf{A}\) is \(k \times m\).
Moving forward let us say that \(\mathbf{A}\) is \(r \times c\). The actual contents of \(\mathbf{A}\) are determined by the pointer, \(A\text{,}\) an explicitly declared stride parameter, \(\texttt{lda}\text{,}\) and a layout parameter, \(\texttt{ell}\text{,}\) according to the rule
\[\begin{split}\mathbf{A}_{i,j} = \begin{cases} A[\,i + j \cdot \texttt{lda}\,] & \text{ if } \texttt{ell == ColMajor} \\ A[\,i \cdot \texttt{lda} + j\,] & \text{ if } \texttt{ell == RowMajor} \end{cases}\end{split}\]
where we zero-index \(\mathbf{A}\) for consistency with indexing into buffers in C/C++.
Only the leading \(r \times c\) submatrix of \(\operatorname{mat}(A)\) will be accessed in computing Eq. 1. Note that in order for this submatrix to be well-defined it’s necessary that
\[\begin{split}\texttt{lda} \geq \begin{cases} r & \text{ if } \texttt{ell == ColMajor} \\ c & \text{ if } \texttt{ell == RowMajor} \end{cases}.\end{split}\]
Most performance libraries check that this is the case on entry to GEMM and will raise an error if this condition isn’t satisfied.