Updating and downdating sketches
This page presents four ways of updating a sketch. We use MATLAB notation for in-line concatenation of matrices.
Increasing sketch size
The scenarios here use sketching operators \(S_1\) and \(S_2\) that are sampled independently from distributions \(\mathcal{D}_1\) and \(\mathcal{D}_2.\) We denote the isometry scales of \(\mathcal{D}_1\) and \(\mathcal{D}_2\) by \(\alpha_1\) and \(\alpha_2,\) respectively.
Increasing the size of a sketch is glorified concatenation. The only subtlety is how to perform the update in a way that preserves isometric scaling (which can be useful in contexts like norm estimation).
Our main purpose in explaining these updates is to highlight the effects of setting a distribution’s \(\texttt{major_axis}\) to Long. If you’re working with DenseDists and DenseSkOps (where there’s no statistical difference between short-axis-major or long-axis-major) then this choice of major axis (which is the default) can provide an additional measure of reproducibility in experiments that require tuning sketch size.
Sketching from the left
Here the \(\mathcal{D}_i\) are distributions over wide matrices, and we set \({B}_i = {S}_i {A}.\) To increase sketch size is to combine these individual sketches via concatenation:
It only makes sense to do this if \({B}\) ends up having fewer rows than \({A}.\) Put another way, this type of update only makes sense if the block operator defined by \({S} = [{S}_1;~{S}_2]\) is also wide.
It is important to be aware of the basic statistical properties of this block operator, so we’ll give its distribution the name \(\mathcal{D}.\) The isometry scale of \(\mathcal{D}\) is \(\alpha = (\alpha_1^{-2} + \alpha_2^{-2})^{-1/2}.\) If \(B_1\) was computed with isometric scaling (that is, if \({B}_1 = \alpha_1 {S}_1 {A}\)), then isometrically-scaled updated sketch would be \(B = \alpha [ {B}_1/\alpha_1;~ {B}_2].\)
RandBLAS can explicitly represent \(\mathcal{D}\) under certain conditions, which we express with a code snippet.
// SkDist is either DenseDist or SparseDist.
// (d1, d2, and m) are positive integers where d1 + d2 < m.
// arg3 is any variable allowed in the third argument of the SkDist constructor.
SkDist D1( d1, m, arg3, Axis::Long);
SkDist D2( d2, m, arg3, Axis::Long);
SkDist D(d1 + d2, m, arg3, Axis::Long);
Furthermore, if \({S}_1.\texttt{next_state}\) is the seed state for \({S}_2\), then the resulting block operator \({S} = [{S}_1;~{S}_2]\) equals the operator obtained by sampling from \(\mathcal{D}\) with \({S}_1.\texttt{seed_state}.\)
This presents another option for how sketch updates might be performed. Rather than working with two sketching operators explicitly, one can work with a single operator \({S}\) sampled from the larger distribution \(\mathcal{D},\) and compute \({B}_1\) and \({B}_2\) by working with appropriate submatrices of \({S}.\)
Sketching from the right
Here the \(\mathcal{D}_i\) are distributions over tall matrices, and \({B}_i = A {S}_i.\) The combined sketch is
and it can be obtained by right-multiplying \({A}\) with the block operator \({S} = [{S}_1,~{S}_2].\)
The isometry scale of \({S}\)’s distribution is the same as before: \(\alpha = (\alpha_1^{-2} + \alpha_2^{-2})^{-1/2},\) and RandBLAS can explicitly represent this distribution under the following conditions (\(\texttt{SkDist}\) and \(\texttt{x}\) are as before).
// (d1, d2, n) are positive integers where d1 + d2 < n.
SkDist D1(n, d1, arg3, Axis::Long);
SkDist D2(n, d2, arg3, Axis::Long);
SkDist D(n, d1 + d2, arg3, Axis::Long);
If \({S}_1\) is sampled from \(\mathcal{D}_1\) with seed state \(r\) and \({S}_2\) is sampled from \(\mathcal{D}_2\) with seed state equal to \({S}_1.\texttt{next_state},\) then the block operator \({S}\) is the same as the matrix sampled from \(\mathcal{D}\) with seed state \(r.\) As with sketching from the left, this shows there are situations where it can suffice to define a single operator and sketch with appropriate submatrices of that operator.
Rank-\(k\) updates
A rank-k update is a multiply-accumulate operation involving matrices. It involves a pair of matrices \({X}\) and \({Y}\) that have k columns and k rows, respectively. It also involves a real scalar \(\alpha\) and a matrix \({B}\) of the same shape as \({X} {Y}.\) The operation itself is
Here we describe some rank-k updates that arise in sketching algorithms.
This framework can be used to describe incorporating new data into a sketch, or removing the contributions of old data from a sketch. We’ve focused our documentation efforts on the cases that add data. More specifically, we focus on when we’re performing a rank-k update to add new data into an existing sketch, but k was not known when the original sketch was formed. This case has more complications than if k was known in advance, but it can still be handled with RandBLAS when using distributions with if \(\texttt{major_axis} = \texttt{Short}.\)
Note
Future updates (pun intended) to these web docs will explain how the major-axis requirement can be dropped if an upper bound on k is known in advance. That really just amounts to explaining in detail how you operate with submatrices in RandBLAS. Incidentally, operating with submatrices is really all you need to perform rank-k updates that “remove” data.
Adding data: left-sketching
- Problem statement
We start with an \(m \times n\) data matrix \({A}_1.\) When presented with this matrix, we sample a wide operator \({S}_1\) from a distribution \(\mathcal{D}_1\) (defined as follows)
// Assumptions // * SkDist is DenseDist or SparseDist // * arg3 is any variable that makes sense for the SkDist constructor. // * This example requires d < m. SkDist D1(d, m, arg3, Axis::Short);
and we compute a sketch \({B} = {S}_1 {A}_1.\)
Sometime later, a \(k \times n\) matrix \({A}_2\) arrives. We want to independently sample a \(d \times k\) operator \({S}_2\) from some \(\mathcal{D}_2\) and perform a rank-k update \({B} \leftarrow {B} + {S}_2 {A}_2.\) In essence, we want to redefine
\[\begin{split}{B} = \begin{bmatrix} {S}_1 & {S}_2 \end{bmatrix} \begin{bmatrix} {A}_1 \\ {A}_2 \end{bmatrix}\end{split}\]without having to revisit \({A}_1.\)
- Conceptual solution
Since \(\texttt{major_axis}\) is Short and \(d < m,\) the columns of \(d \times m\) matrices sampled from \(\mathcal{D}_1\) will be sampled independently from a shared distribution on \(\mathbb{R}^d.\) This suggests we could define \(\mathcal{D}_2\) as a distribution over \(d \times k\) matrices whose columns follow the same distribution used in \(\mathcal{D}_1.\)
- Implementation
If \(d < k,\) then short-axis vectors of a \(d \times k\) matrix still refer to columns. This makes it possible to express \(\mathcal{D}_2\) explicitly:
SkDist D2(d, k, arg3, Axis::Short);
If \(d \geq k,\) then we have to define a distribution \(\mathcal{D}\) over \(d \times (m + k)\) matrices
SkDist D(d, m + k, arg3, Axis::Short);
and think of \(\mathcal{D}_2\) as the distribution obtained by selecting the trailing \(k\) columns of a sample from \(\mathcal{D}.\)
This second approach may look wasteful, but that’s not really the case. If a DenseSkOp is used in one of RandBLAS’ functions for sketching with a specified submatrix, only the submatrix that’s necessary for the operation will be generated. The following code snippet provides more insight on the situation.
SkDist D1( d, m, arg3, Axis::Short ); SkDist D( d, m + k, arg3, Axis::Short ); // Since d < m and we're short-axis major, the columns of matrices sampled from // D1 or D1 will be sampled i.i.d. from some distribution on R^d. using SkOp = typename SkDist::distribution_t; SkOp S1( D1, seed_state ); // seed_state is some RNGState. SkOp S( D, seed_state ); // With these definitions, S1 is *always* equal to the first m columns of S. // We recover S2 by working implicitly with the trailing k columns of S.
Adding data: right-sketching
- Problem statement
We start with an \(m \times n\) data matrix \({A}_1.\) When presented with this matrix, we sample a tall operator \({S}_1\) from a distribution \(\mathcal{D}_1\) of the form
// Assumptions // * SkDist is DenseDist or SparseDist // * arg3 is any variable that makes sense for the SkDist constructor. // * This example requires n > d. SkDist D1(n, d, arg3, Axis::Short);
and we compute a sketch \({B} = {A}_1 {S}_1.\)
Sometime later, an \(m \times k\) matrix \({A}_2\) arrives. We want to independently sample an \(k \times d\) operator \({S}_2\) from some \(\mathcal{D}_2\) and perform a rank-k update \({B} \leftarrow {B} + {A}_2 {S}_2.\) Essentially, we want to redefine
\[\begin{split}{B} = \begin{bmatrix} {A}_1 & {A}_2 \end{bmatrix}\begin{bmatrix} {S}_1 \\ {S}_2 \end{bmatrix}\end{split}\]without having to revisit \({A}_1.\)
- Conceptual solution
The idea is the same as with left-sketching. The difference is that since we’re sketching from the right with a tall \(n \times d\) operator, the short-axis vectors are rows instead of columns. This means the rows of \({S}_1\) are sampled independently from some distribution on \(\mathbb{R}^d,\) and we can define \({S}_2\) by sampling its rows from that same distribution.
- Implementation
If \(k > d,\) then we can represent \(\mathcal{D}_2\) explicitly, constructing it as follows.
SkDist D2(k, d, arg3, Axis::Short);
If \(k \leq d,\) then we have to define a distribution \(\mathcal{D}\) over \((n + k) \times d\) matrices
SkDist D(n + k, d, arg3, Axis::Short);
and think of \(\mathcal{D}_2\) as the distribution obtained by selecting the bottom \(k\) rows of a sample from \(\mathcal{D}.\)
As with the left-sketching case, we provide a code snippet to summarize the situation.
SkDist D1( n, d, arg3, Axis::Short ); SkDist D( n + k, d, arg3, Axis::Short ); // Since n > d and we're short-axis major, the rows of matrices sampled from // D1 or D1 will be sampled i.i.d. from some distribution on R^d. using SkOp = typename SkDist::distribution_t; SkOp S1( D1, seed_state ); // seed_state is some RNGState. SkOp S( D, seed_state ); // With these definitions, S1 is *always* equal to the first m rows of S. // We recover S2 by working implicitly with the last k rows of S.