Updating and downdating sketches

This page presents four ways of updating a sketch. We use MATLAB notation for in-line concatenation of matrices.

Increasing sketch size

The scenarios here use sketching operators \(S_1\) and \(S_2\) that are sampled independently from distributions \(\mathcal{D}_1\) and \(\mathcal{D}_2.\) We denote the isometry scales of \(\mathcal{D}_1\) and \(\mathcal{D}_2\) by \(\alpha_1\) and \(\alpha_2,\) respectively.

Increasing the size of a sketch is glorified concatenation. The only subtlety is how to perform the update in a way that preserves isometric scaling (which can be useful in contexts like norm estimation).

Our main purpose in explaining these updates is to highlight the effects of setting a distribution’s \(\texttt{major_axis}\) to Long. If you’re working with DenseDists and DenseSkOps (where there’s no statistical difference between short-axis-major or long-axis-major) then this choice of major axis (which is the default) can provide an additional measure of reproducibility in experiments that require tuning sketch size.

Sketching from the left

Here the \(\mathcal{D}_i\) are distributions over wide matrices, and we set \({B}_i = {S}_i {A}.\) To increase sketch size is to combine these individual sketches via concatenation:

\[\begin{split}{B} = \begin{bmatrix} {B}_1 \\ {B}_2 \end{bmatrix} = \begin{bmatrix} {S}_1 {A} \\ {S}_2 {A} \end{bmatrix}.\end{split}\]

It only makes sense to do this if \({B}\) ends up having fewer rows than \({A}.\) Put another way, this type of update only makes sense if the block operator defined by \({S} = [{S}_1;~{S}_2]\) is also wide.

It is important to be aware of the basic statistical properties of this block operator, so we’ll give its distribution the name \(\mathcal{D}.\) The isometry scale of \(\mathcal{D}\) is \(\alpha = (\alpha_1^{-2} + \alpha_2^{-2})^{-1/2}.\) If \(B_1\) was computed with isometric scaling (that is, if \({B}_1 = \alpha_1 {S}_1 {A}\)), then isometrically-scaled updated sketch would be \(B = \alpha [ {B}_1/\alpha_1;~ {B}_2].\)

RandBLAS can explicitly represent \(\mathcal{D}\) under certain conditions, which we express with a code snippet.

// SkDist is either DenseDist or SparseDist.
// (d1, d2, and m) are positive integers where d1 + d2 < m.
// arg3 is any variable allowed in the third argument of the SkDist constructor.
SkDist D1(     d1, m, arg3, Axis::Long);
SkDist D2(     d2, m, arg3, Axis::Long);
SkDist  D(d1 + d2, m, arg3, Axis::Long);

Furthermore, if \({S}_1.\texttt{next_state}\) is the seed state for \({S}_2\), then the resulting block operator \({S} = [{S}_1;~{S}_2]\) equals the operator obtained by sampling from \(\mathcal{D}\) with \({S}_1.\texttt{seed_state}.\)

This presents another option for how sketch updates might be performed. Rather than working with two sketching operators explicitly, one can work with a single operator \({S}\) sampled from the larger distribution \(\mathcal{D},\) and compute \({B}_1\) and \({B}_2\) by working with appropriate submatrices of \({S}.\)

Sketching from the right

Here the \(\mathcal{D}_i\) are distributions over tall matrices, and \({B}_i = A {S}_i.\) The combined sketch is

\[{B} = \begin{bmatrix} {B}_1 & {B}_2 \end{bmatrix} = \begin{bmatrix} {A} {S}_1 & {A} {S}_2 \end{bmatrix},\]

and it can be obtained by right-multiplying \({A}\) with the block operator \({S} = [{S}_1,~{S}_2].\)

The isometry scale of \({S}\)’s distribution is the same as before: \(\alpha = (\alpha_1^{-2} + \alpha_2^{-2})^{-1/2},\) and RandBLAS can explicitly represent this distribution under the following conditions (\(\texttt{SkDist}\) and \(\texttt{x}\) are as before).

// (d1, d2, n) are positive integers where d1 + d2 < n.
SkDist D1(n,      d1, arg3, Axis::Long);
SkDist D2(n,      d2, arg3, Axis::Long);
SkDist  D(n, d1 + d2, arg3, Axis::Long);

If \({S}_1\) is sampled from \(\mathcal{D}_1\) with seed state \(r\) and \({S}_2\) is sampled from \(\mathcal{D}_2\) with seed state equal to \({S}_1.\texttt{next_state},\) then the block operator \({S}\) is the same as the matrix sampled from \(\mathcal{D}\) with seed state \(r.\) As with sketching from the left, this shows there are situations where it can suffice to define a single operator and sketch with appropriate submatrices of that operator.

Rank-\(k\) updates

A rank-k update is a multiply-accumulate operation involving matrices. It involves a pair of matrices \({X}\) and \({Y}\) that have k columns and k rows, respectively. It also involves a real scalar \(\alpha\) and a matrix \({B}\) of the same shape as \({X} {Y}.\) The operation itself is

\[{B} \leftarrow {B} + \alpha {X} {Y}.\]

Here we describe some rank-k updates that arise in sketching algorithms.

This framework can be used to describe incorporating new data into a sketch, or removing the contributions of old data from a sketch. We’ve focused our documentation efforts on the cases that add data. More specifically, we focus on when we’re performing a rank-k update to add new data into an existing sketch, but k was not known when the original sketch was formed. This case has more complications than if k was known in advance, but it can still be handled with RandBLAS when using distributions with if \(\texttt{major_axis} = \texttt{Short}.\)

Note

Future updates (pun intended) to these web docs will explain how the major-axis requirement can be dropped if an upper bound on k is known in advance. That really just amounts to explaining in detail how you operate with submatrices in RandBLAS. Incidentally, operating with submatrices is really all you need to perform rank-k updates that “remove” data.

Adding data: left-sketching

Problem statement

We start with an \(m \times n\) data matrix \({A}_1.\) When presented with this matrix, we sample a wide operator \({S}_1\) from a distribution \(\mathcal{D}_1\) (defined as follows)

// Assumptions
//  * SkDist is DenseDist or SparseDist
//  * arg3 is any variable that makes sense for the SkDist constructor.
//  * This example requires d < m.
SkDist D1(d, m, arg3, Axis::Short);

and we compute a sketch \({B} = {S}_1 {A}_1.\)

Sometime later, a \(k \times n\) matrix \({A}_2\) arrives. We want to independently sample a \(d \times k\) operator \({S}_2\) from some \(\mathcal{D}_2\) and perform a rank-k update \({B} \leftarrow {B} + {S}_2 {A}_2.\) In essence, we want to redefine

\[\begin{split}{B} = \begin{bmatrix} {S}_1 & {S}_2 \end{bmatrix} \begin{bmatrix} {A}_1 \\ {A}_2 \end{bmatrix}\end{split}\]

without having to revisit \({A}_1.\)

Conceptual solution

Since \(\texttt{major_axis}\) is Short and \(d < m,\) the columns of \(d \times m\) matrices sampled from \(\mathcal{D}_1\) will be sampled independently from a shared distribution on \(\mathbb{R}^d.\) This suggests we could define \(\mathcal{D}_2\) as a distribution over \(d \times k\) matrices whose columns follow the same distribution used in \(\mathcal{D}_1.\)

Implementation

If \(d < k,\) then short-axis vectors of a \(d \times k\) matrix still refer to columns. This makes it possible to express \(\mathcal{D}_2\) explicitly:

SkDist D2(d, k, arg3, Axis::Short);

If \(d \geq k,\) then we have to define a distribution \(\mathcal{D}\) over \(d \times (m + k)\) matrices

SkDist D(d, m + k, arg3, Axis::Short);

and think of \(\mathcal{D}_2\) as the distribution obtained by selecting the trailing \(k\) columns of a sample from \(\mathcal{D}.\)

This second approach may look wasteful, but that’s not really the case. If a DenseSkOp is used in one of RandBLAS’ functions for sketching with a specified submatrix, only the submatrix that’s necessary for the operation will be generated. The following code snippet provides more insight on the situation.

SkDist D1( d,     m, arg3, Axis::Short );
SkDist D(  d, m + k, arg3, Axis::Short );
// Since d < m and we're short-axis major, the columns of matrices sampled from
// D1 or D1 will be sampled i.i.d. from some distribution on R^d.
using SkOp = typename SkDist::distribution_t;
SkOp S1( D1,  seed_state ); // seed_state is some RNGState.
SkOp  S(  D,  seed_state );
// With these definitions, S1 is *always* equal to the first m columns of S.
// We recover S2 by working implicitly with the trailing k columns of S.

Adding data: right-sketching

Problem statement

We start with an \(m \times n\) data matrix \({A}_1.\) When presented with this matrix, we sample a tall operator \({S}_1\) from a distribution \(\mathcal{D}_1\) of the form

// Assumptions
//  * SkDist is DenseDist or SparseDist
//  * arg3 is any variable that makes sense for the SkDist constructor.
//  * This example requires n > d.
SkDist D1(n, d, arg3, Axis::Short);

and we compute a sketch \({B} = {A}_1 {S}_1.\)

Sometime later, an \(m \times k\) matrix \({A}_2\) arrives. We want to independently sample an \(k \times d\) operator \({S}_2\) from some \(\mathcal{D}_2\) and perform a rank-k update \({B} \leftarrow {B} + {A}_2 {S}_2.\) Essentially, we want to redefine

\[\begin{split}{B} = \begin{bmatrix} {A}_1 & {A}_2 \end{bmatrix}\begin{bmatrix} {S}_1 \\ {S}_2 \end{bmatrix}\end{split}\]

without having to revisit \({A}_1.\)

Conceptual solution

The idea is the same as with left-sketching. The difference is that since we’re sketching from the right with a tall \(n \times d\) operator, the short-axis vectors are rows instead of columns. This means the rows of \({S}_1\) are sampled independently from some distribution on \(\mathbb{R}^d,\) and we can define \({S}_2\) by sampling its rows from that same distribution.

Implementation

If \(k > d,\) then we can represent \(\mathcal{D}_2\) explicitly, constructing it as follows.

SkDist D2(k, d, arg3, Axis::Short);

If \(k \leq d,\) then we have to define a distribution \(\mathcal{D}\) over \((n + k) \times d\) matrices

SkDist D(n + k, d, arg3, Axis::Short);

and think of \(\mathcal{D}_2\) as the distribution obtained by selecting the bottom \(k\) rows of a sample from \(\mathcal{D}.\)

As with the left-sketching case, we provide a code snippet to summarize the situation.

SkDist D1(     n, d, arg3, Axis::Short );
SkDist D(  n + k, d, arg3, Axis::Short );
// Since n > d and we're short-axis major, the rows of matrices sampled from
// D1 or D1 will be sampled i.i.d. from some distribution on R^d.
using SkOp = typename SkDist::distribution_t;
SkOp S1( D1,  seed_state ); // seed_state is some RNGState.
SkOp  S(  D,  seed_state );
// With these definitions, S1 is *always* equal to the first m rows of S.
// We recover S2 by working implicitly with the last k rows of S.