.. |seedstate| mathmacro:: \texttt{seed_state}
  .. |nextstate| mathmacro:: \texttt{next_state}
  .. |majoraxis| mathmacro:: \texttt{major_axis}
  .. |ttt| mathmacro:: \texttt
  .. |D| mathmacro:: \mathcal{D}
  .. |mS| mathmacro:: {S}
  .. |mA| mathmacro:: {A}
  .. |mB| mathmacro:: {B}
  .. |mX| mathmacro:: {X}
  .. |mY| mathmacro:: {Y}
  .. |R| mathmacro:: \mathbb{R}
  .. |rank| mathmacro:: \operatorname{rank}

.. _sketch_updates:


*********************************************************************************************
Updating and downdating sketches
*********************************************************************************************

This page presents four ways of updating a sketch.
We use MATLAB notation for in-line concatenation of matrices.


Increasing sketch size 
-----------------------

The scenarios here use sketching operators :math:`S_1` and :math:`S_2` 
that are sampled independently from distributions :math:`\D_1` and :math:`\D_2.`
We denote the isometry scales of :math:`\D_1` and :math:`\D_2` by :math:`\alpha_1` and :math:`\alpha_2,` respectively.

Increasing the size of a sketch is glorified concatenation.
The only subtlety is how to perform the update in a way that preserves isometric scaling
(which can be useful in contexts like norm estimation).

Our main purpose in explaining these updates is to highlight the effects of 
setting a distribution's :math:`\majoraxis` to Long.
If you're working with DenseDists and DenseSkOps (where there's no statistical difference
between short-axis-major or long-axis-major) then this choice of major axis (which is
the default) can provide an additional measure of reproducibility in experiments that require
tuning sketch size.


Sketching from the left
~~~~~~~~~~~~~~~~~~~~~~~
Here the :math:`\D_i` are distributions over wide matrices, and we set :math:`\mB_i = \mS_i \mA.`
To increase sketch size is to combine these individual sketches via concatenation:

.. math::
  \mB = \begin{bmatrix} \mB_1 \\ \mB_2 \end{bmatrix} = \begin{bmatrix} \mS_1 \mA \\ \mS_2 \mA \end{bmatrix}.

It only makes sense to do this if :math:`\mB` ends up having fewer rows than :math:`\mA.`
Put another way, this type of update only makes sense if the block operator
defined by :math:`\mS = [\mS_1;~\mS_2]` is also wide.

It is important to be aware of the basic statistical properties of this block operator,
so we'll give its distribution the name :math:`\D.`
The isometry scale of :math:`\D` is :math:`\alpha = (\alpha_1^{-2} + \alpha_2^{-2})^{-1/2}.`
If :math:`B_1` was computed with isometric scaling (that is, if :math:`\mB_1 = \alpha_1 \mS_1 \mA`),
then isometrically-scaled
updated sketch would be :math:`B = \alpha [ \mB_1/\alpha_1;~ \mB_2].`

RandBLAS can explicitly represent :math:`\D` under certain conditions, which we express with a code snippet.

.. code:: c++

  // SkDist is either DenseDist or SparseDist.
  // (d1, d2, and m) are positive integers where d1 + d2 < m.
  // arg3 is any variable allowed in the third argument of the SkDist constructor.
  SkDist D1(     d1, m, arg3, Axis::Long);
  SkDist D2(     d2, m, arg3, Axis::Long);
  SkDist  D(d1 + d2, m, arg3, Axis::Long);


Furthermore, if :math:`\mS_1.\nextstate` is the seed state for :math:`\mS_2`, then
the resulting block operator :math:`\mS = [\mS_1;~\mS_2]` equals the operator obtained by sampling from
:math:`\D` with :math:`\mS_1.\seedstate.`

This presents another option for how sketch updates might be performed.
Rather than working with two sketching operators explicitly,
one can work with a single operator :math:`\mS` sampled from the larger
distribution :math:`\D,` and compute :math:`\mB_1` and :math:`\mB_2` by working with
appropriate submatrices of :math:`\mS.`

Sketching from the right
~~~~~~~~~~~~~~~~~~~~~~~~
Here the :math:`\D_i` are distributions over tall matrices, and :math:`\mB_i = A \mS_i.`
The combined sketch is

.. math::

    \mB = \begin{bmatrix} \mB_1 & \mB_2 \end{bmatrix} = \begin{bmatrix} \mA \mS_1 & \mA \mS_2 \end{bmatrix},

and it can be obtained by right-multiplying :math:`\mA` with the block operator 
:math:`\mS = [\mS_1,~\mS_2].`

The isometry scale of :math:`\mS`'s distribution is the same as before: :math:`\alpha = (\alpha_1^{-2} + \alpha_2^{-2})^{-1/2},`
and RandBLAS can explicitly represent this distribution under the following conditions
(:math:`\ttt{SkDist}` and :math:`\ttt{x}` are as before).

.. code:: c++

  // (d1, d2, n) are positive integers where d1 + d2 < n.
  SkDist D1(n,      d1, arg3, Axis::Long);
  SkDist D2(n,      d2, arg3, Axis::Long);
  SkDist  D(n, d1 + d2, arg3, Axis::Long);

If :math:`\mS_1` is sampled from :math:`\D_1` with seed state :math:`r` and 
:math:`\mS_2` is sampled from :math:`\D_2` with seed state equal to :math:`\mS_1.\nextstate,`
then the block operator :math:`\mS` is the same as the matrix sampled from 
:math:`\D` with seed state :math:`r.`
As with sketching from the left, this shows there are situations where
it can suffice to define a single operator and sketch with appropriate submatrices
of that operator.


Rank-:math:`k` updates
----------------------

A *rank-k update* is a multiply-accumulate operation involving matrices. 
It involves a pair of matrices :math:`\mX` and :math:`\mY` that have k columns and k rows, respectively.
It also involves a real scalar :math:`\alpha` and a matrix :math:`\mB` of the same shape as :math:`\mX \mY.`
The operation itself is 

.. math::
    \mB \leftarrow \mB  + \alpha \mX \mY.

Here we describe some rank-k updates that arise in sketching algorithms.

This framework can be used to describe incorporating new data into a sketch,
or removing the contributions of old data from a sketch.
We've focused our documentation efforts on the cases that add data.
More specifically, we focus on when we're performing a rank-k update to add new
data into an existing sketch, but k was not known when the original sketch was
formed. 
This case has more complications than if k was known in advance, but it can still be handled
with RandBLAS when using distributions with *if* :math:`\majoraxis = \ttt{Short}.`

.. note::
  Future *updates* (pun intended) to these web docs will explain how the major-axis requirement
  can be dropped if an upper bound on k is known in advance. That really just amounts to explaining
  in detail how you operate with submatrices in RandBLAS. Incidentally, operating with submatrices
  is really all you need to perform rank-k updates that "remove" data.


Adding data: left-sketching
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Problem statement
  We start with an :math:`m \times n` data matrix :math:`\mA_1.`
  When presented with this matrix, we sample a wide operator :math:`\mS_1` from a distribution :math:`\D_1`
  (defined as follows)

  .. code:: c++
  
      // Assumptions
      //  * SkDist is DenseDist or SparseDist
      //  * arg3 is any variable that makes sense for the SkDist constructor.
      //  * This example requires d < m.
      SkDist D1(d, m, arg3, Axis::Short);

  and we compute a sketch :math:`\mB = \mS_1 \mA_1.`

  Sometime later, a :math:`k \times n` matrix :math:`\mA_2` arrives.
  We want to independently sample a :math:`d \times k` operator :math:`\mS_2` from *some* :math:`\D_2` and perform a rank-k update :math:`\mB \leftarrow \mB + \mS_2 \mA_2.`
  In essence, we want to redefine

  .. math::

      \mB = \begin{bmatrix} \mS_1 & \mS_2 \end{bmatrix} \begin{bmatrix} \mA_1 \\ \mA_2 \end{bmatrix}

  without having to revisit :math:`\mA_1.`

Conceptual solution
  Since :math:`\majoraxis` is Short and :math:`d < m,` the columns of :math:`d \times m` matrices sampled from :math:`\D_1`
  will be sampled independently from a shared distribution on :math:`\mathbb{R}^d.`
  This suggests we could define :math:`\D_2` as a distribution over :math:`d \times k` matrices whose columns follow the same
  distribution used in :math:`\D_1.`

Implementation
  If :math:`d < k,` then short-axis vectors of a :math:`d \times k` matrix still refer to columns.
  This makes it possible to express :math:`\D_2` explicitly:
  
  .. code:: c++
    
    SkDist D2(d, k, arg3, Axis::Short);
  
  If :math:`d \geq k,` then we have to define a distribution :math:`\D` over :math:`d \times (m + k)` matrices  

  .. code:: c++

      SkDist D(d, m + k, arg3, Axis::Short);

  and think of :math:`\D_2` as the distribution obtained by selecting the trailing :math:`k` columns of a sample from :math:`\D.`

  This second approach may look wasteful, but that's not really the case.
  If a DenseSkOp is used in one of RandBLAS' functions for sketching with a specified submatrix,
  only the submatrix that's necessary for the operation will be generated.
  The following code snippet provides more insight on the situation.

  .. code:: c++

      SkDist D1( d,     m, arg3, Axis::Short );
      SkDist D(  d, m + k, arg3, Axis::Short ); 
      // Since d < m and we're short-axis major, the columns of matrices sampled from
      // D1 or D1 will be sampled i.i.d. from some distribution on R^d.

      auto S1 = D1.sample( seed_state ); // seed_state is some RNGState.
      auto S  =  D.sample( seed_state );
      // With these definitions, S1 is *always* equal to the first m columns of S.
      // We recover S2 by working implicitly with the trailing k columns of S.


Adding data: right-sketching
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Problem statement
  We start with an :math:`m \times n` data matrix :math:`\mA_1.`
  When presented with this matrix, we sample a tall operator :math:`\mS_1` from a distribution :math:`\D_1` of the form

  .. code:: c++
  
      // Assumptions
      //  * SkDist is DenseDist or SparseDist
      //  * arg3 is any variable that makes sense for the SkDist constructor.
      //  * This example requires n > d.
      SkDist D1(n, d, arg3, Axis::Short);

  and we compute a sketch :math:`\mB = \mA_1 \mS_1.`

  Sometime later, an :math:`m \times k` matrix :math:`\mA_2` arrives.
  We want to independently sample an :math:`k \times d` operator :math:`\mS_2` from *some* :math:`\D_2`
  and perform a rank-k update :math:`\mB \leftarrow \mB + \mA_2 \mS_2.`
  Essentially, we want to redefine

  .. math::

      \mB = \begin{bmatrix} \mA_1 & \mA_2 \end{bmatrix}\begin{bmatrix} \mS_1 \\ \mS_2 \end{bmatrix} 

  without having to revisit :math:`\mA_1.`

Conceptual solution
  The idea is the same as with left-sketching. The difference is that since we're sketching from the
  right with a tall :math:`n \times d` operator, the short-axis vectors are rows instead of columns.
  This means the rows of :math:`\mS_1` are sampled independently from some distribution on :math:`\R^d,`
  and we can define :math:`\mS_2` by sampling its rows from that same distribution.

Implementation
  If :math:`k > d,` then we can represent :math:`\D_2` explicitly, constructing it as follows.
  
  .. code:: c++
    
    SkDist D2(k, d, arg3, Axis::Short);
  
  If :math:`k \leq d,` then we have to define a distribution :math:`\D` over :math:`(n + k) \times d` matrices  

  .. code:: c++

      SkDist D(n + k, d, arg3, Axis::Short);

  and think of :math:`\D_2` as the distribution obtained by selecting the bottom :math:`k` rows of a sample from :math:`\D.`

  As with the left-sketching case, we provide a code snippet to summarize the situation.
  
  .. code:: c++

      SkDist D1(     n, d, arg3, Axis::Short );
      SkDist D(  n + k, d, arg3, Axis::Short ); 
      // Since n > d and we're short-axis major, the rows of matrices sampled from
      // D1 or D1 will be sampled i.i.d. from some distribution on R^d.

      auto S1 = D1.sample( seed_state ); // seed_state is some RNGState.
      auto S  =  D.sample( seed_state );
      // With these definitions, S1 is *always* equal to the first m rows of S.
      // We recover S2 by working implicitly with the last k rows of S.