Sampling a sketching operator

RandBLAS relies on counter-based random number generators (CBRNGs) from Random123. A CBRNG returns a random number upon being called with two integer parameters: the counter and the key. The time required for the CBRNG to return does not depend on either of these parameters. A serial application can set the key at the outset of the program and never change it, while parallel applications should use different keys across different threads. Sequential calls to the CBRNG with a fixed key should use different values for the counter.

RandBLAS doesn’t expose CBRNGs directly. Instead, it exposes an abstraction of a CBRNG’s state as defined in the RNGState class. RNGState objects are needed to construct sketching operators.

Constructing RNGStates

There are two ways to construct an RNGState from scratch:

RandBLAS::RNGState s1();     // key and counter are initialized to 0.
RandBLAS::RNGState s2(42);   // key set to 42, counter set to 0.

Note that in both cases the counter is initialized to zero. This is important: you should never set the counter yourself! If you want statistically independent runs of the same program, then you can start with different values for the key.

You can also construct an RNGState with a copy operation:

RandBLAS::RNGState s3(s1);   // s3 is a copy of s1.

Constructing your first sketching operator

RandBLAS provides several constructors for the DenseSkOp and SparseSkOp classes. However, the recommended constructors for these classes just accept two parameters: a representation of a distribution (i.e., a DenseDist or a SparseDist) and an RNGState.

For example, the following code produces a \(10000 \times 50\) dense sketching operator whose entries are iid samples from the uniform distribution over \([-1, 1]\).

RandBLAS::RNGState my_state();
RandBLAS::DenseDist my_dist(10000, 50);
RandBLAS::DenseSkOp<double> S(my_dist, my_state);
// my_state is stored as a constant value S.seed_state.
// S.seed_state will be accessed by RandBLAS' random sampling
// functions behind the scenes, only when needed.

We note that the numerical precision of the sketching operator must be specified with a template parameter; the entries of the sketching operator are defined by sampling in single precision and then casting the sample to double if needed.

Formal API docs for the recommended constructors can be found here and here.

Constructing your \(N^{\text{th}}\) sketching operator, for \(N > 1\)

Suppose you have an application that requires two independent dense sketching operators, \(\texttt{S1}\) and \(\texttt{S2}\), each of size \(10000 \times 50\). How should you get your hands on these objects?

Warning

If you try to construct those sketching operators as follows …

RandBLAS::RNGState my_state();
RandBLAS::DenseDist my_dist(10000, 50);
RandBLAS::DenseSkOp<double> S1(my_dist, my_state);
RandBLAS::DenseSkOp<double> S2(my_dist, my_state);

then your results would be invalid! Far from being independent, \(\texttt{S1}\) and \(\texttt{S2}\) would be equal from a mathematical perspective.

One correct approach is to first fill the entries of \(\texttt{S1}\), and then call the constructor for \(\texttt{S2}\) using \(\texttt{S1.next_state}\) as its RNGState argument:

RandBLAS::RNGState my_state();
RandBLAS::DenseDist my_dist(10000, 50);
RandBLAS::DenseSkOp<double> S1(my_dist, my_state);
// ^ Defines S1 from a mathematical perspective, but performs no work.
//   RandBLAS hasn't made any calls to a CBRNG.
RandBLAS::fill_dense(S1);
// ^ RandBLAS uses a CBRNG to sample each entry of S1,
//   then sets S1.next_state.
RandBLAS::DenseSkOp<double> S2(my_dist, S1.next_state);

The shortcoming of this approach is that it requires explicitly filling \(\texttt{S1}\) just to define \(\texttt{S2}\) in a mathematical sense. There are two alternative approaches you can use if this shortcoming is significant in your application.

Alternative 1. Define one larger sketching operator (here, a \(10000\times 100\) sketching operator) and operate with appropriate submatrices of that larger operator when needed.

Alternative 2. Declare two RNGState objects from the beginning using different keys, as in the following code:

RandBLAS::RNGState my_state1(19);
// ^ An RNGState with zero'd counter and key initialized to 19.
RandBLAS::RNGState my_state2(93);
// ^ An RNGState with zero'd counter and key initialized to 93.
RandBLAS::DenseDist my_dist(10000, 50);
RandBLAS::DenseSkOp<double> S1(my_dist, my_state1);
RandBLAS::DenseSkOp<double> S2(my_dist, my_state2);
// ^ S1 and S2 are defined only from a mathematical perspective.
//   No real work is performed here.