Distributions and sketching operators
The state of a random number generator
-
template<typename RNG = r123::Philox4x32>
struct RNGState A representation of the state of a counter-based random number generator (CBRNG) defined in Random123. The representation consists of two arrays: the counter and the key. The arrays’ types are statically sized, small (typically of length 2 or 4), and can be distinct from one another.
- Template Parameters:
RNG – A CBRNG type in defined in Random123. We’ve found that Philox-based CBRNGs work best for our purposes. Strictly speaking, we allow all Random123 CBRNGs besides those based on AES.
Public Types
Public Functions
-
inline RNGState()
Initialize the counter and key arrays to all zeros.
-
inline RNGState(key_uint_type k)
Initialize the counter array to all zeros. Initialize the key array to have first element equal to k and all other elements equal to zero.
DenseDist and DenseSkOp
-
enum class RandBLAS::DenseDistName : char
We call a sketching operator “dense” if (1) it is naturally represented with a buffer and (2) the natural way to apply that operator to a matrix is to use the operator’s buffer in GEMM.
We support two distributions for dense sketching operators: those whose entries are iid Gaussians or iid uniform over a symmetric interval. For implementation reasons, we also expose an option to indicate that an operator’s distribution is unknown but it is still represented by a buffer that can be used in GEMM.
Values:
-
enumerator Gaussian
Indicates the Gaussian distribution with mean 0 and standard deviation 1.
-
enumerator Uniform
Indicates the uniform distribution over [-1, 1].
-
enumerator BlackBox
Indicates that the sketching operator’s entries will only be specified by a user-provided buffer.
-
enumerator Gaussian
-
struct DenseDist
A distribution over dense sketching operators.
Public Functions
-
inline DenseDist(int64_t n_rows, int64_t n_cols, DenseDistName dn = DenseDistName::Uniform)
A distribution over matrices of shape (n_rows, n_cols) with entries drawn iid from either the standard normal distribution or the uniform distribution over [-1, 1].
Public Members
-
const int64_t n_rows
Matrices drawn from this distribution have this many rows.
-
const int64_t n_cols
Matrices drawn from this distribution have this many columns.
-
const DenseDistName family
The distribution used for the entries of the sketching operator.
-
const MajorAxis major_axis
This member indirectly sets the storage order of buffers of sketching operators that are sampled from this distribution.
We note that the storage order of a DenseSkOp’s underlying buffer does not affect whether the operator can be applied to row-major or column-major data. Mismatched data layouts are resolved automatically and with zero copies inside RandBLAS::sketch_general. Therefore most users need not spend any brain power thinking about how this value should be set.
-
inline DenseDist(int64_t n_rows, int64_t n_cols, DenseDistName dn = DenseDistName::Uniform)
-
template<typename T, typename RNG = r123::Philox4x32>
struct DenseSkOp A sample from a distribution over dense sketching operators.
Public Functions
-
inline DenseSkOp(DenseDist dist, RNGState<RNG> const &state)
The preferred constructor for DenseSkOp objects. There are other constructors, but they don’t appear in the web documentation.
- Parameters:
dist – [in] A DenseDist object.
Defines the number of rows and columns in this sketching operator.
Defines the (scalar-valued) distribution of each entry in this sketching operator.
state – [in] An RNGState object.
The RNG will use this as the starting point to generate all random numbers needed for this sketching operator.
Public Members
-
const DenseDist dist
The distribution from which this sketching operator is sampled. This member specifies the number of rows and columns of the sketching operator.
-
inline DenseSkOp(DenseDist dist, RNGState<RNG> const &state)
-
template<typename T, typename RNG>
RNGState<RNG> RandBLAS::fill_dense(DenseSkOp<T, RNG> &S) Performs the work in sampling S from its underlying distribution. This entails allocating a buffer of size S.dist.n_rows * S.dist.n_cols, attaching that buffer to S as S.buff, and finally sampling iid random variables to populate S.buff. A flag is set on S so its destructor will deallocate S.buff.
By default, RandBLAS allocates and populates buffers for dense sketching operators just before they are needed in some operation, and then it deletes these buffers once the operation is complete. Calling this function bypasses that policy.
SparseDist and SparseSkOp
-
struct SparseDist
A distribution over sparse matrices.
Public Members
-
const int64_t n_rows
Matrices drawn from this distribution have this many rows.
-
const int64_t n_cols
Matrices drawn from this distribution have this many columns.
-
const int64_t vec_nnz
If this distribution is short-axis major, then matrices sampled from it will have exactly \({\texttt{vec_nnz}}\) nonzeros per short-axis vector (i.e., per column of a wide matrix or row of a tall matrix). One would be paranoid to set this higher than, say, eight, even when sketching very high-dimensional data.
If this distribution is long-axis major, then matrices sampled from it will have at most \({\texttt{vec_nnz}}\) nonzeros per long-axis vector (i.e., per row of a wide matrix or per column of a tall matrix).
-
const MajorAxis major_axis = MajorAxis::Short
Constrains the sparsity pattern of matrices drawn from this distribution.
Having major_axis==Short results in sketches are more likely to contain useful geometric information, without making assumptions about the data being sketched.
-
const int64_t n_rows
-
template<typename T, typename RNG = r123::Philox4x32, SignedInteger sint_t = int64_t>
struct SparseSkOp A sample from a prescribed distribution over sparse matrices.
Public Functions
-
SparseSkOp(SparseDist dist, const RNGState<RNG> &state)
The preferred constructor for SparseSkOp objects. There are other constructors, but they don’t appear in the web documentation.
- Parameters:
dist – [in] A SparseDist object.
Defines the number of rows and columns in this sketching operator.
Indirectly controls sparsity pattern.
Directly controls sparsity level.
state – [in] An RNGState object.
The RNG will use this as the starting point to generate all random numbers needed for this sketching operator.
Public Members
-
const SparseDist dist
The distribution from which this sketching operator is sampled. This member specifies the number of rows and columns of the sketching operator.
-
const RNGState<RNG> seed_state
The state that should be passed to the RNG when the full sketching operator needs to be sampled from scratch.
-
RNGState<RNG> next_state
The state that should be used by the next call to an RNG after the full sketching operator has been sampled.
-
const bool own_memory = true
We need workspace to store a representation of the sampled sketching operator. This member indicates who is responsible for allocating and deallocating this workspace. If own_memory is true, then RandBLAS is responsible.
-
bool known_filled = false
A flag (indicating a sufficient condition) that the data underlying the sparse matrix has already been sampled.
-
SparseSkOp(SparseDist dist, const RNGState<RNG> &state)
-
template<typename T, typename RNG, SignedInteger sint_t>
RNGState<RNG> RandBLAS::fill_sparse(SparseSkOp<T, RNG, sint_t> &S) Performs the work in sampling S from its underlying distribution. This entails populating S.rows, S.cols, and S.vals with COO-format sparse matrix data.
RandBLAS will automatically call this function if and when it is needed.
- Parameters:
S – [in] SparseSkOp object.
- Returns:
An RNGState object. This is the state that should be used the next time the program needs to generate random numbers for a randomized algorithm.
Advanced material
-
template<typename T, typename RNG>
std::pair<blas::Layout, RandBLAS::RNGState<RNG>> RandBLAS::fill_dense(const DenseDist &D, int64_t n_rows, int64_t n_cols, int64_t ro_s, int64_t co_s, T *buff, const RNGState<RNG> &seed) -
Fill \({\mathtt{buff}}\) so that \({\operatorname{mat}(\mathtt{buff})}\) is a submatrix of an implicit random sample from \({\mathcal{D}}\) .
If we denote the implicit sample from \({\mathcal{D}}\) by \({S}\) , then we have
\[\operatorname{mat}(\mathtt{buff}) = S[\mathtt{i\_off}:(\mathtt{i\_off} + \mathtt{n\_rows}),\, \mathtt{j\_off}:(\mathtt{j\_off} + \mathtt{n\_cols})]\]on exit.- Parameters:
D – [in] A DenseDist object.
A distribution over random matrices of shape (D.n_rows, D.n_cols).
n_rows – [in] A positive integer.
The number of rows in \({\operatorname{mat}(\mathtt{buff})}\) .
n_cols – [in] A positive integer.
The number of columns in \({\operatorname{mat}(\mathtt{buff})}\) .
ro_s – [in] A nonnegative integer.
The row offset for \({\operatorname{mat}(\mathtt{buff})}\) as a submatrix of \({S}\) .
We require that \({\mathtt{i\_off} + \mathtt{n\_rows}}\) is at most D.n_rows.
co_s – [in] A nonnegative integer.
The column offset for \({\operatorname{mat}(\mathtt{buff})}\) as a submatrix of \({S}\) .
We require that \({\mathtt{j\_off} + \mathtt{n\_cols}}\) is at most D.n_cols.
buff – [in] Buffer of type T.
Length must be at least \({\mathtt{n\_rows} \cdot \mathtt{n\_cols}}\) .
The leading dimension of \({\operatorname{mat}(\mathtt{buff})}\) when reading from \({\mathtt{buff}}\) is either \({\mathtt{n\_rows}}\) or \({\mathtt{n\_cols}}\) , depending on the return value of this function that indicates row-major or column-major layout.
seed – [in] A CBRNG state
Used to define \({S}\) as a sample from \({\mathcal{D}}\) .
- Returns:
A std::pair consisting of “layout” and “next_state”.
\({\mathtt{buff}}\) must be read in “layout” order to recover \({\operatorname{mat}(\mathtt{buff})}\) . This layout is determined from \({\mathcal{D}}\) and cannot be controlled directly.
If this function returns a layout that is undesirable then it is the caller’s responsibility to perform a transpose as needed.