xbtorch.decomposition.nmf

Non-negative Matrix Factorization (NMF) decomposition module.

Provides the NMF class, which implements gradient decomposition using a low-rank non-negative matrix factorization. Supports both standard batch and streaming modes.

This method separates positive and negative components of the gradient, reducing memory usage and communication costs while maintaining reasonable approximation accuracy.

Classes

NMF([rank, streaming, max_iters])

Non-negative Matrix Factorization (NMF) gradient decomposition.

class xbtorch.decomposition.nmf.NMF(rank=1, streaming=False, max_iters=2000)

Bases: GenericDecomposition

Non-negative Matrix Factorization (NMF) gradient decomposition.

This method approximates the full gradient matrix as a low-rank factorization into non-negative components, separating positive and negative contributions. The decomposition reduces the memory and communication cost of training while potentially reducing the number of device writes in memristive crossbar arrays.

XBTorch supports two operation modes:

  • Standard mode: Each batch is decomposed independently.

  • Streaming mode: Estimates from previous batches are reused (naive streaming), as described in [Hoskins et al., 2019]. This can improve stability across batches but is approximate.

Parameters:
  • rank (int, optional (default=1)) – The target rank for the low-rank factorization. Smaller ranks yield greater compression but less fidelity.

  • streaming (bool, optional (default=False)) – If True, reuses decomposition factors across batches for incremental (“streaming”) updates. Requires group_param_idx in decompose().

  • max_iters (int, optional (default=2000)) – Maximum number of iterations for the sklearn NMF solver.

info

Stores intermediate decomposition estimates (Wp, Hp, Wn, Hn) for reuse in streaming mode. Entries are keyed by parameter group index (group_param_idx).

Type:

dict

Notes

  • Positive and negative parts of the gradient are decomposed separately to respect the non-negativity constraint of NMF.

  • Streaming mode here is “naive”: it re-initializes new NMF runs with previous factor estimates but still processes the full gradient each time.
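The split-and-factor scheme described in the notes can be sketched in NumPy. This is an illustration only: it uses plain Lee–Seung multiplicative updates as a stand-in for the sklearn NMF solver mentioned above, and the helper names are hypothetical, not xbtorch's actual implementation.

```python
import numpy as np

def nmf_multiplicative(X, rank, iters=500, eps=1e-9, seed=0):
    """Rank-`rank` NMF of a non-negative matrix X via Lee-Seung
    multiplicative updates (a stand-in for the sklearn solver)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + eps
    H = rng.random((rank, X.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def decompose_gradient(G, rank=1):
    """Factor the positive and negative parts of G separately and
    reconstruct the low-rank approximation Wp @ Hp - Wn @ Hn."""
    Gp = np.clip(G, 0.0, None)    # positive part of the gradient
    Gn = np.clip(-G, 0.0, None)   # negative part, sign-flipped to >= 0
    Wp, Hp = nmf_multiplicative(Gp, rank)
    Wn, Hn = nmf_multiplicative(Gn, rank)
    return (Wp @ Hp) - (Wn @ Hn)

# A small rank-1 gradient with mixed signs.
G = np.outer(np.array([1.0, -2.0, 3.0]), np.array([0.5, 1.5]))
approx = decompose_gradient(G, rank=1)
rel_err = np.linalg.norm(approx - G) / np.linalg.norm(G)
print(approx.shape, rel_err)
```

Because the example gradient is exactly rank 1, the relative reconstruction error should be close to zero; for higher-rank gradients the fidelity depends on the chosen rank.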

References

  • Hoskins et al., “Streaming batch eigenupdates for hardware neural networks”, Frontiers in Neuroscience, 2019.

  • Vogels et al., “PowerSGD: Practical low-rank gradient compression for distributed optimization”, NeurIPS 2019.

  • Huang et al., “Low-rank gradient descent for memory-efficient training of deep in-memory arrays”, ACM JETC, 2023.

decompose(input, delta, gradient, group_param_idx=None)

Perform NMF-based gradient decomposition.

Parameters:
  • input (torch.Tensor) – Input activations for the current layer, shape (batch_size, input_dim).

  • delta (torch.Tensor) – Backpropagated errors for the current layer, shape (batch_size, output_dim).

  • gradient (torch.Tensor) – Gradient matrix to be decomposed, shape (output_dim, input_dim).

  • group_param_idx (int, optional) – Identifier for parameter grouping. Required in streaming mode to manage reuse of factors.

Returns:

Low-rank approximation of the gradient with shape (output_dim, input_dim), reconstructed as:

approx_grad = (Wp @ Hp) - (Wn @ Hn)

where Wp, Hp are the factors of the positive part of the gradient, and Wn, Hn are the factors of the negative part (sign-flipped to be non-negative before NMF).
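The reconstruction is plain matrix arithmetic. The factor shapes below, (output_dim, rank) for Wp/Wn and (rank, input_dim) for Hp/Hn, are not stated above and are assumed here for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
output_dim, input_dim, rank = 4, 3, 2

# Hypothetical non-negative factors with the assumed shapes.
Wp, Hp = rng.random((output_dim, rank)), rng.random((rank, input_dim))
Wn, Hn = rng.random((output_dim, rank)), rng.random((rank, input_dim))

approx_grad = (Wp @ Hp) - (Wn @ Hn)
print(approx_grad.shape)  # matches the gradient shape: (output_dim, input_dim)
```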

Return type:

torch.Tensor

Raises:

RuntimeError – If streaming is enabled but group_param_idx is not provided.

Notes

  • Positive and negative parts of the gradient are decomposed separately, with the negative part flipped to satisfy non-negativity.

  • In streaming mode, factor estimates from previous calls are reused as custom initializations for subsequent decompositions.
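The "naive streaming" reuse can be sketched as warm-starting the next solve from the cached factors of the previous batch, keyed by group_param_idx. The cache dict below mimics the `info` attribute described above, but the function names and update rule are illustrative assumptions, not xbtorch's actual code:

```python
import numpy as np

def nmf_update(X, W, H, iters=200, eps=1e-9):
    """Run multiplicative-update steps starting from the given factors."""
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

info = {}  # mimics the `info` attribute: factors cached per parameter group

def streaming_factorize(X, rank, group_param_idx=None, seed=0):
    if group_param_idx is None:
        raise RuntimeError("streaming mode requires group_param_idx")
    if group_param_idx not in info:  # first batch: random initialization
        rng = np.random.default_rng(seed)
        info[group_param_idx] = (rng.random((X.shape[0], rank)) + 1e-9,
                                 rng.random((rank, X.shape[1])) + 1e-9)
    W, H = nmf_update(X, *info[group_param_idx])
    info[group_param_idx] = (W, H)   # cache the factors for the next batch
    return W, H

# Two similar "batches" for the same parameter group: the second solve
# starts from the first solve's factors instead of a fresh random init.
X1 = np.outer([1.0, 2.0], [3.0, 1.0])
X2 = X1 * 1.1
W, H = streaming_factorize(X1, rank=1, group_param_idx=0)
W, H = streaming_factorize(X2, rank=1, group_param_idx=0)
res = np.linalg.norm(W @ H - X2) / np.linalg.norm(X2)
print(res)  # small residual: the warm start tracks the new batch
```

As in the documented decompose() signature, omitting group_param_idx while streaming raises a RuntimeError, since the cache has no key to store or look up factors under.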