Nonnegative Least Squares (NNLS) is a fundamental constrained optimization problem encountered in many applications, such as image deblurring, signal processing, nonnegative matrix factorization, magnetic microscopy, and hyperspectral imaging. Active-set methods are a common class of algorithms for solving NNLS that identify the optimal variable set of the solution. They do so by iteratively solving a series of unconstrained least squares problems, identifying which variables violate the nonnegativity constraints, and swapping variables in and out of consideration until the optimal set of variables is found. Several variations improving upon this method exist in the literature. In this work, we propose an active-set swap heuristic that further improves upon existing active-set methods for NNLS. Our optimizations add multiple variables to the passive set within a threshold of the smallest gradient value and remove variables within a similar threshold of the closest boundary constraint. We leverage these optimizations to obtain a Fast Active-Set Thresholding NNLS (FAST-NNLS) algorithm that significantly outperforms existing state-of-the-art NNLS algorithms on a wide range of problems. Rigorous convergence guarantees are proven for the proposed method. We demonstrate the effectiveness of our proposed method on multiple synthetic datasets and two real-world text analysis applications. In doing so, we present the most comprehensive NNLS solver comparison in the literature to date.
@inproceedings{11402623,author={Cobb, Benjamin and Kannan, Ramakrishnan and Pieper, Konstantin and Sao, Piyush and Soh, Yongseok and Choi, Jee W. and Vuduc, Richard and Park, Haesun},booktitle={2025 IEEE International Conference on Big Data (BigData)},title={Fast Active-Set Thresholding Method for Nonnegative Least Squares},year={2025},pages={},keywords={Nonnegative Least Squares;Active-Set Methods},doi={10.1109/BigData66926.2025.11402623},}
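The classical single-swap scheme that FAST-NNLS's thresholded multi-variable swaps generalize is the textbook Lawson-Hanson active-set method. Below is a minimal illustrative sketch of that baseline, not the paper's algorithm; the name `nnls_active_set` and all defaults are chosen here for illustration.

```python
import numpy as np

def nnls_active_set(A, b, tol=1e-10, max_iter=100):
    """Classical Lawson-Hanson active-set NNLS (illustrative baseline).

    Solves min ||Ax - b||_2 subject to x >= 0 by moving one variable at
    a time from the active (zero) set to the passive (free) set, the
    single-swap strategy that thresholded variants generalize.
    """
    m, n = A.shape
    x = np.zeros(n)
    passive = np.zeros(n, dtype=bool)            # passive set P (free vars)
    for _ in range(max_iter):
        w = A.T @ (b - A @ x)                    # negative gradient
        if passive.all() or w[~passive].max(initial=-np.inf) <= tol:
            break                                # KKT conditions satisfied
        # move the variable with the largest gradient component into P
        j = np.argmax(np.where(~passive, w, -np.inf))
        passive[j] = True
        while True:
            # unconstrained least squares restricted to the passive set
            z = np.zeros(n)
            z[passive] = np.linalg.lstsq(A[:, passive], b, rcond=None)[0]
            if z[passive].min() > 0:
                x = z
                break
            # step toward z only as far as feasibility allows,
            # then drop variables that hit the nonnegativity boundary
            mask = passive & (z <= 0)
            alpha = np.min(x[mask] / (x[mask] - z[mask]))
            x = x + alpha * (z - x)
            passive &= x > tol
            x[~passive] = 0.0
    return x
```

For a small overdetermined system whose unconstrained solution has a negative component, the method returns the projection of that solution onto the feasible fit, e.g. `x = (1.5, 0)` rather than `(2, -1)` in the test below.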
2024
Clustering and Topic Discovery of Multiway Data via Joint-NCMTF
Benjamin Cobb, Ricardo Velasquez, Richard Vuduc, and 1 more author
In 2024 IEEE International Conference on Big Data (BigData), 2024
Nonnegative Matrix Factorization (NMF) and Nonnegative Coupled Matrix Tensor Factorization (NCMTF) are Constrained Low-Rank Approximation (CLRA) models which have found use in many applications. In particular, NMF and its variants have been shown to produce high-quality soft clustering and topic modeling results with the property that each clustering assignment relates to a corresponding topic, thereby providing insight into the nature of each item in a given cluster. However, NMF and its variants are unable to process heterogeneous data represented as one or more coupled tensors. Similarly, there do not exist tensorized methods which fully preserve the aforementioned desirable clustering and topic modeling properties of NMF. This paper develops a higher-order analog of Joint-NMF, Joint Nonnegative Coupled Matrix Tensor Factorization (Joint-NCMTF), capable of factorizing heterogeneous tensor datasets whilst fully preserving these NMF properties. To accomplish this, we develop higher-order analogs of the entire NMF process, including crucial pre- and post-processing steps. By incorporating additional dimensions of information present in datasets posed as coupled higher-order tensors, our proposed Joint-NCMTF method yields higher-quality clustering and topic modeling results than methods which incorporate less information. We empirically demonstrate the effectiveness of our proposed method on multiple synthetic and two real-world topic modeling tasks.
@inproceedings{10825741,author={Cobb, Benjamin and Velasquez, Ricardo and Vuduc, Richard and Park, Haesun},booktitle={2024 IEEE International Conference on Big Data (BigData)},title={Clustering and Topic Discovery of Multiway Data via Joint-NCMTF},year={2024},volume={},number={},pages={1268-1275},keywords={Tensors;Big Data;Numerical models;Numerical analysis;approximation methods;text analysis;clustering methods},doi={10.1109/BigData62323.2024.10825741},}
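The NMF property the abstract relies on (each factor column defines a topic, and each item's factor coefficients give a soft cluster membership) can be seen on a toy term-document factorization. All matrix values and term names below are made up purely for illustration:

```python
import numpy as np

# Toy term-document NMF factors (hypothetical values): X ~ W @ H, where
# W (terms x topics) holds topic keyword weights and H (topics x docs)
# holds soft cluster memberships. This is the link between clustering
# assignments and topics that Joint-NCMTF extends to coupled tensors.
terms = ["tensor", "rank", "graph", "edge", "topic"]
W = np.array([[0.9, 0.0],
              [0.8, 0.1],
              [0.1, 0.9],
              [0.0, 0.7],
              [0.3, 0.4]])
H = np.array([[0.9, 0.2, 0.1],
              [0.1, 0.8, 0.7]])

# Hard clustering: each document goes to its dominant topic.
clusters = H.argmax(axis=0)
# Topic interpretation: top-weighted terms per factor column.
top_terms = [[terms[i] for i in W[:, t].argsort()[::-1][:2]]
             for t in range(W.shape[1])]
```

Here document 0 lands in cluster 0, whose top terms are "tensor" and "rank", so the cluster label comes with a readable topic description.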
On Rank Selection for Nonnegative Matrix Factorization
Srinivas Eswar, Koby Hayashi, Benjamin Cobb, and 4 more authors
In 2024 IEEE International Conference on Big Data (BigData), 2024
Rank selection, i.e. the choice of factorization rank, is the first step in constructing Nonnegative Matrix Factorization (NMF) models. It is a long-standing problem which is not unique to NMF, but arises in most models which attempt to decompose data into its underlying components. Since these models are often used in the unsupervised setting, the rank selection problem is further complicated by the lack of ground truth labels. In this paper, we review and empirically evaluate the most commonly used schemes for NMF rank selection.
@inproceedings{10825324,author={Eswar, Srinivas and Hayashi, Koby and Cobb, Benjamin and Kannan, Ramakrishnan and Ballard, Grey and Vuduc, Richard and Park, Haesun},booktitle={2024 IEEE International Conference on Big Data (BigData)},title={On Rank Selection for Nonnegative Matrix Factorization},year={2024},volume={},number={},pages={1294-1301},keywords={Reviews;Big Data;Data models;Matrix decomposition;Rank selection;Nonnegative Matrix Factorization},doi={10.1109/BigData62323.2024.10825324},}
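The simplest family of schemes in such a review, fitting NMF at each candidate rank and inspecting the reconstruction-error curve for an elbow, can be sketched as follows. The plain multiplicative-update NMF and the function names here are a minimal illustrative stand-in, not the paper's evaluated implementations:

```python
import numpy as np

def nmf_mu(X, k, iters=200, seed=0, eps=1e-9):
    """Plain Lee-Seung multiplicative-update NMF, for illustration only."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)    # update H, then W
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def rank_sweep(X, ranks):
    """Relative reconstruction error at each candidate rank.

    A rank is then chosen by looking for the 'elbow' where adding
    more rank stops reducing the error appreciably.
    """
    nrm = np.linalg.norm(X)
    return [np.linalg.norm(X - W @ H) / nrm
            for W, H in (nmf_mu(X, k) for k in ranks)]
```

On synthetic data of known rank, the error curve flattens once the candidate rank reaches the true rank, which is the signal these heuristics look for.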
2023
Distributed-Memory Parallel JointNMF
Srinivas Eswar, Benjamin Cobb, Koby Hayashi, and 4 more authors
In Proceedings of the 37th ACM International Conference on Supercomputing, Orlando, FL, USA, 2023
Joint Nonnegative Matrix Factorization (JointNMF) is a hybrid method for mining information from datasets that contain both feature and connection information. We propose distributed-memory parallelizations of three algorithms for solving the JointNMF problem based on Alternating Nonnegative Least Squares, Projected Gradient Descent, and Projected Gauss-Newton. We extend well-known communication-avoiding algorithms from the single processor grid case to our coupled case on two processor grids. We demonstrate the scalability of the algorithms on up to 960 cores (40 nodes) with 60% parallel efficiency. The more sophisticated Alternating Nonnegative Least Squares (ANLS) and Gauss-Newton variants outperform the first-order gradient descent method in reducing the objective on large-scale problems. We perform a topic modeling task on a large corpus of academic papers that consists of over 37 million paper abstracts and nearly a billion citation relationships, demonstrating the utility and scalability of the methods.
@inproceedings{10.1145/3577193.3593733,author={Eswar, Srinivas and Cobb, Benjamin and Hayashi, Koby and Kannan, Ramakrishnan and Ballard, Grey and Vuduc, Richard and Park, Haesun},title={Distributed-Memory Parallel JointNMF},year={2023},isbn={9798400700569},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3577193.3593733},doi={10.1145/3577193.3593733},booktitle={Proceedings of the 37th ACM International Conference on Supercomputing},pages={301–312},numpages={12},keywords={nonnegative matrix factorization, multimodal inputs, high performance computing},location={Orlando, FL, USA},series={ICS '23},}
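Of the three solver families, the first-order one is the easiest to sketch: projected gradient descent on a single nonnegative least squares block, minimizing ||X - WH||_F^2 over H >= 0. This serial sketch shows only the numerical kernel; the paper's contribution, distributing such updates over two processor grids with communication-avoiding algorithms, is omitted, and `pgd_nls` is a name chosen here for illustration:

```python
import numpy as np

def pgd_nls(X, W, iters=500):
    """Projected gradient descent for min_{H >= 0} ||X - W H||_F^2.

    Each step follows the negative gradient G @ H - W.T @ X with a
    fixed 1/L step size, then projects back onto the nonnegative
    orthant by clipping at zero.
    """
    G = W.T @ W
    step = 1.0 / np.linalg.norm(G, 2)       # 1/L, L = largest eigval of G
    WtX = W.T @ X
    H = np.zeros((W.shape[1], X.shape[1]))
    for _ in range(iters):
        H = np.maximum(0.0, H - step * (G @ H - WtX))
    return H
```

With the 1/L step size the objective is non-increasing, which is why the test below checks that more iterations never worsen the residual.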
2022
FIST-HOSVD: fused in-place sequentially truncated higher order singular value decomposition
Benjamin Cobb, Hemanth Kolla, Eric Phipps, and 1 more author
In Proceedings of the Platform for Advanced Scientific Computing Conference, Basel, Switzerland, 2022
In this paper, several novel methods of improving the memory locality of the Sequentially Truncated Higher Order Singular Value Decomposition (ST-HOSVD) algorithm for computing the Tucker decomposition are presented. We show how the two primary computational kernels of the ST-HOSVD can be fused together into a single kernel to significantly improve memory locality. We then extend matrix tiling techniques to tensors to further improve cache utilization. This block-based approach is then coupled with a novel in-place transpose algorithm to drastically reduce the memory requirements of the algorithm by overwriting the original tensor with the result. Our approach’s effectiveness is demonstrated by comparing the multi-threaded performance of our optimized ST-HOSVD algorithm to TuckerMPI, a state-of-the-art ST-HOSVD implementation, in compressing two combustion simulation datasets. We demonstrate up to 135x reduction in auxiliary memory consumption thereby increasing the problem size that can be computed for a given memory allocation by up to 3x, whilst maintaining comparable runtime performance.
@inproceedings{10.1145/3539781.3539798,author={Cobb, Benjamin and Kolla, Hemanth and Phipps, Eric and \c{C}ataly\"{u}rek, \"{U}mit V.},title={FIST-HOSVD: fused in-place sequentially truncated higher order singular value decomposition},year={2022},isbn={9781450394109},publisher={Association for Computing Machinery},address={New York, NY, USA},url={https://doi.org/10.1145/3539781.3539798},doi={10.1145/3539781.3539798},booktitle={Proceedings of the Platform for Advanced Scientific Computing Conference},articleno={15},numpages={11},keywords={tucker decomposition, reduced memory high-water mark, memory efficient, kernel fusion, in-place, data compression, cache blocking},location={Basel, Switzerland},series={PASC '22},}
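The two kernels that FIST-HOSVD fuses, Gram-matrix formation and the tensor-times-matrix (TTM) shrink, appear as separate steps in a baseline ST-HOSVD. Below is a minimal NumPy sketch of that unfused, out-of-place baseline under the assumption of fixed target ranks; it is not the paper's fused in-place implementation, and the function names are chosen here for illustration:

```python
import numpy as np

def st_hosvd(X, ranks):
    """Baseline sequentially truncated HOSVD (Tucker decomposition).

    For each mode n: form the Gram matrix of the mode-n unfolding
    (kernel 1), take its leading eigenvectors as the factor U_n, then
    shrink the core with the TTM product U_n^T applied along mode n
    (kernel 2). FIST-HOSVD fuses these two kernels and overwrites the
    input; here they stay separate and allocate fresh arrays.
    """
    core = X
    factors = []
    for n, r in enumerate(ranks):
        unfold = np.moveaxis(core, n, 0).reshape(core.shape[n], -1)
        gram = unfold @ unfold.T                     # kernel 1: Gram matrix
        vals, vecs = np.linalg.eigh(gram)
        U = vecs[:, ::-1][:, :r]                     # leading r eigenvectors
        factors.append(U)
        proj = U.T @ unfold                          # kernel 2: TTM shrink
        new_shape = (r,) + core.shape[:n] + core.shape[n + 1:]
        core = np.moveaxis(proj.reshape(new_shape), 0, n)
    return core, factors

def tucker_reconstruct(core, factors):
    """Expand a Tucker core back to full size via mode-wise TTMs."""
    for n, U in enumerate(factors):
        unfold = np.moveaxis(core, n, 0).reshape(core.shape[n], -1)
        new_shape = (U.shape[0],) + core.shape[:n] + core.shape[n + 1:]
        core = np.moveaxis((U @ unfold).reshape(new_shape), 0, n)
    return core
```

When the target ranks match the tensor's multilinear ranks, the truncation discards only zero eigenvalues, so the reconstruction is exact up to floating-point error.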