Home > Publications > A new adaptive folding-up algorithm for information retrieval

A new adaptive folding-up algorithm for information retrieval

Abstract:

Text collections can be represented mathematically as term-document matrices. A term-document matrix can in turn be represented using the matrix factorization method known as the partial (or truncated) singular value decomposition (PSVD). Recomputing the PSVD when changes are made to a text collection is very expensive. Folding-in is one method of approximating the PSVD when new documents are added to a term-document matrix; updating the PSVD of the existing term-document matrix is another method. The folding-in method is computationally inexpensive, but it may cause deterioration in the accuracy of the PSVD. The PSVD-updating method is more expensive than the folding-in method, but it maintains the accuracy of the PSVD. Folding-up is a method that combines folding-in and PSVD-updating. When a text collection expands in small increments, folding-up provides a significant improvement in computation time when compared with either re- computing the PSVD or PSVD-updating, and it reduces the loss of accuracy in the PSVD that can occur with the folding-in method. This paper introduces a new adaptive folding-up method in which a measure of the error in the PSVD is monitored to determine when it is most advantageous to switch from folding-in to updating.

Authors: Jane E. Mason, Raymond J. Spiteri

Download: MasonSpiteri2008