Sparsity

 
 
  1. Bullet Efficient Projections onto the L1 Ball for Learning in High Dimensions constrained
    We describe efficient algorithms for projecting a vector onto the L1-ball. We present two methods for projection. The first performs exact projection in O(n) expected time, where n is the dimension of the space. The second works on vectors k of whose elements are perturbed outside the L1-ball, projecting in O(k log(n)) time. This setting is especially useful for online learning in sparse feature spaces such as text categorization applications. We demonstrate the merits and effectiveness of our algorithms in numerous batch and online learning tasks. We show that variants of stochastic gradient projection methods augmented with our efficient projection procedures outperform interior point methods, which are considered state-of-the-art optimization techniques. We also show that in online settings gradient updates with L1 projections outperform the exponentiated gradient algorithm while obtaining models with high degrees of sparsity.

  1. This page contains various papers on learning sparse models. The papers are given in a chronological order with a short explanation. In the near future I plan to post a few demos, code, pointers, and a survey (yet to be written) on algorithms for learning sparse models. The focus of the work is on efficient algorithms for high-dimension data in the presence of large amounts of training examples.

  1. Bullet Efficient Online and Batch Learning Using Forward Backward Splitting
    We describe, analyze, and experiment with a framework for empirical loss minimization with regularization. Our algorithmic framework alternates between two phases. On each iteration we first perform an unconstrained gradient descent step. We then cast and solve an instantaneous optimization problem that trades off minimization of a regularization term while keeping close proximity to the result of the first phase. This view yields a simple yet effective algorithm that can be used for batch penalized risk minimization and online learning. Furthermore, the two phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as L1. We derive concrete and very simple algorithms for minimization of loss functions with L1, L2, L22, and Linf regularization. We also show how to construct efficient algorithms for mixed-norm L1/Lq regularization. We further extend the algorithms and give efficient implementations for very high-dimensional data with sparsity. We demonstrate the potential of the proposed framework in a series of experiments with synthetic and natural datasets.

  1. Bullet Boosting with Structural Sparsity
    Despite popular belief, boosting algorithms and related coordinate descent methods are prone to overfitting. We derive modifications to AdaBoost and related gradient-based coordinate descent methods that incorporate, along their entire run, sparsity-promoting penalties for the norm of the predictor that is being learned. The end result is a family of coordinate descent algorithms that integrates forward feature induction and back-pruning through sparsity promoting regularization along with an automatic stopping criterion for feature induction. We study penalties based on the L1, L2, and Linf norm of the learned predictor and also introduce mixed-norm penalties that build upon the initial norm-based penalties. The mixed-norm penalties facilitate structural sparsity of the parameters of the predictor, which is a useful property in multiclass prediction and other related learning tasks. We report empirical results that demonstrate the power of our approach in building accurate and structurally sparse models from high dimensional data.

  1. Bullet Group Sparse Coding
    Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary or text documents, there is no simple mapping from raw images or videos to dictio- nary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification.espite popular belief, boosting algorithms and related coordinate descent methods are prone to overfitting. We derive modifications to AdaBoost and related gradient-based coordinate descent methods that incorporate, along their entire run, sparsity-promoting penalties for the norm of the predictor that is being learned. The end result is a family of coordinate descent algorithms that integrates forward feature induction and back-pruning through sparsity promoting regularization along with an automatic stopping criterion for feature induction. We study penalties based on the L1, L2, and Linf norm of the learned predictor and also introduce mixed-norm penalties that build upon the initial norm-based penalties. The mixed-norm penalties facilitate structural sparsity of the parameters of the predictor, which is a useful property in multiclass prediction and other related learning tasks. We report empirical results that demonstrate the power of our approach in building accurate and structurally sparse models from high dimensional data.

  1. Bullet Composite Objective Mirror Descent
    We present a new method for regularized convex optimization and analyze it under both online and stochastic optimization settings. In addition to unifying previously known first-order algorithms, such as the projected gradient method, mirror descent, and forward-backward splitting, our method yields new analysis and algorithms. We also derive specific instantiations of our method for commonly used regularization functions,  such as L1, mixed norm, and trace-norm.

This paper is so technical that:

  1. I cannot think of any image to put here

  2. Only Ambuj and John fully grasp it