University of Texas at Austin

Past Event: Oden Institute Seminar

Feature Selection From Real-World Data With Non-Linear Observations

Martin Genzel, Department of Mathematics, Technische Universität Berlin

3:30 – 5PM
Tuesday Jan 17, 2017

POB 6.304

Abstract

A fundamental challenge in machine learning is the selection of discriminative features from a relatively small collection of sample pairs {(x_i, y_i)}_{1≤i≤m}. Here, the observations y_i ∈ R are often assumed to follow a noisy single-index model that depends on a certain set of target variables. The major difficulty is that these variables cannot be observed directly, but rather arise as hidden factors in the actual data vector x_i ∈ R^d (feature variables). A typical example is mass spectrometry data of the human proteome, where the desired molecular concentrations of proteins are intrinsically encoded by means of Gaussian-shaped peaks. In this talk, we will see that successful feature selection is still possible when the applied estimator has no knowledge of the underlying data representation and takes only the “raw” samples {(x_i, y_i)}_{1≤i≤m} as input. Guarantees of this type are especially appealing for practical purposes, since in many applications even standard methods, e.g., the Lasso or logistic regression, yield surprisingly good outcomes. The mathematical basis of our results is a recent framework for structured signal recovery from highly underdetermined (non-)linear equation systems. This allows us to treat the problem of feature selection in a unified way, particularly including non-linear observations, arbitrary convex signal structures, and strictly convex loss functions. This is joint work with Gitta Kutyniok.
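
As a rough illustration of the setting described in the abstract, the following Python sketch simulates hidden target variables that appear in the raw data only through Gaussian-shaped peaks, generates observations from a noisy single-index model, and runs a plain Lasso on the raw samples. Everything specific here is an assumption made for illustration and is not taken from the talk: the dimensions, the tanh link function, the noise levels, the peak width, the regularization parameter, and the simple proximal-gradient (ISTA) solver standing in for a Lasso implementation. All function names are hypothetical.

```python
import numpy as np

def gaussian_peak_dictionary(d, centers, width):
    """Return a d x p matrix whose columns are Gaussian-shaped peaks (assumed shape)."""
    grid = np.arange(d)[:, None]
    return np.exp(-0.5 * ((grid - centers[None, :]) / width) ** 2)

def simulate(m=200, d=300, p=20, active=(3, 11), seed=0):
    """Draw m raw samples x_i in R^d and observations y_i (all sizes are assumptions)."""
    rng = np.random.default_rng(seed)
    centers = np.linspace(10, d - 10, p)           # peak positions on the feature axis
    A = gaussian_peak_dictionary(d, centers, width=2.0)
    s = rng.standard_normal((m, p))                # hidden target variables
    z0 = np.zeros(p)
    z0[list(active)] = 1.0                         # only these variables drive y
    # Noisy single-index model with an assumed non-linear link (tanh).
    y = np.tanh(s @ z0) + 0.1 * rng.standard_normal(m)
    # Raw data: hidden factors encoded as peaks, plus small measurement noise.
    X = s @ A.T + 0.05 * rng.standard_normal((m, d))
    return X, y, centers

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/(2m)) * ||y - X b||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    m, d = X.shape
    step = m / np.linalg.norm(X, 2) ** 2           # 1 / Lipschitz constant of the gradient
    b = np.zeros(d)
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y) / m)     # gradient step on the squared loss
        b = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-thresholding
    return b

def peak_scores(beta, centers):
    """Sum |beta_k| over the raw features closest to each peak center."""
    grid = np.arange(beta.size)[:, None]
    nearest = np.argmin(np.abs(grid - centers[None, :]), axis=1)
    return np.bincount(nearest, weights=np.abs(beta), minlength=centers.size)

X, y, centers = simulate()
beta = lasso_ista(X, y, lam=0.1)                   # the estimator sees only raw (X, y)
scores = peak_scores(beta, centers)
top_peaks = sorted(int(k) for k in np.argsort(scores)[-2:])
print("peaks carrying the largest Lasso weight:", top_peaks)
```

The estimator never sees the peak dictionary or the hidden variables; the dictionary is used only afterwards, in `peak_scores`, to interpret which peaks the non-zero coefficients fall on. In this toy setup the weight would be expected to concentrate on the two peaks that encode the active variables, which is the kind of behavior the guarantees mentioned in the abstract are meant to explain.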

Event information

Hosted by Chandrajit Bajaj