Feature Selection From Real-World Data With Non-Linear Observations
Martin Genzel, Department of Mathematics, Technische Universität Berlin
3:30 – 5PM
Tuesday Jan 17, 2017
POB 6.304
Abstract
A fundamental challenge in machine learning is the selection of discriminative features
from a relatively small collection of sample pairs {(x_i; y_i)}_{1≤i≤m}. Here, the observations
y_i ∈ R are often assumed to follow a noisy single-index model, depending on a certain set of
target variables. The major difficulty is that these variables cannot be observed directly,
but rather arise as hidden factors in the actual data vector x_i ∈ R^d (feature variables). A
typical example would be mass spectrometry data of the human proteome, where the desired
molecular concentrations of proteins are intrinsically encoded by means of Gaussian-shaped
peaks.
In this talk, we will see that a successful feature selection is still possible when the
applied estimator does not have any knowledge of the underlying data representation and
only takes the “raw” samples {(x_i; y_i)}_{1≤i≤m} as input. Guarantees of such type are especially
appealing for practical purposes, since in many applications even standard methods, e.g., the
Lasso or logistic regression, yield surprisingly good outcomes. The mathematical basis of our
results is a recent framework for structured signal recovery from highly underdetermined
(non-)linear equation systems. This allows us to treat the problem of feature selection in a
unified way, particularly including non-linear observations, arbitrary convex signal structures
as well as strictly convex loss functions. This is joint work with Gitta Kutyniok.
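
To make the setting in the abstract concrete, the following is a minimal toy sketch and not the estimator or analysis from the talk. It assumes hidden variables s that are encoded in the raw data x through Gaussian-shaped peaks, a one-bit (sign) observation as one possible non-linear single-index model, and a plain Lasso solved by coordinate descent that only sees the raw pairs (x_i; y_i). All names (gaussian_peak, make_samples, lasso_cd), the peak positions, the noise level, and the value of lam are hypothetical choices made for illustration.

```python
import math
import random


def gaussian_peak(d, center, width):
    """Gaussian-shaped peak sampled on the grid 0..d-1 (hypothetical data atom)."""
    return [math.exp(-0.5 * ((j - center) / width) ** 2) for j in range(d)]


def make_samples(m, d, centers, z0, width=2.0, noise=0.1, seed=0):
    """Draw m raw sample pairs (x_i, y_i).

    Assumption: x_i is a noisy superposition of Gaussian peaks weighted by
    hidden variables s, and y_i = sign(<s, z0> + noise) is a one-bit
    single-index observation of those hidden variables.
    """
    rng = random.Random(seed)
    atoms = [gaussian_peak(d, c, width) for c in centers]
    X, y = [], []
    for _ in range(m):
        s = [rng.gauss(0.0, 1.0) for _ in centers]  # hidden target variables
        x = [sum(s_k * a[j] for s_k, a in zip(s, atoms)) + noise * rng.gauss(0.0, 1.0)
             for j in range(d)]
        index = sum(s_k * z_k for s_k, z_k in zip(s, z0))
        y.append(1.0 if index + noise * rng.gauss(0.0, 1.0) >= 0 else -1.0)
        X.append(x)
    return X, y


def soft_threshold(v, t):
    """Shrink v towards zero by t (proximal map of t * |.|)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for min_b (1/(2m)) * ||y - X b||^2 + lam * ||b||_1."""
    m, d = len(X), len(X[0])
    beta = [0.0] * d
    resid = list(y)  # residual y - X beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(m)) / m for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual (beta_j removed)
            rho = sum(X[i][j] * resid[i] for i in range(m)) / m + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(m):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


if __name__ == "__main__":
    centers = [8, 20, 32]   # hypothetical peak positions in the raw data
    z0 = [1.0, 0.0, 0.0]    # only the first hidden variable drives y
    X, y = make_samples(m=100, d=40, centers=centers, z0=z0)
    beta = lasso_cd(X, y, lam=0.1, n_iter=50)
    support = [j for j, b in enumerate(beta) if abs(b) > 1e-8]
    print("selected feature indices:", support)
```

The estimator never receives the peak positions or the hidden variables; it works on the raw vectors only, which mirrors the situation described above. Whether the selected indices concentrate around the relevant peak depends on the chosen lam, the noise level, and the sample size.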