# András Antos' abstracts

## of journal papers

[IEEE Transactions on Automatic Control (accepted) XXX (?):YY-ZZ, 2013]

## Online Markov decision processes under bandit feedback

### Gergely Neu, András György, Csaba Szepesvári, András Antos

Abstract

We consider online learning in finite stochastic Markovian environments where in each time step a new reward function is chosen by an oblivious adversary. The goal of the learning agent is to compete with the best stationary policy in hindsight in terms of the total reward received. In details, in each time step the agent observes the current state and the reward associated with the last transition, however, the agent does not observe the rewards associated with other state-action pairs. The agent is assumed to know the transition probabilities. The state of the art result for this setting is an algorithm with an expected regret of O(T2/3ln T). In this paper, assuming that stationary policies mix uniformly fast, we show that after T time steps, the expected regret of this algorithm (more precisely, a slightly modified version thereof) is O(T1/2 ln T), giving the first rigorously proven, essentially tight regret bound for the problem.

[Theoretical Computer Science 473 (?):77-99, Feb 18, 2013]

## Toward a classification of finite partial-monitoring games

### András Antos, Gábor Bartók, Dávid Pál, Csaba Szepesvári

Abstract

Partial-monitoring games constitute a mathematical framework for sequential decision making problems with imperfect feedback: The learner repeatedly chooses an action, the opponent responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his total cumulative loss. We make progress towards the classification of these games based on their minimax expected regret. Namely, we classify almost all games with two outcomes and a finite number of actions: We show that their minimax expected regret is either zero, \tilde\Theta(\sqrt{T}), \Theta(T2/3), or \Theta(T), and we give a simple and efficiently computable classification of these four classes of games. Our hope is that the result can serve as a stepping stone toward classifying all finite partial-monitoring games.

Keywords: Online algorithms, Online learning, Imperfect feedback, Regret analysis

[IEEE Transactions on Information Theory 58 (2):1147-1157, Feb 2012]

## On codecell convexity of optimal multiresolution scalar quantizers for continuous sources

### András Antos

Abstract

It has been shown by earlier results that for fixed rate multiresolution scalar quantizers and for mean squared error distortion measure, codecell convexity precludes optimality for certain discrete sources. However it was unknown whether the same phenomenon can occur for any continuous source. In this paper, examples of continuous sources (even with bounded continuous densities) are presented for which optimal fixed rate multiresolution scalar quantizers cannot have only convex codecells, proving that codecell convexity precludes optimality also for such regular sources.

Keywords - Clustering methods, quantization, source coding, rate distortion theory, mean square error methods, optimization methods, multi-resolution, codecell convexity, continuous density function.

[Theoretical Computer Science 411 (29-30):2712-2728, 2010]

## Active learning in heteroscedastic noise

### András Antos, Varun Grover, Csaba Szepesvári

Abstract

We consider the problem of actively learning the mean values of distributions associated with a finite number of options. The decision maker can select which option to generate the next observation from, the goal being to produce estimates with equally good precision for all the options. If sample means are used to estimate the unknown values then the optimal solution, assuming that the distributions are known up to a shift, is to sample from each distribution proportional to its variance. No information other than the distributions' variances is needed to calculate the optimal solution. In this paper we propose an incremental algorithm that asymptotically achieves the same loss as an optimal rule. We prove that the excess loss suffered by this algorithm, apart from logarithmic factors, scales as n-3/2, which we conjecture to be the optimal rate. The performance of the algorithm is illustrated on a simple problem.

Key words: active learning, heteroscedastic noise, regression, sequential analysis, sequential allocation

[Machine Learning 71 (1):89-129, 2008]

## Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

### András Antos, Csaba Szepesvári, Remi Munos

Abstract

We consider the problem of finding a near-optimal policy in continuous space, discounted Markovian Decision Problems given the trajectory of some behaviour policy. We study the policy iteration algorithm where in successive iterations the action-value functions of the intermediate policies are obtained by picking a function from some fixed function set (chosen by the user) that minimizes an unbiased finite-sample approximation to a novel loss function that upper-bounds the unmodified Bellman-residual criterion. The main result is a finite-sample, high-probability bound on the performance of the resulting policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept that we call the VC-crossing dimension, the approximation power of the function set and the discounted-average concentrability of the future-state distribution. To the best of our knowledge this is the first theoretical reinforcement learning result for off-policy control learning over continuous state-spaces using a single trajectory.

Key words Reinforcement learning, policy iteration, Bellman-residual minimization, off-policy learning, nonparametric regression, least-squares regression, finite-sample bounds

[IEEE Transactions on Information Theory 51 (11):4022-4032, 2005]

## Improved minimax bounds on the test and training distortion of empirically designed vector quantizers

### András Antos

Abstract

It is shown by earlier results that the minimax expected (test) distortion redundancy of empirical vector quantizers with three or more levels designed from n independent and identically distributed data points is at least \Omega(1 / \sqrt{n}) for the class of distributions on a bounded set. In this paper, a much simpler construction and proof for this are given with much better constants. There are similar bounds for the training distortion of the empirical optimal vector quantizer with three or more levels. These rates, however, do not hold for a one-level quantizer. Here the two-level quantizer case is clarified, showing that it already shares the behavior of the general case. Given that the minimax bounds are proved using a construction that involves discrete distributions, one suspects that for the class of distributions with uniformly bounded continuous densities, the expected distortion redundancy might decrease as o(1 / \sqrt{n}) uniformly. It is shown as well that this is not so, proving that the lower bound for the expected test distortion remains true for these subclasses.

Keywords - Empirical vector quantizer design, distortion redundancy, minimax lower bounds, training distortion.

[IEEE Transactions on Information Theory 51 (11):4013-4022, 2005]

## Individual convergence rates in empirical vector quantizer design

### András Antos, László Györfi, András György

Abstract

We consider the rate of convergence of the expected distortion redundancy of empirically optimal vector quantizers. Earlier results show that the mean-squared distortion of an empirically optimal quantizer designed from n independent and identically distributed source samples converges uniformly to the optimum at a rate of O(1 / \sqrt{n}), and that this rate is sharp in the minimax sense. We prove that for any fixed distribution supported on a given finite set the convergence rate is O(1/n) (faster than the minimax lower bound), where the corresponding constant depends on the source distribution. For more general source distributions we provide conditions implying a little bit worse O(log n/n) rate of convergence. Although these conditions, in general, are hard to verify, we show that sources with continuous densities satisfying certain regularity properties (similar to the ones in [Pol82b] that were used to prove a central limit theorem for the code points of the empirically optimal quantizers) are included in the scope of this result. In particular, scalar distributions with strictly log-concave densities with bounded support (such as the truncated Gaussian distribution) satisfy these conditions.

[Journal of Machine Learning Research 3 (1):73-98, 2002]

## Data-dependent margin-based generalization bounds for classification

### András Antos, Balázs Kégl, Tamás Linder, Gábor Lugosi

Abstract

We derive new margin-based inequalities for the probability of error of classifiers. The main feature of these bounds is that they can be calculated using the training data and therefore may be effectively used for model selection purposes. In particular, the bounds involve empirical complexities measured on the training data (such as the empirical fat-shattering dimension) as opposed to their worst-case counterparts traditionally used in such analyses. Also, our bounds appear to be sharper and more general than recent results involving empirical complexity measures. In addition, we develop an alternative data-based bound for the generalization error of classes of convex combinations of classifiers involving an empirical complexity measure that is easier to compute than the empirical covering number or fat-shattering dimension. We also show examples of efficient computation of the new bounds.

Keywords: classification, error estimation, fat-shattering dimension, margin-based bounds

[Random Structures and Algorithms, special issue: Average-Case Analysis of Algorithms 19 (3-4):163-193, 2002]

## Convergence properties of functional estimates for discrete distributions

### András Antos and Ioannis Kontoyiannis

Abstract

Suppose P is an arbitrary discrete distribution on a countable alphabet {\cal X}. Given an i.i.d. sample (X1,...,Xn) drawn from P, we consider the problem of estimating the entropy H(P) or some other functional F=F(P) of the unknown distribution P. We show that, for additive functionals satisfying mild conditions (including the cases of the mean, the entropy, and mutual information), the plug-in estimates of F are universally consistent. We also prove that, without further assumptions, no rate-of-convergence results can be obtained for any sequence of estimators. In the case of entropy estimation, under a variety of different assumptions, we get rate-of-convergence results for the plug-in estimate and for a nonparametric estimator based on match-lengths. The behavior of the variance and the expected error of the plug-in estimate is shown to be in sharp contrast to the finite-alphabet case. A number of other important examples of functionals are also treated in some detail.

Keywords: functional estimation; entropy estimation; rates of convergence; match lengths

[Theoretical Computer Science 284 (1):3-24, 2002]

## Lower bounds for the rate of convergence in nonparametric pattern recognition

### András Antos

Abstract

We show that there exist individual lower bounds corresponding to the upper bounds for the rate of convergence of nonparametric pattern recognition which are arbitrarily close to Yang's minimax lower bounds, for certain cubic'' classes of regression functions used by Stone and others. The rates are equal to the ones of the corresponding regression function estimation problem. Thus for these classes classification is not easier than regression function estimation.

Keywords: individual rates of convergence, nonparametric pattern recognition

[Journal of Statistical Planning and Inference 83:91-100, 2000]

## Lower bounds on the rate on the rate of convergence of nonparametric regression estimates

### András Antos, László Györfi and Michael Kohler

Abstract

We show that there exist individual lower bounds on the rate of convergence of nonparametric regression estimates, which are arbitrarily close to Stone's minimax lower bounds.

AMS classifications: Primary: 62G20. Secondary: 62G05.

Keywords: Individual rates of convergence, nonparametric regression.

[IEEE Transactions on Pattern Analysis and Machine Intelligence 21:643-645, 1999]

## Lower bounds for Bayes error estimation

### András Antos, Luc Devroye and László Györfi

Abstract

We give a short proof of the following result. Let (X,Y) be any distribution on N x {0,1}, and let (X1,Y1),...,(Xn,Yn) be an i.i.d. sample drawn from this distribution. In discrimination, the Bayes error L* = \infg P{g(X)<>Y} is of crucial importance. Here we show that without further conditions on the distribution of (X,Y), no rate of convergence results can be obtained. Let Ln(X1,Y1,...,Xn,Yn) be an estimate of the Bayes error, and let {Ln(.)} be a sequence of such estimates. For any sequence {an} of positive numbers converging to zero, a distribution of (X,Y) may be found such that

E{|L*-Ln(X1,Y1,...,Xn,Yn)|} >= an

infinitely often.

1991 Mathematics Subject Classifications: Primary: 62G07. Secondary: 62G05, 62F12, 60F25.

Keywords and phrases. Discrimination, statistical pattern recognition, nonparametric estimation, Bayes error, lower bounds, rates of convergence.

[Machine Learning 30:31-56, 1998]

## Strong minimax lower bounds for learning

### András Antos and Gábor Lugosi

Abstract

Minimax lower bounds for concept learning state, for example, that for each sample size n and learning rule gn, there exists a distribution of the observation X and a concept C to be learnt such that the expected error of gn is at least a constant times V/n, where V is the VC dimension of the concept class. However, these bounds do not tell anything about the rate of decrease of the error for a fixed distribution-concept pair.

In this paper we investigate minimax lower bounds in such a - stronger - sense. We show that for several natural k-parameter concept classes, including the class of linear halfspaces, the class of balls, the class of polyhedra with a certain number of faces, and a class of neural networks, for any sequence of learning rules {gn}, there exists a fixed distribution of X and a fixed concept C such that the expected error is larger than a constant times k/n for infinitely many n. We also obtain such strong minimax lower bounds for the tail distribution of the probability of error, which extend the corresponding minimax lower bounds.

Back to the publications