# András Antos' abstracts

## of conference papers

[In 19th International Conference on Algorithmic Learning Theory, ALT 2008, Proceedings,
Springer-Verlag, Berlin, Heidelberg, LNCS/LNAI, 5254, pp.287-302, October 2008]

## Active learning in multi-armed bandits

### András Antos, Varun Grover, Csaba Szepesvári

Abstract

In this paper we consider the problem of actively learning the mean values of distributions associated with a finite number of options (arms). The algorithms can select which option to generate the next sample from in order to produce estimates with equally good precision for all the distributions. When an algorithm uses sample means to estimate the unknown values then the optimal solution, assuming full knowledge of the distributions, is to sample each option proportional to its variance. In this paper we propose an incremental algorithm that asymptotically achieves the same loss as an optimal rule. We prove that the excess loss suffered by this algorithm, apart from logarithmic factors, scales as n-3/2, which we conjecture to be the optimal rate. The performance of the algorithm is illustrated in a simple problem.

[Twenty-First Annual Conference on Neural Information Processing Systems (NIPS 2007),
December 2007, Poster]

## Fitted Q-iteration in continuous action-space MDPs

### András Antos, Rémi Munos, Csaba Szepesvári

Abstract

We consider continuous state, continuous action batch reinforcement learning where the goal is to learn a good policy from a sufficiently rich trajectory generated by some policy. We study a variant of fitted Q-iteration, where the greedy action selection is replaced by searching for a policy in a restricted set of candidate policies by maximizing the average action-values. We provide a rigorous analysis of this algorithm, proving what we believe is the first finite-time bound for value-function based algorithms for continuous state and action problems.

[In 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007),
IEEE, pp.330-337, April 2007]

## Value-iteration based fitted policy iteration: learning with a single trajectory

### András Antos, Csaba Szepesvári, Rémi Munos

Abstract

We consider batch reinforcement learning problems in continuous space, expected total discounted-reward Markovian Decision Problems when the training data is composed of the trajectory of some fixed behaviour policy. The algorithm studied is fitted policy iteration where in successive iterations the action-value functions of the intermediate policies are obtained by means of approximate value iteration. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance. The bounds depend on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, the approximation power and capacity of the function set used. One of the main novelties of the paper is that new smoothness constraints are introduced allowing us to significantly extend the scope of previous results.

[In The Nineteenth Annual Conference on Learning Theory, COLT 2006, Proceedings,
Springer-Verlag, Berlin, Heidelberg, LNCS/LNAI, 4005, pp.574-588, June 2006]

## Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

### András Antos, Csaba Szepesvári, Rémi Munos

Extended version with detailed proofs: ps.gz

Abstract

We consider batch reinforcement learning problems in continuous space, expected total discounted-reward Markovian Decision Problems. As opposed to previous theoretical work, we consider the case when the training data consists of a single sample path (trajectory) of some behaviour policy. In particular, we do not assume access to a generative model of the environment. The algorithm studied is policy iteration where in successive iterations the Q-functions of the intermediate policies are obtained by means of minimizing a novel Bellman-residual type error. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance where the bound depends on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, the approximation power and capacity of the function set used.

[In The Eighteenth Annual Conference on Learning Theory, COLT'05, Proceedings,
Springer-Verlag, Berlin, Heidelberg, LNCS/LNAI, 3559, pp.531-544, June 2005]

## Improved minimax bounds on the test and training distortion of empirically designed vector quantizers

### András Antos

Abstract

It is shown by earlier results that the minimax expected (test) distortion redundancy of empirical vector quantizers with three or more levels designed from n independent and identically distributed data points is at least \Omega(1/\sqrt{n}) for the class of distributions on a bounded set. In this paper, a much simpler construction and proof for this are given with much better constants. There are similar bounds for the training distortion of the empirically optimal vector quantizer with three or more levels. These rates, however, do not hold for a one-level quantizer. Here the two-level quantizer case is clarified, showing that it already shares the behavior of the general case. Given that the minimax bounds are proved using a construction that involves discrete distributions, one suspects that for the class of distributions with uniformly bounded continuous densities, the expected distortion redundancy might decrease as o(1/\sqrt{n}) uniformly. It is shown as well that this is not so, proving that the lower bound for the expected test distortion remains true for these subclasses.

[In Proceedings 2004 IEEE International Symposium on Information Theory (ISIT 2004),
IEEE Information Theory Society, p.301, 2004]

## Improved convergence rates in empirical vector quantizer design

### András Antos, László Györfi, András György

Abstract

We consider the rate of convergence of the expected distortion redundancy of empirically optimal vector quantizers. Earlier results show that the mean-squared distortion of an empirically optimal quantizer designed from n independent and identically distributed source samples converges uniformly to the optimum at a rate O(1 / \sqrt{n}), and that this rate is sharp in the minimax sense. We prove that for any fixed source distribution supported on a given finite set, the convergence rate is O(1/n) (faster than the minimax lower bound), where the corresponding constant depends on the distribution. For more general source distributions, we provide conditions implying a little bit worse O(log n/n) rate of convergence. In particular, scalar distributions having strictly log-concave densities with bounded support (such as the truncated Gaussian distribution) satisfy these conditions.

[In Proceedings 2001 IEEE International Symposium on Information Theory (ISIT 2001),
p.45, 2001]

## Estimating the entropy of discrete distributions

### András Antos and Ioannis Kontoyiannis

Abstract

Given an i.i.d. sample (X1,...,Xn) drawn from an unknown discrete distribution P on a countably infinite set, we consider the problem of estimating the entropy of P. We show that the plug-in estimate is universally consistent and that, without further assumptions, no rate-of-convergence results can be obtained for any sequence of entropy estimates. Under additional conditions we get convergence rates for the plug-in estimate and for an estimate based on match-lengths. The behavior of the expected error of the plug-in estimate is shown to be in sharp contrast to the finite-alphabet case.

[In Limit Theorems in Probability and Statistics, J. Bolyai Mathematical Society, Budapest, pp.101-112, 2002]

## On nonparametric estimates of the expectation

### András Antos

Abstract

Let X be a real-valued random variable with finite expectation denoted by m. Let (X1,...,Xn) be an i.i.d. sample drawn from this distribution of X. By the strong law of the large numbers the average

m'n = (X1+...+Xn)/n

is a strongly consistent estimate of m, that is, m'n -> m a.s. and E{|m'n-m|} -> 0. Let mn(X1,...,Xn) be any estimate of m, and let {mn} be a sequence of such estimates. Here we prove that without further conditions on the distribution of X, no rate-of-convergence results can be obtained for any sequence {mn} of estimates, even for discrete X, that is, for any positive sequence {an} converging to zero, a distribution on N={0,1,2,...} may be found with finite mean such that for any c>0

P{|mn-m|>an} > 1/2 - c        infinitely often.

It is also known that restricting ourselves to the class Dp of distributions with finite pth moment E{|X|p} (1 <= p <= 2), for m'n we get the rate

E{|m'n-m|}=O(1/n1-1/p).

We prove that this is sharp, even for discrete X, in the sense, that for any sequence {mn} of estimates for any c>0, a distribution on N may be found in Dp such that

P{|mn-m|>1/(n1-1/p log n)} > 1/2 - c        infinitely often.

[In Computational Learning Theory: 4th European Conference, EuroCOLT'99, Proceedings,
Springer, Berlin, LNCS/LNAI, 1572, pp.241-252, 1999]

## Lower bounds on the rate of convergence of nonparametric pattern recognition

### András Antos

Abstract

We show that there exist individual lower bounds corresponding to the upper bounds on the rate of convergence of nonparametric pattern recognition which are arbitrarily close to Yang's minimax lower bounds, if the a posteriori probability function is in the classes used by Stone and others. The rates equal to the ones on the corresponding regression estimation problem. Thus for these classes classification is not easier than regression estimation either in individual sense.

[10th European Young Statisticians Meeting,
Warsawa, Poland, August 1997]

## Rawa trees

### András Antos and Luc Devroye

Abstract

Binary search trees are important in data storing, and the running time of several algorithms is related to the height of the tree. The data can be modeled by a random sequence of numbers. On the other hand, random binary search trees can be used in certain computer graphics applications. The case when the random input sequence consists of independent identically distributed (i.i.d.) variables has been heavily studied. In this case the height has order log n, asymptotically, where n is the number of nodes. We study the case when the input sequence is an ordinary RAndom WAlk, that is, the partial sums of an i.i.d. sequence. We show that if the basic distribution satisfies some regularity conditions then the height has order square root of n. We give an exact limit law, where the limit distribution is the Erdõs-Kac-Rényi distribution L(z). As a by-product, we prove a limit theorem for random walks.

[In Proceedings of the Ninth Annual Conference on Computational Learning Theory/COLT'96,
Association for Computing Machinery, New York, pp.303-309, 1996]

## Strong minimax lower bounds for learning

### András Antos and Gábor Lugosi

Abstract

Known minimax lower bounds for learning state that for each sample size n, and learning rule gn, there exists a distribution of the observation X and a concept C to be learnt such that the expected error of gn is at least a constant times V/n, where V is the VC dimension of the concept class. However, these bounds do no tell anything about the rate of decrease of the error for a fixed distribution and fixed concept.

In this paper we investigate minimax lower bounds in such - much stronger - sense. We show that for several natural d-parameter concept classes, for any sequence of learning rules {gn}, there exists a fixed distribution of X and a fixed concept C such that the expected error is larger than a constant times d/n for infinitely many n.

We also show that it is neither the VC dimension, nor the rate of increase of the shatter coefficients of the class that determine the asymptotic behavior of the concept class.

Back to the publications