Publications
Publications in reverse chronological order.
2025
- [IPL] An Improved Uniform Convergence Bound with Fat-Shattering Dimension. Roberto Colomboni, Emmanuel Esposito, and Andrea Paudice. In Information Processing Letters, 2025.
The fat-shattering dimension characterizes the uniform convergence property of real-valued functions. The state-of-the-art upper bounds feature a multiplicative squared logarithmic factor on the sample complexity, leaving an open gap with the existing lower bound. We provide an improved uniform convergence bound that closes this gap.
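For context, bounds of this kind are usually stated as a sample complexity for uniform convergence. A schematic form of the previously known upper bound, with its squared logarithmic factor, is sketched below; the scale argument $c\varepsilon$ and the constants are illustrative assumptions, not quantities taken from the paper.

\[
m(\varepsilon, \delta) \;=\; O\!\left( \frac{1}{\varepsilon^{2}} \left( \mathrm{fat}_{\mathcal{F}}(c\,\varepsilon) \, \log^{2}\!\frac{1}{\varepsilon} \;+\; \log\frac{1}{\delta} \right) \right),
\]

where $m(\varepsilon, \delta)$ is the number of samples needed so that, with probability at least $1 - \delta$, every $f \in \mathcal{F}$ has empirical mean within $\varepsilon$ of its expectation, and $\mathrm{fat}_{\mathcal{F}}(\gamma)$ is the fat-shattering dimension of $\mathcal{F}$ at scale $\gamma$. The improvement described above closes the gap with the existing lower bound.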
2024
- [COLT] A Theory of Interpretable Approximations. Marco Bressan, Nicolò Cesa-Bianchi, Emmanuel Esposito, Yishay Mansour, Shay Moran, and Maximilian Thiessen. COLT ’24, 2024.
Can a deep neural network be approximated by a small decision tree based on simple features? This question and its variants are behind the growing demand for machine learning models that are interpretable by humans. In this work we study such questions by introducing interpretable approximations, a notion that captures the idea of approximating a target concept $c$ by a small aggregation of concepts from some base class $\mathcal{H}$. In particular, we consider the approximation of a binary concept $c$ by decision trees based on a simple class $\mathcal{H}$ (e.g., of bounded VC dimension), and use the tree depth as a measure of complexity. Our primary contribution is the following remarkable trichotomy. For any given pair of $\mathcal{H}$ and $c$, exactly one of these cases holds: (i) $c$ cannot be approximated by $\mathcal{H}$ with arbitrary accuracy; (ii) $c$ can be approximated by $\mathcal{H}$ with arbitrary accuracy, but there exists no universal rate that bounds the complexity of the approximations as a function of the accuracy; or (iii) there exists a constant $\kappa$ that depends only on $\mathcal{H}$ and $c$ such that, for any data distribution and any desired accuracy level, $c$ can be approximated by $\mathcal{H}$ with a complexity not exceeding $\kappa$. This taxonomy stands in stark contrast to the landscape of supervised classification, which offers a complex array of distribution-free and universally learnable scenarios. We show that, in the case of interpretable approximations, even a slightly nontrivial a-priori guarantee on the complexity of approximations implies approximations with constant (distribution-free and accuracy-free) complexity. We extend our trichotomy to classes $\mathcal{H}$ of unbounded VC dimension and give characterizations of interpretability based on the algebra generated by $\mathcal{H}$.
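To make the notion concrete, here is a minimal Python sketch, under assumed names, of the kind of object being approximated with: a decision tree of bounded depth whose internal tests are hypotheses from a base class $\mathcal{H}$ (threshold hypotheses on the real line in the toy example). It only illustrates the definition, not any algorithm from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A base hypothesis h in H maps an example to {0, 1}.
Hypothesis = Callable[[float], int]

@dataclass
class Node:
    test: Optional[Hypothesis] = None   # internal node: a hypothesis from H
    left: Optional["Node"] = None       # branch taken when test(x) == 0
    right: Optional["Node"] = None      # branch taken when test(x) == 1
    label: Optional[int] = None         # leaf node: predicted label

def predict(tree: Node, x: float) -> int:
    """Evaluate the tree on x; the tree depth is the complexity measure."""
    node = tree
    while node.label is None:
        node = node.right if node.test(x) == 1 else node.left
    return node.label

def depth(tree: Node) -> int:
    """Depth of the tree (0 for a single leaf)."""
    if tree.label is not None:
        return 0
    return 1 + max(depth(tree.left), depth(tree.right))

# Toy base class H: threshold hypotheses on the real line (VC dimension 1).
h1: Hypothesis = lambda x: int(x > 0.3)
h2: Hypothesis = lambda x: int(x > 0.7)
tree = Node(test=h1,
            left=Node(label=0),
            right=Node(test=h2, left=Node(label=1), right=Node(label=0)))
assert predict(tree, 0.5) == 1 and depth(tree) == 2
```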
- [COLT] Efficient Algorithms for Learning Monophonic Halfspaces in Graphs. Marco Bressan, Emmanuel Esposito, and Maximilian Thiessen. COLT ’24, 2024.
We study the problem of learning a binary classifier on the vertices of a graph. In particular, we consider classifiers given by monophonic halfspaces, partitions of the vertices that are convex in a certain abstract sense. Monophonic halfspaces, and related notions such as geodesic halfspaces, have recently attracted interest, and several connections have been drawn between their properties (e.g., their VC dimension) and the structure of the underlying graph $G$. We prove several novel results for learning monophonic halfspaces in the supervised, online, and active settings. Our main result is that a monophonic halfspace can be learned with near-optimal passive sample complexity in time polynomial in $n = |V(G)|$. This requires us to devise a polynomial-time algorithm for consistent hypothesis checking, based on several structural insights on monophonic halfspaces and on a reduction to 2-satisfiability. We prove similar results for the online and active settings. We also show that the concept class can be enumerated with delay $\mathrm{poly}(n)$, and that empirical risk minimization can be performed in time $2^{\omega(G)} \mathrm{poly}(n)$, where $\omega(G)$ is the clique number of $G$. These results answer open questions from the literature (González et al., 2020), and show a contrast with geodesic halfspaces, for which some of the said problems are NP-hard (Seiffarth et al., 2023).
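The reduction to 2-satisfiability itself is not reproduced here, but the subroutine it relies on is standard: deciding satisfiability of a 2-CNF formula, and extracting an assignment, via the implication graph and its strongly connected components. Below is a self-contained Python sketch of that generic 2-SAT solver; the encoding of monophonic-halfspace consistency into clauses is the paper's contribution and is not shown.

```python
def solve_2sat(n, clauses):
    """Generic 2-SAT via implication graph + SCCs (Kosaraju's algorithm).
    Literal encoding: variable v in {0..n-1}; literal 2*v means "v is true",
    2*v+1 means "v is false". clauses is a list of (a, b) literal pairs meaning
    (a OR b). Returns a list of booleans, or None if unsatisfiable."""
    N = 2 * n
    adj = [[] for _ in range(N)]
    radj = [[] for _ in range(N)]
    for a, b in clauses:
        # (a OR b)  ==  (not a -> b) AND (not b -> a)
        adj[a ^ 1].append(b); radj[b].append(a ^ 1)
        adj[b ^ 1].append(a); radj[a].append(b ^ 1)

    # First pass: record vertices by increasing finish time (iterative DFS).
    order, seen = [], [False] * N
    for s in range(N):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, 0)]
        while stack:
            v, i = stack.pop()
            if i < len(adj[v]):
                stack.append((v, i + 1))
                w = adj[v][i]
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, 0))
            else:
                order.append(v)

    # Second pass: extract SCCs on the reverse graph in decreasing finish time.
    comp = [-1] * N
    c = 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            v = stack.pop()
            for w in radj[v]:
                if comp[w] == -1:
                    comp[w] = c
                    stack.append(w)
        c += 1

    assignment = []
    for v in range(n):
        if comp[2 * v] == comp[2 * v + 1]:
            return None  # v and not-v are equivalent: unsatisfiable
        # Kosaraju numbers components in topological order of the condensation,
        # so setting v true is safe exactly when comp[2*v] > comp[2*v+1].
        assignment.append(comp[2 * v] > comp[2 * v + 1])
    return assignment

# Example: (x0 OR x1) AND (NOT x0 OR x1) -- prints a satisfying assignment.
print(solve_2sat(2, [(0, 2), (1, 2)]))   # e.g. [True, True]
```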
2023
- [NeurIPS] On the Minimax Regret for Online Learning with Feedback Graphs. Khaled Eldowa, Emmanuel Esposito, Tommaso Cesari, and Nicolò Cesa-Bianchi. To appear in NeurIPS ’23, 2023.
In this work, we improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs. The best known upper bound for this problem is $\mathcal{O}\bigl(\sqrt{\alpha T \ln K}\bigr)$, where $K$ is the number of actions, $\alpha$ is the independence number of the graph, and $T$ is the time horizon. The $\sqrt{\ln K}$ factor is known to be necessary when $\alpha = 1$ (the experts case). On the other hand, when $\alpha = K$ (the bandits case), the minimax rate is known to be $\Theta\bigl(\sqrt{KT}\bigr)$, and a lower bound $\Omega\bigl(\sqrt{\alpha T}\bigr)$ is known to hold for any $\alpha$. Our improved upper bound $\mathcal{O}\bigl(\sqrt{\alpha T (1 + \ln(K/\alpha))}\bigr)$ holds for any $\alpha$ and matches the lower bounds for bandits and experts, while interpolating intermediate cases. To prove this result, we use FTRL with the $q$-Tsallis entropy for a carefully chosen value of $q \in [1/2, 1)$ that varies with $\alpha$. The analysis of this algorithm requires a new bound on the variance term in the regret. We also show how to extend our techniques to time-varying graphs, without requiring prior knowledge of their independence numbers. Our upper bound is complemented by an improved $\Omega\bigl(\sqrt{\alpha T (\ln K)/\ln\alpha}\bigr)$ lower bound for all $\alpha > 1$, whose analysis relies on a novel reduction to multitask learning. This shows that a logarithmic factor is necessary as soon as $\alpha < K$.
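As a rough illustration of the algorithmic ingredient mentioned above, here is a minimal Python sketch of FTRL with the (negative) $q$-Tsallis entropy regularizer for a $K$-armed bandit, using plain importance-weighted loss estimates. The learning-rate schedule, the fixed choice of $q$, and the bisection-based solver are placeholder assumptions for illustration; they are not the tuning or the analysis from the paper.

```python
import numpy as np

def tsallis_ftrl_distribution(L_hat, eta, q, tol=1e-12):
    """Solve the FTRL step  min_p <L_hat, p> + (1 - sum_i p_i^q) / ((1-q) * eta)
    over the probability simplex. The stationarity condition gives
        p_i = (eta * (1-q)/q * (L_hat_i + lam))^(-1/(1-q)),
    with lam found by bisection so that sum_i p_i = 1."""
    c = eta * (1.0 - q) / q

    def probs(lam):
        return (c * (L_hat + lam)) ** (-1.0 / (1.0 - q))

    lo = -np.min(L_hat) + 1e-15   # just above the pole: sum(probs) is huge here
    hi = lo + 1.0
    while probs(hi).sum() > 1.0:  # grow the bracket until the sum drops below 1
        hi = lo + 2.0 * (hi - lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if probs(mid).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    p = probs(0.5 * (lo + hi))
    return p / p.sum()

def play_bandit(losses, q=0.7, rng=np.random.default_rng(0)):
    """losses: (T, K) array of per-round losses in [0, 1]. Returns actions played."""
    T, K = losses.shape
    L_hat = np.zeros(K)            # cumulative importance-weighted loss estimates
    actions = []
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)     # placeholder learning-rate schedule
        p = tsallis_ftrl_distribution(L_hat, eta, q)
        a = rng.choice(K, p=p)
        L_hat[a] += losses[t - 1, a] / p[a]   # importance weighting
        actions.append(a)
    return actions
```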
- [ICML] Delayed Bandits: When Do Intermediate Observations Help? Emmanuel Esposito, Saeed Masoudian, Hao Qiu, Dirk van der Hoeven, Nicolò Cesa-Bianchi, and Yevgeny Seldin. ICML ’23, 2023.
We study a $K$-armed bandit with delayed feedback and intermediate observations. We consider a model where intermediate observations take the form of a finite state, which is observed immediately after taking an action, whereas the loss is observed after an adversarially chosen delay. We show that the regime of the mapping of states to losses determines the complexity of the problem, irrespective of whether the mapping of actions to states is stochastic or adversarial. If the mapping of states to losses is adversarial, then the regret rate is of order $\sqrt{(K+d)T}$ (within log factors), where $T$ is the time horizon and $d$ is a fixed delay. This matches the regret rate of a $K$-armed bandit with delayed feedback and without intermediate observations, implying that intermediate observations are not helpful. However, if the mapping of states to losses is stochastic, we show that the regret grows at a rate of $\sqrt{\bigl(K + \min\{|\mathcal{S}|, d\}\bigr)T}$ (within log factors), implying that if the number $|\mathcal{S}|$ of states is smaller than the delay, then intermediate observations help. We also provide refined high-probability regret upper bounds for non-uniform delays, together with experimental validation of our algorithms.
2022
- [NeurIPS] Learning on the Edge: Online Learning with Stochastic Feedback Graphs. Emmanuel Esposito, Federico Fusco, Dirk van der Hoeven, and Nicolò Cesa-Bianchi. NeurIPS ’22, 2022.
The framework of feedback graphs is a generalization of sequential decision-making with bandit or full information feedback. In this work, we study an extension where the directed feedback graph is stochastic, following a distribution similar to the classical Erdős-Rényi model. Specifically, in each round every edge in the graph is either realized or not with a distinct probability for each edge. We prove nearly optimal regret bounds of order $\min\bigl\{\min_{\varepsilon} \sqrt{(\alpha_\varepsilon/\varepsilon) T},\, \min_{\varepsilon} (\delta_\varepsilon/\varepsilon)^{1/3} T^{2/3}\bigr\}$ (ignoring logarithmic factors), where $\alpha_\varepsilon$ and $\delta_\varepsilon$ are graph-theoretic quantities measured on the support of the stochastic feedback graph $\mathcal{G}$ with edge probabilities thresholded at $\varepsilon$. Our result, which holds without any preliminary knowledge about $\mathcal{G}$, requires the learner to observe only the realized out-neighborhood of the chosen action. When the learner is allowed to observe the realization of the entire graph (but only the losses in the out-neighborhood of the chosen action), we derive a more efficient algorithm featuring a dependence on weighted versions of the independence and weak domination numbers that exhibits improved bounds for some special cases.
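The quantities $\alpha_\varepsilon$ and $\delta_\varepsilon$ are measured on the support of the stochastic feedback graph after thresholding edge probabilities at $\varepsilon$. The toy Python sketch below shows that thresholding step and a brute-force independence number on the resulting graph; it is exponential in the number of actions and purely illustrative, with all names assumed.

```python
from itertools import combinations

def thresholded_support(edge_prob, eps):
    """edge_prob: dict mapping a directed edge (u, v) to its realization probability.
    Returns the set of edges whose probability is at least eps (the support of the
    stochastic feedback graph thresholded at eps)."""
    return {e for e, p in edge_prob.items() if p >= eps}

def independence_number(nodes, edges):
    """Brute-force independence number of the undirected view of the graph
    (self-loops ignored). Exponential in |nodes|; illustration only."""
    und = {frozenset(e) for e in edges if e[0] != e[1]}
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(frozenset(pair) not in und for pair in combinations(subset, 2)):
                return size
    return 0

# Example: three actions; only the edges (0, 1) and (2, 2) survive eps = 0.5.
probs = {(0, 1): 0.9, (1, 0): 0.2, (1, 2): 0.4, (2, 2): 0.8}
support = thresholded_support(probs, eps=0.5)
print(independence_number([0, 1, 2], support))   # prints 2 (e.g. the set {0, 2})
```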
2020
- [ALENEX] RecSplit: Minimal Perfect Hashing via Recursive Splitting. Emmanuel Esposito, Thomas Mueller Graf, and Sebastiano Vigna. In Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), 2020.
A minimal perfect hash function bijectively maps a key set $S$ out of a universe $U$ into the first $|S|$ natural numbers. Minimal perfect hash functions are used, for example, to map irregularly-shaped keys, such as strings, in a compact space so that metadata can then be simply stored in an array. While it is known that just 1.44 bits per key are necessary to store a minimal perfect hash function, no published technique can go below 2 bits per key in practice. We propose a new technique for storing minimal perfect hash functions with expected linear construction time and expected constant lookup time that makes it possible to build for the first time, for example, structures which need 1.56 bits per key, that is, within 8.3% of the lower bound, in less than 2 ms per key. We show that instances of our construction are able to simultaneously beat the construction time, space usage and lookup time of the state-of-the-art data structure reaching 2 bits per key. Moreover, we provide parameter choices giving structures which are competitive with alternative, larger-size data structures in terms of space and lookup time. The construction of our data structures can be easily parallelized or mapped on distributed computational units (e.g., within the MapReduce framework), and structures larger than the available RAM can be directly built in mass storage.
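As a flavor of the recursive-splitting idea, the toy Python sketch below shows the leaf-level step on which such constructions rest: brute-forcing a seed under which a seeded hash maps a tiny bucket of keys bijectively onto $\{0, \dots, m-1\}$. The hash function, the bucket size, and the absence of any recursion or compact seed encoding are simplifications assumed here; this is not the paper's actual data structure.

```python
from hashlib import blake2b

def h(key: bytes, seed: int, m: int) -> int:
    """A seeded hash into {0, ..., m-1} (a stand-in for the paper's hash family)."""
    digest = blake2b(key, digest_size=8, salt=seed.to_bytes(16, "little")).digest()
    return int.from_bytes(digest, "little") % m

def find_bijection_seed(keys: list) -> int:
    """Smallest seed whose hash values are a permutation of {0, ..., len(keys)-1}.
    The success probability of each trial is roughly m!/m^m, so this is only
    viable for tiny buckets, which is why buckets are split recursively first."""
    m = len(keys)
    seed = 0
    while True:
        values = {h(k, seed, m) for k in keys}
        if len(values) == m:        # no collisions => bijection onto {0..m-1}
            return seed
        seed += 1

# Example: a minimal perfect hash for a 4-key bucket.
bucket = [b"apple", b"pear", b"plum", b"fig"]
s = find_bijection_seed(bucket)
assert sorted(h(k, s, len(bucket)) for k in bucket) == [0, 1, 2, 3]
```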