PhD defense: Matthieu Bulté - From Points to Objects: Statistical Inference Beyond Euclidean Spaces
Date: Friday, 16 May 2025, at 10.15
Venue: Aud 6, HCØ, Universitetsparken 5, 2100 Copenhagen Ø
Academic Advisors
Professor Helle Sørensen, University of Copenhagen
Professor Christiane Fuchs, University of Bielefeld
Assessment Committee
Professor Bo Markussen (chair), University of Copenhagen
Professor Dietmar Bauer, University of Bielefeld
Professor Victor Panaretos, École Polytechnique Fédérale de Lausanne
Summary: Random variables taking values in metric spaces, called random objects, have recently received additional attention in the statistical literature. This abstraction, which only requires a notion of distance between data points, allows for the development of statistical methodology that can be applied to a wide range of complex data types. In particular, this includes types of data not typically covered by existing works. Just as in classical statistics, practitioners encountering such complex data might be interested in answering a wide range of questions. In this thesis, we present the results of three independent projects, each addressing a typical statistical task.
The first chapter provides an introduction to the thesis. It consists of a motivation for the work carried out, together with the background on metric spaces and their statistical study needed to make this manuscript self-contained. This includes a brief introduction to metric spaces and to relevant geometric concepts. Then, we introduce the notion of random objects together with a generalization of the expected value that is central to this thesis: the Fréchet mean of a random object. Each of the remaining chapters corresponds to one of the projects of the thesis and opens with a brief introduction consisting of a motivation and a presentation of the main contribution.
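For readers unfamiliar with the concept, the standard definition of the Fréchet mean can be written as follows. The notation is an assumption made here for illustration and is not taken from the thesis: $(\Omega, d)$ denotes the metric space, $X$ a random object in $\Omega$, and $X_1, \dots, X_n$ a sample.

```latex
% Population Fréchet mean: a minimizer of the expected squared distance
\mu \in \operatorname*{arg\,min}_{\omega \in \Omega} \mathbb{E}\left[ d(X, \omega)^2 \right]

% Empirical Fréchet mean of a sample X_1, ..., X_n
\hat{\mu}_n \in \operatorname*{arg\,min}_{\omega \in \Omega} \frac{1}{n} \sum_{i=1}^{n} d(X_i, \omega)^2
```

When $\Omega = \mathbb{R}^p$ with the Euclidean distance and $X$ has finite second moment, the minimizer is the usual expectation $\mathbb{E}[X]$. In a general metric space, a minimizer need not exist or be unique.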
In the second chapter, we are concerned with regressing a random object on a vector of real numbers, that is, we attempt to learn from an independent and identically distributed sample the conditional expectation of a random object given a vector of real numbers. To that end, we present an adaptation of random forests together with an approximate tree construction algorithm. Our approximation algorithm makes it possible to perform regression in metric spaces where Fréchet means are expensive to compute. We show that the proposed method, the Metric Random Forest (MRF), is pointwise consistent and we provide a simulation study to illustrate its performance.
Time series data, where observations are collected sequentially over time, naturally raise questions about temporal dependence between successive measurements. In the third chapter, we use the additional structure present in Hadamard spaces to extend the classical autoregressive process of order one, introducing the Geodesic Autoregressive Model (GAR). We provide estimators for its parameters, and propose a statistical test for the absence of temporal dependence. Furthermore, we study the asymptotic properties of the estimators and the test. We illustrate the methodology in a simulation study and in an application to economic surveys of consumer expectations.
Finally, in the fourth chapter, we consider the more elementary problem of testing hypotheses about the Fréchet mean of a random object. We introduce a test based on the empirical Fréchet variance of the sample and propose a randomization methodology exploiting the symmetries of the metric space. We illustrate the methodology in various kinds of metric spaces, investigate the performance of the test under various conditions in a simulation study, and apply it to a wind data set from Denmark.