TL;DR: This survey treats recursive distributional equations of the form X =^d g((\xi_i,X_i),i\geq 1), emphasizing examples where g(\cdot) is essentially a ``maximum'' or ``minimum'' function, and draws attention to the theoretical question of endogeny: in the associated recursive tree process X_i, are the X_i measurable functions of the innovations process (\xi_i)?
Abstract: In certain problems in a variety of applied probability settings (from probabilistic analysis of algorithms to statistical physics), the central requirement is to solve a recursive distributional equation of the form X =^d g((\xi_i,X_i),i\geq 1). Here (\xi_i) and g(\cdot) are given and the X_i are independent copies of the unknown distribution X. We survey this area, emphasizing examples where the function g(\cdot) is essentially a ``maximum'' or ``minimum'' function. We draw attention to the theoretical question of endogeny: in the associated recursive tree process X_i, are the X_i measurable functions of the innovations process (\xi_i)?
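As a rough illustration of the recursion X =^d g((\xi_i,X_i),i\geq 1), the sketch below samples a finite-depth version of a recursive tree process. Everything specific in it is an assumption made for illustration and is not taken from the survey: the map `g_min` (minimum over children of \xi_i + X_i), the binary branching, the Exp(1) innovations, and the boundary value 0 at the leaves. With a fixed boundary condition the root value is trivially a function of the innovations alone; the endogeny question concerns the infinite-depth, stationary process.

```python
import random


def g_min(pairs):
    """Hypothetical 'minimum'-type map g: min over children of xi_i + X_i."""
    return min(xi + x for xi, x in pairs)


def sample_x(depth, rng, branching=2):
    """Sample the root value of a finite-depth recursive tree process.

    Assumptions (illustrative only): leaves carry the boundary value 0,
    each child edge carries an independent Exp(1) innovation xi_i, and
    the value at a vertex is g_min applied to its children's (xi_i, X_i).
    """
    if depth == 0:
        return 0.0
    pairs = [
        (rng.expovariate(1.0), sample_x(depth - 1, rng, branching))
        for _ in range(branching)
    ]
    return g_min(pairs)


rng = random.Random(0)
samples = [sample_x(6, rng) for _ in range(200)]
mean_estimate = sum(samples) / len(samples)
```

Increasing `depth` and comparing the empirical distributions of `samples` is one informal way to see whether the iteration appears to settle toward a fixed point of the equation for this particular choice of g.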
TL;DR: For models of random finite trees of increasing size, the subtree at a uniform random vertex often converges in distribution to a limit random tree; the authors introduce a structure theory for such asymptotic fringe distributions and illustrate it with many examples.
Abstract: Consider some model of random finite trees of increasing size. It often happens that the subtree at a uniform random vertex converges in distribution to a limit random tree. We introduce some structure theory for such asymptotic fringe distributions and illustrate with many examples.
TL;DR: A process of growing a random recursive tree Tn is studied, and the sequence (Tn) is shown to be a sequence of “snapshots” of a Crump–Mode branching process; this connection provides a short proof of Devroye's limit law for the height of a random m‐ary search tree.
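For readers unfamiliar with the object, the discrete growth rule of a random recursive tree can be sketched as follows, assuming the standard definition in which each new node attaches to a uniformly chosen existing node. The function name and the array representation are illustrative choices; the sketch does not implement the Crump–Mode embedding or the m‐ary search tree, only the growth process and its height.

```python
import random


def grow_random_recursive_tree(n, rng):
    """Grow a random recursive tree on nodes 0..n-1.

    Node 0 is the root; node k (k >= 1) attaches to a parent chosen
    uniformly at random among the existing nodes 0..k-1.
    Returns (parent, depth), with parent[0] == -1 and depth[0] == 0.
    """
    parent = [-1]
    depth = [0]
    for k in range(1, n):
        p = rng.randrange(k)  # uniform over existing nodes 0..k-1
        parent.append(p)
        depth.append(depth[p] + 1)
    return parent, depth


rng = random.Random(1)
parent, depth = grow_random_recursive_tree(1000, rng)
height = max(depth)  # height of T_n = largest node depth
```

Recording `height` along the growth (for n = 10, 100, 1000, ...) gives an informal view of how slowly the height increases with n.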
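TL;DR: This survey treats recursive distributional equations of the form $X\mathop{=}\limits^{d}\,g((\xi_{i},X_{i}),i\geq 1)$, where (ξi) and g(⋅) are given and the Xi are independent copies of the unknown distribution X, and draws attention to the question of endogeny: in the associated recursive tree process Xi, are the Xi measurable functions of the innovations process (ξi)?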
Abstract: In certain problems in a variety of applied probability settings (from probabilistic analysis of algorithms to statistical physics), the central requirement is to solve a recursive distributional equation of the form $X\mathop{=}\limits^{d}\,g((\xi_{i},X_{i}),i\geq 1)$. Here (ξi) and g(⋅) are given and the Xi are independent copies of the unknown distribution X. We survey this area, emphasizing examples where the function g(⋅) is essentially a “maximum” or “minimum” function. We draw attention to the theoretical question of endogeny: in the associated recursive tree process Xi, are the Xi measurable functions of the innovations process (ξi)?
TL;DR: The evtree package is described, which implements an evolutionary algorithm for learning globally optimal classification and regression trees in R, providing unified infrastructure for summaries, visualizations, and predictions.
Abstract: Commonly used classification and regression tree methods like the CART algorithm are recursive partitioning methods that build the model in a forward stepwise search. Although this approach is known to be an efficient heuristic, the results of recursive tree methods are only locally optimal, as splits are chosen to maximize homogeneity at the next step only. An alternative way to search over the parameter space of trees is to use global optimization methods like evolutionary algorithms. This paper describes the evtree package, which implements an evolutionary algorithm for learning globally optimal classification and regression trees in R. Computationally intensive tasks are fully computed in C++ while the partykit package is leveraged for representing the resulting trees in R, providing unified infrastructure for summaries, visualizations, and predictions. evtree is compared to the open-source CART implementation rpart, conditional inference trees (ctree), and the open-source C4.5 implementation J48. A benchmark study of predictive accuracy and complexity is carried out in which evtree achieved at least similar and most of the time better results compared to rpart, ctree, and J48. Furthermore, the usefulness of evtree in practice is illustrated in a textbook customer classification task.