The Lindy Effect: The Concept and the Math

Nassim N. Taleb recently posted his formalization of the Lindy effect. I wrote this piece to elaborate on his ideas. I used R to plot everything and you can find the code here, but I also provide example Mathematica code.

15 min read · Feb 1, 2021

An outline of what I want to discuss:

A conceptual introduction to the Lindy effect
The Lindy effect as a decreasing hazard function
Brownian motion: Strong vs. weak Lindy
Does Lindy require power laws? The Weibull and the Gamma distributions
Intuition for the effect of tails: Killing Lindy with a reflecting barrier
Conclusion

1. Introduction

The Lindy effect is the idea that the remaining life expectancy of a nonperishable thing increases with age — the older it is, the more likely it is to survive. Nonperishable means that the thing we are considering does not have an organic limit to its lifetime (e.g., technologies, ideas, etc.). So Beethoven’s music is more likely to survive than Taylor Swift’s, and the Bible is more likely to survive than Harry Potter. The idea was first explored by Benoit Mandelbrot in The Fractal Geometry of Nature (1982) and popularized by Nassim N. Taleb in his 2012 book Antifragile. Taleb provided a framework to think about the Lindy effect, explaining that it implies antifragility through time. Whereas fragile things are harmed by stress / volatility, antifragile things benefit from it. Time being a stressor, things that show the Lindy effect benefit from time. Taleb wrote that without a natural upper bound, the distribution of an event time “…is constrained only by fragility” and follows a power law (Antifragile, p. 317 of the 2014 publication). Under these circumstances, the remaining life expectancy is proportional to past survival. For example, the life expectancy of a 100-year-old technology is expected to be ten times as long as the life expectancy of a 10-year-old technology. However, life expectancy is a “probabilistically derived average” (Antifragile, p. 319), meaning we are not talking about a natural law.

For many years, the Lindy effect did not have an established mathematical formalization. The statistician John D. Cook pointed out in 2012 that Lindy meant a decreasing hazard function. This paper analyzed the Pareto distribution using the same logic, showing that it implies the Lindy effect. I’ll explain the meaning of a hazard function and how to compute one below. But the idea of using hazard functions to formalize the Lindy effect didn’t take off until recently. A few years ago, Taleb started to use the stopping time of Brownian motion to explore the Lindy effect. In his recent work that I linked above, he analyzed the stopping time distribution using survival and hazard functions. It is important to understand what he did there. So, I’ll first describe the functions that define the Lindy effect. Then I’ll give an overview of what Brownian motion is and why it makes sense to use it when studying Lindy. After looking at the behavior of Lindy with Brownian motion, we’ll analyze some probability distributions and see that under certain conditions, heavy tailed lifetimes that do not follow a power law can still show the Lindy effect.

2. The Lindy Effect as a Decreasing Hazard Function

Having defined the Lindy effect conceptually, let’s define it mathematically. A thing obeys the Lindy effect if the conditional expectation of its remaining lifetime beyond a time point, given that it has survived until that point, increases (Eliazar, 2017). Whether or not this is the case can be tested with survival analysis (Aalen & Gjessing, 2001). Survival analysis examines the time between a starting point and an event such as death. The aim is to analyze the probability distribution of this time, using survival and hazard functions. If the conditional probability of survival increases, then the conditional probability of dying in a time interval, given survival until then, should decrease. If we divide this probability by the length of the time interval we are considering and let the interval shrink to zero, we get the hazard function, also called the “force of mortality”, or the instantaneous rate of death/failure. Writing this out formally will hopefully clarify what I just said. If T is the time of death, we are interested in:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$

Note that since this is a rate and not a probability, it can take values larger than 1. The hazard rate can be computed using the following expression:

$$h(t) = \frac{f(t)}{S(t)}$$

where f(t) is the density function and S(t) is the survival function. The survival function of a distribution evaluated at a value gives the probability that the random variable will be larger than that value, P(T > t). The survival function is given by

$$S(t) = P(T > t) = 1 - F(t)$$

where F(t) is the cumulative distribution function (CDF). The CDF gives the probability that the lifetime will take a value less than or equal to a given time point, so the survival function is the complement of the CDF. Integrating the hazard function gives the cumulative hazard function H(t), which is the risk of death accumulated until a given time t. Note the following identities that will be helpful when we derive hazard functions:

$$f(t) = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt}, \qquad h(t) = \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)} = -\frac{d}{dt}\ln S(t)$$

The density function is the derivative of the CDF, which is the complement of the survival function, so when we differentiate the survival function and change signs, we get back the density function. The last identity follows from differentiating a logarithm. Integrating it from 0 to t, with S(0) = 1, gives ln S(t) = -H(t). Now, we write the survival function in terms of the cumulative hazard:

$$S(t) = \exp(-H(t)), \qquad H(t) = \int_0^t h(u)\,du$$

To sum up, the hazard function gives the rate of death immediately after a point, given survival until that point. A decreasing hazard function means that the chances of survival are improving over time, which is the Lindy effect. Often a hazard function is not monotonic but has an inverted U shape; it increases until it peaks at a value, then decreases asymptotically. This means that Lindy needs time to kick in. Taleb called a monotonically decreasing hazard function “the strong Pareto property”. For simplicity’s sake, let’s call it “strong Lindy” when we have an entirely non-increasing hazard function and “weak Lindy” when it first increases and then decreases. We will now look at some examples of hazard functions, starting with the stopping time of Brownian motion.
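As a rough numerical illustration of these definitions, the sketch below computes h(t) = f(t)/S(t) for a Pareto and an exponential lifetime, and checks that S(t) = exp(-H(t)). The parameter values (`ALPHA`, `T_MIN`, `RATE`) and the helper names are assumptions chosen for illustration; they are not taken from the plots in this article.

```python
import math

def hazard(pdf, survival, t):
    """Hazard rate h(t) = f(t) / S(t)."""
    return pdf(t) / survival(t)

def cumulative_hazard(pdf, survival, t, start=0.0, steps=20_000):
    """Trapezoidal approximation of the integral of h(u) from start to t."""
    width = (t - start) / steps
    total = 0.5 * (hazard(pdf, survival, start) + hazard(pdf, survival, t))
    for i in range(1, steps):
        total += hazard(pdf, survival, start + i * width)
    return total * width

# Illustrative parameters (assumptions, not values from the article).
ALPHA, T_MIN = 1.5, 1.0  # Pareto tail index and minimum lifetime
RATE = 0.5               # exponential rate

def pareto_pdf(t):
    return ALPHA * T_MIN**ALPHA / t**(ALPHA + 1)

def pareto_survival(t):
    return (T_MIN / t)**ALPHA

def exp_pdf(t):
    return RATE * math.exp(-RATE * t)

def exp_survival(t):
    return math.exp(-RATE * t)

ages = [1.0, 2.0, 5.0, 10.0]
pareto_hazards = [hazard(pareto_pdf, pareto_survival, t) for t in ages]
exp_hazards = [hazard(exp_pdf, exp_survival, t) for t in ages]

# Check S(t) = exp(-H(t)); the Pareto hazard is integrated from T_MIN, where S = 1.
H_10 = cumulative_hazard(pareto_pdf, pareto_survival, 10.0, start=T_MIN)
survival_from_hazard = math.exp(-H_10)

if __name__ == "__main__":
    print("Pareto hazards:     ", pareto_hazards)   # falls with age (ALPHA / t)
    print("Exponential hazards:", exp_hazards)      # constant (RATE)
    print("S(10) directly:", pareto_survival(10.0), "via exp(-H):", survival_from_hazard)
```

The Pareto hazard works out to ALPHA / t, so it falls with age (strong Lindy), while the memoryless exponential keeps a constant hazard.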

3. Brownian Motion and the Lindy Effect

Brownian motion (BM) is the stochastic motion of particles in fluid. It was discovered by the botanist Robert Brown while observing the behavior of pollen grains in water. He described this behavior as irregular and seemingly random. Later, Einstein showed that the behavior of these particles was indeed random and independent of past motion. Although the original significance of Brownian motion was in proving the existence of atoms and molecules, the random fluctuations it describes can be observed in many phenomena. For example, Brownian motion is used in finance to model stock prices. Brownian motion has three main forms: standard Brownian motion, arithmetic Brownian motion (ABM, i.e., BM with drift), and geometric Brownian motion (GBM). I’ll briefly describe each.

A real valued stochastic process W(t) is a standard Brownian motion (also called a Wiener process) if it has the following properties (Dahl, 2010):

The process starts at W(0) = 0.
The process has independent increments: for all t ≥ 0 and s > 0, the increment W(t+s)–W(t) is independent of the values of the process up to time t.
The increments are normal: W(t+s)–W(t)~N(0, s).
The process is continuous in time.

A stochastic process S(t) is an arithmetic Brownian motion if it follows the stochastic differential equation (SDE):

$$dS(t) = \mu\,dt + \sigma\,dW(t)$$

with initial value S(0) = s(0). This has the solution

$$S(t) = s(0) + \mu t + \sigma W(t)$$

where mu is the drift term and will be negative in our case. Having negative drift means that despite the randomness in each step, the process has the general tendency to go down. If it had positive drift, it would go up on average. In contrast, the standard Brownian motion does not have an average tendency in either direction. It is called arithmetic Brownian motion because the drift term scales only with the time increment, but not the current value. So, it affects the process additively. Sigma is the diffusion term and scales the volatility. Geometric Brownian motion is the solution to the SDE

$$dS(t) = \mu S(t)\,dt + \sigma S(t)\,dW(t)$$

which is

$$S(t) = s(0)\exp\left(\left(\mu - \frac{\sigma^2}{2}\right)t + \sigma W(t)\right)$$

Unlike ABM, GBM is a multiplicative process; whereas the drift term in ABM is constant and is added to the current value at each time step, in GBM it is proportional to the current value. The standard BM always starts from 0 but ABM and GBM do not have to.
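To make the additive versus multiplicative distinction concrete, here is a small sketch that builds an ABM path and a GBM path from the same Brownian increments, using the closed-form solutions S(t) = s(0) + mu*t + sigma*W(t) and S(t) = s(0)*exp((mu - sigma²/2)*t + sigma*W(t)). It also checks that the logarithm of the GBM is an ABM with drift mu - sigma²/2. The function names, the seed, and the parameter values are assumptions for illustration only.

```python
import math
import random

def brownian_increments(n_steps, dt, rng):
    """Independent N(0, dt) increments of a standard Brownian motion."""
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]

def abm_path(s0, mu, sigma, dt, increments):
    """Arithmetic BM: S(t) = s0 + mu*t + sigma*W(t)."""
    path, w = [s0], 0.0
    for i, dw in enumerate(increments, start=1):
        w += dw
        path.append(s0 + mu * i * dt + sigma * w)
    return path

def gbm_path(s0, mu, sigma, dt, increments):
    """Geometric BM: S(t) = s0 * exp((mu - sigma^2/2)*t + sigma*W(t))."""
    path, w = [s0], 0.0
    for i, dw in enumerate(increments, start=1):
        w += dw
        path.append(s0 * math.exp((mu - 0.5 * sigma**2) * i * dt + sigma * w))
    return path

# Illustrative settings (assumptions, not the values behind the figures).
rng = random.Random(42)
DT, N_STEPS = 0.01, 1000
MU, SIGMA, S0 = -0.5, 0.3, 2.0

dW = brownian_increments(N_STEPS, DT, rng)
gbm = gbm_path(S0, MU, SIGMA, DT, dW)

# The log of a GBM is an ABM started at log(S0) with drift MU - SIGMA^2 / 2.
log_abm = abm_path(math.log(S0), MU - 0.5 * SIGMA**2, SIGMA, DT, dW)
max_gap = max(abs(math.log(g) - a) for g, a in zip(gbm, log_abm))

if __name__ == "__main__":
    print("final GBM value:", gbm[-1])
    print("largest gap between log(GBM) and the ABM:", max_gap)
```

Because the two paths share the same noise, the gap is only floating-point error, which is why a barrier problem for GBM can be handled as a barrier problem for ABM on the log scale.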

The relevance of Brownian motion is that it is useful in modeling death / failure times because the time of death can be represented as the stopping time of Brownian motion. The stopping time of Brownian motion is defined as the first time the process reaches a threshold value B. Writing S(t) for the process (W(t) in the standard case):

$$\tau = \inf\{t \ge 0 : S(t) = B\}$$

The threshold B is called the absorbing barrier. Standard BM and ABM with absorbing barriers are visualized in Figure 1 and Figure 2:

Figure 1. Standard Brownian motion with an absorbing barrier (the red line). 100 sample paths simulated for 1000 time points.

Figure 2. Arithmetic Brownian motion with an absorbing barrier (the red line). 100 sample paths simulated for 1000 time points. Since drift is negative, we are approaching the barrier from above.
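A simulation along the lines of Figure 2 can be sketched as follows. This is not the R code behind the figures: it is a minimal Euler discretization with assumed parameters (`S0`, `BARRIER`, `MU`, `SIGMA`, `DT`) that records the first time each path falls to the barrier. For ABM with negative drift, the mean stopping time should be close to (S(0) - B)/|mu|, which the sketch uses as a sanity check.

```python
import math
import random

def abm_stopping_time(s0, barrier, mu, sigma, dt, max_time, rng):
    """First time a discretized ABM started at s0 is at or below the barrier.

    Returns None if the barrier is not reached before max_time.
    """
    s = s0
    step_sd = sigma * math.sqrt(dt)
    for i in range(1, round(max_time / dt) + 1):
        s += mu * dt + rng.gauss(0.0, step_sd)
        if s <= barrier:
            return i * dt
    return None

# Illustrative settings (assumptions): start at 1, barrier at 0, drift -1.
rng = random.Random(0)
S0, BARRIER, MU, SIGMA = 1.0, 0.0, -1.0, 1.0
DT, MAX_TIME, N_PATHS = 0.01, 50.0, 2000

times = [abm_stopping_time(S0, BARRIER, MU, SIGMA, DT, MAX_TIME, rng)
         for _ in range(N_PATHS)]
hit_times = [t for t in times if t is not None]

sample_mean = sum(hit_times) / len(hit_times)
theoretical_mean = (S0 - BARRIER) / abs(MU)  # mean first-passage time of ABM

if __name__ == "__main__":
    print("paths absorbed:", len(hit_times), "of", N_PATHS)
    print("sample mean stopping time:", sample_mean)
    print("theoretical mean:", theoretical_mean)
```

Checking the barrier only at discrete steps misses some crossings, so the simulated mean tends to sit slightly above the theoretical value; a smaller `DT` narrows the gap.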

The stopping time distributions of standard BM and ABM with constant absorbing barriers have well known solutions. The stopping time of standard BM has a Levy distribution. For the derivation, check out Kyle Siegrist’s material here. The proof requires an understanding of the reflection principle, which is clearly explained in this video. The stopping time of ABM has an inverse Gaussian (IG) distribution. Several proofs are given in this Stack Exchange post. GBM can be transformed to ABM as we will see below.

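The Levy distribution has the PDF:

$$f(x; \mu, c) = \sqrt{\frac{c}{2\pi}}\,\frac{\exp\left(-\frac{c}{2(x - \mu)}\right)}{(x - \mu)^{3/2}}, \qquad x > \mu$$

where mu is the location parameter and c is the scale parameter. The relationship between the scale parameter and the level of the absorbing barrier is c=B². At the tail, the density decays following a power law, meaning the Levy distribution is a fat tailed distribution. Figure 3 shows the density and how it changes for different values of the scale parameter:

Figure 3. The density of the Levy distribution for different scale parameters.

The CDF of Levy is:

$$F(x; \mu, c) = \operatorname{erfc}\left(\sqrt{\frac{c}{2(x - \mu)}}\right)$$

which does not have a closed form in elementary functions, since erfc is the complementary error function. So, the hazard function becomes

$$h(x) = \frac{f(x)}{1 - F(x)} = \sqrt{\frac{c}{2\pi}}\,\frac{\exp\left(-\frac{c}{2(x - \mu)}\right)}{(x - \mu)^{3/2}\,\operatorname{erf}\left(\sqrt{\frac{c}{2(x - \mu)}}\right)}$$

When we plot this function for increasing values of x (which denotes time since we are talking about the distribution of lifetimes), we get an inverted U (Figure 4):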

Figure 4. The hazard function of the Levy distribution increases until it reaches a peak, and then decreases asymptotically.

This is what Taleb found as well. Example Mathematica code for Levy with location = 0, scale = 1:

(* Hazard function of the Levy distribution with location 0 and scale \[Sigma] *)
hf3[\[Sigma]_, x_] := HazardFunction[LevyDistribution[0, \[Sigma]], x]
(* Plot the hazard for scale = 1 *)
Plot[hf3[1, x], {x, 0, 3}]
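For readers without Mathematica, the same hazard can be sketched in Python with only the standard library. The survival function is written as erf(sqrt(c / (2(x - mu)))), which is 1 minus the Levy CDF. The grid, the function names, and the point x = 1000 used to probe the tail are assumptions made for illustration.

```python
import math

def levy_pdf(x, c=1.0, loc=0.0):
    """Density of the Levy distribution with scale c and location loc."""
    z = x - loc
    return math.sqrt(c / (2.0 * math.pi)) * math.exp(-c / (2.0 * z)) / z**1.5

def levy_survival(x, c=1.0, loc=0.0):
    """S(x) = 1 - erfc(sqrt(c / (2(x - loc)))) = erf(sqrt(c / (2(x - loc))))."""
    return math.erf(math.sqrt(c / (2.0 * (x - loc))))

def levy_hazard(x, c=1.0, loc=0.0):
    """Hazard h(x) = f(x) / S(x)."""
    return levy_pdf(x, c, loc) / levy_survival(x, c, loc)

# Evaluate on a grid from 0.05 to 10 (location 0, scale 1).
grid = [0.05 * i for i in range(1, 201)]
hazards = [levy_hazard(x) for x in grid]

peak_index = max(range(len(grid)), key=hazards.__getitem__)
peak_time = grid[peak_index]

# Far in the tail the hazard should behave like 1 / (2x), so this ratio is near 1.
tail_ratio = levy_hazard(1000.0) * 2.0 * 1000.0

if __name__ == "__main__":
    print("hazard peaks near x =", peak_time)
    print("h(1000) * 2 * 1000 =", tail_ratio)
```

The hazard rises from near zero, peaks at a fraction of the scale parameter, and then decays, which is the inverted U described here.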

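So we have a “pre-Lindy” period as Taleb calls it. We noted that the asymptotic behavior of the density is a power law. With location 0, for large x it is approximately:

$$f(x) \approx \sqrt{\frac{c}{2\pi}}\,x^{-3/2}$$

which we can integrate to reach the survival function in a closed form

$$S(x) \approx \int_x^\infty \sqrt{\frac{c}{2\pi}}\,t^{-3/2}\,dt = \sqrt{\frac{2c}{\pi}}\,x^{-1/2}$$

and get the hazard function:

$$h(x) = \frac{f(x)}{S(x)} \approx \frac{1}{2x}$$

which is what Taleb got via a shorter route. This is a monotonically decreasing function, which fits what Figure 4 shows: Lindy kicks in after a period of time, and from then on the hazard keeps falling.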

What I don’t understand is Taleb’s claim that “any amount of negative drift causes the stopping time distribution to exit the power law class, hence lose the ‘Lindy’ attribute” (Remark 8 on p. 54). True, drift does make the stopping time distribution exit the power law class, but we don’t really lose the Lindy attribute because the hazard function is qualitatively the same: an inverted U. So we didn’t have strong Lindy in the first place and have weak Lindy in both cases. Why is that? As mentioned, drifted Brownian motion has an IG distribution, characterized by a mean and a shape parameter. Specifically, the stopping time distribution of ABM with negative drift (Primozic, 2011) and an absorbing barrier B < S(0) is:

$$\tau \sim \mathrm{IG}\left(\frac{S(0) - B}{|\mu|},\ \frac{(S(0) - B)^2}{\sigma^2}\right)$$

where mu is the drift and sigma squared is the squared diffusion term; the first argument is the mean and the second is the shape. In the case of GBM, if Y(t) is a GBM, then X(t)=log Y(t) is an ABM. So the distribution of the GBM stopping time with the absorbing barrier L < S(0) is again IG, but with log-transformed parameters (Primozic, 2011; see the solution to the SDE for the new drift term).

The PDF of the IG with mean m and shape lambda is

$$f(x; m, \lambda) = \sqrt{\frac{\lambda}{2\pi x^3}}\,\exp\left(-\frac{\lambda (x - m)^2}{2 m^2 x}\right), \qquad x > 0$$

and the CDF is

$$F(x; m, \lambda) = \Phi\left(\sqrt{\frac{\lambda}{x}}\left(\frac{x}{m} - 1\right)\right) + \exp\left(\frac{2\lambda}{m}\right)\Phi\left(-\sqrt{\frac{\lambda}{x}}\left(\frac{x}{m} + 1\right)\right)$$

where Phi is the standard normal CDF. This doesn’t have a closed form so we will plot the hazard function from the formula h(x) = f(x)/(1-F(x)). Figure 5 shows the density and Figure 6 shows the hazard function of the IG distribution for different values of mean and shape.

Figure 5. The density of the Inverse Gaussian distribution.

Figure 6. The hazard function of the Inverse Gaussian distribution.
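The hazard in Figure 6 can be approximated numerically from h(x) = f(x)/(1 - F(x)). The sketch below uses the standard inverse Gaussian density and CDF in the mean and shape parameterization, with the normal CDF built from `math.erfc`. The parameter values (`MEAN`, `SHAPE`) and the evaluation points are assumptions for illustration, not the ones used in the figure.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, written with erfc to stay accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def ig_pdf(x, mean, shape):
    """Inverse Gaussian density with the given mean and shape."""
    return (math.sqrt(shape / (2.0 * math.pi * x**3))
            * math.exp(-shape * (x - mean)**2 / (2.0 * mean**2 * x)))

def ig_survival(x, mean, shape):
    """S(x) = 1 - F(x), rearranged so that no value close to 1 is subtracted from 1."""
    r = math.sqrt(shape / x)
    return (std_normal_cdf(-r * (x / mean - 1.0))
            - math.exp(2.0 * shape / mean) * std_normal_cdf(-r * (x / mean + 1.0)))

def ig_hazard(x, mean, shape):
    """Hazard h(x) = f(x) / S(x)."""
    return ig_pdf(x, mean, shape) / ig_survival(x, mean, shape)

# Illustrative parameters (assumptions): mean 1, shape 1.
MEAN, SHAPE = 1.0, 1.0
points = [0.1, 0.5, 2.0, 20.0]
hazards = [ig_hazard(x, MEAN, SHAPE) for x in points]

# Known limit of the IG hazard for large x.
limit = SHAPE / (2.0 * MEAN**2)

if __name__ == "__main__":
    for x, h in zip(points, hazards):
        print(f"h({x}) = {h:.4f}")
    print("large-x limit:", limit)
```

With these values the hazard rises, peaks, and then declines, so it has the same inverted U shape as the Levy case. One difference from the Levy hazard is that it levels off at shape / (2 * mean²) instead of decaying like 1/(2x).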

As we can see in Figure 6, the time at which Lindy kicks in depends on the mean and the shape of the distribution, which in turn depend on the drift and the distance between the starting value and the absorbing barrier. But qualitatively, both drifted BM and standard BM show weak Lindy. We now look at some other distributions that can be relevant for the Lindy effect.

4. Lindy Without Power Laws

The Weibull distribution is popular in fields that use survival analysis, such as actuarial science and engineering. The reason is that it can model all kinds of hazard functions, whether increasing, decreasing, or constant. It all depends on the value of its shape parameter, as we’ll see in a second. The Weibull distribution can also have heavy tails and be useful in modeling extreme observations. But its tails are not as heavy as those of a power law distribution, meaning that extreme values are not as likely under a Weibull model as they are under a power law model (Kizilersu, Kreer, & Thomas, 2018).

We could derive the survival and then the hazard function of the Weibull distribution by integrating its PDF, but there is a more informative way. The Weibull distribution is in fact derived from a specific hazard function (Hogg, Tanis, & Zimmerman, 2013). So we’ll derive its PDF from this hazard function, rather than the other way around. We’ll also see that the hazard function decreases monotonically for certain parameter values. Define the cumulative hazard function:

$$H(x) = \left(\frac{x}{\lambda}\right)^{\kappa}, \qquad x \ge 0$$

We differentiate to get the hazard function:

$$h(x) = \frac{dH(x)}{dx} = \frac{\kappa}{\lambda}\left(\frac{x}{\lambda}\right)^{\kappa - 1}$$

Recalling the identities noted above, the survival function is

$$S(x) = \exp(-H(x)) = \exp\left(-\left(\frac{x}{\lambda}\right)^{\kappa}\right)$$

Since h(x) = f(x)/S(x), f(x) = h(x)S(x). So, we get

$$f(x) = \frac{\kappa}{\lambda}\left(\frac{x}{\lambda}\right)^{\kappa - 1}\exp\left(-\left(\frac{x}{\lambda}\right)^{\kappa}\right)$$

which is the density of the Weibull distribution with shape parameter kappa and scale parameter lambda. The density is visualized in Figure 7 and the hazard function is visualized in Figure 8 for different values of shape. Notice that we have strong Lindy with shape < 1, because the exponent kappa - 1 is then negative and h(x) falls as x grows. For shape = 1, the Weibull reduces to the exponential distribution, which is memoryless and therefore has constant hazard.

Figure 7. The density of the Weibull distribution for different parameter values.

Figure 8. The hazard function of the Weibull distribution for different shape values. For shape < 1, we have strong Lindy.
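The Weibull derivation is short enough to check in code. The sketch below starts from the cumulative hazard H(x) = (x/lambda)^kappa, builds the hazard, survival, and density from it, and compares three shape values. The helper names, the scale of 1, and the chosen ages are assumptions for illustration.

```python
import math

def weibull_cumulative_hazard(x, shape, scale):
    """H(x) = (x / scale)^shape."""
    return (x / scale)**shape

def weibull_hazard(x, shape, scale):
    """h(x) = dH/dx = (shape / scale) * (x / scale)^(shape - 1)."""
    return (shape / scale) * (x / scale)**(shape - 1.0)

def weibull_survival(x, shape, scale):
    """S(x) = exp(-H(x))."""
    return math.exp(-weibull_cumulative_hazard(x, shape, scale))

def weibull_pdf(x, shape, scale):
    """f(x) = h(x) * S(x)."""
    return weibull_hazard(x, shape, scale) * weibull_survival(x, shape, scale)

# Illustrative settings (assumptions): scale 1 and three shape values.
SCALE = 1.0
ages = [0.5, 1.0, 2.0, 4.0]
hazards_by_shape = {
    shape: [weibull_hazard(x, shape, SCALE) for x in ages]
    for shape in (0.5, 1.0, 2.0)
}

# Numerical check that the hazard is the derivative of the cumulative hazard.
EPS = 1e-5
numeric_derivative = (weibull_cumulative_hazard(2.0 + EPS, 0.5, SCALE)
                      - weibull_cumulative_hazard(2.0 - EPS, 0.5, SCALE)) / (2.0 * EPS)

if __name__ == "__main__":
    for shape, values in hazards_by_shape.items():
        print("shape", shape, "->", [round(v, 4) for v in values])
```

Shape 0.5 gives a falling hazard (strong Lindy), shape 1 a flat one, and shape 2 a rising one.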

The Gamma distribution is also flexible in that it allows the modeling of various hazard functions, but it is not as widespread in the reliability literature as the Weibull distribution. Like Weibull, Gamma can give strong Lindy for certain parameter values. The Gamma distribution models the waiting time until a given number of events occur, and k stands for the number of events. It also has a parameter lambda, which is the rate of events (the inverse of the scale). Gamma has a decreasing hazard function when k < 1. But how can you have fewer than 1 event? When k < 1, or more generally when k is not an integer, it can be interpreted as the ability of the system to resist shocks, which makes sense if we consider that for integers, it is the number of shock events that happen before the system fails. For integer k, the Gamma distribution can be derived from a simple expression for its survival function, but that expression doesn’t work with k < 1, so we will have to integrate the PDF using something called the upper incomplete Gamma function. I mention the integer case because, if you are interested, there is a great explanation of it by Aerin Kim. I also recommend reading the series she wrote on the Poisson and Exponential distributions, which are necessary to understand the Gamma distribution.

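The Gamma PDF is

$$f(x; k, \lambda) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}, \qquad x > 0$$

where

$$\Gamma(k) = \int_0^\infty t^{k-1} e^{-t}\,dt$$

is the Gamma function. The density is shown in Figure 9.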

Figure 9. The Gamma distribution for different parameter values.

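To get the survival function we integrate the PDF:

$$S(x) = \int_x^\infty \frac{\lambda^k t^{k-1} e^{-\lambda t}}{\Gamma(k)}\,dt$$

To prevent any confusion, x is the time variable and t is just a dummy variable. We take the constants out:

$$S(x) = \frac{\lambda^k}{\Gamma(k)} \int_x^\infty t^{k-1} e^{-\lambda t}\,dt$$

We bring the function inside the integral to the familiar form of the Gamma function above by changing variables. Let

$$u = \lambda t, \qquad t = \frac{u}{\lambda}, \qquad dt = \frac{du}{\lambda}$$

We rewrite the result with the new variables and by changing the lower limit of integration:

$$S(x) = \frac{\lambda^k}{\Gamma(k)} \int_{\lambda x}^\infty \left(\frac{u}{\lambda}\right)^{k-1} e^{-u}\,\frac{du}{\lambda} = \frac{1}{\Gamma(k)} \int_{\lambda x}^\infty u^{k-1} e^{-u}\,du = \frac{\Gamma(k, \lambda x)}{\Gamma(k)}$$

where Gamma(k, lambda*x) is the upper incomplete Gamma function. We get the hazard from the usual formula:

$$h(x) = \frac{f(x)}{S(x)} = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k, \lambda x)}$$

which decreases monotonically for k < 1 (Figure 10).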

Figure 10. The hazard function of the Gamma distribution for different values of the shape parameter. For k < 1, it gives strong Lindy.
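Python’s standard library has no upper incomplete Gamma function, so the sketch below approximates it with Simpson’s rule on a truncated range and then forms the hazard as lambda^k x^(k-1) e^(-lambda x) / Gamma(k, lambda x). The truncation length, step count, function names, and parameter values are all assumptions chosen for illustration; a numerical library would normally be used instead.

```python
import math

def upper_incomplete_gamma(k, a, span=60.0, steps=20_000):
    """Simpson's rule for the integral of u^(k-1) e^(-u) from a to a + span.

    The integrand is negligible beyond a + span, so this approximates the
    integral up to infinity. Requires a > 0 and an even number of steps.
    """
    width = span / steps

    def integrand(u):
        return u**(k - 1.0) * math.exp(-u)

    total = integrand(a) + integrand(a + span)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(a + i * width)
    return total * width / 3.0

def gamma_hazard(x, k, rate):
    """h(x) = rate^k x^(k-1) e^(-rate x) / Gamma(k, rate x)."""
    numerator = rate**k * x**(k - 1.0) * math.exp(-rate * x)
    return numerator / upper_incomplete_gamma(k, rate * x)

# Illustrative settings (assumptions): rate 1 and three values of k.
RATE = 1.0
ages = [0.1, 0.5, 1.0, 2.0, 5.0]
hazards_by_k = {k: [gamma_hazard(x, k, RATE) for x in ages] for k in (0.5, 1.0, 2.0)}

# Sanity check against a known closed form: Gamma(1/2, a) = sqrt(pi) * erfc(sqrt(a)).
closed_form = math.sqrt(math.pi) * math.erfc(1.0)
numeric = upper_incomplete_gamma(0.5, 1.0)

if __name__ == "__main__":
    for k, values in hazards_by_k.items():
        print("k =", k, "->", [round(v, 4) for v in values])
    print("Gamma(0.5, 1): numeric", numeric, "closed form", closed_form)
```

For k = 0.5 the hazard falls with age, for k = 1 it stays at the rate, and for k = 2 it rises, matching the pattern in Figure 10.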

5. Intuition for the Effect of Tails: Adding a Reflecting Barrier

Although the Lindy effect does not require a power law, it does need heavy tails. Let’s go back to Brownian motion for a second. What happens if we add a reflecting barrier above? The reflecting barrier doesn’t kill the process but simply keeps its value the same until it goes down, thereby bounding the lifetimes from above. This cuts the tail, presumably putting the distribution in the exponential class. I couldn’t find the density of the stopping time, so I simulated it with GBM and created a histogram (Figure 11). The stopping time distribution of a GBM with an absorbing barrier below and a reflecting barrier above has a much lighter tail:

Figure 11. The frequency distribution of GBM stopping times when bounded by a reflecting barrier.

Then I used the “bshazard” package (Rebora, Salim, & Reilly, 2018) to get a non-parametric estimation of the hazard rates (Figure 12). As we can see, the hazard rates fluctuate around a constant level for a while and then shoot up at a point when practically all the sample paths are dead. So no Lindy.

Figure 12. The estimated hazard function of the GBM stopping times when bounded by a reflecting barrier.
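The experiment can be sketched in Python as well. This is not the R code behind Figures 11 and 12, and it replaces the “bshazard” smoother with a crude binned estimate (deaths in a bin divided by paths at risk times bin width). The GBM is simulated on the log scale, the reflecting barrier is implemented by holding the value at the barrier, and every parameter value below is an assumption for illustration.

```python
import math
import random

def stopping_time(log_lower, log_upper, drift, sigma, dt, max_time, rng):
    """Stopping time of a GBM simulated on the log scale, started at log-value 0.

    log_lower is the absorbing barrier. log_upper is the reflecting barrier,
    or None for no upper barrier. Returns None if the path is still alive at
    max_time (a censored lifetime).
    """
    x = 0.0
    step_sd = sigma * math.sqrt(dt)
    for i in range(1, round(max_time / dt) + 1):
        x += drift * dt + rng.gauss(0.0, step_sd)
        if log_upper is not None and x > log_upper:
            x = log_upper  # hold the value at the barrier until it moves down
        if x <= log_lower:
            return i * dt
    return None

def empirical_hazard(times, bin_width, n_bins):
    """Deaths in each time bin divided by (paths at risk * bin width)."""
    rates = []
    for b in range(n_bins):
        start, end = b * bin_width, (b + 1) * bin_width
        at_risk = sum(1 for t in times if t is None or t > start)
        deaths = sum(1 for t in times if t is not None and start < t <= end)
        rates.append(deaths / (at_risk * bin_width) if at_risk else float("nan"))
    return rates

# Illustrative settings (assumptions): zero drift on the log scale, sigma = 1,
# absorbing barrier at log-value -1, reflecting barrier at log-value +1.
rng = random.Random(1)
N_PATHS, DT, MAX_TIME = 500, 0.01, 50.0
LOWER, UPPER, DRIFT, SIGMA = -1.0, 1.0, 0.0, 1.0

reflected = [stopping_time(LOWER, UPPER, DRIFT, SIGMA, DT, MAX_TIME, rng)
             for _ in range(N_PATHS)]
free = [stopping_time(LOWER, None, DRIFT, SIGMA, DT, MAX_TIME, rng)
        for _ in range(N_PATHS)]

alive_reflected = sum(t is None for t in reflected) / N_PATHS
alive_free = sum(t is None for t in free) / N_PATHS
hazard_reflected = empirical_hazard(reflected, bin_width=1.0, n_bins=10)

if __name__ == "__main__":
    print("still alive at MAX_TIME with reflecting barrier:", alive_reflected)
    print("still alive at MAX_TIME without it:", alive_free)
    print("binned hazard with reflecting barrier:", [round(h, 3) for h in hazard_reflected])
```

Without the upper barrier a noticeable share of paths is still alive at `MAX_TIME`, while with it practically none are, which is the tail-cutting effect described above.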

6. Conclusion

Let’s summarize. We first reviewed the definition of the Lindy effect, which means the remaining life expectancy increases conditional on age. Taleb and others have pointed out that power law distributions like Pareto have this property. This concept is mathematically captured by a decreasing hazard function, which means that the conditional rate of death goes down over time. Brownian motion with an absorbing barrier arises as a natural choice to model this phenomenon, as it is a stochastic process with a definite end, just like life. Analyzing the stopping time of Brownian motion gives two distributions: Levy and the inverse Gaussian. The former is a fat tailed distribution with asymptotic power law behavior, the latter is not. Both have an inverse-U shaped hazard rate, meaning Lindy takes some time to kick in. We called this weak Lindy. Strong Lindy is a non-increasing hazard function. We then showed that heavy tailed distributions that do not follow a power law, such as the Weibull and the Gamma distributions for certain parameter values, can give strong Lindy as well. Finally, we added a reflecting barrier to BM, showing that bounding the lifetimes from above kills the Lindy effect. There is nothing new there – it just gives good intuition.