In order to illustrate the next section of the non-life insurance course, consider the following example^{1}, inspired from http://sciencepolicy.colorado.edu/…. This is the so-called “*Normalized Hurricane Damages in the United States*” dataset, for the period 1900-2005, from Pielke *et al.* (2008). The dataset is available in xls format, so we have to spend some time to import it,

> library(gdata) > db=read.xls( + "http://sciencepolicy.colorado.edu/publications/special/public_data_may_2007.xls", + sheet=1) trying URL 'http://sciencepolicy.colorado.edu/publications/special/public_data_may_2007.xls' Content type 'application/vnd.ms-excel' length 119296 bytes (116 Kb) opened URL ================================================== downloaded 116 Kb perl: warning: Setting locale failed. perl: warning: Please check that your locale settings: LANGUAGE = "fr_CA:fr", LC_ALL = (unset), LANG = "fr_CA.UTF-8" are supported and installed on your system. perl: warning: Falling back to the standard locale ("C").

The problem with excel spreadsheets is that some columns might have pre-specified format (here, losses are with a format 000,000,000 for instance)

> tail(db) Year Hurricane.Description State Category Base.Economic.Damage 202 2005 Cindy LA 1 320,000,000 203 2005 Dennis FL 3 2,230,000,000 204 2005 Katrina LA,MS 3 81,000,000,000 205 2005 Ophelia NC 1 1,600,000,000 206 2005 Rita TX 3 10,000,000,000 207 2005 Wilma FL 3 20,600,000,000 Normalized.PL05 Normalized.CL05 X X.1 202 320,000,000 320,000,000 NA NA 203 2,230,000,000 2,230,000,000 NA NA 204 81,000,000,000 81,000,000,000 NA NA 205 1,600,000,000 1,600,000,000 NA NA 206 10,000,000,000 10,000,000,000 NA NA 207 20,600,000,000 20,600,000,000 NA NA

To get data in a format we can play with, consider the following function,

> stupidcomma = function(x){ + x=as.character(x) + for(i in 1:10){x=sub(",","",as.character(x))} + return(as.numeric(x))}

and let’s convert those values into numbers,

> base=db[,1:4] > base$Base.Economic.Damage=Vectorize(stupidcomma)(db$Base.Economic.Damage) > base$Normalized.PL05=Vectorize(stupidcomma)(db$Normalized.PL05) > base$Normalized.CL05=Vectorize(stupidcomma)(db$Normalized.CL05)

Here is the dataset we will use, from now on,

> tail(base) Year Hurricane.Description State Category Base.Economic.Damage 202 2005 Cindy LA 1 3.20e+08 203 2005 Dennis FL 3 2.23e+09 204 2005 Katrina LA,MS 3 8.10e+10 205 2005 Ophelia NC 1 1.60e+09 206 2005 Rita TX 3 1.00e+10 207 2005 Wilma FL 3 2.06e+10 Normalized.PL05 Normalized.CL05 202 3.20e+08 3.20e+08 203 2.23e+09 2.23e+09 204 8.10e+10 8.10e+10 205 1.60e+09 1.60e+09 206 1.00e+10 1.00e+10 207 2.06e+10 2.06e+10

We can visualize the normalized costs of hurricanes, from 1900 till 2005, with the 207 hurricanes (here the *x*-axis is not time, it is simply the index of the loss)

> plot(base$Normalized.PL05/1e9,type="h",ylim=c(0,155))

As usual, there are two components when computing the pure premium of an insurance contract. The number of claims (or here hurricanes) and the individual losses of each claim. We’ve seen – above – individual losses, let us focus now on the annual frequency.

> TB <- table(base$Year) > years <- as.numeric(names(TB)) > counts <- as.numeric(TB) > years0=(1900:2005)[which(!(1900:2005)%in%years)] > db <- data.frame(years=c(years,years0), + counts=c(counts,rep(0,length(years0)))) > db[88:93,] years counts 88 2003 3 89 2004 6 90 2005 6 91 1902 0 92 1905 0 93 1907 0

On average, we experience about 2 (major) hurricanes per year,

> mean(db$counts) [1] 1.95283

In predictive modeling (here, we wish to price a reinsurance contract for, say, 2014), we need probably to take into account some possible trend in the hurricane occurrence frequency. We can consider either a linear trend,

> reg0 <- glm(counts~years,data=db,family=poisson(link="identity"), + start=lm(counts~years,data=db)$coefficients)

or an exponential one,

> reg1 <- glm(counts~years,data=db,family=poisson(link="log"))

We can plot those three predictions, and get a prediction for the number of (major) hurricanes in 2014,

> plot(years,counts,type='h',ylim=c(0,6),xlim=c(1900,2020)) > cpred1=predict(reg1,newdata=data.frame(years=1890:2030),type="response") > lines(1890:2030,cpred1,col="blue") > cpred0=predict(reg0,newdata=data.frame(years=1890:2030),type="response") > lines(1890:2030,cpred0,col="red") > abline(h=mean(db$counts),col="black") > (predictions=cbind(constant=mean(db$counts),linear= + cpred0[126],exponential=cpred1[126])) constant linear exponential 126 1.95283 3.573999 4.379822 > points(rep((1890:2030)[126],3),prediction,col=c("black","red","blue"),pch=19)

Observe that changing the model will change the pure premium: with a flat prediction, we expect less than 2 (major) hurricanes, but with the exponential trend, we expect more than 4…

This is for the expected frequency. Now, we should find a suitable model to compute the pure premium of a reinsurance treaty, with a (high) deductible, and a limited (but large) cover. As we will seen in class next week, *the* appropriate model is a Pareto distribution (see Hagstrœm (1925), Huyghues-Beaufond (1991) or a survey – in French – published a few years ago).

We can use Hill’s plot to estimate the tail index,

> library(evir) > hill(base$Normalized.PL05)

Clearly, costs of major hurricanes are heavy tailed.

Now, consider an insurance company, in the U.S., with 5% market share (just to illustrate). We will consider there \tilde Y_i= Y_i/20. The losses are given below. Consider a reinsurance treaty, with a deductible of 2 (billion) and a limited cover of 4 (billion),

For our Pareto model, consider only losses above 500 millions,

> threshold=.5 > (gpd.PL <- gpd(base$Normalized.PL05/1e9/20,threshold)$par.ests) xi beta 0.4424669 0.6705315

Keep in mind the 1 hurricane out of 8 reaches that level

> mean(base$Normalized.CL05/1e9/20>.5) [1] 0.1256039

Given that the loss exceeds 500 millions, we can now compute the expected value of the reinsurance contact,

To compute it we can use

> E <- function(yinf,ysup,xi,beta){ + as.numeric(integrate(function(x) (x-yinf)*dgpd(x,xi,mu=threshold,beta), + lower=yinf,upper=ysup)$value+ + (1-pgpd(ysup,xi,mu=threshold,beta))*(ysup-yinf)) + }

[**Nov 5th**] **there is a typo in the previous function, since the threshold should be used, here, as a parameter in the function, if you want to play with that function an see the impact of the threshold (see a more recent post on the same topic, but a different dataset)… but here, we do not change the threshold, so it is not a big deal.**

Now, it is probably time to bring all the pieces together. We might expect a bit less than 2 (major) hurricanes per year,

> predictions[1] [1] 1.95283

and each hurricane has 12.5% chances to cost more than 500 million for our insurance company,

> mean(base$Normalized.PL05/1e9/20>.5) [1] 0.1256039

and given that an hurricane exceeds 500 million loss, then the expected repayment by the reinsurance company is (in millions)

> E(2,6,gpd.PL[1],gpd.PL[2])*1e3 [1] 330.9865

So the pure premium of the reinsurance contract is simply

> predictions[1]*mean(base$Normalized.PL05/1e9/20>.5)* + E(2,6,gpd.PL[1],gpd.PL[2])*1e3 [1] 81.18538

for a cover of 4 billion, in excess of 2.

^{1.This example will be found in the Reinsurance and Extremal Events chapter in the forthcoming Computational Actuarial Science with R, by Eric Gilleland and Mathieu Ribatet.}

Good morning from Mexico Arthur. This is a topic I’ve been searching information and lucky me I found your publication. However, there seems to be a problem with some images/links. Would you be so kind to take a look at this issue? Thanks in advance, Eduardo

Great stuff… will shamelessly copy it at work next week. 🙂

I’d be glad to get more feed-back… Keep in mind that I did not mention confidence intervals, here, and there’s (really) big.

BTW: You can save your trouble of cleaning the data with XLConnect, which will recognise the formatting options used by the workbook.