Answering such questions is subject to pitfalls and problems. This book aims
to point these out and outline useful tools that have been developed to aid in
providing answers.
Modeling is not an end in itself; rather, the aim is to provide a framework for
answering questions of interest. Different models can be, and often are, applied
to the same data depending on the question of interest. This stresses that
modeling is a pragmatic activity and there is no such thing as the “true” model.
Models connect variables, and the art of connecting variables requires an
understanding of the nature of the variables. Variables come in different forms:
discrete or continuous, nominal, ordinal, categorical, and so on. It is important
to distinguish between different types of variables, as the way that they
can reasonably enter a model depends on their type. Variables can be, and often
are, transformed. Part of modeling is considering the appropriate
transformations of variables.
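As a minimal sketch of how a variable's type shapes the way it enters a model, the following Python fragment builds model inputs from raw records. The records, field names (`amount`, `represented`, `region`) and helper functions are hypothetical and chosen only for illustration. The log transformation and the choice of the first level as baseline are common conventions assumed here, not prescriptions from the text.

```python
import math

# Hypothetical records: names and values are illustrative assumptions only.
records = [
    {"amount": 1200.0, "represented": "yes", "region": "north"},
    {"amount": 35000.0, "represented": "no", "region": "south"},
    {"amount": 560.0, "represented": "no", "region": "west"},
]


def indicator(value, level):
    """Return 1 if value equals level, else 0 (a 0/1 indicator variable)."""
    return 1 if value == level else 0


def design_row(record, region_levels):
    """Build one row of model inputs from a raw record.

    - continuous, positive amount: entered on the log scale (assumed transformation)
    - binary variable: a single 0/1 indicator
    - nominal variable with k levels: k - 1 indicators, first level as baseline
    """
    row = [math.log(record["amount"])]
    row.append(indicator(record["represented"], "yes"))
    row.extend(indicator(record["region"], lvl) for lvl in region_levels[1:])
    return row


# Levels of the nominal variable, in a fixed (alphabetical) order.
region_levels = sorted({r["region"] for r in records})
rows = [design_row(r, region_levels) for r in records]
```

Each row holds one transformed continuous value followed by indicators. A nominal variable has no natural numeric scale, so it is expanded into indicators instead of being entered as a single number.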
1.2 Types of variables
Insurance data is usually organized in a two-way array according to cases and
variables. Cases can be policies, claims, individuals or accidents. Variables
can be level of injury, sex, dollar cost, whether there is legal representation,
and so on. Cases and variables are flexible constructs: a variable in one study
forms the cases in another. Variables can be quantitative or qualitative. The
data displayed in Figure 1.1 provide an illustration of types of variables often
encountered in insurance:
• Claim amount is an example of what is commonly regarded as a continuous
variable even though, practically speaking, it is confined to an integer number
of dollars. In this case the variable is skewed to the right. Not indicated
on the graphs are a small number of very large claims in excess of $100 000.
The largest claim is around $4.5 million. Continuous variables are
also called “interval” variables to indicate they can take on values anywhere
in an interval of the real line.
• Legal representation is a categorical variable with two levels, “no” and “yes.”
Variables taking on just two possible values are often coded “0” and “1” and
are also called binary, indicator or Bernoulli variables. Binary variables indicate
the presence or absence of an attribute, or the occurrence or non-occurrence
of an event of interest such as a claim or fatality.
• Injury code is a categorical variable, also called qualitative. The variable
has seven values corresponding to different levels of physical injury: 1–6
and 9. Level 1 indicates the lowest level of injury, 2 the next level, and so on
up to level 5, which is a catastrophic level of injury, while level 6 indicates
death. Level 9 corresponds to an “unknown” or unrecorded level of injury
Cambridge University Press
978-0-521-87914-9 - Generalized Linear Models for Insurance Data
Piet de Jong and Gillian Z. Heller