Count data

From Wikipedia, the free encyclopedia

In statistics, count data is a statistical data type describing countable quantities: data that can take only non-negative integer values {0, 1, 2, 3, ...}, where these integers arise from counting rather than ranking. For example, the number of telephone calls received in an hour is count data. The statistical treatment of count data is distinct from that of binary data, in which the observations can take only two values, usually represented by 0 and 1, and from ordinal data, which may also consist of integers but where the individual values fall on an arbitrary scale and only the relative ranking is important.

Count variables

An individual piece of count data is often termed a count variable. When such a variable is treated as a random variable, the Poisson, binomial and negative binomial distributions are commonly used to represent its distribution.
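The three distributions named above differ mainly in how the variance relates to the mean. The following is a minimal sketch, using only the Python standard library, that evaluates each probability mass function and compares the resulting mean and variance. The function names and the parameter values are illustrative assumptions, chosen so that all three means equal 4, and the negative binomial is parameterised here as the number of failures before the r-th success.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def binomial_pmf(k, n, p):
    """P(X = k) successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def negative_binomial_pmf(k, r, p):
    """P(X = k) failures before the r-th success (integer r, success probability p)."""
    return math.comb(k + r - 1, k) * p**r * (1 - p) ** k


def mean_and_variance(pmf, support):
    """Mean and variance of a count variable from its pmf over a (truncated) support."""
    probs = [(k, pmf(k)) for k in support]
    mean = sum(k * pr for k, pr in probs)
    variance = sum((k - mean) ** 2 * pr for k, pr in probs)
    return mean, variance


# Illustrative parameters (assumptions), chosen so that all three means equal 4.
# The infinite supports are truncated where the remaining probability is negligible.
poisson_stats = mean_and_variance(lambda k: poisson_pmf(k, 4.0), range(100))
binomial_stats = mean_and_variance(lambda k: binomial_pmf(k, 20, 0.2), range(21))
negbin_stats = mean_and_variance(lambda k: negative_binomial_pmf(k, 4, 0.5), range(200))

for name, (mean, variance) in [
    ("Poisson", poisson_stats),
    ("binomial", binomial_stats),
    ("negative binomial", negbin_stats),
]:
    print(f"{name:18s} mean={mean:.3f} variance={variance:.3f}")
```

With these parameters the binomial variance falls below the mean, the Poisson variance equals it, and the negative binomial variance exceeds it, which is one informal way to judge which distribution might suit a given count variable.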

Graphical examination

Graphical examination of count data may be aided by the use of data transformations chosen to stabilise the sample variance. In particular, the square root transformation might be used when the data can be approximated by a Poisson distribution (although other transformations have modestly improved properties), while an inverse sine transformation is available when a binomial distribution is preferred.
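The effect of these transformations can be checked numerically. The sketch below, which assumes nothing beyond the Python standard library, computes the variance of a count before and after transformation directly from the probability mass function. The helper names and the chosen means and proportions are illustrative assumptions. For a Poisson count the variance of the square root settles near 1/4 as the mean grows, and for a binomial proportion from n trials the variance of the inverse sine of its square root settles near 1/(4n).

```python
import math


def poisson_pmf(k, lam):
    """Poisson pmf computed on the log scale to avoid overflow for large k."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def variance_of(transform, pmf, support):
    """Variance of transform(X) for a count X with the given pmf."""
    pairs = [(transform(k), pmf(k)) for k in support]
    mean = sum(t * pr for t, pr in pairs)
    return sum((t - mean) ** 2 * pr for t, pr in pairs)


def poisson_variances(lam):
    """Return (variance of X, variance of sqrt(X)) for X ~ Poisson(lam)."""
    upper = int(lam + 20 * math.sqrt(lam)) + 20  # truncation point, negligible tail
    support = range(upper + 1)
    pmf = lambda k: poisson_pmf(k, lam)
    return variance_of(float, pmf, support), variance_of(math.sqrt, pmf, support)


def binomial_variances(n, p):
    """Return (variance of X/n, variance of asin(sqrt(X/n))) for X ~ Binomial(n, p)."""
    pmf = lambda k: math.comb(n, k) * p**k * (1 - p) ** (n - k)
    support = range(n + 1)
    raw = variance_of(lambda k: k / n, pmf, support)
    stabilised = variance_of(lambda k: math.asin(math.sqrt(k / n)), pmf, support)
    return raw, stabilised


for lam in (5, 20, 50):  # illustrative means
    raw, stabilised = poisson_variances(lam)
    print(f"Poisson mean {lam:2d}: var(X)={raw:.3f}  var(sqrt X)={stabilised:.4f}")

for p in (0.2, 0.5, 0.8):  # illustrative proportions, n = 50 trials
    raw, stabilised = binomial_variances(50, p)
    print(f"binomial p={p}: var(X/n)={raw:.5f}  var(asin sqrt)={stabilised:.5f}")
```

The untransformed variances change with the mean or the proportion, while the transformed ones stay roughly constant, which is the property that makes plots of transformed counts easier to read.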

Relating count data to other variables

When count data are related to other variables, the count variable is treated as the dependent variable. Statistical methods such as least squares and analysis of variance are designed to deal with continuous dependent variables. These can be adapted to deal with count data by using data transformations such as the square root transformation, but such methods have several drawbacks: they are approximate at best, and they estimate parameters that are often hard to interpret.
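As a rough illustration of the transformation approach and its interpretive cost, the sketch below fits an ordinary least squares line to square-root-transformed counts. The data are hypothetical and the helper `ols_fit` is an illustrative name, not part of any library.

```python
import math


def ols_fit(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: a covariate and the counts observed at each value.
x = [0, 1, 2, 3, 4, 5]
counts = [1, 0, 4, 4, 9, 16]

# Least squares is applied to the transformed counts, not to the counts themselves.
sqrt_counts = [math.sqrt(c) for c in counts]
intercept, slope = ols_fit(x, sqrt_counts)

# The slope is a change in sqrt(count) per unit of x, not a change in the count.
print(f"intercept={intercept:.4f} slope={slope:.4f}")

# Squaring a fitted value returns to the count scale, but only approximately.
back_transformed = [(intercept + slope * xi) ** 2 for xi in x]
print([round(v, 2) for v in back_transformed])
```

The fitted slope describes the square root of the count, so it has no direct reading on the count scale. Squaring a fitted value also does not in general recover the expected count, because the square of the mean of a square root is never larger than the mean itself (Jensen's inequality).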

The Poisson distribution can form the basis for some analyses of count data, and in this case Poisson regression may be used. This is a special case of the class of generalized linear models, which also contains models based on the binomial distribution (binomial regression, logistic regression) or the negative binomial distribution. These alternatives are used where the assumptions of the Poisson model are violated, in particular when the range of count values is limited (binomial) or when overdispersion is present (negative binomial).
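To make the idea of Poisson regression concrete, here is a minimal sketch of a one-covariate model with a log link, log E[y] = b0 + b1·x, fitted by Newton-Raphson on the Poisson log-likelihood. The data are hypothetical, the function name and iteration count are illustrative assumptions, and in practice a statistics library would normally be used instead of hand-written code.

```python
import math


def fit_poisson_regression(x, y, n_iter=25):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    # Start from the intercept-only fit: constant mean equal to the sample mean.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: a covariate and the counts observed at each value.
x = [0, 1, 2, 3, 4, 5]
y = [1, 0, 4, 4, 9, 16]

b0, b1 = fit_poisson_regression(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]

print(f"b0={b0:.4f} b1={b1:.4f}")
print(f"each unit increase in x multiplies the expected count by {math.exp(b1):.3f}")
```

Unlike a least squares fit to transformed counts, the coefficient has a direct reading on the count scale: exp(b1) is the multiplicative change in the expected count per unit of the covariate. With an intercept in the model, the fitted means also sum to the observed total at convergence.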

Further reading

  • Cameron, A. C.; Trivedi, P. K. (2013). Regression Analysis of Count Data (Second ed.). Cambridge University Press. ISBN 978-1-107-66727-3.
  • Hilbe, Joseph M. (2011). Negative Binomial Regression (Second ed.). Cambridge University Press. ISBN 978-0-521-19815-8.
  • Winkelmann, Rainer (2008). Econometric Analysis of Count Data (Fifth ed.). Springer. doi:10.1007/978-3-540-78389-3. ISBN 978-3-540-77648-2.