50 Statistics Terms To Know (With Definitions) | Indeed.com

Updated September 30, 2022

Statistics is the analysis, interpretation and presentation of collected data. Statistical analysis is an important component of conducting effective research. Whether you’re a statistics student or use statistical language daily in your career, it can be helpful to understand and review the terminology you encounter in this branch of mathematics. In this article, we list 50 of the most common statistics terms to know, along with their definitions.

50 useful statistics terms

Here’s a list of 50 common statistics terms and their definitions:

1. Alternative hypothesis

An alternative hypothesis is a theory that contradicts the null hypothesis. The null hypothesis is the default assumption that your premise is false or that there's no relationship between the variables under study. If the data you collect supports the null hypothesis, you fail to reject it and can't accept the alternative hypothesis.

Related: Alternative Hypothesis: Definition and When To Use It

2. Analysis of covariance

Analysis of covariance is a tool for evaluating data sets that contain two types of variables: the effect, which is referred to as the variate, and the treatment, which is the categorical variable. The analysis adjusts the comparison between treatments for one or more additional continuous variables, referred to as covariates. Controlling for covariates increases the accuracy of a study and removes possible bias.

3. Analysis of variance

An analysis of variance (ANOVA) compares the means of two or more groups to determine whether the differences between them are statistically significant.

4. Average

The average refers to the mean of the data. You can calculate the average by adding up all the data points and dividing the total by the number of data points.

Related: How To Calculate Average
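For example, here's a quick sketch of that calculation in Python, using a small hypothetical data set:

```python
# Hypothetical data set: the average is the total divided by the count
data = [4, 8, 15, 16, 23, 42]
average = sum(data) / len(data)
print(average)  # 18.0
```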

5. Bell curve

The bell curve, also called the normal distribution, displays the mean, median and mode of the data you collect. It usually follows the shape of a bell with a slope on each side.

6. Beta level

The beta level, or simply beta, is the probability of committing a Type II error in a hypothesis test, which entails failing to reject the null hypothesis when it isn't true.

Related: What Is a Positive Correlation in Finance?

7. Binomial test

When a test has two alternative outcomes, either failure or success, and you know what the possibilities of success are, you may apply a binomial test. People use a binomial test to determine if an observed test outcome is different from its predicted outcome.
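As an illustration, a one-sided binomial test can be computed directly from the binomial probability formula using only Python's standard library (the counts and success rate below are hypothetical):

```python
import math

def binomial_tail(k, n, p):
    """One-sided p-value: probability of observing k or more successes
    in n trials when the true success probability is p."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Are 8 successes in 10 trials surprising under a predicted 50% success rate?
p_value = binomial_tail(8, 10, 0.5)  # about 0.055
```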

8. Breakdown point

The breakdown point is the proportion of contaminated or extreme data an estimator can handle before it's no longer useful. A higher breakdown point means the estimator is more resistant to outliers, whereas a lower breakdown point means even a small amount of bad data can make its results unreliable.

9. Causation

Causation is a direct relationship between two variables. Two variables have a direct relationship if a change in one's value causes a change in the other variable. In that case, one becomes the cause, and the other is the effect.

Related: What Is Causal Research? (With Examples, Benefits and Tips)

10. Coefficient

A coefficient is a multiplier that measures a variable. When conducting research and computing equations, the coefficient is often a numerical value that multiplies a variable, giving you the coefficient of that variable. If a variable appears without a written number, its coefficient is one.

11. Confidence intervals

A confidence interval expresses the uncertainty around an estimate made from a collection of data. It's the range in which you expect the true value to fall, with a specific degree of confidence, if you repeat the same experiment many times.
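For instance, a 95% confidence interval for a mean can be sketched with the large-sample normal approximation (the measurements and the 1.96 critical value here are illustrative):

```python
import math
import statistics

sample = [5.1, 4.9, 5.4, 5.0, 5.2, 4.8, 5.3, 5.1]
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(len(sample))

# 1.96 is the normal critical value for 95% confidence
low, high = mean - 1.96 * sem, mean + 1.96 * sem
```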

12. Correlation coefficient

The correlation coefficient describes the level of correlation or dependence between two variables. This value is a number between -1 and +1, and if a computed value falls outside this range, there's been a mistake in the calculation.

Related: Correlation Coefficient Formula: A Definitive Guide
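A minimal from-scratch sketch of the Pearson correlation coefficient (the paired data here is made up):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly linear, so r is close to 1.0
```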

13. Cronbach's alpha coefficient

Cronbach's alpha coefficient is a measurement of internal consistency. It shows the nature of the relationship between multiple variables in a group of data. Additionally, Cronbach's alpha coefficient increases when you increase the number of items and vice versa.

14. Dependent variable

A dependent variable is a value that depends on another variable to exhibit change. When computing in statistical analysis, you can use dependent variables to make conclusions about causes of events, changes and other translations in statistical research.

15. Descriptive statistics

Descriptive statistics depict the features of data in a study. This may include a representation of the total population or a sample population.

Related: 11 Essential Data Science Statistics Concepts

16. Effect size

Effect size is a statistical term that quantifies the degree of a relationship between two given variables. For example, we can learn about the effect of therapy on anxiety patients. The effect size aims to determine whether the therapy is highly successful or mildly successful.

17. F-test

An F-test is any test that uses F-distribution with a null hypothesis. Researchers may employ the F-test to evaluate the parity of two population variances. They use it to assess if the two independent samples selected from a normal population have similar variability or not.

18. Factor analysis

Factor analysis involves condensing a significant number of variables into a smaller number of factors. This method pulls the largest common variance from across all variables and converts it into a common score. Further analysis uses this score as an index of all the variables.

19. Frequency distribution

Frequency distribution is the frequency with which a variable occurs. It provides you with data on how often something repeats. 
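In practice, a frequency distribution of categorical responses can be tabulated with Python's `collections.Counter` (the responses below are hypothetical):

```python
from collections import Counter

responses = ["yes", "no", "yes", "yes", "maybe", "no"]
frequency = Counter(responses)  # counts how often each answer occurs
print(frequency["yes"])  # 3
```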

20. Friedman's two-way analysis of variance

Researchers use Friedman's two-way analysis of variance test to examine if there are statistically meaningful variations between groups when comparing them, using various parameters for each group.

21. Hypothesis tests

A hypothesis test is a method of testing results. Before conducting research, the researcher creates a hypothesis, or a theory for what they believe the results will prove. A study then tests that theory.

Related: 5 Basic Statistics Tools and How To Choose the Right One

22. Independent t-test

The independent t-test compares the averages of two independent samples to determine whether there's statistical evidence that the associated population means differ significantly.

23. Independent variable

In a statistical experiment, an independent variable is one that you modify, control or manipulate in order to investigate its effects. It's called independent since no other factor in the research affects it.

24. Inferential statistics

Inferential statistics are tests you use to compare data from a sample with the broader population in a variety of ways. Inferential statistics include parametric and nonparametric tests. When conducting an inferential statistical test, you take data from a small sample and make inferences about whether you'd see similar results in the larger population.

Related: 17 Jobs That Use Statistics

25. Marginal likelihood

The marginal likelihood is the likelihood of the observed data once the model's parameters have been marginalized, or integrated, out. Marginalizing, in this sense, refers to averaging over the probabilities of unknown quantities so you can work with the probabilities of the propositions you care about.

26. Measures of variability

Measures of variability, also referred to as measures of dispersion, denote how scattered or dispersed a data set is. The four main measures of variability are the interquartile range, range, standard deviation and variance.

27. Median

The median refers to the middle point of data. Typically, if you have a data set with an odd number of items, the median appears directly in the middle of the numbers. When computing the median of a set of data with an even number of items, you can calculate the simple mean between the two middle-most values to achieve the median.
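Both cases can be illustrated with Python's `statistics` module (the sample values are arbitrary):

```python
import statistics

odd_count = [3, 1, 7, 5, 9]       # sorted: 1, 3, 5, 7, 9 -> middle value is 5
even_count = [3, 1, 7, 5, 9, 11]  # even count -> mean of the two middle values

median_odd = statistics.median(odd_count)    # 5
median_even = statistics.median(even_count)  # 6.0
```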

28. Median test

A median test is a nonparametric test of whether two independent groups have the same median. It follows the null hypothesis that the two groups share the same median.

29. Mode

Mode refers to the value in a data set that appears most often. If none of the values repeat, the data set has no mode.
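For example, with Python's `statistics` module (hypothetical values):

```python
import statistics

values = [2, 3, 3, 5, 7, 3, 5]
most_common = statistics.mode(values)  # 3 appears three times, more than any other value
```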

30. Multiple correlations

Multiple correlation is an estimate of how well you can predict a variable using a linear function of other variables. It uses predictor variables to derive a conclusion.

31. Multivariate analysis of covariance

A multivariate analysis of covariance is a technique that assesses statistical differences between multiple dependent variables. The analysis controls for a third variable, the covariate, and you can use additional variables depending on the sample size. 

32. Normal distribution

Normal distribution is a method of displaying random variables in a bell-shaped graph, indicating that data close to the average or mean occur more frequently than data distant from the average or mean value.

33. Parameter

A parameter is a quantitative measurement that you use to measure a population. It’s the unknown value of a population on which you conduct research to learn more. 

34. Pearson correlation coefficient

Pearson's correlation coefficient is a statistical test that determines the connection between two continuous variables. Because it's built on covariance, it's widely regarded as a standard approach for quantifying linear relationships between variables of interest.

35. Population

Population refers to the group you’re studying. This might include a certain demographic or a sample of the group, which is a subset of the population.

36. Post hoc test

Researchers perform a post hoc test only after they’ve discovered a statistically relevant finding and need to identify where the differences actually originated.

37. Probability density

The probability density is a statistical function that measures the likelihood of an outcome falling within a given range of values.

38. Quartile and quintile

Quartile refers to data divided into four equal parts, while quintile refers to data divided into five equal parts. 

39. Random variable

A random variable is a variable in which the value is unknown. It can be discrete or continuous with any value given in a range.

40. Range

The range is the difference between the lowest and highest values in a collection of data. 
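A one-line sketch of the range in Python (arbitrary data):

```python
data = [12, 7, 3, 25, 18]
data_range = max(data) - min(data)  # 25 - 3 = 22
```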

41. Regression analysis

Regression analysis is an effective method for determining which factors have an effect on a certain variable of interest. Conducting a regression allows you to establish which elements matter most, which ones you can disregard and how these factors interact with one another. While there are many forms of regression analysis, they all focus on how independent variables affect a dependent variable.

Related: 7 Types of Statistical Analysis Techniques (With the Statistical Analysis Process)

42. Standard deviation

The standard deviation is a metric equal to the square root of the variance. It tells you how far an individual result or group of results deviates from the average.

Related: How To Calculate Standard Deviation in 4 Steps (With Example)
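Both the population and sample versions are available in Python's `statistics` module (the data set is illustrative):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
population_sd = statistics.pstdev(data)  # divides the squared deviations by n
sample_sd = statistics.stdev(data)       # divides by n - 1 instead
print(population_sd)  # 2.0
```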

43. Standard error of the mean

The standard error of the mean measures how much a sample's mean is likely to deviate from the population mean. You can find the standard error of the mean by dividing the standard deviation by the square root of the sample size.
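A short sketch of that calculation (the sample is hypothetical):

```python
import math
import statistics

sample = [10, 12, 9, 11, 13, 8, 10, 11, 12, 9]
sem = statistics.stdev(sample) / math.sqrt(len(sample))  # close to 0.5 here
```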

44. Statistical inference

Statistical inference occurs when you use sample data to generate an inference or conclusion. Statistical inference can include regression, confidence intervals or hypothesis tests.

45. Statistical power

Statistical power is a metric of a study's probability of discovering statistical relevance in a sample, provided the effect is present in the entire population. A powerful statistical test likely rejects the null hypothesis.

46. Student t-test

A student t-test is a hypothesis test for the mean of a small sample drawn from a bell-curve (normally distributed) population when you don't know the population standard deviation. Variants of the test apply to correlated means, correlations, independent proportions and independent means.

Related: 10 Jobs for Statistics Majors

47. T-distribution

The t-distribution describes the standardized distances of sample means from the population mean when the population standard deviation is unknown and the data originates from a bell-curve population.

48. T-score

A t-score in a t-distribution refers to the number of standard deviations a sample is away from the average.

49. Z-score

A z-score, also known as a standard score, is a measurement of the distance between the mean and data point of a variable. You can measure it in standard deviation units.
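For example (hypothetical scores, using the population standard deviation):

```python
import statistics

scores = [50, 60, 70, 80, 90]
mean = statistics.mean(scores)  # 70
sd = statistics.pstdev(scores)  # population standard deviation
z = (90 - mean) / sd            # how many SDs the score 90 sits above the mean
```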

50. Z-test

A z-test is a test that determines whether two populations' means are different. To use a z-test, you need to know the population variances and have a large sample size.
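As a minimal sketch, here's the test statistic for a one-sample z-test on a mean (all numbers below are hypothetical, and the population standard deviation is assumed known):

```python
import math

sample_mean, pop_mean = 103.0, 100.0  # H0: the population mean is 100
pop_sd, n = 15.0, 225                 # known SD and a large sample size

z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
print(z)  # 3.0
```

A z statistic this large would lead most analysts to reject the null hypothesis at conventional significance levels.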