How to... choose the right statistical technique

Fundamentals

by Claire Creaser

Start to think about the techniques you will use for your analysis before you collect any data.

It is well worth spending a little time considering how you will analyse your data before you design your survey instrument or start to collect any data. This will ensure that data are collected – and, more importantly, coded – in an appropriate way for the analysis you hope to do. (See How to collect data)

What do you want to know?

The analysis must relate to the research questions, and this may dictate the techniques you should use.

What type of data do you have?

The type of data you have is also fundamental – the techniques and tools appropriate to interval and ratio variables are not suitable for categorical or ordinal measures. (See How to collect data for notes on types of data)

What assumptions can – and can’t – you make?

Many techniques rely on the sampling distribution of the test statistic being a Normal distribution (see below). This is always the case when the underlying distribution of the data is Normal, but in practice, the data may not be Normally distributed. For example, there could be a long tail of responses to one side or the other (skewed data). Non-parametric techniques are available to use in such situations, but these are generally less powerful and less flexible. However, if the sample size is sufficiently large, the Central Limit Theorem allows use of the standard analyses and tools.

Techniques for a non-Normal distribution

Parametric or non-parametric statistics?

Parametric methods and statistics rely on a set of assumptions about the underlying distribution to give valid results. In general, they require the variables to have a Normal distribution.

Non-parametric techniques must be used for categorical and ordinal data. For interval and ratio data they are generally less powerful and less flexible, and should only be used where the standard parametric test is not appropriate, for example when the data are clearly not Normal and the sample size is small (below 30 observations).
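The guidance above can be sketched as a small decision helper. This is only an illustration of the rule of thumb, not a substitute for judgement: the function name `suggest_family`, its parameters and the `looks_normal` flag are hypothetical, and the reading that Normal-looking interval or ratio data can use parametric methods even in small samples is an assumption drawn from the text.

```python
# Data types as described in the text (names chosen for this sketch)
PARAMETRIC_TYPES = {"interval", "ratio"}
NONPARAMETRIC_ONLY_TYPES = {"categorical", "ordinal"}


def suggest_family(data_type, n, looks_normal, small_sample=30):
    """Suggest 'parametric' or 'non-parametric' using the rule of thumb above.

    data_type    -- 'categorical', 'ordinal', 'interval' or 'ratio'
    n            -- number of observations
    looks_normal -- your own judgement of whether the data look Normal
    small_sample -- threshold below which a sample counts as small
    """
    kind = data_type.lower()
    if kind in NONPARAMETRIC_ONLY_TYPES:
        # Categorical and ordinal data always need non-parametric techniques
        return "non-parametric"
    if kind not in PARAMETRIC_TYPES:
        raise ValueError(f"unknown data type: {data_type!r}")
    # Interval/ratio data: fall back to non-parametric only when the data
    # are not Normal and the sample is too small to rely on large-sample results
    if n < small_sample and not looks_normal:
        return "non-parametric"
    return "parametric"


print(suggest_family("ordinal", 500, True))
print(suggest_family("ratio", 12, False))
print(suggest_family("interval", 200, False))
```

The threshold is left as a parameter because 30 is a convention rather than a hard limit.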

Central limit theorem

As the sample size increases, the shape of the sampling distribution of the test statistic tends to become Normal, even if the distribution of the variable which is being tested is not Normal.

As a rule of thumb, this can be applied to test statistics calculated from more than 30 observations.
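A short simulation can make this concrete. The sketch below assumes a strongly skewed variable (an exponential distribution, chosen purely for illustration) and measures how lopsided the distribution of sample means is for different sample sizes. The helper names `skewness` and `sample_means`, the seed and the number of repeats are all choices made for this example.

```python
import random
import statistics


def skewness(values):
    """Skewness: about 0 for a symmetric shape, positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def sample_means(n, repeats, rng):
    """Means of `repeats` independent samples of size n from a skewed variable."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]


rng = random.Random(42)  # fixed seed so the run is repeatable
skew_by_n = {n: skewness(sample_means(n, 2000, rng)) for n in (2, 30, 100)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>3}: skewness of the sample means = {skew:.2f}")
```

The skewness of the sample means should shrink towards zero as n grows, which is the sense in which the sampling distribution becomes closer to Normal even though the underlying variable is not.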

Image: the Normal distribution function

How much can you expect to get out of your data?

The smaller the sample size, the less you can get out of your data. Standard error is inversely proportional to the square root of the sample size, so the larger your sample, the smaller the standard error, and the greater the chance you will have of identifying statistically significant results in your analysis.
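For a sample mean, the standard error is usually calculated as the sample standard deviation divided by the square root of the sample size. A minimal sketch, in which the function name `standard_error` and the standard deviation of 10 are made up for illustration:

```python
import math


def standard_error(sample_sd, n):
    """Standard error of the mean: sample standard deviation divided by sqrt(n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sample_sd / math.sqrt(n)


# Hypothetical survey scores with a standard deviation of 10
for n in (25, 100, 400):
    print(f"n = {n:>3}: standard error = {standard_error(10.0, n):.2f}")
```

Because of the square root, the sample has to be four times larger to halve the standard error, which is worth bearing in mind when deciding how many responses to aim for.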