
Cite this article:

Peter Pang. Statistics made easy[J]. Journal of Practical Shock, 2018, 2(5): 316-320.

Corresponding author

Peter Pang, E-mail: pangkmp@ha.org.hk

History

Received date: 2018-06-22
Statistics made easy
Peter Pang     
Yan Chai Hospital, Tsuen Wan, Hong Kong

Before you start your own clinical trial, you should equip yourself with literature appraisal tools. Below is the first chapter of a whole series forming an evidence-based medicine teaching manual. After reading the manual, you will be able to appraise other authors' articles with confidence. Chapter one deals with simple statistics for clinicians. The normal distribution and the "1 in 20" probability threshold are fundamental to statistical testing. The central limit theorem suggests a sample size of at least 30. Forest plots, point estimates and confidence intervals are explained, together with the difference between statistical significance and clinical significance. The Student t test is used in parametric testing, while the chi-square test is used in non-parametric testing.

Fundamentals of Statistics

Evidence-based medicine deals with evidence. Evidence comes from data, and data come from the results of clinical trials. We use statistical tools to deal with data. Below are three fundamental concepts of statistics, namely,

(1) Probability (P) and "1 in 20 rule"

Take the example of rolling a die and making a bet on 1. When we roll the die, the "probability of getting 1 by chance" is 1 in 6 (P). The "probability of NOT getting 1 by chance" is 5 in 6 (1-P). We then have an equation for odds: Odds = $\frac{P}{1 - P}$

In simple terms, the odds of the "probability of getting 1 by chance" over the "probability of NOT getting 1 by chance" will be 1/6 divided by 5/6, which is equal to 1 in 5 (Odds = 1/5). If we get 1 and win the bet, is that likely or unlikely to occur by chance, knowing that the chance is 1 in 6? People may have different opinions. What if we lower the probability from 1 in 6 to 1 in 20? Is it likely or unlikely to occur by chance? In statistical hypothesis testing, we choose a significance threshold such that the probability of the event occurring by chance alone is 1 in 20 (P = 0.05), meaning that it is very unlikely that the event occurred by chance alone. And most people will accept that.
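As a minimal sketch (in Python, with hypothetical values), the odds formula applied to the dice example and to the "1 in 20" threshold:

```python
# Convert a probability P into odds: odds = P / (1 - P)
def odds(p: float) -> float:
    return p / (1 - p)

print(odds(1 / 6))   # dice example: (1/6) / (5/6) = 0.2, i.e. odds of 1 to 5
print(odds(0.05))    # the "1 in 20" threshold: 0.05 / 0.95 ≈ 0.053, i.e. roughly 1 to 19
```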

(2) Central Limit theorem and Normal distribution

In statistical hypothesis testing, we use the normal distribution (bell curve) and its formulas, with P < 0.05 as the threshold of statistical significance, in our calculations. Hence, when we collect our samples, how can we be assured that we can use the same statistical tools to work on them? Here comes the central limit theorem (CLT). The CLT states that the sample mean will be approximately normally distributed for large sample sizes, regardless of the distribution from which we are sampling. The distribution of the sample mean tends toward the normal distribution as the sample size increases, and the sample mean can be considered normally distributed if the sample size is at least 30. So to speak, we can then apply the normal distribution formulas to our data, and our sample size should be at least 30!
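A small simulation sketch (assuming Python with NumPy) illustrates the idea: even when sampling from a clearly non-normal parent population, the distribution of the sample mean behaves like a normal distribution as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parent population: exponential (skewed, clearly non-normal), mean = 1, SD = 1.
# For each sample size n, draw 10 000 samples and look at their means.
for n in (5, 30, 100):
    sample_means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:>3}: mean of sample means = {sample_means.mean():.3f}, "
          f"SD of sample means = {sample_means.std():.3f} "
          f"(theory: {1 / np.sqrt(n):.3f})")
```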

(3) Continuous data and dichotomous data

Data can be broadly classified as continuous data and dichotomous data. For example, an anti-hypertensive drug can lower blood pressure by 20 mmHg; that is continuous data, and we then use a parametric test, the common one being the Student t test. The same drug can lower the systolic blood pressure to less than 140 mmHg (yes or no); that is dichotomous data, and we then use a non-parametric test, the common one being the chi-square test.

Descriptive statistics

Descriptive statistics involve the summarization of a collection of data from the observations in a clear and understandable way. Independent variables are those that are manipulated, whereas dependent variables depend on the independent variables. Data are divided into continuous or dichotomous.

2×2 Table for dichotomous data
(1) Central tendency

(1) Mean = arithmetic mean = average

Mean is a good measure of central tendency for roughly symmetric distributions but can be misleading in skewed distributions since it can be greatly influenced by extreme values. Therefore, other statistics such as median may be more informative.

(2) Median is the middle of a distribution: half the scores are above the median and half are below the median. The median is less sensitive to extreme scores than the mean and this makes it a better measure than the mean for highly skewed distributions.

The mean & median are equal in a symmetric distribution. The mean is higher than the median in a positively skewed distribution and lower than the median in a negatively skewed one. When there is an odd number of numbers, the median is simply the middle number. For example, among the numbers 1, 4, 7, the median is 4.

When there is an even number of numbers, the median is the mean of the two middle numbers. Thus, among the numbers 1, 4, 7 & 10, the median is (4+7) /2= 5.5.

(3) Mode is the most frequently occurring score in a distribution.
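A quick sketch (Python standard library, hypothetical scores) showing how an extreme value pulls the mean but barely moves the median or mode:

```python
import statistics

scores = [1, 4, 4, 7, 10, 40]            # hypothetical scores with one extreme value (40)

print(statistics.mean(scores))            # 11.0 - pulled upward by the outlier
print(statistics.median(scores))          # 5.5  - mean of the two middle numbers (4 and 7)
print(statistics.mode(scores))            # 4    - the most frequently occurring score
```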

(2) Skewness

A distribution is skewed if one of its tails is longer than the other. Generally, the mean is larger than the median in positively skewed distributions and less than the median in negatively skewed distributions.

(3) Spread or Dispersion

(1) Variance (σ²)

Variance is defined as the average of the squared differences from the mean. Variance measures how much the data are scattered about their mean.

It also equals the standard deviation squared: $\sigma^2 = \frac{\sum {(X - \mu)^2}}{N}$. "Variance" and "magnitude of effect (mean)" define a normal distribution curve. These two characteristics allow us to differentiate two different curves!

  Consider two normal curves: curve A has a narrower variance and a smaller mean, while curve B is flatter than curve A (a greater variance) with a greater mean.

(2) Standard deviation (σ) equals the square root of the variance. Two standard deviations about the mean include about 95% of the population; the outliers, about 1 in 20 of the population, lie outside this range.
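A minimal sketch (Python with NumPy, hypothetical data) computing the variance from its definition and confirming that the standard deviation is its square root:

```python
import numpy as np

x = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)   # hypothetical observations

mu = x.mean()
variance = ((x - mu) ** 2).mean()    # average of the squared differences from the mean
sd = np.sqrt(variance)               # standard deviation = square root of the variance

print(mu, variance, sd)              # 5.0 4.0 2.0
print(np.var(x), np.std(x))          # the same values from NumPy's built-ins
```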

Inferential statistics

Inferential statistics involve the drawing of inferences about a population from a sample with regard to some characteristic of interest. The 2 main methods used in inferential statistics are estimation and hypothesis testing. In estimation, the variable of the sample is described by a point estimate & a confidence interval around the point estimate. In hypothesis testing, a null hypothesis is put forward and it is determined whether the data are strong enough to reject it.

(1) Normal distribution

In a normal distribution, ~68% of the scores are within 1 standard deviation (SD) of the mean, ~95% of the scores are within 2 SD, & ~99.7% of the scores are within 3 SD.
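These coverage figures can be checked directly (a sketch assuming Python with SciPy):

```python
from scipy.stats import norm

# Proportion of a normal distribution lying within k standard deviations of the mean
for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {coverage:.1%}")
# within 1 SD: 68.3%, within 2 SD: 95.4%, within 3 SD: 99.7%
```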

(2) Central Limit Theorem

The theorem states that the sum of a large number of independent & identically-distributed random variables will be approximately normally distributed (i.e., following a Gaussian distribution, or bell-shaped curve) if the random variables have a finite variance. This is a powerful result, as it allows properties of the normal distribution to be applied to samples from non-Normal parent population (parent population without normal distribution).

(3) Point estimate

It is a one-number summary of all the data. An estimate of the true parameter value is made using the sample data; this is called a point estimate or a sample estimate. The result (e.g. mean, weighted difference, odds ratio, relative risk or risk difference) obtained in a sample (a study or a meta-analysis) is used as the best estimate of what is true for the relevant population from which the sample is taken. For dichotomous outcomes, point estimates such as the odds ratio, relative risk, absolute risk and NNT are used; for continuous data, the mean difference in treatment effect is used. The point estimate indicates the magnitude of the treatment effect.
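A short sketch (plain Python, hypothetical counts) computing the common dichotomous-outcome point estimates from a 2×2 table:

```python
# Hypothetical 2x2 table: number with the outcome event / total per arm
events_treat, n_treat = 30, 200      # 15% event rate in the treatment arm
events_ctrl,  n_ctrl  = 50, 200      # 25% event rate in the control arm

risk_treat = events_treat / n_treat
risk_ctrl  = events_ctrl  / n_ctrl

rr  = risk_treat / risk_ctrl                       # relative risk
arr = risk_ctrl - risk_treat                       # absolute risk reduction
nnt = 1 / arr                                      # number needed to treat
or_ = (events_treat / (n_treat - events_treat)) / \
      (events_ctrl  / (n_ctrl  - events_ctrl))     # odds ratio

print(f"RR = {rr:.2f}, ARR = {arr:.0%}, NNT = {nnt:.0f}, OR = {or_:.2f}")
# RR = 0.60, ARR = 10%, NNT = 10, OR = 0.53
```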

(4) Interval estimate-Confidence intervals (CI)

A CI establishes a range & specifies the probability that this range encompasses the true population mean. This gives us a range of values around the mean where we expect the "true" (population) mean to be located with a given level of certainty. 95% CI = "If samples of the same size are drawn repeatedly from a population, and a confidence interval is calculated from each sample, then 95% of these intervals should contain the population mean."

95% CI = point estimate ± 1.96 × SEM (where SEM = standard error of the mean = $\sigma / \sqrt{n}$)
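A minimal sketch (Python with NumPy, simulated hypothetical readings) of a point estimate with its 95% confidence interval:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=130, scale=10, size=30)     # 30 hypothetical systolic BP readings

mean = x.mean()                                 # point estimate
sem = x.std(ddof=1) / np.sqrt(x.size)           # SEM = sample SD / sqrt(n)
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem

print(f"point estimate = {mean:.1f} mmHg, 95% CI = ({ci_low:.1f}, {ci_high:.1f}) mmHg")
```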

(5) Hypothesis testing

Assume the null hypothesis H0 (that there is no difference between the test & control groups) is true. Set up the alternative hypothesis Ha (that there is a difference between the 2 groups).

Decide on a 2-tailed or a 1-tailed test. Select the appropriate test statistic. Select the level of significance for the test (the alpha value), usually set at 0.05. Analyse the data & calculate the P value with the statistical test selected. The P value can be regarded as the probability that the observed result is due to chance alone. If the P value < α (P < 0.05), the result is statistically significant & the null hypothesis is rejected.

If the P value ≥ α (P ≥ 0.05), then the result is not statistically significant, & the null hypothesis is not rejected.
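The whole workflow can be sketched in a few lines (assuming Python with NumPy and SciPy; the data are simulated, hypothetical blood-pressure readings):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
test_group    = rng.normal(loc=132, scale=12, size=40)   # hypothetical drug arm (systolic BP)
control_group = rng.normal(loc=140, scale=12, size=40)   # hypothetical placebo arm

alpha = 0.05                                              # significance level
t_stat, p_value = stats.ttest_ind(test_group, control_group)  # two-tailed t test

if p_value < alpha:
    print(f"P = {p_value:.3f} < {alpha}: statistically significant, reject H0")
else:
    print(f"P = {p_value:.3f} >= {alpha}: not statistically significant, do not reject H0")
```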

(1) Type Ⅰ error (α): wrong rejection of the null hypothesis when it is true; = false positive error. Similar to "convicting an innocent person". Probability of making a type Ⅰ error = significance level of a statistical test = α value. Usually, α is set at 0.05. After data collection from the study, if the calculated P value < α (0.05), the result is statistically significant, i.e., the difference could not be explained by chance alone.

(2) Type Ⅱ error(β):

= Failure to reject the null hypothesis when it is false;

= false negative error.

Similar to "letting a guilty person go free".

|  | True difference in outcome: H0 is false | No difference in outcome: H0 is true |
| Reject H0 = conclude there is some difference; positive test | True positive | Type Ⅰ error (false positive) |
| Fail to reject H0 = conclude there is no difference; negative test | Type Ⅱ error (false negative) | True negative |
  There is a tradeoff between Type Ⅰ and Type Ⅱ errors.

(3) Power

Power of a statistical test is the probability that the test will reject H0 when H0 is false (i.e., the probability of not committing a Type Ⅱ error, i.e., the probability of finding a difference between the test & control groups). The probability of a Type Ⅱ error occurring is referred to as the false negative rate (β). Therefore power is equal to 1-β. A generally accepted power is 80% or 0.8. If a more accurate result is needed, the power is set at 0.9.

There are 3 factors that influence the power of a study: ① Sample size: the larger the sample size, the larger the power, and the more likely you are to reject H0. ② Effect size: the larger the effect size, the larger the power, and the more likely you are to reject H0; for a fixed power, we need a larger sample size to detect a smaller difference between the 2 groups, i.e. a smaller effect size. ③ α value: the higher the α level, the more likely you are to reject H0; e.g., with a higher α of 0.1, a P value of, say, 0.07 (not < 0.05) would still count as statistically significant.
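As an illustration of the relationship between sample size, effect size and power, here is a sketch assuming Python with the statsmodels package (the effect sizes are hypothetical, standardized values):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group to detect a standardized effect size of 0.5
# with alpha = 0.05 (two-sided) and 80% power, for two independent groups
print(round(analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)))   # ~64 per group

# A smaller effect size needs a much larger sample for the same power
print(round(analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.8)))   # ~394 per group
```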

(4) One-tailed test vs.two-tailed test

A probability computed considering differences in both directions is called a "two-tailed" probability. The name makes sense since both tails of the sampling distribution are considered. There are situations in which a researcher is concerned only with differences in one direction. One-tailed tests have lower Type Ⅱ error rates & more power than two-tailed tests. Probability values for one-tailed tests are always one half of the values for two-tailed tests, as long as the effect is in the specified direction.
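This halving can be seen directly (a sketch assuming Python with SciPy ≥ 1.6, which provides the `alternative` argument; the data are hypothetical):

```python
from scipy import stats

# Hypothetical measurements where group A clearly scores higher than group B
a = [5.1, 5.4, 4.9, 5.6, 5.2, 5.3, 5.0, 5.5]
b = [4.6, 4.8, 4.5, 4.9, 4.7, 4.4, 4.8, 4.6]

_, p_two = stats.ttest_ind(a, b, alternative='two-sided')
_, p_one = stats.ttest_ind(a, b, alternative='greater')   # Ha: mean(a) > mean(b)

# Because the observed effect is in the specified direction,
# the one-tailed P value is half the two-tailed P value.
print(f"two-tailed P = {p_two:.5f}, one-tailed P = {p_one:.5f}")
```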

Statistical tests

1. Parametric tests make the assumption that the samples are normally distributed. When these assumptions are met, parametric tests are more powerful than their nonparametric counterparts & thus are preferable. Examples are:

(1) t-tests of the difference of means

(2) Normal curve z-tests of differences of means and proportions

The t test is used for studies involving continuous data. It assesses whether the means of two groups are statistically different from each other. The formula for the t test is a ratio: t = (difference between the two means)/(standard error of the difference), where the standard error of the difference is a measure of the variability or dispersion of the data.

The t-test, one-way Analysis of Variance (ANOVA) and a form of regression analysis are mathematically equivalent and would yield identical results.
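A sketch (Python with NumPy and SciPy, hypothetical data) computing t from the ratio above and checking that SciPy's t test and a one-way ANOVA on the same two groups agree (F = t²):

```python
import numpy as np
from scipy import stats

a = np.array([5.1, 5.4, 4.9, 5.6, 5.2, 5.3, 5.0, 5.5])   # hypothetical group A
b = np.array([4.6, 4.8, 4.5, 4.9, 4.7, 4.4, 4.8, 4.6])   # hypothetical group B

# t = (difference between the two means) / (standard error of the difference),
# using the pooled estimate of variance
n1, n2 = len(a), len(b)
pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
se_diff = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_manual = (a.mean() - b.mean()) / se_diff

t_scipy, _ = stats.ttest_ind(a, b)   # same value from SciPy (equal variances assumed)
f_stat, _ = stats.f_oneway(a, b)     # one-way ANOVA on two groups: F equals t squared

print(t_manual, t_scipy, t_scipy ** 2, f_stat)
```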

2. Non-parametric tests do not assume a normal distribution. They compare medians rather than means; thus, they limit the influence of outliers, if present. They are used when:

(1) the sample sizes are small, thus normal distribution cannot be assumed.

(2) the study involves ranked data (not a normal distribution).

Typical examples are: the chi-square test for a 2×2 table with a large sample size; Fisher's exact test for 2×2 & larger tables with a smaller sample size (any expected cell count < 5).

The chi-square test (χ²) is a statistical test whereby variables are categorized to determine whether a distribution of scores is due to chance or to experimental factors. It is used for studies involving proportions or frequencies of dichotomous data. It requires a sufficient sample size for the χ² approximation to be valid; as a rule of thumb, avoid using χ² if any expected cell count is less than 5. Yates' correction is an arbitrary, conservative adjustment to the χ² test when applied to tables with one or more cells with frequencies less than 5. If Yates' correction for continuity is to be applied, due to cell counts below 5, the calculation is the same except that for each cell an additional 0.5 is subtracted from the absolute difference O-E before squaring and dividing by E. Note that chi-square must be calculated on actual count data, not on substituted percentages, which would have the effect of pretending the sample size is 100. The formula for the χ² test is: $\chi^2 = \sum {\frac{(O - E)^2}{E}}$
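A brief sketch (Python with NumPy and SciPy, hypothetical counts) of the chi-square test with Yates' correction and of Fisher's exact test for small expected counts:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table of counts
# rows: drug vs placebo; columns: SBP < 140 mmHg vs >= 140 mmHg
table = np.array([[60, 40],
                  [45, 55]])

chi2, p, dof, expected = stats.chi2_contingency(table, correction=True)  # Yates' correction
print(f"chi-square = {chi2:.2f}, P = {p:.3f}")
print("expected counts:\n", expected)   # rule of thumb: all should be >= 5

# If any expected count were below 5, Fisher's exact test would be preferred
odds_ratio, p_exact = stats.fisher_exact(table)
print(f"Fisher's exact P = {p_exact:.3f}")
```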

3. Choice of parametric & non-parametric tests: the choice of a parametric or non-parametric test depends on the type of scale of the data collected. There are 4 types of scales used to record data:

(1) Nominal: name, category, e.g. gender (male or female), race.

(2) Ordinal: order, rank, e.g. stages of cancer (first, second, third & fourth).

(3) Interval: only the difference between 2 levels has meaning, e.g. temperature (℃ or ℉ scales). We cannot say 20 ℃ is twice as hot as 10 ℃; zero or ratio has no meaning.

(4) Ratio: continuous data in which zero or ratio has its own meaning, e.g. weight, length.

Nominal & ordinal data use nonparametric tests, whereas interval & ratio use parametric tests.

Table 3 Comparison of parametric & nonparametric tests
Statistical significance vs. clinical significance

A statistically significant result may not be clinically significant, e.g. a test drug is just slightly better than placebo in a study involving a huge sample size (with a huge sample size, results tend to be statistically significant!). Vice versa, a clinically significant result may not be statistically significant, e.g. a small trial giving an alarming clinical result. The graph below illustrates clinical & statistical significance:

  Vertical line: RR = 1, the line of no effect
A has both clinical & statistical significance. B has clinical significance (as its point estimate is < 1) but not statistical significance (its CI includes 1). C has little or no clinical significance (as its point estimate is slightly > 1) and no statistical significance (its CI includes 1). C has the narrowest 95% CI and is thus the most precise study (the largest sample size).

From the above graph, we should not discard the result of B just because it does not reach statistical significance; the true effect may lie anywhere within the boundaries of its confidence interval. A further study with a larger but optimal sample size may show statistical significance (a larger sample size will narrow the CI so that it no longer includes the line of no effect). Otherwise, we may consider changing the study design if there is any inadequacy.
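As a final sketch (plain Python, hypothetical counts), this computes a relative risk with its 95% CI using the standard log method and checks whether the interval crosses the line of no effect (RR = 1):

```python
import math

# Hypothetical 2x2 table: events / totals in the treatment and control arms
a, n1 = 12, 150      # events and total in the treatment arm
c, n2 = 24, 150      # events and total in the control arm

rr = (a / n1) / (c / n2)
se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)   # SE of ln(RR), log (Katz) method
ci_low  = math.exp(math.log(rr) - 1.96 * se_log_rr)
ci_high = math.exp(math.log(rr) + 1.96 * se_log_rr)

print(f"RR = {rr:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
print("CI excludes RR = 1: statistically significant" if ci_high < 1 or ci_low > 1
      else "CI includes RR = 1: not statistically significant")
```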