Paper B · 2026-06-12 · 15 min read

Critical Review for MRCPsych Paper B: Statistics, Study Design, and Appraisal

Written by PsychStar Clinical Team
NHS Consultant Psychiatrist · MRCPsych preparation expert

Critical review accounts for 50 of the 150 marks in Paper B. It is the single largest section in the paper, yet it is often the one candidates are least prepared for. Unlike clinical psychiatry, which you practise daily, critical appraisal is a discrete skill set that requires deliberate study.

This guide covers the statistical knowledge, study design concepts, and appraisal frameworks you need, structured by how frequently each topic appears in the examination.

Statistical Tests: When to Use Which

The exam expects you to know which statistical test is appropriate for a given study design and data type. You are not expected to perform calculations (except for sensitivity, specificity, NNT) but you must interpret the output.

Data type | Two groups (unpaired) | Two groups (paired) | Three+ groups | Association between variables
Continuous (normally distributed) | Independent t-test | Paired t-test | ANOVA | Pearson correlation
Continuous (skewed) | Mann-Whitney U | Wilcoxon signed-rank | Kruskal-Wallis | Spearman correlation
Categorical | Chi-square | McNemar | Chi-square | Chi-square / Fisher exact
Survival data | Kaplan-Meier curves + log-rank test

The most commonly examined distinction is between parametric tests (t-test, ANOVA, Pearson) and non-parametric tests (Mann-Whitney, Wilcoxon signed-rank, Kruskal-Wallis, Spearman). The key question is: are the data approximately normally distributed? If yes, use the parametric test. If no (skewed or ordinal data), use its non-parametric equivalent.
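As a revision aid, the table can be written as a simple lookup. This is only a sketch: the key names (`continuous_normal`, `two_unpaired`, and so on) are hypothetical labels invented here for the rows and columns, and the test names are copied from the table.

```python
# Keys are hypothetical labels for the rows and columns of the table above.
TEST_TABLE = {
    "continuous_normal": {
        "two_unpaired": "Independent t-test",
        "two_paired": "Paired t-test",
        "three_plus": "ANOVA",
        "association": "Pearson correlation",
    },
    "continuous_skewed": {
        "two_unpaired": "Mann-Whitney U",
        "two_paired": "Wilcoxon signed-rank",
        "three_plus": "Kruskal-Wallis",
        "association": "Spearman correlation",
    },
    "categorical": {
        "two_unpaired": "Chi-square",
        "two_paired": "McNemar",
        "three_plus": "Chi-square",
        "association": "Chi-square / Fisher exact",
    },
}


def choose_test(data_type: str, comparison: str) -> str:
    """Look up the test the table gives for a data type and comparison."""
    if data_type == "survival":
        # The table lists a single approach for survival data.
        return "Kaplan-Meier curves + log-rank test"
    try:
        return TEST_TABLE[data_type][comparison]
    except KeyError:
        raise ValueError(f"no table entry for {data_type!r} / {comparison!r}") from None


print(choose_test("continuous_skewed", "two_unpaired"))  # Mann-Whitney U
```

Quizzing yourself against a lookup like this is a quick way to check that you can move from a study description to the right test without the table in front of you.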

Measures of Effect

These are the calculations most likely to appear. Practise them until they become automatic.

Number Needed to Treat (NNT)

NNT = 1 / Absolute Risk Reduction (ARR). ARR = Control Event Rate (CER) – Experimental Event Rate (EER).

Example: In a trial, 25% of patients on placebo relapsed vs 10% on the drug. CER = 0.25, EER = 0.10. ARR = 0.25 – 0.10 = 0.15. NNT = 1 / 0.15 = 6.7. Round up to 7. On average, you need to treat 7 patients to prevent one additional relapse.

Number Needed to Harm (NNH)

NNH = 1 / Attributable Risk (AR). AR = EER (adverse) – CER (adverse).

Example: 5% on placebo had sedation vs 20% on the drug. AR = 0.20 – 0.05 = 0.15. NNH = 1 / 0.15 = 6.7. On average, for every 7 patients treated, one additional patient will experience sedation.
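NNT and NNH are the same arithmetic applied to a benefit and a harm, so both worked examples can be checked with one small helper. The function name `number_needed` is made up for this sketch, and the event rates are the ones quoted in the two examples.

```python
import math


def number_needed(control_rate: float, treated_rate: float) -> float:
    """Return 1 / |absolute risk difference| between the two arms."""
    difference = abs(control_rate - treated_rate)
    if difference == 0:
        raise ValueError("no risk difference, so NNT/NNH is undefined")
    return 1 / difference


# Benefit: relapse in 25% on placebo vs 10% on the drug.
nnt = number_needed(control_rate=0.25, treated_rate=0.10)
# Harm: sedation in 5% on placebo vs 20% on the drug.
nnh = number_needed(control_rate=0.05, treated_rate=0.20)

print(f"NNT = {nnt:.1f}, rounded up to {math.ceil(nnt)}")  # NNT = 6.7, rounded up to 7
print(f"NNH = {nnh:.1f}")                                  # NNH = 6.7
```

In this example the drug prevents relapse and causes sedation at the same rate, which is the kind of benefit-versus-harm comparison an appraisal question may ask you to comment on.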

Risk Ratio (Relative Risk)

RR = EER / CER. RR of 1 means no effect. RR < 1 means the treatment reduces risk. RR > 1 means the treatment increases risk. The exam often asks you to check whether the 95% confidence interval includes 1. If it does, the result is not statistically significant at the 5% level.
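You will not be asked to derive a confidence interval in the exam, but seeing where one comes from makes the "does it include 1?" rule easier to remember. The sketch below uses the standard log-scale (Wald) approximation. The arm sizes of 100 patients each are an assumption added here purely for illustration; the event rates match the relapse example used for NNT.

```python
import math


def risk_ratio_ci(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk ratio with an approximate 95% CI computed on the log scale.

    Requires at least one event in each arm.
    """
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se_log_rr = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Assumed arm sizes: 10/100 relapse on the drug, 25/100 on placebo.
rr, lower, upper = risk_ratio_ci(10, 100, 25, 100)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
# RR = 0.40, 95% CI 0.20 to 0.79
print("CI includes 1:", lower <= 1 <= upper)  # CI includes 1: False
```

Here the whole interval sits below 1, so the reduction in relapse would be reported as statistically significant. With smaller arms the same RR of 0.40 could produce an interval wide enough to include 1.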

Odds Ratio (OR)

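Used in case-control studies, where RR cannot be calculated directly. OR = (odds of exposure in cases) / (odds of exposure in controls). OR approximates RR when the outcome is rare (<10%). When the outcome is common, OR exaggerates the effect: it lies further from 1 than RR does, in whichever direction the effect runs.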
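The rare-outcome rule is easiest to see with numbers. The sketch below compares RR and OR in two hypothetical cohorts (the counts are invented for illustration), both constructed so that RR is exactly 2. It uses the cross-product form ad/bc, which gives the same OR whether you think in terms of odds of exposure or odds of outcome.

```python
def risk_ratio(a, b, c, d):
    """RR from a 2x2 table.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    return (a * (c + d)) / (c * (a + b))


def odds_ratio(a, b, c, d):
    """OR as the cross-product ratio ad / bc."""
    return (a * d) / (b * c)


# Hypothetical cohorts of 100 exposed and 100 unexposed people.
tables = {
    "rare outcome (2% vs 1%)": (2, 98, 1, 99),
    "common outcome (60% vs 30%)": (60, 40, 30, 70),
}

for label, table in tables.items():
    print(f"{label}: RR = {risk_ratio(*table):.2f}, OR = {odds_ratio(*table):.2f}")
# rare outcome (2% vs 1%): RR = 2.00, OR = 2.02
# common outcome (60% vs 30%): RR = 2.00, OR = 3.50
```

With the rare outcome the OR is almost identical to the RR. With the common outcome the OR of 3.5 would badly overstate a doubling of risk if it were read as a risk ratio.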

Sensitivity and Specificity

  • Sensitivity: True positives / (True positives + False negatives). A negative result on a highly sensitive test rules disease out (SnOUT). High sensitivity = few false negatives.
  • Specificity: True negatives / (True negatives + False positives). A positive result on a highly specific test rules disease in (SpIN). High specificity = few false positives.
  • Positive Predictive Value (PPV): True positives / (True positives + False positives). Depends on prevalence.
  • Negative Predictive Value (NPV): True negatives / (True negatives + False negatives). Depends on prevalence.
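The four formulas above all come from the same 2x2 table, and the dependence of PPV on prevalence follows from Bayes' theorem. The sketch below works through a hypothetical screening test (all counts invented for illustration) with 90% sensitivity and 90% specificity, first at 10% prevalence and then at 1%.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """PPV for a given prevalence, using Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical sample of 1,000 people, 100 of whom have the disorder.
m = diagnostic_metrics(tp=90, fn=10, fp=90, tn=810)
print(", ".join(f"{name} = {value:.2f}" for name, value in m.items()))
# sensitivity = 0.90, specificity = 0.90, ppv = 0.50, npv = 0.99

# Same test applied where only 1% of people have the disorder.
print(f"PPV at 1% prevalence = {ppv_at_prevalence(0.90, 0.90, 0.01):.2f}")
# PPV at 1% prevalence = 0.08
```

Sensitivity and specificity are unchanged between the two settings, yet PPV falls from 50% to about 8%. That is the point to make when a question moves a test from a specialist clinic to a general population.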

Study Designs Ranked by Evidence Quality

Level | Design | Key features
1a | Systematic review / Meta-analysis of RCTs | Pooled data, forest plot, heterogeneity (I²)
1b | Individual RCT | Randomisation, blinding, intention-to-treat analysis
2a | Cohort study | Exposed vs non-exposed, followed forward. Can calculate RR. Prone to confounding and attrition bias.
2b | Case-control study | Cases vs controls, looks backward. Can calculate OR. Prone to recall and selection bias.
3 | Cross-sectional study | Single time point. Can measure prevalence but not incidence. Cannot establish causation.
4 | Case series / Case report | Descriptive only. No comparison group. Hypothesis-generating only.

Bias Types You Must Know

  • Selection bias: Systematic differences between groups being compared. Example: healthier volunteers enrol in the treatment arm.
  • Information bias (misclassification): Errors in measuring exposure or outcome. Example: recall bias in case-control studies where cases remember exposures differently.
  • Publication bias: Studies with positive results are more likely to be published. Suggested by funnel plot asymmetry.
  • Attrition bias: Differential dropout between groups. Intention-to-treat analysis mitigates this.
  • Detection bias: Systematic differences in how outcomes are assessed. Blinding of outcome assessors reduces this.
  • Performance bias: Systematic differences in care provided apart from the intervention. Blinding of participants and clinicians reduces this.
  • Confounding: A third variable associated with both exposure and outcome that is not on the causal pathway between them. Example: age confounds the relationship between alcohol and dementia.

Critical Appraisal Frameworks

The exam may ask you to appraise a study using a structured framework. The most common are:

  • CASP (Critical Appraisal Skills Programme): Three broad questions: (1) Are the results valid? (2) What are the results? (3) Will they help locally? Each has 3–4 sub-questions specific to the study type.
  • SIGN (Scottish Intercollegiate Guidelines Network): Uses checklists with well-covered/adequately-addressed/poorly-reported/not-applicable ratings. Yields a study quality rating (++, +, or −).
  • GRADE (Grading of Recommendations Assessment, Development and Evaluation): Rates the quality of evidence across studies for a given outcome. Starts high for RCTs, low for observational studies, then adjusts up or down based on specific criteria.

Worked Example: Forest Plot Interpretation

A forest plot from a meta-analysis shows individual study results as squares (point estimate) with horizontal lines (95% CI). The diamond at the bottom shows the pooled estimate. Key things to check:

  • Does the diamond cross the line of no effect (1.0 for RR/OR, 0 for mean difference)? If yes, the overall result is not significant.
  • What is the I² statistic? <25% = low heterogeneity, 25–50% = moderate, 50–75% = substantial, >75% = considerable. High I² means the studies may be too different to pool meaningfully.
  • Is the funnel plot symmetrical? Asymmetry suggests publication bias or small-study effects.
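To make the diamond and the I² statistic less abstract, the sketch below pools four hypothetical studies using inverse-variance fixed-effect weighting on the log risk ratio scale, then derives I² from Cochran's Q. The study values are invented, and fixed-effect pooling is an assumption made here for simplicity; a published meta-analysis may use a random-effects model instead.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance fixed-effect pooling.

    Returns (pooled estimate, its standard error, Cochran's Q).
    """
    weights = [1 / se**2 for se in standard_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    return pooled, math.sqrt(1 / total), q


def i_squared(q, df):
    """I² as a percentage, floored at 0 when Q is below its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, 100 * (q - df) / q)


# Hypothetical studies: risk ratios and standard errors of log(RR).
risk_ratios = [0.60, 0.75, 0.50, 0.90]
standard_errors = [0.20, 0.15, 0.30, 0.25]

log_rrs = [math.log(rr) for rr in risk_ratios]
pooled, se, q = pool_fixed_effect(log_rrs, standard_errors)
lower = math.exp(pooled - 1.96 * se)
upper = math.exp(pooled + 1.96 * se)

print(f"Pooled RR = {math.exp(pooled):.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"Diamond crosses line of no effect: {lower <= 1 <= upper}")
print(f"I² = {i_squared(q, df=len(risk_ratios) - 1):.0f}%")
```

The pooled estimate and its interval correspond to the centre and width of the diamond. Studies with smaller standard errors receive larger weights, which is why they are drawn as larger squares on the plot.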

PsychStar’s Paper B question bank includes dedicated critical review questions with full teaching cascades covering statistics, study design, and bias identification. Start with 5 free questions at psychstar.io/try.
