Critical Review for MRCPsych Paper B

Critical review accounts for 50 of the 150 marks in Paper B. It is the single largest section in the paper, yet it is often the one candidates are least prepared for. Unlike clinical psychiatry, which you practise daily, critical appraisal is a discrete skill set that requires deliberate study.

This guide covers the statistical knowledge, study design concepts, and appraisal frameworks you need, structured by how frequently each topic appears in the examination.

Statistical Tests: When to Use Which

The exam expects you to know which statistical test is appropriate for a given study design and data type. You are not expected to perform calculations (except for sensitivity, specificity, and NNT), but you must be able to interpret the output.

Data type	Two groups (unpaired)	Two groups (paired)	Three+ groups	Association between variables
Continuous (normally distributed)	Independent t-test	Paired t-test	ANOVA	Pearson correlation
Continuous (skewed)	Mann-Whitney U	Wilcoxon signed-rank	Kruskal-Wallis	Spearman correlation
Categorical	Chi-square	McNemar	Chi-square	Chi-square / Fisher exact
Survival data	Kaplan-Meier curves + log-rank test

The most commonly examined distinction is between parametric tests (t-test, ANOVA, Pearson) and non-parametric tests (Mann-Whitney, Kruskal-Wallis, Spearman). The key question is: is the data normally distributed? If yes, use parametric. If no, use non-parametric.
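The table above can be read as a simple lookup. The following is a minimal sketch in Python, not an exam requirement; the key names (continuous_normal, two_unpaired, and so on) and the function choose_test are hypothetical labels invented for illustration. The survival-data row is left out because it does not vary by comparison type.

```python
# Lookup mirroring the table above. Keys are (data_type, comparison);
# both sets of labels are hypothetical names chosen for this sketch.
TEST_TABLE = {
    ("continuous_normal", "two_unpaired"): "Independent t-test",
    ("continuous_normal", "two_paired"): "Paired t-test",
    ("continuous_normal", "three_plus"): "ANOVA",
    ("continuous_normal", "association"): "Pearson correlation",
    ("continuous_skewed", "two_unpaired"): "Mann-Whitney U",
    ("continuous_skewed", "two_paired"): "Wilcoxon signed-rank",
    ("continuous_skewed", "three_plus"): "Kruskal-Wallis",
    ("continuous_skewed", "association"): "Spearman correlation",
    ("categorical", "two_unpaired"): "Chi-square",
    ("categorical", "two_paired"): "McNemar",
    ("categorical", "three_plus"): "Chi-square",
    ("categorical", "association"): "Chi-square / Fisher exact",
}


def choose_test(data_type: str, comparison: str) -> str:
    """Return the test named in the table for this data type and comparison."""
    return TEST_TABLE[(data_type, comparison)]


print(choose_test("continuous_skewed", "two_unpaired"))  # Mann-Whitney U
```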

Measures of Effect

These are the calculations most likely to appear. Practise them until they become automatic.

Number Needed to Treat (NNT)

NNT = 1 / Absolute Risk Reduction (ARR). ARR = Control Event Rate (CER) – Experimental Event Rate (EER).

Example: In a trial, 25% of patients on placebo relapsed vs 10% on the drug. CER = 0.25, EER = 0.10. ARR = 0.15. NNT = 1 / 0.15 = 6.7. Round up to 7. You need to treat 7 patients to prevent one relapse.
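The same arithmetic as a short Python sketch, for checking practice answers. The function name nnt is an illustrative choice, and rounding the reciprocal before taking the ceiling is only there to guard against floating-point noise.

```python
import math


def nnt(cer: float, eer: float) -> int:
    """Number needed to treat, rounded up to the next whole patient."""
    arr = cer - eer  # absolute risk reduction
    if arr <= 0:
        raise ValueError("No risk reduction: NNT is undefined for ARR <= 0")
    # round() removes tiny floating-point error before rounding up
    return math.ceil(round(1 / arr, 9))


print(nnt(cer=0.25, eer=0.10))  # 7
```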

Number Needed to Harm (NNH)

NNH = 1 / Attributable Risk (AR). AR = EER (adverse) – CER (adverse).

Example: 5% on placebo had sedation vs 20% on the drug. CER = 0.05, EER = 0.20. AR = 0.20 – 0.05 = 0.15. NNH = 1 / 0.15 = 6.7. For roughly every 7 patients treated, 1 additional patient will experience sedation.
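A matching sketch for NNH, again with an illustrative function name. It returns the unrounded value so that you can apply whichever rounding convention your answer requires.

```python
def nnh(cer_adverse: float, eer_adverse: float) -> float:
    """Number needed to harm: reciprocal of the attributable risk."""
    attributable_risk = eer_adverse - cer_adverse
    if attributable_risk <= 0:
        raise ValueError("No excess harm: NNH is undefined for AR <= 0")
    return 1 / attributable_risk


print(round(nnh(cer_adverse=0.05, eer_adverse=0.20), 1))  # 6.7
```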

Risk Ratio (Relative Risk)

RR = EER / CER. RR of 1 means no effect. RR < 1 means the treatment reduces risk. RR > 1 means the treatment increases risk. The exam often asks you to interpret whether the 95% confidence interval crosses 1 (not statistically significant).
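To make the confidence-interval interpretation concrete, here is a minimal sketch that computes RR and an approximate 95% CI from trial counts using the standard log-transform method. The counts (10 of 100 relapsing on the drug, 25 of 100 on placebo) are hypothetical, chosen to match the event rates in the NNT example; the exam itself only expects interpretation, not this calculation.

```python
import math


def risk_ratio_ci(events_exp, n_exp, events_ctrl, n_ctrl, z=1.96):
    """Return (RR, lower, upper) using the log-transform approximation."""
    eer = events_exp / n_exp
    cer = events_ctrl / n_ctrl
    rr = eer / cer
    # Standard error of ln(RR)
    se_log_rr = math.sqrt(
        1 / events_exp - 1 / n_exp + 1 / events_ctrl - 1 / n_ctrl
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = risk_ratio_ci(10, 100, 25, 100)
crosses_one = lower <= 1 <= upper
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print("CI crosses 1 (not significant)" if crosses_one else "CI excludes 1 (significant)")
```

With these hypothetical counts the whole interval lies below 1, so the reduction in risk would be read as statistically significant.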

Odds Ratio (OR)

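Used in case-control studies, where RR cannot be calculated directly. OR = (odds of exposure in cases) / (odds of exposure in controls). OR approximates RR when the outcome is rare (<10%). When the outcome is common, OR exaggerates the effect: it lies further from 1 than the RR, so it overestimates an RR above 1 and underestimates an RR below 1.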
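The relationship between OR and RR is easiest to see with numbers. The sketch below uses hypothetical event rates and computes the OR from the odds of the outcome in each group; in a 2x2 table this is algebraically the same quantity as the exposure odds ratio described above.

```python
def rr_and_or(eer: float, cer: float):
    """Return (risk ratio, odds ratio) for two event rates."""
    rr = eer / cer
    odds_exp = eer / (1 - eer)    # odds of the outcome if exposed
    odds_ctrl = cer / (1 - cer)   # odds of the outcome if unexposed
    return rr, odds_exp / odds_ctrl


# Rare outcome (1% vs 2%): OR stays close to RR
rare_rr, rare_or = rr_and_or(eer=0.02, cer=0.01)
# Common outcome (30% vs 60%): same RR, but OR moves well away from it
common_rr, common_or = rr_and_or(eer=0.60, cer=0.30)

print(f"Rare:   RR = {rare_rr:.2f}, OR = {rare_or:.2f}")
print(f"Common: RR = {common_rr:.2f}, OR = {common_or:.2f}")
```

Both scenarios have an RR of 2, but the OR is about 2.02 when the outcome is rare and 3.5 when it is common.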

Sensitivity and Specificity

Sensitivity: True positives / (True positives + False negatives). A negative result on a highly sensitive test rules disease out (SnOUT). High sensitivity = few false negatives.
Specificity: True negatives / (True negatives + False positives). A positive result on a highly specific test rules disease in (SpIN). High specificity = few false positives.
Positive Predictive Value (PPV): True positives / (True positives + False positives). Depends on prevalence.
Negative Predictive Value (NPV): True negatives / (True negatives + False negatives). Depends on prevalence.
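The four definitions above can be checked against a 2x2 table with a short sketch. The counts are hypothetical, and ppv_at_prevalence is an illustrative helper that applies Bayes' theorem to show why PPV falls when the same test is used in a low-prevalence population.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV and NPV from 2x2 table counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def ppv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV for a given prevalence, with sensitivity and specificity held fixed."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical screening study: 100 with the disorder, 100 without
metrics = diagnostic_metrics(tp=90, fp=20, fn=10, tn=80)
print({name: round(value, 2) for name, value in metrics.items()})

# Same test, different settings: PPV at 50% vs 1% prevalence
print(round(ppv_at_prevalence(0.9, 0.8, 0.50), 2))
print(round(ppv_at_prevalence(0.9, 0.8, 0.01), 2))
```

Sensitivity and specificity are unchanged between the two settings, but the PPV drops from about 0.82 to about 0.04 as prevalence falls from 50% to 1%.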

Study Designs Ranked by Evidence Quality

Level	Design	Key features
1a	Systematic review / Meta-analysis of RCTs	Pooled data, forest plot, heterogeneity (I²)
1b	Individual RCT	Randomisation, blinding, intention-to-treat analysis
2a	Cohort study	Exposed vs non-exposed, followed forward. Can calculate RR. Prone to confounding and attrition bias.
2b	Case-control study	Cases vs controls, looks backward. Can calculate OR. Prone to recall and selection bias.
3	Cross-sectional study	Single time point. Can measure prevalence but not incidence. Cannot establish causation.
4	Case series / Case report	Descriptive only. No comparison group. Hypothesis-generating only.

Bias Types You Must Know

Selection bias: Systematic differences between groups being compared. Example: healthier volunteers enrol in the treatment arm.
Information bias (misclassification): Errors in measuring exposure or outcome. Example: recall bias in case-control studies where cases remember exposures differently.
Publication bias: Studies with positive results are more likely to be published. Suggested by funnel plot asymmetry.
Attrition bias: Differential dropout between groups. Intention-to-treat analysis mitigates this.
Detection bias: Systematic differences in how outcomes are assessed. Blinding of outcome assessors reduces this.
Performance bias: Systematic differences in care provided apart from the intervention. Blinding of participants and treating clinicians reduces this.
Confounding: A third variable associated with both exposure and outcome. Example: age confounds the relationship between alcohol and dementia.

Critical Appraisal Frameworks

The exam may ask you to appraise a study using a structured framework. The most common are:

CASP (Critical Appraisal Skills Programme): Three broad questions: (1) Are the results valid? (2) What are the results? (3) Will they help locally? Each has 3–4 sub-questions specific to the study type.
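SIGN (Scottish Intercollegiate Guidelines Network): Uses checklists with well-covered/adequately-addressed/poorly-reported/not-applicable ratings. Yields an overall study quality rating (++ for high quality, + for acceptable, – for low quality; current checklists also use 0 for unacceptable studies that should be rejected).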
GRADE (Grading of Recommendations Assessment, Development and Evaluation): Rates the quality of evidence across studies for a given outcome. Starts high for RCTs, low for observational studies, then adjusts up or down based on specific criteria.

Worked Example: Forest Plot Interpretation

A forest plot from a meta-analysis shows individual study results as squares (point estimate) with horizontal lines (95% CI). The diamond at the bottom shows the pooled estimate. Key things to check:

Does the diamond cross the line of no effect (1.0 for RR/OR, 0 for mean difference)? If yes, the overall result is not significant.
What is the I² statistic? <25% = low heterogeneity, 25–50% = moderate, 50–75% = substantial, >75% = considerable. High I² means the studies may be too different to pool meaningfully.
If a funnel plot is provided alongside the forest plot, is it symmetrical? Asymmetry suggests publication bias or small-study effects.
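The checks above can be illustrated with a minimal sketch of how a pooled estimate and I² arise from individual study results. It assumes a fixed-effect, inverse-variance model on the log scale, which is one common approach and not necessarily the one used in any paper you are shown; the study values are hypothetical, and I² is derived from Cochran's Q as (Q – df) / Q.

```python
import math


def pool_fixed_effect(ratios, std_errors, z=1.96):
    """Pool ratio measures (RR or OR) by inverse-variance weighting.

    ratios: per-study point estimates; std_errors: SEs of their logs.
    Returns (pooled ratio, (lower, upper), I-squared as a percentage).
    """
    logs = [math.log(r) for r in ratios]
    weights = [1 / se ** 2 for se in std_errors]
    pooled_log = sum(w * y for w, y in zip(weights, logs)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled_log) ** 2 for w, y in zip(weights, logs))
    df = len(ratios) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    ci = (
        math.exp(pooled_log - z * se_pooled),
        math.exp(pooled_log + z * se_pooled),
    )
    return math.exp(pooled_log), ci, i_squared


# Hypothetical meta-analysis of three trials reporting risk ratios
pooled, (lower, upper), i2 = pool_fixed_effect(
    ratios=[0.60, 0.75, 0.50], std_errors=[0.20, 0.25, 0.30]
)
print(f"Pooled RR = {pooled:.2f}, 95% CI {lower:.2f} to {upper:.2f}, I2 = {i2:.0f}%")
print("Diamond crosses 1" if lower <= 1 <= upper else "Diamond excludes 1")
```

In this sketch the diamond corresponds to the pooled estimate and its confidence interval, and two studies pointing in opposite directions with narrow intervals would push I² towards 100%.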

PsychStar’s Paper B question bank includes dedicated critical review questions with full teaching cascades covering statistics, study design, and bias identification. Start with 5 free questions at psychstar.io/try.
