Objective To evaluate the comparative effectiveness and safety of analgesic medicines for acute non-specific low back pain.
Design Systematic review and network meta-analysis.
Data sources Medline, PubMed, Embase, CINAHL, CENTRAL, ClinicalTrials.gov, clinicaltrialsregister.eu, and World Health Organization’s International Clinical Trials Registry Platform from database inception to 20 February 2022.
Eligibility criteria for study selection Randomised controlled trials of analgesic medicines (eg, non-steroidal anti-inflammatory drugs, paracetamol, opioids, anti-convulsant drugs, skeletal muscle relaxants, or corticosteroids) compared with another analgesic medicine, placebo, or no treatment. Participants were adults (≥18 years) who reported acute non-specific low back pain (for less than six weeks).
Data extraction and synthesis Primary outcomes were low back pain intensity (0-100 scale) at end of treatment and safety (number of participants who reported any adverse event during treatment). Secondary outcomes were low back specific function, serious adverse events, and discontinuation from treatment. Two reviewers independently identified studies, extracted data, and assessed risk of bias. A random effects network meta-analysis was done and confidence was evaluated by the Confidence in Network Meta-Analysis method.
Results 98 randomised controlled trials (15 134 participants, 49% women) included 69 different medicines or combinations. Low or very low confidence was noted in evidence for reduced pain intensity after treatment with tolperisone (mean difference −26.1 (95% confidence interval −34.0 to −18.2)), aceclofenac plus tizanidine (−26.1 (−38.5 to −13.6)), pregabalin (−24.7 (−34.6 to −14.7)), and 14 other medicines compared with placebo. Low or very low confidence was noted for no difference between the effects of several of these medicines. Moderate to very low confidence was noted for increased adverse events with tramadol (risk ratio 2.6 (95% confidence interval 1.5 to 4.5)), paracetamol plus sustained release tramadol (2.4 (1.5 to 3.8)), baclofen (2.3 (1.5 to 3.4)), and paracetamol plus tramadol (2.1 (1.3 to 3.4)) compared with placebo. Moderate to low confidence evidence suggested that these medicines could increase the risk of adverse events compared with other medicines. Moderate to low confidence was also noted for secondary outcomes and secondary analysis of medicine classes.
Conclusions The comparative effectiveness and safety of analgesic medicines for acute non-specific low back pain are uncertain. Until higher quality randomised controlled trials of head-to-head comparisons are published, clinicians and patients are recommended to take a cautious approach to manage acute non-specific low back pain with analgesic medicines.
Systematic review registration PROSPERO CRD42019145257
Acute low back pain (for less than six weeks’ duration) is a common presentation in primary care.1 Acute non-specific low back pain, in which a pathoanatomical cause of pain cannot be reliably determined, represents more than 90% of these presentations.2 Clinical practice guidelines recommend advice, reassurance, encouragement of physical activity, and self-management of symptoms as first line care.3 Second line care includes non-pharmacological interventions (eg, manual therapy) and analgesic medicines.3456 Surveys about primary care indicate many adults receive an analgesic medicine (48% in the UK and 61% in Australia).78
Clinicians who prescribe medicines for low back pain must choose between medicines with different analgesic properties and safety profiles. Systematic reviews that compared medicines with placebo only partially inform this decision.91011121314151617 A network meta-analysis combines direct and indirect information across a network of randomised clinical trials to estimate the comparative effectiveness of multiple treatments.18 This study type incorporates evidence from placebo controlled trials and trials of comparative effectiveness.19 A previous network meta-analysis compared the effectiveness of classes of analgesic medicines as part of a broader evaluation of pharmacological and non-pharmacological interventions.20 However, no comprehensive evaluation of individual medicines is available to inform clinical decision making for the best medicine for acute non-specific low back pain.2122
Our study used a network meta-analysis to evaluate the comparative effectiveness of analgesic medicines for adults with acute non-specific low back pain.
We followed the Preferred Reporting Items for Systematic reviews and Meta-Analyses-network meta-analysis (PRISMA-NMA) statement for this article.23 This report is part of a larger project (PROSPERO CRD42019145257) evaluating analgesic medicines for low back pain. The published protocol appears in supplement 1,24 and protocol updates are in supplement 2a and 2b.
We included randomised controlled trials of adults (≥18 years) with acute non-specific low back pain.1 We included randomised controlled trials that compared an analgesic medicine with another analgesic medicine, placebo medicine, or no treatment (including continuation of usual care or being placed on a waitlist). We did not restrict our criteria by language or publication status. We excluded randomised controlled trials with enriched enrolment because this method violates the transitivity assumption.242526
We included non-steroidal anti-inflammatory drugs, paracetamol, opioids, anticonvulsants, antidepressants, skeletal muscle relaxants, or corticosteroids from the World Health Organization Anatomical Therapeutic Chemical system (supplement 2c).27 Medicines must have had a license for use in humans in 2021 by the US Food and Drug Administration,28 UK Medicine and Healthcare Products Regulatory Agency,29 European Medicines Agency,30 or Australian Therapeutic Goods Administration.31 We included additional licensed medicines in these classes that were identified during the review process. Medicines must have been administered systemically (eg, oral, intravenous, and intramuscular) as a single drug or combination formulations, at any dose. We excluded non-systemic administrations (eg, topical and epidural). Trials that used non-pharmacological co-interventions were included and were considered in the assessment of transitivity.24
We only included trials that assessed the effects of medicines that had been administered for a minimum of 24 h or, where single administration was used, that measured end of treatment outcomes a minimum of 24 h later. This threshold excluded trials that tested the analgesic effect of medicines on immediate term outcomes only; such trials were typically set in acute emergency care or experimental settings, which differ from primary care.3233
We searched five electronic databases and three clinical trial registers (Medline, PubMed, Embase, CINAHL, the Cochrane Central Register of Controlled Trials, ClinicalTrials.gov, EU Clinical Trials Register, and the World Health Organization’s International Clinical Trial Registry Platform) from database inception until 20 February 2022. Full search strategies appear in supplement 2d. We also searched previous reviews and reference lists of included trials, which returned no additional records.
Two authors (MAW and one of MDJ, MCF, AGC, RRNR, HBL, ADH, or SSh) independently screened records by title and abstract and full text in Covidence.34 Authors were experienced with similar eligibility criteria17353637 and were trained for this review. Discrepancies were resolved through discussion and arbitration from a third author (JHM). If required, the corresponding author of the trial was contacted up to three times to determine record eligibility. All included records underwent linkage to establish unique trials.38
Outcomes and data extraction
Two authors (MAW and one of MDJ, MCF, AGC, RRNR, HBL, ADH, or SSh) independently extracted data from included trials into standardised spreadsheets, with discrepancies resolved through discussion. Authors were experienced with these extraction sheets.17353637
We extracted information on trial characteristics (country, setting, number of trial sites, sample size, duration), participants (diagnosis, duration of low back pain, numbers of men and women, pain intensity at baseline, comorbidities), interventions (medicine, route of administration, duration of intervention, dosage, usage of rescue medication, provision of usual care, co-interventions prescribed by trial investigators), and outcomes.
The primary outcomes were low back pain intensity (0-100 scale, values as integers) at the end of treatment, and safety (number of participants who had any adverse event during the treatment period).39 The end of treatment endpoint accounts for the different treatment durations of medicines. Secondary outcomes were low back specific function (0-100 scale, values as integers), harm (number of participants who had a serious adverse event during the treatment period),3940 and acceptability (number of participants who stopped participation in the trial for any reason before the end of treatment).41
For pain intensity and function, data from continuous self-reported scales were extracted at the time point closest to end of treatment. The hierarchy for extraction of data formats was (1) group mean and standard deviation at end of treatment, (2) group mean change from baseline and standard deviation, and (3) between group differences. Data from studies reporting multiple measures for pain intensity were prioritised as follows: 100 mm visual analogue scale, 10 cm visual analogue scale, 11 point numerical rating scale, rating scale from a composite measure, and ordinal scale.1736 Data from studies that reported multiple measures for function were prioritised similarly: Oswestry Disability Index,42 Roland Morris Disability Questionnaire,43 rating scale from a composite measure, ordinal scale.1736 Data for pain intensity and function were normalised to 0-100 scales before analysis to improve clinical interpretability.91044 Data presented in other forms (eg, median or standard error) were transformed.4546 If measures of variance were not reported and unobtainable, the median standard deviation value from included studies with low risk of bias was imputed (30/100 for pain intensity and 35/100 for function). The number of participants per group who had one or more events was extracted for safety, harm, and acceptability.
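As a rough illustration of the normalisation and transformation steps described above, the sketch below rescales a score and its standard deviation to a 0-100 scale and recovers a standard deviation from a standard error or from a 95% confidence interval of a group mean. The function names are hypothetical and the conversions are the usual large sample formulas; the procedures actually applied in the review (refs 45 and 46) may differ in detail, for example for medians, ranges, or small samples.

```python
import math


def rescale_to_100(score, scale_min, scale_max):
    """Linearly map a score from [scale_min, scale_max] onto a 0-100 scale."""
    return (score - scale_min) / (scale_max - scale_min) * 100


def rescale_sd_to_100(sd, scale_min, scale_max):
    """Standard deviations scale by the same factor but are not shifted."""
    return sd / (scale_max - scale_min) * 100


def sd_from_se(se, n):
    """Standard deviation of a group from the standard error of its mean."""
    return se * math.sqrt(n)


def sd_from_ci95(lower, upper, n):
    """Standard deviation from a 95% confidence interval of a group mean.

    Assumes a normal approximation, so the interval width is 2 * 1.96 * SE.
    """
    return math.sqrt(n) * (upper - lower) / (2 * 1.96)
```

For example, a mean of 6.5 on a 0-10 numerical rating scale corresponds to 65 on the 0-100 scale, and a standard error of 2.0 in a group of 25 participants corresponds to a standard deviation of 10.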
The corresponding author of a trial was contacted up to three times via email to request missing outcomes (eg, mean and standard deviation for pain intensity or function and number of participants who had adverse events) and demographic data (eg, age, sex, baseline pain intensity).
Risk of bias
Two authors (MAW and one of MDJ, MCF, AGC, RRNR, HBL, ADH, or SSh) independently appraised outcome level risk of bias using the Cochrane tool for assessing risk of bias in randomised trials (RoB 2).47 For each outcome, we assessed risk of bias across five domains: randomisation process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result. We visualised risk of bias ratings using the robvis tool.48
Data synthesis and analysis
Evaluation of transitivity
Transitivity, the key assumption for valid estimation of indirect comparisons, was assessed before conducting analyses.184950 The distributions of prespecified effect modifiers were examined across network comparisons: baseline pain intensity (continuous), presence of co-interventions (binary), sample size (continuous),51 whether participants were required to be previously untreated to the test medicine (binary), and medicine dose (binary).24 Dose was classified as within or above the standard dosing range, sourced from the Prescriber’s Digital Reference,52 Monthly Index of Medical Specialties,53 or Australian Medicines Handbook.54 If unavailable, the licensed dosing range was used.
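To illustrate why transitivity matters, consider the simplest indirect comparison, written here in generic notation that is not taken from the review. Suppose medicines A and B have each been compared with placebo P but not with each other, and let \hat{d}_{XY} denote the estimated effect of X relative to Y. The indirect estimate and its variance are then:

```latex
\hat{d}_{AB}^{\,\mathrm{ind}} = \hat{d}_{AP} - \hat{d}_{BP},
\qquad
\operatorname{Var}\!\left(\hat{d}_{AB}^{\,\mathrm{ind}}\right)
  = \operatorname{Var}\!\left(\hat{d}_{AP}\right)
  + \operatorname{Var}\!\left(\hat{d}_{BP}\right)
```

The subtraction is only meaningful if the trials of A versus placebo and the trials of B versus placebo are similar in their distribution of effect modifiers. If, for instance, one set of trials enrolled participants with much higher baseline pain intensity, the difference would mix the effect of the medicines with the effect of the population. The variance expression also assumes that the two sets of trials are independent.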
Measures of effect
We analysed comparisons of between group level mean and standard deviation values for pain intensity and function at end of treatment using mean difference with 95% confidence intervals on a 0-100 scale (values as integers). We also analysed comparisons of between group level event rates for safety, harm, and acceptability by risk ratio with 95% confidence intervals. Effects were considered statistically significant when the 95% confidence interval did not cross the null value (0 for mean differences, 1 for risk ratios). For pain intensity and function, between group differences were considered small if 5-10 points, moderate if more than 10 and up to 20 points, and large if more than 20 points.5556 Confidence in the effect estimates was judged using Confidence in Network Meta-Analysis (CINeMA),5758 which considered six domains: trial level risk of bias, reporting bias, indirectness, imprecision, heterogeneity, and incoherence. Descriptions of how we considered each domain are available in our protocol.24
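The effect measures described above can be sketched for a single two group trial as follows. This is a minimal illustration with hypothetical function names, using standard large sample formulas for the confidence intervals; it is not the review's analysis code, which pooled effects across a network.

```python
import math

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Mean difference (group 1 minus group 2) with a 95% confidence interval."""
    md = m1 - m2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return md, md - Z95 * se, md + Z95 * se


def risk_ratio(events1, n1, events2, n2):
    """Risk ratio (group 1 over group 2) with a 95% interval built on the log scale."""
    rr = (events1 / n1) / (events2 / n2)
    se_log = math.sqrt(1 / events1 - 1 / n1 + 1 / events2 - 1 / n2)
    lower = math.exp(math.log(rr) - Z95 * se_log)
    upper = math.exp(math.log(rr) + Z95 * se_log)
    return rr, lower, upper


def is_significant(lower, upper, null_value):
    """True when the confidence interval does not contain the null value."""
    return not (lower <= null_value <= upper)


def classify_reduction(md):
    """Size labels from the text: small 5-10, moderate >10-20, large >20 points."""
    size = abs(md)
    if size > 20:
        return "large"
    if size > 10:
        return "moderate"
    if size >= 5:
        return "small"
    return "below the small threshold"
```

With the reported tolperisone estimate of −26.1 (−34.0 to −18.2), `is_significant(-34.0, -18.2, 0)` is true and `classify_reduction(-26.1)` returns "large".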
We performed a random effects network meta-analysis using the netmeta package in R, which implements a frequentist method based on a graph theoretical approach, according to electrical network theory.5960 The method follows a two stage approach, in which study effect estimates and their variances are synthesised and weighted by the inverse of their variance. We assumed a common heterogeneity variance across the network for each outcome, which was added to each comparison of the network and estimated via the generalised DerSimonian-Laird method of moments estimator.6162 Dependent observations from trials with more than two groups were accounted for with a back calculation of variances.59 Results from the network meta-analysis were presented as summary relative effect sizes (mean difference or risk ratio) along with 95% confidence intervals, derived assuming a normal distribution of the effects, for each possible pair of treatments. We calculated P scores (the frequentist equivalent of the surface under the cumulative ranking curve (SUCRA)) to measure the extent of certainty that a treatment is better than any other treatment.63 Estimates of heterogeneity and the proportion of variability that was not due to sampling error were calculated for each comparison. Statistics were also calculated for heterogeneity across the network, within designs, and between designs. We evaluated coherence (statistical agreement between direct and indirect treatment effects in closed loops)19 by use of these heterogeneity statistics, complemented by the design by treatment interaction model,6465 the net heat plot,66 and the Separating Indirect from Direct Evidence (node splitting) approach.67 Small trial effects were evaluated using comparison adjusted funnel plots, with reference to placebo (supplement 3).68
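The P score mentioned above can be sketched as follows: for each treatment, it is the average, over all competitors, of the one sided normal probability that the treatment is better. The code is a minimal illustration that assumes the network estimates are already available as a matrix of pairwise differences with standard errors. The function names and the three treatment example (placebo and two hypothetical medicines A and B) are invented for illustration and are not the netmeta implementation or data from the review.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def p_scores(diff, se, smaller_is_better=True):
    """P score for each treatment.

    diff[i][j] is the estimated effect of treatment i minus treatment j and
    se[i][j] is its standard error. Diagonal entries are ignored.
    """
    n = len(diff)
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue
            z = diff[i][j] / se[i][j]
            if smaller_is_better:
                z = -z  # lower pain scores are better, so flip the sign
            total += normal_cdf(z)
        scores.append(total / (n - 1))
    return scores


# Hypothetical pain intensity differences on a 0-100 scale.
treatments = ["placebo", "A", "B"]
diff = [[0, 20, 10],
        [-20, 0, -10],
        [-10, 10, 0]]
se = [[0, 5, 5],
      [5, 0, 5],
      [5, 5, 0]]
scores = p_scores(diff, se)
```

In this example A ranks highest, B sits at 0.5, and placebo ranks lowest. Because each pair of probabilities sums to one, the mean P score across a network is always 0.5, so a P score is a relative ranking and says nothing about the size of or confidence in an effect.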
The nodes for the primary analysis of each outcome were defined at the level of the medicines. Each single drug or combination formulation was a separate node. We considered licensed sustained release formulations as separate nodes to conventional formulations of the same medicine. Different routes of administration for the same medicine (or combination) were merged into the same node. Where trials reported more than one intervention group within the same dosing range, we combined the outcome data.46
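Combining the outcome data of two intervention groups into one can be sketched with the widely used formulas for pooling group sizes, means, and standard deviations. This is a sketch under the assumption that these formulas apply; the text does not spell out the exact procedure beyond its reference (ref 46), and the function names are hypothetical.

```python
import math


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Merge two groups into one, returning (n, mean, standard deviation).

    The combined variance includes the within group variances and a term
    for the difference between the two group means.
    """
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    var = ((n1 - 1) * sd1 ** 2
           + (n2 - 1) * sd2 ** 2
           + (n1 * n2 / n) * (m1 - m2) ** 2) / (n - 1)
    return n, mean, math.sqrt(var)


def combine_events(events1, n1, events2, n2):
    """Merge dichotomous outcomes by summing events and participants."""
    return events1 + events2, n1 + n2
```

The result matches what would be obtained by pooling the individual participant data of the two groups, which can be checked on a toy example: groups with values (1, 3) and (5, 7) combine to a mean of 4 and a variance of 20/3.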
The secondary analysis considered classes of medicines as separate nodes in the network for each outcome. Medicines were combined into classes based on expertise of the author team, clinical guidelines, and previous reviews91011121314151617 (supplement 2a).
Prespecified sensitivity analyses of the primary outcomes (pain intensity and safety) assessed the effect of removing trials with overall high risk of bias, removing medicines with dosages above the standard or licensed dosing range, removing groups with baseline pain intensity above 70/100, removing trials with total sample sizes of fewer than 50 participants, and removing trials where data were imputed. These analyses were done where the network structure remained the same as the primary analysis. We also conducted a post hoc sensitivity analysis in which we removed two trials that were published in predatory journals, with concerns for research integrity (supplement 2b). We were asked during peer review to perform a post hoc sensitivity analysis on industry sponsorship.
Patient and public involvement
This study did not involve any patient representatives or members of the public in a formal capacity. As a result of limited funding, we were not able to engage with consumer groups and the review protocol was drafted before the involvement of patients and the public in reviews became standard practice. The review team provided the results of the review to their clinical colleagues and individuals from the general public with whom they had personal relationships. The team sought informal feedback from these individuals based on their experiences with low back pain as either patients or clinicians.
We identified 154 eligible records corresponding to 124 eligible trials. Twenty six trial registrations were noted as terminated, ongoing, or unknown. Therefore, we included 98 randomised controlled trials published between 1964 and 2021 (fig 1). The 1300 records excluded during full text screening are provided in supplement 2e. The 98 included trials (n=15 134 participants) evaluated 70 unique interventions (69 medicines or combinations, and placebo; supplement 2f). No trials included a no treatment group.
Participant characteristics (table 1) reflected typical acute non-specific low back pain populations: 49% women, mean age mostly between 30 and 60 years, low back pain duration ranged from 24 h to 21 days, and median pain intensity at baseline of 65/100 (interquartile range 57-72) across included trials. Thirty eight (39%) of 98 trials were placebo controlled, 66 trials (67%) masked both participants and clinicians, and 40 trials (41%) reported industry sponsorship. Analyses on industry sponsorship are reported in supplement 2. Characteristics about participants, interventions, and outcomes are available in supplement 2g and 2h. Characteristics about trial registrations noted as terminated, ongoing, or unknown are available in supplement 2i.
Forty two medicines were administered as a monotherapy and 27 as combinations (supplement 2f). Treatment duration ranged from one day (single administration) to 42 days. Eighty (82%) of 98 trials administered medicines orally, and 168 (98%) of 172 medicines were administered within a standard or licensed dosing range (table 1). Two trials6970 reported two or more intervention groups within the same dosing range that we combined.
Assessment of transitivity and incoherence
A comprehensive assessment of transitivity was limited by the small number of trials per comparison (supplement 2j). During the evaluation of network diagnostics, four trials were identified that had methodological discrepancies inconsistent with the network (two based on incoherence within the network and two based on heterogeneity within treatment comparisons) and were removed from all analyses (supplement 2k). We then re-evaluated the diagnostics for the updated models and agreed to proceed to interpreting treatment estimates. However, some comparisons show evidence of unexplained incoherence and, therefore, should be interpreted with caution. Important examples are the network meta-analysis effects for the outcome pain intensity for the comparisons ibuprofen versus placebo and paracetamol versus placebo, in which discrepancies between direct and indirect evidence resulted in a P<0.10 for the Separating Indirect from Direct Evidence approach. The full network diagnostics for the updated models are presented in supplement 3. Any remaining concerns about network heterogeneity and incoherence were addressed via downgrading confidence in estimates. A summary of confidence in effect estimates is provided in supplement 2l. Common reasons for downgrading confidence in estimates were imprecision, heterogeneity, and risk of bias. League tables with estimates and confidence for all comparisons are provided in the supplement and spreadsheets are available on the Open Science Framework.
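The comparison of direct and indirect evidence behind the P<0.10 flag can be sketched as a simple z test on the difference between the two estimates. This is a minimal illustration that assumes the direct and indirect estimates are independent and on the same scale; the function name and the numbers in the usage note are hypothetical, and the full network diagnostics in supplement 3 are more involved.

```python
import math


def direct_vs_indirect(direct, se_direct, indirect, se_indirect):
    """Return (difference, z, two sided P value) for direct minus indirect."""
    diff = direct - indirect
    se_diff = math.sqrt(se_direct ** 2 + se_indirect ** 2)
    z = diff / se_diff
    # Two sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return diff, z, p


def flag_incoherence(p, threshold=0.10):
    """True when the disagreement meets the P<0.10 criterion used in the text."""
    return p < threshold
```

For instance, a hypothetical direct mean difference of −5 (standard error 3) and indirect mean difference of −15 (standard error 4) give a difference of 10, z=2.0, and a P value of about 0.046, which would be flagged.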
Primary analysis: nodes as medicines
Pain intensity was measured in all 98 trials. Three trials measured pain intensity only during movement. Pain intensity was measured with a 100 mm visual analogue scale (23 trials), a 10 cm visual analogue scale (16 trials), an 11 point numerical rating scale (seven trials), or another ordinal scale (24 trials). Data for pain intensity were analysed in 66 (67%) of 98 trials. Ten trials were at low risk of bias, 36 trials had some concerns, and 20 trials were at high risk of bias (supplement 2m). Endpoint data were reported in 50 trials and changes from baseline were reported in 16 trials. Fifteen trials (23%) required standard deviation imputation. Pain intensity data were transformed in 16 trials: 12 used count data, two used 95% confidence interval for group mean, one used median, one used range. The 66 trials did not form a connected network (fig 2, fig 3). The placebo network compared 39 interventions (38 medicines and the central node of placebo) in 54 trials (fig 2). Most comparisons consisted of a single trial (range one to three), and the network had a limited number of closed loops. Direct evidence was available for 52 (7%) of 741 comparisons. The naproxen network compared 13 medicines in 10 trials (the central node was naproxen) with one trial per comparison (fig 3). Direct evidence was available for 14 (18%) of 78 comparisons. Two trials were not included in either network because they did not connect to any part of either network.
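The comparison counts quoted above follow from the number of nodes in each network: a network with n nodes has n(n−1)/2 possible pairwise comparisons. The short sketch below reproduces that arithmetic; the function names are hypothetical.

```python
from math import comb


def pairwise_comparisons(n_nodes):
    """Number of distinct pairs of nodes in a network."""
    return comb(n_nodes, 2)


def direct_evidence_percent(n_direct, n_nodes):
    """Percentage of possible comparisons that are informed by direct evidence."""
    return 100 * n_direct / pairwise_comparisons(n_nodes)


placebo_network = pairwise_comparisons(39)   # 39 interventions
naproxen_network = pairwise_comparisons(13)  # 13 medicines
```

This gives 741 comparisons for the placebo network and 78 for the naproxen network, with direct evidence for about 7% (52 of 741) and 18% (14 of 78) of them respectively, so most network estimates rest on indirect evidence alone.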
Data were of very low confidence in 648 (87%) of 741 comparisons and of low confidence in 93 (13%) of 741 comparisons in the placebo network (supplement 2l). Tolperisone (mean difference −26.1 (95% confidence interval −34.0 to −18.2), low confidence), aceclofenac plus tizanidine (−26.1 (−38.5 to −13.6), very low confidence), and pregabalin (−24.7 (−34.6 to −14.7), low confidence) might be associated with the largest reductions in pain intensity compared with placebo (fig 4). Additionally, for statistically significant reductions, very low confidence was reported for large reductions (mean difference of >20 points) for four medicines, moderate reductions (>10-20 points) for seven medicines, and small reductions (5-10 points) for three medicines (fig 4). The estimates of comparative effectiveness and rankogram are in supplement 2n. No significant differences were noted between all medicines with large reductions in pain intensity compared with placebo, with data of low or very low confidence. Similarly, low or very low confidence in evidence was reported for no significant differences between the medicines with large reductions in pain intensity and some medicines with moderate reduction in pain intensity compared with placebo. Some significant differences between medicines were noted; for example, low confidence data suggested that tolperisone is superior to carisoprodol at reducing pain intensity (mean difference −13.7 (−24.9 to −2.5)).
Confidence could not be evaluated for the naproxen network because of the small number of trials. Six medicines might be associated with a statistically significant reduction in pain intensity compared with naproxen (supplement 2o). The estimates of comparative effectiveness and rankogram are in supplement 2p. Sensitivity and post hoc analyses for pain intensity with nodes as medicines are reported in supplement 2q.
Ninety two trials reported measuring safety, but only 68 trials (74%) were analysed for the number of participants who reported an adverse event. The primary reasons for data unavailability were reports of only numbers of adverse events, rather than number of participants, or no data for the subset of participants with acute non-specific low back pain. Nine trials were at low risk of bias, 41 trials had some concerns, and 18 trials were at high risk of bias (supplement 2r). One network compared 55 interventions (54 medicines and placebo) in 66 trials (fig 5), and two trials did not connect to the network. All comparisons in the network consisted of one or two trials and the number of closed loops was small. Direct evidence was available for 70 (4.7%) of 1485 comparisons. Effect estimates were analysed as risk ratios.
Comparisons were of very low confidence in 34 (2%) of 1485, low confidence in 1274 (86%) of 1485, moderate confidence in 168 (11%) of 1485, and high confidence in nine (1%) of 1485 (supplement 2l). Tramadol (risk ratio 2.6 (95% confidence interval 1.5 to 4.5), moderate confidence), paracetamol plus sustained release tramadol (2.4 (1.5 to 3.8), moderate confidence), baclofen (2.3 (1.5 to 3.4), low confidence), and paracetamol plus tramadol (2.1 (1.3 to 3.4), moderate confidence) might be associated with increased adverse events during treatment compared with placebo (fig 6). The estimates of comparative effectiveness and rankogram are provided in supplement 2s. Data had high to very low confidence that these four medicines were also more likely to increase adverse events compared with other medicines. For example, moderate confidence data suggested that tolperisone was associated with fewer adverse events than tramadol (0.2 (0.1 to 0.7)) and high confidence data suggested that paracetamol was associated with fewer adverse events than paracetamol plus sustained release tramadol (0.4 (0.2 to 0.6)).
The quality of adverse event measurement and reporting varied across trials. Generally, trials did not distinguish between an adverse event (an untoward medical occurrence) and an adverse effect (an untoward medical occurrence judged as related to treatment). Brief descriptions of adverse events reported in each trial are available in supplement 2h. Most commonly reported adverse events were related to the gastrointestinal system (nausea, dyspepsia, vomiting, diarrhoea) and the nervous system (drowsiness, dizziness, headache). Sensitivity and post hoc analyses for safety with nodes as medicines are reported in supplement 2t.
Secondary analysis: medicine classes
The secondary analysis of medicine classes included 65 trials (n=1107 participants; 33 trials that only compared medicines within the same class, primarily non-selective non-steroidal anti-inflammatory drugs, were excluded). A list of the 22 interventions (21 different classes or combinations, and placebo) is available in supplement 2u.
Pain intensity was analysed in 45 trials. One network compared 16 interventions (15 classes and placebo) in 44 trials, and one trial did not connect to the network (supplement 2v). Direct evidence was available for 22 (18%) of 120 comparisons. Of the 120 comparisons, evidence was of very low confidence in 114 (95%), low confidence in five (4%), and moderate confidence in one (1%) (supplement 2l).
Anticonvulsants (mean difference −18.6 (95% confidence interval −30.1 to −7.1), very low confidence), non-benzodiazepine antispasmodic (−14.3 (−18.8 to −9.7), very low confidence), non-selective non-steroidal anti-inflammatory drugs plus non-benzodiazepine antispasmodic (−12.7 (−17.9 to −7.5), very low confidence), non-selective non-steroidal anti-inflammatory drugs plus strong opioids plus paracetamol (−13.1 (−25.0 to −1.1), low confidence), non-selective non-steroidal anti-inflammatory drugs plus antispastic (−13.1 (−25.5 to −0.7), low confidence), and non-selective non-steroidal anti-inflammatory drugs plus anticonvulsants (−12.3 (−23.3 to −1.3), very low confidence) might be associated with moderate reductions in pain intensity compared with placebo (fig 7). The estimates of comparative effectiveness and rankogram are in supplement 2w. Very low confidence was shown for no statistically significant differences between any of the medicine classes that reduced pain intensity compared with placebo. Some differences between classes were noted; for example, evidence showed very low confidence that anticonvulsants were superior to weak opioids for reducing pain intensity (−14.5 (−28.7 to −0.4)). Sensitivity and post hoc analyses for pain intensity with nodes as medicine classes are reported in supplement 2x.
Safety was analysed in 46 trials. One network compared 19 interventions (18 classes and placebo) in 45 trials, and one trial did not connect to the network (supplement 2y). Direct evidence was available for 27 (16%) of 171 comparisons. Of 171 comparisons, seven (4%) were of very low confidence, 109 (64%) were of low confidence, 50 (29%) were of moderate confidence, and five (3%) were of high confidence (supplement 2l).
Compared with placebo, increased adverse events during treatment might be associated with antispastic drugs (risk ratio 2.3 (95% confidence interval 1.4 to 3.8), low confidence), weak opioids (1.9 (1.3 to 2.9), moderate confidence), non-selective non-steroidal anti-inflammatory drugs plus strong opioids plus paracetamol (1.9 (1.1 to 3.2), high confidence), weak opioids plus paracetamol (1.9 (1.3 to 2.7), moderate confidence), and non-selective non-steroidal anti-inflammatory drugs plus non-benzodiazepine antispasmodic (1.5 (1.1 to 2.1), moderate confidence) (supplement 2z). The estimates of comparative effectiveness and the rankogram are in supplement 2aa. Findings were of high to very low confidence that these classes were also more likely to increase adverse events compared with the other classes. For example, high confidence evidence suggested that non-selective non-steroidal anti-inflammatory drugs plus strong opioids plus paracetamol were associated with more adverse events than paracetamol (2.2 (1.2 to 4.4)). Sensitivity and post hoc analyses for safety with nodes as medicine classes are reported in supplement 2ab.
We also analysed secondary outcomes with nodes as medicines and nodes as medicine classes. We did not perform sensitivity and post hoc analyses for the secondary outcomes. Results for function are reported in supplement 2ac (nodes as medicines) and supplement 2ad (nodes as medicine classes). Results for acceptability are reported in supplement 2ae (nodes as medicines) and supplement 2af (nodes as medicine classes). Results for harm are reported in supplement 2ag.
Our review of analgesic medicines for acute non-specific low back pain found considerable uncertainty around effects for pain intensity and safety. The findings were of low or very low confidence that several medicines might be associated with large reductions in pain intensity compared with placebo, and some medicines might be more effective than other medicines. Several other medicines might be associated with an increased risk of adverse events compared with placebo, as well as compared with other medicines. In the secondary analysis of medicine classes, low or very low confidence evidence showed that seven classes might be associated with small to moderate reductions in pain intensity compared with placebo, with no statistically significant differences between these classes. However, low confidence evidence showed that two of these classes increased the risk of adverse events compared with placebo.
Implications for clinicians and policy makers
Judgements of low or very low confidence in this review warrant caution for the clinical interpretation of these effects, which might change markedly with future research. Most effects were derived solely from indirect evidence and the findings were not robust to sensitivity analyses, with many effects becoming non-significant after the removal of trials based on different methodological considerations (eg, risk of bias). Similar findings of moderate to large effects for pain intensity but low confidence have been reported for several non-pharmacological interventions used for acute non-specific low back pain: superficial heat, massage, manual therapy, and acupuncture.2 Similar levels of uncertainty were identified in a network meta-analysis published in 2022 of 46 randomised controlled trials that compared pharmacological and non-pharmacological interventions for acute and subacute low back pain.20
Clinical practice guidelines recommend non-pharmacological treatments as first line and second line care for acute non-specific low back pain.3 Given the favourable natural history for most patients,71 we believe that clinicians and patients should take a cautious approach to the use of analgesic medicines. Similarly, policy makers should recommend a cautious approach when considering analgesic medicines, prioritising the minimisation of harm. Another consideration for clinicians and guideline developers is the legal availability of medicines. We included medicines licensed across the UK, Australia, USA, and Europe. This selection might not cover medicines licensed in other countries and does not imply that the same medicines are available everywhere. Our estimates of comparative effectiveness suggest no differences between several medicines that were superior to placebo, meaning clinicians can weigh our findings alongside a medicine’s availability, clinical expertise, and patient preferences when choosing an analgesic medicine.
Strengths and limitations
We believe that this review is the most comprehensive in the field. We preregistered and published the protocol and made our updates transparent. Our comprehensive search included published and unpublished literature in any language. Our rigorous method ensured as much data as possible were included, with scrutiny by an expert team. We closely examined network diagnostics to explore network heterogeneity, inconsistency, and incoherence (steps that are not often adequately undertaken)72 and we attempted to resolve these issues when they arose. However, this study has limitations. Firstly, we aspired to select a sample reflective of acute non-specific low back pain, but patients might differ across clinical settings.32 Secondly, most included studies had concerns related to risk of bias. Thirdly, data were missing and imputation was required for continuous outcomes, despite attempts to contact authors. Fourthly, no network meta-analysis methods can account for the uncertainty of variance estimates (analogous to the Hartung-Knapp approach for pairwise meta-analysis) and we were unable to thoroughly explore the influence of potential effect modifiers (eg, treatment duration, route of administration) because of the limited data and poor network structure. Finally, some trials reported adverse event data in a way that prevented their inclusion in this study. In future trials, we encourage investigators to report the number of participants who had any adverse event, as well as the type and severity of those adverse events.
The evidence base includes many different analgesic medicines or combinations, mostly compared to placebo. Relatively few randomised controlled trials evaluate comparative effectiveness. The structure of this information is not yet optimal to inform clinical decision making and the potential for network meta-analysis to contribute improved estimates of effects was under-realised. Most estimates were derived solely from indirect evidence, a key contributor to the low or very low confidence. Confidence was not substantially improved in the secondary analysis.
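The loss of precision that comes with relying on indirect evidence can be illustrated with a simple adjusted indirect comparison of two medicines through a shared placebo comparator. The sketch below is not the random effects network meta-analysis model used in this review. It is a simplified two-trial illustration, and the function name and all input values are hypothetical rather than taken from the included trials.

```python
import math


def indirect_comparison(d_ap, se_ap, d_bp, se_bp, z=1.96):
    """Adjusted indirect comparison of medicine A v medicine B via placebo P.

    d_ap, d_bp: mean differences v placebo (0-100 pain scale); hypothetical inputs.
    se_ap, se_bp: standard errors of those mean differences.
    Returns the indirect estimate, its standard error, and an approximate 95% CI.
    """
    d_ab = d_ap - d_bp
    # Variances of the two direct estimates add, so the indirect estimate
    # is always less precise than either direct comparison.
    se_ab = math.sqrt(se_ap ** 2 + se_bp ** 2)
    ci = (d_ab - z * se_ab, d_ab + z * se_ab)
    return d_ab, se_ab, ci


# Hypothetical example: A v placebo = -20 (SE 4), B v placebo = -15 (SE 3)
d_ab, se_ab, ci = indirect_comparison(-20.0, 4.0, -15.0, 3.0)
print(f"A v B: {d_ab:.1f} (SE {se_ab:.1f}), 95% CI {ci[0]:.1f} to {ci[1]:.1f}")
```

In this hypothetical case both medicines differ clearly from placebo, yet the indirect comparison between them has a wider interval that crosses zero, which is one reason estimates based solely on indirect evidence tend to attract lower confidence.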
Other aspects of trial conduct might be improved in future work. Key limitations were moderate to high risk of bias and missing data, which have established influences on effect estimates.51 Analgesic medicines with larger effect sizes came from trials with lower methodological quality. Similarly, wide confidence intervals often arose from smaller studies. This uncertainty is propagated when networks make many comparisons via indirect evidence only. Concerns exist about research integrity and the large decrease in pain intensity from pregabalin was no longer apparent in sensitivity analyses. Synthesis of the trials at low risk of bias was not possible with conventional methods for network meta-analysis because they did not form a connected network. Our review, together with established methods for future trial design,73 74 might be an important guiding contribution to further research. We identified 10 ongoing trials that could contribute additional data to future updates of this study, described briefly in supplement 2i. No further reviews are needed until high quality randomised controlled trials are published.
Despite nearly 60 years of research involving more than 15 000 patients, high quality evidence to guide clinical decisions on analgesic medicines for acute non-specific low back pain remains limited. Similarly, evidence from the secondary analysis of medicine classes had low confidence. Clinicians and patients are advised to take a cautious approach to the use of analgesic medicines. No further reviews are needed until high quality studies are published.
What is already known on this topic
Analgesic medicines are a common treatment for acute non-specific low back pain
Previous reviews have evaluated analgesic medicines compared with placebo, but the evidence for the comparative effectiveness of these medicines is limited
What this study adds
Low or very low confidence evidence suggests that some analgesic medicines might be superior for reducing pain intensity, limited by trial risk of bias and imprecision in effect estimates
Evidence of moderate to very low confidence suggests that some analgesic medicines might increase the risk of adverse events during treatment
Clinicians and patients are advised to take a cautious approach to managing acute non-specific low back pain with analgesic medicines until higher quality trials of head-to-head comparisons are published