Year : 2018  |  Volume : 131  |  Issue : 2  |  Page : 247-248

Reliability and Accuracy of Sepsis-related Organ Failure Assessment Scoring among Emergency Physicians

Emergency Department, Beijing Chao-Yang Hospital Affiliated to Capital Medical University, Beijing 100020, China

Date of Submission: 29-Sep-2017
Date of Web Publication: 08-Jan-2018

Correspondence Address:
Prof. Shu-Bin Guo
Emergency Department, Beijing Chao-Yang Hospital Affiliated to Capital Medical University, Beijing 100020

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/0366-6999.222338


How to cite this article:
Chen YX, Li YX, Guo SB, Mei X. Reliability and Accuracy of Sepsis-related Organ Failure Assessment Scoring among Emergency Physicians. Chin Med J 2018;131:247-8

How to cite this URL:
Chen YX, Li YX, Guo SB, Mei X. Reliability and Accuracy of Sepsis-related Organ Failure Assessment Scoring among Emergency Physicians. Chin Med J [serial online] 2018 [cited 2018 Oct 23];131:247-8. Available from: http://www.cmj.org/text.asp?2018/131/2/247/222338

To the Editor: With the growing importance of Sepsis-related Organ Failure Assessment (SOFA),[1] clinicians are increasingly concerned about its reliability and accuracy. In Intensive Care Units, SOFA scoring was found to be insufficiently accurate among both physicians and nurses.[2],[3] Few studies have reported the performance of emergency department (ED) physicians in manually assigning the SOFA score. We therefore conducted a single-center prospective investigation to examine that issue.

The study was conducted in May 2017 in the ED of Beijing Chao-Yang Hospital affiliated to Capital Medical University, Beijing, China. This investigation assessed the performance of ED physicians in SOFA scoring using hypothetical cases after a training program. The study did not involve actual patients, and there were no ethical considerations.

Thirty cases were developed by one of the authors (Yun-Xia Chen). The cases covered the most common diseases encountered in the ED, including sepsis, shock, stroke, severe hemorrhage, acute heart failure, acute coronary syndrome, and acute pancreatitis. The mean age of the hypothetical patients was 56 ± 17 years, and 73% were male. The mean SOFA and Acute Physiology and Chronic Health Evaluation (APACHE) II scores were 9 ± 3 and 22 ± 9, respectively. The 30 cases were scored by two of the authors with extensive experience in SOFA scoring (Yun-Xia Chen and Yi-Xian Li). Their raw values for the 12 variables, scores for the six organ systems, and total SOFA scores were defined as the gold standard. Differences between the two authors' results were resolved by discussion to reach consensus on the standard.

For the participants, the raw values and scores of all the 12 variables, scores of all six organ systems, total SOFA score, and errors in scoring were recorded. A major error was defined as the total SOFA score deviating 2 or more points from the gold standard.
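The error definition above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the function name and the labels "accurate" and "minor error" are hypothetical, and treating "accurate" as an exact match with the gold standard is an assumption, since the letter defines only the major-error threshold.

```python
def classify_total_score(participant_total, gold_total, major_threshold=2):
    """Label one total SOFA score against the gold standard.

    Assumptions (not stated in the letter): "accurate" means an exact
    match, and any deviation below the major threshold is called "minor".
    """
    deviation = participant_total - gold_total
    if deviation == 0:
        return "accurate"
    # The letter's definition: 2 or more points away, in either direction.
    if abs(deviation) >= major_threshold:
        return "major error"
    return "minor error"
```

For example, a participant total of 7 against a gold standard of 9 would be labeled a major error, while 10 against 9 would not.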

We analyzed all the data using SPSS version 18.0 (SPSS Inc., Chicago, IL, USA). Normally distributed data were expressed as means and standard deviations (SDs). We estimated the reliability of the total SOFA score by calculating the intraclass correlation coefficient (ICC) and its 95% confidence interval (CI) based on a single-measure, one-way random-effects model.[4] To evaluate the accuracy of the total SOFA score, agreement between participants' scores and the gold standard was assessed using Bland–Altman plots. It is recommended that 95% of data points lie within ±1.96 SD of the mean difference.
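For readers unfamiliar with the single-measure, one-way random-effects ICC, the following is a minimal sketch of the standard ICC(1,1) calculation from one-way ANOVA mean squares. The analysis in the letter was run in SPSS; this function, its name, and its input layout are illustrative assumptions, and the 95% CI (which requires F-distribution quantiles) is left out.

```python
def icc_one_way_single(ratings):
    """ICC(1,1): single-measure, one-way random-effects model.

    ratings: list of rows, one row per case; each row holds the k total
    scores given to that case by different raters (same k in every row).
    Hypothetical helper, not the authors' SPSS procedure.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 cases, each with the same k >= 2 ratings")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    case_means = [sum(row) / k for row in ratings]

    # Between-case and within-case sums of squares from one-way ANOVA.
    ss_between = k * sum((m - grand_mean) ** 2 for m in case_means)
    ss_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, case_means) for x in row
    )

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

With perfect agreement between raters the within-case mean square is zero and the function returns 1; the more raters disagree on the same case relative to the spread between cases, the lower the value.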

In all, 40 ED physicians participated in the training and investigation, including 18 residents (45%) and 22 specialists (55%). The mean age of participants was 37 ± 6 years, the mean working experience was 11 ± 8 years, and 77.5% were male. Of the participants, 80% had experience with APACHE II scoring and 37.5% with SOFA scoring.

The ICC of the total SOFA score was 0.873 (95% CI, 0.805–0.928). The accuracy of the raw data value was moderate (72–84%) for PaO2/FiO2, respiratory support, Glasgow Coma Scale (GCS) score, and urine output (UO); it was excellent (>90%) for all other variables. The accuracy in assigning scores for the individual variables was good (80–82%) for the GCS score and UO and excellent (≥94%) for the others. The accuracy of scores for the individual organ systems was good (80–88%) for respiration, central nervous system, and renal systems and excellent (≥90%) for coagulation, liver, and cardiovascular systems. The accuracy of the total SOFA score was 54%, and there was no difference between residents and specialists (53% vs. 55%, P = 0.785).

[Figure 1] shows the Bland–Altman plots of the total SOFA scores for the gold standard and participants. The mean bias between the participants' and the gold standard total SOFA scores was 0.480 ± 1.899 (95% CI, –3.2 to 4.2). Of all the total SOFA scores, 243 (97%) were within the 95% CI of the mean difference [Figure 1]a and [Figure 1]b. Bias occurred in both the low and high values of the total SOFA score [Figure 1]b.
Figure 1: Bland–Altman plot of the participants versus the gold standard resulted in a mean difference of 0.480 ± 1.899 (95% confidence interval, −3.2 to 4.2; n = 40): (a) distribution of differences according to case identification number; (b) distribution of differences according to the gold standard of the total SOFA score. Note: Plotted values frequently represent more than one case. SOFA: Sepsis-related Organ Failure Assessment; SD: Standard deviation.
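The reported interval can be reproduced from the mean difference and its SD: 0.480 ± 1.96 × 1.899 gives roughly −3.24 to 4.20, which matches the −3.2 to 4.2 stated above. In Bland–Altman analyses this mean ± 1.96 SD range is usually called the limits of agreement. The sketch below shows the arithmetic; the function names are hypothetical, and the use of the sample SD (n − 1 denominator) is an assumption, since the letter does not say which SD was used.

```python
from statistics import mean, stdev


def interval_from_summary(bias, sd, z=1.96):
    """Return (lower, upper) = bias -/+ z * sd."""
    return bias - z * sd, bias + z * sd


def bland_altman_summary(participant_totals, gold_totals):
    """Summarize paired total scores (participant minus gold standard).

    Assumes the sample SD (n - 1 denominator); illustrative only.
    """
    diffs = [p - g for p, g in zip(participant_totals, gold_totals)]
    bias = mean(diffs)
    sd = stdev(diffs)
    lower, upper = interval_from_summary(bias, sd)
    # Share of differences that fall inside the interval.
    within = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return {"bias": bias, "sd": sd, "lower": lower,
            "upper": upper, "fraction_within": within}


# Plugging in the summary values reported in the letter:
low, high = interval_from_summary(0.480, 1.899)
```

Rounded to one decimal place, `low` and `high` agree with the interval given in the text and the figure legend.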


In total, 258 errors were found. The three most frequent errors were not using the worst value for 24 h, incorrect GCS score, and extrapolating UO per 24 h. Infrequent errors included calculation errors, using incorrect variables, failure to consider all variables in a single-organ system, and clerical errors. There were 51 (21%) major errors with the total SOFA score.

This investigation found that for the total SOFA score, reliability was good but the accuracy was not ideal. Human factors are the main influence on accuracy in SOFA scoring. Several studies also found good reliability, with the ICC ranging from 0.870 to 0.889, and poor accuracy (48–53%) with the total SOFA score.[2],[3],[5]

In our investigation, the ICC of the total SOFA score was 0.873, and 97% of the differences were within the 95% CI of the mean difference. Statistically, this result indicates good reliability of the total SOFA score among our participants. However, we also observed that the 95% CI was relatively broad, ranging from –3.2 to 4.2. Given this breadth and the low accuracy of the total SOFA score (54%), we considered that these variations would not significantly impact the prognostic value of SOFA but would substantially decrease its accuracy in diagnosing sepsis. Because sepsis is defined by an acute increase of 2 or more points in the SOFA score, variation in the total SOFA score between raters should be smaller than 2 points. In our study, the major error rate in the total SOFA score was 21%, which includes scores both greater and less than the gold standard. If that situation occurred in clinical practice, overdiagnosis and misdiagnosis of sepsis would be the consequence, which could result in unfavorable outcomes. As a diagnostic criterion, SOFA scoring should be highly precise. Systematic, rigorous, repeated training is the conventional means of improving the accuracy of SOFA scoring. Automated SOFA scoring is attractive because of its advantages in time saving, reliability, accuracy, and frequency.

Financial support and sponsorship

None.


Conflicts of interest

There are no conflicts of interest.

References

1. Vincent JL, Moreno R, Takala J, Willatts S, De Mendonça A, Bruining H, et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the working group on sepsis-related problems of the European Society of Intensive Care Medicine. Intensive Care Med 1996;22:707-10. doi: 10.1007/s001340050156.
2. Arts DG, de Keizer NF, Vroom MB, de Jonge E. Reliability and accuracy of Sequential Organ Failure Assessment (SOFA) scoring. Crit Care Med 2005;33:1988-93. doi: 10.1097/01.CCM.0000178178.02574.AB.
3. Baykara N, Gökduman K, Hoşten T, Solak M, Toker K. Comparison of Sequential Organ Failure Assessment (SOFA) scoring between nurses and residents. J Anesth 2011;25:839-44. doi: 10.1007/s00540-011-1232-2.
4. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 2016;15:155-63. doi: 10.1016/j.jcm.2016.02.012.
5. Tallgren M, Bäcklund M, Hynninen M. Accuracy of Sequential Organ Failure Assessment (SOFA) scoring in clinical practice. Acta Anaesthesiol Scand 2009;53:39-45. doi: 10.1111/j.1399-6576.2008.01825.x.

