Single-Case Analysis and Review Framework (SCARF):
Coding Instructions


Overview

The SCARF spreadsheet is available for download; see the Additional Documents list at the end of this page.

The SCARF is intended as a tool to assess the quality and outcomes of single case design studies. For the purposes of the tool, “study” refers to any single design, which may include a single participant (e.g., an A-B-A-B design) or multiple participants (e.g., a multiple baseline design across participants). Each design should be evaluated separately, even if multiple designs are present in a single article.

Note: This tool is designed for assessing groups of articles to answer the question: To what extent are studies sufficient, and to what extent are outcomes consistent and replicated, for Intervention X for changing Behavior Y for Participants with Z inclusion characteristics? Studies should be included or excluded based on your research questions.

Section 1 (Rigor): The first three components (which are the most heavily weighted) concern the believability and sufficiency of the dependent variables (reliability), procedures (fidelity), and data.

Section 2 (Quality and Breadth of Measurement): The next seven components rate the author descriptions necessary for replication (participant, condition, and dependent variable descriptions), the presence of social and ecological validity indicators, and the measurement of maintenance and generalization (response or stimulus generalization).

Section 3 (Outcomes): The final three coding components address primary outcomes, generalization outcomes, and maintenance outcomes.

Scoring

Scores (automatically populated and shown in graphs) are calculated based on the following formulas:

Graph #1: Primary Outcomes

  • Overall Quality & Rigor (range 0-4) = [2 × (average rigor score) + (average quality and breadth of measurement score)] / 3 (see the worked sketch after this list)
  • Primary Outcomes (range 0-4) = Score coded by reader, based on visual analysis
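
As a worked illustration of the Overall Quality & Rigor formula (the spreadsheet computes this automatically from your coding), here is a minimal Python sketch using hypothetical item scores; the function and variable names are ours and are not part of the SCARF spreadsheet.

```python
# Worked example of the Overall Quality & Rigor formula:
# [2 x (average rigor score) + (average quality and breadth score)] / 3
# The scores below are hypothetical values on the SCARF's 0-4 scale.

def overall_quality_and_rigor(rigor_scores, quality_breadth_scores):
    """Weighted average giving rigor twice the weight of quality/breadth."""
    avg_rigor = sum(rigor_scores) / len(rigor_scores)
    avg_quality = sum(quality_breadth_scores) / len(quality_breadth_scores)
    return (2 * avg_rigor + avg_quality) / 3

# Three rigor items each coded 3 and seven quality/breadth items each coded 2
# yield (2 x 3 + 2) / 3, or approximately 2.67.
print(round(overall_quality_and_rigor([3, 3, 3], [2, 2, 2, 2, 2, 2, 2]), 2))
```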

Graphed scores generated from hypothetical coding results for primary outcomes:

[Figure: SCARF Graph 1]


Graph #2: Generalized Outcomes

  • Quality & Rigor of Generalization Measurement (range 0-4) = Higher of the generalization scores coded by the reader (max of SG4, RG3), based on the timing and consistency of response and/or stimulus generalization measurement (see the sketch after this list)
  • Generalized Outcomes (range 0-4) = Score coded by reader, based on the consistency of generalized effects and the confidence in those effects given the measurement occasions (e.g., 3 coded for consistent positive effects with pre- and post-tests; 4 coded for consistent positive effects in the context of a single case design).
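
If we read “max of SG4, RG3” above as taking the higher of the reader-coded stimulus generalization (SG) and response generalization (RG) measurement scores (our interpretation, not spelled out in this overview), the calculation reduces to a simple maximum. A minimal sketch with hypothetical values:

```python
# Hypothetical reader-coded generalization measurement scores (0-4 scale).
# Our interpretation: SG = stimulus generalization item, RG = response
# generalization item; the variable names are ours.
sg_score = 2  # e.g., stimulus generalization measured only at pre/post
rg_score = 3  # e.g., response generalization measured repeatedly

# Quality & Rigor of Generalization Measurement = higher of the two scores.
generalization_quality = max(sg_score, rg_score)
print(generalization_quality)  # 3
```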

Graphed scores generated from hypothetical coding results for generalized outcomes:

[Figure: SCARF Graph 2]


Graph #3: Maintained Outcomes

  • Quality & Rigor of Maintenance Measurement (range 0-4) = Score coded by reader (M3), based on the timing of maintenance measurement and the number of measurement occasions
  • Maintained Outcomes (range 0-4) = Score coded by reader, based on the consistency of maintained effects and the confidence in those effects given the measurement occasions

Graphed scores generated from hypothetical coding results for maintained outcomes:

[Figure: SCARF Graph 3]

 

Data Analysis

Now that you have completed the spreadsheet for all studies, the primary, generalization, and maintenance graphs on the “Scores” tab should auto-populate. Each data point on a given graph corresponds to one design within a study. When analyzing your graphs, consider the following: the farther along the x-axis a data point is, the higher the quality and rigor of that portion of the study (i.e., primary, generalization, or maintenance measurement); the farther along the y-axis a data point is, the stronger the positive effects of the intervention. For an example, see how we analyzed the results of our hypothetical data below.
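
To make the quadrant-based interpretation used in the example analyses below concrete, here is a minimal Python sketch that classifies one design's (quality/rigor, outcome) pair into the four colored regions. The cutoff of 2 (the midpoint of the 0-4 scale) and the function name are our assumptions for illustration only; the SCARF graphs themselves are interpreted visually, and points falling on or near a boundary warrant the kind of descriptive judgment shown below.

```python
# Illustrative quadrant classification for one design's scores.
# The cutoff at 2 (midpoint of the 0-4 scale) is an assumption of this sketch,
# not a rule specified by the SCARF.

def quadrant(quality_rigor, outcome, cutoff=2.0):
    """Label a (quality/rigor, outcome) pair, each on a 0-4 scale."""
    high_quality = quality_rigor > cutoff  # farther right on the x-axis
    positive_effects = outcome > cutoff    # higher up on the y-axis
    if high_quality and positive_effects:
        return "green: high quality evidence of positive effects"
    if high_quality:
        return "blue: high quality evidence of minimal or negative effects"
    if positive_effects:
        return "orange: low quality evidence of positive effects"
    return "red: low quality evidence of minimal or negative effects"

# Example: a design with overall quality/rigor of 3.2 and a primary outcome
# score of 3 falls in the green quadrant.
print(quadrant(3.2, 3))
```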

[Figure: EBIP SCARF Overall Graph]

These are the results for primary data measurement with hypothetical data. The green quadrant indicates studies with high quality evidence of positive effects (N = 3). The blue quadrant indicates studies with high quality evidence of minimal or negative effects (N = 1). The orange quadrant indicates studies with low quality evidence of positive effects (N = 4). The red quadrant indicates studies with low quality evidence of minimal or negative effects (N = 2). Given the data in the green and blue quadrants, we have some confidence that positive effects exist, but possibly with some limitations (given the score of “2” in one study). Because the remaining six studies had low quality and rigor, their results should be accepted with minimal confidence. It is interesting to note the relative linearity of scores, such that the higher quality studies had more consistent positive results. These results suggest additional high quality studies are needed.

[Figure: EBIP SCARF Generalization Graph]

These are the results for generalization measurement with hypothetical data. The green quadrant indicates studies with high quality evidence of positive effects (N = 0). The blue quadrant indicates studies with high quality evidence of minimal or negative effects (N = 0). The orange quadrant indicates studies with low quality evidence of positive effects (N = 3). The red quadrant indicates studies with low quality evidence of minimal or negative effects (N = 5). There were also two studies whose data points showed evidence of minimal or negative effects with moderate quality of measurement. Given the lack of data points in the green and blue quadrants, there were no studies with high enough quality and rigor to accept any results with confidence. Thus, even though there was some confidence in results for primary outcomes, more high-quality measurement of generalization is needed.

[Figure: EBIP SCARF Maintenance Graph]

These are the results for maintenance measurement with hypothetical data. The green quadrant indicates studies with high quality evidence of positive effects (N = 4). The blue quadrant indicates studies with high quality evidence of minimal or negative effects (N = 2). The orange quadrant indicates studies with low quality evidence of positive effects (N = 2). The red quadrant indicates studies with low quality evidence of minimal or negative effects (N = 0). Given that most studies showed positive results for maintenance (8/10), including some high quality studies, we can be confident that this intervention generally results in outcomes that are maintained after the intervention is withdrawn. Additional descriptive analysis of any study with poor outcomes might be helpful (e.g., examining what is different about the intervention, contexts, or participants in that study).

Additional Documents:

SCARF Spreadsheet
SCARF Coding Instructions (Printable)

To cite the SCARF (APA 6th Edition):

  • Ledford, J. R., Lane, J. D., Zimmerman, K. N., Chazin, K. T., & Ayres, K. A. (2016, April). Single case analysis and review framework (SCARF). Retrieved from: http://vkc.mc.vanderbilt.edu/ebip/scarf/