How We Rate Programs

August 29, 2025

In 2025, CrimeSolutions changed the way in which programs are rated. Newly rated or re-reviewed programs now are assigned ratings for specific outcomes instead of a single overall rating. This page reflects this new process. See How We Rated Programs Prior to 2025 to review the former process.

To be included on CrimeSolutions, programs undergo a multi-step review and evidence-rating process. Programs in CrimeSolutions now are rated by outcome – a single program may receive several ratings, which may differ, based on the outcomes evaluated [1] and selected for rating.

1. Identify Programs

We identify programs for potential inclusion on CrimeSolutions through:

Literature searches of relevant databases, journals and publications, including:
- Social science databases using keywords in the areas of criminal justice, juvenile justice, and victims of crime.
- Journals and other relevant resources.
- Databases of effective programs.
- Meta-analyses of evaluated programs.
A prioritizing algorithm that assists in screening research to identify potential programs for inclusion.

Historically, for every article reviewed:

8% resulted in an identified program.
3% resulted in a review.
1% resulted in a finding of inconclusive evidence.
2% resulted in a rating.
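
As a rough illustration of what these rates mean in practice, the short sketch below scales them to a hypothetical batch of articles. The function name and the batch size of 1,000 are assumptions made for the example; only the percentages come from the list above.

```python
# Historical per-article rates quoted above, as whole percentages.
RATES_PERCENT = {
    "identified program": 8,
    "review": 3,
    "inconclusive evidence": 1,
    "rating": 2,
}


def expected_counts(articles_reviewed: int) -> dict:
    """Scale each historical rate to a hypothetical number of articles."""
    return {
        stage: articles_reviewed * pct / 100
        for stage, pct in RATES_PERCENT.items()
    }


if __name__ == "__main__":
    # 1,000 is an arbitrary batch size chosen for illustration.
    for stage, count in expected_counts(1000).items():
        print(f"{stage}: about {count:.0f} per 1,000 articles")
```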

2. Screen Programs

Research staff review program materials to determine if the goals of the program fall within the scope of CrimeSolutions. To fall within the scope, the program must aim to:

Prevent or reduce crime, delinquency or related problem behaviors (such as aggression, gang involvement, or school attachment).
Prevent, intervene in, or respond to victimization.
Improve justice systems or processes.

If a prevention program is not explicitly aimed at reducing or preventing a problem behavior, it must target a population of persons committing or convicted of a crime or an at-risk population (that is, individuals who have a higher potential of becoming involved in the justice system).

Historically, for every program identified:

34% resulted in a review.
20% resulted in a rating.
45% were screened out.
20% were put on hold.

See a list of screened-out program evaluations.

3. Search Literature for Program Background Information

Research staff expand the search for evaluations, research, and program materials to identify all relevant information needed for senior researcher and study reviewer consideration. Nonexperimental, qualitative, ethnographic, and case-study research is collected if it adds contextual information to the program description but is not used to determine the program's evidence rating.

4. Screen Studies

Research staff review identified studies against CrimeSolutions minimum requirements for review, which include:

The program must be evaluated with at least one randomized field experiment or quasi-experimental research design (with a comparison condition).
The outcomes assessed must relate to crime, delinquency, or victimization prevention, intervention or response.
At least one behavioral outcome must be assessed.
The evaluation must be published in a peer-reviewed publication or documented in a comprehensive evaluation report.
The date of publication must be 2000 or later.
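
The minimum requirements above can be read as a checklist applied to each study. The sketch below is one hypothetical way to encode that checklist; the `Study` fields and the `unmet_requirements` function are names invented for this example and do not describe how research staff actually record their screening decisions.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # All field names are hypothetical, chosen to mirror the listed requirements.
    design: str                         # "randomized", "quasi-experimental", or other
    has_comparison_condition: bool      # relevant for quasi-experimental designs
    outcomes_in_scope: bool             # crime, delinquency, or victimization related
    has_behavioral_outcome: bool
    peer_reviewed_or_full_report: bool
    publication_year: int


def unmet_requirements(study: Study) -> list:
    """Return the minimum requirements this study fails (empty if it passes)."""
    unmet = []
    if study.design == "quasi-experimental":
        if not study.has_comparison_condition:
            unmet.append("quasi-experimental design lacks a comparison condition")
    elif study.design != "randomized":
        unmet.append("design is neither randomized nor quasi-experimental")
    if not study.outcomes_in_scope:
        unmet.append("outcomes do not relate to crime, delinquency, or victimization")
    if not study.has_behavioral_outcome:
        unmet.append("no behavioral outcome assessed")
    if not study.peer_reviewed_or_full_report:
        unmet.append("not peer reviewed or documented in a comprehensive report")
    if study.publication_year < 2000:
        unmet.append("published before 2000")
    return unmet


if __name__ == "__main__":
    candidate = Study("quasi-experimental", True, True, True, True, 2014)
    problems = unmet_requirements(candidate)
    print("meets minimum requirements" if not problems else problems)
```

Returning the list of unmet requirements, rather than a single yes or no, keeps the reason for a screening decision visible.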

For programs with evidence meeting the minimum criteria, research staff identify the program’s goals before the senior researcher selects the evidence base and outcomes for review.

Based on available resources, we place some identified program studies that have met the minimum standards of evidence for CrimeSolutions “on hold.” See Programs Held for Future Consideration.

5. Select Evidence Base

A senior researcher with subject-matter and research method expertise selects the studies representing the most rigorous study designs and methods from all available evaluations of the program. Although there is no limit on the number of studies that can be used to rate a program’s outcomes, the senior researcher ensures, for each study, that:

Appropriate statistical comparisons between treatment and control groups were conducted in the studies.
Outcomes are reported in a manner that allows for the study reviewers to assess the quality of the results (for example, providing data to calculate an effect size).
Outcomes for the full treatment sample are reported in the studies (studies that provide outcomes only for subgroups are not eligible for review).

In selecting the studies to be used in the review, the senior researcher also considers the following in determining the rigor of the study:

Strength of research design
Validity of outcome measurement
Breadth of documentation
Type of analytic procedures used
Sample size
Independence of evaluator

The selected studies comprise the program's evidence base and will be scored by study reviewers and used as the basis for the evidence ratings of the selected outcomes. Additional studies identified through the literature search, but not included in the evidence base, may serve as supporting documentation.

If multiple articles and publications report on various aspects of a single study, they are generally treated as one study for purposes of the review. However, two studies that use the same data set but include different follow-up periods, statistical analyses, or outcomes may be considered as separate.

6. Select Outcomes

Senior researchers select all relevant outcomes with consideration of the intent or goals of the program. The outcomes must fall within the scope of CrimeSolutions. The criteria used to determine inclusion of outcomes include:

They measure the prevention or reduction of crime, delinquency, or related problem behaviors (such as aggression, gang involvement, or school attachment), which may be presented as measures such as individual behaviors, community-level behaviors, or crime rates.
They measure the prevention of, intervention in, or response to victimization.
They measure improvement in justice systems or processes.
They measure the reduction of risk factors for crime and delinquency, including school failure, psychological problems or mental illness, and so forth.

See CrimeSolutions list of outcomes.

7. Review Program

At least two trained and certified study reviewers use the online Program Scoring Instrument to assess the quality and strength of the evidence and the extent to which it indicates that the program impacts the reviewed outcomes.

The Program Scoring Instrument consists of five domains, which are evaluated at the program, study, outcome, and effect levels:

Conceptual framework (program level)
Program implementation (study level)
Internal validity (study level)
Outcome measures (outcome level)
Effect sizes (effect level)

At the program level, the conceptual framework is assessed once for each program regardless of the number of studies in the evidence base. Study reviewers make this assessment based on information from the study or studies under review and additional program materials (such as nonexperimental, qualitative, ethnographic and case-study research as well as implementation materials).

At the study level, program implementation and internal validity are assessed by the study reviewers for each study that is included as part of the evidence base.

At the outcome level, outcome measures are assessed for each outcome that is included as part of the evidence base.

At the effect level, an effect size is calculated for each outcome. (In Step 9, multiple measures of the same or similar outcome are then aggregated into a single outcome, for which an aggregated effect size is calculated.)
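
This page does not specify which effect size metric or aggregation method the Program Scoring Instrument uses. Purely as an illustration of the general idea, the sketch below assumes a standardized mean difference (Cohen's d) for each measure and a fixed-effect, inverse-variance weighted mean to combine measures of the same outcome. Both function names, the choice of metric, the large-sample variance approximation, and the normal-approximation significance test are assumptions, not the instrument's documented procedure.

```python
import math


def standardized_mean_difference(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Return (d, variance) for one measure; assumed metric is Cohen's d."""
    pooled_sd = math.sqrt(
        ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2)
    )
    d = (mean_t - mean_c) / pooled_sd
    # Common large-sample approximation to the sampling variance of d.
    variance = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return d, variance


def aggregate(effects):
    """Combine (d, variance) pairs with inverse-variance weights.

    Returns the weighted mean, its variance, and a two-sided p-value
    from a normal approximation.
    """
    weights = [1.0 / variance for _, variance in effects]
    total_weight = sum(weights)
    mean = sum(w * d for w, (d, _) in zip(weights, effects)) / total_weight
    variance = 1.0 / total_weight
    z = mean / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return mean, variance, p_value


if __name__ == "__main__":
    # Made-up summary statistics for two measures of a similar outcome.
    measures = [
        standardized_mean_difference(10.0, 12.0, 2.0, 2.0, 50, 50),
        standardized_mean_difference(9.0, 10.0, 2.0, 2.0, 40, 40),
    ]
    mean_d, var_d, p = aggregate(measures)
    print(f"aggregate d = {mean_d:.2f}, p = {p:.4f}")
```

In this sketch a negative d means the treatment group's mean is lower than the comparison group's. Whether that favors the treatment group depends on the outcome: it does for a count of offenses, but not for a measure where higher values are better.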

8. Calculate Research Quality

Based on the information entered by the study reviewers, the Program Scoring Instrument calculates the following items to rate the overall quality of each study:

The conceptual framework.
Fidelity (based on items scored under program implementation).
Methodology (based on items scored under the internal validity, outcome measures, and effect sizes sections).

For a study to be considered high quality, it is necessary to:

Know what the intervention intended to achieve and how it planned to achieve it (conceptual framework).
Confirm that the intervention was implemented as planned (fidelity).
Be reasonably certain that the intervention caused the change (methodology).

9. Calculate Outcome Ratings

The Program Scoring Instrument goes through three steps in calculating outcome ratings.

First, an effect size is calculated for each measure. It indicates the direction of the effect (that is, whether it favors the treatment group or the comparison group) and its statistical significance (that is, whether the effect is unlikely to be due to chance alone).

Second, the effect sizes are aggregated, as necessary, within and across studies to determine one aggregate effect size per outcome.

Third, an outcome is assigned one of four ratings: Effective, Promising, Ineffective, or Negative Effects.

Effective - Studies are very rigorous and well-designed and find significant, positive effects on the outcome. For an outcome to be considered Effective, the following must be true:
- The overall mean effect is positive and significant.
- There are no individual significant negative effects.
- At least 50% of the eligible studies are high quality.
- At least one study received the highest possible quality score.
Promising - Studies are well-designed but slightly less rigorous, or there may be limitations in their fidelity or conceptual framework, and find significant, positive effects on the outcome. For an outcome to be considered Promising, the following must be true:
- The overall mean effect is positive and significant.
- There are no individual significant negative effects.
- Fewer than 50% of the eligible studies are high quality OR no studies received the highest possible quality score.
Negative Effects - Studies are very rigorous and well-designed and find significant, harmful effects on the outcome. For an outcome to be considered Negative Effects, the following must be true:
- The overall mean effect is negative and significant.
- At least 50% of the eligible studies are high quality.
- At least one study received the highest possible quality score.
Ineffective - Studies are very rigorous and well-designed and find no significant effects on the outcome. For an outcome to be considered Ineffective, the following must be true:
- The overall mean effect is not significant.
- At least 50% of the eligible studies are high quality.
- At least one study received the highest possible quality score.
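
The four sets of conditions above can be expressed as a small decision function. The sketch below is a minimal illustration, not the Program Scoring Instrument's actual code: the `OutcomeEvidence` fields and the `rate_outcome` function are hypothetical names, and returning `None` for combinations that none of the four rule sets covers is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutcomeEvidence:
    # Hypothetical summary of one outcome's evidence base.
    mean_effect_direction: int            # +1 positive, -1 negative
    mean_effect_significant: bool
    any_significant_negative_effect: bool
    high_quality_share: float             # share of eligible studies, 0.0 to 1.0
    any_top_quality_score: bool


def rate_outcome(e: OutcomeEvidence) -> Optional[str]:
    """Apply the listed conditions; None means no rule set matched."""
    strong_base = e.high_quality_share >= 0.5 and e.any_top_quality_score

    if e.mean_effect_significant and e.mean_effect_direction > 0:
        if e.any_significant_negative_effect:
            return None  # neither Effective nor Promising allows this
        return "Effective" if strong_base else "Promising"

    if not strong_base:
        return None  # the remaining ratings all require a strong evidence base

    if e.mean_effect_significant and e.mean_effect_direction < 0:
        return "Negative Effects"
    if not e.mean_effect_significant:
        return "Ineffective"
    return None


if __name__ == "__main__":
    example = OutcomeEvidence(
        mean_effect_direction=1,
        mean_effect_significant=True,
        any_significant_negative_effect=False,
        high_quality_share=0.4,
        any_top_quality_score=True,
    )
    print(rate_outcome(example))
```

Written this way, the only difference between Effective and Promising is the strength of the evidence base, which matches the conditions as listed.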

For some outcomes, a rating cannot be assigned. The studies may not provide enough information, or their designs may have limitations too significant, to establish a causal relationship between the program and that outcome. If this is true for all the outcomes considered for a program, that program is added to the inconclusive evidence list.

If there is a discrepancy between the study reviewers for any outcome, the senior researcher works to achieve a consensus classification. If necessary, the senior researcher will also review the study and make a final determination on the classification.

10. Re-Review Program and Update Ratings

We re-review programs for the following reasons:

New evaluation studies, or studies not previously identified, are found that meet the CrimeSolutions criteria. This may include studies that extend the follow-up period of previously reviewed studies.
New supplemental materials are submitted that better explain the conceptual framework and fidelity dimensions of the program, which may affect a program's outcome evidence rating.
A periodic and continuous quality control assessment is conducted to ensure that information, links, and research in program profiles are current, accurate, and consistent with any updates to policies and procedures for the program review.

A re-review may or may not be sufficient to warrant a new evidence rating. If a senior researcher determines that there is sufficient evidence in the new materials to warrant another review, then the new information is sent to the study reviewers for assessment. Even if the program's evidence rating does not change, new evidence, information, and materials may be included or referenced on the program's profile page.

We may also re-review select programs when changes or clarifications are made to the Program Scoring Instrument, the criteria for inclusion of a study, or the guidance given to our reviewers.

If a newly identified study is determined to be more rigorous than a study that was previously reviewed, the program will be re-reviewed to ensure the program rating is based on research that is both the most rigorous and the most up to date.

Re-reviewed programs undergo an expedited review process in which one trained and certified study reviewer uses the online Program Scoring Instrument to assess the quality and strength of the evidence and the extent to which it indicates that the program achieves its goals.

For more information: Inquiring About or Appealing an Evidence Rating

Date Published: August 29, 2025