Glossary

April 3, 2025

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

A

Adherence (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

Adherence is a measure of the degree to which the core program services were implemented as designed (implemented with fidelity). An assessment of adherence is important because many programs that fail to show impacts suffer from a failure to deliver the program as specified (implementation failure). In general, there are three types of implementation failure: 1) no, or not enough, treatment; 2) the wrong treatment; and 3) unstandardized treatment.

Analysis of Variance (ANOVA)

A method for analyzing the differences in the means of two or more groups. Specifically, this procedure partitions the total variation in the dependent variable into two components: between-group variation and within-group variation. It allows researchers to determine if the differences between a control group and a treatment group are attributed to the independent variable or treatment.

Analytic Approach (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

Analytic approach is the systemic and organized process by which a relationship, question, or problem is broken down into the elements necessary to dissect, explain, and solve it. There are various methods for analyzing data, and different academic disciplines have their preferred approaches. However, the data type and structure typically dictate the most suitable analytical methods to ensure accurate, valid, and meaningful results.

Anticipatory Benefits (Evidence Rating Element)

Occurs when the effects of a program are observed prior to the implementation of the program, generally because the target population believes the program has already started. This element was reviewed along with diffusion and displacement on the Overall Rating Program Scoring Instrument. These elements are typically considered in evaluations of community-level crime prevention efforts.

See crime displacement and diffusion of benefits for more information on how this item is rated in the Rate-By-Outcome Program Scoring Instrument.

Assignment Level (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

The level of assignment in a research design refers to the hierarchical level—such as individual, cluster, or larger unit—at which participants are allocated to different conditions or treatments. Understanding the level of assignment is crucial for designing the study and analyzing the data appropriately. In general, subjects are assigned at either the individual or group level.

Assignment Type (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

Assignment type refers to the process of allocating subjects to different groups or conditions within the study. Nonrandom assignment involves assigning participants to different groups or conditions within the study in a manner that is not random. A natural cut point involves using a running variable to determine the assignment. Individuals just below the cutoff are assigned to one group (typically the treatment group). Individuals just above the cutoff are assigned to the other group (typically the control group). A natural experiment takes advantage of a naturally occurring event to study its effect on an outcome of interest. The event creating the different conditions or groups is outside the researcher’s control. Random assignment involves randomly assigning participants to different groups or conditions within the study in such a way that each participant has an equal chance of being placed in any group.

Attrition (Evidence Rating Element from the Overall Rating Program Scoring Instrument and the Rate-By-Outcome Program Scoring Instrument)

The loss of participants during the course of a study, which often occurs because subjects move or they refuse to participate in the study. This may be a threat to the study’s internal validity.

In the Overall Rating Program Scoring Instrument, attrition was considered as one part of the overall assessment on a study's internal validity. In the Rate-By-Outcome Program Scoring Instrument, attrition is assessed based on the calculation of overall and differential attrition.

Overall attrition refers to the combined loss of data for any sample member from either condition. This loss can occur for various reasons, such as participants dropping out, becoming nonresponsive, or being excluded because of noncompliance with study protocols.
Differential attrition refers to the difference in the rate of attrition between the intervention and comparison conditions.

B

Baseline Outcome Differences (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

Baseline outcome differences refer to variations in the starting point (baseline) of a measured outcome between different groups in a study, which can potentially bias the results if not addressed. Outcome equivalence ensures that any observed differences in outcomes after the intervention can be attributed to the intervention itself, rather than to preexisting differences between the groups. The method of assignment plays a crucial role in creating baseline equivalence. However, when the groups are not equivalent at baseline, statistical techniques such as covariate adjustment can be used to control for the baseline differences.

Bivariate Analysis

An analysis of the relationship between two variables, such as correlations and one-way analysis of variance (ANOVA).

C

Causal Evidence

Evidence that documents a relationship between an activity, treatment, or intervention (including technology) and its intended outcomes, including measuring the direction and size of a change, and the extent to which a change may be attributed to the activity or intervention. Causal evidence depends on the use of scientific methods to rule out, to the extent possible, alternative explanations for the documented change. This differs from descriptive evidence.

Chi-Square Test

A statistical test used to compare differences between observed, categorical data and expected data (based on a specific hypothesis) to determine if any difference that occurred is the same as would occur by chance.

Cluster Adjustment (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

A cluster adjustment is required if the level of analysis does not match the level of assignment (e.g., schools are randomized into treatment and comparison conditions, but the student is the unit of analysis) OR the analysis does not account for clustering (e.g., multilevel modeling or adjusting means and SDs for clustering). When data are clustered, the assumption of independence between observations is violated, which can lead to biased estimates and incorrect inferences if not properly addressed.

Coder Reliability (Evidence Rating Element from the Practice Scoring Instrument)

Coder reliability is an assessment of how the authors of the meta-analysis handled reliability of data extraction from the primary research reports. Ideally, two or more coders would extract all pieces of information from each eligible research report and reliability statistics would be used to assess coder reliability and/or consensus would be reached on all items.

Comparative Effectiveness Research (CER)

Comparative effectiveness research (CER): An evaluation approach to show the relative strengths and weaknesses of two (or more) programs on the same outcome. This approach generally compares a target program with another program, instead of a true control condition [i.e., treatment as usual (TAU) or no treatment]. The CrimeSolutions review process evaluates the effectiveness of a target program, by comparing a group that receives the program with a group that receives TAU or no treatment. Currently, CER studies are not eligible for review for CrimeSolutions because the comparison group is considered to receive more than TAU.

Comparison Group

A group of individuals whose characteristics are similar to those of a treatment group. Comparison group individuals may not receive any services, or they may receive a different set of services, treatment, or activities as the treatment group. In no instance do they receive the same services as the individuals being evaluated (the treatment group). Comparison groups are used in quasi-experimental designs where random assignment is not possible or practical.

Confounding Variables (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

A confounder variable is a variable whose presence affects the variables being studied so that the results do not reflect the actual relationship. Confounding variables correlate (positively or negatively) with both the dependent variable and the independent variable. Some of the methods that can be applied to account for confounding variables include randomization, statistical controls, simple matching, and propensity score matching.

Contamination (Evidence Rating Element from the Overall Rating Program Scoring Instrument and the Rate-By-Outcome Program Scoring Instrument)

Occurs when members of the control group or the comparison group are inadvertently exposed to the intervention or treatment being studied. Contamination threatens the study’s internal validity.

See internal validity for more information on how this item is rated in the Rate-By-Outcome Program Scoring Instrument.

Control Group

A group of individuals whose characteristics should be almost identical to those of the treatment group but do not receive the program services, treatments, or activities being evaluated. In experimental designs, individuals are placed into control groups and treatment groups through random assignment.

Correlation

A statistical term that measures the degree of the relationship between two variables. A correlation has two components, magnitude and direction. Magnitude is a measure of strength and ranges from 0, no correlation, to 1, perfect correlation. Direction determines whether a correlation is positive or negative. A positive correlation means that as one variable, X, increases so does another variable, Y. A negative correlation means that as one variable, X, decreases so does another variable, Y. An inverse correlation means that as one variable, X, increases the other variable, Y, decreases and vice versa. For example, if variables X and Y have a correlation of 0.7 this means they have a strong, positive relationship. Correlation does not imply a causal relationship between variables.

Crime Displacement and Diffusion of Benefits (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

In the Rate-By-Outcome Program Scoring Instrument displacement and diffusion are scored together.

Crime displacement refers to the relocation of criminal activities from one place, time, target, offense, or method to another as a result of crime prevention efforts. When measures are taken to reduce crime in a specific area or context, offenders may shift their behavior to avoid detection or capture, thereby continuing their criminal activities elsewhere or in a different manner.
Diffusion of benefits occurs when crime prevention measures not only decrease crime in the targeted area but also have positive effects on surrounding areas or similar contexts without additional direct intervention. The deterrent effects or improved conditions of the intervention spill over into nearby or related areas, leading to a broader reduction in crime or to other social benefits.

D

Data Source (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

The data source refers to the origin of the data. Many items may be collected from study participants through survey instruments or other forms of self-report. Additionally, studies may use measures derived from the observations reported by participants’ teachers, parents, or peers. In some cases, other observer reports may be used (e.g., from researchers, practitioners, or program staff). In many criminal justice studies, the outcomes being assessed may rely on official or administrative records, such as measures of recidivism, court case dispositions, or school disciplinary or absenteeism records. Finally, some measures, such as urinalysis tests, may rely on specimen/medical tests.

Dependent Variable

A variable whose outcome is influenced or changed by some other variable, usually the independent variable or the treatment. It is the “effect” or outcome variable in a cause and effect relationship.

Descriptive Evidence

Evidence used to characterize individuals, groups, events, processes, trends, or relationships using quantitative statistical methods, correlational methods, or qualitative research methods. This differs from causal evidence.

Diffusion (Evidence Rating Element from the Overall Rating Program Scoring Instrument)

This element was reviewed along with anticipatory benefits and displacement on the Overall Rating Program Scoring Instrument. These elements are often considered in evaluations of community-level crime prevention efforts.

See crime displacement and diffusion of benefits for more information on how this item is rated in the Rate-By-Outcome Program Scoring Instrument.

Dimensions

Sections of the Scoring Instruments that consist of broad categories of information used to review and rate an evidence base. Each dimension consists of multiple evidence rating elements.

The Practice Scoring Instrument consists of three dimensions, which include Eligibility, Quality, and Validity.
The Overall Rating Program Scoring Instrument consisted of four dimensions, which include Program’s Conceptual Framework; Study Design Quality; Study Outcomes; and Program Fidelity.
The Rate-By-Outcomes Program Scoring Instrument consists of five dimensions, which Conceptual Framework, Program Fidelity, Internal Validity, Outcomes, and Effects.

Direction (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

The direction of a measure refers to the operationalization of a measure. Specifically, whether higher values of a measure reflect negative results or higher values reflect positive results.

Displacement (Evidence Rating Element from the Overall Rating Program Scoring Instrument)

This element is reviewed along with diffusion and anticipatory benefits on the CrimeSolutions Overall Rating Program Scoring Instrument. These elements are typically considered in evaluations of community-level crime prevention efforts.

See crime displacement and diffusion of benefits for more information on how this item is rated in the Rate-By-Outcome Program Scoring Instrument.

Duration of Intervention (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

Duration of the intervention refers to the length (e.g., Hours/Days/Months/Years) of program activity described in the program protocol. Some programs have a variable length (e.g., the duration of a participant’s probation), others are ongoing with no clear endpoint (e.g., some mentoring interventions), and others have a duration that cannot be determined.

E

Effect Size (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

A standardized, quantitative index representing the magnitude and direction of an empirical relationship. More specifically, the effect size is a value that reflects the magnitude of the treatment effect. An effect size from an outcome evaluation represents the change in an outcome measure from before a program is implemented to the follow-up period. The effect size of the treatment group can be compared to the effect size from the control group to determine if there are any differences, and if so, whether those differences are statistically significant (which allows for greater confidence that the difference was due to the program). See statistical significance for more information. The most common types of effect sizes in the criminal justice and delinquency literature are the standardized mean difference effect size; odds ratios and risk ratios; and correlation coefficients.

A continuous effect size quantifies the magnitude of the relationship or difference between two variables or groups, often using metrics like Cohen's d or standardized mean difference. Continuous data can take on an infinite number of values within a given range.
A dichotomous effect size measures the magnitude of the difference or association between groups or variables, often using metrics like odds ratios, risk ratios, or risk differences. Dichotomous data only has two possible outcomes (i.e., Yes/No, Arrested/Not Arrested),

In program evaluation, the effect size is typically hypothesized a priori to guide decisions about needed sample size and the likelihood of Type I and Type II errors (See type I and type II error for more information). In a meta-analysis, the effect sizes from the various evaluation studies are standardized to be in the same form. By representing the findings of each study included in a meta-analysis in the same form, this permits a synthesis of those findings across studies. After evaluation data are analyzed, an actual effect can usually be estimated from the data, and this value is often used as a basis for comparative effectiveness research on alternative interventions.

The magnitude of an effect size is often judged using “rules of thumb” from social science research. For example, standardize mean difference effect sizes (Cohen’s d or Hedge’s g) are judge using the following rules: small=0.20; medium=0.50; large=0.80. These are not hard cut-off points but rather approximation. There are different standards for each type of effect size.

Effective, Rating

This rating means that, based on a systematic assessment of the evidence base, implementing the program or practice is likely to result in the intended outcome(s). This is an evidence rating on CrimeSolutions that indicates that a program or practice with strong evidence achieves justice-related outcomes when implemented with fidelity.

In the Overall Rating Program Scoring Instrument this rating is assigned to the program level and means overall the program is likely to result in the intended outcomes in general. In the Rate-By-Outcome Program Scoring Instrument, this rating is assigned to the outcome level and means the program is likely to produce the intended results on that outcome specifically. Read more about How We Rate Programs.

In the Practice Scoring Instrument this rating indicates that, on average, there is strong evidence of a statistically significant mean effect size favoring the intended effect of the practice. It is likely that implementing a program encompassed by the practice will achieve the intended outcome. Read more about How We Rate Practices.

Effectiveness

The strength of the evidence demonstrates that a program achieves justice-related outcomes.

Evidence

Information about a question that is generated through systematic data collection, research, or program evaluation using accepted scientific methods that are documented and replicable. Evidence may be classified as either descriptive or causal.

Evidence base (studies reviewed)

For programs, evidence base represents all available and eligible studies reviewed and scored by CrimeSolutions study reviewers, the results of which are aggregated to determine a program’s evidence rating. For practices, the evidence base comprises all available and eligible meta-analyses.

Evidence Ratings

Refers to one of four designations on CrimeSolutions indicating the extent of the evidence that a program works. Ratings are assigned from standardized review of rigorous evaluations. The four designations are:

Evidence Ratings Defined
Icon	Rating	Description
	Effective	Implementing the program, or a program encompassed by the practice, is likely to result in the intended outcome(s).
	Promising	Implementing the program, or a program encompassed by the practice, may result in the intended outcome(s).
	Ineffective	Implementing the program, or a program encompassed by the practice, is unlikely to result in the intended outcome(s).
	Negative Effects	Implementing the program, or program encompassed by the practice, will not result in intended outcome(s) and may result in harmful effects.

Programs and Practices are also assigned icons to identify whether they have been evaluated with multiple sample sizes. The two designations are:

A single-study icon is used to identify programs that have been evaluated with a single sample. A program with multiple publications listed in the evidence base may receive a single-study icon because:

The publications resulted from a study based on a single sample.
the studies that comprised the program’s evidence base did not demonstrate effects in a consistent direction.

A multiple studies icon is used to represent a greater extent of evidence supporting the evidence rating. The icon depicts programs that have more than one study in the evidence base demonstrating effects in a consistent direction. See How We Rate Programs or How We Rate Practices for more information.

Evidence Rating Element

Subcategories within the dimensions included in the Program and Practice scoring instruments that are used to review and rate the evidence for programs and practice. See How We Rate Practices for more information.

Evidence-based Programs

Evidence-based programs and practices generally have one or more rigorous outcome evaluations that demonstrated effectiveness by measuring the relationship between the program and its intended outcome(s).

Experimental Design

A research design in which participants are randomly assigned to an intervention/treatment group or a control group. Many social scientists believe that studies using random assignment lead to the highest confidence that observed effects are the result of the program and not another variable. See also Randomized Controlled Trial (RCT).

F

Fidelity (Evidence Rating Element)

The degree to which a program’s core services, components, and procedures are implemented as originally designed. Programs replicated with a high degree of fidelity are more likely to achieve consistent results.

See fidelity measurement for more information on how this item is rated in the Rate-By-Outcome Program Scoring Instrument.

Fidelity Measurement (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

To effectively establish causality, program designers should operationally define the core components of the program that are necessary and sufficient to achieve the desired outcomes. The implementation of these core components should then be documented and assessed empirically to determine whether the program under study meets a minimum threshold of evidence for implementation fidelity. Program evaluation studies should then use measures of implementation fidelity to identify the program’s underlying causal mechanism (or mechanisms).

Anecdotal evidence refers to information or examples based on stories, individual cases, or isolated incidents rather than systematic scientific evidence.
Qualitative evidence refers to information that is descriptive and conceptual but gathered through systematic means, such as interviews, focus groups, observations, and content analysis.
Quantitative evidence refers to information that is systematically collected and expressed numerically.

Follow-up Period (Evidence Rating Element from Overall Rating Program Scoring Instrument)

The length of time that the study period continues after the program ends to determine the program’s sustained or continued effects.

See post-intervention assessment reference point and post-intervention assessment period length for more information on how this item is rated in the Rate-By-Outcome Program Scoring Instrument.

G

Grey Literature (Evidence Rating Element from the Practice Scoring Instrument)

Research and evaluations that are not controlled by commercial publishers (i.e., not published in a peer-review journal or a book). Sources of grey literature or unpublished studies include dissertations, theses, government reports, technical reports, conference presentations, and other unpublished sources. This is a publication bias for more information.

Note: If the literature search does not include an effort to locate unpublished studies, or is explicitly restricted to published literature, it is not eligible for inclusion as a practice on CrimeSolutions.

H

Heterogeneity (Evidence Rating Element from the Practice Scoring Instrument)

Refers to the variability of the effect sizes from the different evaluation studies included in a meta-analysis (e.g., some evaluations may show strong, significant effects while other evaluations show small or no effects). This is a dimension that rates a meta-analysis on whether the authors were aware of and attentive to heterogeneity (i.e., variability) in the effect sizes from the studies in the meta-analysis. Heterogeneity statistics include tau (t), tau-squared (t2), Q, or I-squared (I2).

History (Evidence Rating Element from all three scoring instruments)

An event that takes place between the pretest (data collected prior to the beginning of treatment) and the posttest (data collected after the treatment ends) that has nothing to do with the treatment but may impact observed outcomes. History is a potential threat to internal validity. See How We Rate Programs for more information.

I

Implementation

Refers to implementing a program in the same or similar manner targeting the same or similar population in order to achieve the same results that occurred when a program was originally implemented.

Inconclusive Evidence (formerly "Insufficient Evidence")

Programs or practices with inconclusive evidence are those that have been reviewed by CrimeSolutions Study Reviewers, but were not assigned an evidence rating due to limitations of the studies included in the programs' evidence base. Programs are placed on the inconclusive evidence list if the study (or studies) reviewed (1) had significant limitations in the study design or (2) lacked sufficient information about program fidelity so that it was not possible to determine if the program was delivered as designed.

Independent Variable

A variable that changes or influences another variable, usually the dependent variable and precedes the outcome variable in time. It is the “cause” in a cause and effect relationship.

Ineffective (formerly “No Effects”), Rating

This rating means that based on a systematic assessment of the evidence base, implementing the program is unlikely to result in the intended outcome(s). This is an evidence rating on CrimeSolutions that indicates that a program or practice with strong evidence is unlikely to achieve justice-related outcomes.

In the Overall Rating Program Scoring Instrument this rating is assigned to the program level and means overall the program is unlikely to result in the intended outcomes in general. In the Rate-By-Outcome Program Scoring Instrument, this rating is assigned to the outcome level and means the program is unlikely to produce positive results on that outcome specifically. Read more about How We Rate Programs.

In the Practice Scoring Instrument this rating indicates that, on average, there is no evidence of a statistically significant mean effect size on the intended effect of the practice. It is unlikely that implementing a program encompassed by the practice will achieve the intended outcome. Read more about How We Rate Practices.

Independent Variable

A variable that changes or influences another variable, usually the dependent variable. This is often the treatment in experimental designs and precedes the outcome variable in time. It is the “cause” in a cause and effect relationship.

Instrumentation (Program Evidence Rating Element from the Overall Program Scoring Instrument)

The measures used in a study. The instrumentation quality is dependent on the measures’ reliability and validity. Reliability refers to the degree to which a measure is consistent or gives very similar results each time it is used, and validity refers to the degree to which a measure is able to scientifically answer the question it is intended to answer. Instrumentation is a component considered within internal validity.

See internal validity for more information on how this item is rated in the Rate-By-Outcome Program Scoring Instrument.

Intended Outcomes

The results that a program deliberately sets out to achieve by its design (i.e., the program’s goals). For example, a reentry program's intended outcomes might be to reduce recidivism among program participants.

Intent-to-Treat Analysis

An analysis based on the initial treatment intent, not on the treatment eventually administered. For example, if the comparison group, and outcomes are compared only for those who completed the treatment, the study results may be biased. An intent-to-treat design ensures that all study participants are followed until the conclusion of the study, irrespective of whether the participant is still receiving or complying with the treatment.

Internal Validity (Evidence Rating Element for all three Scoring Instruments)

The degree to which observed changes can be attributed to the program. The validity of a study depends on both the research design and the measurement of the program activities and outcomes. Threats to internal validity may affect the extent to which observed effects may be attributed to a program or intervention.

On the CrimeSolutions Overall Rating Program Scoring Instrument, this includes: attrition, maturation, instrumentation, regression toward the mean, selection bias, contamination, and history, as well as other factors.
On the CrimeSolutions Rate-By-Outcome Program Scoring Instrument, this includes selection bias, contamination, regression toward the mean, history, and maturation.
On CrimeSolutions Practice Scoring Instrument, internal validity is measured by the number of randomized controlled trials used to calculate the mean effect size. Mean effect sizes calculated using only randomized controlled trials are considered to have fewer threats to internal validity then mean effect sizes calculated using only quasi-experimental designs.

Intraclass Correlation (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

The intraclass correlation coefficient (ICC) is a measure used to assess the degree of similarity or correlation of observations within clusters or groups. It quantifies the proportion of the total variability in a dependent variable that is attributable to differences between clusters, as opposed to differences within clusters.

J

Justice-related outcomes

CrimeSolutions rates programs based on justice-related outcomes. For CrimeSolutions, those outcome include:

Prevent or reduce crime, delinquency, or related problem behaviors.
Prevent, intervene, or respond to victimization.
Improve justice systems or processes.
Assist those who have been convicted of a crime or at-risk populations of individuals with potential to become involved in the justice system.

M

Maturation (Evidence Rating Element from the Overall Rating Program Scoring Instrument)

When observed outcomes are a result of natural changes of the program participants over time rather than because of program impact. Maturation is a threat considered within internal validity.

See internal validity for more information on how this item is rated in the Rate-By-Outcome Program Scoring Instrument.

Measurement Validity (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

Measurement validity refers to the extent to which the instrument accurately measures the underlying theoretical construct it is supposed to measure.

Meta-analysis

In general terms, meta-analysis is a social science method that allows us to look at effectiveness across numerous evaluations of similar, but not necessarily identical, programs, strategies, or procedures. Meta-analysis examines conceptually similar approaches and answers the question, "on average, how effective are these approaches?" On CrimeSolutions, we use the term "practices" to refer to these categories of similar programs, strategies, or procedures and meta-analyses form the evidence-base for practices.

A more precise definition for meta-analysis is that it is the systematic quantitative analysis of multiple studies that address a set of related research hypotheses in order to draw general conclusions, develop support for hypotheses, and/or produce an estimate of overall program effects.

Methodological Quality (Evidence Rating Element from the Practice Scoring Instrument)

The Practice Scoring Instrument assesses the methodological quality of a meta-analysis by taking into consideration the extent to which the authors of the meta-analysis were aware of and attentive to the methodological quality of the studies included in the meta-analysis. In addition to assessing the methodological quality of included studies, it is important for a meta-analysis to address whether quality had any effect on the main findings (e.g., does it bias the results).

Multilevel Models/Hierarchical Models

A statistical method that allows researchers to estimate separately the variance between settings. For example, when evaluating a school-based program it is important to know the variation of students within the same school as well as the variation of students between different schools. This ensures that when programs are evaluated, the effects are not attributed to the program when there could be underlying differences between schools or between the students in those schools.

Multisite

This term refers to programs that are evaluated in more than one site across multiple studies or evaluated at more than one site within a single study.

To receive a multisite tag on CrimeSolutions, a program must be evaluated 1) at more than one site within a single study; or 2) in more than one site across multiple studies. If the program is evaluated in more than one site across multiple studies, the studies’ ratings must be consistent (i.e., the demonstrated effects must be in a consistent direction) to receive the tag.

The publications resulted from a study based on a single sample.
the studies that comprised the program’s evidence base did not demonstrate effects in a consistent direction.

A multiple studies icon is used to represent a greater extent of evidence supporting the evidence rating. The icon depicts programs that have more than one study in the evidence base demonstrating effects in a consistent direction. For practices, the rating designations take a slightly different meaning.

Multivariate Analysis

Research strategy and analytic technique that involves the investigation of more than two variables at the same time or within the same statistical analysis. For example, in a multiple regression analysis, the effects of two or more independent variables are assessed in terms of their impact on the dependent variable.

N

Negative Effects, Rating

This rating means that, based on a systematic assessment of the evidence base, the program or practice is not likely to result in the intended outcome(s) and may result in harmful effects. This is an Evidence Rating on CrimeSolutions that indicates that a program or practice with strong evidence does not achieve justice-related outcomes when implemented with fidelity.

In the Overall Rating Program Scoring Instrument this rating is assigned to the program level and means overall the program is not likely to result in positive outcomes, and may result in harmful effect, in general. In the Rate-By-Outcome Program Scoring Instrument, this rating is assigned to the outcome level and means the program is not likely to produce positive results, and may result in harmful effects, on that outcome specifically. Read more about How We Rate Programs.
A Negative Effects rating assigned to a practice outcome indicates that, on average, there is strong evidence of a statistically significant mean effect size in the opposite direction of the intended effect for the practice. It is likely that implementing a program encompassed by practice will not achieve the intended outcome and may result in harmful effects. Read more about How We Rate Practices.

Non-experimental

Refers to a research design in which participants are not assigned to treatment and control/comparison groups (randomly or otherwise). Such designs do not allow researchers to establish causal relationships between a program or treatment and its intended outcomes. Non-experimental designs are sometimes used when ethics or circumstances limit the ability to use a different design or because the intent of the research is not to establish a causal relationship. Examples of non-experimental designs include case studies, ethnographic research, or historical analysis.

O

Office of Justice Programs (OJP)

An agency of U.S. Department of Justice, the Office of Justice Programs works in partnership with the justice community to identify the most pressing crime-related challenges confronting the justice system and to provide information, training, coordination, and funding of innovative strategies and approaches to address these challenges. The following bureaus and offices are part of the Office of Justice Programs: the Bureau of Justice Assistance (BJA), the Bureau of Justice Statistics (BJS), the National Institute of Justice (NIJ), the Office of Juvenile Justice and Delinquency Prevention (OJJDP), the Office for Victims of Crime (OVC), and the Office of Sex Offender Sentencing, Monitoring, Apprehending, Registering, and Tracking (SMART). Read more About the Office of Justice Programs.

Outcome

A desired change in behavior or attitude attributable to the implementation of a program or practice. Outcome must fall within the scope of CrimeSolutions and the Model Programs Guide and must relate to the

Prevention or reduction of crime, delinquency, or related problem behaviors (such as aggression, gang involvement, or school attachment), which may be presented as individual behaviors, community-level behaviors, crime rates, and the like.
Prevention, intervention, or response to victimization.
Improvement of justice systems or processes.
Reduction of risk factors for crime and delinquency, such as school failure, internalizing or externalizing behaviors, and so forth.

For operational purposes, outcomes are categorized by tier levels during the screening of the program and practice evidence base.

Tier 1 outcomes refer to the general outcome constructs (e.g., crime/delinquency, drugs and substance abuse, mental/behavioral health, education, victimization, family, etc.).
Tier 2 outcomes refer to the specific outcome constructs (e.g., property offenses, sex-related offenses, or violent offenses under crime/delinquency; alcohol, cocaine/crack cocaine, and heroin/opioids under drugs and substance abuse; internalizing behavior, externalizing behavior, and psychological functioning under mental/behavioral health; etc.).
Tier 3 outcomes refer to subcategories of specific outcome constructs from Tier 2.

On the Practice Scoring Instrument, effect sizes are coded to a Tier 2 outcome construct. In the future, effect sizes will be able to be coded to the Tier 3 outcome construct.

In the Overall Rating Program Scoring Instrument, outcomes were considered and rated separately because programs may target multiple outcomes. Examples of outcomes include reducing drug use, increasing system response to crime victims, and reducing fear of crime.

Primary outcomes refer to the primary or central intended behavioral effects of a program as documented by the theory of change. Within the scope of CrimeSolutions, those primary outcomes must also relate to criminal justice, juvenile justice, or victim services.
Secondary outcomes are the ancillary effects of a program.

The Rate-By-Outcome Program Scoring Instrument does use primary or secondary outcome assignments.

Outcome Evaluation

A formal study that seeks to determine if a program is working. An outcome evaluation involves measuring change in the desired outcomes (e.g., changes in behaviors or changes in crime rates) before and after a program is implemented, and determines if those changes can be attributed to the program. Outcome evaluations can use many different research designs: randomized controlled trials, quasi-experimental designs, time-series analysis, simple pre/posttest, etc. For CrimeSolutions, a program must be evaluated with at least one randomized controlled trial or quasi-experimental research design (with a comparison condition) in order for the outcome evaluation to be included in the program’s evidence base. See How We Rate Programs for more information.

Outcome Measure Scaling (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

Measure scaling refers to the categorization attributes of the outcome measure. The main difference between a continuous scale and a dichotomous scale lies in the nature and granularity of the data they measure. Distinguishing between continuous and dichotomous measures is important because it determines the appropriate statistical methods used to calculate the effect metric.

A continuous scale takes on an infinite number of values within a given range. These values typically are numerical and can include fractions and decimals. For example, height, weight, temperature, time, and scores on a standardized test are all measured on a continuous scale.
A dichotomous scale takes on only two possible values. These values typically are categorical and mutually exclusive. Yes/no, true/false, success/failure are examples of a dichotomous scale.

Outlier (Evidence Rating Element from the Practice Scoring Instrument)

An unusually high or low effect size. When combining effect sizes from various evaluations, extreme outliers can potentially distort the overall mean effect size. This is a dimension in the CrimeSolutions Practices Scoring Instrument that assesses whether the meta-analysis checks for effect size outliers in the data. Note that this item refers to outlying effect sizes included in the meta-analysis, not the outlying data in the evaluation studies that contributed to the meta-analysis.

P

Post-Intervention Assessment Period Length (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

The post-intervention assessment period length is the timeframe following the implementation of an intervention during which the effect of the intervention is evaluated.

Post-Intervention Assessment Reference Point (Evidence Rating Element from the Rate-By-Outcome Program Scoring Instrument)

A post-intervention assessment reference point is the specific moment or date when the assessment of an intervention begins.

Practical Significance

Refers to the practical importance of an effect size. For example, an outcome evaluation may show that the treatment group performed statistically significantly better than the control group following participation in a program, but if the effect size is very small and the program costs are very high, the results may not be practically significant. Practical significance can be subjective, and can be assessed by looking at the magnitude of the effect size, the costs and resources of the program, and various other factors.

Practice

A general category of programs, strategies, or procedures that share similar characteristics with regard to the issues they address and how they address them. CrimeSolutions uses the term “practice” in a very general way to categorize causal evidence that comes from meta-analyses of multiple program evaluations. Using meta-analysis, it is possible to group program evaluation findings in different ways to provide information about effectiveness at different levels of analysis. Therefore, practices on CrimeSolutions may include the following:

Program types – A generic category of programs that share similar characteristics with regard to the matters they address and how they do it. For example, family therapy is a program type that could be reported as a practice in CrimeSolutions.
Program infrastructures – An organizational arrangement or setting within which programs are delivered. For example, boot camps may be characterized as a practice.
Policies or strategies – Broad approaches to situations or problems that are guided by general principles but are often flexible in how they are carried out. For example, hot spots policing may be characterized as a practice.
Procedures or techniques – More circumscribed activities that involve a particular way of doing things in relevant situations. These may be elements or specific activities within broader programs or strategies. For example, risk assessment.

On the CrimeSolutions website, a practice is distinguished from a program. Whereas the evidence base for a practice is derived from one or more meta-analyses, the evidence base for a program is derived from one to three individual program evaluations.

Preponderance of Evidence

To determine if a program works, most of the outcome evidence must indicate effectiveness or ineffectiveness.

Process Evaluation

A study that seeks to determine if a program is operating as it was designed to. Process evaluations can be conducted in a number of ways, but may include examination of the service delivery model, the performance goals and measures, interviews with program staff and clients, etc. Process evaluations are not included in a program’s evidence base and therefore do not determine a program’s evidence rating, but may be used as supporting documentation. See How We Rate Programs for more information.

Program

A planned, coordinated group of activities and processes designed to achieve a specific purpose. A program should have specified procedures (e.g., a defined curriculum, an explicit number of treatment or service hours, and an optimal length of treatment) to ensure the program is implemented with fidelity to its model. It may have, but does not necessarily need, a “brand” name and may be implemented at single or multiple locations.

On the CrimeSolutions website, a program is distinguished from a practice. Whereas the evidence base for a program is derived from one to three individual program evaluations, the evidence base for a practice is derived from one or more meta-analyses.

Program Description (Evidence Rating Element from the Overall Scoring Instrument and the Rate-By-Outcome Program Scoring Instrument)

A program description serves as a guide for understanding the implementation of the program. The description should delineate five items: 1) a list of key activities, 2) the frequency or duration of key activities, 3) the targeted population, 4) the targeted behavior (or behaviors) [i.e., the intent of the program], and 5) the setting.

Promising, Rating

This rating means that, based on a systematic assessment of the evidence base, implementing the program or practice may result in the intended outcome(s). This is an evidence rating on CrimeSolutions that indicates that a program or practice with some evidence that it may achieve justice-related outcomes.

In the Overall Rating Program Scoring Instrument this rating is assigned to the program level and means overall the program may result in the intended outcomes in general. In the Rate-By-Outcome Program Scoring Instrument, this rating is assigned to the outcome level and means the program may produce the intended results on that outcome specifically. Read more about How We Rate Programs.

In the Practice Scoring Instrument this rating indicates that, on average, there is some evidence of a statistically significant mean effect size favoring the intended effect of the practice. It may be that implementing a program encompassed by the practice could achieve the intended outcomes. Read more about How We Rate Practices.

Publication Bias (Evidence Rating Element from the Practice Scoring Instrument)

Broadly refers to the idea that published evaluations are more likely to show large and/or statistically significant program effects, whereas unpublished evaluations are more likely to show null, small, or “negative” (i.e., opposite of what would be predicted) program effects. This is a dimension in the CrimeSolutions Practice Scoring Instrument that rates the extent to which a meta-analysis investigates the potential for publication bias in the sample of included studies.

Q

Quasi-experimental Design

A research design that resembles an experimental design, but in which participants are not randomly assigned to treatment and control groups. Quasi-experimental designs are generally viewed as weaker than experimental designs because threats to validity cannot be as thoroughly minimized. This reduces the level of confidence that observed effects may be attributed to the program and not other variables.

R

Randomized Controlled Trial (RCT) / Randomized Field Experiment

Refers to an control group . Most social scientists consider random assignment to lead to the highest level of confidence that observed effects are the result of the program and not other variables.

To receive a “randomized controlled trial” tag on CrimeSolutions, a program must include in the evidence base at least 1 study that (1) allocates groups via a valid random assignment procedure; and (2) is rated highly for overall design quality by CrimeSolutions Study Reviewers and (3) has outcome evidence consistent with the overall program rating.

The RCT tag designation is “This program's rating is based on evidence that includes at least one high-quality randomized controlled trial.”

Regression toward the Mean (Evidence Rating Element from the Overall Rating Program Scoring Instrument)

The statistical tendency for extreme scores relative to the mean to move closer to the average score in subsequent measurements. Regression toward the mean is a threat to the study’s internal validity.

See internal validity for more information on how this item is rated in the Rate-By-Outcome Program Scoring Instrument.

Research Design (Evidence Rating Element)

The plan for how a study’s information is gathered, which includes identifying the data collection method(s), the instrumentation used, the administration of those instruments, and the methods to organize and analyze the data. The quality of the research design impacts whether a causal relationship between program treatment and outcome may be established. Research designs may be divided into three categories: How We Rate Programs for more information.

Reviewer Confidence (Evidence Rating Element from the Overall Rating Program Scoring Instrument)

As a final step on the Overall Rating Program Scoring Instrument, How We Rate Programs for more information.

S

Sample Size (Evidence Rating Element from the Overall Rating Program Scoring Instrument and the Rate-By-Outcome Program Scoring Instrument)

A sample is the subset of the entire population that is included in a research study. Typically, all else being equal, a larger sample size leads to increased precision in estimates of various properties of the population. The sample size affects the statistical power of a study and the extent to which a study is capable of detecting meaningful program effects. It is included as an element with How We Rate Programs for more information.

The assigned sample refers to the initial set of subjects allocated to different conditions of a research study.
The analytic sample refers to a subset of subjects included in the statistical analysis of a research study. The size of the analytic sample can differ from the initially assigned sample because of various factors, such as attrition, nonresponses, or data quality issues.

Scoring Instrument

The method by which aspects, strengths, and weaknesses of programs and practices are consistently and objectively rated for evidence.

For practices, the Practice Scoring Instrument is a compilation of the dimensions and elements of a research study that are reviewed and assigned a numerical score by the CrimeSolutions Study Reviewers in order to assess the evidence of a practice’s effectiveness. The instrument provides a standard method to assess the quality of each practice’s evidence base, while also reflecting Study Reviewers’ judgment and expertise. For more information, see How We Rate Practices.
For programs, the Overall Rating and Rate-By-Outcome Program Scoring Instruments are a compilation of the dimensions and elements of a research study that are reviewed and assessed by the CrimeSolutions Study Reviewers in order to assess the evidence of a program’s effectiveness. The instrument provides a standard method to assess the quality of each program’s evidence base, while also reflecting Study Reviewers’ judgment and expertise. For more information, see How We Rate Programs.

Selection Bias (Evidence Rating Element in the Overall Rating Program Scoring Instrument and the Rate-By-Outcome Program Scoring Instrument)

Occurs when study participants are assigned to groups such that pre-existing differences (unrelated to the program, treatment, or activities) impact differences in observed outcomes. Selection bias threatens the study’s internal validity. Even if the subjects are randomly assigned, this threat is of particular concern with studies that have small samples.

Senior Researcher

Subject matter and research methodology experts who serve a leadership role in selecting the studies that comprise the evidence base for a program or practice and who coordinate the review process for a given topic area on CrimeSolutions. They also ensure that any scoring discrepancies between Study Reviewers are resolved and consensus is achieved prior to a program or practice outcome being assigned a final evidence rating. Read more about Senior Researchers and Study Reviewers.

Site

This term refers to the location of an evaluation that examines the effectiveness of a specific program.

For purposes of evaluation and assessment in CrimeSolutions, the term “site” includes the following three elements: 1) geographic location (determined by factors such as physical location or boundaries or contextual variations); 2) jurisdictional or organizational independence of implementation (determined by factors such as independence of decisionmakers); and 3) population independence or uniqueness (determined by factors such as race/ethnicity and socioeconomic status).

Statistical Adjustment (Evidence Rating Element from the Overall Rating Program Scoring Instrument)

The use of statistical controls to account for the initial measured differences between groups. It is not applicable for all research designs.

See effect size for more information on how this item is rated in the Rate-By-Outcome Program Scoring Instrument.

Statistical Power (Evidence Rating Element from the Overall Rating Program Scoring Instrument)

The ability of a statistical test to detect meaningful program effects. It is a function of several factors, including: 1) the size of the sample; 2) the magnitude of the expected effect; and 3) the type of statistical test used. Statistical power is an element within sample size on the CrimeSolutions scoring instrument.

See sample size for more information on how this item is rated in the Rate-By-Outcome Program Scoring Instrument.

Study Reviewer

Subject matter and research methodology experts who review and assess the individual evaluation studies (for programs) or meta-analyses (for practices) that comprise the Senior Researchers and Study Reviewers.

Subgroup

A subset of the full study sample other than “a priori” treatment, comparison, or control groups . Researchers sometimes conduct analyses on subgroups when they want to examine whether the program or practice works better for one type of participant or another (for example, whether a juvenile prevention program works better for boys than girls, or whether a prison-based program works better for high-risk than low-risk individuals).

Subgroup Analysis

Analysis that involves dividing the analyzed full study sample into a subset of study participants, most often to make comparisons between them. While subgroup analyses can provide valuable information, they are most often observational, or correlational, analyses, as no proper comparison/control groups are included in these analyses. Under the CrimeSolutions review process, only the full study sample is scored (even if the study authors state clearly an “a priori” theoretical rationale for why the program or practice would be expected to work for a given subgroup and not another).

Analyses of subgroups are described in the "Other Information (Including Subgroup Findings) sections of CrimeSolutions program profiles, but the results do not impact the program’s overall evidence rating. Examples of subgroups that may be reported in program profiles include those categorized by sex (e.g., male versus female); race/ethnicity (e.g. Black, Hispanic); age (e.g., older versus younger participants); setting (e.g., urban versus suburban versus rural); risk status (e.g., high-risk versus low-risk); family structure (e.g., single-parent versus two-parent household); delivery setting (e.g., community versus institutional); dosage (e.g., partial versus full implementation); and offense types (e.g., violent versus nonviolent persons).

Systematic Review

A process by which the research evidence from multiple studies on a particular topic is reviewed and assessed using systematic methods to reduce bias in selection and inclusion of studies. A systematic review is generally viewed as more thorough than a non-systematic literature review, but does not necessarily involve the quantitative statistical techniques of a meta-analysis.

T

Theory of Change (Evidence Rating Element in the Overall Rating Program Scoring Instrument and the Rate-By-Outcome Program Scoring Instrument)

The framework that outlines how and why a program or practice is expected to bring about change. The theory of change may be explicit or implicit.

An explicit program theory is formally documented and communicated as part of the program description.
An implicit program theory generally is unstated and appeals to common sense.

Time Series Analysis

An analytic technique that uses a sequence of data points, measured typically at successive, uniform time intervals, to identify trends and other characteristics of the data. For example, a time series analysis may be used to study a city’s crime rate over time and predict future crime trends.

Treatment Group

The subjects or program participants of the set of services, treatment, or activities being studied or tested.

Type I Error

The probability of a Type I error, usually signified as “alpha,” is often used to indicate the chance of failing to reject a null hypothesis that is actually false (e.g., concluding that a program works when in fact it does not, also called a false positive).

Type II Error

The probability of a Type II error, usually signified as “beta,” is often used to indicate the chance that an actual effect goes undetected (e.g., concluding that a program doesn’t work when it fact it does, also called a false negative).

V

Variance

A statistical measure of how far a set of data points are dispersed from the mean or average for a population or a sample. It is the average deviation of outcomes from the mean of outcomes for a group. It is used as a step in determining the effect of an intervention or treatment on a population.

W

Weighting of Results (Evidence Rating Element from the Practice Scoring Instrument)

Studies in a meta-analysis that have larger samples and produce more precise results are given more credit to the effect size estimate than results from smaller samples. The Practice Scoring Instrument assesses whether the meta-analysis uses appropriate weighting schemes when estimating mean effect sizes and in other analyses.

Date Published: April 3, 2025