A method for analyzing the differences in the means of two or more groups. Specifically, this procedure partitions the total variation in the dependent variable into two components: between-group variation and within-group variation. It allows researchers to determine whether the differences between a control group and a treatment group can be attributed to the independent variable or treatment.
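As an illustration of the partition described above, the following minimal sketch splits the total sum of squares into between-group and within-group parts and forms the F ratio. The function name and the sample data are hypothetical and chosen only for illustration.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (ss_between, ss_within, f_stat) for a list of groups of numbers."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: how far each value sits from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


# Hypothetical outcome scores for a control group and a treatment group.
control = [4, 5, 6]
treatment = [7, 8, 9]
ss_between, ss_within, f_stat = one_way_anova([control, treatment])
print(ss_between, ss_within, f_stat)
```

The two components add up to the total variation around the grand mean. A larger F ratio means that more of the variation lies between the groups than within them.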
Occurs when the effects of a program are observed prior to the implementation of the program, generally because the target population believes the program has already started. This element is reviewed along with diffusion and displacement on the CrimeSolutions Scoring Instrument. These elements are typically considered in evaluations of community-level crime prevention efforts. See Program Review and Rating from Start to Finish for more information.
The loss of participants during the course of a study, which often occurs because subjects move or they refuse to participate in the study. This may be a threat to the study’s Internal Validity. See Program Review and Rating from Start to Finish for more information.
Evidence that documents a relationship between an activity, treatment, or intervention (including technology) and its intended outcomes, including measuring the direction and size of a change, and the extent to which a change may be attributed to the activity or intervention. Causal evidence depends on the use of scientific methods to rule out, to the extent possible, alternative explanations for the documented change. This differs from descriptive evidence.
A statistical test used to compare observed categorical data with the data expected under a specific hypothesis, to determine whether any difference between them is larger than would be expected by chance.
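A minimal sketch of the test statistic, assuming a simple goodness-of-fit setting with made-up counts. The function name is hypothetical. The critical value of 3.841 applies to one degree of freedom at the 0.05 level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts in two categories; the hypothesis expects an even split.
observed = [30, 20]
expected = [25, 25]
statistic = chi_square_statistic(observed, expected)

# Two categories give one degree of freedom; 3.841 is the 0.05 critical value.
CRITICAL_VALUE_DF1 = 3.841
exceeds_chance = statistic > CRITICAL_VALUE_DF1
print(statistic, exceeds_chance)
```

In this example the statistic falls below the critical value, so the observed counts do not differ from the expected counts by more than chance alone could plausibly produce.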
A group of individuals whose characteristics are similar to those of a treatment group. Comparison group individuals may not receive any services, or they may receive a different set of services, treatment, or activities than the treatment group. In no instance do they receive the same services as the individuals being evaluated (the treatment group). Comparison groups are used in quasi-experimental designs where random assignment is not possible or practical.
Occurs when members of the control group or the comparison group are inadvertently exposed to the intervention or treatment being studied. Contamination threatens the study’s Internal Validity. See Program Review and Rating from Start to Finish for more information.
A group of individuals whose characteristics should be almost identical to those of the treatment group but do not receive the program services, treatments, or activities being evaluated. In experimental designs, individuals are placed into control groups and treatment groups through random assignment.
A statistical measure of the degree of the relationship between two variables. A correlation has two components, magnitude and direction. Magnitude is a measure of strength and ranges from 0 (no correlation) to 1 (perfect correlation). Direction determines whether a correlation is positive or negative. A positive correlation means that the variables move in the same direction: as one variable, X, increases so does another variable, Y, and as X decreases so does Y. A negative (or inverse) correlation means that as one variable, X, increases the other variable, Y, decreases, and vice versa. For example, if variables X and Y have a correlation of 0.7, this means they have a strong, positive relationship. Correlation does not imply a causal relationship between variables.
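The magnitude and direction of a correlation can be seen in a small sketch of the Pearson correlation coefficient. The function name and the data are hypothetical. The sign of the result gives the direction and its absolute value gives the magnitude.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)


x = [1, 2, 3, 4]
same_direction = pearson_r(x, [2, 4, 6, 8])      # Y rises as X rises
opposite_direction = pearson_r(x, [8, 6, 4, 2])  # Y falls as X rises
print(same_direction, opposite_direction)
```

The coefficient is positive when the variables move together and negative when they move in opposite directions.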
A variable whose outcome is influenced or changed by some other variable, usually the independent variable or the treatment. It is the “effect” or outcome variable in a cause and effect relationship.
Evidence used to characterize individuals, groups, events, processes, trends, or relationships using quantitative statistical methods, correlational methods, or qualitative research methods. This differs from causal evidence.
Occurs when the effects or benefits of a program extend beyond the places, individuals, problems, or behaviors directly or indirectly targeted. This element is reviewed along with anticipatory benefits and displacement on the CrimeSolutions Scoring Instrument. These elements are often considered in evaluations of community-level crime prevention efforts. See Program Review and Rating from Start to Finish for more information.
One of four broad categories of information included in the CrimeSolutions Scoring Instrument used to review and rate program evidence. The dimensions are Program’s Conceptual Framework, Study Design Quality, Study Outcomes, and Program Fidelity. Each dimension consists of multiple Evidence Rating Elements. See Program Review and Rating from Start to Finish for more information.
Occurs when an intervention has the effect of moving the problem in question (such as crime) rather than producing an actual reduction in incidence. This element is reviewed along with diffusion and anticipatory benefits on the CrimeSolutions Scoring Instrument. These elements are typically considered in evaluations of community-level crime prevention efforts. See Program Review and Rating from Start to Finish for more information.
A standardized, quantitative index representing the magnitude and direction of an empirical relationship. More specifically, the effect size is a value that reflects the magnitude of the treatment effect. An effect size from an outcome evaluation represents the change in an outcome measure from before a program is implemented to the follow-up period. The effect size of the treatment group can be compared to the effect size from the control group to determine if there are any differences, and if so, whether those differences are statistically significant (which allows for greater confidence that the difference was due to the program). See Statistical Significance for more information. The most common types of effect sizes in the criminal justice and delinquency literature are the standardized mean difference effect size; odds ratios and risk ratios; and correlation coefficients.
In program evaluation, the effect size is typically hypothesized a priori to guide decisions about needed sample size and the likelihood of Type I and Type II errors (see Type I Error and Type II Error for more information). In a meta-analysis, the effect sizes from the various evaluation studies are standardized to be in the same form. Representing the findings of each study included in a meta-analysis in the same form permits a synthesis of those findings across studies. After evaluation data are analyzed, an actual effect can usually be estimated from the data, and this value is often used as a basis for comparative effectiveness research on alternative interventions.
The magnitude of an effect size is often judged using “rules of thumb” from social science research. For example, standardized mean difference effect sizes (Cohen’s d or Hedges’ g) are judged using the following rules: small=0.20; medium=0.50; large=0.80. These are not hard cut-off points but rather approximations. There are different standards for each type of effect size.
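As a sketch of the standardized mean difference, the code below computes Cohen's d with a pooled standard deviation and attaches a rule-of-thumb label. The function names, the data, and the "negligible" label are assumptions made for illustration. Treating the thresholds as bin edges is also a simplification, since they are approximations rather than hard cut-offs.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_variance = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)


def rule_of_thumb_label(d):
    """Rough label only; the 0.20 / 0.50 / 0.80 values are approximations."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"  # hypothetical label for values below 0.20


# Hypothetical outcome scores.
treatment_scores = [6, 7, 8, 9, 10]
control_scores = [5, 6, 7, 8, 9]
d = cohens_d(treatment_scores, control_scores)
print(round(d, 3), rule_of_thumb_label(d))
```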
An Evidence Rating on CrimeSolutions that indicates a program with strong evidence that it achieves justice-related outcomes when implemented with fidelity. Read more About CrimeSolutions or about Program Review and Rating from Start to Finish. "Effective" programs are represented throughout the site with the "Effective" icon.
The strength of the evidence demonstrating that a program achieves justice-related outcomes.
Information about a question that is generated through systematic data collection, research, or program evaluation using accepted scientific methods that are documented and replicable. Evidence may be classified as either descriptive or causal.
For programs, the evidence base comprises the three or fewer studies reviewed and scored by CrimeSolutions Study Reviewers, the results of which are aggregated to determine a program’s Evidence Rating. For practices, the evidence base comprises all available meta-analyses. Read more About CrimeSolutions or about Program Review and Rating from Start to Finish or Practice Review and Rating from Start to Finish.
Subcategories within the four broad dimensions included in the CrimeSolutions Scoring Instrument used to review and rate the evidence for a program or practice. See Program Review and Rating from Start to Finish or Practice Review and Rating from Start to Finish for more information.
The National Institute of Justice considers programs to be evidence-based when their effectiveness has been demonstrated by causal evidence obtained through high-quality outcome evaluations and when they have been replicated and evaluated in at least three sites.
NIJ defines high-quality outcome evaluations as those using rigorous, randomized controlled trials on programs implemented with fidelity.
A research design in which participants are randomly assigned to an intervention/treatment group or a control group. Many social scientists believe studies using random assignment lead to the highest confidence that observed effects are the result of the program and not another variable. See also Randomized Controlled Trial (RCT).
The degree to which a program’s core services, components, and procedures are implemented as originally designed. Programs replicated with a high degree of fidelity are more likely to achieve consistent results. See Program Review and Rating from Start to Finish for more information.
Research and evaluations that are not controlled by commercial publishers (i.e., not published in a peer-reviewed journal or a book). Sources of grey literature or unpublished studies include dissertations, theses, government reports, technical reports, conference presentations, and other unpublished sources. This is a dimension in the CrimeSolutions Practices Scoring Instrument that assesses the extent to which a meta-analysis includes results from unpublished or “grey” literature sources. A meta-analysis should always attempt to include grey literature due to consistent evidence that the nature and direction of research findings is often related to publication status. See Publication Bias for more information.
Note: If the literature search does not include an effort to locate unpublished studies, or is explicitly restricted to published literature, it is not eligible for inclusion as a practice on CrimeSolutions.
An event that takes place between the pretest (data collected prior to the treatment beginning) and the posttest (data collected after the treatment ends) that has nothing to do with the treatment but may impact observed outcomes. History is a potential threat to Internal Validity. See Program Review and Rating from Start to Finish for more information.
Programs or practices with inconclusive evidence are those that have been reviewed by CrimeSolutions Study Reviewers, but were not assigned an evidence rating due to limitations of the studies included in the programs' evidence base. Programs are placed on the inconclusive evidence list if the study (or studies) reviewed (1) had significant limitations in the study design or (2) lacked sufficient information about program fidelity so that it was not possible to determine if the program was delivered as designed.
Note that these programs and practices were previously referred to as "insufficient evidence."
A variable that changes or influences another variable, usually the dependent variable. This is often the treatment in experimental designs and precedes the outcome variable in time. It is the “cause” in a cause and effect relationship.
The measures used in a study. The instrumentation quality is dependent on the measures’ reliability and validity. Reliability refers to the degree to which a measure is consistent or gives very similar results each time it is used, and validity refers to the degree to which a measure is able to scientifically answer the question it is intended to answer. Instrumentation is a component considered within Internal Validity. See Program Review and Rating from Start to Finish for more information.
An analysis based on the initial treatment intent, not on the treatment eventually administered. For example, if the treatment group has a higher attrition rate than the control or comparison group, and outcomes are compared only for those who completed the treatment, the study results may be biased. An intent-to-treat design ensures that all study participants are followed until the conclusion of the study, irrespective of whether the participant is still receiving or complying with the treatment.
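The difference between an intent-to-treat comparison and a completers-only comparison can be sketched with a few made-up records. All data, field orderings, and function names below are hypothetical.

```python
# Hypothetical records: (assigned_group, completed_program, reoffended as 0 or 1)
participants = [
    ("treatment", True, 0), ("treatment", True, 0), ("treatment", True, 0),
    ("treatment", True, 1), ("treatment", False, 1), ("treatment", False, 1),
    ("control", True, 1), ("control", True, 1), ("control", True, 0),
    ("control", True, 1), ("control", True, 0), ("control", True, 1),
]


def by_group(records, group):
    """Select records by the group each participant was originally assigned to."""
    return [r for r in records if r[0] == group]


def reoffense_rate(records):
    return sum(r[2] for r in records) / len(records)


# Intent-to-treat: everyone is analyzed in the group they were assigned to,
# including participants who dropped out of the program.
itt_treatment_rate = reoffense_rate(by_group(participants, "treatment"))
control_rate = reoffense_rate(by_group(participants, "control"))

# Completers-only: dropouts are excluded from the treatment group.
completers = [r for r in by_group(participants, "treatment") if r[1]]
completers_rate = reoffense_rate(completers)

print(itt_treatment_rate - control_rate, completers_rate - control_rate)
```

In this made-up example the completers-only comparison shows a much larger apparent reduction than the intent-to-treat comparison, because the participants who dropped out had worse outcomes.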
The degree to which observed changes can be attributed to the program. The validity of a study depends on both the research design and the measurement of the program activities and outcomes. Threats to internal validity may affect the extent to which observed effects may be attributed to a program or intervention. On CrimeSolutions’ Scoring Instrument, these threats include Attrition, Maturation, Instrumentation, Regression toward the Mean, Selection Bias, Contamination, and History, as well as other factors. See Program Review and Rating from Start to Finish for more information.
On CrimeSolutions’ Scoring Instrument for practices, internal validity is measured by the number of randomized controlled trials used to calculate the mean effect size. Mean effect sizes calculated using only randomized controlled trials are considered to have fewer threats to internal validity than mean effect sizes calculated using only quasi-experimental designs. See Practice Review and Rating from Start to Finish for more information.
CrimeSolutions rates programs based on justice-related outcomes. For CrimeSolutions, those outcomes include:
- Prevent or reduce crime, delinquency, or related problem behaviors.
- Prevent, intervene, or respond to victimization.
- Improve justice systems or processes.
- Assist offenders or at-risk populations of individuals with potential to become involved in the justice system.
When observed outcomes are a result of natural changes of the program participants over time rather than because of program impact. Maturation is a threat considered within Internal Validity. See Program Review and Rating from Start to Finish for more information.
In general terms, meta-analysis is a social science method that allows us to look at effectiveness across numerous evaluations of similar, but not necessarily identical, programs, strategies, or procedures. Meta-analysis examines conceptually similar approaches and answers the question, "on average, how effective are these approaches?" On CrimeSolutions, we use the term "practices" to refer to these categories of similar programs, strategies, or procedures, and meta-analyses form the evidence base for practices.
A more precise definition for meta-analysis is that it is the systematic quantitative analysis of multiple studies that address a set of related research hypotheses in order to draw general conclusions, develop support for hypotheses, and/or produce an estimate of overall program effects.
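One common way to produce an estimate of the overall effect is an inverse-variance weighted average of the study effect sizes (a fixed-effect model). The sketch below assumes that approach. It is a generic illustration with made-up numbers and a hypothetical function name, not a description of how any particular meta-analysis on CrimeSolutions was carried out.

```python
from math import sqrt


def fixed_effect_pooled(effects, variances):
    """Inverse-variance weighted mean effect size and its standard error."""
    weights = [1.0 / v for v in variances]  # more precise studies weigh more
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    standard_error = sqrt(1.0 / sum(weights))
    return pooled, standard_error


# Hypothetical standardized effect sizes and their sampling variances.
study_effects = [0.20, 0.50]
study_variances = [0.01, 0.04]
pooled_effect, pooled_se = fixed_effect_pooled(study_effects, study_variances)
print(round(pooled_effect, 3), round(pooled_se, 4))
```

The pooled value sits closer to the first study's effect because that study has the smaller variance and therefore the larger weight.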
A statistical method that allows researchers to estimate separately the variance between subjects within the same setting, and the variance between settings. For example, when evaluating a school-based program it is important to know the variation of students within the same school as well as the variation of students between different schools. This ensures that when programs are evaluated, the effects are not attributed to the program when there could be underlying differences between schools or between the students in those schools.
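The two sources of variation mentioned above can be illustrated with a simple descriptive split, using made-up scores for two schools. This is only a sketch with hypothetical names. A fitted multilevel model estimates these components jointly and accounts for sampling error in the setting means, which this example does not do.

```python
from statistics import mean, variance


def variance_components(settings):
    """Return (within, between) for a dict mapping setting name to outcomes.

    within:  pooled sample variance of subjects around their own setting mean
    between: sample variance of the setting means
    """
    groups = list(settings.values())
    df_within = sum(len(g) - 1 for g in groups)
    within = sum((len(g) - 1) * variance(g) for g in groups) / df_within
    between = variance([mean(g) for g in groups])
    return within, between


# Hypothetical student outcome scores in two schools.
schools = {"School A": [10, 12, 14], "School B": [20, 22, 24]}
within_school, between_school = variance_components(schools)
print(within_school, between_school)
```

Here most of the variation lies between schools rather than between students in the same school, which is the kind of pattern that would be misleading if school membership were ignored.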
This term refers to programs that are evaluated in more than one site across multiple studies or evaluated at more than one site within a single study.
To receive a multisite tag on CrimeSolutions, a program must be evaluated 1) at more than one site within a single study; or 2) in more than one site across multiple studies. If the program is evaluated in more than one site across multiple studies, the studies’ ratings must be consistent (i.e., the demonstrated effects must be in a consistent direction) to receive the tag.
Research strategy and analytic technique that involves the investigation of more than two variables at the same time or within the same statistical analysis. For example, in a multiple regression analysis, the effects of two or more independent variables are assessed in terms of their impact on the dependent variable.
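A minimal sketch of a multiple regression with two independent variables, fitted by ordinary least squares on centered data. The function name and the data are hypothetical. The outcome values were constructed as exactly 1 + 2*x1 + 3*x2 so that the fitted coefficients can be checked by eye.

```python
from statistics import mean


def fit_two_predictor_ols(x1, x2, y):
    """Return (intercept, b1, b2) for the model y = intercept + b1*x1 + b2*x2."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    # Sums of squares and cross-products of the centered variables.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # Solve the 2x2 normal equations; det is zero if x1 and x2 are collinear.
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    intercept = my - b1 * m1 - b2 * m2
    return intercept, b1, b2


# Hypothetical data built from y = 1 + 2*x1 + 3*x2.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [9, 8, 19, 18, 29]
intercept, b1, b2 = fit_two_predictor_ols(x1, x2, y)
print(round(intercept, 6), round(b1, 6), round(b2, 6))
```

Each coefficient describes the association of one independent variable with the dependent variable while the other independent variable is held constant.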
An Evidence Rating on CrimeSolutions indicating that a program or practice has strong evidence that the program did not have the intended effects or had harmful effects when trying to achieve justice-related outcomes. While programs and practices rated No Effects may have had some positive effects, the overall rating is based on the preponderance of evidence. Read more About CrimeSolutions or Program Review and Rating from Start to Finish. "No Effects" programs are represented throughout the site with the "No Effects" icon.
Refers to a research design in which participants are not assigned to treatment and control/comparison groups (randomly or otherwise). Such designs do not allow researchers to establish causal relationships between a program or treatment and its intended outcomes. Non-experimental designs are sometimes used when ethics or circumstances limit the ability to use a different design or because the intent of the research is not to establish a causal relationship. Examples of non-experimental designs include case studies, ethnographic research, or historical analysis.
A formal study that seeks to determine if a program is working. An outcome evaluation involves measuring change in the desired outcomes (e.g., changes in behaviors or changes in crime rates) before and after a program is implemented, and determines if those changes can be attributed to the program. Outcome evaluations can use many different research designs: randomized controlled trials, quasi-experimental designs, time-series analysis, simple pre/posttest, etc. For CrimeSolutions, a program must be evaluated with at least one randomized controlled trial or quasi-experimental research design (with a comparison condition) in order for the outcome evaluation to be included in the program’s evidence base. See Program Review and Rating from Start to Finish for more information.
The intended results of a program’s activities or operation and a dimension in the CrimeSolutions Scoring Instrument. Primary outcomes refer to the primary or central intended effects of a program. Within the scope of CrimeSolutions, those primary outcomes must also relate to criminal justice, juvenile justice, or victim services. Secondary outcomes are the ancillary effects of a program. Outcomes are considered and rated separately within this dimension because programs may target multiple outcomes. Examples of outcomes include: reducing drug use, increasing system response to crime victims, and reducing fear of crime.
A general category of programs, strategies, or procedures that share similar characteristics with regard to the issues they address and how they address them. CrimeSolutions uses the term “practice” in a very general way to categorize causal evidence that comes from meta-analyses of multiple program evaluations. Using meta-analysis, it is possible to group program evaluation findings in different ways to provide information about effectiveness at different levels of analysis. Therefore, practices on CrimeSolutions may include the following:
- Program types – A generic category of programs that share similar characteristics with regard to the matters they address and how they do it. For example, family therapy is a program type that could be reported as a practice in CrimeSolutions.
- Program infrastructures – An organizational arrangement or setting within which programs are delivered. For example, boot camps may be characterized as a practice.
- Policies or strategies – Broad approaches to situations or problems that are guided by general principles but are often flexible in how they are carried out. For example, hot spots policing may be characterized as a practice.
- Procedures or techniques – More circumscribed activities that involve a particular way of doing things in relevant situations. These may be elements or specific activities within broader programs or strategies. For example, risk assessment.
On the CrimeSolutions website, a practice is distinguished from a program. Whereas the evidence base for a practice is derived from one or more meta-analyses, the evidence base for a program is derived from one to three individual program evaluations.
A planned, coordinated group of activities and processes designed to achieve a specific purpose. A program should have specified procedures (e.g., a defined curriculum, an explicit number of treatment or service hours, and an optimal length of treatment) to ensure the program is implemented with fidelity to its model. It may have, but does not necessarily need, a “brand” name and may be implemented at single or multiple locations.
On the CrimeSolutions website, a program is distinguished from a practice. Whereas the evidence base for a program is derived from one to three individual program evaluations, the evidence base for a practice is derived from one or more meta-analyses.
A research design that resembles an experimental design, but in which participants are not randomly assigned to treatment and control groups. Quasi-experimental designs are generally viewed as weaker than experimental designs because threats to validity cannot be as thoroughly minimized. This reduces the level of confidence that observed effects may be attributed to the program and not other variables.
Refers to an experimental research design in which participants are randomly assigned to a treatment or a control group. Most social scientists consider random assignment to lead to the highest level of confidence that observed effects are the result of the program and not other variables.
To receive a “randomized controlled trial” tag on CrimeSolutions, a program must include in the evidence base at least one study that (1) allocates groups via a valid random assignment procedure; (2) is rated highly for overall design quality by CrimeSolutions Study Reviewers; and (3) has outcome evidence consistent with the overall program rating.
As a final step on the Scoring Instrument, Study Reviewers provide an assessment as to their overall confidence in the study design. If both Study Reviewers agree, and the Lead Researcher concurs, that there is a fundamental flaw in the study design (not captured in the Design Quality dimension) that raises serious concerns about the study’s results, the study is removed from the evidence base and not factored into the Evidence Rating. This final determination serves as an additional safeguard to ensure that only the most rigorous studies comprise the evidence base. The study citation will be listed among the program’s additional references. See Program Review and Rating from Start to Finish for more information.
A sample is the subset of the entire population that is included in a research study. Typically, all else being equal, a larger sample size leads to increased precision in estimates of various properties of the population. The sample size affects the statistical power of a study and the extent to which a study is capable of detecting meaningful program effects. It is included as an element with Statistical Power in the CrimeSolutions Scoring Instrument. See Program Review and Rating from Start to Finish for more information.
The method by which aspects, strengths, and weaknesses of programs and practices are consistently and objectively rated for evidence. For programs, the scoring instrument is a compilation of the dimensions and elements of a research study that are reviewed and assigned a numerical score by the CrimeSolutions Study Reviewers in order to assess the evidence of a program’s effectiveness. The instrument provides a standard method to assess the quality of each program’s evidence base, while also reflecting Study Reviewers’ judgment and expertise. A similar method of scoring the aspects of meta-analyses is used for practices. See Program Review and Rating from Start to Finish or Scoring Instrument for more information.
This term refers to the location of an evaluation that examines the effectiveness of a specific program.
For purposes of evaluation and assessment in CrimeSolutions, the term “site” includes the following three elements: 1) geographic location (determined by factors such as physical location or boundaries or contextual variations); 2) jurisdictional or organizational independence of implementation (determined by factors such as independence of decisionmakers); and 3) population independence or uniqueness (determined by factors such as race/ethnicity and socioeconomic status).
The use of statistical controls to account for the initial measured differences between groups. It is not applicable for all research designs. See Program Review and Rating from Start to Finish for more information.
The ability of a statistical test to detect meaningful program effects. It is a function of several factors, including: 1) the size of the sample; 2) the magnitude of the expected effect; and 3) the type of statistical test used. Statistical power is an element within Sample Size on the CrimeSolutions Scoring Instrument. See Program Review and Rating from Start to Finish for more information.
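The dependence of power on sample size and expected effect size can be sketched with a normal approximation for a two-sided comparison of two equally sized groups. The function name is hypothetical, and the approximation ignores the small contribution of the opposite tail. Exact calculations for a t-test give slightly different values.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power to detect a standardized mean difference.

    Normal approximation for a two-sided test with two equal-sized groups.
    """
    standard_normal = NormalDist()
    critical_value = standard_normal.inv_cdf(1 - alpha / 2)
    # Expected test statistic if the true effect equals effect_size.
    expected_statistic = effect_size * sqrt(n_per_group / 2)
    return standard_normal.cdf(expected_statistic - critical_value)


power_small_sample = approximate_power(0.5, 20)
power_large_sample = approximate_power(0.5, 64)
print(round(power_small_sample, 2), round(power_large_sample, 2))
```

With a medium expected effect, increasing the sample from 20 to 64 participants per group raises the approximate power from well under one half to about 0.8.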
In an evaluation, statistical significance refers to the probability that any differences found between the treatment group and control group are not due to chance but are the result of the treatment group’s participation in the program or intervention being studied. For example, if an outcome evaluation finds that after participating in a substance abuse program, the treatment group was statistically significantly less likely to abuse substances compared with the control group, this means that the difference between the two groups is likely due to the program and not due to chance.
In social science, researchers generally use a p-value threshold of 0.05, which means that, if the program truly had no effect, a difference between the treatment group and control group at least as large as the one observed would be expected to occur by chance less than 5 percent of the time. A p-value of 0.05 is the cut-off point that CrimeSolutions Expert Reviewers use to score whether an outcome is statistically significant. If the p-value is larger than 0.05, the outcome is not statistically significant, and the difference between the treatment and control groups could be due to chance. See Program Review and Rating from Start to Finish for more information.
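One way to see what a p-value measures is an exact permutation test, sketched below with made-up scores. The function name is hypothetical. The test reshuffles the group labels in every possible way and counts how often chance alone produces a difference in means at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treatment, control):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(treatment) + list(control)
    n_treatment = len(treatment)
    observed = abs(mean(treatment) - mean(control))
    extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), n_treatment):
        chosen_set = set(chosen)
        relabeled_t = [pooled[i] for i in chosen_set]
        relabeled_c = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        difference = abs(mean(relabeled_t) - mean(relabeled_c))
        if difference >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical outcome scores.
p_five_per_group = permutation_p_value([12, 14, 15, 16, 18], [5, 6, 7, 8, 9])
p_three_per_group = permutation_p_value([7, 8, 9], [4, 5, 6])
print(p_five_per_group < 0.05, p_three_per_group < 0.05)
```

With five participants per group the p-value falls below the 0.05 cut-off. With only three per group it does not, even though the groups do not overlap, because so few relabelings are possible.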
Subject matter and research methodology experts who review and assess the individual evaluation studies (for programs) or meta-analyses (for practices) that comprise the evidence base upon which CrimeSolutions ratings are based. All Reviewers must complete training and receive certification prior to becoming a Study Reviewer. Read more about CrimeSolutions Researchers and Reviewers.
Analysis that involves dividing the full study sample into subsets of study participants, most often to make comparisons between them. While subgroup analyses can provide valuable information, they are most often observational, or correlational, analyses, as no proper comparison/control groups are included in these analyses. Under the CrimeSolutions review process, only the full study sample is scored (even if the study authors state clearly an “a priori” theoretical rationale for why the program or practice would be expected to work for a given subgroup and not another).
Analyses of subgroups are described in the "Other Information (Including Subgroup Findings)" sections of CrimeSolutions program profiles, but the results do not impact the program’s overall evidence rating. Examples of subgroups that may be reported in program profiles include those categorized by sex (e.g., male versus female); race/ethnicity (e.g., Black, Hispanic); age (e.g., older versus younger participants); setting (e.g., urban versus suburban versus rural); risk status (e.g., high-risk versus low-risk); family structure (e.g., single-parent versus two-parent household); delivery setting (e.g., community versus institutional); dosage (e.g., partial versus full implementation); and offense types (e.g., violent versus nonviolent offenders).
A process by which the research evidence from multiple studies on a particular topic is reviewed and assessed using systematic methods to reduce bias in selection and inclusion of studies. A systematic review is generally viewed as more thorough than a non-systematic literature review, but does not necessarily involve the quantitative statistical techniques of a meta-analysis.
An analytic technique that uses a sequence of data points, measured typically at successive, uniform time intervals, to identify trends and other characteristics of the data. For example, a time series analysis may be used to study a city’s crime rate over time and predict future crime trends.
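As a very simple sketch of identifying a trend and projecting it forward, the code below fits a straight line to equally spaced observations by least squares. The function names and the crime-rate figures are hypothetical. Real time-series analyses typically use models that also handle seasonality and autocorrelation.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values observed at times 0, 1, 2, ..."""
    n = len(values)
    times = range(n)
    time_mean = sum(times) / n
    value_mean = sum(values) / n
    slope = sum(
        (t - time_mean) * (v - value_mean) for t, v in zip(times, values)
    ) / sum((t - time_mean) ** 2 for t in times)
    intercept = value_mean - slope * time_mean
    return slope, intercept


def forecast_next(values):
    """Extend the fitted line one time step past the last observation."""
    slope, intercept = linear_trend(values)
    return intercept + slope * len(values)


# Hypothetical yearly crime rates per 10,000 residents.
crime_rates = [100, 95, 90, 85]
slope, intercept = linear_trend(crime_rates)
next_year = forecast_next(crime_rates)
print(slope, next_year)
```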
The subjects or program participants of the set of services, treatment, or activities being studied or tested.
A statistical measure of how far a set of data points is dispersed from the mean or average for a population or a sample. It is the average squared deviation of outcomes from the mean of outcomes for a group. It is used as a step in determining the effect of an intervention or treatment on a population.
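A short sketch of the calculation, using made-up values and Python's standard `statistics` module. The population variance divides the sum of squared deviations by the number of values, while the sample variance divides by one less than that number.

```python
from statistics import mean, pvariance, variance

# Hypothetical outcome values.
outcomes = [2, 4, 4, 4, 5, 5, 7, 9]

# Manual calculation: average squared deviation from the mean.
center = mean(outcomes)
squared_deviations = [(x - center) ** 2 for x in outcomes]
manual_population_variance = sum(squared_deviations) / len(outcomes)

# Library equivalents for a whole population and for a sample.
population_variance = pvariance(outcomes)
sample_variance = variance(outcomes)

print(manual_population_variance, population_variance, sample_variance)
```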