I am designing a study in which a large number of people receive a treatment. I cannot influence the treatment (e.g., it may be a delayed treatment). I am able to take pre- and post-treatment measures on the subjects. The treatment has many effects on the subjects. I have a screener that I believe will identify the individuals who will improve on measure A.
The proposed experimental design is for two groups to be identified based on a screening test (the screening test can be administered pre- or post-treatment). Both groups will be subjected to the same treatment. Prior to and following treatment, both groups will be tested on measures A and B. The hypothesis is that the treatment will affect the groups differently on measure A but not on measure B. Is there a term for this design? Is this a valid design?
I would call it a pre-post quasi-experiment, albeit where you are assessing the intervention in two different pre-existing groups.
In terms of assessing the effect of the intervention, there are more threats to causal inference when you don't have a control group (i.e., one where you have pre and post measures but no intervention, or a control intervention). In addition to the effect of the treatment, there are many other explanations for any changes observed pre-post: for example, learning, maturation, fatigue, and so on. Background knowledge may help you appraise which, if any, of these are likely to be significant.
You then have two observed groups. This aspect is more like an observational study. You are assessing the interaction between group and the treatment. Because group is an observational variable, you would want to take care in attributing causal explanations to any differences between groups in treatment effect to group membership. Nonetheless, often in this context, the interest is more about understanding how well a treatment generalises to different populations.
So in summary, it would be better if there were a control group. Ideally, you would have a fully crossed design, i.e., group (A and B) by condition (treatment and control), where participants were randomly allocated to treatment or control. But if that's not possible, the data would still be interesting; you'd need to think carefully about carry-over effects.
What is a Counterbalanced Measures Design?
The simplest type of counterbalanced measures design is used when there are two possible conditions, A and B. As with the standard repeated measures design, the researchers want to test every subject under both conditions. They divide the subjects into two groups: one group receives condition A followed by condition B, and the other receives condition B followed by condition A.
If you have three conditions, the process is the same: you would divide the subjects into 6 groups, treated in the orders ABC, ACB, BAC, BCA, CAB and CBA.
The problem with complete counterbalancing is that for complex experiments with multiple conditions, the permutations quickly multiply and the research project becomes extremely unwieldy. For example, four conditions require 24 orders of treatment (4x3x2x1 = 24), and the number of participants must be a multiple of 24 because you need an equal number in each group.
More Than Four Conditions
With 5 conditions you need multiples of 120 (5x4x3x2x1), with 7 you need 5040! Therefore, for all but the largest research projects with huge budgets, this is impractical and a compromise is needed.
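To make the growth concrete, the full set of orders for a given number of conditions can be enumerated in a few lines of Python (the function name here is our own, for illustration):

```python
import itertools

def counterbalanced_orders(conditions):
    """Return every possible presentation order of the given conditions.

    Complete counterbalancing needs one group per permutation, so the
    number of groups is n! for n conditions.
    """
    return ["".join(p) for p in itertools.permutations(conditions)]

orders = counterbalanced_orders("ABC")   # the 6 orders listed above
print(orders)
print(len(counterbalanced_orders("ABCD")))  # 24 orders for 4 conditions
```

Running this for five, six, or seven conditions (120, 720, and 5040 orders) shows why a compromise such as a Latin square is usually preferred.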
Experimental Group Definition
An experimental group in a scientific experiment is the group on which the experimental procedure is performed. The independent variable is changed for the group and the response or change in the dependent variable is recorded. In contrast, the group that does not receive the treatment or in which the independent variable is held constant is called the control group.
The purpose of having experimental and control groups is to have sufficient data to be reasonably sure the relationship between the independent and dependent variable is not due to chance. If you perform an experiment on only one subject (with and without treatment) or on one experimental subject and one control subject you have limited confidence in the outcome. The larger the sample size, the more probable the results represent a real correlation.
Matched pairs analysis/matched-pair t-test
The matched-pair t test is used to test whether there is a difference in means between two matched/related samples. The matched-pair t test is also referred to as the paired-samples t-test or the dependent t-test. The test is a parametric test whose assumptions are:
- The dependent variable must be continuous
- Pairs of observations should be independent of each other
- The differences between pairs should be approximately normally distributed
- There should be no outliers in the differences
The matched-pair t test statistic is calculated as t = d̄ / (s_d / √n), where d̄ is the mean of the pairwise differences, s_d is the standard deviation of those differences, and n is the number of pairs.
The calculated value is compared to the tabulated value with n-1 degrees of freedom.
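As a sketch of the arithmetic (the pre/post scores below are made up for illustration), the paired t statistic, i.e. the mean difference divided by the standard error of the differences, can be computed in plain Python:

```python
import math

def paired_t(pre, post):
    """Matched-pair t statistic: mean(d) / (sd(d) / sqrt(n)),
    where d are the pairwise differences post - pre."""
    assert len(pre) == len(post)
    n = len(pre)
    d = [b - a for a, b in zip(pre, post)]
    mean_d = sum(d) / n
    # sample standard deviation of the differences (n - 1 denominator)
    sd = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    return mean_d / (sd / math.sqrt(n))

# hypothetical pre/post scores for five subjects
pre  = [10, 12, 9, 11, 13]
post = [12, 14, 10, 13, 14]
t = paired_t(pre, post)  # compare against the t table with n - 1 = 4 df
```

For these invented scores t is about 6.53, which would be compared to the tabulated critical value with 4 degrees of freedom.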
Repeated Measures ANOVA Example
Let's imagine that we used a repeated measures design to study our hypothetical memory drug. For our study, we recruited five people, and we tested four memory drugs. Everyone in the study tried all four drugs and took a memory test after each one. We obtain the data below. You can also download the CSV file Repeated_measures_data.
In the dataset, you can see that each subject has an ID number so we can associate each person with all of their scores. We also know which drug they took for each score. Together, this allows the model to develop a baseline for each subject and then compare the drug specific scores to that baseline.
How do we fit this model? In your preferred statistical software package, you need to fit an ANOVA model like this:
- Score is the response variable.
- Subject and Drug are the factors.
- Subject should be a random factor.
Subject is a random factor because we randomly selected the subjects from the population and we want them to represent the entire population. If we were to include Subject as a fixed factor, the results would apply only to these five people and would not be generalizable to the larger population.
Drug is a fixed factor because we picked these drugs intentionally and we want to estimate the effects of these four drugs particularly.
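As a rough illustration of what the software computes under the hood, here is a hand-rolled sketch of the one-way repeated measures F statistic in plain Python, using invented scores. In practice you would fit the model in your statistical package (for example, a routine such as statsmodels' AnovaRM in Python) rather than computing the sums of squares by hand:

```python
def rm_anova_f(scores):
    """One-way repeated measures ANOVA F statistic.

    scores[i][j] = score of subject i under condition j (here, Drug).
    Subject acts as a random blocking factor: subject-to-subject
    variation is removed before testing the condition effect.
    """
    n = len(scores)          # subjects
    k = len(scores[0])       # conditions
    grand = sum(sum(row) for row in scores) / (n * k)

    # variation between subjects (each subject's baseline)
    ss_subjects = k * sum((sum(row) / k - grand) ** 2 for row in scores)
    # variation between conditions (the drug effect)
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_conditions = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_conditions

    df_cond, df_err = k - 1, (n - 1) * (k - 1)
    return (ss_conditions / df_cond) / (ss_error / df_err)

# invented memory scores: 5 subjects x 4 drugs
data = [
    [72, 78, 70, 75],
    [65, 71, 64, 68],
    [80, 85, 78, 83],
    [58, 66, 57, 62],
    [74, 80, 73, 77],
]
f = rm_anova_f(data)  # compare to F with (3, 12) degrees of freedom
```

Removing the subject baseline first is exactly why the repeated measures design is more powerful than a between-subjects ANOVA on the same scores.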
Single-subject experimental designs – also referred to as within-subject or single case experimental designs – are among the most prevalent designs used in CSD treatment research. These designs provide a framework for a quantitative, scientifically rigorous approach where each participant provides his or her own experimental control.
An Overview of Single-Subject Experimental Design
What is Single-Subject Design?
Transcript of the video Q&A with Julie Wambaugh.
The essence of single-subject design is using repeated measurements to really understand an individual’s variability, so that we can use our understanding of that variability to determine what the effects of our treatment are.
For me, one of the first steps in developing a treatment is understanding what an individual does. So, if I were doing a group treatment study, I would not necessarily be able to see or to understand what was happening with each individual patient, so that I could make modifications to my treatment and understand all the details of what’s happening in terms of the effects of my treatment. For me it’s a natural first step in the progression of developing a treatment.
Also with the disorders that we deal with, it’s very hard to get the number of participants that we would need for the gold standard randomized controlled trial. Using single-subject designs works around the possible limiting factor of not having enough subjects in a particular area of study.
My mentor was Dr. Cynthia Thompson, who was trained by Leija McReynolds from the University of Kansas, which was where a lot of single-subject design in our field originated, and so I was fortunate to be on the cutting edge of this being implemented in our science back in the late 1970s and early 1980s. We saw, I think, a nice revolution in terms of attention to these types of designs, giving credit to the type of data that could be obtained from them, and a flourishing of these designs through the 1980s into the 1990s and into the 2000s. But, talking with other single-subject design investigators, I think now we're seeing maybe a little bit of a lapse of attention, and a lack of training again among our young folks. Maybe people assume that people understand the foundation, but they really don't. And more problems are occurring with the science. I think we need to re-establish the foundations in our young scientists. And this project, I think, will be a big plus toward moving us in that direction.
What is the Role of Single-Subject Design?
Transcript of the video Q&A with Ralf Schlosser.
So what has happened recently, with the onset of evidence-based practice, is the adoption of a common hierarchy of evidence in terms of designs. As you noted, randomized controlled trials and meta-analyses of randomized controlled trials are at the top of common hierarchies. And that's fine. But it doesn't mean that single-subject designs cannot play a role.
For example, single-subject design can be implemented prior to implementing a randomized controlled trial to get a better handle on the magnitude of the effects, the workings of the active ingredients, and all of that. It is very good to prepare that prior to developing a randomized controlled trial.
After you have implemented the randomized controlled trial, and then you want to implement the intervention in a more naturalistic setting, it becomes very difficult to do that in a randomized form or at the group level. So again, single-subject design lends itself to more practice-oriented implementation.
So I see it as a crucial methodology among several. What we can do to promote what single-subject design is good for is to speak up. It is important that it is being recognized for what it can do and what it cannot do.
Basic Features and Components of Single-Subject Experimental Designs
Single-subject designs are defined by the following features:
- An individual “case” is the unit of intervention and unit of data analysis.
- The case provides its own control for purposes of comparison. For example, the case’s series of outcome variables are measured prior to the intervention and compared with measurements taken during (and after) the intervention.
- The outcome variable is measured repeatedly within and across different conditions or levels of the independent variable.
See Kratochwill, et al. (2010)
Structure and Phases of the Design
Single-subject designs are typically described according to the arrangement of baseline and treatment phases.
The conditions in a single-subject experimental study are often assigned letters such as the A phase and the B phase, with A being the baseline, or no-treatment phase, and B the experimental, or treatment phase. (Other letters are sometimes used to designate other experimental phases.)
Generally, the A phase serves as a time period in which the behavior or behaviors of interest are counted or scored prior to introducing treatment.
In the B phase, the same behavior of the individual is counted over time under experimental conditions while treatment is administered.
Decisions regarding the effect of treatment are then made by comparing an individual's performance during the treatment (B) phase and the no-treatment (A) phase.
McReynolds and Thompson (1986)
Important primary components of a single-subject study include the following:
- The participant is the unit of analysis, where a participant may be an individual or a unit such as a class or school.
- Participant and setting descriptions are provided with sufficient detail to allow another researcher to recruit similar participants in similar settings.
- Dependent variables are (a) operationally defined and (b) measured repeatedly.
- An independent variable is actively manipulated, with the fidelity of implementation documented.
- A baseline condition demonstrates a predictable pattern which can be compared with the intervention condition(s).
- Experimental control is achieved through introduction and withdrawal/reversal, staggered introduction, or iterative manipulation of the independent variable.
- Visual analysis is used to interpret the level, trend, and variability of the data within and across phases.
- External validity of results is accomplished through replication of the effects.
- Social validity is established by documenting that interventions are functionally related to change in socially important outcomes.
Single-Subject Experimental Designs versus Case Studies
Transcript of the video Q&A with Julie Wambaugh.
One of the biggest mistakes, and it is a huge problem, is failing to understand that a case study is not a single-subject experimental design. There are controls that need to be implemented, and a case study does not equate to a single-subject experimental design.
People misunderstand or misinterpret the term "multiple baseline" to mean that because you are measuring multiple things, that gives you the experimental control. You have to demonstrate, instead, that you've measured multiple behaviors and that you've replicated your treatment effect across those multiple behaviors. So, one instance of one treatment being implemented with one behavior is not sufficient, even if you've measured other things. That's a very common mistake that I see.
There's a design, an ABA design, that's a very strong experimental design: you measure the behavior, you implement treatment, and then, to get experimental control, you need to see that behavior go back down to baseline. It's a hard design to implement in our field because we want our behaviors to stay up! We don't want to see them return back to baseline.
Oftentimes people will say they did an ABA. But really, in effect, all they did was an AB. They measured, they implemented treatment, and the behavior changed because the treatment was successful. That does not give you experimental control. They think they did an experimentally sound design, but because the behavior didn’t do what the design requires to get experimental control, they really don’t have experimental control with their design.
Single-subject studies should not be confused with case studies or other non-experimental designs.
In case study reports, procedures used in treatment of a particular client’s behavior are documented as carefully as possible, and the client’s progress toward habilitation or rehabilitation is reported. These investigations provide useful descriptions. . . .However, a demonstration of treatment effectiveness requires an experimental study.
A better role for case studies is description and identification of potential variables to be evaluated in experimental studies. An excellent discussion of this issue can be found in the exchange of letters to the editor by Hoodin (1986) [Article] and Rubow and Swift (1986) [Article].
McReynolds and Thompson (1986)
Other Single-Subject Myths
- Obviously, it requires only one subject, one participant. But it's a misnomer to think that single-subject means just one participant. You can have as many as twenty or thirty.
- I think a lot of students in the clinic are used to the measurement of one pre-test and one post-test because of the way the goals are written, and maybe there's not enough time to collect continuous data. But single-case experimental designs require ongoing data collection. There's a misperception that one baseline data point is enough. But for a single-case experimental design you want to see at least three data points, because that allows you to see a trend in the data. So there's a myth about the number of data points needed. The more data points we have, the better.
- Single-subject design has its own tradition of methodology. It seems very easy to do when you read up on one design. But there are lots of things to consider, and lots of things can go wrong. It requires quite a bit of training. It takes at least one three-credit course that you take over the whole semester.
Further Reading: Components of Single-Subject Designs
Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single subject research to identify evidence-based practice in special education. Exceptional Children, 71, 165–179. [Article]
Kratochwill, T. R., Hitchcock, J., Horner, R. H., Levin, J. R., Odom, S. L., Rindskopf, D. M. & Shadish, W. R. (2010). Single-case designs technical documentation. From the What Works Clearinghouse. http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=229
McReynolds, L. V. & Thompson, C. K. (1986). Flexibility of single-subject experimental designs. Part I: review of the basics of single-subject designs. Journal of Speech and Hearing Disorders, 51, 194-203. [Article] [PubMed]
Further Reading: Single-Subject Design Textbooks
Kazdin, A. E. (2011). Single-case research designs: Methods for clinical and applied settings. Oxford University Press.
McReynolds, L. V. & Kearns, K. (1983). Single-subject experimental designs in communicative disorders. Baltimore: University Park Press.
Further Reading: Foundational Articles
Baer, D. M., Wolf, M. M. & Risley, T. R. (1968). Some current dimensions of applied behavior analysis. Journal of applied behavior analysis, 1, 91-97. [Article] [PubMed]
Baer, D. M., Wolf, M. M. & Risley, T. R. (1987). Some still-current dimensions of applied behavior analysis. Journal of Applied Behavior Analysis, 20, 313-327. [Article] [PubMed]
University of Utah
The content of this page is based on selected clips from video interviews conducted at the ASHA National Office.
Additional digested resources and references for further reading were selected and implemented by CREd Library staff.
Selecting and Assigning Experimental Participants
Now that our study is designed, we need to obtain a sample of individuals to include in our experiment. Our study involves human participants so we need to determine who to include. Participants are the subjects of psychological research, and as the name implies, individuals who are involved in psychological research actively participate in the process. Often, psychological research projects rely on college students to serve as participants. In fact, the vast majority of research in psychology subfields has historically involved students as research participants (Sears, 1986; Arnett, 2008). But are college students truly representative of the general population? College students tend to be younger, more educated, more liberal, and less diverse than the general population. Although using students as test subjects is an accepted practice, relying on such a limited pool of research participants can be problematic because it is difficult to generalize findings to the larger population.
Our hypothetical experiment involves children, and we must first generate a sample of child participants. Samples are used because populations are usually too large to reasonably involve every member in our particular experiment (Figure 4). If possible, we should use a random sample (there are other types of samples, but for the purposes of this section, we will focus on random samples). A random sample is a subset of a larger population in which every member of the population has an equal chance of being selected. Random samples are preferred because if the sample is large enough we can be reasonably sure that the participating individuals are representative of the larger population. This means that the percentages of characteristics in the sample—sex, ethnicity, socioeconomic level, and any other characteristics that might affect the results—are close to those percentages in the larger population.
In our example, let's say we decide our population of interest is fourth graders. But all fourth graders is a very large population, so we need to be more specific; instead, we might say our population of interest is all fourth graders in a particular city. We should include students from various income brackets, family situations, races, ethnicities, religions, and geographic areas of town. With this more manageable population, we can work with the local schools in selecting a random sample of around 200 fourth graders who we want to participate in our experiment.
In summary, because we cannot test all of the fourth graders in a city, we want to find a group of about 200 that reflects the composition of that city. With a representative group, we can generalize our findings to the larger population without fear of our sample being biased in some way.
Figure 4. Researchers may work with (a) a large population or (b) a sample group that is a subset of the larger population. (credit "crowd": modification of work by James Cridland; credit "students": modification of work by Laurie Sullivan)
Now that we have a sample, the next step of the experimental process is to split the participants into experimental and control groups through random assignment. With random assignment, all participants have an equal chance of being assigned to either group. There is statistical software that will randomly assign each of the fourth graders in the sample to either the experimental or the control group.
Random assignment is critical for sound experimental design. With sufficiently large samples, random assignment makes it unlikely that there are systematic differences between the groups. So, for instance, it would be very unlikely that we would get one group composed entirely of males, a given ethnic identity, or a given religious ideology. This is important because if the groups were systematically different before the experiment began, we would not know the origin of any differences we find between the groups: Were the differences preexisting, or were they caused by manipulation of the independent variable? Random assignment allows us to assume that any differences observed between experimental and control groups result from the manipulation of the independent variable.
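Random assignment is simple to carry out in software. A minimal sketch in Python (the function name and the fixed seed are our own, chosen so the example is reproducible):

```python
import random

def randomly_assign(participant_ids, seed=None):
    """Split a sample into experimental and control groups by random
    assignment, giving every participant an equal chance of either group."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)            # random order, independent of any trait
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # experimental, control

# our sample of 200 fourth graders, identified by number
experimental, control = randomly_assign(range(200), seed=42)
```

Because the shuffle ignores every participant characteristic, any pre-existing difference between the two groups can only arise by chance.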
The Difference Between Control Group and Experimental Group
In an experiment, data from an experimental group is compared with data from a control group. These two groups should be identical in every respect except one: the difference between a control group and an experimental group is that the independent variable is changed for the experimental group, but is held constant in the control group.
Key Takeaways: Control vs. Experimental Group
- The control group and experimental group are compared against each other in an experiment. The only difference between the two groups is that the independent variable is changed in the experimental group. The independent variable is "controlled" or held constant in the control group.
- A single experiment may include multiple experimental groups, which may all be compared against the control group.
- The purpose of having a control is to rule out other factors which may influence the results of an experiment. Not all experiments include a control group, but those that do are called "controlled experiments."
- A placebo may also be used in an experiment. A placebo isn't a substitute for a control group because subjects exposed to a placebo may experience effects from the belief they are being tested.
Intervention studies are considered to provide the most reliable evidence in epidemiological research. Intervention studies can generally be considered as either preventative or therapeutic.
Therapeutic trials are conducted among individuals with a particular disease to assess the effectiveness of an agent or procedure to diminish symptoms, prevent recurrence, or reduce mortality from the disease.
Preventative trials are conducted to evaluate whether an agent or procedure reduces the risk of developing a particular disease among individuals free from that disease at the beginning of the trial; for example, vaccine trials. Preventative trials may be conducted among individuals or among entire communities.
Types of experimental interventions may include:
- Therapeutic agents
- Prophylactic agents
- Diagnostic agents
- Surgical procedures
- Health service strategies
Characteristics of an intervention study
- A distinguishing characteristic of an intervention study is that the intervention (the preventative or therapeutic measure) being tested is allocated by the investigator to a group of two or more study subjects (individuals, households, communities).
- Subjects are followed prospectively to compare the intervention vs. the control (standard treatment, no treatment or placebo).
The main intervention study design is the randomised controlled trial (RCT).
A pre-post clinical trial (cross-over trial) is one in which the subjects are first assigned to the treatment group and, after a brief interval for cessation of any residual effect of the drug, are shifted into the placebo/alternative group. Thus, the subjects act as their own controls by the end of the study. However, such studies are not feasible if there is mortality, or if the disease is easily cured by one of the interventions.
Randomised controlled trials
The randomised controlled trial is considered as the most rigorous method of determining whether a cause-effect relationship exists between an intervention and outcome . The strength of the RCT lies in the process of randomisation that is unique to this type of epidemiological study design.
Generally, in a randomised controlled trial, study participants are randomly assigned to one of two groups: the experimental group receiving the intervention that is being tested and a comparison group (controls) which receives a conventional treatment or placebo. These groups are then followed prospectively to assess the effectiveness of the intervention compared with the standard or placebo treatment.
The random allocation of subjects is used to ensure that the intervention and control groups are similar in all respects (distribution of potential confounding factors) with the exception of the therapeutic or preventative measure being tested. The choice of comparison treatments may include an existing standard treatment or a placebo (a treatment which resembles the intervention treatment in all respects except that it contains no active ingredients).
Note that ethical constraints limit the choice of comparison treatments.
Figure 1. General outline of a two armed randomised controlled trial.
Basic outline of the design of a randomised controlled trial
1. Development of a comprehensive study protocol. The study protocol will include:
- Aim and rationale of the trial
- Proposed methodology/data collection
- Definition of the hypothesis
- Ethical considerations
- Background/review of published literature
- Quality assurance and safety
- Treatment schedules, dosage, toxicity data etc.
2. Formulation of hypothesis.
3. Objectives of the trial.
4. Sample size calculations.
5. Define reference population.
6. Choice of a comparison treatment - placebo or current available best treatment.
7. Selection of intervention and control groups, including source, inclusion and exclusion criteria, and methods of recruitment.
8. Informed consent procedures.
9. Collection of baseline measurements, including all variables considered or known to affect the outcome(s) of interest.
10. Random allocation of study participants to treatment groups (standard or placebo vs. new).
11. Follow-up of all treatment groups, with assessment of outcomes continuously or intermittently.
12. Monitor compliance and losses to follow-up.
13. Analysis - comparison of treatment groups.
14. Interpretation (assess the strength of effect, alternative explanations such as sampling variation, bias).
The aim of randomisation is to ensure that any observed differences between the treatment groups are due to differences in the treatment alone and not due to the effects of confounding (known or unknown) or bias. That is, that the groups are similar in all respects with the exception of the intervention under investigation.
Methods of random allocation are used to ensure that all study participants have the same chance of allocation to the treatment or control group, and that the likelihood of receiving an intervention is equal regardless of when the participant entered the study. Therefore, the probability of any participant receiving the intervention or the standard treatment/placebo is independent of any other participant being assigned that treatment.
The assignment of study subjects to each intervention is determined by formal chance process and cannot be predicted or influenced by the investigator or participant. In a well designed RCT, random allocation is determined in advance.
Methods of randomisation - allocation of subjects to intervention and control groups
1. Simple randomisation
For example, computer-generated random number tables. Simple randomisation is rarely used.
2. Block randomisation
Block randomisation is a method used to ensure that the number of participants assigned to each group stays equal as recruitment proceeds; it is commonly used in smaller trials.
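A sketch of block randomisation in Python, assuming a block size of 4 and arm labels "T" (treatment) and "C" (control) of our own choosing:

```python
import random

def block_randomise(n_participants, block_size=4, seed=None):
    """Allocate participants to 'T' or 'C' in balanced blocks.

    Each block contains an equal number of T and C in random order,
    so the two arms stay balanced throughout recruitment."""
    assert block_size % 2 == 0, "block size must be even"
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_participants:
        block = ["T"] * (block_size // 2) + ["C"] * (block_size // 2)
        rng.shuffle(block)           # random order within the block
        allocation.extend(block)
    return allocation[:n_participants]

alloc = block_randomise(20, block_size=4, seed=1)
```

After any complete block, the two arms differ by at most half a block, which is the property that makes this attractive for small trials.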
3. Stratified randomisation
Stratified randomisation is used to ensure that important baseline variables (potential confounding factors) are more evenly distributed between groups than chance alone may assure. However, there are a limited number of baseline variables that can be balanced by stratification because of the potential for small numbers of subjects within each stratum.
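A sketch of stratified randomisation in Python (the participant data and names below are invented; within each stratum, allocation is balanced between the two arms):

```python
import random

def stratified_randomise(participants, seed=None):
    """participants: list of (id, stratum) pairs, e.g. ("p1", "female").

    Participants are grouped by stratum, shuffled within each stratum,
    and split evenly between treatment ("T") and control ("C"), so each
    stratum contributes equally to both arms."""
    rng = random.Random(seed)
    by_stratum = {}
    for pid, stratum in participants:
        by_stratum.setdefault(stratum, []).append(pid)
    assignment = {}
    for ids in by_stratum.values():
        rng.shuffle(ids)
        half = len(ids) // 2
        for pid in ids[:half]:
            assignment[pid] = "T"
        for pid in ids[half:]:
            assignment[pid] = "C"
    return assignment

# invented sample: 8 female and 8 male participants
sample = [(f"p{i}", "female" if i < 8 else "male") for i in range(16)]
groups = stratified_randomise(sample, seed=7)
```

This guards against the chance imbalance that simple randomisation can produce on the stratifying variable, at the cost of small per-stratum sample sizes if too many variables are stratified.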
4. Minimisation
This method may be used when the study is small and simple randomisation is unlikely to result in balanced groups.
Note that deterministic methods of allocation such as by date of birth or alternate assignment to each group are not considered as random.
Advantages of randomisation
- Eliminates confounding - tends to create groups that are comparable for all factors that influence outcome, known, unknown or difficult to measure. Therefore, the only difference between the groups should be the intervention.
- Eliminates selection bias.
- Gives validity in statistical tests based on probability theory.
- Any baseline differences that exist between study groups are attributable to chance rather than bias, though such differences should still be considered a potential concern.
Disadvantages of randomisation
- Does not guarantee comparable groups, as differences in confounding variables may still arise by chance.
Blinding in randomised controlled trials
Blinding is a process whereby information on the allocation of treatment is hidden from the patients, the observers, or the evaluators in the study. Blinding in an RCT is used to ensure that there are no differences in the way each group is assessed or managed, and therefore to minimize bias. Bias may be introduced, for example, if the investigator is aware of which treatment a subject is receiving, as this may influence (intentionally or unintentionally) the way in which outcome data are measured or interpreted. Similarly, a subject's knowledge of treatment assignment may influence their response to a specific treatment.
Blinding also involves ensuring that the intervention and standard or placebo treatment appears the same.
Double blinding is when neither the investigator nor the study participant is aware of treatment assignments. However, this design is not always possible.
A single blind RCT is one in which the investigator, but not the study participants, knows which treatment has been allocated.
Strengths of a randomised controlled trial
- A well-designed randomised controlled trial provides the strongest evidence of any epidemiological study design that a given intervention has its postulated effectiveness and is safe.
- An RCT provides the best type of epidemiological study from which to draw conclusions about causality.
- Randomisation provides a powerful tool for controlling confounding, even by factors that may be unknown or difficult to measure. Therefore, if well designed and conducted, an RCT minimises the possibility that any observed association is due to confounding.
- Clear temporal sequence - exposure clearly precedes outcome.
- Provides a strong basis for statistical inference.
- Enables blinding and therefore minimises bias.
- Can measure disease incidence and multiple outcomes.
Weaknesses of a randomised controlled trial
- Ethical constraints - for example, it is not always possible or ethical to manipulate exposure at random.
- Expensive and time consuming.
- Requires complex design and analysis if unit of allocation is not the individual.
- Inefficient for rare diseases or diseases with a delayed outcome.
- Generalisability - subjects in a RCT may be more willing to comply with the treatment regimen and therefore may not be representative of all individuals who might be given the treatment.
1. Hennekens CH, Buring JE. Epidemiology in Medicine. Lippincott Williams & Wilkins; 1987.
2. Kendall JM. Designing a research project: randomised controlled trials and their principles. Emerg Med J 2003;20(2):164-8.
3. Hollis S, Campbell F. What is meant by intention to treat analysis? Survey of published randomised controlled trials. BMJ 1999;319:670-4.
4. Pocock SJ. Clinical Trials: A Practical Approach. Chichester: Wiley; 1984.
5. Sibbald B, Roland M. Understanding controlled trials: why are randomised controlled trials important? BMJ 1998;316:201.
6. Altman DG. Randomisation. BMJ 1991;302:1481-2.
Pretest-Posttest Nonequivalent Groups Design
Another way to improve upon the posttest only nonequivalent groups design is to add a pretest. In the pretest-posttest nonequivalent groups design there is a treatment group that is given a pretest, receives a treatment, and then is given a posttest. But at the same time there is a nonequivalent control group that is given a pretest, does not receive the treatment, and then is given a posttest. The question, then, is not simply whether participants who receive the treatment improve, but whether they improve more than participants who do not receive the treatment.
Imagine, for example, that students in one school are given a pretest on their attitudes toward drugs, then are exposed to an anti-drug program, and finally, are given a posttest. Students in a similar school are given the pretest, not exposed to an anti-drug program, and finally, are given a posttest. Again, if students in the treatment condition become more negative toward drugs, this change in attitude could be an effect of the treatment, but it could also be a matter of history or maturation. If it really is an effect of the treatment, then students in the treatment condition should become more negative than students in the control condition. But if it is a matter of history (e.g., news of a celebrity drug overdose) or maturation (e.g., improved reasoning), then students in the two conditions would be likely to show similar amounts of change. This type of design does not completely eliminate the possibility of confounding variables, however. Something could occur at one of the schools but not the other (e.g., a student drug overdose), so students at the first school would be affected by it while students at the other school would not.
Returning to the example of evaluating a new method of teaching fractions to third graders, this study could be improved by adding a pretest of students’ knowledge of fractions. The changes in scores from pretest to posttest would then be evaluated and compared across conditions to determine whether one group demonstrated a bigger improvement in knowledge of fractions than another. Of course, the teachers’ styles, and even the classroom environments, might still be very different and might cause different levels of achievement or motivation among the students that are independent of the teaching intervention. Once again, differential history also represents a potential threat to internal validity. If asbestos is found in one of the schools, causing it to be shut down for a month, this interruption in teaching could produce a difference across groups on posttest scores.
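As a rough illustration of how data from this design are analyzed, the standard-library Python sketch below compares pretest-to-posttest gains across the two nonequivalent classrooms; all scores are hypothetical.

```python
from statistics import mean

# Hypothetical fraction-test scores (0-100) for two nonequivalent classrooms.
treatment = {"pre": [52, 60, 48, 55, 63, 58], "post": [68, 74, 61, 70, 77, 72]}
control   = {"pre": [54, 58, 50, 57, 61, 56], "post": [58, 63, 53, 61, 66, 60]}

def mean_gain(group):
    """Average pretest-to-posttest change for one group."""
    return mean(post - pre for pre, post in zip(group["pre"], group["post"]))

gain_t, gain_c = mean_gain(treatment), mean_gain(control)

# The design's key comparison: did the treatment group improve *more*?
print(f"Treatment gain: {gain_t:.1f}, control gain: {gain_c:.1f}")
print(f"Difference in gains (treatment - control): {gain_t - gain_c:.1f}")
```

In practice the difference in gains would also be tested for statistical significance (e.g., with a t-test on the change scores or an ANCOVA on posttest scores controlling for pretest), but the logic of the design is captured by this comparison.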
If participants in this kind of design are randomly assigned to conditions, it becomes a true between-groups experiment rather than a quasi-experiment. In fact, it is the kind of experiment that Eysenck called for—and that has now been conducted many times—to demonstrate the effectiveness of psychotherapy.
Evaluation Approaches: Observed Variables vs. Latent Variables
Broadly speaking, approaches to intervention evaluation can be divided into two categories: (1) approaches using observed variables and (2) approaches using latent variables. The first category includes widely used parametric tests such as Student's t-test, repeated measures analysis of variance (RM-ANOVA), analysis of covariance (ANCOVA), and ordinary least-squares regression (see Tabachnick and Fidell, 2013). Despite their broad use, however, observed variable approaches suffer from several limitations, many of them generated by the strong underlying statistical assumptions that must be satisfied. A first set of assumptions underlying classic parametric tests is that the data being analyzed are normally distributed and have equal population variances (the homogeneity of variance, or homoscedasticity, assumption). The normality assumption is not always met in real data, especially when the variables targeted by the treatment program are infrequent behaviors (e.g., externalizing conduct) or clinical syndromes (Micceri, 1989). Likewise, the homoscedasticity assumption is rarely met in randomized controlled trials, because the experimental variable itself can cause differences in variability between groups (Grissom and Kim, 2012). Violations of the normality and homoscedasticity assumptions can compromise the results of classic parametric tests, in particular the rates of Type-I (Tabachnick and Fidell, 2013) and Type-II error (Wilcox, 1998). Furthermore, the inability to deal with measurement error can lower the accuracy of inferences based on regression and ANOVA-family techniques, which assume that the variables are measured without error. Yet some degree of measurement error is common in psychological research, where the focus is often on constructs that are not directly observable, such as depression, self-esteem, or intelligence.
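The point about non-normal outcomes is easy to see with rare-event data. The standard-library Python sketch below simulates hypothetical weekly counts of an infrequent behavior (Poisson with mean 0.8, drawn via Knuth's algorithm) and computes their skewness, which is far from the value of zero implied by normality.

```python
import math
import random
from statistics import mean, stdev

rng = random.Random(7)

def sample_poisson(lam, rng):
    """One Poisson draw via Knuth's algorithm (stdlib only)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

# Hypothetical weekly counts of an infrequent externalizing behavior.
counts = [sample_poisson(0.8, rng) for _ in range(5_000)]

def skewness(xs):
    """Sample skewness; 0 for a perfectly symmetric (e.g., normal) sample."""
    m, s = mean(xs), stdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

# A Poisson(0.8) variable has theoretical skewness 1/sqrt(0.8), about 1.12.
print(f"Sample mean: {mean(counts):.2f}, sample skewness: {skewness(counts):.2f}")
```

Applying a test that assumes normality to data like these is what puts the nominal Type-I and Type-II error rates at risk.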
Finally, observed variable approaches assume (without testing it) that the measurement structure of the construct under investigation is invariant across groups and/or time (Meredith and Teresi, 2006; Millsap, 2011). Thus, unsatisfied statistical assumptions and/or uncontrolled unreliability can lead to under- or overestimation of the true relations among the constructs analyzed (for a detailed discussion of these issues, see Cole and Preacher, 2014).
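The attenuating effect of measurement error on observed relations can be illustrated with a small simulation (standard-library Python; all values hypothetical). Two constructs correlate at 0.60 at the latent level, but with a reliability of 0.70 for each measure the observed correlation shrinks toward 0.60 × 0.70 = 0.42, as the classical attenuation formula predicts.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)
n = 20_000

# Latent "true" scores for two constructs correlated at rho = 0.6.
rho = 0.6
true_x = [rng.gauss(0, 1) for _ in range(n)]
true_y = [rho * x + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for x in true_x]

# Observed scores = true score + error, scaled so each measure's
# reliability (true variance / observed variance) is about 0.70.
err_sd = (0.3 / 0.7) ** 0.5
obs_x = [x + rng.gauss(0, err_sd) for x in true_x]
obs_y = [y + rng.gauss(0, err_sd) for y in true_y]

def corr(a, b):
    """Pearson correlation, stdlib only."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / (stdev(a) * stdev(b))

# Classical attenuation: r_observed ~= rho * sqrt(rel_x * rel_y) = 0.6 * 0.7 = 0.42
print(f"Latent correlation:   {corr(true_x, true_y):.2f}")
print(f"Observed correlation: {corr(obs_x, obs_y):.2f}")
```

An analysis that treats the observed scores as error-free will therefore systematically underestimate the latent relation; this is one of the biases that latent variable methods are designed to address.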
Latent variable approaches, on the other hand, refer to the class of techniques grouped under the label of structural equation modeling (SEM; Bollen, 1989), such as confirmatory factor analysis (CFA; Brown, 2015) and mean and covariance structures analysis (MACS; Little, 1997). Although a complete overview of the benefits of SEM is beyond the scope of the present work (for a thorough discussion, see Little, 2013; Kline, 2016), it is worth mentioning here those advantages that directly relate to the evaluation of intervention programs. First, SEM can easily accommodate non-normality in the data: several estimation methods with standard errors robust to non-normal data are available and easy to use in many popular statistical programs (e.g., MLM, MLR, and WLSMV in Mplus; Muthén and Muthén, 1998). Second, SEM explicitly accounts for measurement error by separating the common variance among the indicators of a given construct (i.e., the latent variable) from their residual variances (which include both measurement error and unique sources of variability). Third, if multiple items from a scale are used to assess a construct, SEM allows the researcher to evaluate to what extent the measurement structure of the scale (i.e., factor loadings, item intercepts, residual variances, etc.) is equivalent across groups (e.g., intervention group vs. control group) and/or over time (i.e., pretest and posttest); this issue is known as measurement invariance (MI) and, despite its crucial importance for properly interpreting psychological findings, is rarely tested in psychological research (for an overview, see Millsap, 2011; Brown, 2015). Finally, competing SEMs can be evaluated and compared according to their goodness of fit (Kline, 2016): many SEM programs print a series of fit indexes that help the researcher assess whether the hypothesized model is consistent with the data.
In sum, when multiple indicators of the constructs of interest are available (e.g., multiple items from one scale, different informants, multiple methods, etc.), latent variable approaches offer many advantages and should therefore be preferred over observed variable approaches (Little et al., 2009). Moreover, when a construct is measured with a single psychometric measure, there are still ways to incorporate individuals' scores in the analyses as latent variables and thus reduce the impact of measurement unreliability (Cole and Preacher, 2014).
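For the single-measure case, a back-of-the-envelope analogue of the same idea (not the SEM-based approach the cited authors discuss) is Spearman's classical disattenuation formula, which divides the observed correlation by the square root of the product of the two measures' reliabilities. The numbers below are hypothetical.

```python
def disattenuate(r_obs, rel_x, rel_y):
    """Spearman's correction: estimate the latent correlation from an
    observed one, given the reliability of each measure."""
    return r_obs / (rel_x * rel_y) ** 0.5

# E.g., an observed r of 0.42 with reliabilities of 0.70 for both measures
# implies a latent correlation of about 0.60.
r_latent = disattenuate(0.42, 0.70, 0.70)
print(f"{r_latent:.2f}")
```

SEM-based single-indicator corrections achieve the same end more flexibly, by fixing the indicator's residual variance from an external reliability estimate, which also propagates the correction through the rest of the model.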