Epidemiology for the Defense
By Bill Masters
Wallace, Klor & Mann, P.C.
I. Introduction
1. Definition of Epidemiology
Epidemiology literally means, from the ancient Greek, the study (“logos”) of what is upon (“epi”) the people (“demos”).1 It is, more formally, the study of states of health (or states of disease) in the population.2 For the defense, it is principally a method to assess whether or not an allegedly harmful exposure is associated with a deleterious effect or disease, in order to prove or disprove causation.3
Epidemiology, as a method, essentially involves comparing attributes of groups of people–it is the art and science of comparison. It describes how to best acquire data for comparison, how to compare those data on variables of interest, and how to tease from that comparison conclusions about what variables co-vary.
But epidemiology, as both art and science, is a peculiar sort of discipline, much like economics, neither hard nor soft, but an unequal blend of both, grounded in the observations and demographic variables of social science but elevated slightly to the domain of quantitative science through use of sophisticated mathematics. As with any discipline that blends the quantitative with the primarily qualitative, epidemiology receives mixed praise about its predictive value. Some epidemiologists consider it of great value, a formidable tool in the fight against disease, helpful, for instance, in identifying the intermediate cause of cholera in the 19th century or Legionnaires’ disease in this century. Others consider it of little value, a science akin to meteorology, whose reliance on complex observations produces poor predictions.4 Even so, epidemiology is now judicially recognized as critical in proving or disproving general causation in toxic tort and other scientifically imbued litigation.5
2. Basic Ideas of Epidemiology
Epidemiology is primarily the art and science of observation, not of experiment. Experiment involves the active manipulation of certain variables to determine their specific effect. In clinical medicine, the classic kind of experiment is the randomized, controlled trial. That kind of experiment may determine, for instance, the effect of a proposed new drug or device. But, unlike experiment, observation involves comparing groups of people on some key variables, through passive observation, to determine whether or not these variables are correlated. For instance, is exposure to this toxin correlated or associated with this disease? As a rule, experiment is preferred, but at times observation is all that’s available. For example, most, if not all, studies of potentially harmful exposures in humans are observational. The reason is simple enough: ethically people should not be experimentally exposed to something potentially harmful.
Very broadly, then, an epidemiologic study involves (1) identifying a potentially harmful exposure to a group; (2) identifying whether or not a specific effect occurs in that group; (3) assessing whether or not the exposure is “associated” with the effect; and, given an association, (4) assessing whether or not the exposure “caused” the effect.
3. Study Types
In epidemiology there are three basic kinds of observational studies: (1) the cross-sectional study, (2) the cohort study and (3) the case-control study.6
The Cross-Sectional Study
In the cross-sectional study, the epidemiologist determines, simultaneously, each subject’s status for exposure and effect.7 For instance, if an epidemiologist wants to determine whether or not having silicone breast implants is associated with rheumatoid arthritis, she will survey the population of women between the ages of twenty and sixty to ascertain how many have both silicone breast implants and rheumatoid arthritis. This kind of epidemiologic study is popularly used to assess risks for diseases of slow onset and long duration for which medical care is not sought until the disease is advanced. But cross-sectional studies have two major limitations: (1) they cannot readily determine whether the exposure preceded the effect; and (2) a series of “prevalent” cases (existing cases, not newly diagnosed) will contain a higher proportion of chronic disease than a series of “incident” cases (newly diagnosed cases), so that if subjects whose disease is of short duration differ in relevant characteristics from those with chronic disease, the results of the study will likely be faulty.
The Cohort Study
“In a cohort study, the epidemiologist selects a group of exposed individuals and a group of unexposed individuals and follows both groups to compare the incidence of disease in the two groups.”8 If, at the end of the period of follow up, the proportion of those exposed who have the effect is greater than the proportion of those unexposed who have the effect, the epidemiologist concludes that the exposure is positively associated with the effect. Cohort studies produce results that tend to be very reliable. Unfortunately, cohort studies are time consuming and expensive, often requiring many years of follow up and costing millions of dollars.
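The cohort comparison reduces to simple arithmetic. The following minimal sketch, with invented counts, computes the incidence of the effect in each group and their ratio (the relative risk); a ratio above 1.0 indicates a positive association:

```python
# Sketch of the core cohort-study comparison: incidence among the exposed
# versus incidence among the unexposed. All counts are hypothetical.

def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return incidence in each group and their ratio (the relative risk)."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed, risk_unexposed, risk_exposed / risk_unexposed

# Hypothetical follow-up: 40 of 10,000 exposed subjects and 20 of 10,000
# unexposed subjects develop the effect.
r_exp, r_unexp, rr = risk_ratio(40, 10_000, 20, 10_000)
print(f"incidence, exposed:   {r_exp:.4f}")   # 0.0040
print(f"incidence, unexposed: {r_unexp:.4f}") # 0.0020
print(f"relative risk:        {rr:.1f}")      # 2.0, a positive association
```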
The Prospective Cohort Study
In a prospective cohort study, the epidemiologist measures exposure before manifestation of the effect. This kind of study usually requires many subjects and a lengthy follow-up, especially if the effect is rare. As a result, a prospective cohort study is unlikely to appear in the context of litigation, at least for several years. Even so, it is the gold standard for determining the effect of an exposure when the exposure is potentially harmful. Good examples of this kind of study are those by Stampfer et al., Vitamin E Consumption and the Risk of Coronary Heart Disease in Women, NEJM 328:1444-1449 (1993), and by Fuchs, C.S., et al., Dietary Fiber and the Risk of Colorectal Cancer and Adenoma in Women, NEJM 340:169-176 (1999); the latter notes that the results of retrospective studies were inconclusive on whether dietary fiber protects against colorectal cancer, but that the results of a large prospective cohort study (the Nurses’ Health Study) do not support a protective effect of dietary fiber.
The Retrospective Cohort Study
In a retrospective cohort study, the epidemiologist measures exposure after manifestation of the effect. In the context of litigation, most cohort studies are apt to be retrospective. Good examples of population-based retrospective cohort studies are those by Gabriel et al., Risk of Connective-Tissue Diseases and Other Disorders After Breast Implantation, NEJM 330:1697-1702 (1994), and by Hennekens et al., Self-Reported Breast Implants and Connective-Tissue Diseases in Female Health Professionals, JAMA 275:616-621 (1996).
The Case-Control Study
In the case-control study, the epidemiologist identifies a group of people with the disease (“cases”) and a group of people without the disease (“controls”) and then determines what proportion of the cases were exposed and what proportion were unexposed.9 If the proportion of cases exposed is greater than the proportion of controls exposed, then the epidemiologist concludes that the exposure is positively associated with the disease or effect. Most litigation over harmful exposures will involve case-control studies. For instance, case-control studies are the kind of epidemiologic study plaintiffs used to assert that toxic shock syndrome was positively associated with use of tampons; that eosinophilia-myalgia syndrome was positively associated with one manufacturer’s L-tryptophan supplements; and that silicone breast implants were sometimes positively associated with a variety of ill-defined symptoms sometimes disingenuously called “silicone breast implant disease.” An example of a population-based case-control study is that by Burns et al., The Epidemiology of Scleroderma among Women: Assessment of Risk from Exposure to Silicone and Silica, Journal of Rheumatology 23:1904-1911 (1996).
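A minimal sketch of that comparison, with invented counts, shows how the exposure odds among cases and among controls yield the familiar odds ratio; a value above 1.0 indicates a positive association:

```python
# Sketch of the case-control comparison: exposure odds among cases versus
# exposure odds among controls. All counts are hypothetical.

def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Return the odds ratio for a standard 2x2 case-control table."""
    case_odds = cases_exposed / cases_unexposed
    control_odds = controls_exposed / controls_unexposed
    return case_odds / control_odds

# Hypothetical study: 30 of 100 cases were exposed; 15 of 100 controls were.
print(f"odds ratio: {odds_ratio(30, 70, 15, 85):.2f}")  # about 2.43
```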
Four designs are available for case-control studies. First, in the “traditional” case-control design, the control group is a sample of the population remaining at risk at the end of the risk period, the period over which incident cases are ascertained and enrolled for the study. Second, a case-control study is sometimes constructed within a prospective cohort study; this is called a “nested” case-control study. In that kind of study, people in the cohort who develop the disorder become cases, and a random sample of exposed and unexposed people in the cohort who are free of disease become controls. Third, in the “case-cohort” design, the control group is a sample of the entire population-at-risk at the start of the risk period, excluding then-existing cases. Finally, in the “incidence-density” design, the controls are sampled longitudinally throughout the risk period. Some of the sampled controls might develop the disease after their selection, so the final control group might contain some incident cases; such cases would be entered in both the case and control groups.
Case-control studies have several advantages over cohort studies. First, they are well suited to study rare effects or disorders. Second, they are also well suited to study disorders with long “induction” periods. That is, rather than waiting years for the prospective accrual of cases, the epidemiologist may compress time by using historical documents to evaluate earlier exposures. Third, case-control studies are much less expensive to conduct than cohort studies because they require fewer controls and no lengthy follow-up.
Unfortunately, case-control studies also have some nettlesome disadvantages. They are not well suited for detecting weak associations (odds ratio < 1.5). But this is not a worry for the defense. More importantly, they are also susceptible to a variety of systematic errors or biases.10 These errors cause the results of the study to be ambiguous and often unreliable. This is a very serious concern for the defense.
4. Natural History of Epidemiology in Litigation
Most often, epidemiology is encountered in litigation of “toxic tort” cases. “Toxic tort” cases are those in which plaintiffs assert claims for damages for problems with their health owing to purported exposure to “immunogens” [molecules that generate an immune response], “antigens” [foreign substances that induce a “specific” immune response], “toxins” [poisonous substances], “teratogens” [substances that disrupt development of the embryo, resulting in a deformed fetus] or “carcinogens” [substances that increase the risk of developing cancer] – that is, any substance that can cause humans acute or chronic injury, either directly by altering or destroying cells or indirectly through stimulation of a harmful immune response. In the past, toxic tort cases have involved, for example, Thalidomide, DES (diethylstilbestrol), Agent Orange, IUDs (the Dalkon Shield), asbestos, Bendectin, L-tryptophan, and the silicone in silicone breast implants.
Toxic tort cases fall into two basic categories: (1) those in which the toxin produces a “signature disease,” such as mesothelioma from asbestos; and (2) those in which a putative toxin allegedly produces a constellation of non-specific symptoms not considered to be a signature disease. Ordinarily this second type of toxic tort case arises in advance of proper scientific data either for or against plaintiffs’ claims. These cases are propelled along as a result of mass hysteria fueled by sensationalist journalism and nurtured by “uninformed” plaintiffs’ lawyers.
To better understand why, first consider what is occurring before anyone is exposed to the putative toxin or immunogen. At any given time, some of those who will eventually be exposed will already have non-specific symptoms that mimic the symptoms of a rheumatologic disorder, and some will go on to develop actual rheumatologic disorders; in either case, this occurs quite apart from their eventual exposure to the putative toxin or immunogen.
Eventually someone who is exposed to the putative toxin or immunogen decides that she has symptoms associated with this exposure. That person sees her doctor who, after examining her, may publish a case report describing this “curious association” between the putative toxin or immunogen and these reported symptoms.
Other physicians read this case report and, if they have patients who have been exposed to the putative toxin or immunogen, will query them about their symptoms. “Do you have fatigue? Do you have muscle aches and pains?” “Mrs. Macbeth, are you remembering information as well as you used to?” “If not, maybe it’s due to your use of this drug.” Of course, consistent with human nature, some of these physicians will have fantasies of being immortalized in medical history as having discovered a new syndrome or disease: “Dim’s disease” or “Grub’s disease” or “Crock’s syndrome.” They too write up their case findings for publication. Soon the medical journals contain a number of “case reports” about this “curious association.”
Enter the news media, ever the public watchdog. The news media, of course, have eager, ambitious journalists assigned to read key medical journals for developing medical issues that might capture the public’s interest or fears—the more the theme of the story taps into populist prejudices, the more newsworthy the story. Unfortunately, those who present the news, while having a keen nose for ratings, have no appreciation for scientific method: “Case reports or randomized controlled double-blinded clinical trials, what’s the difference?”
So typically the news media publish an article or air a nationally televised news program highlighting the “curious association” between the putative toxin or immunogen and these symptoms. As would be expected, some readers or viewers will have been exposed to the putative toxin or immunogen, and a subset of these people will have either symptoms of a rheumatologic disorder or symptoms that mimic a rheumatologic disorder. Couple that fact with the phenomenon known as “effort after meaning” – the tendency, once symptoms develop, to look for an explanation for them – and the news media have served these people with a nice explanation for their woes.
Enter the plaintiffs’ lawyer, ever ready, ever eager to assist the “downtrodden.” These plaintiffs’ lawyers begin advertising and begin receiving referrals from attorneys who lack the resources or expertise to prosecute these cases. These plaintiffs’ lawyers are usually very rich because cases like these are expensive to work up and try.
These wealthy plaintiffs’ lawyers, recognizing the merits of certain economies of scale, organize into national steering committees to investigate and formulate the various issues that will constitute a generic case against the manufacturer and suppliers of the putative toxin or immunogen. These committees assign teams of lawyers to cover issues of liability and issues of general causation. When these teams of lawyers finish, they have a script with modest variations to apply to whatever plaintiff is ready, willing and able to go to trial.
Soon plaintiffs are encouraged by their lawyers to attend support groups, purportedly created to comfort plaintiffs, these alleged victims, in their time of need. But, more realistically, these groups are congregations to mammonism, disingenuously designed to capitalize on plaintiffs’ suggestibility and need to belong by suggesting to them that if they want to receive what they need, they need to have symptoms S1, S2, S3, . . . Sn. As Montaigne remarked,
“A woman, thinking she had swallowed a pin with her bread, was screaming in agony as though she had an unbearable pain in her throat, where she thought she felt it stuck; but because externally there was neither swelling nor alteration, a smart man, judging that it was only a fancy and notion derived from some bit of bread that had scratched her as it went down, made her vomit, and, on the sly, tossed a crooked pin into what she threw up. The woman, thinking she had thrown it up, felt herself suddenly relieved of her pain.”
These support groups are usually headed by someone who excels at fanning the flames of the “victim’s” prejudices toward the putative corporate victimizer who manufactured the putative toxin or immunogen. This person helps perpetuate the mass hysteria triggered by the news media and manipulated by plaintiffs’ lawyers.
At these group meetings, plaintiffs are often re-routed from their treating physician to the expert witnesses plaintiffs’ attorneys have retained. These expert witnesses may be highly trained clinicians and experimenticians, but typically most are no more than professional witnesses.
At this point, these experts begin preparing and publishing reports of “case series,” discussing the signs and symptoms developing in their cohort of “patients” and relating those signs and symptoms to the alleged toxic or immunogenic exposure so prominently discussed in the media.
These reports of case series are not epidemiologic or comparison studies because they lack adequate control groups. So at this point, various epidemiologists, seeing a need for epidemiology, become interested and begin designing and conducting small case-control studies. The results of these case-control studies begin appearing in a variety of second-tier medical journals, and then one or two appear in first-tier journals such as the Journal of the American Medical Association and the New England Journal of Medicine.
The results of these case-control studies tend to be equivocal: studies demonstrating a positive association are criticized for harboring various biases, most notably selection bias, while studies demonstrating no association are criticized for low “power,” meaning that too few people were studied for the study reliably to detect a true association, so a negative result may simply be a false negative.
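Power is easy to illustrate. The sketch below uses the standard normal approximation for comparing two proportions; the exposure proportions, sample sizes, and the helper function are invented for illustration. With a small study, even a real difference in exposure rates is more likely than not to be missed:

```python
# Rough power calculation for a comparison of two exposure proportions,
# using the usual normal approximation. All inputs are invented.
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test that p1 != p2."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se_null = (2 * p_bar * (1 - p_bar) / n_per_group) ** 0.5
    se_alt = (p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group) ** 0.5
    return z.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)

# Exposure in 30% of cases versus 15% of controls (odds ratio about 2.4):
print(f"n = 50 per group:  power = {power_two_proportions(0.30, 0.15, 50):.2f}")
print(f"n = 200 per group: power = {power_two_proportions(0.30, 0.15, 200):.2f}")
# With only 50 subjects per group, this real association is missed more often
# than it is found; a "no association" result is then weak evidence.
```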
Eventually, if enough interest is generated in the scientific or medical community, a large existing cohort study designed to consider a different set of risks is tapped for a nested case-control study. The results of that study, when published several years later, are criticized but generally considered to have advanced the argument about causation.
5. Defense Goals
Defending toxic tort cases is a lot like trying to take a bone from a dog: it’s never as easy as it first seems, and it’s always dangerous. Not the least of these dangers is fashioning and presenting a defense on causation. In this effort, epidemiologic studies are, at the very least, useful and, at the very most, essential. They can be used as a shield by the defense in an effort to disprove general causation. But sometimes plaintiff will proffer epidemiologic studies as a sword to prove general causation. Then the defense will need to apply its full analytic powers in scrutinizing these studies to unseam them, either to keep them from the jury or to discount their persuasiveness to the jury.
- Unpersuasive Epidemiologic Evidence
Most often, when both plaintiffs and defendants use epidemiologic evidence, an issue arises about the weight to give the respective opinions based on opposing epidemiologic studies, an issue of fact to be resolved by the jury. But, for the defense, undermining an epidemiologic study at trial is a daunting task because an adverse epidemiologic study is difficult to critique negatively to a jury. First, that critique requires explaining a series of complicated predicates, such as the subtle concepts of sampling error, systematic bias and confounding. Second, this burden of explanation is very heavy if the jury is constituted of those with a bias against corporate defendants. (This is why it is often said, wisely, that the best defense in toxic tort litigation is having a favorable venue.) So for the defense it’s best to keep adverse epidemiologic evidence from the jury.
- Legally Insufficient Epidemiologic Evidence
Sometimes, once proffered epidemiologic evidence is admitted into evidence, the court will then later rule, as a matter of law, that plaintiff’s epidemiologic evidence is insufficient to create an issue of fact on general causation for the jury to resolve. The court will either, at best, direct a verdict for defendant or, at worst, provide a limiting instruction to the jury to disregard the epidemiologic evidence in its deliberations on the issue of general causation.
- Inadmissible Epidemiologic Evidence
Occasionally, however, experts will proffer opinions either based on epidemiologic studies lacking sound methodologies or based on inferences so far removed from the epidemiologic evidence as to insult credulity. Then the court will have reason to exclude that proffered evidence as inadmissible under rules of evidence, such as FRE 702. That proffered evidence can be excluded, depending on trial tactics, before trial or during trial.
- Does the Expert Have the Necessary Expertise?
FRE 702 requires that the proffered expert be “qualified” as an expert by “knowledge, skill, experience, training, or education….” The defense will want to assure itself that plaintiff’s proffered expert has the requisite epidemiologic expertise to provide an opinion based on methodologically sound data.
- Does the Expert’s Avowed Expertise Fit the Opinion?
Not only must the proffered expert have the requisite expertise, but that expertise must fit or be relevant to the proffered opinion. For instance, although the expert is a qualified epidemiologist, is she qualified to testify about medical causation? Virtually always, the defense should argue she is not so qualified.
- Is The Proffered Evidence “Knowledge?”
FRE 702 requires that the proffered opinion of the expert, before being admissible, be “knowledge.” “Knowledge” is defined negatively as more than subjective belief or unsupported speculation, and positively as any body of known facts or truths accepted as such on good grounds. If the proffered evidence is “knowledge,” and if it will assist the trier of fact, a qualified expert may testify about “scientific, technical or other specialized knowledge.”
- Is The Proffered Evidence “Scientific” Knowledge?
A belief, within that set of beliefs characterized as knowledge, may also fall within that subcategory of beliefs characterized as “scientific” knowledge. Scientific knowledge is defined as belief derived by the scientific method, that kind of method based on generating hypotheses and testing them to see if they can be falsified. Expert opinions based on scientific knowledge should be evaluated by the court, to assess their admissibility, in light of those indicia of evidential reliability identified in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 113 S. Ct. 2786, 125 L. Ed. 2d 469 (1993). Under that standard, scientific beliefs, besides being falsifiable, should satisfy other indicia of evidential reliability, including peer review, publication, and general acceptance in the scientific community.
Some argue that if the opinion is based on technical or other specialized knowledge, it should be submitted, if relevant, directly to the jury. Epidemiologic evidence, they may assert, is that kind of knowledge. Yet, contrary to that argument, whether epidemiologic evidence is scientific evidence or just specialized knowledge, it should be evaluated in light of the applicable indicia of Daubert and any other indicia demanded in the epidemiologic community. To argue otherwise successfully, plaintiffs must identify a criterion that demarcates scientific knowledge from other kinds of specialized knowledge. Many brilliant people have attempted to find such a criterion, yet most philosophers of science believe none has succeeded. So, if over the years philosophers of science have been unable to find such a criterion, neither the U.S. Supreme Court nor the various trial courts should expect to do so. As John Searle, Professor of Philosophy at UC Berkeley, remarked:
“Knowledge can be naturally classified by subject matter, but there is no special subject matter called ‘science’ or ‘scientific knowledge.’ There is just knowledge, and ‘science’ is a name we apply to areas where knowledge has become systematic, as in physics or chemistry.”
The fact is, FRE 702 concerns “specialized” knowledge, under which are subsumed “scientific” and “technical” knowledge. So, absent a cogent criterion demarcating technical from scientific knowledge, what’s true for opinions based on scientific knowledge should also be true for opinions based on “technical” knowledge.
- Is the Opinion Based on Scientific Methodology?
Under Daubert, the trial judge, as gatekeeper, must screen proffered scientific evidence “to preliminarily assess whether the reasoning or methodology . . . is scientifically valid [that is, whether it is scientific knowledge] and . . . whether that reasoning or methodology properly can be applied to the facts in issue [that is, whether that scientific knowledge is relevant].” The trial court is not asked to resolve scientific issues on both sides of which exist evidentially reliable evidence.
Presumably, what the USSC would have the trial court consider with regard to methodology are the following: (1) identify the methods used; (2) identify what methods are generally accepted in the relevant scientific community; (3) assess the fit between (1) and (2). The trial court is expected to consider steps (1), (2), and (3), but not to analyze the methods generally accepted in the scientific community to assess their merits or validity. (If the USSC expects the trial court to undertake that last step, what standard does the USSC expect the trial court to employ to evaluate the methods generally accepted in the scientific community?)
The concept of “scientific method” cannot be defined by resort to a set of necessary and sufficient defining criteria, but rather by an open-ended set of meaning criteria. The scientific method is probably best defined as a set of indicia or maxims for the generation and justification of data and theories. In Daubert, the USSC, however, does offer four classes of “appropriate observations”: (1) testing; (2) peer review and publication; (3) rates of error; and (4) general acceptance in the scientific community.
(1) Testing: Can the theory or technique be tested? Has the theory or technique been tested? The USSC quotes approvingly that “‘scientific methodology today is based on generating hypotheses and testing them to see if they can be falsified; indeed, this methodology is what distinguishes science from other fields of human inquiry.’” “‘The statements constituting a scientific explanation must be capable of empirical test.’” “‘The criterion of the scientific status of a theory is its falsifiability, or refutation, or testability.’” Epidemiologists test hypotheses using the methods of epidemiology. For instance, is low frequency electromagnetic radiation associated with childhood leukemia? Using epidemiologic methods, epidemiologists will assess whether that exposure is associated with that effect.
(2) Peer Review and Publication: Has the theory or technique been subjected to peer review and publication? The USSC remarked, “submission to the scrutiny of the scientific community is a component of ‘good science,’ in part because it increases the likelihood that substantive flaws in methodology will be detected.” Yet the USSC qualifies this criterion by noting that publication is not a sine qua non of admissibility for several reasons: (i) it does not necessarily correlate with reliability; (ii) sometimes well-grounded but innovative theories will not have been published; and (iii) some propositions are too particular, too new, or too limited in interest to be published.
(3) Rates of Error: What is the particular technique’s known or potential rate of error? For instance, if an epidemiologist bases her conclusion that eating bran flakes is negatively associated with colon cancer on the results of case-control studies, she would want to assess the rate of error of case-control studies.
(4) General Acceptance in the Scientific Community: Is the theory or technique generally accepted within the relevant scientific community? What methods do the well-recognized epidemiologic treatises require?
(5) Other Factors: Importantly, the USSC emphasizes that “many factors,” not just these four, “will bear on the inquiry” into whether or not “the reasoning or methodology underlying the testimony is scientifically valid.”
In the context of litigation, proffered opinions are more likely to have evidentiary reliability if they satisfy some additional criteria: (1) the opinion is based on research conducted independent of litigation; (2) the opinion was not developed expressly for purposes of testifying in court; and (3) the proffered opinion was based directly on legitimate pre-existing research unrelated to the litigation. These are particularly important criteria in assessing the reliability of epidemiologic studies.
“The overarching subject,” says the USSC, “is the scientific validity of the principles . . . that underlie a proposed submission.” That is, the system of beliefs underlying the expert’s opinion must be “valid” and “reliable.” “Validity” and “reliability” are the twin concepts in light of which scientists evaluate data, theories, principles, procedures, laws, instruments and devices. “Validity” refers to accuracy. That is, does the theory explain what it purports to explain; is the datum, principle or law what it purports to be; does the procedure produce what it purports to produce; and does the instrument or device detect what it purports to detect? In epidemiology, the validity of an epidemiologic study has two components: internal validity and external validity. Internal validity is the validity of the inferences drawn from the samples the epidemiologist assembles and evaluates. External validity is the validity of the inferences extended to the population from which those samples were drawn.
“Reliability” refers to consistency among those who assess accuracy. That is, “reliability” refers to whether or not, inter-subjectively, those who take measurements agree from observation to observation or measurement to measurement. Reliability implies that different epidemiologists performing the same kind of epidemiologic study would obtain similar results. Yet reliability does not imply validity. Simply, although phenomena such as symptoms are observed consistently, nothing guarantees that these symptoms in fact measure what they purport to measure. Even so, “reliability” is an indicium of “validity.” That is, if a symptom or sign is unreliable, it has questionable validity. For example, if our clocks did not generally agree with one another, we could not use them to measure time. Our clocks would not measure falsely; they would not measure at all. Reliability, then, is a necessary but not a sufficient criterion for validity.
The defense will want to argue that the proffered epidemiologic evidence is unreliable if a series of epidemiologic studies exists on an issue and the studies in that series reach different results. For example, the results of case-control studies on the same topic are apt to vary, some showing a positive association and some showing no association. On this account, case-control studies could be criticized as lacking “reliability,” that all-important component of validity. But to argue that all case-control studies should be inadmissible because case-control studies are generally unreliable is to throw the baby out with the bathwater. After all, the defense will want to arm itself with those case-control studies showing no association. This is a sound tactic so long as the power of these particular studies is sufficient to detect an odds ratio of at least two (two being the legally salient threshold: roughly speaking, when the odds ratio exceeds two, the exposure more likely than not accounts for a given case, because the attributable fraction (OR − 1)/OR then exceeds one half), and the studies are in other respects more methodologically sound than those that have resulted in a positive association. Otherwise the baby best go out with the bathwater.
Determining “validity” usually requires a more epistemologically secure standard of reference against which to measure the accuracy of an observation or technique of measurement. For example, the diagnosis of a brain tumor on the basis of symptoms of impaired neurological function can be verified by identifying the tumor through MRI, CT scan or biopsy (surgery and histological inspection). The logical direction of this process of validation is that the relatively non-specific but more obvious phenomena either act as a symbol for or point to less obvious but more specific pathognomonic phenomena. In epidemiology, each piece of data used in a study should be valid, and the kind of epidemiologic study in which those data are embedded should generate results that are valid. The validity of a particular kind of epidemiologic study would be gauged against the kind of study considered to be the gold standard; in epidemiology, that is usually the prospective cohort study.
So to claim that a procedure is valid is to be able to provide justifications showing that the validator is more epistemologically secure than the procedure itself. This appeal to firmer ground is an aspect of empiricism. So validation is apt not to have occurred if there is no descent to something more observationally direct; if there is no ascent to something more intuitively solid or plausible; or if there is no traverse either to something conceded to be less impeachable or to a series of logically and epistemically interconnected somethings within which that which is to be validated harmoniously meshes. An epidemiologic study would be validated to the extent it agrees with the kind of study considered to be the gold standard or, if the epidemiologic study is incomparable, to the extent it lacks internal errors.
The “validation dilemma” is that although something or some method may be a purported validator, often it has no surer epistemic foundation than that which it purports to validate. Simply, consider validation as a process of finding firmer epistemic ground. For example, an epidemiologist cannot very well validate the quality of the methodology of a case-control study by reference to the results of another case-control study. But she could validate the methodology of the case-control study by reference to a large prospective cohort study.
How many so-called “external validators,” when analyzed, are nothing more than the consensus of experts, that is, “general acceptance in the relevant scientific community”? And what could be the problem with validation based on the consensus of experts? Suppose Dr. Bob is an expert in naturopathic medicine. He believes that ginseng root, if rubbed on the scalp, will result in prolific hair growth. This he believes not because he performed experiments with the root, but because Dr. Billy Bob, his venerated teacher in naturopathic school, told him so. Dr. Billy Bob, in turn, believed it was so because when he was a student he read that it was so in the treatise “Ancient Cures for Baldness” by the late Dr. Graymalkin of Edinburgh, Scotland. No one knows why Dr. Graymalkin believed it to be so because she has been dead for three hundred years. The moral of this tale: the validity of a particular method of validation should never be taken for granted.
Yet there are limits to critique: “All testing, all confirmation and disconfirmation of a hypothesis takes place already within a system. *** The system is not so much the point of departure, as the element in which arguments have their life.” That is, the basic assumptions of the system are not put to the test, not backed up by evidence. This is what Wittgenstein meant when he said: “Of course there is justification; but justification comes to an end.” So the results of a case-control study may be verified by the results of a large prospective cohort study, whose results, in some circumstances, could be verified by a controlled clinical trial, whose results, in the final analysis, could only be verified by a program of such studies over a course of time.
What are examples of methodological flaws in proffered epidemiologic evidence that warrant a finding that the proffered evidence is, under FRE 702, inadmissible? First would be that “sampling error” likely accounts for the putative positive association between exposure and effect. Second would be that “systematic bias” likely accounts for the putative positive association between exposure and effect. Third would be that “confounding” likely accounts for the putative positive association. Fourth would be that the study has not been published in a peer-reviewed journal.
- Is the Opinion Sufficiently Determined by the Data?
A basic problem, then, in any process of validation is this: the theory, principle or procedure investigated will always be underdetermined by the data. As a result, a logical and epistemic space exists in which will be at play that activity called “interpretation.” That means the process of validation is a process of assessing an interpretation in light of other interpretations. Usually in that process, a number of interpretations may be offered. And, unfortunately, no one will have an impeccable gold standard by which to assess the merits of those various and often competing interpretations. So the process of validation becomes a process of circumferentially narrowing that logical and epistemic space through argumentation or persuasion. That process is always open-ended, and so validity is always a matter of degree. The lesson is that validation is a process that is both scientific and rhetorical, and so requires both evidence and argument.
Epidemiologists can be challenged most often, if allowed to testify about causation, when offering opinions about general causation. Often those opinions will be significantly underdetermined by the underlying epidemiologic evidence. In those instances, as the United States Supreme Court has held, nothing in either Daubert or the Federal Rules of Evidence requires a trial court to admit opinion evidence which is connected to existing data only by the ipse dixit of the expert. A court may conclude that the analytical gap between the data and the opinion proffered is simply too great, and refuse to admit the opinion into evidence.
To enforce these requirements, the defense will need to engage a first-rate epidemiologist to find every weakness in the epidemiologic study or studies on which plaintiffs’ experts base their opinions to prove general causation. Once that’s done, the defense should request a hearing under FRE 104 to present the epidemiologist’s critique in order to block admission of that study or those studies into evidence. This will not be an easy task. The defense is apt to be required to demonstrate not only that the study has errors, but that these errors account for the positive association. As a practical matter, opinions based on published epidemiologic studies other than case-control studies are unlikely to be ruled inadmissible. Battles over the admissibility of epidemiologic evidence are likely to be fought over two kinds of analysis: “meta-analysis” and “reanalysis.” Both are easily manipulable and hence tools of abuse in the hands of the unscrupulous forensic epidemiologist.
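To see why a meta-analysis is so manipulable, consider its core arithmetic: a simple fixed-effect meta-analysis pools the studies’ log odds ratios, weighting each by the inverse of its variance. The sketch below uses invented study results and a hypothetical helper to show how much the pooled estimate moves when an analyst quietly drops an unfavorable study:

```python
# Fixed-effect (inverse-variance) pooling of odds ratios, the core arithmetic
# of a simple meta-analysis. The study results are invented to illustrate how
# the pooled estimate shifts when an unfavorable study is dropped.
import math

def pooled_odds_ratio(studies):
    """studies: list of (odds_ratio, standard_error_of_log_odds_ratio)."""
    weights = [1 / se ** 2 for _, se in studies]
    weighted_logs = [w * math.log(or_) for (or_, _), w in zip(studies, weights)]
    return math.exp(sum(weighted_logs) / sum(weights))

studies = [(1.8, 0.30), (1.2, 0.25), (0.9, 0.40), (2.4, 0.50)]
print(f"all four studies:    {pooled_odds_ratio(studies):.2f}")  # about 1.39
print(f"null study excluded: {pooled_odds_ratio(studies[:2] + studies[3:]):.2f}")  # about 1.52
```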
- Does the Proffered Evidence “Fit” the Issues of the Case?
Proffered epidemiologic evidence must also have a tendency to prove an issue of fact at issue in the litigation. If it does not, it is irrelevant and hence inadmissible. For instance, epidemiologic evidence does not fit the issues in the litigation if it demonstrates that exposure to benzene is significantly associated with leukemia when the issue of fact is whether or not silicone breast implants cause autoimmune disease.
- Opinion Based on Hearsay Reasonably Relied Upon by Experts in the Field in Forming Opinions
By evidentiary rules such as FRE 703, an expert may base her opinion on facts or data not admissible in evidence if of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject. Epidemiologists will often base their opinions on facts or data which would be considered hearsay and so not admissible in evidence.
- Probative Value of Evidence Outweighed By Danger of Unfair Prejudice, Confusion of Issues, or Misleading the Jury
Even if the proffered evidence satisfies the criteria of evidentiary rules such as FRE 702 and 703, it may still be ruled inadmissible under a rule such as FRE 403. FRE 403 provides that “although relevant, evidence may be excluded if its probative value is substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury, or by considerations of undue delay, waste of time, or needless presentation of cumulative evidence.” This rule is used rarely. Yet it might be used appropriately to exclude certain epidemiologic studies – those case-control studies, for instance, either with an odds ratio less than 2.0 or with a noticeable bias.
II. Expertise in Epidemiology
Epidemiologists are usually physicians with a Master of Public Health degree, those with a Sc.D. in epidemiology, or those with a Ph.D. in epidemiology.11 Those with Ph.D.’s are said to have had, as a rule, more rigorous schooling in the principles of epidemiology.12
But establishing that a proffered expert has had formal training in epidemiology should be the beginning, not the end, of an inquiry into her expertise. What’s most important, because the field of epidemiology, like the field of law, is so varied, is whether or not the proffered expert has some practical experience with the disputed issue of general causation. Simply, an epidemiologist needs some experience with that issue to acquire an understanding of the variety and complexity of independent variables that might correlate with the relevant effect. In this regard, the epidemiologist should provide, or the defense should otherwise obtain, a list of the epidemiologic studies she has published. These studies will disclose her primary interests. These studies should also be read with a jeweler’s glass for statements about what epidemiologic methods are required for valid and reliable results. With those statements, the defense can then compile “a list of reminders” for the expert when she becomes tempted, in order to advance plaintiff’s litigation, to stray from the truth.
Early in the history of mass toxic tort litigation, plaintiffs tended to neglect epidemiologic evidence and, giving it short shrift, would engage and proffer physicians or others without expertise in epidemiology to testify about the relevant epidemiology.13 Typically, that testimony would describe the findings of the epidemiologic studies without analyzing the relative merits of those studies. As litigation of mass toxic tort cases has matured and the courts have recognized the importance of epidemiologic evidence, plaintiffs now typically engage epidemiologists to testify about epidemiology. Wisely, plaintiffs usually engage epidemiologists with solid sympathies for plaintiffs, someone such as Shauna Swan, Ph.D., who repeatedly testified for plaintiffs in the litigation over Bendectin and silicone breast implants.14 Yet, once the credibility of such an expert becomes exhausted, as it invariably does, plaintiffs will have to find another epidemiologist, luring one with either substantial retainers or the opportunity to grind ideological axes.
But epidemiologists available to testify in court are rare. Most do not want to be forensic experts. For some reason, they do not seek the thrill of matching wits with trial lawyers. Nor do they relish having their names appear in opinions by trial or appellate courts, where they may be derogated for having some kind of commercial or ideological bias. Those few who do brave these hazards and testify are often not the epidemiologists involved in conducting the epidemiologic studies offered as evidence on the issue of general causation. As a result, what they say about an epidemiologic study by way of critique, if they have not reviewed the raw data, is often somewhat conjectural. “The results of this study may have been influenced by selection bias because the controls were selected from this particular population rather than a population similar to the cases.”
Given the small pool of forensic epidemiologists, physicians without special training in epidemiology may be asked to testify about the significance of epidemiologic studies relevant to the issue of general causation.15 Usually the physician merely describes rather than critiques the epidemiologic evidence. Yet, whatever expert is proffered to testify, she should be able to analyze the statistical models or techniques used in those studies. She should be able to say whether or not they are appropriate.16 That is a litmus test. For if the so-called expert cannot do that, she cannot assess whether or not sampling error accounts for the association.
III. Exposure
1. Consensus Definition of Exposure
The term “exposure” refers to that situation in which people come into contact with something potentially harmful.17 These potentially harmful things can range from electromagnetic radiation to a variety of biologics, such as viruses and bacteria, and to various drugs and devices. Exposures of historical interest in the context of litigation have included injection of a flu vaccine; ingestion of L-tryptophan manufactured through recombinant DNA; use of tampons; ingestion of Bendectin; implantation of silicone breast implants; inhalation of tobacco smoke; and inhalation of asbestos fibers.
At times, an exposure may be described at the level of cells or even molecules. This level of description would provide a degree of precision most helpful in explaining how biologically plausible a particular theory of causation is. But it’s not required for purposes of epidemiology. For that purpose, what’s required is a phenomenological description of the exposure. For instance, did this insomniac ingest L-tryptophan manufactured by this company; did this shipyard worker inhale asbestos in this ship thirty years ago; did this pregnant woman ingest Bendectin to palliate symptoms of morning sickness? Even though this description need be only phenomenological, it should be as precise as possible to ensure uniformity in identifying exposures.18 For instance, in epidemiologic studies considering the association between use of an intrauterine contraceptive device (IUD) and pelvic inflammatory disease, one study defined exposure as use of an IUD within one month of hospital admission while another study used three months.19 This problem can be compounded if, within the same study, different definitions of exposure are applied to members of the samples. Precision also enables other epidemiologists to replicate the study. It further prevents an epidemiologist, during analysis of the data, from redefining exposure in a way that partitions the data most favorably for the research hypothesis.20
If consensus is not reached and the various studies each apply a different measure of exposure or effect, the results could be said not to have been replicated from study to study.
2. Measures of Exposure
Ideally, the epidemiologist should specify precisely what variables are to be measured to establish an exposure. Exposure can be measured on a variety of variables, such as (1) intensity or dose of exposure; (2) duration or frequency of exposure; (3) route of administration of the putative toxin; and (4) timing of exposure.21
Information about these measurable variables can be obtained from a variety of sources. An epidemiologist may directly observe an exposure or she may obtain information about an exposure through interviews or questionnaires either with the person exposed or with witnesses.22 For instance, an epidemiologist may ask a Vietnam veteran whether he saw one or more of his comrades being killed in order to assess posttraumatic stress disorder. The epidemiologist may also review records to obtain information about exposure. For instance, she may review medical records to determine whether or not a patient received silicone breast implants. Or she may review prescription records to determine whether a woman bought Bendectin.
Whatever methods are adopted to assess exposure, the epidemiologist should attempt to minimize errors of measurement by obtaining information on exposure from more than one source.23 As Sir Austin Bradford Hill remarked, “one must go seek more facts, paying less attention to techniques of handling the data and far more to the … methods of obtaining them.”24
For the defense, ascertaining the method of assessing exposure is vital. What an epidemiologist asserts about an exposure must be valid; it must be in fact what it is purported to be. So always ask: what have the epidemiologists conducting the study done to ensure validity?25 For example, if the method of ascertaining exposure is a questionnaire, obtain a copy and review it for the variety of problems that might impair its effectiveness and objectivity. Does it contain ambiguous questions, leading questions, or restricted categories of possible responses? Odds are, the questionnaire has significant flaws, often serious enough to compromise the validity of the results of the epidemiologic study.
Data on exposures are usually obtained from the following:
- Face to face interviews;
- Existing records;
- Self-administered questionnaires;
- Telephone interviews; and
- Databases.
3. Biological Markers of Exposure
At times, biological markers are used to identify exposure. These markers are called “exposure markers.”26 For instance, when the ambient exposure fluctuates over time, measurement of a more stable biological marker will provide a more reliable measure of exposure. For example, measuring levels of fasting plasma glucose to estimate hyperglycemia is unreliable because those levels vary daily; however, that variability is avoided by measuring instead nonenzymatically glycosylated hemoglobin, an index of glycemia stable over several weeks.27
Even so, this method of identification has potential for error, simply because tests for biological markers rarely predict perfectly whether someone has the marker. These tests for biomarkers are often merely economical or efficient surrogates for a gold standard test. As a result, they have a certain “sensitivity” (defined as the true positive rate) and “specificity” (defined as the true negative rate). Obviously, the more sensitive the test, the more effectively it will accurately identify the marker of exposure. But to say that a test with, say, 95% sensitivity is positive is to say only that, assuming the patient has the target marker, the test will be truly positive 95 times in a hundred tests. What the positive test does not do is identify this patient as having a 95% probability of having the target marker.
What is wanted, given a positive test, is the probability that the patient has the target marker. This is called the “predictive value” of the test. To determine the predictive value, the epidemiologist needs to know the prevalence of the marker in the relevant population.
PPV = (Prevalence × True Positive Rate) / [(Prevalence × True Positive Rate) + ((1 − Prevalence) × False Positive Rate)]
For example, if the prevalence of the marker were 1%, the true positive rate 95%, and the false positive rate 5%, the predictive value would be about 16%. That is, if the patient has a positive test, he has roughly a 16%, not a 95%, probability of having the target marker.
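A minimal sketch of that computation (Bayes’ rule, using the formula and the illustrative numbers above):

```python
# Positive predictive value from prevalence, sensitivity (true positive
# rate), and false positive rate, per the formula above.

def positive_predictive_value(prevalence, true_positive_rate, false_positive_rate):
    true_positives = prevalence * true_positive_rate
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# The text's illustrative numbers: 1% prevalence, 95% sensitivity, 5% false
# positive rate.
print(f"PPV = {positive_predictive_value(0.01, 0.95, 0.05):.1%}")  # 16.1%
```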
So when an exposure has been measured through a biological marker, the defense needs to become aware of the sensitivity and specificity of the test, of the gold standard used to establish that sensitivity and specificity, and of the prevalence of the marker in the relevant population. Knowing the gold standard test is important because it provides the most accurate measurement of exposure. If use of the gold standard test is not prohibitively uneconomical, the epidemiologist should explain why the less accurate surrogate test was used in its stead.
The tendency of misclassification of exposure is for more people to test positive for the marker than truly have it, with the result that more people are classified as having been exposed. This is a tendency the defense will want to limit. To limit this potential error, the epidemiologist should ensure that such biological markers fulfill the following criteria: (1) be specific for a given exposure; (2) persist or degrade over time in a way preserving the order of cumulative exposures; (3) be detectable with accurate and reliable assays; and (4) have a small ratio of intra-subject to inter-subject variation.44
4. Misclassification Bias
If what counts as an exposure is not defined carefully and in a way permitting reasonably reliable identification, then what is apt to result is misclassification of what constitutes exposure and non-exposure. That is, a true exposure will be classified as a non-exposure and a true non-exposure will be classified as an exposure.
Of these possible kinds of misclassification, the kind most likely to occur outside litigation is that in which true exposures are classified as non-exposures. But in litigation the reverse is apt to occur, because litigants will lie about the nature of their experience in order to be classified as exposed. In either event, the rate of misclassification differs between cases and controls. In the latter example, for instance, unexposed cases are more often misclassified as exposed than unexposed controls are. For the defense, this is a dangerous phenomenon. It is known as “differential misclassification.”45
Plaintiff, of course, will have the effect. When the effect is not a signature disorder, proof of exposure becomes critically important. But often plaintiff will have difficulty proving exposure. And if the stakes in proving exposure are great, plaintiff can be expected to lie, testifying to an exposure.
5. Case Control Studies
In case-control studies, controls, when selected, should be classified as exposed or unexposed. Cases should be classified at the time of diagnosis. The processes for identifying the rate of exposure for cases should be comparable to those for identifying the rate of exposure for controls.46 This is vital. For instance, if the subjects are being interviewed, the interviewer should be blinded to disease status. The interviewer should also use a “structured” interview, asking each subject exactly the same question in the same manner. If a record is being reviewed, the information should be equally likely to have been recorded for both cases and controls. Questionnaires are very often used to obtain information about exposures. Yet this method of gathering information is somewhat imprecise. As a result, the epidemiologist should conduct sub-studies to validate the self-reports of those responding to the questionnaire.
Accurately identifying the exposure is critical to the validity of these studies. The major threat to validity is differential misclassification of exposure. That kind of misclassification is an important problem for the defense if cases are classified as exposed when, in fact, they were not exposed. When members of the study groups are in litigation, obtaining information about exposure through interview or questionnaire is fraught with potential for differential misclassification of exposure. Yet, in litigation, many investigators approach subjects in the study naively, as though the subjects had no motive to malinger. Anyone in litigation has at least a monetary interest in being classified as someone exposed. So when an epidemiologist seems unreasonably naïve, her motives should be questioned. For instance, when the study base is small, epidemiologists could obtain more precise information through structured interviews and clinical examinations. If, instead, they obtain information through less precise methods such as questionnaires, ask them what justifies use of the less precise method.
In case-control studies, identifying an exposure often occurs not through interviews with or questionnaires to the subjects but through a review of the subjects’ medical records. This process can result in differential misclassification of exposure, biasing the results of the study in favor of the defendant. For example, in the silicone breast implant cases, a plaintiffs’ epidemiologist argued that, in an important case-control study favoring defendants, exposed women might be misclassified as unexposed. In that study, determination of exposure was made strictly by reviewing Mayo Clinic medical records. But women with records there may have had their SBIs implanted at other hospitals without that fact being recorded in the Mayo Clinic records. As a result, women who had medical problems and SBIs may have been misclassified as not having SBIs, thereby underestimating their risk from SBIs.
IV. Effect
1. Identify and Define the Effect
Identifying accurately the effect of an exposure is critical. Often the effect will be easily identified. For instance, if the exposure is alleged to cause a form of cancer, that form of cancer can be identified by trained pathologists. Yet, at other times, the effect can be identified only provisionally. This typically occurs when the effect is a syndrome constituted of unspecified subsets of a vast array of non-specific symptoms. Then the defense should be very much concerned with ensuring that the diagnostic criteria for such a syndrome be adequately defined before the epidemiologist begins the study.31
To adequately define the diagnostic criteria, qualified clinicians must identify a cluster of related signs and symptoms with a characteristic evolution indicative of pathology. To begin, they will consider plaintiffs’ alleged complaints. Usually plaintiffs allege a variety of complaints. Invariably, these complaints are subjective and “non-specific.” That is, they are symptoms which may result from any number of ordinary and extraordinary causes.
Plaintiffs typically implicate scores of symptoms from abasia to zoster. Implicating so many symptoms is troublesome in that it proves too much. With that many symptoms, trillions of combinations of symptoms exist, trillions of varied manifestations of a so-called disorder such that every one of us could have it as long as we were exposed. If n represents the total number of signs and symptoms, the number of possible subsets of signs and symptoms is 2^n. To back out the empty subset, subtract 1, so that the number of relevant possible combinations of signs and symptoms is 2^n − 1. Obviously, if the only thing that distinguishes those with the alleged disorder from those without it is having the exposure, no effect has been identified in a cause and effect relationship; all that has been identified is a potential cause.
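The arithmetic is easy to check. A minimal sketch in Python (the symptom count n = 41 is hypothetical, chosen only to show how quickly the combinations reach the trillions):

    # Number of non-empty combinations of n signs and symptoms is 2^n - 1.
    # n = 41 is a hypothetical count used only for illustration.
    n = 41
    print(f"{2**n - 1:,}")  # 2,199,023,255,551 -- roughly 2.2 trillion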
To identify a unique pathology, plaintiffs need to do some more analysis. First, they should analyze all these symptoms and identify which, if any, tend to occur together. “There are statistical means,” said a prominent plaintiffs’ epidemiologic expert, “to look for clusters of symptoms and patterns in symptoms.” If these symptoms occur together in some pattern, they are events which are “statistically dependent.” That is, when a symptom occurs, its occurrence predicts, by a value greater than chance, the presence of another symptom and another, and so on. The extent to which a symptom predicts another can be determined through “correlation studies.” These studies will generate a quantitative measure of the strength of that relationship called the “correlation coefficient.” Of course, that these variables correlate does not necessarily mean they are caused by the exposure. They could be caused by other events each plaintiff has in common. Not surprisingly, plaintiffs’ experts will probably not have conducted correlation analyses of the signs and symptoms allegedly associated with the exposure. Indeed, until forced by a court to do so, plaintiffs’ experts will probably not even attempt to fashion a hypothesis about what that alleged disorder is, let alone test it.
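Computing a correlation coefficient for a pair of symptoms is straightforward. A minimal sketch, assuming symptoms are coded as 0/1 indicators (the data here are invented):

    import numpy as np

    # Hypothetical 0/1 indicators for two symptoms across ten subjects.
    fatigue    = np.array([1, 1, 0, 1, 0, 1, 1, 0, 0, 1])
    joint_pain = np.array([1, 1, 0, 1, 0, 1, 0, 0, 0, 1])

    # Pearson correlation coefficient: +1 = perfect co-occurrence,
    # 0 = statistical independence, -1 = perfect mutual exclusion.
    r = np.corrcoef(fatigue, joint_pain)[0, 1]
    print(round(r, 2))  # about 0.82 for these invented data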
Some plaintiffs’ experts may concoct a hypothesis about the diagnostic criteria of the alleged disorder. But even this hypothesis is apt to be ignored by many plaintiffs’ experts; they will continue as though no hypothesis exists. So, as a practical matter, each expert has concocted his own private set of criteria. (This is reminiscent of Wittgenstein’s remark about a silly method of verification: “Imagine someone saying, ‘But I know how tall I am!’ and laying his hand on top of his head to prove it.”)32 As a result, this alleged “disorder” always will be an “essentially contested concept” like the concepts “social justice,” “the good life,” and “the most beautiful dog.”33 All are “soft end points” – notions that lack necessary and sufficient criteria or that cannot otherwise be defined by reference to facts about the world.
Issues about case definition:
- Does the health condition exist along a continuum of severity?
- Is the disease label a catchall for a variety of conditions, possibly of different origins?
- Is there disagreement over the criteria for diagnosing the disease?
- Does the disease develop over a short or a long period of time?
- Are subjective or objective criteria, or both, used to diagnose the disease?
2. Diagnostic Reliability and Validity
Reliable and valid diagnosis of the effect or disorder is key to a valid epidemiologic study.34 Yet achieving that is notoriously difficult. First, the process of diagnosis is not particularly reliable. The same diagnostician may diagnose the same phenomena correctly one day, but incorrectly the next. Or different diagnosticians may diagnose the same phenomena differently the same day or the next day. Indeed, diagnosticians often do disagree about all aspects of the process of diagnosis: They disagree about the same patient’s history, the physical findings on examination, the interpretation of diagnostic tests and the diagnosis.35
For example, when two experienced surgeons, using the same diagnostic criteria, independently interviewed the same group of patients who had had operations for peptic ulcer, they agreed on whether the operation had been successful in less than two-thirds of the cases. In another instance, when three cardiologists interviewed the same 57 men with chest pain, at least one clinician judged 54% to have angina pectoris. Yet all three cardiologists agreed about the history in only 75% of cases, and when one cardiologist concluded that a given patient had angina pectoris, the other two agreed with him only 55% of the time.36
For a diagnosis to be reliable, the same diagnostician should reach the same diagnosis for the same clinical phenomena at different times (intra-diagnostician reliability) and different diagnosticians should reach the same diagnosis for the same clinical phenomena (inter-diagnostician reliability). Yet, obviously, this is merely a utopian dream, an ideal. The more clinical phenomena there are to assess, the more likely a diagnostician will evaluate those phenomena differently from one day to the next. The more diagnosticians involved in assessing a single set of clinical phenomena, the more likely each will reach a different diagnosis. Reliability could be increased if the number of diagnosticians were limited. Yet even then, if a limited set of diagnosticians were hired, say, by plaintiffs’ attorneys, and if they agreed upon the diagnostic criteria to apply to the clinical phenomena, reliability would increase, but so would the risk of bias. Then, although the reliability of the diagnoses may increase, the validity of those diagnoses may be altogether absent.
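Agreement between diagnosticians can be quantified. One standard statistic, not named in the text but widely used for exactly this purpose, is Cohen’s kappa, which corrects raw agreement for the agreement expected by chance alone. A minimal sketch on invented diagnoses:

    from sklearn.metrics import cohen_kappa_score

    # Diagnoses (1 = disorder present) by two clinicians on the same 12 patients.
    rater_a = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0]
    rater_b = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0]

    # Kappa: 1.0 = perfect agreement, 0 = chance-level, negative = worse than chance.
    # Here raw agreement is 10/12, but kappa is only about 0.67.
    print(cohen_kappa_score(rater_a, rater_b))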
Second, apart from the reliability of the diagnosis, the diagnostic criteria used to identify a disorder may not be particularly valid, even though all the diagnosticians agree on the diagnosis. Establishing the validity of diagnostic criteria is complicated.37 Strategies available for establishing the validity of a clinical syndrome include the following: (1) identification and description of the syndrome either by clinical intuition or by cluster analysis; (2) demonstration of boundaries between related syndromes by techniques such as discriminant function analysis and latent class analysis; (3) follow-up studies establishing a distinctive course or outcome; and (4) therapeutic trials establishing a distinctive response to treatment.38
Often, no reliable gold standard exists to verify the classification criteria. For example, no reliable gold standard validates the classification of certain rheumatic diseases owing to the lack of a set of unique diagnostic findings. In that event, diagnostic criteria would have to be developed with prospective studies and through methods using statistics and consensus (for example, the Delphi method or the expert panel). This process often involves the following steps: (1) a committee of experts isolates a set of historical, physical and laboratory features for further consideration; (2) the committee then determines the sensitivity and specificity of these features by the Delphi method, a technique designed to use consensus of expert opinion in situations of uncertainty by assuring anonymity, feedback, and iteration; (3) the committee then conducts a prospective study enrolling patients diagnosed with the target disorder and a comparison or control group with “confusable” signs and symptoms.
Initially, the committee uses a set of diagnostic criteria larger than the set assessed through the Delphi method. This enlarged set of criteria is used by clinicians, blinded to case or control status, to diagnose the target disorder. A criterion is included in subsequent analyses if it discriminates between the target disorder and the comparison group. (Alas, the gold standard is the clinical diagnosis of the target disorder.) Using this approach, the committee narrows the enlarged set of criteria to those further considered important in the diagnosis of the target disorder; (4) the committee then develops classification criteria using two different statistical techniques: (a) stepwise logistic regression and Boolean algebra and (b) classification trees or recursive partitioning.39
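To illustrate the last of those techniques, recursive partitioning can be run with an off-the-shelf library. A minimal sketch using scikit-learn; the patients, findings, and feature names are all invented:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical data: rows are patients, columns are candidate criteria
    # (0/1 clinical findings); y is the clinical diagnosis (1 = target disorder).
    X = np.array([[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 1, 0],
                  [0, 0, 1], [0, 0, 0], [1, 1, 0], [0, 1, 1]])
    y = np.array([1, 1, 1, 0, 0, 0, 1, 0])

    # Recursive partitioning: the tree splits on whichever criterion best
    # discriminates the target disorder from the comparison group.
    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["rash", "arthritis", "fever"]))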
When the classification criteria are established, they will reflect a trade-off between stringency and laxity. For the defense, the cutline is important. Stringent criteria are apt to misclassify some mild cases as normals. This is what the defense would want. Less stringent criteria may misclassify some normals as diseased. This is what plaintiffs want.
3. Timing of Diagnosis
If, after an exposure, the subject in a study group does not manifest symptoms of the disorder for some time, the process of identifying cases may be imprecise and result in misclassifying cases as non-cases. (Generally, this is not a problem for the defense.) As a result, an epidemiologic study should consider this possible bias by considering the “induction period” and the “latency period” of the disorder. The “induction period” is the period of time from causal action until initiation of the disease. The “latency period” is the interval of time between occurrence of the disease and detection of the disease.40
If the latency period is long, then early case-control studies will misclassify cases as controls and the results of the study will be biased, demonstrating falsely that the exposure is not associated with the effect. Sometimes, the latency period can be reduced by improved methods of detecting disease. As a result, if many diagnosticians are involved in identifying cases, some with more sophisticated ways to detect disease than others, the result is an uneven identification of cases, and that too can result in bias.
4. Misclassification Bias
If what counts as an effect is not defined carefully and in a way permitting reasonably valid and reliable identification, then what is apt to occur is misclassification of what constitutes an effect.41 That is, a true case will be classified as a non-case and a true non-case will be classified as a case. In the context of litigation, the likelihood is that a true non-case will be classified as a case. This is termed “differential misclassification.” For example, when a physician is uncertain about which of several possible diagnoses is appropriate for a patient, she may select that diagnosis widely publicized as having resulted from a notorious exposure. She may do so simply because of the siren song of the publicity and not because of sound clinical inference.42 For the defense, this is a problem. Simply, this bias will result in a finding that the effect is associated with the exposure. Indeed, if the effect is rare, misclassifying non-cases as cases will increase the strength of that spurious association significantly.
5. Prevalence of Effect
Generally, the prevalence of an effect is defined as the ratio of (1) the number of those manifesting the effect in a specified population at a specified time to (2) the population at that specified time.43 Two species of prevalence are “point prevalence” and “period prevalence.” “Point prevalence” refers to a static picture or snapshot of the number of persons who have the disease in a population at one point in time.44 “Period prevalence” refers to the number of people who have the disease in a population during a specified period of time. (This measure is now infrequently used.)45 It is customary to use the actual or estimated size of the population at the midpoint of the period of time.
Point Prevalence = Number of Existing Cases / Total Population

Period Prevalence = All Cases Existing During the Period / Total Population at Midpoint of Period
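In code, both measures are simple ratios. A minimal sketch with invented counts:

    # Hypothetical counts, for illustration only.
    existing_cases_today = 150       # cases present at a single point in time
    population_today = 100_000

    cases_during_year = 220          # all cases existing at any time during the year
    population_at_midyear = 101_000  # estimated mid-period population

    point_prevalence = existing_cases_today / population_today      # 0.0015
    period_prevalence = cases_during_year / population_at_midyear   # about 0.0022
    print(point_prevalence, round(period_prevalence, 4))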
6. Incidence of Effect
The “incidence” of an effect is an important concept in cohort studies. For it is the variable needed to compute measures of association. The incidence of an effect is defined broadly as the number of new instances of the effect occurring during a specified period of time in a specified population.46 Incidence can be measured in four basic ways:
Incidence as a Count of Events. Incidence may be measured as merely the number of newly observed events. If n represents the number of newly observed events, the “incidence” is n.
Incidence as Events per Unit of Time. Incidence may be measured as the ratio of the number of events to the time of observation for those events. If n represents the number of observed events, and t represents the period of time for observation, then the incidence is the quotient of n divided by t.
Incidence as Events per Unit of Amount of Observation. Incidence may be measured as the ratio of the number of events observed to the time of observation for those events and the number of observed people in the population. If n represents the number of observed events, t represents the period of time for observation and e represents the number of observed people in the population, then the incidence is the quotient of n divided by the product of e and t. Because this is a measure of the density of events occurring during the observation period, it is often referred to as the “incidence density.”
Incidence as a Probability. Incidence may be measured as the ratio of the number of events in a given period to the number of observed people at risk in the population. This is the probability of the event occurring to a member of the population at risk over a given interval. It is known as the “cumulative incidence rate.” If n represents the number of events in a given period and e represents the number of observed people at risk in the population, then the incidence is the quotient of n divided by e.
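All four measures reduce to a few lines of arithmetic. A minimal sketch, assuming a hypothetical two-year follow-up of 1,000 people at risk in which 40 new cases occur:

    # Hypothetical follow-up data.
    n = 40       # new cases observed
    e = 1_000    # people observed at risk
    t = 2.0      # years of observation

    count = n               # incidence as a count of events: 40
    per_time = n / t        # events per unit of time: 20 cases per year
    density = n / (e * t)   # incidence density: 0.02 cases per person-year
    cumulative = n / e      # cumulative incidence: 0.04 over the two years
    print(count, per_time, density, cumulative)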
V. Association between Exposure and Effect
1. Association Defined
An “association” is defined as a relation between an exposure and an effect such that the exposure and effect occur together more (or less) frequently than they would strictly by chance. For example, in 1981, in the Women’s Health Study, a relative risk of 1.6 (95% CI 1.4 – 1.9) was reported for users of intrauterine devices (IUD) compared to non-IUD users for pelvic inflammatory disease (PID). This finding is a report of an association, beyond that due to chance, between use of IUDs and PID.47
That an exposure is statistically associated with an effect is a necessary, but not a sufficient condition for inferring that the exposure causes the effect. For example, in 1994, an epidemiologic study reported a relative risk of 1.38 for breast cancer from exposure to electromagnetic fields. But, today, few would conclude from this association that electromagnetic fields cause breast cancer.
2. Measures of Association
Epidemiologists have two basic measures of association: quotients and differences.
Quotients
Measures of association expressed as a quotient are the relative risk (“rate ratio” and “risk ratio”) and the “odds ratio.”48 The relative risk is used as a measure of association in cohort studies.49 The odds ratio is used as a measure of association in both cohort and case-control studies.50
In epidemiologic studies, the data needed to measure an association are often represented in a 2 by 2 (or fourfold) table. A 2 by 2 table consists of two columns representing the presence or absence of the effect (disease or disorder) and two rows representing the presence or absence of the exposure.
              Diseased    Healthy     Total
Exposed          A           B         A+B
Unexposed        C           D         C+D
Total           A+C         B+D

where A+B is the total exposed, C+D the total unexposed, A+C the total diseased, and B+D the total healthy.
Relative Risk (Risk Ratio and Rate Ratio)
The relative risk is defined as the ratio of two incidence rates. An “incidence rate” is the expected proportion of a fixed population-at-risk that develops the disease over some specified period. So the rate ratio is the ratio of the incidence of the effect in those exposed to the incidence of the effect in those unexposed.
Relative Risk = (Incidence of effect in exposed) / (Incidence of effect in unexposed)
When “incidence density rates” are compared, the quotient is referred to as the “rate ratio.” When “cumulative incidence rates” are compared, the quotient is referred to as the “risk ratio.” Both the rate ratio and the risk ratio are referred to as the relative risk.51 The domain of values for the variables in this formula will result in a quotient equal to 1 or greater than 1 or less than 1. If the quotient is equal to 1, the exposure is not associated with the effect. If the quotient is greater than 1, then the exposure is positively associated with the effect. If the quotient is less than 1, then the exposure is negatively associated with the effect.
Relative risk is distinguished from “absolute risk.” “Absolute risk” is the incidence of disease in a population.52 Measures of absolute risk include, for instance, incidence rates and prevalence. The absolute risk for the exposed is A/[A+B]. The absolute risk for the unexposed is C/[C+D]. The relative risk is the ratio of these two risks, and is expressed as follows:
RR = [A/(A+B)] / [C/(C+D)]
The odds ratio is the ratio of the odds of the effect in the exposed to the odds of the effect in the unexposed.
The odds of the effect in the exposed is as follows:

A/B = [A/(A+B)] / [B/(A+B)]
The odds of the effect in the unexposed is as follows:
C/D = [C/(C+D)] / [D/(C+D)]
So the ratio of these two odds (the odds ratio) is expressed as follows:
OR = (A/B) / (C/D) = AD/BC
The odds ratio often overstates the relative risk, especially if the effect is common.53 In case-control studies, subjects are sampled conditioned on whether or not they have the effect. As a result, in case-control studies, one cannot obtain a direct estimate of relative risk. Yet sometimes the odds ratio does adequately approximate the relative risk. This occurs if the cases are “incident” cases and the controls are concurrently selected from the same study group.54 More specifically, it occurs: (1) when the cases are representative as to history of exposure of all with the effect in the population from which the cases were drawn; (2) when the controls are representative as to history of exposure of all without the effect in the population from which the cases were drawn; and (3) when the disease or effect does not occur frequently; however, if people who become cases over the risk period are also eligible for inclusion in the control group, the rare-disease assumption can be discarded.71
When those conditions exist, the relative risk will approximate the odds ratio as A and C approach 0.
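The approximation is easy to see numerically. A minimal sketch with invented 2-by-2 counts for a rare effect (A and C small relative to B and D):

    # Hypothetical 2x2 counts: A, B = exposed; C, D = unexposed.
    A, B = 30, 9_970    # diseased / healthy among the exposed
    C, D = 10, 9_990    # diseased / healthy among the unexposed

    rr = (A / (A + B)) / (C / (C + D))   # risk ratio: 3.0
    odds_ratio = (A * D) / (B * C)       # odds ratio: about 3.006
    print(round(rr, 3), round(odds_ratio, 3))  # nearly equal because the effect is rare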
Differences
The basic measure of association expressed as a difference is the attributable proportion of risk (or attributable risk).72 The attributable proportion of risk is the incidence rate among those exposed less the incidence rate among those not exposed:
A/[A+B] – C/[C+D].
The attributable risk indicates that amount of the total risk attributable to the exposure. For example, if the attributable risk was 0.60 for myocardial infarction from being a defense attorney, then being a defense attorney is the exposure or risk which accounts for 0.60 of the total risk for having a myocardial infarction.
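Using the same invented counts as above, the risk difference is one line of arithmetic:

    # Attributable risk (risk difference), with the same hypothetical counts.
    A, B, C, D = 30, 9_970, 10, 9_990
    risk_difference = A / (A + B) - C / (C + D)
    print(round(risk_difference, 4))   # 0.002: two excess cases per 1,000 exposed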
3. Strength of Association
When the quotient of a ratio measure of association is greater than 1, the exposure is positively associated with the effect. But because the range of values for positive associations runs from just above 1 to infinity, not all positive associations are the same. As a rule, the greater the value of the quotient, the stronger the association. The stronger the association, the less likely it is due to confounding variables.57 And the stronger the association, the more persuasive it is as a necessary, but not sufficient, indicium of general causation.58 Unfortunately, no reliable way exists of identifying a dividing line between significant and insignificant associations. Some epidemiologists would offer the following guidelines:
0.9-1.1 No effect
1.2-1.6 Weak hazard
1.7-2.5 Moderate hazard
> 2.6 Strong hazard
Some, including some courts, would require a quotient of 2 or more to credit an association as an indicium of causation.59 Yet many epidemiologists would suggest that, as a rule, only a quotient of three or four is significant.60
Study period divided into j intervals:

t0     t1     t2     t3    ...    t(j-1)     tj
|------|------|------|---- ... ----|---------|→
Start                                    Finish

Under incidence-density sampling, one or more controls are selected for each case from those at risk at the time of onset of the case (t0).

Under cumulative-incidence sampling, controls are selected from those still unaffected at time t1. Cases are selected from those developing the disease in the same population over a period of time (t0 to t1).
“Incidence-density ratio” is the ratio of two incidence densities. The “incidence density” is the expected number of new cases per unit of person-time at risk.
“Risk ratio” is the ratio of two cumulative incidence rates. A “cumulative incidence rate” is the expected proportion of a fixed population-at-risk that develops the disease over some specified period.
Under incidence-density sampling, as opposed to under cumulative-incidence sampling, the odds ratio is generally a better approximation to the risk ratio.
Incidence-density sampling is a more appropriate model in chronic disease epidemiology.
VI. Systematic Error or Bias
1. Systematic Error Defined
Systematic error or bias is defined generally as “any process at any stage of inference which tends to produce results that differ systematically from the truth.”61 So when error occurs randomly, it is called “measurement error.” But when it occurs non-randomly, that is, systematically, it is called “systematic error.”
The “validity” of an epidemiologic study is determined by the degree to which it lacks systematic error. As a result, the dogged hunt for systematic error is the crux of evaluating epidemiologic studies. For the defense, this is particularly important. In litigation, plaintiffs will often support their conclusions about causation with the results of case-control studies, a form of epidemiologic study highly susceptible to systematic error. These are the studies the defense will be vetting for systematic errors that result in a positive association. Errors to find are those which overstate the numerator and understate the denominator in the odds ratio.
Unfortunately, many kinds of systematic error may infect an epidemiologic study. And, often, merely reading the study will not reveal these biases. For instance, bias can occur when the epidemiologist reads up on the subject of interest; or specifies and selects the study sample; or executes the study; or measures exposures or effects; or analyzes the data; or interprets the analysis. To identify these biases, the defense will need the raw data of the study and will need to depose, whenever possible, the epidemiologist who conducted the study. Always remember: in epidemiology, the devil’s in the details.
2. Confounding
Systematic bias, besides being distinguishable from measurement error, is also distinguishable from “confounding.” A confounder is a variable causally related to the effect and associated with but not a consequence of the exposure. So to be a confounder, a factor must satisfy these three criteria: (1) it must be a risk factor for the effect; (2) it must be associated with the exposure; and (3) it must not be affected by the exposure or the effect.62 For instance, suppose that women infected with Lyme disease at younger ages tend to have longer incubation times. Suppose also that the women in the study groups are, on average, younger than women in the general population. These members of the study groups will, on average, have longer incubation times than members of the general population. In that event, age is a confounder for estimating risk in the general population, and confounding by age would result in underestimation of the proportion of women in the general population who will develop Lyme disease.
Following are three phenomena which may be confused with confounding: “intervening variables,” “effect modification” and “interaction.”
- Intervening Variables
Intervening or intermediate variables are defined as variables associated with the exposure in the causal pathway from exposure to the ultimate effect.63 These intervening variables are not confounders.
- Effect Modification
“Effect modification” (statistical interaction) is different from “confounding” and occurs when the magnitude of the rate or odds ratio varies with the value of a third variable.64 For example, suppose that eating ten donuts a day elevated the myocardial infarction rate of men in the study by a factor of 1.70, but elevated the rate of women by a factor of only 1.20. Gender modified the rate ratio. So gender is the effect modifier.
- Interaction
Interaction is also different from confounding, and is defined as that situation in which the incidence rate of disease resulting from two or more risk factors differs from the incidence rate expected to result from those risk factors individually.65 The exposure is associated with an effect, but the effect is lesser or greater depending on the interaction of the variables.
3. Major Types of Bias
The major types of bias are selection bias, information bias, and uncontrolled confounding.66
- Selection Bias
Selection bias can occur when the samples are selected for the epidemiologic study. When the procedure for selecting the samples is not random, the result is apt to be selection bias. Selection bias results in an observed relation between exposure and effect that is different among those in the study from among those who would have been eligible but were not chosen for the study.67 For instance, suppose an epidemiologist wanted to determine the proportion of people in the general population who enjoyed professional baseball games. To determine this, she stood at the turnstile of a baseball park and asked every other person through the turnstile whether or not they enjoyed professional baseball games. Most would say they enjoyed them. But they are not likely representative of the general population.
Be careful to distinguish selection bias from information bias. In a case-control study, selection bias is a product of the process of selecting cases and controls. Information bias is the product, once the relevant samples are selected, of determining who was exposed before onset of the effect. In cohort studies, selection bias is the product of the process of selecting those who were exposed and those who were unexposed. Information bias is the product, once the relevant samples are selected, of determining who developed the effect during the period of follow up. Selection bias results if, for example, the control group was selected in such a way that it would have proportionately fewer members exposed than a sample of controls drawn randomly from the population. Selection bias results if, for example, the case group was selected in such a way that it would have proportionately more members exposed than a sample of cases drawn randomly from the population. For example, selection bias will occur when the exposure in question becomes a defining characteristic of the effect. For instance, in the silicone breast implant litigation, plaintiffs and their experts fashioned a definition of the effect–“silicone breast implant disease”–which included the criteria of having silicone breast implants.
- “Berkson’s Bias”
Berkson’s bias, a species of selection bias, occurs when a group of cases is drawn from a hospital population if that group has a greater probability of having more than one medical problem than a group drawn from the general population.68 If the epidemiologist selects controls from only one of the following three populations: (1) the hospital population; (2) the general population; or (3) a military population, the result may be selection bias. The hospital population has the least healthy members. The military population has the most healthy members. So if the epidemiologist selects the unexposed group from the hospital population, the incidence of the effect in that group may be disproportionately greater, and the measure of association correspondingly understated.
- “Neyman’s Fallacy”
Neyman’s fallacy, another species of selection bias, occurs when “prevalent cases” are used as representative of “incident cases.” Prevalent cases, unlike incident cases, are affected by the duration of the disease which in turn is affected by treatment, cures and mortality.69
- Post Hoc Selection Bias
Post hoc selection bias results after the samples have been selected for the epidemiologic study.70 As a result of events that occur during the study, the composition of those samples is altered. For instance, members of a sample may drop out of the study or die or fail to respond to requests for information relevant to the study, or they may disclose information during the study which the epidemiologist needed to know before the study was underway.
- Ascertainment Bias
Ascertainment bias results when the epidemiologist fails to ascertain cases or controls who were exposed at the same rate as would have occurred had the cases or controls been selected randomly from the population.71 For instance, ascertainment bias occurs if the epidemiologist selects the cases from the clinic of a physician to whom plaintiffs’ counsel have referred all their prescreened clients with symptoms of fatigue and silicone breast implants, but then selects the controls from a clinic in a Shaker community.
- Diagnostic Bias
Diagnostic bias occurs when, in selecting cases for a case-control study, the clinician fails to diagnose cases at the same rate as would have occurred had true cases been randomly selected from the study population.72 Watch for the physician who cannot evaluate whether or not the patient has the effect when blinded to whether or not the patient was exposed. In that event, the result is apt to be over-diagnosis of that effect putatively associated with the exposure, that is, misclassifying a non-case as a case.
- Response Bias
Response bias occurs when those who agree to participate are different from those who decline to participate on the characteristic of exposure or effect.73 For instance, when those asked to be controls frequently decline to participate, the result may be selection bias if they have a greater prevalence of the effect than those who agree to participate. This kind of selection bias will harm the defendant. Simply, it will increase the strength of an association between exposure and effect.
- Information Bias
Once the appropriate samples have been selected, the epidemiologist must obtain information from the members of these samples to determine, for example, whether, in case-control studies, a member of the sample is exposed or unexposed or whether, in cohort studies, a member has the effect or does not have the effect.90 Errors in obtaining that kind of information result in misclassifications, and are called “misclassification errors.” Misclassification errors are of two types: differential and non-differential.
- Differential Misclassification
Differential misclassification occurs when the subjects who fall into one category are misclassified at a rate greater than are the subjects who fall into another category.75 That is, in case-control studies, cases are misclassified as exposed at higher rates than controls are misclassified as exposed. And in prospective cohort studies, the exposed are misclassified as cases at higher rates than the unexposed are misclassified as cases. For example, in a prospective cohort study, the effect was under-diagnosed in the unexposed group because, relative to the exposed group, members of that group visited physicians less often.
* Differential Misclassification of Exposure: This kind of misclassification results, for example, when cases are misclassified as being exposed more often than controls are misclassified as exposed. For instance, in a case-control study, epidemiologists determined exposure to silicone breast implants (SBIs) by reviewing medical records from a particular clinic. Yet, arguably, women may have had SBIs without that fact having been recorded in that clinic’s medical records. As a result, women with medical problems and SBIs may have been misclassified as not having SBIs, thereby underestimating the odds ratio.
* Differential Misclassification of Effect: This kind of misclassification results, for example, when exposed people are misclassified as having the disorder more often than unexposed people are misclassified as having the disorder. For instance, women with silicone breast implants (SBIs) were referred for diagnosis to physicians hired at considerable expense to testify for plaintiffs because these physicians believed a priori that SBIs caused autoimmune disease. These physicians diagnosed all women with SBIs with “SBI disease” and, of course, diagnosed none of the women without SBIs with the disease.
- Non-differential Misclassification
Non-differential misclassification is misclassification that is random.76 As a result, if the samples are large enough, the proportions of subjects misclassified are approximately equal in all study groups. Non-differential misclassification of exposure tends to shift the relative risk or odds ratio toward 1. In studies with a measure of association much greater than 1, concern about non-differential misclassification of exposure or effect is rarely warranted, because the estimate of association, absent the misclassification, could only be even greater, provided the misclassification probabilities apply uniformly to all subjects.
*Non-differential Misclassification of Exposure: This kind of misclassification results, for example, when some in the study fail to candidly acknowledge they were exposed. As a result, they are classified as non-exposed.
*Non-differential Misclassification of Effect: This kind of misclassification results, for example, when some in the study fail to acknowledge symptoms that would cause them to be diagnosed with the effect. As a result, they are classified as controls.
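A small calculation shows the shift toward 1. The sketch below starts from invented true counts and applies a 20% chance of recording the wrong exposure status, the same in both groups (that equal rate is what makes the misclassification non-differential):

    # Expected counts after non-differential misclassification of exposure.
    # True counts (hypothetical): exposed/unexposed among cases and controls.
    cases_exp, cases_unexp = 60, 40
    ctrls_exp, ctrls_unexp = 30, 70

    p = 0.20  # probability of recording the wrong exposure status, equal in both groups

    def misclassify(exposed, unexposed, p):
        # In expectation, a fraction p of each cell migrates to the other category.
        return (exposed * (1 - p) + unexposed * p,
                unexposed * (1 - p) + exposed * p)

    def odds_ratio(case_e, case_u, ctrl_e, ctrl_u):
        return (case_e * ctrl_u) / (case_u * ctrl_e)

    true_or = odds_ratio(cases_exp, cases_unexp, ctrls_exp, ctrls_unexp)
    obs_or = odds_ratio(*misclassify(cases_exp, cases_unexp, p),
                        *misclassify(ctrls_exp, ctrls_unexp, p))
    print(round(true_or, 2), round(obs_or, 2))  # 3.5 shrinks to about 2.08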
- Interviewer Bias
Interviewer bias occurs when an epidemiologist fails to gather information in the same way from the different study groups.77 In case-control studies, this bias is particularly at play in determining exposure. Simply, when the subject has the effect, that fact may prompt the interviewer to probe the subject’s past more thoroughly to find an exposure. For instance, in a case-control study, nurses reviewing the medical records knew which women had silicone breast implants (SBIs). As a result, if, on one hand, these nurses were biased against the idea that SBIs cause disease, they would look less intently for disease. If, on the other hand, they had sympathy for women with SBIs, they may have looked more intently for medical problems in the women with SBIs than in women without SBIs.
- Detection Bias
Detection bias occurs when in a cohort study, for example, those who were exposed are more likely to be examined for and diagnosed with the effect than those who were unexposed.78 For instance, women who took estrogen (the exposure of interest) might develop uterine bleeding as a result of the estrogen, and thereby visit a physician more often than those not taking estrogen. Because those taking estrogen have more gynecological examinations, they are more likely to be diagnosed with endometrial cancer (the effect of interest).
- Recall Bias
Recall bias occurs when members of the study groups recall past events with different rates of accuracy or completeness.79 In case-control studies, for example, epidemiologists need to assess whether or not the subjects were exposed. In that situation, recall bias is particularly likely when subjects are asked to recall whether or not they were exposed, especially if the subject is in litigation. This bias creates the potential for differential misclassification. For instance, if a member of the case group with the effect of fatigue was asked to recall whether she ingested an over-the-counter laxative in the last year, she may recall that she did more often than a subject in the control group without fatigue or any other worrisome symptoms.
- Reporting Bias
Reporting bias results when those in one group tend to be more or less likely to report information than those in another group.80 For instance, if those in the case group tend to report an exposure more often than those in the control group, the result may be reporting bias if the case group and the control group have the same prevalence of exposure. The different rates of reporting exposure could be due to the fact that the case group is in litigation and so has more carefully searched the past for exposure.
- Response Bias
Response bias results when those in one group tend to fail to respond to requests for significant information more often than those in another group.81 For instance, if those in the exposed group tend to fail to respond to questions on a questionnaire more often than those in the unexposed group, the result may be response bias.
- Uncontrolled Confounding
A confounder is a variable causally related to the effect and associated with the exposure in the study population, but not a consequence of the exposure.82 Confounders, when identified, are often controlled through the design of the study or through analysis of data. But when not identified, they remain uncontrolled, and become a source for a false positive association.
For example, in the polio-vaccine trials, the incidence of polio was clearly lower among unvaccinated children whose parents refused permission for injection than among children who received the placebo injection after their parents gave permission. Was the placebo injection of a purportedly harmless substance causing polio?
The answer is, it was not because a confounder was responsible for the difference. As it turned out, families who gave permission differed from those who did not in ways related to susceptibility to polio. Here are the reasons: (1) higher-income parents would consent to an injection more often than lower-income parents, and (2) children of higher-income parents are more vulnerable to polio than children of lower income parents. This is because polio is a disease of hygiene. Children who live in less hygienic surroundings tend to contract mild cases of polio early in childhood, while still protected by antibodies from their mothers. After being infected, they generate their own antibodies which protect them against more severe infection later. Children who live in more hygienic surroundings do not develop these protective antibodies.
4. Methods For Controlling Bias
For a valid epidemiologic study, it is essential to control for bias and confounding. But doing so effectively is difficult. More than technical expertise is required; also required is substantial insight into how the variables being studied might interact. So each epidemiologic study is only as reliable as the level of insight possessed by the epidemiologists who conduct that study. Unfortunately, before beginning a study, epidemiologists cannot always effectively specify all the variables they should control. As a result, they usually control for biases and potential confounders not through the design of the study but through what is called “analysis” of the data.83 For instance, they may analyze the data using techniques called “stratification” and “adjustment.” But even then they may find that their analyses work only when they have additional data beyond those from the study. Those additional data are usually absent or very limited. To meet this problem, they may have to resort to less satisfactory partial analyses or forgo any analysis for errors beyond sampling errors.
- Study Design
Before a study begins, experienced epidemiologists recognize that they need to control some potential confounders, such as gender or age. Usually, they will attempt to control these variables through the “design” of the study. For instance, they may design the study using “randomization,” “restriction,” or “matching.”
- Randomization
“Randomization” is a process by which subjects are “randomly” assigned from the population to the relevant study group.84 A random sample is one drawn in such a way that each member of the total group to be sampled has an equal chance of being selected. The goal of randomization is to create samples representative of the population. For instance, in cohort studies, randomization involves randomly assigning members who have been exposed to the exposed group and members who have not been exposed to the unexposed group. In case-control studies, randomization involves randomly assigning members who have the effect to the case group and members who do not have the effect to the control group.
Unfortunately, “randomization” has limitations. For instance, it is usually effective only in large studies. That is, the distribution of risk factors tends to become identical across the various study groups only as the size of the study groups increases.
- Restriction
“Restriction” is a technique to control for potential confounding by admitting only certain subjects into the study.85 For example, if the potential confounder is gender, only females might be admitted into the study. Unfortunately, like randomization, restriction also has limitations. For instance, it can compromise the external validity of a study. Simply, it tends to homogenize the study group. This tendency limits the ability to apply the results of the study beyond the study group to the more diversely constituted general population.
- Matching
“Matching” is the process of selecting controls in a case-control study (or unexposed subjects in a cohort study) so that they are similar to the cases (or exposed subjects in a cohort study) on characteristics considered to be potential confounders.86 Matching may be of two types: (1) group (or frequency) matching and (2) individual matching. In group matching, the controls are selected in a way that the proportion of controls with a certain characteristic is identical to the proportion of cases with that characteristic. In individual matching, a control is selected similar to the case with respect to the characteristics considered to be potential confounders. The result of matching is a series of groups of cases and controls, each group matched on potential confounders.
Matching also has limitations. First, and most importantly, the characteristics matched should themselves most probably be risk factors for the effect. Matching on characteristics beyond those results in a problem called “overmatching.”87 Overmatching can underestimate the relative risk or odds ratio. For the defense, this result is generally not a problem. Second, in case-control studies, matching introduces a bias toward the null value (OR = 1). That bias can be removed only through stratification. For the defense, this result is also generally not a problem.
- Analysis of Data
The analysis of raw data has several stages.88 First, the data are edited. Editing involves checking the data for accuracy, consistency and completeness. Second, the data are summarized. Summarizing usually involves pigeonholing the data using 2 by 2 tables. Third, measures of effect or association are estimated. This involves using confidence intervals and significance testing. It also involves controlling for systematic error and potential confounders, using techniques known as “stratification” and “adjustment.”
- Stratification
“Stratification” is a way to assess the effects of possible confounders, to control for certain forms of selection bias, and to evaluate and describe effect-measure modification.89 Stratification involves grouping the data from subjects into sub-samples based on whether a subject has or does not have a characteristic that is a potential confounder. For example, if the epidemiologic study is attempting to ascertain the association between diet and heart attacks, and gender is considered a potential confounder, the data can be stratified on the basis of gender. So all males will be in one stratum; all females will be in another stratum. For each stratum is calculated the measure of association between diet and heart attacks. That measure of association is termed “stratum-specific.” Contrast that measure with the measure of association calculated from the unstratified data, termed the “crude” or “pooled” estimate of association.
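A minimal sketch of stratification, with invented case-control counts and gender as the potential confounder. The pooled Mantel-Haenszel summary at the end is a standard way to combine stratum-specific estimates, though the text does not name it:

    # Hypothetical case-control counts, stratified by gender. Each stratum:
    # (cases exposed, cases unexposed, controls exposed, controls unexposed).
    strata = {
        "male":   (40, 20, 30, 30),
        "female": (15, 25, 10, 50),
    }

    # Stratum-specific odds ratios: 2.0 for males, 3.0 for females.
    for name, (a, b, c, d) in strata.items():
        print(name, round((a * d) / (b * c), 2))

    # Crude (pooled) odds ratio from the collapsed table: about 2.44.
    A, B, C, D = (sum(s[i] for s in strata.values()) for i in range(4))
    print("crude", round((A * D) / (B * C), 2))

    # Mantel-Haenszel summary odds ratio across the strata: about 2.33.
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
    print("adjusted", round(num / den, 2))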
- Adjustment
“Adjustment” is a statistical procedure to minimize the effects of differences in the composition of the samples in the study.90 Adjustment usually occurs by means of “standardization” or “regression modeling.”
- Standardization
“Standardization” is a way to remove, to some extent, the effects of confounding variables, such as age or gender, when comparing two or more populations.91 For instance, suppose an epidemiologist wants to compare the risk of death in various groups of people in different geographical areas while controlling for the difference in the risk of death resulting when one group in one geographical area (Palm Springs, California) is older than those in another group in a different geographical area (Portland, Oregon). To do this, the epidemiologist standardizes the distributions of age in both Palm Springs, California and Portland, Oregon by using the distribution of age in a “standard area” instead of the actual distributions of age in Palm Springs and Portland. This process “adjusts” the rates of death in those two geographical areas by controlling the confounder of age. The adjusted rates of death are then compared, and any difference in the rates is attributed to something other than age.
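A minimal sketch of direct standardization under the Palm Springs/Portland hypothetical; the age strata, stratum-specific rates, and standard-population weights are all invented:

    # Hypothetical age-specific death rates per 1,000 for each city.
    rates = {              # age stratum: (Palm Springs, Portland)
        "under 40": (2.0, 2.1),
        "40-64":    (6.0, 6.2),
        "65+":      (30.0, 31.0),
    }
    # Hypothetical standard population's age distribution (weights sum to 1).
    standard = {"under 40": 0.50, "40-64": 0.35, "65+": 0.15}

    # Age-adjusted rate: weight each stratum-specific rate by the standard
    # distribution, then sum. Any remaining difference is not due to age.
    for city, name in ((0, "Palm Springs"), (1, "Portland")):
        adjusted = sum(rates[age][city] * standard[age] for age in rates)
        print(name, round(adjusted, 2))   # 7.6 vs 7.87 per 1,000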
- Regression Modeling
“Regression modeling” is a statistical technique used to control confounding when stratification becomes impractical.91 For instance, as the number of strata increases, the subjects in the sample may become sparsely distributed across each stratum. When this occurs, regression modeling becomes a more practical way to estimate association. In case-control studies, for example, the most popular regression model is “linear logistic regression.”93 Linear logistic regression quantifies the association between an exposure and an effect after adjusting for other potential confounders. That is, it finds an equation of those independent variables that best predicts the effect.
In the logistic model, the conditional probability that disease (D) will occur given an exposure (E) is represented by the following linear function:

logit P(D|E) = ln [ P(D|E) / (1 − P(D|E)) ] = α + βE

where logit is an abbreviation of logarithmic unit; ln is the natural logarithm; P(D|E) is the probability of disease given exposure; α and β are the regression coefficients; and E may be dichotomized as 0 for no exposure and 1 for exposure, or expressed as a continuous variable. From the preceding equation is derived the following expression for the odds ratio:
OR = e^β
This simple model can be extended to account for potential confounders (Vi) and effect modifiers (Wj):

logit P(D|E, V, W) = α + βE + Σi γiVi + Σj δj(E × Wj)

where α and β are the regression coefficients; γi is the regression coefficient for the potential confounders and δj is the regression coefficient for the effect modifiers. (These regression coefficients may be estimated by either of two approaches: discriminant analysis or, the preferred approach, maximum likelihood estimation.)
From the preceding equation is derived the following expression for the odds ratio:
OR = exp (β + Σj δjWj)
The odds ratio is expressed as the exponential of the sum of the main effect and the effect modifiers.
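A minimal sketch of the estimation step, using the statsmodels library on invented data (one exposure and one continuous confounder, age); the coefficients and sample size are arbitrary:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 2_000

    # Invented data in which age confounds the exposure-disease relation.
    age = rng.uniform(20, 60, n)
    exposure = rng.binomial(1, 1 / (1 + np.exp(-(age - 40) / 10)))
    true_logit = -3.0 + 0.7 * exposure + 0.03 * age
    disease = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

    # Fit logit P(D) = alpha + beta*exposure + gamma*age by maximum likelihood.
    X = sm.add_constant(np.column_stack([exposure, age]))
    fit = sm.Logit(disease, X).fit(disp=0)

    # Adjusted odds ratio for the exposure: exp(beta), near exp(0.7) = 2.01.
    print(np.exp(fit.params[1]))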
- Post-Study Reanalysis
After a study is completed, an epidemiologist (usually other than the one who originally conducted the study) may reanalyze the data to correct for bias or confounders.94 For example, reanalysis may involve a change of control groups, stratification or re-stratification of the data, a change from a two-tailed to a one-tailed significance test, or an analysis of additional confounders. Reanalysis does require access to the raw data. That access will often be denied for a variety of reasons, including the need to preserve the independence of scientific investigation, confidentiality, and the need to preserve the integrity of the data.
Analysis of data is as much art as it is science. This is so because no one has a gold standard for choosing between alternative explanations of the data. One epidemiologist will favor one analysis of the data; another epidemiologist will favor another. As a result, reanalysis of the same set of data in epidemiology is, some say, similar to replication of experiments.95 Obtaining similar results helps validate the results. Obtaining different results from one analysis to another may be due to different data or to flaws in the original analysis. Reanalysis is a way to ferret out those analytical errors. Those errors include, for example, failure to control for certain confounders, failure to test for interactions of variables, and failure to use samples of sufficient size to provide the power to detect weak associations. Unfortunately, conflicting analyses of the same data will cast doubt on all analyses of those data.
In litigation, plaintiffs are fond of reanalysis. These reanalyses are rarely published, and so rarely subjected to peer review. So, for the defense, when plaintiff mentions the word “reanalysis,” suspect the worst.96 Even so, reanalysis of data is not per se objectionable. For example, an epidemiologist may validate a reanalysis by explaining why she chose only certain of the available data and disregarded the remainder. But, although not objectionable per se, reanalysis is objectionable when used as an epidemiologic sleight-of-hand to bamboozle a judge or jury. In that context, it often becomes a form of “data dredging.” That is, plaintiff’s forensic epidemiologist reanalyzes the data in a variety of permutations until an association is revealed (as it always will be) and then stops, declaring in stentorian tones the importance of this association in proving plaintiff’s claims.
VII. Sampling Error
1. Sample and Population
Epidemiologists strive to identify what exposures or risks in the population are associated with what effects or diseases. Obviously, if they could measure each relevant variable in every member of the population, the result would be relatively precise epidemiologic conclusions about what exposures or risks are associated with what effects in the population. But they usually cannot practicably measure every member of the population: it is too expensive, and not everyone would cooperate. And so epidemiologists look at subsets or samples of the population and measure the aspect of interest in the members of those samples alone. From that limited number of measurements, they then attempt to generalize to the population.
2. Sampling Error
If everyone in the population were the same (as in a population of electrons), what could be said about anyone could also be said about everyone. But experience demonstrates that such uniformity is rare. Usually, in some respects, one member of a population differs from every other member. As a result, epidemiologists recognize that when members of the population are randomly sampled, each member of the sample may differ from each other member of the sample and from the remaining members of the population. And if a number of samples are randomly selected from the population, each sample will differ, in some respects, from each other sample.
For example, of two hundred fifty random samples (each of 100 people) drawn from a population constituted of 46% men and 54% women in a health study, the number of men in each sample was as follows:
51 40 49 34 36 43 42 45 48 47 51 47 50 54 39 42 47 43 46 51 43 53 43 51 42 49 46 44 55 36 49 44 43 45 42 45 43 55 53 49 46 45 42 48 44 43 41 44 47 54 39 52 43 36 39 43 46 47 44 55 50 53 55 45 43 47 40 47 40 51 45 56 40 49 47 45 49 41 43 45 54 49 50 44 46 48 52 45 47 50 43 46 44 47 46 54 42 44 47 36 52 50 51 48 46 45 54 48 46 41 49 37 49 45 50 43 54 39 55 38 49 44 43 47 51 46 51 49 42 50 48 52 54 47 51 49 44 37 43 41 48 39 50 41 48 47 50 48 46 37 41 55 43 48 44 40 50 58 47 48 45 52 35 45 41 35 38 44 50 44 35 48 49 35 41 37 46 49 42 53 47 48 36 51 45 43 52 46 49 51 44 51 39 45 44 40 50 46 50 49 47 45 49 39 44 48 42 47 38 53 47 48 51 49 45 42 46 49 45 42 45 53 54 47 43 41 49 48 35 55 58 35 47 52 43 45 44 46
That each sample did not contain 46 men is due to a phenomenon called “chance”, the luck of the draw, that part of experience which is unpredictable. From this example is drawn an important lesson: a measurement on members of a sample will differ from a measurement on the members of the population. The difference is called “sampling error.”97 Sampling error is equal to the sample value minus the population value.
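The experiment is easy to reproduce in code. A minimal sketch, assuming the same 46% population proportion of men:

    import numpy as np

    rng = np.random.default_rng(42)

    # Draw 250 random samples of 100 people each from a population that is
    # 46% men, and count the men in each sample.
    men_per_sample = rng.binomial(n=100, p=0.46, size=250)

    print(men_per_sample.min(), men_per_sample.max())  # samples scatter around 46
    print(men_per_sample.mean())                       # near 46, but rarely exactly 46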
Most, if not all, epidemiologic studies involve measurements on some aspect of members of a sample and not on all members of the population. So if inferences from those measurements are to be accurate, an adjustment must occur for sampling error. If no adjustment occurs, an epidemiologist may find an association when, in fact, none exists (a false positive finding) or may fail to find an association when, in fact, one does exist (a false negative finding).
Even so, it is important to recognize that statistical techniques designed to assess sampling error do not assess whether the claimed results are due to systematic error or bias. As one epidemiologist remarked, “full epidemiologic analysis assesses bias, confounding, causation and chance. Of these, chance is least important but still receives most attention.”
When considering sampling, identify the following:
- Sample size;
- Number of samples of that sample size drawn from the population; and
- All possible random samples of sample size x from a population of size y.
3. Probability
Accounting for sampling error involves the concept of “probability.” Namely, what is the “probability” that what is true for the sample, given the phenomenon of sampling error, is also true for the population? But pinning down the meaning of the concept of probability is somewhat problematic. “Probability,” as a concept, has two basic conceptions: the conception of probability known as “relative frequency” and the conception of probability known as “subjective probability.”98 Each describes a method for assigning a probability to an event.
The conception of probability known as “relative frequency” is a method for assigning a probability to an event based on how often that particular event occurs in a process that generates a variety of events.99 Statisticians who favor this conception of probability are sometimes called “frequentists.” Frequentists believe that if a process (called a statistical experiment) is repeated n times and an event A is observed f times, then the probability of A is the quotient of f divided by n. But this quotient, under this conception of probability, is only an approximation. Fortunately, the approximate value is considered to approach the actual value if the statistical experiment is repeated many times. This is said to occur owing to the “Law of Large Numbers.” This law states that the probability that the arithmetic mean S_n/n will differ from its expected value μ by more than ε approaches zero as n → ∞.
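A quick demonstration of the law, assuming a fair coin (so the expected relative frequency of heads is 0.5):

    import numpy as np

    rng = np.random.default_rng(1)

    # The relative frequency of heads approaches 0.5 as n grows.
    for n in (10, 1_000, 100_000):
        flips = rng.binomial(1, 0.5, size=n)
        print(n, flips.mean())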
But what is the use of the frequentist conception of probability when the process of sampling is not repeated or not repeated often enough? For example, what is the probability that plaintiff Smith will lose her lawsuit against defendant Jones? A clever response to this concern is the conception of probability known as “subjective probability.” By this conception of probability, the probability assigned to an event is based not on the law of large numbers, but on subjective judgment or experience, limited, importantly, by the basic laws of probability.100 Those basic laws include, for instance, that the probability of an event lies in the range 0 to 1 and that the sum of the probabilities of all simple events is 1. Statisticians who favor this conception of probability are sometimes called “subjectivists” or, more esoterically, “Bayesians.”
“Probability is the very guide of life.”
Cicero, De Natura Deorum
4. Type I Error
Epidemiologists, to carry out their work, compare samples to determine whether these samples are different. That is, in case-control studies, they compare samples of cases with samples of controls. And in cohort studies, they compare samples of exposed subjects with samples of unexposed subjects.
In assessing these comparisons, epidemiologists can decide as follows:
(1) A difference exists, beyond that due to chance, between the samples. This assessment can be correct or incorrect. If it is correct, the epidemiologist has committed no error. If it is incorrect, the epidemiologist has committed what is called a Type I error.101 That is, the epidemiologist has falsely concluded that a difference exists, beyond that due to chance, between the samples (a false positive). This is a problem defendants typically dislike. The probability of committing this type of error is called “alpha.”
alpha = Pr(H₀ is rejected | H₀ is true), where H₀ is the null hypothesis
Epidemiologists conventionally set limits on that value of alpha which they consider acceptable for purposes of drawing valid inferences from the sample about the population. The value of alpha is conventionally set at .01, .025, .05 or .10. Most epidemiologists favor an alpha set at .05 owing to an overriding desire to avoid false positive results.102 Those who want alpha to be set at .10 or even .20 are more concerned with avoiding false negative results, and more tolerant of the risk of false positive results–that is, that sampling error will be counted as falsely representing that the exposure is positively associated with the effect. This is a risk plaintiffs are, of course, happy to accept.
(2) No difference exists, beyond that due to chance, between the samples. This assessment can be correct or incorrect. If it is correct, the epidemiologist has committed no error. If it is incorrect, the epidemiologist has committed what is called a Type II error.103 That is, the epidemiologist has falsely concluded that no difference exists, beyond that due to chance, between the samples (a false negative). This is a problem plaintiffs typically dislike.
5. Type II Error
A Type II error occurs, as mentioned, when the epidemiologist erroneously fails to reject the null hypothesis. The probability of committing this type of error is called “beta.”104
beta = Pr(H₀ is not rejected | H₀ is false)
The values of alpha and beta are interdependent. More precisely, they are inversely related: for a fixed sample size, the value of alpha cannot be lowered without raising the value of beta, and the value of beta cannot be lowered without raising the value of alpha. Beta is conventionally set at .20, a value higher than the value conventionally set for alpha (.05). This means the conventions tolerate a Type II error more readily than a Type I error. Obviously, this is a situation defendants favor and plaintiffs disfavor.
Following is an illustration of a Type I and Type II error. Plaintiff’s expert assayed the serum of a case group of 249 women with silicone breast implants (SBIs), and two control groups, one of 47 healthy women without SBIs, and another of 39 women with autoimmune disorders without SBIs. The expert claimed that 9 of the 249 women with SBIs in the case group had elevated levels or “titers” of antibodies to “protein-silicone complexes,” but none of the women in the control groups did. From these results, he concluded that he had developed a serologic test that detected antibodies to protein-silicone complexes.
Was this expert’s conclusion supported by the results of nine positive results out of 249 women with SBIs and zero positive results in his control groups? Actually, it was not. The expert failed to account for the effect of chance in interpreting the results of his serology test. He could have had nine positive results by chance alone from a sample size of 249. He could have had no positive results by chance alone in the control groups owing to their much smaller sizes. Because the sample size of the control groups was so small, the statistical test on these samples had too low a “power” to rule out the possibility that the negative results were false negatives.
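The point can be illustrated with the Fisher exact test (apt here, given the small expected cell counts). A rough sketch, assuming the counts reported above:

```python
from scipy.stats import fisher_exact

# Rows: women with SBIs vs. healthy controls;
# columns: positive vs. negative serology results.
table = [[9, 240],   # 9 of 249 women with SBIs tested positive
         [0, 47]]    # 0 of 47 healthy controls tested positive

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"P = {p_value:.3f}")  # if P exceeds alpha (.05), chance cannot be ruled out
```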
6. Assessing Sampling Error
Epidemiologists assess sampling error in two basic ways: (1) through “hypothesis testing” and (2) through “estimation.”105
- Hypothesis Testing
Hypotheses about the role of sampling error may be “tested” using either “critical values” or “probability values.”106
- Critical Values
Hypothesis testing using critical values is analogous to a test graded as either pass or fail. If sampling error likely accounts for the results of the study, the results are reported as “not statistically significant,” a failing grade. If sampling error does not likely account for the results of the study, the results are reported as “statistically significant,” a passing grade.
Hypothesis testing is designed to determine statistically whether the two samples (the case and control groups or the exposed and unexposed groups) are from the same population or, instead, from different populations. (If they are from the same population, the exposure is not associated with the effect.) This task is complicated by the phenomenon of “sampling error.” That is, two samples may appear different, yet be the same, with the apparent difference being due to sampling error. The epidemiologist tackles this problem by making an assumption: she assumes that the two samples are simply random samples from the same population. That is, she assumes, as true, the “null hypothesis.”
To test this assumption, she will then compute what is called a “test statistic.” This test statistic, a random variable, is a function of two basic variables: (1) the difference between the relevant values observed in the sample in the study and those values that would have been expected if the null hypothesis were true and (2) the amount of variability of the values from the sample in the study.107 For example, when rates and proportions are analyzed, involving discrete, binary variables (as they are in cohort and case-control studies), the test statistic is either chi-square, when the expected frequency of observations in each cell of the fourfold table is at least 5, or, in the Fisher exact test, the total number of exposed cases observed in the study, when the expected frequency of observations in each cell is less than 5.108
χ² = Σ [(observed number of individuals in cell − expected number of individuals in cell)² / expected number of individuals in cell], summed over all cells of the table
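A minimal sketch of this computation, using an invented fourfold table and the conventional critical value for alpha set at .05 with one degree of freedom:

```python
from scipy.stats import chi2

# Invented 2x2 table of exposure by disease.
observed = [[30, 70],   # exposed: diseased, not diseased
            [20, 80]]   # unexposed: diseased, not diseased

row = [sum(r) for r in observed]
col = [sum(c) for c in zip(*observed)]
n = sum(row)

# Chi-square: sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
stat = sum(
    (observed[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
    for i in range(2) for j in range(2)
)

critical = chi2.ppf(0.95, df=1)  # cut-off for alpha = .05, 1 degree of freedom
print(f"chi-square = {stat:.2f}, critical value = {critical:.2f}")
# The null hypothesis is rejected only if the statistic exceeds the critical value.
```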
This “test statistic” is used to test the null hypothesis. First, in the process of this test, the epidemiologist will construct a “sampling distribution” of the “test statistic.” To do this, she will draw all samples that can possibly be drawn from the population, calculating the test statistic for each such sample. So for that population, this sampling distribution will represent the range of values of the test statistic. When the epidemiologist plots the values of the test statistic for the sampling distribution, the y axis is calibrated with the relative frequency of a particular value of the test statistic (from 0 to 1) and the x axis with the value of the test statistic. (This plotted distribution will indicate the probability of obtaining a particular value of a test statistic on the basis of sampling error alone since all samples are assumed to be drawn from the same population.)
This range of values provides a standard of comparison. Against this standard is compared the value of the test statistic computed from the samples being investigated, for example, the case and control groups in the study. This is called the “observed value” of the test statistic. (Again, the epidemiologist is comparing the observed test statistic for the sample to determine whether it likely came from the same population or from a different population.) If the value of this test statistic is “small,” the epidemiologist concludes that the sample giving rise to that test statistic was drawn from the same population and so accepts the null hypothesis. If the value of the test statistic is “big,” she concludes that the sample giving rise to that test statistic was not drawn from the same population and so rejects the null hypothesis.
What is the boundary between small and big? This boundary is established by convention. The epidemiologist selects a cut-off value, the “critical value,” on the x axis of this sampling distribution. The points on the x axis on that side of the critical value nearest the tail of the sampling distribution are called the “rejection region.” This region contains only a small proportion of the possible values of the test statistic plotted in the sampling distribution (2.5% in each tail of a two-sided test with alpha set at .05). This rejection region represents a probability value called “alpha.” (Commonly used values for alpha are .01, .025, .05 and .10.) A value of a test statistic (plotted on the x axis) is considered “big” if it is larger than this cut-off value (also plotted on the x axis) and falls within this rejection region. For instance, if the cut-off value is 1.9 and the value of the observed test statistic is 2.4, the test statistic is considered “big.” The epidemiologist would then conclude that the value of the observed test statistic was “statistically significant,” and reject the null hypothesis.
On what is meant by “statistically significant” and “statistically non-significant,” consider the remark of a renowned statistician:
“The proper inference from a statistically significant result is that a nonzero association or difference has been established; it is not necessarily strong, sizable or important, just different from zero. Likewise, the proper conclusion from a statistically non-significant result is that the data have failed to establish the reality of the effect under investigation. Only if the study had adequate power would the conclusion be valid that no practically important effect exists. Otherwise, the cautious “not proven” is as far as one ought to go.”109
- Probability Values
Besides testing hypotheses with critical values, an epidemiologist or statistician may test hypotheses using the “probability-value” (or P-value).110 This method is similar to the preceding method using critical values. In this process, the first step is to state the null and alternate hypotheses. The second step is to select the appropriate sampling distribution for the test statistic. The third step is to determine the rejection and non-rejection regions by setting “alpha.” The fourth step is to calculate the value of the test statistic from the observed sample, and translate that value on the x axis into a probability or P-value. The P-value is the probability of obtaining a value of the test statistic as large as, or larger than, the one computed from the data when the null hypothesis is true. The fifth step is to reject the null hypothesis if the P-value is less than the value of alpha or accept the null hypothesis if the P-value is greater than the value of alpha. With this method, the decision to reject or accept the null hypothesis is based on comparing the P-value (a probability value for the test statistic computed from the observed sample) with alpha (another probability value conventionally established before the comparison).
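The same decision can be expressed through a P-value. A brief sketch, with a hypothetical value of the test statistic:

```python
from scipy.stats import chi2

alpha = 0.05
stat = 3.2  # hypothetical value of the test statistic from the observed sample
p_value = chi2.sf(stat, df=1)  # probability of a value at least this large under the null
decision = "reject" if p_value < alpha else "do not reject"
print(f"P = {p_value:.3f}, so {decision} the null hypothesis")
```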
Hypothesis testing is widely used in epidemiologic studies, but it is subject to the following criticisms: (1) hypothesis testing using critical or P-values is all or nothing decision-making, (2) the critical and P-values convey no information about the extent to which two groups differ or two variables are associated, (3) highly significant P-values can accompany negligible differences (if the sample sizes are large) and unimpressive P-values can accompany strong associations (if the sample sizes are small).111
- One-Sided or Two-Sided Hypothesis Test
The hypothesis test may be either one-sided (one-tailed) or two-sided (two-tailed).112 If the hypothesis test is one-sided, the hypotheses would be (1) the null hypothesis that either a negative or no association exists between exposure and effect and (2) the alternate hypothesis that the exposure is positively associated with the effect. In testing these hypotheses, if alpha is set at .05, the entire .05 would be allocated to the right side or “tail” of the sampling distribution.
If the hypothesis test is two-sided, the hypotheses would be (1) the null hypothesis that no association exists between exposure and effect and (2) the alternate hypothesis that an association, either positive or negative, exists between exposure and effect. In testing these hypotheses, if alpha is set at .05, the .05 would be split, with .025 being allocated to the left side or tail of the sampling distribution and .025 being allocated to the right side of the sampling distribution. If alpha is set at .10, the .10 would be split, with .05 being allocated to the left tail and .05 being allocated to the right tail. Some epidemiologists may set alpha at .10 considering that at that level it is equivalent to a one-sided test with alpha set at .05.
If the direction of the tested association can be predicted in advance, use of the one-sided test is justified. Otherwise, the epidemiologist should use the two-sided test. The defense prefers the two-sided test, particularly when the outcome of interest is strictly one-sided. Plaintiffs, of course, prefer the one-sided test, arguing that no one in the litigation is asserting that the exposure has a potentially helpful effect, only a harmful one. For the defense, this can be a sensitive issue when the result of the epidemiologic study is a positive association, statistically significant with a one-sided test but not with a two-sided test. In that event, the reasons for using the two-sided test should be at the ready.
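A short sketch, assuming a z statistic with a standard normal sampling distribution, shows why the two-sided test is the harder test to pass: its cut-off value lies farther out in the tail.

```python
from scipy.stats import norm

alpha = 0.05
print(f"one-sided cut-off: {norm.ppf(1 - alpha):.2f}")      # about 1.64
print(f"two-sided cut-off: {norm.ppf(1 - alpha / 2):.2f}")  # about 1.96
```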
- Estimation
Estimation is a statistical procedure to account for sampling error. By means of estimation, a value of a sample statistic is used to estimate the corresponding value of the population parameter.113 An estimate may be a “point estimate” or an “interval estimate.” A “point estimate” is the value of the measure of association calculated from the samples used to estimate the measure of association for the population. For instance, the point estimate of the odds ratio for the sample might be 2.67. The point estimate, being an estimate of the value for the population or “parameter,” may or may not be accurate. It has a margin of error.
To quantify this margin of error, the epidemiologist constructs around the point estimate an estimated “interval.” This interval, called the “confidence interval,” brackets the parameter, to some degree of probability. This degree of probability is called the “level of confidence.” The level of confidence is expressed as a percentage, (1 − alpha) × 100%. This confidence interval provides a range of values among which, it is hoped, is the true value of the parameter. For instance, a 95% confidence interval for the odds ratio might be 1.03 – 6.9. The confidence level of 95% gives the percentage of all such possible intervals that will actually include the true parameter. Commonly used confidence levels are 99%, 95% and 90%. The width of the confidence interval (1.03 to 6.9) reflects the precision of the estimate. The narrower the interval, the more precise the estimate. The wider the interval, the less precise the estimate. The width of the confidence interval will narrow if the confidence level is lowered or if the size of the sample is increased. Of course, the preferred alternative, to improve precision, is not to lower the confidence level but to increase the size of the sample.
The following formula indicates how many possible intervals exist for a population of a certain size with a sample of a certain size being drawn from that population (one interval for each possible sample):

Population size! / [Sample size! × (Population size − Sample size)!]
The specific 95% confidence interval associated with a given set of data may or may not actually include the true parameter. No one can know for sure one way or the other. The specific 95% confidence interval obtained depends on the specific random sample drawn from the population. So each sample will have a different 95% confidence interval. But one can be sure that in the long run, 95% of all possible 95% confidence intervals will include the true parameter.
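A small simulation (the population proportion, sample size, and number of repetitions are invented) illustrates this long-run property:

```python
import math
import random

# Repeat a study 2,000 times; count how often the 95% confidence
# interval for a sample proportion brackets the true population value.
random.seed(3)
p_true, n, z = 0.30, 500, 1.96
trials, covered = 2_000, 0
for _ in range(trials):
    hits = sum(random.random() < p_true for _ in range(n))
    p_hat = hits / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - half_width <= p_true <= p_hat + half_width:
        covered += 1
print(f"coverage: {covered / trials:.3f}")  # should land near 0.95
```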
For example, if the exposure is benzene and the effect is leukemia, the results of a case-control study may be as represented in the following 2 by 2 table:
                 Exposed to Benzene    Not Exposed    Total
Leukemia  Yes            24                  6          30
          No             90                 60         150
          Total         114                 66         180
For these data, the estimated odds ratio is (24)(60)/(90)(6) = 2.67. The sampling distribution of the estimated odds ratio will be non-normal with a positive skewness. For skewed distributions, the log transformation will result in a distribution with a more normal shape. For this reason ln (OR) is used for calculating confidence intervals.114
A confidence interval estimate for the population odds ratio is obtained by first taking the natural logarithm (ln) of the estimated odds ratio, then using the standard error of ln (OR) to construct a confidence interval for ln (OR), and finally exponentiating to obtain a confidence interval for the odds ratio.
The log odds ratio is ln (2.67) = 0.981. The estimated variance of the log odds ratio is:
Var [ln (OR)] = 1/24 + 1/90 + 1/6 + 1/60 = 0.236.
The 95% confidence interval for ln (OR) is:
0.981 – 1.96√0.236 ≤ ln (OR) ≤ 0.981 + 1.96√0.236
0.029 ≤ ln (OR) ≤ 1.93
Exponentiating to retransform the odds ratio to original units:
e^0.029 ≤ OR ≤ e^1.93
1.03 ≤ OR ≤ 6.9
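For readers who want to verify the arithmetic, the entire calculation can be reproduced in a few lines of Python using the figures from the benzene table above:

```python
import math

# a = exposed cases, b = unexposed cases, c = exposed controls, d = unexposed controls
a, b, c, d = 24, 6, 90, 60

or_hat = (a * d) / (b * c)               # (24)(60)/(6)(90) = 2.67
log_or = math.log(or_hat)                # 0.981
se = math.sqrt(1/a + 1/b + 1/c + 1/d)    # sqrt(0.236)
lo = math.exp(log_or - 1.96 * se)
hi = math.exp(log_or + 1.96 * se)
print(f"OR = {or_hat:.2f}, 95% CI {lo:.2f} - {hi:.2f}")  # about 1.03 to 6.9
```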
Confidence intervals can also be used to test hypotheses.115 (This is an alternative to hypothesis testing with critical values or P-values.) If the confidence interval associated with a set of data includes the null value for that statistic (e.g., 95% CI 0.90–2.5), the epidemiologist concludes that insufficient evidence exists to reject the hypothesis of no effect, with P > alpha. If the confidence interval does not include the null value for that statistic (e.g., 95% CI 2.5–6.9), the epidemiologist concludes that sufficient evidence exists to reject the hypothesis of no effect, with P < alpha. Even so, the epidemiologist cannot say where within that interval lies the true parameter. So, if the confidence interval contains the null value for that statistic, the evidence is not sufficient to rule out the possibility that the null hypothesis is true.
Two advantages exist for testing hypotheses with confidence intervals: (1) the null hypothesis can be rejected when the confidence interval does not include the null value of the statistic; and (2) information about the magnitude of the value of the statistic is provided, identifying small but non-null values for the statistic arising from large samples.116
Confidence intervals have a limitation. Namely, they are based on the assumptions of the relative frequency conception of probability. That is, for example, if the sampling in the study were repeated 100 times, about 95 of the repetitions would produce 95% confidence limits bracketing the true parameter. So the confidence limits of a single study do not allow one to say that the 95% confidence limits contain the true parameter with 95% probability.117
Today, most epidemiologists provide both confidence intervals and P-values. For example, in a retrospective cohort study about the relationship between exposure to silicone in silicone breast implants and scleroderma, the statistical values were reported as follows:
Relative Risk    95% Confidence Interval    P
1.84             0.98 – 3.46                .060
This study indicates that the 95% confidence interval contains the null value of the statistic (RR = 1) and so the null hypothesis cannot be rejected. The study also indicates that the relative risk of 1.84 was not statistically significant, with a P-value of .060.
Each epidemiologic study should describe its statistical methods. Yet, typically, this description will be brief, too sketchy perhaps to help assess whether or not those methods are appropriate. Even with more detail, the reader may not be able to double check the statistical analysis without having the raw data of the study. In the typical case-control study, for example, the description of the statistical method is elliptical. For instance, in a study exploring the relationship between coffee consumption and pancreatic cancer, the authors described the statistical method as follows:
“Tests of significance and estimates of adjusted relative risks and their confidence limits were derived with the method of Mantel and Haenszel and its extension. That data were stratified by age in 10-year groups and by sex where appropriate. All confidence limits are 95 per cent intervals.”118
The test statistic is the Mantel-Haenszel test statistic following a chi-squared distribution with one degree of freedom.
Logarithms are used when the values of the independent variables are so much larger than the values of the dependent variables that the disparate magnitudes of the scales on the x and y axes would otherwise be difficult to work with.
The likelihood, when evaluated for a particular value of the parameter, can turn out to be a very small number. So it is more convenient to use the natural logarithm of the likelihood in place of the likelihood.
Hypothesis testing of the odds ratio using the chi-squared statistic follows a chi-squared distribution with one degree of freedom. That distribution is non-normal and positively skewed. Because the rejection region lies entirely in the upper tail of that distribution, a test with alpha set at .05 is one-tailed in form.
The confidence limits of an estimated odds ratio can be calculated using one of several methods.
To recap, with this method, the decision to reject or accept the null hypothesis is based on comparing the value of the test statistic for the observed sample with the value of the test statistic at that point on the x axis of the sampling distribution known as the “critical value.” The critical value sets a boundary point on the x axis. The values from that critical value along the x axis to the closest tail of the sampling distribution are the values for which the null hypothesis will be rejected; they lie under the area of the sampling distribution called “alpha.” If the value of the test statistic computed from the observed sample falls on that portion of the x axis, in the rejection region, then the epidemiologist will reject the null hypothesis. If otherwise, she will conclude the value of the observed test statistic was not statistically significant, and not reject the null hypothesis.
7. Power and Sample Size
Sometimes, epidemiologic studies will have the weakness of low “power.” “Power” is defined as the probability of correctly rejecting the null hypothesis.119 It is equal to one minus “beta.” Beta is the probability of erroneously accepting the null hypothesis. The power of a statistical test should be as high as possible. The higher the power, the greater the likelihood that small effects, if they exist, will be detected.120
Power can be increased by reducing beta. The smaller the value of beta, the greater the value of one minus beta, or power. But because the value of beta is interdependent with the value of alpha, given a fixed sample size, reducing the value of beta increases the value of alpha. The greater the value of alpha, the greater the probability of erroneously rejecting the null hypothesis. This is a dilemma, but a dilemma with a solution. The solution is to increase the size of the sample. Only by doing that can power be increased without sacrificing precision.121 Yet herein lies a further dilemma. When the effect and the exposure are both rare, achieving adequate power requires very large samples.122
The optimal size of a sample is usually determined by formulas. To determine the smallest value for a measure of association that can be detected with a specific power, given a fixed sample size, consult S.D. Walter, Determination of Significant Relative Risks and Optimal Sampling Procedures in Prospective and Retrospective Comparative Studies of Various Sizes. Am. J. Epidemiology, 105:387-397 (1977). The size of the sample should be assessed not only at the time of its selection but also throughout the course of the study.
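Where a closed-form formula is not at hand, power can also be approximated by simulation. A crude sketch (the exposure rates and group sizes are invented) illustrates how power rises with sample size:

```python
import math
import random

def power_sim(p_exposed, p_unexposed, n_per_group, trials=2_000, z=1.96):
    """Share of simulated studies that reject the null at alpha = .05 (two-sided)."""
    rejections = 0
    for _ in range(trials):
        x1 = sum(random.random() < p_exposed for _ in range(n_per_group))
        x2 = sum(random.random() < p_unexposed for _ in range(n_per_group))
        pooled = (x1 + x2) / (2 * n_per_group)
        se = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
        if se > 0 and abs(x1 - x2) / n_per_group / se > z:
            rejections += 1
    return rejections / trials

random.seed(4)
for n in (50, 200, 800):  # power rises with the size of each group
    print(n, power_sim(0.10, 0.05, n))
```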
An epidemiologic study will often refer to its power to detect quotients of various strengths. For instance, an epidemiologic study on breast implants contained the following remark about its power:
“For breast implants, considering only white cases and controls, we had 80% power to detect an OR of 4.0, assuming alpha = 0.05 (2 sided) and a prevalence of breast implants of 10/1000. When considering all cases and controls in the analysis, we had 80% power to detect an OR of 2.0 or greater for a given risk factor, or an OR of 0.29 or smaller for a given protective factor, assuming a prevalence of exposure of 5/100 among controls (e.g., the approximate prevalence we observed for exposure to silicone caulk, glues, or sealants was 4/100).”
What is the appropriate ratio of controls to cases? When the number of cases is limited, increases in the number of controls increase power until a ratio of 4 to 1 or 5 to 1 is reached. After that, gains in power usually become too small to be worthwhile. Under equal allocations to the case and control groups, when the power is extremely small (<0.1) or large (>0.9), increasing the number of controls will be unhelpful.123
8. Meta-Analysis
To overcome the flaw of low power, an epidemiologist may resort to a technique that combines the results of several epidemiologic studies. This technique is called “meta-analysis.”124 Meta-analysis involves the following six basic steps: (1) defining the problem and the criteria for admitting epidemiologic studies into the meta-analysis; (2) locating these epidemiologic studies; (3) classifying and coding important characteristics of those studies; (4) quantitatively measuring those characteristics on a common scale; (5) aggregating the findings of those studies and relating those findings to the characteristics of the studies; and (6) reporting the results of the meta-analysis. Good examples of meta-analysis are the studies by Heyland, D.K. et al., Total Parenteral Nutrition in the Critically Ill Patient. JAMA, 280:2013-2019 (1998) and by Janowsky, E.C. et al., Meta-Analyses of the Relation Between Silicone Breast Implants and the Risk of Connective-Tissue Diseases. NEJM, 342:781-790 (2000).
Statistical analysis of the results of those various studies would include the following: (1) calculating summary descriptive statistics across the epidemiologic studies and averaging those statistics; (2) calculating the variance of a statistic across studies; (3) correcting the variance by subtracting sampling error; (4) correcting the mean and variance for study artifacts other than sampling and measurement errors; and (5) comparing the corrected standard deviation to the mean to assess the size of potential variation across those studies.125
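To illustrate the aggregation step, here is a minimal sketch of inverse-variance (fixed-effect) pooling of log odds ratios; the study results are invented, and a real meta-analysis involves considerably more care (heterogeneity assessment, bias evaluation, and so on):

```python
import math

# (odds ratio, standard error of ln OR) for three invented studies.
studies = [(1.4, 0.30), (2.1, 0.45), (0.9, 0.25)]

# Weight each study by the inverse of its variance on the log scale.
weights = [1 / se ** 2 for _, se in studies]
pooled_log_or = sum(
    w * math.log(or_) for (or_, _), w in zip(studies, weights)
) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

lo = math.exp(pooled_log_or - 1.96 * pooled_se)
hi = math.exp(pooled_log_or + 1.96 * pooled_se)
print(f"summary OR = {math.exp(pooled_log_or):.2f}, 95% CI {lo:.2f} - {hi:.2f}")
```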
Meta-analysis of observational studies has its detractors.126 As a prominent epidemiologist remarked, “the meta-analysis of non-randomized observational studies resembles the attempt of a quadriplegic person to climb Mount Everest unaided.” The reason for this pessimism is that observational studies, particularly case-control studies, are likely to harbor biases and unmeasured confounders. These serious deficiencies cannot be overcome by carefully combining the data. Moreover, in meta-analysis, it’s important to have identified all relevant studies. That task, unfortunately, cannot always be accomplished owing to what is called “publication bias.”127 Publication bias refers to the phenomenon in which studies with positive results are more likely to be published than those with null or negative results. This bias concerns the defense. Simply, it tends to result in finding, through meta-analysis, that the exposure is positively associated with the effect.
The accuracy of meta-analysis can be evaluated by comparing its results to a gold standard. In medical research, the gold standard is the large randomized, controlled clinical trial. So meta-analysis would become more credible if the results of meta-analyses agreed with the subsequent outcomes, on the same topic, of large randomized, controlled clinical trials. That prospect was recently studied, and it was found that meta-analyses failed to predict the outcomes of twelve large randomized, controlled trials 35% of the time.128 Obviously, that result raises serious questions about the accuracy of meta-analyses.
Yet that study does not entirely undermine use of meta-analysis. It does suggest that summarizing information from these various epidemiologic studies into a single odds ratio may be unproductive. But meta-analysis may be productive to the extent it facilitates a careful analysis of the relevant epidemiologic studies, evaluates the consistency of their results, and finds these results to be consistent.129 So the defense should expect an epidemiologist conducting a meta-analysis to individually evaluate each study for possible biases and confounders, and, on the basis of those evaluations and the quality of the study, consider the evidential weight of each study in the overall analysis and then identify the probable direction and magnitude of the biases in the estimate of the summary odds ratio or relative risk.
9. “Alpha” and the Burden of Proof
The level of alpha is sometimes equated with the legal burden of proof.130 Those who equate the two often complain that to set alpha at .05 is too stringent a requirement in light of a much less stringent requirement for the legal burden of proof set at .5+. But this criticism misconstrues both alpha and the legal burden of proof. The two simply do not equate.131
The legal burden of proof is that weight of evidence, of all the evidence introduced into evidence, which plaintiff must have in its favor to prevail on that issue of fact. In a civil trial, that weight is to be “the greater weight.” So if the total weight of all the evidence introduced is 100% of the weight, then the greater weight is more than 50% of that total weight.
That legal burden of proof entails a measure of probability which the jury applies subjectively. But the jury does not apply that measure according to the conception of probability known as “subjective probability.” Nor does it apply that burden to each piece of evidence. Instead, it applies that standard after all the evidence is introduced, and then only to the corpus of evidence for each party. Moreover, the jury is typically instructed not to base its decision on speculation.
The level of alpha is different. Alpha is a variable in a calculation to determine sampling error. It means, if set at .05, that if the exposure is assumed not to cause the effect, less than a 5% chance exists of observing a value of the test statistic at least as extreme as this particular value. Obviously, then, a concern about alpha is a concern about the extent to which sampling error (that is, speculation) underlies a conclusion about an association between exposure and effect. Again, in that context, alpha is a concern about one piece of evidence. If many pieces of evidence were introduced, each with alpha set at .05, the probability that at least one piece of the corpus is due to chance would be 1 − (.95)^n, with n representing the total number of pieces of evidence. With fourteen such pieces, for example, that probability already exceeds one-half.
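The growth of that probability, assuming the pieces of evidence are independent, is easy to tabulate:

```python
# Probability that at least one of n independent pieces of evidence,
# each with alpha = .05, reflects chance alone: 1 - (0.95)^n.
for n in (1, 5, 14, 30):
    print(n, round(1 - 0.95 ** n, 2))
```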
Sampling error, moreover, is merely one variable among many that may affect the validity of a conclusion about an association and, derivatively, about causation. Other variables affecting validity include, for example, systematic bias, confounding, and the use of inappropriate statistical analyses. The legal burden of proof is concerned indirectly with alpha, but also with these other variables. So the legal burden of proof is better equated with the probability of having certain values for all these variables than solely with the value of alpha.
VIII. Causation
Once an exposure has been shown to be associated with an effect, epidemiologists are often interested in building upon that association in an effort to argue that the exposure caused the effect. In this effort epidemiologists are often guided by criteria known as the Bradford-Hill criteria. If an epidemiologic study reports that the exposure is positively associated with the effect, plaintiff will certainly assert that this reported association demonstrates that the exposure causes the effect. If the study is a case-control study, that assertion will almost always be premature and probably ultimately incorrect. Of course, if an epidemiologic study reports that the exposure is not positively associated with the effect, defendant will assert that the study does truly demonstrate that the exposure does not cause the effect. In so doing, defendant will likely be justified.
1. Causation Defined
Causation is a concept with two important conceptions: the legal and the scientific conceptions of causation.
- Legal Conception of Causation
A prima facie element of all personal injury cases, to be proved by a preponderance of the evidence, is that the alleged tortious conduct caused the alleged harm or injury. Legal causation is assessed with either the “but for” or the “substantial factor” rule.
- “But For” Rule
The “but for” rule provides that defendant’s conduct is a cause of the event if that event would not have occurred but for defendant’s conduct, that is, if that event would not have occurred without defendant’s conduct.148 The “but for” rule has uncertain application when, for example, two defendants each initiate a cause and those two causes concur to bring about an event and neither cause operating alone would have been sufficient to cause that event.
- Substantial Factor Rule
The “substantial factor” rule was developed to avoid that uncertainty. It provides that defendant’s conduct is a cause of the event if that conduct was a material element and substantial factor in bringing it about.149 This rule can be also stated as providing that when the conduct of two or more actors is so related to an event that their combined conduct, viewed as a whole, is a “but for” cause of the event, and application of the “but for” rule to them individually would absolve all of them, the conduct of each is a cause of the event.
- Scientific Conception of Causation
The scientific conception of causation is varied and invariably complex.150 Given that, the aim here is not to outline in detail various philosophically sophisticated explanations of the concept of causation. Instead, it is merely to provide notice that the concept of causation in science, because it is varied and complex, should not be taken for granted. Yet, however nuanced, that scientific conception certainly contemplates that the cause of any event or effect must consist of a constellation of components acting in concert. A cause of an effect, then, is a set of minimal conditions and events that inevitably produce the effect.151
- Conditional Theory of Causation
Some conceive of causation in terms of “sufficient” and “necessary” conditions. Cause A is a “necessary” condition of effect B if, whenever A does not exist, B does not exist. Cause A is a “sufficient” condition of effect B if, whenever A exists, B exists. A causes B just when A is necessary and sufficient for B. On its face, this definition of causation has the merit of logical precision but, ultimately, it is too narrow.152
- J. L. Mackie’s Modification of the Conditional Theory
J.L. Mackie, a philosopher at Oxford, refined the conditional theory of causation.153 He proposed that A is the cause of B when, given certain other conditions, A is sufficient for B. That is, “in the circumstances,” A is sufficient for B. While A is itself not necessary for B, it is a necessary part of a wider condition. And while A is not sufficient for B, this wider condition is sufficient for B. That is, A is an insufficient but necessary part of a set of conditions which is sufficient though not necessary for B.
- Counterfactual Theory of Causation
Some want to highlight the fact that an event could be caused by any number of other events, but that that event would not have occurred in the circumstances at hand had a particular event not occurred.154 This particular causative event is a necessary condition in these circumstances for the effect. This necessary condition in these circumstances is termed a “counterfactual” condition. That is, if A caused B, then, given the circumstances, if A had not occurred, B would not have occurred. A caused B when a chain of counterfactually dependent events link A and B. An epidemiologist would say, then, that “a cause of a disease event is an event, condition, or characteristic that preceded the disease event and without which the disease event either would not have occurred at all or would not have occurred until some later time.”
The “cause” of an effect is an antecedent event, condition, or characteristic necessary for the occurrence of the effect given that other conditions are fixed.
“Causal co-action” or “joint action” is the participation of two component causes in the same sufficient cause to produce the effect.
The cause is a part of a wider set of conditions which suffices for its effect.
The cause of any effect must consist of a constellation of component causes acting in concert. For biological effects, most of the components are unknown.
A “sufficient” cause is a set of minimally necessary conditions and events inevitably producing the effect.
The “causal complement” of a factor is the set of conditions necessary and sufficient for the factor to produce the effect.
2. General and Specific Causation
“General causation” is distinguished from “specific” causation. General causation refers to the causal relationship between the exposure and the effect: Does the exposure cause this effect in those exposed? For instance, does L-tryptophan manufactured by Showa Denko KK cause connective tissue disorders in people who ingested Showa Denko KK’s L-tryptophan? Proof of general causation often requires epidemiologic evidence.155 When epidemiologic studies are proffered to prove general causation, plaintiff must show that these studies “fit” the issues of causation in this particular case.156 That is, she must establish, for example, that she was exposed to the same substance as the subjects in the epidemiologic studies, that her exposure or dose levels were comparable to theirs, that her exposure occurred before onset of the disorder, and that the onset of her injury was similar to that of the subjects in the studies.
“Specific causation” refers to the causal relationship between an exposure and an effect, given a relationship of general causation, in a particular individual: Did this particular exposure of this plaintiff cause this particular effect? For instance, did the L-tryptophan that plaintiff bought from this store and consumed by her on this date cause these particular signs and symptoms diagnosed by her physician as the connective tissue disease known as scleroderma?
Epidemiological studies cannot be proffered to directly prove “specific causation.”157 This is so simply because epidemiologic studies indicate what on the average is occurring in the study population. In that population, for some people, the exposure is associated with the effect; for others, it is not. That is, some people have a predisposition which, upon exposure, makes them susceptible to the effect. Others lack that predisposition. Indeed, some would have developed the effect even if not exposed.
These kinds of distinctions for specific people are not sorted out in epidemiologic studies. Instead, epidemiologic studies can only demonstrate that, generally, an exposure is associated or is not associated with an effect. So, in keeping with that, an epidemiologist should not be allowed to testify about specific causation except to acknowledge that she has no expertise in clinical medicine to assess whether or not a particular exposure caused a particular effect.
3. Proof of General Causation
Proof of general causation involves two basic considerations. The first is what rules of inference are appropriate in assessing whether the epidemiologic data are a sign that this kind of exposure can cause this kind of effect. The second is what legal rules apply, at trial, in proving causation.
- Rules of Inference
How does the epidemiologist bridge the gap between the results of an epidemiological study reporting a positive association and the conclusion that the exposure caused the effect? One way is by the process of induction.158 Adherents of this process are called “inductivists.” Another way is by the process of deduction.159 Adherents of this process are called “deductivists.” Epidemiologists, in approaching the issue of assessing causation, tend to be either inductivists or deductivists.
- Inductive Criteria
Inductivism is the doctrine that science begins with observations and then moves to generalizations about those observations in the form of laws and theories and then to predictions entailed by those theories and then to tests of those predictions with further observations to determine whether the theory is valid. For example, after observing that all ravens they have ever seen are black, ornithologists infer the law that all ravens are black, and predict that all ravens anyone will ever see will be black, and then confirm or disconfirm that prediction through further observations of ravens.
To infer causation, epidemiologists often rely upon inductive criteria. The original inductive criteria fashioned to establish causation from exposure to biological agents are known as the Henle-Koch postulates. They are: (1) the parasite occurs in every case of the disease in question and under circumstances which can account for the pathological changes and clinical course of the disease; (2) it occurs in no other disease as a fortuitous and nonpathogenic parasite; and (3) after being fully isolated from the body and repeatedly grown in pure culture, it can induce the disease anew.160
Over the years these Henle-Koch postulates were adapted to include a variety of exposures. Currently they have been generalized into a widely used set of inductive criteria termed the “Bradford-Hill” criteria.161 Here are those nine criteria:
(1) Strength of Association: The stronger the association, the more likely the exposure caused the effect. A strong association is unlikely due to one weak unmeasured confounder or other source of modest bias. This criterion is neither necessary nor sufficient for causation.
(2) Consistency: Repeated observation of an association in different populations in different circumstances suggests causation. Ideally, many studies with different architecture should produce results that converge.162 When that occurs, the corpus of epidemiologic evidence can be said to be reliable. Consistency, some say, is not a necessary criterion of causation, but serves only to rule out hypotheses that the association is attributable to some factor that varies across studies.
(3) Specificity: Specificity requires that an exposure, if causal, produce a single effect, not multiple effects. This criterion, many epidemiologists recognize, is neither necessary nor sufficient for causation. Simply, single events or conditions may have many effects.
(4) Temporality: The cause must precede the effect. This criterion is necessary for causation. But it is not sufficient: As Shakespeare wrote, “I have heard the cock, that is the trumpet to the morn, doth with his lofty and shrill-sounding throat awake the god of day… .”
(5) Biologic Gradient: Biologic gradient refers to the presence of a monotonic, that is, unidirectional, dose-response curve. This criterion is neither necessary nor sufficient for causation.
(6) Plausibility: Plausibility refers to the biologic likelihood that the exposure caused the effect. This criterion is neither necessary nor sufficient for causation.
(7) Coherence: Coherence, like the criterion of bioplausibility, requires that the hypothesis about causation not conflict with what is known about the natural history and biology of the disease. This criterion is neither necessary nor sufficient for causation.
(8) Experimental Evidence: Experimental evidence from available sources corroborates the hypothesis of causation. This criterion is neither necessary nor sufficient for causation.
(9) Analogy: Analogy refers to the similarity between the association at issue and other associations which are considered more firmly to be cause-and-effect relationships. This criterion is neither necessary nor sufficient for causation.
Some epidemiologists have taken these Bradford-Hill criteria and categorized them by the weight of their importance.163 This scheme of categorization is as follows:
“Guidelines for Evaluating the Evidence of a Causal Relationship. (In Each Category, Studies are Listed in Descending Priority Order).
1. Major Criteria
a. Temporal Relationship: An intervention can be considered evidence of a reduction in risk of disease or abnormality only if the intervention was applied prior to the time the disease or abnormality would have developed.
b. Biological Plausibility: A biologically plausible mechanism should be able to explain why such a relationship would be expected to occur.
c. Consistency: Single studies are rarely definitive. Study findings that are replicated in different populations and by different investigators carry more weight than those that are not. If the findings of studies are inconsistent, the inconsistency must be explained.
d. Alternative Explanations (confounding): The extent to which alternative explanations have been proposed is an important criterion in judging causality.
2. Other Considerations
a. Dose-response relationship: If a factor is indeed the cause of a disease, usually (but not invariably) the greater the exposure to the factor, the greater the risk of the disease. Such a dose-response relationship may not always be seen because many important biologic relationships are dichotomous, and must reach a threshold level for observed effects.
b. Strength of the Association: The strength of the association is usually measured by the extent to which the relative risk or odds depart from unity, either above 1 (in the case of disease-causing exposures) or below 1 (in the case of preventive interventions).
c. Cessation Effects: If an intervention has a beneficial effect, then the benefit should cease when it is removed from a population (unless a carryover effect is operant).”
“Despite the apparent simplicity of many of these criteria, many epidemiologists would probably agree that [the Bradford-Hill] criteria are not totally adequate, that they provide few hard and fast rules for making causal inferences.”
- Deductive Criteria
Deductivism is the doctrine that science begins with hypotheses and then moves to observations that can either confirm or disconfirm those hypotheses. For instance, the most influential modern deductivist, Karl Popper, believed that scientists postulate an hypothesis, an uncorroborated conjecture, and then compare its predictions with observations obtained through testing to see whether it is confirmed or disconfirmed. If the test produces data inconsistent with the conjecture, then the conjecture is refuted or falsified. If the test produces data consistent with the conjecture, then scientists continue to favor it, not as proven but as not yet refuted.164 This notion is the crux of deductivism. Simply, deductivism aims at finding the truth only in a limited sense. That is, it only proceeds to rule out false theories. According to Popper, it is conjectures all the way down. So what matters in science is not the foundation of a conjecture but the quality of a conjecture. And what distinguishes conjectures in science from conjectures in other disciplines is that conjectures in science are falsifiable.
Some epidemiologists, the inductivists, resort to positive evidence to establish causation. When an epidemiologist has controlled for all sources of error she can identify, she then faces the possibility of unidentified sources of error. Positive evidence in the form of inductive criteria such as the Henle-Koch or Bradford-Hill criteria addresses these unspecified sources of error in the process of assessing causation. These inductive criteria cannot eliminate the possibility of such unspecified bias; they are designed merely to help evaluate its likelihood. So, in the face of unspecified sources of error, these inductive criteria help provide a basis for concluding the exposure causes the effect.
But, say the deductivists, these positive criteria are nothing more than untestable tautologies.165 This is so because any statement about the existence of an unspecified source of error is untestable. Both inductivists and deductivists would agree that inductive criteria cannot establish that an association is certainly causal. But inductivists would part ways with deductivists when deductivists maintain that inductive criteria cannot establish even that an association is probably causal, owing to the fact that the inductive criteria are nothing more than untestable tautologies. To infer probabilistically, argue deductivists, an epidemiologist needs to know something about the universe or sample space to which the inference applies; in this case, that universe would be all possible associations between exposures and effects. How could any epidemiologist hope to ascertain that? Assuming all possible associations had been identified, how could the epidemiologist then determine which are causal and which non-causal? She cannot point to a list of “established” causal associations to assess causation. Obviously, if she could, she would not need the inductive criteria as a surrogate for her gold standard. Given this apparent dilemma, deductivists are quite skeptical of any claim that the results of an epidemiologic study demonstrate that an exposure caused an effect.166 Of course, this skepticism is quite disconcerting to plaintiffs seeking to prove that an exposure caused an effect.
But, as a concern to the defense, just as no positive evidence exists to support an assertion of causation, no “negative” evidence exists, say the deductivists, to support an assertion that an association is not causal. Suppose an epidemiologic study results in a positive association between exposure and effect. The deductivist would assert that this association may be due to errors in the study. But the deductivist would further assert that the mere fact that error is possible is not a basis for concluding that no true association exists. Some of these errors may be subsequently identified; even so, that discovery is not a basis for concluding that no true association exists. The direction of those errors would also need to be identified; even then, that the direction of the error is identified is not a basis for concluding no true association exists. The effect of the error on the magnitude of the odds ratio would also need to be ascertained; even then, a showing that the identified error is insufficient to account for the whole magnitude of the association does not convert the association into proof of causation, for unidentified errors may still exist in the study. Even so, say the deductivists, this last possibility is not a basis for concluding no true association exists or that the association is not representative of causation.
The deductivist’s alternatives to the inductive criteria of Bradford-Hill are simply “predictability” and “testability.”167
(1) Predictability: Predictability means that once a hypothesis about causation has been proposed, certain kinds of predictions can be deduced from it in order to compare those predictions with empirical observations.
(2) Testability: Testability means that the predicted consequences of the hypothesis are capable of conflicting with observations and that everything has been done to improve the opportunity for those conflicts.
But again all these criteria can hope to accomplish, if satisfied, is to allow one to say that a theory is not an untestable tautology but instead a conjecture that can be falsified if the relevant data came to be observed.
- Legal Rules
As a matter of policy, courts must adopt the view of the inductivists in order to afford plaintiffs an opportunity to prove causation through epidemiologic studies and other types of evidence. But what a court may grant as a matter of policy in one realm it may take away as a matter of policy in another. An increasing number of courts have ruled that the results of an epidemiologic study cannot be admitted into evidence unless the measure of association has a quotient greater than 2.168 They reason that when that quotient is greater than two, the probability is “more likely than not,” in keeping with plaintiff’s burden of proof, that the exposure “caused” the effect.
This judicial inference needs parsing. A quotient (RR or OR) greater than 2 indicates that of the total number of individuals with the effect subject to study, more than half (51%) have effects associated with the exposure and less than half (49%) have effects associated with other background risks. Obviously, epidemiologic evidence satisfying this requirement should not be considered, by that fact alone, “sufficient” proof of general causation. It should only be considered, by that fact alone, a “necessary” constituent of a greater body of evidence introduced to prove general causation.
This judicial rule excluding epidemiologic studies with a quotient of 2 or less is considered a boon to the defense. It keeps from the jury epidemiologic studies with weak associations which may otherwise be difficult to rebut. Yet the inferential leap implicit in this judicial rule – equating a measure of association with a quotient greater than two with the standard of proof by a preponderance of the evidence – carries with it some subtle assumptions, assumptions which the defense needs to occasionally challenge. A working assumption is that the study is internally valid–that is, the strength of the association is not due to sampling error, systematic bias or unaccounted-for confounders. Certainly the defense will argue that one or more of these problems is at play, and so the quotient cannot be taken at face value. Another subtle assumption is that a quotient greater than two is the sine qua non indicia of causation. This rule, if the jury becomes aware of it, also tends to lend the imprimatur of the court to the validity of the epidemiologic study: “This study has met the benchmark of the court and merits your undivided attention as proof of causation.” None of these assumptions should the defense let go unchallenged. Another nettlesome assumption is that a quotient of 2.1 is more than adequate proof of causation. Yet many epidemiologists believe that the quotient should be three or four or even greater in order to suggest that an association indicates causation. This is a chorus which the defense should join.
And although this judicial rule is overall a boon to the defense, it rests more on judicial policymaking than on logic. Simply, the standard of proof in the courtroom is not conceptually equivalent to a measure of association with a quotient greater than 2.169 Proponents of this judicial rule appear to equate the conception of probability inherent in the particular standard of proof known as “by a preponderance of the evidence” with the conception of probability which some extract from a measure of association with a quotient greater than two.
Neither conception appears to track with any formal conception of probability. First, plaintiff’s standard of proof is not semantically equivalent to any formal conception of probability. Jurors, when instructed on the concept of burden of proof, are not told to apply that concept in a way consistent with “subjective” (or Bayesian) probability. The jury instruction on the burden of proof is couched in a general way: for instance, “the preponderance of evidence is such evidence that, when weighed with that opposed to it, has more convincing force and is more probably true.” In that context, the jury considers plaintiff’s evidence relative to defendant’s evidence, not relative to a wider context of beliefs. The jury is instructed to consider only that information “in evidence.” No jury is instructed on Bayes’ formula170 or about the basic rules of formal probability, such as (1) that the probability of an event occurring, measured by some number between 0 and 1, should be equal to 1 minus the probability of the event not occurring or (2) that the probability of several independent events all occurring is the product of their respective probabilities.
Second, a measure of association with a quotient greater than two does not equate, by itself, with any formal conception of probability. (It could, with additional data, become a variable in an assessment of subjective probability.) By itself, it merely means that of the total number of individuals with the effect subject to study, more than half have effects associated with the exposure and less than half have effects associated with other background risks. That quotient from this particular epidemiologic study is but one piece of evidence out of many pieces the jury will likely consider. The only relation of the strength of an association to the task of the jury in determining whether the plaintiff’s evidence on general causation is more probably true than not true is that, as one indicium among the many Bradford-Hill criteria, the greater the strength of association, the more likely that the association is not due to systematic bias or unaccounted-for confounders. But, as a renowned epidemiologist notes, that single indicium is neither necessary nor sufficient for causation:
“The fact that an association is weak does not rule out a causal connection. A commonly cited counterexample is the relation between cigarette smoking and cardiovascular disease: One explanation for this relation being weak is that cardiovascular disease is common, making any ratio measure of effect comparatively small compared with ratio measures for diseases that are less common. Nevertheless, cigarette smoking is not seriously doubted as a cause of cardiovascular disease. * * * A strong association serves only to rule out hypotheses that the association is entirely due to one weak unmeasured confounder or other source of modest bias.” 171
Despite these criticisms of the judicial rule, it is a rule the defense should vigorously continue to support. An epidemiologic study reporting a weak association can be unduly persuasive to a jury. Simply, a jury is apt to equate a finding of a positive association, no matter how weak or unreliable, with a finding of causation. Sadly, it is the lax kind of thinking to which juries are prone.172
The standard for admitting proffered evidence is different from the standard of proof the jury applies in resolving factual issues.
Most courts instruct the jury not to decide the issues of fact on the basis of guesswork, conjecture or speculation.
IX. Endnotes
1. Liddell, H.G. and Scott, R. Greek-English Lexicon (Oxford, 1983). Some would say it is the study (“logos”) of what is among (“epi”) the people (“demos”).
2. Epidemiology is the study of the incidence, prevalence, distribution and etiology of states of health in the population. Lilienfeld, D.E. Definitions of Epidemiology. Am.J. of Epidemiology, 107: 87-90 (1978); Last, J. L. A Dictionary of Epidemiology (2d ed Oxford, 1988).
3. Haggerty v. Upjohn Co., 950 F. Supp. 1160 (S.D. Fla. 1996) (“epidemiological studies analyze the incidence, distribution and etiology of diseases in the human population, and are an important factor in determining the admissibility of an expert’s opinion on causation”).
4. Epidemiologic evidence is not armor plated; it has chinks that at times make it less than reliable or persuasive. The gold standard epidemiologic study is the prospective cohort study. Unfortunately, this kind of study is expensive and requires lengthy follow up. This is especially true if the exposure and the effect are both rare. Then the size of the cohort will have to be extremely large. And that means enormous and probably prohibitive expense. So, in lieu of the prospective cohort study, the results of case-control studies are apt to be proffered. But this kind of epidemiologic study is notoriously unreliable, and often has power too low to rule out false negative findings. To solve the problem of low power, an epidemiologist might undertake meta-analysis, but this kind of analysis is also unreliable, as demonstrated by the comparison of its results with those of the platinum standard of the controlled clinical trial.
5. Hoffman, R.E. The Use of Epidemiologic Data in the Courts. Am. J. of Epidemiology, 120: 190-202 (1984); Kubs v. United States, 537 F. Supp. 560 (E.D. Wis. 1982) (no epidemiologic studies established a relationship between the swine flu vaccine and polymyalgia rheumatica; plaintiffs failed to prove causation by a preponderance of the evidence); Sorenson v. Shaklee Corp., 31 F.3d 638, 643 n.8 (9th Cir. 1994) (“epidemiology is an accepted scientific discipline dealing with the integrated use of statistics and biological/medical science”); DeLuca v. Merrell Dow Pharm., Inc., 911 F2d 941, 954 (3d Cir. 1990), aff’d, 6 F.3d 778 (3d Cir. 1993), cert. denied, 510 U.S. 1044 (1994) (“the reliability of expert testimony founded on reasoning from epidemiological data is generally a fit subject for judicial notice; epidemiology is a well-established branch of science and medicine, and epidemiological evidence has been accepted in numerous cases”); Wilson v. Merrell Dow Pharmaceuticals, Inc., 893 F.2d 1149, 1154 (10th Cir. 1990) (epidemiologic evidence is the best evidence of general causation in mass toxic tort cases); Hall v. Baxter Healthcare, Corp., 947 F.Supp. 1387 (D. Or. 1996) (epidemiology is the medical science devoted to determining the cause of disease in human beings; the existence or non-existence of relevant epidemiology can be a significant factor in proving general causation in toxic tort cases); Kelly v. American Heyer-Schulte Corp., 957 F. Supp. 873 (W.D. Tex. 1997) (stating that “while epidemiological evidence is not a necessary element in every toxic tort case, it is certainly a very important element, especially when there is no evidence of the biological mechanism which links the product to the complained-of condition”); Hopkins v. Dow Corning Corp., Inc., 33 F3d 1116 (9th Cir. 1994) (proof of general causation may be based on animal studies and biophysical data absent a solid body of epidemiological data).
6. National Bank of Commerce v. Dow Chemical Co., 965 F. Supp. 1490 (E.D. Ark. 1996) (defining cohort and case-control studies); Szklo, M. Design and Conduct of Epidemiologic Studies. Preventive Medicine, 16: 142-149 (1987).
7. Abramson, J.H. Cross-sectional Studies in Detels, et al. (eds) Oxford Textbook of Public Health, chpt. 8 (3d ed. Oxford, 1997).
8. Feinleib, M. et. al. Cohort Studies in Holland, W.W. et. al. (eds) Oxford Textbook of Public Health, chapter 11 (2d ed. Oxford Univ. Press, 1991).
9. Ibrahim, MA and Spitzer, WO. The Case-Control Study: the Problem and the Prospect. J Chronic Dis, 32: 139-144 (1979); Cole, P. The Evolving Case-Control Study. J Chronic Dis, 32: 15-27 (1979); Lilienfeld, AM and Lilienfeld, DE. A Century of Case-Control Studies: Progress? J Chronic Dis, 32: 5-13 (1979); Feinstein, AR. Methodologic Problems and Standards in Case-Control Research. J Chronic Dis, 32: 35-41 (1979); Schlesselman JJ. Case-control Studies: Design, Conduct and Analysis (Oxford University Press, 1982); Breslow, N. Design and Analysis of Case-Control Studies. Ann Rev Public Health, 3: 29-54 (1982); Weinberg, CR and Wacholder, S. The Design and Analysis of Case-Control Studies with Biased Sampling. Biometrics, 46: 963-975 (1990); Austin, H. et. al. Limitations in the Application of Case-Control Methodology. Epidemiologic Reviews, 16: 65-76 (1994); Greenberg, R.S. and Ibrahim, M.A. The Case-Control Study in Holland, W.W. et. al. (eds) Oxford Textbook of Public Health, Chapter 9 (2d ed. Oxford Un. Press, 1991). The gold standard for determining the effect of an exposure, when the exposure is potentially harmful, is not the controlled clinical trial but the prospective cohort study. Yet when the effect or disease is rare, a cohort study with adequate power to detect an appropriate relative risk would require very sizable samples. The cost of such a study is apt to be prohibitive, making it impracticable. As a result, as a practical matter, no gold standard is apt to exist for case-control studies assessing rare exposures and rare potentially harmful effects.
10. Mayes, L.C. et. al. A Collection of 56 Topics with Contradictory Results in Case-Control Research. International J. of Epidemiology, 17: 680-685 (1988); Esdaile, J.M. & Horwitz, R.I. Observational Studies of Cause-Effect Relationships: An Analysis of Methodologic Problems As Illustrated By Conflicting Data for the Role of Oral Contraceptives in the Etiology of Rheumatoid Arthritis. J. Chronic Dis., 39: 841-852 (1986); Demissie, K. et al. Empirical Comparison of the Results of Randomized Controlled Trials and Case-Control Studies in Evaluating the Effectiveness of Screening Mammography. J. Clin. Epidemiol., 52: 81-91 (1998).
11. Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 895 (N.D. Iowa 1982), aff’d, 724 F.2d 613 (8th Cir. 1983) (when the epidemiologic experts use accepted statistical procedures and methods but their opinions differ, then the proper course is to limit the studies and leave the weight of the testimony to the jury; epidemiologic studies prepared by professional, disinterested public officials according to statistical research techniques accepted in the field of epidemiology fall within the public records hearsay exception); Lakie v. Smithkline Beecham, 965 F.Supp. 49 (D.D.C. 1997) (a crucial distinction exists between the admissibility of expert scientific testimony and the weight such testimony should be afforded by the trier-of-fact).
12. Sutera v. Perrier Group of America, Inc., 986 F Supp. 655 (D. Mass. 1997) (motion for summary judgment for defendant because, among other reasons, no epidemiologic evidence links exposure to low levels of benzene to acute myeloid leukemia).
13. Note, Confronting the New Challenges of Scientific Evidence. 108 Harvard Law Review 1481-1605 (1995). Given the same architecture, epidemiologic studies with results of a positive association are more persuasive to a jury than studies with results of no association. As a result, for the defense the preferred strategy is to find a reason to have those epidemiologic studies ruled inadmissible.
14. If a particular trial court is generally averse to excluding proffered evidence before trial, the defense may tactically prefer to wait until trial is underway before challenging the admissibility of proffered epidemiologic evidence under FRE 703. The timing of the challenge helps prevent plaintiff’s experts from tailoring their testimony to circumvent defendants’ critique of that proffered evidence. That is, had defendants challenged this proffered evidence before trial and had that challenge failed owing to the trial court’s judicial philosophy, then plaintiff’s experts would likely have had time, when it was their turn to testify, to adjust their opinions to conform to the evidentiary requirements of FRE 702.
15. Daubert v. Merrell Dow Pharm. Inc., 509 US 579, 113 S Ct 2786, 125 LE 2d 469 (1993).
16. Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 113 S Ct 2786, 125 L Ed 2d 469 (1993); just as an opinion based on a methodology that is inherently unreliable should be inadmissible, so should an opinion based on a methodology that, although generally accepted in the scientific community as reliable, is imperfectly executed. E.g., Daubert v. Merrell Dow Pharmaceuticals, Inc., 43 F3d 1311 (9th Cir. 1995).
17. E.g., McKendall v. Crown Control Corp., 122 F3d 803 (9th Cir. 1997).
18. See Tyus v. Urban Search Management, 102 F3d 256, 263 (7th Cir. 1996), quoting from Braun v. Lorillard, Inc., 84 F3d 230, 234 (7th Cir. 1996).
19. Searle, J. The Construction of Social Reality. p. 151 (Free Press, 1995).
20. Greenland, S. Concepts of Validity in Epidemiological Research in Holland, W.W. et al. (eds) Oxford Textbook of Public Health, chapter 14 (2d ed. Oxford Un.Press, 1991); Rose, G and Barker, DJP. Epidemiology for the Uninitiated: Repeatability and Validity. Br Med J, 2: 1070-1071 (1979); Lust v. Merrell Dow Pharmaceuticals, Inc., 89 F3d 594 (9th Cir. 1996) (inadmissible under FRE 702 was testimony of epidemiologist who proposed to testify that clomid causes hemifacial microsomia based on his published epidemiological research which was prepared in anticipation of trial but which was not peer-reviewed).
21. Mayes, L.C. et. al. A Collection of 56 Topics with Contradictory Results in Case-Control Research. International J. of Epidemiology, 17: 680-685 (1988). Ideally, from the perspective of the defense, no epidemiologist should be allowed to offer an opinion that the exposure is associated with the effect based on the results of a single case-control study. Because case-control studies tend to be unreliable—susceptible to bias and confounders—an epidemiologist should be allowed to offer such an opinion only on the basis of a series or program of case-control studies conducted by different epidemiologists operating independently, when the results of those studies converge to the conclusion that the exposure is positively associated with the effect.
22. Wittgenstein, L. On Certainty 243 (Harper Torchbooks, 1969). For instance, the case-control study, as a methodology, is accounted to have a significant rate of error. Ordinarily quantifying the rate of error requires a gold standard against which to compare the results of the method being assessed. In clinical medicine, the gold standard is the controlled clinical trial. So to quantify the rate of error of case-control studies, an investigator would need to compare the results of a controlled clinical trial with the results of a series of case-control studies on the same topic.
23. Malcolm, N. The Groundlessness of Belief in Malcolm, N. Thought and Knowledge (Cornell, 1977).
24. Id.
25. General Electric Company v. Joiner, 522 US 136, 118 S Ct 512, 139 L Ed 2d 508 (1997).
26. Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 US 579, 113 S Ct 2786, 125 LE2d 469 (1993).
27. Black, B, et. al. Guide to Epidemiology in Black, B & Lee, P.W. (editors) Expert Evidence: A Practitioners’ Guide to Law, Science, and the FJC Manual (West, 1997).
28. Id. at 112.
29. Padgett v. United States, 553 F. Supp. 794 (W.D. Tex 1982) (an economist could not testify about epidemiology).
30. Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 US 579, 113 S Ct. 2786, 125 LEd 2d 469 (1993); Daubert v. Merrell Dow Pharmaceuticals, Inc., 43 F3d 1311 (9th Cir. 1995).
31. Walls v. Armour Pharmaceutical Co., 832 F. Supp. 1467 (M.D. Fla. 1993) (plaintiff’s expert was a medical doctor with a specialty in infectious diseases who testified about the significance of results of epidemiologic studies; defendant’s expert was a professor of statistics who testified about the significance of the results of epidemiologic studies).
32. Sutera v. Perrier Group of America, Inc., 986 F. Supp. 655 at 667 (D. Mass 1997) (an oncologist and hematologist with no expertise in epidemiology or biostatistics and with no familiarity with the epidemiologic studies undermining his opinion of causation was ruled not qualified to opine on causation); In re Agent Orange Prod. Liab. Litig., 611 F. Supp. 1223 (E.D.N.Y. 1985), aff’d, 818 F.2d 187 (2d Cir. 1987) (a medical doctor should be allowed to testify on toxic tort causation only if he can demonstrate knowledge of epidemiology; an expert’s failure to consider the most relevant epidemiologic studies and other possible causes of disease resulted in the proffered testimony being ruled inadmissible); Wells ex rel. Maihafer v. Ortho Pharmaceutical Corp., 615 F. Supp. 262 (N.D. Ga. 1985), aff’d in part and modified, 788 F2d 741 (11th Cir.), cert. denied, 479 US 950 (1986) (plaintiff’s expert replied: “I am sorry sir, I am not a statistician… I don’t understand confidence levels. I never use them. I have to use the author’s conclusions;” “it does not matter in terms of deciding the case that the medical community might require more research and evidence before conclusively resolving the question; what matters is that [the] fact finder found sufficient evidence of causation in a legal sense in this particular case”).
33. Last, J.M. A Dictionary of Epidemiology p. 46 (2d ed. Oxford, 1988); Rothman, K.J. and Greenland, S. Modern Epidemiology, p. 48 (2d ed. Lippincott-Raven, 1998) (“exposure can refer to a behavior [(e.g., needle sharing)], a trait [(e.g., genotype)], or an exposure in the ordinary sense [(e.g., injection of contaminated blood)].”).
34. Consider, for instance, measurement of exposure to electromagnetic fields. What problems arise if all the following modes are used: (1) wire configuration codes; (2) spot or 24-hour measurements of the fields; (3) self reports of use of electrical appliances? Of course, the result would be chaos.
35. Kronmal, R.A. et. al. The Intrauterine Device and Pelvic Inflammatory Disease: The Women’s Health Study Reanalyzed. J. Clin. Epidem., 44: 109-122 (1991).
36. The concept of “analysis of the data” is discussed infra at section VI.
37. Correa, A. et. al. Exposure Measurement in Case-Control Studies: Reported Methods and Recommendations. Epidemiologic Reviews, 16: 18-31 (1994); White, E et. al. Exposure Measurement in Cohort Studies: The Challenges of Prospective Data Collection. Epidemiologic Reviews, 20: 43-56 (1998).
38. MacMahon, B. and Trichopoulos, D. Epidemiology: Principles and Methods, pps. 179-180 (2d ed. Little, Brown & Co. 1996).
39. Gordis, L. Assuring the Quality of Questionnaire Data in Epidemiologic Research. Am. J. of Epidemiology, 109: 21-24 (1979); Olsen, J. Epidemiology Deserves Better Questionnaires. Int’l J. of Epidemiology, 27:935 (1998).
40. Hill, A.B. Observation and Experiment. NEJM, 248: 995-1001 (1953).
41. Colditz, G.A. et. al. Validation of Questionnaire Information on Risk Factors and Disease Outcomes in a Prospective Cohort Study of Women. Am. J. of Epidemiology, 123: 894-900 (1986).
42. Hulka, B.S. et al. Biological Markers in Epidemiology. (Oxford U Press, 1990); Hulka, B.S. & Margolin, B.H., Methodological Issues in Epidemiologic Studies Using Biologic Markers. Am. J. of Epidemiology, 135: 200-298 (1992); Schulte, P.A. A Conceptual Framework for the Validation and Use of Biologic Markers. Environ. Res., 48: 129-144 (1989).
43. Duncan, B.B. and Heiss, G. Nonenzymatic Glycosylation of Proteins – A New Tool for Assessment of Cumulative Hyperglycemia in Epidemiologic Studies, Past and Future. Am. J. of Epidemiology, 120: 169-189 (1984).
44. Correa, A. et. al. Exposure Measurement in Case-Control Studies: Reported Methods and Recommendations. Epidemiologic Reviews, 16: 18-31 (1994); McMichael, AJ. Molecular Epidemiology: New Pathway or New Travelling Companion? Am. J. Epidemiology, 14: 1-11 (1994); Perera F.P. & Weinstein, IB. Molecular Epidemiology and Carcinogen-DNA Adduct Detection: New Approaches to Studies of Human Cancer Causation. J. Chronic Disease, 35: 581-600 (1982).
45. Flegal KM, et. al. Differential Misclassification Arising from Nondifferential Errors in Exposure Measurement. Am J Epidemiol, 134: 1233-1244 (1991).
46. Poole, C. Exposure Opportunity in Case-Control Studies. Am J Epidemiol, 123: 352-358 (1986).
47. Explaining what is an “exposure” and what is an “effect” is an exercise in identification through observation and then through classification, or through classification and then through observation; it is a process often presenting the proverbial problem of which came first, the chicken or the egg. Whichever comes first, it is a process that entails a potential for error. The magnitude of error can often be reduced through consensus about what is the exposure or the effect. Hyams, K.C. Developing Case Definitions for Symptom-based Conditions: the Problem of Specificity. Epidem. Reviews, 20: 148-156 (1998); Rempel, D. et. al. Consensus Criteria for the Classification of Carpal Tunnel Syndrome in Epidemiologic Studies. Am. J. of Public Health, 88: 1447-1451 (1998); Westbrook, J.I. et. al. Agreement between Medical Record Data and Patient’s Accounts of Their Medical History and Treatment for Dyspepsia. J. Clin. Epidemiology, 51: 237-244 (1998).
48. Wittgenstein, L. Philosophical Investigations ¶ 279 (trans. GEM Anscombe, MacMillan, 1953).
49. Gallie, W.B. Essentially Contested Concepts. Proceedings of the Aristotelian Society, LVI: 167-199 (1955-56); to the dissatisfaction of the defense, clinicians, as they will explain, do not strictly adhere to the classification criteria for a disorder. Some patients who fail to satisfy the criteria will be diagnosed with the disorder because, in the clinician’s judgment, their clinical profile seems most consistent with the disorder. Some patients who satisfy the criteria may not be diagnosed with the disorder because, in the clinician’s judgment, their clinical profile better fits the criteria of another disorder.
50. In re Swine Flu Immunization Prod. Liab. Litig., 508 F.Supp. 897, 903 (D. Colo. 1981), aff’d sub nom. Lima v. United States, 708 F.2d 502 (10th Cir. 1983) (the court critically evaluated a study relied on by an expert whose testimony was stricken because in that study, determination of whether a patient had Guillain-Barré syndrome was made by medical clerks, not physicians who were familiar with diagnostic criteria).
51. Sackett, D.L. et. al. Clinical Epidemiology (2d edition Little, Brown, 1991).
52. Id.
53. Wacholder, S, et. al. Validation Studies Using an Alloyed Gold Standard. Am J Epidemiol, 137: 1251-1258 (1993); Brenner, H. and Savitz, DA. The Effects of Sensitivity and Specificity of Case Selection on Validity, Sample Size, Precision, and Power in Hospital-Based Case-Control Studies. Am J Epidemiol, 132: 181-192 (1990).
54. Kendell, R.E. Clinical Validity. Psychological Medicine, 19: 45-55 (1989).
55. Bloch, D.A. et. al. Statistical Approaches to Classification. Arthritis and Rheumatism, 33: 1137-1144 (1990); Fries, JF et. al. Criteria for Rheumatic Disease. Arthritis and Rheumatism, 37: 454-462 (1994); Altman, R.D. et. al. An Approach to Developing Criteria for the Clinical Diagnosis and Classification of Osteoarthritis; A Status Report of the American Rheumatism Association Diagnostic Subcommittee on Osteoarthritis. The J. of Rheumatology, 10: 180-183 (1983).
56. Rothman, K. J. Induction and Latent Periods. Am J Epidemiol, 14: 253-259 (1981); Rothman, K. J. and Greenland, S. Modern Epidemiology (2d ed. Lippincott-Raven, 1998).
57. Brenner, H and Gefeller, O. Use of Positive Predictive Value to Correct for Disease Misclassification in Epidemiologic Studies. Am J Epidemiol, 138: 1007-1015 (1993).
58. Harvey, M. et. al. Toxic Shock and Tampons: Evaluation of the Epidemiologic Evidence. JAMA, 248: 843 (1982).
59. Gordis, L. Epidemiology, pps. 32-34 (Saunders, 1996); Last, J.M. A Dictionary of Epidemiology, p. 103 (2d ed. Oxford, 1988).
60. MacMahon, B. and Trichopoulos, D. Epidemiology: Principles and Methods, Chpt 4 (2d ed. Little, Brown & Co., 1996).
61. MacMahon, B. and Trichopoulos, D. Epidemiology: Principles and Methods, Chpt 4 (2d ed. Little, Brown & Co., 1996).
62. Flanders, W.D. and O’Brien, T. R. Inappropriate Comparisons of Incidence and Prevalence in Epidemiologic Research. Am. J. Public Health, 79: 1301-1303 (1989); Freeman, J and Hutchison, GB. Prevalence, Incidence and Duration. Am J Epidemiol, 112:707-723 (1980). MacMahon, B and Trichopoulos, D. Epidemiology: Principles and Methods, Chapter 4, (2d ed. Little, Brown & Co., 1996); Feinstein, A.R. and Esdaile, J.M. Incidence, Prevalence, and Evidence. The Am. J. of Medicine, 82: 113-123 (1987); Morgenstern, H. et. al. Measures of Disease Incidence Used in Epidemiologic Research. International J. of Epidemiology, 9: 97-104 (1980); Tapia Granados, J.A. On the Terminology and Dimensions of Incidence. J. Clin. Epidemiology, 50: 891-897 (1997); Elandt-Johnson, RC. Definition of Rates: Some Remarks on Their Use and Misuse. Am J Epidemiol, 102: 267-271 (1975).
63. Burkman, R.T. and The Women’s Health Study. Association Between Intrauterine Device and Pelvic Inflammatory Disease. Obstetrics & Gynecology, 57: 269-276 (1981).
64. Wade-Greaux v. Whitehall Laboratories, Inc., 874 F.Supp. 1441, 1485 (D. Virgin Isl. 1994) (positive epidemiologic findings are, standing alone, insufficient to permit a conclusion that a particular agent is teratogenic, the court discusses relative risk, odds ratio, case-control studies, cohort studies, confounders, and statistical evaluations); Gaul v. United States, 582 F. Supp. 1122, 1125 n. 9 (D. Del. 1984); Marshall RJ. Validation Study Methods for Estimating Exposure Proportions and Odds Ratios with Misclassified Data. J Clin Epidemiol, 43: 941-947 (1990); Godley, P. & Schell, M.J. Adjusted Odds Ratios Under Nondifferential Misclassification: Application to Prostate Cancer. J. Clin. Epidemiol., 52: 129-136 (1999); Tarone, RE. On Summary Estimators of Relative Risk. J Chronic Dis, 34: 463-468 (1981); Wallenstein, S. and Bodian, C. Inferences on Odds Ratios, Relative Risks, and Risk Differences Based on Standard Regression Programs. Am J Epidemiol, 126: 346-355 (1987); Greenland, S. and Engelman, L. Re: “Inferences on Odds Ratios, Relative Risks, and Risk Differences Based on Standard Regression Programs.” Am J Epidemiol, 128: 145 (1988); Chêne, G. and Thompson, SG. Methods for Summarizing the Risk Associations of Quantitative Variables in a Consistent Form. Am J Epidemiol, 144: 610-621 (1996).
65. Zhang, J. and Yu, K.F. What’s the Relative Risk? JAMA, 280: 1690-1691 (1998).
66. Breslow, NE. Odds Ratio Estimators When the Data are Sparse. Biometrika, 68: 73-84 (1981).
67. Siliman, A.J. Epidemiological Studies: A Practical Guide (Cambridge, 1995); Martin DO, and Austin H. Exact Estimates for a Rate Ratio. Epidemiology, 7: 29-33 (1996).
68. Last, J.M. A Dictionary of Epidemiology (2d ed. Oxford, 1988).
69. Gordis, L. Epidemiology, pps. 148-149 (Saunders, 1998).
70. Greenland, S. Interpretation and Choice of Effect Measures in Epidemiologic Analyses. Am J Epidemiol, 125: 761-768 (1987); Newman, SC. Odds Ratio Estimation in a Steady-State Population. J Clin Epidemiol, 41: 59-65 (1988); Schouten, EG, et. al. Risk Ratio and Rate Ratio Estimation in Case-Cohort Designs. Stat Med, 12: 1733-1745 (1993); Siegel, DG and Greenhouse, SW. Validity in Estimating Relative Risk in Case-Control Studies. J Chronic Dis, 26: 219-225 (1973).
71. Greenland, S. Thomas, DC, and Morgenstern, H. The Rare-Disease Assumption Revisited. A Critique of “Estimators of Relative Risk for Case-Control Studies.” Am J Epidemiol, 124: 869-883 (1986); Greenland, S. and Thomas, DC. On the Need for The Rare Disease Assumption in Case-Control Studies. Am J Epidemiol, 116: 547-553 (1982).
72. Johnston v. United States, 597 F.Supp. 374 (D. Kan. 1984) (attributable risk calculations [or probability of causation calculations]: “while this is a proper mathematical formula for calculating the probability of events which have happened, and if well founded, * * * may be of some interest as regards the risk assessments relating to any exposure, its results are only as valid as the assumptions which go into it”); Whiting v. Boston Edison Co., 891 F. Supp. 12 (D. Mass 1995) (“excess risk is calculated by epidemiologists in the form of a ratio derived by dividing the number of cases of a disease observed within a defined group by the number of cases expected in the general population; epidemiologists generally agree that excess risks of less than 50% are difficult to interpret causally”); Whittemore, AS. Statistical Methods for Estimating Attributable Risk from Retrospective Data. Stat Med, 1: 229-243 (1982); Coughlin, SS, Benichou, J. and Weed, DL. Attributable Risk Estimation in Case-Control Studies. Epidemiol Rev, 16: 51-64 (1994); Cole, P. and MacMahon, B. Attributable Risk Percent in Case-Control Studies. Br J Prev Soc Med, 25: 242-244 (1971); Greenland, S. and Robins, J. Conceptual Problems in the Definition and Interpretation of Attributable Fractions. Am J Epidemiol, 128: 1185-1197 (1988); Walter, SD. The Estimation and Interpretation of Attributable Risk in Health Research. Biometrics, 32: 829-849 (1976); Whittemore, AS. Estimating Attributable Risk from Case-Control Studies. Am J Epidemiol, 117: 76-85 (1983).
73. Rothman, K.J. and Greenland, S. Modern Epidemiology (2d ed. Lippincott-Raven, 1998).
74. Id. at 24.
75. E.g., Daubert v. Merrell Dow Pharmaceuticals, Inc., 43 F3d 1311 (9th Cir. 1995).
76. Angell, M. The Interpretation of Epidemiologic Studies. (Editorial) New England J. of Medicine, 323: 823-825 (1990); Taubes, G. Epidemiology Faces its Limits. Science, 269: 164-169 (1995); Greenberg, R.S. and Ibrahim, M.A. The Case-Control Study in Holland, W.W. et al. (eds) Oxford Textbook of Public Health, chapter 9 at p. 130 (2d ed. Oxford Un. Press, 1991).
77. Schlesselman, J.J. Case-Control Studies: Design, Conduct and Analysis. (Oxford Un. Press, 1982); Gordis, L. Epidemiology (Saunders, 1996).
78. Grassis v. Johns-Manville Corp., 591 A 2d 671, 675 (N.J. Super. Ct. App. Div. 1991); Merrell Dow Pharmaceuticals, Inc. v. Havner, 953 SW 2d 706 (Tex. 1997); Wickramaratne, P.J. and Holford, T. R. Confounding in Epidemiologic Studies: The Adequacy of the Control Group as a Measure of Confounding. Biometrics, 43: 751-765 (1987); Rothman, K.J. and Greenland, S. Modern Epidemiology (2d ed. Lippincott-Raven, 1998); Miettinen, OS and Cook, EF. Confounding: Essence and Detection. Am J Epidemiol, 114: 593-603 (1981).
79. Rothman, K.J. and Greenland, S. Modern Epidemiology, p. 122 (Lippincott-Raven, 1998).
80. Greenland, S. and Morgenstern, H. Ecological Bias, Confounding, and Effect Modification. Int J Epidemiol, 18:269-274 (1989); Thompson, WD. Effect Modification and the Limits of Biological Inference from Epidemiologic Data. J Clin Epidemiol, 44: 221-232 (1991).
81. Koopman, JS. Causal Models and Sources of Interaction. Am J Epidemiol, 106: 439-444 (1977); Kupper, LL and Hogan, MD. Interaction in Epidemiologic Studies. Am J Epidemiol, 106: 447-453 (1978); Rothman KJ, et. al. Concepts of Interaction. Am J Epidemiol 112: 467-470 (1980); Greenland, S. Elementary Models for Biological Interaction. J Hazardous Materials, 10: 449-454 (1985).
82. In Re TMI Litigation Cases Consol. II, 922 F.Supp. 1038 (M.D. Pa. 1996) (an expert’s failure to include a discussion of the design of an epidemiologic study and particularly the selection criteria for the subjects studied creates an enormous potential rate of error and results in the proffered evidence being inadmissible under FRE 702); Valentine v. Pioneer Chlor Alkali Co., 921 F. Supp. 666 (D. Nev. 1996) (association is defined; relative risk is defined, and attributable proportion of relative risks is defined, cohort study defined, case-control study defined, epidemiologic study ruled inadmissible because author failed to control for important bias and confounders); Padgett v. United States, 553 F.Supp. 794 (W.D. Tex. 1982) (in evaluating causation, epidemiologists must exclude three alternative explanations: chance, confounding and bias); Sackett, DL. Bias in Analytic Research. J Chronic Dis, 32: 51-63 (1979); Feinleib, M. Biases and Weak Associations. Preventive Medicine, 16: 150-164 (1987); Kopec, J.A. and Esdaile, J.M. Bias in Case-Control Studies: A Review. Journal of Epidemiology and Community Health, 44: 179-186 (1990).
83. In re “Agent Orange” Prod. Liab. Litig., 597 F. Supp. 740, 783 (E.D.N.Y. 1985), aff’d, 818 F.2d 145 (2d Cir. 1987) (the court expressed concern about selection bias); Gaul v. United States, 582 F.Supp. 1122, 1125 n. 9 (D. Del. 1984) (epidemiologist analyzes the epidemiologic data and concludes that an association is due to selection bias; “relative risk describes the relationship between the risk of an occurrence, such as contracting a disease in a population exposed to a certain stimulus, and the risk of occurrence in a population not exposed to the stimulus, it is the ratio of the former risk to the latter”); Austin H, Flanders WD, and Rothman KJ. Bias Arising in Case-Control Studies from Selection of Controls from Overlapping Groups. Int J Epidemiol, 18: 713-716 (1989); Lubin, JH and Gail, MH. Biased Selection of Controls for Case-Control Analyses of Cohort Studies. Biometrics, 40: 63-75 (1984); Robins, JM and Pike, M. The Validity of Case-Control Studies with Non-Random Selection of Controls. Epidemiology, 1: 273-284 (1990); Wacholder S, et. al. Blind Assignment of Exposure Does Not Always Prevent Differential Misclassification. Am J Epidemiol, 134: 433-437 (1991); Wacholder S, et. al. Selection of Controls in Case-Control Studies, I: Principles. Am J Epidemiol, 135: 1019-1028 (1992); Wacholder S, et. al. Selection of Controls in Case-Control Studies, II: Types of Controls. Am J Epidemiol, 135: 1029-1041 (1992); Wacholder S, et. al. Selection of Controls in Case-Control Studies, III: Design Options. Am J Epidemiol, 135: 1042-1050 (1992); Flanders, WD and Austin, H. Possibility of Selection Bias in Matched Case-Control Studies Using Friend Controls. Am J Epidemiol, 124: 150-153 (1986); Lasky, T and Stolley, PD. Selection of Cases and Controls. Epidemiol Rev, 16: 6-17 (1994); Lubin, JH and Hartge, P. Excluding Controls: Misapplications in Case-Control Studies. Am J Epidemiol, 120: 791-793 (1984); Miettinen, OS. The “Case-Control” Study: Valid Selection of Subjects. J Chronic Dis, 38: 543-548 (1985); Paltiel, O. et al. Two-Way Referral Bias: Evidence from a Clinical Audit of Lymphoma in a Teaching Hospital. J. Clin. Epidemiol., 51: 93-98 (1998).
84. Walter, SD. Berkson’s Bias and its Control in Epidemiologic Studies. J Chronic Dis, 33: 721-725 (1980).
85. Neyman, J. Statistics—Servant of all Sciences. Science, 122: 401 (1955); Sackett, D.L. Bias in Analytic Research. J. Chron. Dis., 32: 51-63 (1979).
86. MacMahon, B. and Trichopoulos, D. Epidemiology: Principles & Methods, p. 190-192 (Little, Brown & Co., 1996).
87. Harvey, M. et. al. Toxic Shock and Tampons: Evaluation of Epidemiologic Evidence. JAMA, 248: 840-846 (1982)
88. Rothman, K.J. and Greenland, S. Modern Epidemiology, p. 120 (Lippincott-Raven, 1998).
89. Criqui, M.H. Response Bias and Risk Ratios in Epidemiologic Studies. Am J. Epidemiol, 109: 394-399 (1979); Greenland, S. Response and Follow-up Bias in Cohort Studies. Am J. Epidemiol., 106: 184-187 (1977).
90. Wynder, E. L. Investigator Bias and Interviewer Bias: The Problem of Reporting Systematic Error in Epidemiology. J. Clin. Epidemiology, 47: 825-827 (1994).
91. Wacholder, S. et. al. Blind Assignment of Exposure Does Not Always Prevent Differential Misclassification. Am J of Epidemiol, 134: 433-437 (1991); Barron, BA. The Effects of Misclassification on the Estimation of Relative Risk. Biometrics, 33: 414-418 (1977); Copeland, KT, et. al. Bias Due to Misclassification in the Estimate of Relative Risk. Am J Epidemiol, 105: 488-495 (1977); Dosemeci M, et. al. Does Nondifferential Misclassification of Exposure Always Bias a True Effect toward the Null Value? Am J Epidemiol, 132: 746-749 (1990); Drews, CD and Greenland, S. The Impact of Differential Recall on the Results of Case-Control Studies. Int J Epidemiol, 19: 1107-1112 (1990); Gullen, WH, et. al. Effects of Misclassification in Epidemiologic Studies. Public Health Rep, 53: 1956-1965 (1968).
92. Rothman, K.J. and Greenland, S. Modern Epidemiology, p. 127 (Lippincott-Raven, 1998).
93. Rose, G and Barker, DJP. Epidemiology for the Uninitiated. Observer and Variation. Br Med J, 2 : 1006-1007 (1978).
94. Greenland, S and Neutra, R. An Analysis of Detection Bias and Proposed Corrections in the Study of Estrogens and Endometrial Cancer. J Chronic Dis, 34: 433-438 (1981).
95. Brock v. Merrell Dow Pharmaceuticals, Inc., 874 F.2d 307, 311-12 (5th Cir. 1989), cert. denied, 494 U.S. 1046 (1990) (discussion of recall bias among women who bear children with birth defects); Coughlin, S.S. Recall Bias in Epidemiologic Studies. J. Clin. Epidemiology, 43: 87-91 (1990); Drews, CD, Kraus, JF and Greenland, S. Recall Bias in a Case-Control Study of Sudden Infant Death Syndrome. Int J Epidemiol, 19: 405-411 (1990).
96. Swan, S. et. al. Reporting and Selection Bias in Case-Control Studies of Congenital Malformations. Epidemiology, 3: 356-363 (1992).
97. Greenland, S. Response and Follow-Up Bias in Cohort Studies. Am J Epidemiol, 106: 184-187 (1977).
98. Kelly v. American Heyer-Schulte Corp., 957 F.Supp. 873 (W.D. Tex. 1997) (holding that it is contrary to scientific method to rely upon an epidemiologic study with a weak association and a low level of statistical significance and whose results are apt to be influenced by confounders); Stellman, S.D. Confounding. Preventive Medicine, 16: 165-182 (1987); Schlesselman, JJ. Assessing Effects of Confounding Variables. Am J Epidemiol, 108: 3-8 (1978); Greenland, S and Robins, JM. Confounding and Misclassification. Am J Epidemiol, 122: 495-506 (1985); Savitz, DA and Baron, AE. Estimating and Correcting for Confounder Misclassification. Am J Epidemiol, 129: 1062-1071 (1989); Smith, PG and Day, NE. The Design of Case-Control Studies: the Influence of Confounding and Interaction Effects. Int J Epidemiol, 13: 356-365 (1984);
Yanagawa, T. Case-Control Studies: Assessing the Effect of a Confounding Factor. Biometrika, 71: 191-194 (1984).
99. Hutchison, GB. and Rothman, KJ. Correcting a Bias? N Engl J Med, 299: 1129-1130 (1978); Rothman, K.J. and Greenland, S. Modern Epidemiology (2d ed. Lippincott-Raven 1998); MacMahon, B and Trichopoulos, D. Epidemiology: Principles and Methods (2d ed. Little, Brown & Co. 1996).
100. Weinberg, CR and Sandler, DP. Randomized Recruitment in Case-Control Studies. Am J Epidemiol, 134: 421-432 (1991).
101. Rothman, K.J. and Greenland, S. Modern Epidemiology. pp. 143-144 (Lippincott-Raven, 1998).
102. Greenland, S and Kleinbaum, DG. Correcting for Misclassification in Two-Way Tables and Matched-Pair Studies. Int J Epidemiol, 12:93-97 (1983); Miettinen OS. Matching and Design Efficiency in Retrospective Studies. Am J Epidemiol, 91: 111-118 (1970); Karon, J.M. and Kupper, L.L. In Defense of Matching. Am J Epidemiol, 116: 852-866 (1982); Greenland, S. The Effect of Misclassification in Matched-Pair Case-Control Studies. Am J Epidemiol, 116: 402-406 (1982).
103. Brookmeyer, R, Liang, KY and Linet, M. Matched Case-Control Designs and Overmatched Analyses. Am J Epidemiol, 124: 693-701 (1986).
104. Rothman, K.J. and Greenland, S. Modern Epidemiology, p. 201 (Lippincott-Raven, 1998).
105. Pike, MC et. al. Bias and Efficiency in Logistic Analyses of Stratified Case-Control Studies. Int J Epidemiol, 9: 89-95 (1980).
106. Cochran, WG. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observation Studies. Biometrics, 24: 295-313 (1968); Day, NE, Byar, DP and Green, SB. Overadjustment in Case-Control Studies. Am J Epidemiol, 112: 696-706 (1980).
107. Kahn, H.A. and Sempos, C.T. Statistical Methods in Epidemiology, pp. 87-105 (Oxford, 1989).
108. Greenberg, R.S. and Ibrahim, M.A. The Case-Control Study in Holland, W.W. et. al. (eds) Oxford Textbook of Public Health (2d edition Oxford, 1991).
109. Greenland, S. Limitations of the Logistic Analysis of Epidemiologic Data. Am J Epidemiol, 110: 693-698 (1979); Rosner, B. et. al. Correction of Logistic Regression Relative Risk Estimates and Confidence Intervals for Measurement Error: The Case of Multiple Covariates Measured with Error. Am. J. Epidemiol, 132: 734-745 (1990).
110. Sempos, C.T. et. al. The Influence of Cigarette Smoking on the Association Between Body Weight and Mortality. The Framingham Heart Study Revisited. Am J Epidemiol, 8: 289-300 (1998); Flegal, K.M. Deja Vu All Over Again: The Re-Analysis of Epidemiologic Data. (Editorial) Am J Epidemiol, 8: 286-288 (1998).
111. Cohen, A.J. Replication. Epidemiology, 8: 341-343 (1997).
112. Re-analysis of epidemiologic data often occurs in anticipation of or during litigation. It usually is neither published nor peer-reviewed. As a result, it is likely to be ruled inadmissible under rules such as FRE 702 and under holdings such as Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 US 579, 113 S Ct. 2786, 125 LEd 2d 469 (1993); but see Daubert v. Merrell Dow Pharmaceuticals, Inc., 43 F3d 1311, 1320 (9th Cir. 1995); Lynch v. Merrell-National Laboratories, 646 F.Supp. 856 (D. Mass 1986) (plaintiffs’ epidemiologic evidence was a re-analysis of epidemiologic studies; the court could not accept result-oriented re-analysis of epidemiological studies and criticisms of others’ methodology as reliable data upon which to base an opinion on causation; plaintiffs cannot rely on criticisms of the defendant’s studies to establish causation); Lynch v. Merrell-National Laboratories, 830 F2d 1190 (1st Cir. 1987) (a reanalysis of an epidemiologic study by plaintiffs’ epidemiologist was legally insufficient owing to selection bias and its failure to be published and subjected to peer review).
113. DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F2d 941, 955 (3d Cir 1990).
114. Howson, C. Theories of Probability. Brit. J. Phil. Sci, 46: 1-32 (1995).
115. Von Mises, R. Probability, Statistics and Truth (George Allen & Unwin, 1939); Von Mises, R. Mathematical Theory of Probability and Statistics (Academic Press, 1964).
116. Bayes, T. An Essay Towards Solving a Problem in the Doctrine of Chances. Philosophical Transactions of the Royal Society, 53: 370-418 (1763); deFinetti, B. Theory of Probability (Wiley, 1974); Ramsey, F.P. Truth and Probability in D.H. Mellor (ed.) Philosophical Papers (Cambridge Univ. Press, 1990); Howson, C. and Urbach P. Scientific Reasoning: the Bayesian Approach (2nd ed. Open Court, 1993).
117. Fisher, L.D. and Van Belle, G. Biostatistics, p. 108 (Wiley, 1993); Rothman, K.J. and Greenland, S. Modern Epidemiology, pps. 186-187 (Lippincott-Raven, 1998); Glantz, S. Primer of Biostatistics, pps. 105, 160-161 (3d edition McGraw Hill, 1992).
118. Merrell Dow Pharmaceuticals, Inc. v. Havner, 953 SW 2d 706 (Tex. 1997) (significance testing requires use of an alpha of .05).
119. Fisher, L.D. and Van Belle, G. Biostatistics, p. 108 (Wiley, 1993); Rothman, K.J. and Greenland, S. Modern Epidemiology, pps. 186-187 (Lippincott-Raven, 1998); Glantz, S. Primer of Biostatistics, pps. 160-161 (3d edition McGraw Hill, 1992).
120. Glantz, S.A. Primer of Biostatistics. pps. 161-165 (3d ed. McGraw Hill, 1992); Freiman, J.A. et. al. The Importance of Beta, The Type II Error, and Sample Size in the Design and Interpretation of the Randomized Controlled Trial in Bailar, J.C. and Mosteller, F. (eds) Medical Uses of Statistics. Chapter 19 (2d ed. NEJM Books, 1992).
121. Kelly v. American Heyer-Schulte Corp., 957 F. Supp. 873 (W.D. Tex. 1997) (holding that it is contrary to scientific method to rely upon an epidemiologic study with a weak association and a low level of statistical significance and whose results are apt to be influenced by confounders); In re TMI Litigation Cases Consol. II, 922 F.Supp. 997, 1016 (M.D. Pa. 1996) (significance testing in nonexperimental settings is a matter that goes to the weight of the evidence); Christophersen v. Allied-Signal Corp., 902 F2d 362 (5th Cir. 1990), rev’d, cert. denied, 503 US 912 (1992) (plaintiffs need not have statistically significant studies to establish causation); Thompson, WD. Statistical Criteria in the Interpretation of Epidemiologic Data. Am J Public Health, 77: 191-194 (1987); Cox, DR. Regression Models and Life Tables (with discussions). J R Stat Soc B, 34: 187-220 (1972); Clayton, D. and Hills, M. Statistical Models in Epidemiology (Oxford University Press, 1993); Breslow, NE. and Day, NE. Statistical Methods in Cancer Research. Vol II: The Design and Analysis of Cohort Studies (IARC, 1987).
122. Mann, P.S. Introductory Statistics, pps. 432-454 (2d ed. Wiley, 1995).
123. Mantel, N. and Haenszel, WH. Statistical Aspects of the Analysis of Data from Retrospective Studies of Disease. J Nat’l Cancer Inst, 22: 719-748 (1959); Mantel, N. and Fleiss, JL. Minimum Expected Cell Size Requirements for the Mantel-Haenszel One-Degree-of-Freedom Test and a Related Rapid Procedure. Am J Epidemiol, 112: 129-134 (1980); Mantel, N. Chi-Square Tests with One Degree of Freedom: Extensions of the Mantel-Haenszel Procedure. J Am Stat Assoc, 58: 690-700 (1963).
124. Fleiss, JL. Statistical Methods for Rates and Proportions. (Wiley, 1973.); Yates, F. Contingency Tables involving Small Numbers and the Chi-Square Test. J R Stat Soc Suppl, 1: 217-235 (1934).
125. Fleiss, J.L. Significance Tests Have a Role in Epidemiologic Research: Reactions to A.M. Walker. Am. J. Public Health, 76: 559-560 (1986).
126. Ware, JH, et. al. P Values in Medical Uses of Statistics (NEJM Books, 1986).
127. Goodman, SN. P Values, Hypothesis Tests, and Likelihood: Implications for Epidemiology of a Neglected Historical Debate. Am J Epidemiol, 137: 485-496 (1993); Gardner, MA. and Altman, DG. Confidence Intervals Rather than P Values: Estimation Rather than Hypothesis Testing. BMJ, 292: 746-750 (1986); Rothman, K.J. Significance Questing. (Editorial) Annals of Internal Medicine, 105: 445-447 (1986).
128. Kelly v. American Heyer-Schulte Corp., 957 F.Supp. 873 (W.D. Tex. 1997) (holding that a two-tailed significance test is required for epidemiologic studies).
129. Turpin v. Merrell Dow Pharmaceuticals, Inc., 959 F2d 1349 (6th Cir. 1992) (the concept of confidence interval is explained).
130. Breslow, NE and Day, NE. Statistical Methods in Cancer Research. Vol. I: The Analysis of Case-Control Data. (IARC, 1980).
131. Glantz, S.A. Primer of Biostatistics p. 198 (3d ed McGraw Hill, 1992); Rothman, K.J. and Greenland, S. Modern Epidemiology, pps. 189-190 (Lippincott-Raven, 1998).
132. Id.
133. Id. at 195
134. When the data are stratified on the basis of potential confounders, the association between exposure and effect is assessed within each stratum. If the association is the same in each stratum – that is, none of the presumed confounders is in fact a confounder – the strata are considered “homogeneous.” When the strata are homogeneous, a summary measure of these stratum-specific associations can be obtained. The most popular statistical technique for obtaining this summary measure is that devised by Mantel and Haenszel. This summary measure is a weighted average of the stratum-specific values and is called the “Mantel-Haenszel summary odds ratio.” Mantel, N. and Haenszel, W. H. Statistical Aspects of the Analysis of Data from Retrospective Studies of Disease. J. Nat’l Cancer Inst., 22: 719-748 (1959); Mantel, N. Chi-Square Tests With One Degree of Freedom: Extensions of the Mantel-Haenszel Procedure. J. Am. Stat. Assoc., 58: 690-700 (1963).
135. Turpin v. Merrell Dow Pharmaceuticals, Inc., 959 F2d 1349 (6th Cir 1992) (the concept of “power” is explained); Rosenbaum, PR. Case Definition and Power in Case-Control Studies. Stat Med, 3: 27-34 (1984); Smith, AH and Bates, M. Confidence Limit Analyses Should Replace Power Calculations in the Interpretation of Epidemiologic Studies. Epidemiology, 3: 449-452 (1992).
136. Freiman JA, et. al. The Importance of Beta, The Type II Error and Sample Size in the Design and Interpretation of the Randomized Control Trial: Survey of 71 “Negative” Trials. N Engl J Med, 299: 690-694 (1978); Greenland, S. Power, Sample Size, and Smallest Detectable Effect Determination for Multivariate Studies. Stat Med, 4: 117-127(1985); Greenland, S. On Sample-Size and Power Calculations for Studies Using Confidence Intervals. Am J Epidemiol, 128: 231-237 (1988); Walter, SD. Determination of Significant Relative Risks and Optimal Sampling Procedures in Prospective and Retrospective Comparative Studies of Various Sizes. Am J Epidemiol, 105: 387-397 (1977).
137. Howe GR and Choi BCK. Methodological Issues in Case-Control Studies: Validity and Power of Various Design/Analysis Strategies. Int J Epidemiol, 12: 238-245 (1983).
138. Schlesselman JJ. Sample Size Requirements in Cohort and Case-Control Studies of Disease. Am J Epidemiol, 99: 381-384 (1974).
139. Greenberg, R.S. and Ibrahim, M.A. The Case-Control Study in Holland, W.W. et. al. Oxford Textbook of Public Health, p. 129 (Oxford U. Press, 1991); MacMahon, B. and Trichopoulos, D. Epidemiology: Principles and Methods p. 252 (2d ed. Little, Brown & Co., 1996).
140. For instance, if both the exposure and the effect are rare, only an incredibly large prospective cohort study would have the power needed to assess whether the exposure is associated with the effect. But the cost of a cohort study that large may be prohibitive, or the period of its follow-up may be incredibly long, thereby making such a study impracticable. In that event, an epidemiologist may resort to “meta-analysis.” In re Paoli R.R. Yard PCB Litig., 916 F.2d 829, 856-57 (3d Cir. 1990), cert. denied, 499 U.S. 461 (1991) (the court discussed the admissibility of meta-analysis as a scientific technique); Tobin v. Astra Pharmaceutical Prods., Inc., 993 F.2d 528, 538-39 (6th Cir. 1992), cert. denied, 114 S. Ct. 304 (1993) (identifying an error in the performance of a meta-analysis, in which the Food and Drug Administration (FDA) pooled data from control groups in different studies, some of which gave the control group a placebo and others an alternative treatment); Dickersin, K. and Berlin, J.A. Meta-analysis: State-of-the-Science. Epidemiologic Reviews, 14: 154-176 (1992); Einarson, T.R. et. al. A Method for Meta-Analysis of Epidemiologic Studies. Drug Intell. Clin. Pharm., 22: 813-824 (1988); L’Abbé, KA, Detsky, AS and O’Rourke, K. Meta-Analysis in Clinical Research. Ann Intern Med, 107: 224-233 (1987); DerSimonian, R. and Laird, N. Meta-Analysis in Clinical Trials. Control Clin Trials, 7: 177-188 (1986).
141. Olkin, I. Statistical and Theoretical Considerations in Meta-Analysis. J. Clin. Epidemiology, 48: 133-146 (1995).
142. Thompson, SG. and Pocock, SJ. Can Meta-Analyses Be Trusted? Lancet, 338: 1127-1130 (1991); Shapiro, S. Meta-Analysis/Shmeta-Analysis. Am J Epidemiol, 140: 771-778 (1994); Fleiss, J.L. and Gross, A.J. Meta-analysis in Epidemiology, with Special Reference to Studies of the Association between Exposure to Environmental Tobacco Smoke and Lung Cancer: A Critique. J. Clin. Epidemiology, 44: 127-139 (1991); Greenland, S. A Critical Look at Some Popular Meta-Analytic Methods. Am J Epidemiol, 140: 290-296 (1994).
143. Dear, KBG and Begg, CB. An Approach for Assessing Publication Bias Prior to Performing a Meta-Analysis. Stat Sci, 7: 237-245 (1992); Begg, CB. and Berlin, JA. Publication Bias: a Problem in Interpreting Medical Data. J R Stat Soc A,151:419-463 (1988); Dickersin, K. The Existence of Publication Bias and Risk Factors for its Occurrence. JAMA, 263: 1385-1389 (1990).
144. LeLorier, J. et. al. Discrepancies between Meta-Analysis and Subsequent Large Randomized, Controlled Trials. NEJM, 337: 536-542 (1997).
145. Slavin, R.E. Best Evidence Synthesis: An Intelligent Alternative to Meta-Analysis. J. Clinical Epidemiology, 48: 9-18 (1995); Spitzer, W.O. The Challenge of Meta-Analysis. J. Clinical Epidemiology, 48: 1-4 (1995); Greenland, S. Can Meta-Analysis be Salvaged? Am J Epidemiol, 140: 783-787 (1994).
146. Ethyl Corp. v. United States Envtl. Protection Agency, 541 F.2d 1, 28 n. 58 (D.C. Cir.), cert. denied 426 U.S. 941 (1976).
147. Allen v. United States, 588 F. Supp. 247, 417 (D. Utah 1984), rev’d on other grounds, 816 F2d 1417 (10th Cir. 1987), cert. denied, 484 US 1004 (1988) (“the cold statement that a given relationship is not ‘statistically significant’ cannot be read to mean ‘there is no probability of relationship’; whether a correlation between a cause and a group of effects is more likely than not – particularly in a legal sense – is a different question from that answered by tests of statistical significance, which often distinguish narrow differences in degree of probability”); Turpin v. Merrell Dow Pharmaceuticals, Inc., 959 F2d 1349 (6th Cir. 1992) (confidence intervals are not the same as the preponderance of the evidence standard of proof).
148. Hoffman v. Merrell Dow Pharmaceuticals, Inc., 857 F2d 290 (6th Cir. 1988), cert. denied, 488 US 1006 (1989) (describes “substantial factor” and “but for” criteria for proving legal causation).
149. Restatement (Second) of the Law of Torts §§ 431-433 (ALI, 1965).
150. Lewis, D. Causation. J Philos, 70: 556-567 (1973); Sosa, E & Tooley, M (ed) Causation (Oxford, 1993); Rizzi, DA. and Pedersen, SA. Causality in Medicine: Towards a Theory and Terminology. Theor Med,13: 233-254 (1992); Greenland, S. et. al. Causal Diagrams for Epidemiologic Research. Epidemiology, 10: 37-48 (1999).
151. Rothman, KJ. Causes. Am J Epidemiol, 104: 587-592 (1976); Krieger, N. Epidemiology and The Web of Causation: Has Anyone Seen the Spider? Am J Epidemiol, 39: 887-903 (1994).
152. Pearce, N. White Swans, Black Ravens, and Lame Ducks: Necessary and Sufficient Causes in Epidemiology. Epidemiology, 1: 47-50 (1990).
153. Mackie, J.L. The Cement of the Universe: A Study of Causation (Oxford, 1980); Rothman, K.J. and Greenland, S. Modern Epidemiology chapter 2 (2d ed Lippincott-Raven, 1998).
154. Lewis, D. Counterfactuals (Oxford, 1973)
155. Raynor v. Merrell Pharmaceuticals, Inc., 104 F3d 1371, 1376 (D.C. Cir. 1997) (discussing the distinction between general and specific causation); Casey v. Ohio Medical Products, 877 F. Supp. 1380 (N.D. Cal. 1995) (“the term causation has two meanings…, the first is general causation…, the second is specific causation….”); Thomas v. Hoffman-LaRoche Inc., 731 F.Supp. 224, 226 (N.D. Miss 1989) (while experts testified that ingestion of Accutane caused plaintiffs’ seizures, they lacked epidemiological data or studies to support their opinions); In re Agent Orange Prod. Liab. Litig., 611 F.Supp. 1223, 1243 (E.D.N.Y. 1985), aff’d 818 F.2d 187 (2d Cir. 1987), cert. denied, 487 U.S. 1234 (1988) (“in a mass tort case such as Agent Orange, epidemiologic studies on causation assume a role of critical importance”); Lee v. Richardson-Merrell, Inc., 772 F.Supp. 1027 (W.D. Tenn. 1991) (fatal to plaintiff’s Bendectin case was the lack of supportive epidemiologic evidence in the face of much non-supportive epidemiologic evidence); Graham v. Playtex Products, Inc., 993 F.Supp. 127 (N.D.N.Y. 1998) (ruling that the opinion of plaintiff’s experts on general causation was admissible despite the absence of epidemiologic evidence supporting the opinion); Benedi v. McNeil-P.P.C. Inc., 66 F3d 1378 (4th Cir. 1995) (under the Daubert standard, epidemiologic studies are not necessarily required to prove causation, as long as the methodology employed by the expert in reaching his or her conclusion is sound); In re Breast Implant Litigation, 11 F. Supp. 2d 1217 (D. Colo. 1998) (the most important evidence relied upon by scientists to determine whether an agent causes disease is controlled epidemiologic studies; epidemiological studies are necessary to determine the cause and effect between breast implants and allegedly associated diseases); Bowers v. Northern Telecom Inc., 905 F.Supp. 1004, 1010 (N.D. Fla. 1995) (“a cause-effect relationship need not be clearly established by . . . epidemiological studies before a doctor can testify that, in his opinion, such a relationship exists”); Grimes v. Hoffman-LaRoche Inc., 907 F.Supp. 33, 35 n.2 (D. N.H. 1995) (“no epidemiological studies have been done which establish any relationship between Accutane and cataracts, and [plaintiff’s causation expert] did not contend that causation can be proved by anecdotal evidence alone”); Sanders, J. From Science to Evidence: The Testimony on Causation in the Bendectin Cases, 46 Stanford Law Review 1-86 (1993); Bert Black & David Lilienfeld, Epidemiologic Proof in Toxic Tort Litigation, 52 Fordham L. Rev. 732 (1984); Vincent M. Brannigan, et. al., Risk, Statistical Inference, and the Law of Evidence: The Use of Epidemiological Data in Toxic Tort Cases, 12 Risk Analysis 343 (1992); Michael Dore, A Commentary on the Use of Epidemiological Evidence in Demonstrating Cause-in-Fact, 7 Harv. Envtl. L. Rev. 429 (1983); Note, Causation in Toxic Torts: Burdens of Proof, Standards of Persuasion and Statistical Evidence, 96 Yale L.J. 376 (1986).
156. Note, Proof of Cancer Causation in Toxic Waste Litigation: The Case of Determining Versus Indeterminacy, 61 Cal. L. Rev. 2075 (1988).
157. Robinson v. United States, 533 F. Supp. 320 (E.D. Mich. 1982) (epidemiologic evidence cannot establish specific causation; “at most, one can examine statistical correlation and then, within a chosen interval of error, determine whether GBS is more likely than not associated with the swine flu vaccine in a particular period after receipt of the vaccination”; “statistical evidence cannot establish cause and effect”); Smith v. Ortho Pharmaceutical Corp., 770 F. Supp. 1561 (N.D. Ga. 1991) (epidemiology is based on the study of populations, not individuals); DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F.2d 941, 945 & n.6 (3d Cir. 1990) (“epidemiological studies do not provide direct evidence that a particular plaintiff was injured by exposure to a substance”); Merrell Dow Pharmaceuticals, Inc. v. Havner, 953 S.W.2d 706 (Tex. 1997) (epidemiologic studies cannot establish specific causation).
158. Rothman, KJ (ed.) Causal Inference. (Epidemiology Resources, 1988); Susser, M. What is a Cause and How Do We Know One? A Grammar for Pragmatic Epidemiology. Am J Epidemiol, 133: 635-648 (1991); Weed, DL. On the Logic of Causal Inference. Am J Epidemiol, 123: 965-979 (1986); Greenland, S. Probability Logic and Probabilistic Induction. Epidemiology, 9: 322-332 (1998). An argument can be made that epidemiologists cannot testify that an exposure generally causes an effect unless the epidemiologist is an expert in the potential biological mechanisms that would plausibly account for that exposure causing that effect; without that expertise, all the epidemiologist is qualified to discuss is that the exposure is or is not associated with the effect.
159. Maclure, M. Popperian Refutation in Epidemiology. Am. J. Epidemiol., 121: 343-350 (1985). This process of deduction is often called the “hypothetico-deductive” method. This method blends both the processes of induction and deduction. An initial hypothesis leads by the process of deduction to certain testable consequences. When these consequences are tested and the data from the test fail to support the deduced consequence, the initial hypothesis is modified by the process of induction. The process is then repeated.
160. Kelly v. American Heyer-Schulte Corp., 957 F.Supp. 873 (W.D. Tex. 1997) (holding that an epidemiologic study is inadmissible on the issue of causation unless it satisfies the Koch-Henle postulates); Evans, A.S. Causation and Disease: The Henle-Koch Postulates Revisited. The Yale Journal of Biology and Medicine, 49:175-195 (1976).
161. Smith v. Ortho Pharmaceutical Corp., 770 F. Supp. 1561 (N.D. Ga. 1991) (there are five criteria used to assess causation: (1) consistency of association; (2) strength of association; (3) specificity of association; (4) temporal relationship of association; and (5) coherence of association); Evans, A.S. Causation and Disease: A Chronological Journey. Am J Epidemiol, 108: 249-258 (1978); Hill, A.B. The Environment and Disease: Association or Causation? Proc Roy Soc Med, 58: 295-300 (1965); Renton, A. Epidemiology and Causation: A Realist View. J. of Epidemiology and Community Health, 48: 79-85 (1994); Morabia, A. On the Origin of Hill’s Causal Criteria. Epidemiology, 2: 367-369 (1991); Renton, A. and Whitaker, L. Proof of Causation and Relative Risk (Letter). Lancet, 339: 1058 (1992).
162. The basic unit of proof of general causation is not the single epidemiologic study but the collection of studies constituting the program of research on the issue of general causation.
163. Burch, P.R.J. The Surgeon General’s “Epidemiologic Criteria for Causality”: A Critique. J. Chronic Disease, 36: 821-836 (1983); Lilienfeld, A.M. The Surgeon General’s “Epidemiologic Criteria for Causality”: A Criticism of Burch’s Critique. J. Chronic Disease, 36: 837-845 (1983).
164. Maclure, M. Popperian Refutation in Epidemiology. Am J Epidemiol, 121: 343-350 (1985); Susser, M. Falsification, Verification and Causal Inference in Epidemiology: Reconsiderations in the Light of Sir Karl Popper’s Philosophy. In: Rothman, KJ (ed.). Causal Inference, pp. 33-57 (Epidemiology Resources, 1988); Buck, C. Popper’s Philosophy for Epidemiologists. Int J Epidemiol, 4: 159-168 (1975); Pearce, N. and Crawford-Brown, D. Critical Discussion in Epidemiology: Problems with the Popperian Approach. J Clin Epidemiol, 42: 177-184 (1989); Popper, KR. Conjectures and Refutations (4th ed. Routledge & Kegan Paul, 1972); Popper, KR. The Logic of Scientific Discovery (2nd ed. Harper & Row, 1968); Susser, M. The Logic of Sir Karl Popper and the Practice of Epidemiology. Am J Epidemiol, 124: 711-718 (1986); Karhausen, L.R. The Poverty of Popperian Epidemiology. Int’l J. of Epidemiology, 24: 869-874 (1995); Greenland, S. Induction versus Popper: Substance versus Semantics. Int’l J. of Epidemiology, 27: 543-548 (1998). In Daubert, the USSC ruled that “‘scientific methodology today is based on generating hypotheses and testing them to see if they can be falsified….’”, citing K. Popper, Conjectures and Refutations: The Growth of Scientific Knowledge 37 (5th ed. 1989). That captures the essence of the deductivist philosophy. But does that mean that, in the contest between inductivism and deductivism, Daubert valorizes deductivism? If so, what becomes of the prospect of proving causation given epidemiologic studies reporting an association? No doubt the United States Supreme Court did not consider the implications of this reference to Karl Popper for the dispute between inductivists and deductivists in epidemiology.
165. Lanes, S.F. Error and Uncertainty in Causal Inference. In Rothman, K.J. (ed). Causal Inference (Epidemiology Resources, Inc., 1988).
166. Rothman, KJ. and Poole, C. Science and Policy Making. Am J Public Health, 75: 340-341 (1985); Hertz-Picciotto, I. Epidemiology and Quantitative Risk Assessment: A Bridge from Science to Policy. Am J Public Health, 85: 484-491 (1995).
167. Lanes, SF. The Logic of Causal Inference in Medicine. In: Rothman, K.J. (ed.). Causal Inference (Epidemiology Resources, 1988).
168. Grassis v. Johns-Manville Corp., 591 A.2d 671 (N.J. Super. Ct. App. Div. 1991) (when a group of plaintiffs fail to meet the requirement of a RR > 2, an individual plaintiff may prevail by demonstrating that his or her RR is greater than 2); In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014 (S.D.N.Y.), rev’d in part, 52 F.3d 1124 (2d Cir. 1995) (“[a]t least a two-fold increase in the incidence of disease attributable to . . . exposure is required to permit recovery if epidemiologic studies alone are relied upon”; even though epidemiological evidence regarding the relationship between exposure to c and the development of d may fall short of the 2.0 threshold, if this evidence is combined with clinical or experimental evidence which eliminates confounding factors and strengthens the connection between c and d specifically in the circumstances surrounding the plaintiff’s case of d, then the plaintiff’s causation proof may be sufficient to support a jury’s finding that it was more likely than not that the plaintiff’s case of d was caused by his exposure to c; the Bradford-Hill criteria must be assessed to determine the sufficiency of epidemiologic evidence to establish causation); DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F.2d 941, 955, 958 (3d Cir. 1990) (“in order to avoid summary judgment, the relative risk of limb reduction defects arising from the epidemiological data Done relies upon will, at a minimum, have to exceed ‘2’”); Pollock v. Johns-Manville Sales Corp., 685 F. Supp. 489, 491 (D.N.J. 1988) (“issues of the inherent reliability of statistics aside, a 43 percent risk factor, although tangible, is clearly not ‘more probable than not’”); Manko v. United States, 636 F. Supp. 1419, 1434 (W.D. Mo. 1986), aff’d, 830 F.2d 831 (8th Cir. 1987) (“[a] relative risk of ‘2’ means that, on the average, there is a fifty percent likelihood that a particular case of the disease was caused by the event under investigation and a fifty percent likelihood that the disease was caused by chance alone; a relative risk greater than ‘2’ means that the disease more likely than not was caused by the event”); Marder v. G.D. Searle & Co., 630 F. Supp. 1087, 1092 (D. Md. 1986), aff’d sub nom. Wheelahan v. G.D. Searle & Co., 814 F.2d 655 (4th Cir. 1987) (“in epidemiological terms, a two-fold increased risk is an important showing for plaintiffs to make because it is the equivalent of the required legal burden of proof—a showing of causation by the preponderance of the evidence or, in other words, a probability of greater than 50%”); Cook v. United States, 545 F. Supp. 306, 308 (N.D. Cal. 1982) (“whenever the relative risk to vaccinated persons is greater than two times the risk to unvaccinated persons, there is a greater than 50% chance that a given GBS case among vaccinees of that latency period is attributable to vaccination, thus sustaining the plaintiff’s burden of proof on causation”); Padgett v. United States, 553 F. Supp. 794, 801 (W.D. Tex. 1982) (“a relative risk of ‘2’ or greater, then, means that the probability that vaccination caused a particular case of GBS is better than 50%”; hence, a relative risk of 2 or greater would indicate that it was more likely than not that vaccination caused a case of GBS); Landrigan v. Celotex Corp., 127 N.J.
404, 605 A.2d 1079, 1087 (1992) (“the significance of a relative risk greater than 2.0 representing a true causal relationship is that the ratio evidences an attributable risk of more than fifty percent, which means that more than half of the cases of the studied disease in a comparable population exposed to the substance are attributable to that exposure; this finding could support an inference that the exposure was the probable cause. . . . [However,] under certain circumstances a study with a relative risk of less than 2.0 could support a finding of specific causation. . .”); Daubert v. Merrell Dow Pharmaceuticals, Inc., 43 F.3d 1311 (9th Cir. 1995) (“for an epidemiologic study to show causation under a preponderance standard, the relative risk . . . will, at minimum, have to exceed 2”; a statistical study showing a relative risk of less than two could be combined with other evidence to show it is more likely than not that the accused cause is responsible for a particular plaintiff’s injury); Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387 (D. Or. 1996) (the burden of proof requires plaintiffs to demonstrate that exposure to breast implants more than doubled the risk of their alleged injuries; plaintiffs must be able to show a relative risk greater than 2.0); In re Breast Implant Litigation, 11 F. Supp. 2d 1217 (D. Colo. 1998) (plaintiffs must present expert testimony demonstrating that exposure to breast implants more than doubled the risk of their alleged injuries).
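The arithmetic behind this two-fold threshold can be made concrete. What follows is a minimal sketch in Python, not drawn from any cited opinion; the relative-risk values are illustrative only. The fraction of exposed cases attributable to the exposure is (RR - 1) / RR, which exceeds 50% exactly when RR exceeds 2.

```python
# A minimal sketch of the arithmetic behind the "relative risk > 2" cases.
# The attributable fraction among the exposed is (RR - 1) / RR; it crosses
# the "more likely than not" 50% threshold exactly when RR exceeds 2.
def attributable_fraction(relative_risk: float) -> float:
    return (relative_risk - 1.0) / relative_risk

for rr in (1.5, 2.0, 3.0):  # illustrative values, not data from any study
    print(f"RR = {rr}: attributable fraction = {attributable_fraction(rr):.0%}")
# RR = 1.5 -> 33%, RR = 2.0 -> 50%, RR = 3.0 -> 67%
```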
169. Oxendine v. Merrell Dow Pharmaceuticals, Inc., 506 A.2d 1100 (D.C. 1986), cert. denied, 493 U.S. 1074 (1990) (an epidemiologic study with a risk ratio less than 2 was sufficient evidence); Parascandola, M. Evidence and Association: Epistemic Confusion in Toxic Tort Law. Philosophy of Science, 63 (Proceedings): S168-S176 (1996).
170. Bayes’ theorem provides a rational way to modify beliefs based on subjective conditional probabilities. It states that the conditional probability of a hypothesis, given some new piece of evidence, is equal to the product of (1) the initial probability of the hypothesis before the evidence and (2) the conditional probability of the evidence given the hypothesis, divided by (3) the probability of the new evidence.
Pr(H1 | E) = [Pr(E | H1) × Pr(H1)] / [Σ (i = 1 to n) Pr(E | Hi) × Pr(Hi)]
Rosner, B. Fundamentals of Biostatistics (3d ed. Duxbury, 1990); Nozick, R. The Nature of Rationality (Princeton, 1993).
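To make this formula concrete, here is a minimal sketch in Python; the hypotheses, priors, and conditional probabilities are invented solely for illustration.

```python
# A minimal sketch of Bayes' theorem as stated above: the posterior
# probability of each hypothesis Hi given new evidence E.
def posterior(priors, likelihoods):
    """priors[i] = Pr(Hi); likelihoods[i] = Pr(E | Hi). Returns Pr(Hi | E) for each i."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # Pr(E), the denominator
    return [p * l / evidence for p, l in zip(priors, likelihoods)]

# Hypothetical numbers: H1 = "the exposure causes the effect" with prior 0.3,
# H2 = its negation with prior 0.7; E = a new study judged three times as
# probable under H1 (0.6) as under H2 (0.2).
print(posterior([0.3, 0.7], [0.6, 0.2]))  # -> [0.5625, 0.4375]
```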
171. Rothman, K.J. and Greenland, S. Modern Epidemiology, pp. 24-25 (2d ed. Lippincott-Raven, 1998).
172. Characterizing an epidemiologic study as an algorithm, an almost infallible method of processing observations into a value quantifying an association between exposure and effect, is a plaintiffs’ ploy. For if an epidemiologic study were an algorithm, it would be very persuasive, especially if the odds ratio was greater than 2.0. That result would then be argued to be diamond hard fact, resistant to the erosive forces of reasoned critique. This ploy the defense must, at every turn, subtly undermine. The fact is, an epidemiologic study, even if its methodology is impeccable, is not an algorithm, a mechanized method for processing inputs into outputs, but merely an argument resting on all the discretionary premises and inferences characteristic of all non-algorithmic arguments.
The Cohort Study
“In a cohort study, the epidemiologist selects a group of exposed individuals and a group of unexposed individuals and follows both groups to compare the incidence of disease in the two groups.”8 If, at the end of the period of follow-up, the proportion of those exposed who have the effect is greater than the proportion of those unexposed who have the effect, the epidemiologist concludes that the exposure is positively associated with the effect. Cohort studies produce results that tend to be very reliable. Unfortunately, cohort studies are time-consuming and expensive, often requiring many years of follow-up and costing millions of dollars.
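By way of illustration, here is a minimal sketch in Python of the comparison a cohort study makes; the counts are invented, not drawn from any study discussed here.

```python
# A minimal sketch of the cohort-study comparison: the risk (incidence
# proportion) among the exposed divided by the risk among the unexposed
# gives the relative risk. All counts are hypothetical.
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical cohort: 30 of 1,000 exposed and 10 of 1,000 unexposed develop the effect.
print(relative_risk(30, 1000, 10, 1000))  # -> 3.0, a positive association
```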
The Prospective Cohort Study
In a prospective cohort study, the epidemiologist measures exposure before manifestation of the effect. This kind of study usually requires many subjects and a lengthy follow-up, especially if the effect is rare. As a result, a prospective cohort study is unlikely to appear in the context of litigation, at least for several years. Even so, it is the gold standard for determining the effect of an exposure when the exposure is potentially harmful. Good examples of this kind of study are those by Stampfer et al., Vitamin E Consumption and the Risk of Coronary Heart Disease in Women. NEJM, 328: 1444-1449 (1993), and by Fuchs, C.S. et al., Dietary Fiber and the Risk of Colorectal Cancer and Adenoma in Women. NEJM, 340: 169-176 (1999), the latter noting that the results of retrospective studies were inconclusive as to whether dietary fiber protects against colorectal cancer, but that the results of a large prospective cohort study (the Nurses’ Health Study) did not support a protective effect of dietary fiber.
The Retrospective Cohort Study
In a retrospective cohort study, the epidemiologist measures exposure after manifestation of the effect. In the context of litigation, most cohort studies are apt to be retrospective. Good examples of population-based retrospective cohort studies are those by Gabriel et al., Risk of Connective-Tissue Diseases and Other Disorders After Breast Implantation. NEJM, 330: 1697-1702 (1994), and by Hennekens et al., Self-Reported Breast Implants and Connective-Tissue Diseases in Female Health Professionals. JAMA, 275: 616-621 (1996).
The Case-Control Study
In the case-control study, the epidemiologist identifies a group of people with the disease (“cases”) and a group of people without the disease (“controls”) and then determines what proportion of the cases were exposed and what proportion were unexposed.9 If the proportion of cases exposed is greater than the proportion of controls exposed, then the epidemiologist concludes that the exposure is positively associated with the disease or effect. Most litigation over harmful exposures will involve case-control studies. For instance, case-control studies are the kind of epidemiologic study plaintiffs used to assert that toxic shock syndrome was positively associated with use of tampons; that eosinophilia-myalgia syndrome was positively associated with one manufacturer’s L-tryptophan supplements; and that silicone breast implants were positively associated with a variety of ill-defined symptoms sometimes disingenuously called “silicone breast implant disease.” An example of a population-based case-control study is that by Burns et al., The Epidemiology of Scleroderma among Women: Assessment of Risk from Exposure to Silicone and Silica. Journal of Rheumatology, 23: 1904-1911 (1996).
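The comparison just described is, in effect, a ratio of exposure odds. Here is a minimal sketch in Python; all counts are invented for illustration, not taken from any study cited here.

```python
# A minimal sketch of the case-control comparison described above.
# Because cases and controls are sampled separately, the study yields
# an odds ratio rather than a relative risk. All counts are hypothetical.
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Ratio of the exposure odds among cases to the exposure odds among controls."""
    return (cases_exposed / cases_unexposed) / (controls_exposed / controls_unexposed)

# Hypothetical study: 40 of 100 cases were exposed versus 20 of 100 controls.
print(odds_ratio(40, 60, 20, 80))  # -> about 2.67, a positive association
```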
Four designs are available for case-control studies. First, in the “traditional” case-control design, the control group is a sample of the population remaining at risk at the end of the risk period, the period over which incident cases are ascertained and enrolled for the study. Second, a case-control study may be constructed within a prospective cohort study. This is called a “nested” case-control study. In that kind of study, people in the cohort who develop the disorder become the cases, and a random sample of the exposed and unexposed people in the cohort who remain free of disease becomes the controls. Third, in the “case-cohort” design, the control group is a sample of the entire population at risk at the start of the risk period, excluding cases existing at that time. Finally, in the “incidence-density” design, the controls are sampled longitudinally throughout the risk period. Some of the sampled controls might develop the disease after their selection, and so the final control group might contain some incident cases. Such cases would be entered in both the case and control groups.
Case-control studies have several advantages over cohort studies. First, they are well suited to study rare effects or disorders. Second, they are also well suited to study disorders with long “induction” periods. That is, rather than waiting years for the prospective accrual of cases, the epidemiologist may compress time by using historical documents to evaluate earlier exposures. Third, case-control studies are much less expensive to conduct than cohort studies because they require fewer controls and no lengthy follow-up.
Unfortunately, case-control studies also have some nettlesome disadvantages. They are not well suited for detecting weak associations (odds ratio < 1.5). But this is not a worry for the defense. More importantly, they are also susceptible to a variety of systematic errors or biases.10 These errors cause the results of the study to be ambiguous and often unreliable. This is a very serious concern for the defense.
4. Natural History of Epidemiology in Litigation
Most often, epidemiology is encountered in litigation of “toxic tort” cases. Toxic tort cases are those in which plaintiffs assert claims for damages for problems with their health owing to purported exposure to “immunogens” [molecules that generate an immune response], “antigens” [foreign substances that induce a specific immune response], “toxins” [poisonous substances], “teratogens” [substances that disrupt embryonic development, resulting in a deformed fetus] or “carcinogens” [substances that increase the risk of developing cancer], that is, any substance that can cause humans acute or chronic injury, either directly by altering or destroying cells or indirectly through stimulation of a harmful immune response. In the past, toxic tort cases have involved, for example, Thalidomide, DES (diethylstilbestrol), Agent Orange, IUDs (the Dalkon Shield), asbestos, Bendectin, L-tryptophan, and the silicone in breast implants.
Toxic tort cases fall into two basic categories: (1) those in which the toxin produces a “signature disease,” such as mesothelioma from asbestos; and (2) those in which a putative toxin allegedly produces a constellation of non-specific symptoms not considered to be a signature disease. Ordinarily this second type of toxic tort case arises in advance of proper scientific data either for or against plaintiffs’ claims. These cases are propelled along as a result of mass hysteria fueled by sensationalist journalism and nurtured by “uninformed” plaintiffs’ lawyers.
To better understand why, first consider what is occurring before anyone is exposed to the putative toxin or immunogen. At any given time, among those who will eventually be exposed, some will have non-specific symptoms that mimic the symptoms of a rheumatologic disorder and that develop quite apart from any eventual exposure, and some will develop, again quite apart from any exposure, genuine rheumatologic disorders.
Eventually someone who is exposed to the putative toxin or immunogen decides that she has symptoms associated with this exposure. That person sees her doctor who, after examining her, may publish a case report describing this “curious association” between the putative toxin or immunogen and these reported symptoms.
Other physicians read this case report and, if they have patients who have been exposed to the putative toxin or immunogen, will query them about their symptoms. “Do you have fatigue? Do you have muscle aches and pains?” “Mrs. Macbeth, are you remembering information as well as you used to?” “If not, maybe it’s due to your use of this drug.” Of course, consistent with human nature, some of these physicians will have fantasies of being immortalized in medical history as having discovered a new syndrome or disease: “Dim’s disease” or “Grub’s disease” or “Crock’s syndrome.” They too write up their case findings for publication. Soon the medical journals contain a number of “case reports” about this “curious association.”
Enter the news media, ever the public watchdog. The news media, of course, have eager, ambitious journalists assigned to read key medical journals for developing medical issues that might capture the public’s interest or fears—the more the theme of the story taps into populist prejudices, the more newsworthy the story. Unfortunately, those who present the news, while having a keen nose for ratings, have no appreciation for scientific method: “Case reports or randomized controlled double-blinded clinical trials, what’s the difference?”
So typically the news media publishes an article or airs a nationally televised news program highlighting the “curious association” between the putative toxin or immunogen and these symptoms. As would be expected, some readers or viewers will have been exposed to the putative toxin or immunogen, and a subset of these people will have either symptoms of a rheumatologic disorder or symptoms that mimic one. Couple that fact with the phenomenon known as “effort after meaning” – the tendency, once symptoms develop, to search for an explanation for them – and the news media will have served these people a ready explanation for their woes.
Enter the plaintiffs’ lawyer, ever ready, ever eager to assist the “downtrodden.” These plaintiffs’ lawyers begin advertising and begin receiving referrals from attorneys who lack the resources or expertise to prosecute these cases. These plaintiffs’ lawyers are usually very rich because cases like these are expensive to work up and try.
These wealthy plaintiffs’ lawyers, recognizing the merits of certain economies of scale, organize into national steering committees to investigate and formulate the various issues that will constitute a generic case against the manufacturer and suppliers of the putative toxin or immunogen. These committees assign teams of lawyers to cover issues of liability and issues of general causation. When these teams of lawyers finish, they have a script with modest variations to apply to whatever plaintiff is ready, willing and able to go to trial.
Soon plaintiffs are encouraged by their lawyers to attend support groups, purportedly created to comfort plaintiffs, these alleged victims, in their time of need. But, more realistically, these groups are congregations to mammonism, disingenuously designed to capitalize on plaintiffs’ suggestibility and need to belong by suggesting to them that if they want to receive what they need, they need to have symptoms S1, S2, S3, . . . Sn. As Montaigne remarked,
“A woman, thinking she had swallowed a pin with her bread, was screaming in agony as though she had an unbearable pain in her throat, where she thought she felt stuck; but because externally there was neither swelling nor alteration, a smart man, judging that it was only a fancy and notion derived from some bit of bread that had scratched her as it went down, made her vomit, and, on the sly, tossed a crooked pin into what she threw up. The woman, thinking she had thrown it up, felt herself suddenly relieved of her pain.”
These support groups are usually headed by someone who excels at fanning the flames of the “victim’s” prejudices toward the putative corporate victimizer who manufactured the putative toxin or immunogen. This person helps perpetuate the mass hysteria triggered by the news media and manipulated by plaintiffs’ lawyers.
At these group meetings, plaintiffs are often re-routed from their treating physicians to the expert witnesses plaintiffs’ attorneys have retained. These expert witnesses may be highly trained clinicians and experimentalists, but typically most are no more than professional witnesses.
At this point, these experts begin preparing and publishing reports of “case series,” discussing the signs and symptoms developing in their cohort of “patients” and relating those signs and symptoms to the alleged toxic or immunogenic exposure so prominently discussed in the media.
These reports of case series are not epidemiologic or comparison studies because they lack adequate control groups. So at this point, various epidemiologists, seeing a need for epidemiology, become interested and begin designing and conducting small case-control studies. The results of these case-control studies begin appearing in a variety of second-tier medical journals, and then one or two appear in first-tier journals such as the Journal of the American Medical Association and the New England Journal of Medicine.
The results of these case-control studies tend to be equivocal, with studies demonstrating a positive association being criticized for harboring various biases, most notably selection bias, and studies demonstrating no association being criticized for low “power,” a condition in which too few subjects are studied to rule out a false negative result.
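The “power” criticism can be illustrated with a simple simulation. The following is a minimal sketch in Python; the design numbers are invented, and the significance cutoff is the conventional chi-square critical value for one degree of freedom.

```python
# A minimal sketch of the "power" problem noted above: with too few
# subjects, a real association is usually missed. All numbers hypothetical.
import random

ALPHA_CUTOFF = 3.84  # chi-square critical value, df = 1, two-sided alpha = 0.05

def chi_square(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / den if den else 0.0

def simulated_power(n_cases, n_controls, p_exp_cases, p_exp_controls, trials=2000):
    """Fraction of simulated studies in which the association reaches significance."""
    hits = 0
    for _ in range(trials):
        a = sum(random.random() < p_exp_cases for _ in range(n_cases))        # exposed cases
        b = n_cases - a                                                       # unexposed cases
        c = sum(random.random() < p_exp_controls for _ in range(n_controls))  # exposed controls
        d = n_controls - c                                                    # unexposed controls
        if chi_square(a, b, c, d) >= ALPHA_CUTOFF:
            hits += 1
    return hits / trials

# True exposure odds ratio here is (0.40 / 0.60) / (0.25 / 0.75) = 2.0.
print(simulated_power(50, 50, 0.40, 0.25))    # small study: misses the association most of the time
print(simulated_power(500, 500, 0.40, 0.25))  # larger study: detects it almost always
```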
Eventually, if enough interest is generated in the scientific or medical community, a large existing cohort study designed to consider a different set of risks is tapped for a nested case-control study. The results of that study, when published several years later, are criticized but generally considered to have advanced the argument about causation.
5. Defense Goals
Defending toxic tort cases is a lot like trying to take a bone from a dog: it’s never as easy as it first seems, and it’s always dangerous. Not the least of these dangers is fashioning and presenting a defense on causation. In this effort, epidemiologic studies are, at the very least, useful and, at the very most, essential. They can be used as a shield by the defense in an effort to disprove general causation. But sometimes plaintiff will proffer epidemiologic studies as a sword to prove general causation. Then the defense will need to apply its full analytic powers in scrutinizing these studies to unseam them, either to keep them from the jury or to discount their persuasiveness to the jury.
- Unpersuasive Epidemiologic Evidence
Most often, when both plaintiffs and defendants use epidemiologic evidence, an issue arises about the weight to give the respective opinions based on opposing epidemiologic studies, an issue of fact to be resolved by the jury. But, for the defense, undermining an epidemiologic study at trial is a daunting task because an adverse epidemiologic study is difficult to critique negatively to a jury. First, that critique requires explaining a series of complicated predicates, such as the subtle concepts of sampling error, systematic bias and confounding. Second, this burden of explanation is very heavy if the jury is constituted of those with a bias against corporate defendants. (This is why it is often said, wisely, that the best defense in toxic tort litigation is having a favorable venue.) So for the defense it’s best to keep adverse epidemiologic evidence from the jury.
- Legally Insufficient Epidemiologic Evidence
Sometimes, once proffered epidemiologic evidence is admitted into evidence, the court will then later rule, as a matter of law, that plaintiff’s epidemiologic evidence is insufficient to create an issue of fact on general causation for the jury to resolve. The court will either, at best, direct a verdict for defendant or, at worst, provide a limiting instruction to the jury to disregard the epidemiologic evidence in its deliberations on the issue of general causation.
- Inadmissible Epidemiologic Evidence
Occasionally, however, experts will proffer opinions either based on epidemiologic studies lacking sound methodologies or based on inferences so far removed from the epidemiologic evidence as to insult credulity. Then the court will have reason to exclude that proffered evidence as inadmissible under rules of evidence, such as FRE 702. That proffered evidence can be excluded, depending on trial tactics, before trial or during trial.
- Does the Expert Have the Necessary Expertise?
FRE 702 requires that the proffered expert be “qualified” as an expert by “knowledge, skill, experience, training, or education….” The defense will want to assure itself that plaintiff’s proffered expert has the requisite epidemiologic expertise to provide an opinion based on methodologically sound data.
- Does the Expert’s Avowed Expertise Fit the Opinion?
Not only must the proffered expert have the requisite expertise, but that expertise must fit, or be relevant to, the proffered opinion. For instance, although the expert may be a qualified epidemiologist, is she qualified to testify about medical causation? Virtually always, the defense should argue she is not so qualified.
- Is The Proffered Evidence “Knowledge?”
FRE 702 requires that the proffered opinion of the expert, before being admissible, be “knowledge.” “Knowledge” is defined negatively as more than subjective belief or unsupported speculation, and positively as any body of known facts or truths accepted as such on good grounds. If the proffered evidence is “knowledge,” and if it will assist the trier of fact, a qualified expert may testify about “scientific, technical or other specialized knowledge.”
- Is The Proffered Evidence “Scientific” Knowledge?
A belief, within that set of beliefs characterized as knowledge, may also fall within that subcategory of beliefs characterized as “scientific” knowledge. Scientific knowledge is defined as belief derived by the scientific method, the kind of method based on generating hypotheses and testing them to see if they can be falsified. Expert opinions based on scientific knowledge should be evaluated by the court, to assess their admissibility, in light of those indicia of evidential reliability identified in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 113 S. Ct. 2786, 125 L. Ed. 2d 469 (1993). Under that standard, scientific beliefs, besides being falsifiable, should satisfy other indicia of evidential reliability, including peer review, publication, and general acceptance in the scientific community.
Some argue that if the opinion is based on technical or other specialized knowledge, it should be submitted, if relevant, directly to the jury. Epidemiologic evidence, they may assert, is that kind of knowledge. Yet, contrary to that argument, whether epidemiologic evidence is scientific evidence or just specialized knowledge, it should be evaluated in light of the applicable indicia of Daubert and any other indicia demanded in the epidemiologic community. To argue otherwise successfully, plaintiffs must identify a criterion that demarcates scientific knowledge from other kinds of specialized knowledge. Many brilliant people have attempted to do so. Yet most philosophers of science believe none has succeeded. So, if over the years philosophers of science have been unsuccessful in finding such a criterion, neither the U.S. Supreme Court nor the various trial courts should expect to succeed. As John Searle, Professor of Philosophy at UC Berkeley, remarked:
“Knowledge can be naturally classified by subject matter, but there is no special subject matter called ‘science’ or ‘scientific knowledge.’ There is just knowledge, and ‘science’ is a name we apply to areas where knowledge has become systematic, as in physics or chemistry.”
The fact is, FRE 702 concerns “specialized” knowledge, under which are subsumed “scientific” and “technical” knowledge. So, absent a cogent criterion demarcating technical from scientific knowledge, what’s true for opinions based on scientific knowledge should also be true for opinions based on “technical” knowledge.
- Is the Opinion Based on Scientific Methodology?
Under Daubert, the trial judge, as gatekeeper, must screen proffered scientific evidence “to preliminarily assess whether the reasoning or methodology . . . is scientifically valid [that is, whether it is scientific knowledge] and . . . whether that reasoning or methodology properly can be applied to the facts in issue [that is, whether that scientific knowledge is relevant].” The trial court is not asked to resolve scientific issues on both sides of which exist evidentially reliable evidence.
Presumably, what the USSC would have the trial court do with regard to methodology is the following: (1) identify the methods used; (2) identify the methods generally accepted in the relevant scientific community; and (3) assess the fit between (1) and (2). The trial court is expected to take steps (1), (2) and (3), but not to analyze the merits or validity of the methods generally accepted in the scientific community. (If the USSC expected the trial court to undertake that further step, what standard would the trial court employ to evaluate the methods generally accepted in the scientific community?)
The concept of “scientific method” cannot be defined by resort to a set of necessary and sufficient defining criteria, but only by an open-ended set of meaning criteria. The scientific method is probably best defined as a set of indicia or maxims for the generation and justification of data and theories. In Daubert, the USSC, however, does offer four classes of “appropriate observations”: (1) testing; (2) peer review and publication; (3) rates of error; and (4) general acceptance in the scientific community.
(1) Testing: Can the theory or technique be tested? Has the theory or technique been tested? The USSC quotes approvingly that “‘scientific methodology today is based on generating hypotheses and testing them to see if they can be falsified; indeed, this methodology is what distinguishes science from other fields of human inquiry.’” “‘The statements constituting a scientific explanation must be capable of empirical test.’” “‘The criterion of the scientific status of a theory is its falsifiability, or refutation, or testability.’” Epidemiologists test hypotheses using the methods of epidemiology. For instance, is low frequency electromagnetic radiation associated with childhood leukemia? Using epidemiologic methods, epidemiologists will assess whether that exposure is associated with that effect.
(2) Peer Review and Publication: Has the theory or technique been subjected to peer review and publication? The USSC remarked, “submission to the scrutiny of the scientific community is a component of ‘good science,’ in part because it increases the likelihood that substantive flaws in methodology will be detected.” Yet the USSC qualifies this criterion by noting that publication is not a sine qua non of admissibility for several reasons: (i) it does not necessarily correlate with reliability; (ii) sometimes well-grounded but innovative theories will not have been published; and (iii) some propositions are too particular, too new, or too limited in interest to be published.
(3) Rates of Error: What is the particular technique’s known or potential rate of error? For instance, if an epidemiologist bases her conclusion that eating bran flakes is negatively associated with colon cancer on the results of case-control studies, she would want to assess the rate of error of case-control studies.
(4) General Acceptance in the Scientific Community: Is the theory or technique generally accepted within the relevant scientific community? What methods do the well-recognized epidemiologic treatises require?
(5) Other Factors: Importantly, the USSC emphasizes that “many factors,” not just these four, “will bear on the inquiry” into whether or not “the reasoning or methodology underlying the testimony is scientifically valid.”
In the context of litigation, proffered opinions are more likely to have evidentiary reliability if they satisfy some additional criteria: (1) the opinion is based on research conducted independent of litigation; (2) the opinion was not developed expressly for purposes of testifying in court, and (3) the proffered opinion was based directly on legitimate pre-existing research unrelated to the litigation. These are particularly important criteria in assessing the reliability of epidemiologic studies.
“The overarching subject,” says the USSC, “is the scientific validity of the principles . . . that underlie a proposed submission.” That is, the system of beliefs underlying the expert’s opinion must be “valid” and “reliable.” “Validity” and “reliability” are the twin concepts in light of which scientists evaluate data, theories, principles, procedures, laws, instruments and devices. “Validity” refers to accuracy. That is, does the theory explain what it purports to explain; is the datum, principle or law what it is purported to be; does the procedure produce what it is purported to produce; and does the instrument or device detect what it is purported to detect? In epidemiology, the validity of an epidemiologic study has two components: internal validity and external validity. Internal validity is the validity of the inferences drawn from the samples the epidemiologist assembles and evaluates. External validity is the validity of the inferences about the population from which those samples were drawn.
“Reliability” refers to consistency among those who assess accuracy. That is, “reliability” refers to whether or not, inter-subjectively, those who take measurements agree from observation to observation or measurement to measurement. Reliability implies that different epidemiologists performing the same kind of epidemiologic study would obtain similar results. Yet reliability does not imply validity. Simply, although phenomena such as symptoms may be observed consistently, nothing guarantees that these symptoms in fact measure what they purport to measure. Even so, “reliability” is an indicium of “validity.” That is, if a symptom or sign is unreliable, it has questionable validity. For example, if our clocks did not generally agree with one another, we could not use them to measure time. Our clocks would not measure falsely; they would not measure at all. Reliability, then, is a necessary but not a sufficient criterion for validity.
The defense will want to argue that the proffered epidemiologic evidence is unreliable if a series of epidemiologic studies exists on an issue and the studies in that series reach different results. For example, the results of case-control studies on the same topic are apt to vary, some showing a positive association and some showing no association. On this account, case-control studies could be criticized as lacking “reliability,” that all-important component of validity. But to argue that all case-control studies should be inadmissible because case-control studies are generally unreliable is to throw the baby out with the bathwater. After all, the defense will want to arm itself with those case-control studies showing no association. This is a sound tactic so long as the power of those particular studies is sufficient to detect an odds ratio of at least two, and the studies are in other respects more methodologically sound than those that resulted in a positive association. Otherwise the baby best go out with the bathwater.
Determining “validity” usually requires a more epistemologically secure standard of reference against which to measure the accuracy of an observation or technique of measurement. For example, the diagnosis of a brain tumor on the basis of symptoms of impaired neurological function can be verified by identifying the tumor through MRI, CT scan or biopsy (surgery and histological inspection). The logical direction of this process of validation is that the relatively non-specific, but more obvious, phenomena either act as a symbol for or point to less obvious but more specific pathognomonic phenomena. In epidemiology, each piece of data used in a study should be valid, and the kind of epidemiologic study in which those data are embedded should generate results that are valid. The validity of a particular kind of epidemiologic study would be gauged by the kind of study considered to be the gold standard, which in epidemiology is usually the prospective cohort study.
So to claim that a procedure is valid is to be able to provide justifications that the validator is more epistemologically secure than the procedure itself. This appeal to firmer ground is an aspect of empiricism. So validation is apt not to have occurred if there is no descent to something more observationally direct; if there is no ascent to something more intuitively solid or plausible; or if there is no traverse either to something conceded to be less impeachable or to a series of logically and epistemically interconnected somethings within which that which is to be validated harmoniously meshes. An epidemiologic study would be validated to the extent it agrees with that kind of study considered to be the gold standard or, if the epidemiologic study is incomparable, to the extent it lacks internal errors.
The “validation dilemma” is that although something or some method may be a purported validator, often it has no surer epistemic foundation than that which it purports to validate. Simply, consider validation as a process of finding firmer epistemic ground. For example, an epidemiologist cannot very well validate the methodology of a case-control study by reference to the results of another case-control study. But she could validate the methodology of the case-control study by reference to a large prospective cohort study.
How many so-called “external validators,” when analyzed, are nothing more than the consensus of experts – “general acceptance in the relevant scientific community”? And what could be the problem with validation based on the consensus of experts? Suppose Dr. Bob is an expert in naturopathic medicine. He believes that the ginseng root, if rubbed on the scalp, will result in prolific hair growth. This he believes not because he performed experiments with the root, but because Dr. Billy Bob, his venerated teacher in naturopathic school, told him so. Dr. Billy Bob, in turn, believed it was so because when he was a student he read that it was so in the treatise, “Ancient Cures for Baldness” by the late Dr. Graymalkin of Edinburgh, Scotland. No one knows why Dr. Graymalkin believed it to be so because she has been dead for three hundred years. The moral of this tale: the validity of a particular method of validation should never be taken for granted.
Yet there are limits to critique: “All testing, all confirmation and disconfirmation of a hypothesis takes place already within a system. *** The system is not so much the point of departure, as the element in which arguments have their life.” That is, the basic assumptions of the system are not put to the test, not backed up by evidence. This is what Wittgenstein meant when he said: “Of course there is justification; but justification comes to an end.” So the results of a case-control study may be verified by the results of a large prospective cohort study whose results in some circumstances, could be verified by a controlled clinical trial, whose results in the final analysis could only be verified by a program of such studies over a course of time.
What are examples of methodological flaws in proffered epidemiologic evidence that warrant a finding that the proffered evidence is, under FRE 702, inadmissible? First would be that “sampling error” likely accounts for the putative positive association between exposure and effect. Second would be that “systematic bias” likely accounts for the putative positive association between exposure and effect. Third would be that “confounding” likely accounts for the putative positive association. Fourth would be that the study has not been published in a peer-reviewed journal.
- Is the Opinion Sufficiently Determined by the Data?
A basic problem, then, in any process of validation is this: the theory, principle or procedure investigated will always be underdetermined by the data. As a result, a logical and epistemic space exists in which will be at play that activity called “interpretation.” That means the process of validation is a process of assessing an interpretation in light of other interpretations. Usually in that process, a number of interpretations may be offered. And, unfortunately, no one will have an impeccable gold standard by which to assess the merits of those various and often competing interpretations. So the process of validation becomes a process of circumferentially narrowing that logical and epistemic space through a process of argumentation or persuasion. That process is always open-ended. And so validity is always a matter of degree. The lesson is that validation is a process that is both scientific and rhetorical, and so requires both evidence and argument.
Epidemiologists can be challenged most often, if allowed to testify about causation, when offering opinions about general causation. Often those opinions will be significantly underdetermined by the underlying epidemiologic evidence. In those instances, as the United States Supreme Court has held, nothing in either Daubert or the Federal Rules of Evidence requires a trial court to admit opinion evidence which is connected to existing data only by the ipse dixit of the expert. A court may conclude that the analytical gap between the data and the opinion proffered is simply too great, and refuse to admit the opinion into evidence.
To enforce these requirements, the defense will need to engage a first-rate epidemiologist to find every weakness in the epidemiologic study or studies on which plaintiffs’ experts base their opinions to prove general causation. Once that’s done, the defense should request a hearing under FRE 104 to present the epidemiologist’s critique in order to block admission of that study or those studies into evidence. This will not be an easy task. The defense is apt to be required to demonstrate not only that the study has errors, but that these errors account for the positive association. As a practical matter, opinions based on published epidemiologic studies other than case-control studies are unlikely to be ruled inadmissible. Battles over the admissibility of epidemiologic evidence are likely to be fought over two kinds of analysis: “meta-analysis” and “reanalysis.” Both are easily manipulated and hence tools of abuse in the hands of the unscrupulous forensic epidemiologist.
- Does the Proffered Evidence “Fit” the Issues of the Case?
Proffered epidemiologic evidence must also have a tendency to prove an issue of fact at issue in the litigation. If it does not, it is irrelevant and hence inadmissible. For instance, epidemiologic evidence does not fit the issues in the litigation if it demonstrates that exposure to benzene is significantly associated with leukemia when the issue of fact is whether or not silicone breast implants cause autoimmune disease.
- Opinion Based on Hearsay Reasonably Relied Upon by Experts in the Field in Forming Opinions.
By evidentiary rules such as FRE 703, an expert may base her opinion on facts or data not admissible in evidence if of a type reasonably relied upon by experts in the particular field in forming opinions or inferences upon the subject. Epidemiologists will often base their opinions on facts or data which would be considered hearsay and so not admissible in evidence.
- Probative Value of Evidence Outweighed By Danger of Unfair Prejudice, Confusion of Issues, or Misleading the Jury
Even if the proffered evidence satisfies the criteria of evidentiary rules such as FRE 702 and 703, it may still be ruled inadmissible under a rule such as FRE 403. FRE 403 provides that “although relevant, evidence may be excluded if its probative value is substantially outweighed by the danger of unfair prejudice, confusion of the issues, or misleading the jury, or by considerations of undue delay, waste of time, or needless presentation of cumulative evidence.” This rule is used rarely. Yet it might appropriately be used to exclude certain epidemiologic studies – those case-control studies, for instance, either with an odds ratio less than 2.0 or with a noticeable bias.
II. Expertise in Epidemiology
Epidemiologists are usually physicians with a Master of Public Health degree, those with a Sc.D. in epidemiology, or those with a Ph.D. in epidemiology.11 Those with Ph.D.’s are said to have had, as a rule, more rigorous schooling in the principles of epidemiology.12
But establishing that a proffered expert has had formal training in epidemiology should be the beginning not the end of an inquiry into her expertise. What’s most important, because the field of epidemiology like the field of law is so varied, is whether or not the proffered expert has some practical experience about the disputed issue of general causation. Simply, an epidemiologist needs some experience with that issue to acquire an understanding of the variety and complexity of independent variables that might correlate with the relevant effect. In this regard, the epidemiologist should provide, or the defense should otherwise obtain, a list of what epidemiologic studies she has published. These studies will disclose her primary interests. These studies should also be read with a jeweler’s glass for statements about what epidemiologic methods are required for valid and reliable results. With those statements, the defense can then compile “a list of reminders” for the expert when she becomes tempted, in order to advance plaintiff’s litigation, to stray from the truth.
Early in the history of mass toxic tort litigation, plaintiffs tended to neglect epidemiologic evidence and, giving it short shrift, would engage and proffer physicians or others without expertise in epidemiology to testify about the relevant epidemiology.13 Typically, that testimony would describe the findings of the epidemiologic studies without analyzing the relative merits of those studies. As litigation of mass toxic tort cases has matured and the courts have recognized the importance of epidemiologic evidence, plaintiffs now typically engage epidemiologists to testify about epidemiology. Wisely, plaintiffs usually engage epidemiologists with solid sympathies for plaintiffs, someone such as Shanna Swan, Ph.D., who repeatedly testified for plaintiffs in the litigation over Bendectin and silicone breast implants.14 Yet, once the credibility of such an expert becomes exhausted, as it invariably does, plaintiffs will have to find another epidemiologist, with the lure of either substantial retainers or the opportunity to grind ideological axes.
But epidemiologists available to testify in court are rare. Most do not want to be forensic experts. For some reason, they do not seek the thrill of matching wits with trial lawyers. Nor do they relish having their names appear in opinions by trial or appellate courts, where they may be derogated for having some kind of commercial or ideological bias. Those few who do brave these hazards and testify are often not the epidemiologists involved in conducting the epidemiologic studies offered as evidence on the issue of general causation. As a result, what they say about an epidemiologic study by way of critique, if they have not reviewed the raw data, is often somewhat conjectural. “The results of this study may have been influenced by selection bias because the controls were selected from this particular population rather than a population similar to the cases.”
Given the small pool of forensic epidemiologists, physicians without special training in epidemiology may be asked to testify about the significance of epidemiologic studies relevant to the issue of general causation.15 Usually the physician merely describes rather than critiques the epidemiologic evidence. Yet, whatever expert is proffered to testify, she should be able to analyze the statistical models or techniques used in those studies. She should be able to say whether or not they are appropriate.16 That is a litmus test. For if the so-called expert cannot do that, she cannot assess whether or not sampling error accounts for the association.
III. Exposure
1. Consensus Definition of Exposure
The term “exposure” refers to that situation in which people come into contact with something potentially harmful.17 These potentially harmful things can range from electromagnetic radiation to a variety of biologics, such as viruses and bacteria, and to various drugs and devices. Exposures of historical interest in the context of litigation have included injection of a flu vaccine; ingestion of L-tryptophan manufactured through recombinant DNA; use of tampons; ingestion of Bendectin; implantation of silicone breast implants; inhalation of tobacco smoke; and inhalation of asbestos fibers.
At times, an exposure may be described at the level of cells or even molecules. This level of description would provide a degree of precision most helpful in explaining how biologically plausible a particular theory of causation is. But it’s not required for purposes of epidemiology. For that purpose, what’s required is a phenomenological description of the exposure. For instance, did this insomniac ingest L-tryptophan manufactured by this company; or did this shipyard worker inhale asbestos in this ship thirty years ago; or did this pregnant woman ingest Bendectin to palliate symptoms of morning sickness? Even though this description need be only phenomenological, it should be as precise as possible to ensure uniformity in identifying exposures.18 For instance, in epidemiologic studies considering the association between use of an intrauterine contraceptive device (IUD) and pelvic inflammatory disease, one study defined exposure as use of an IUD within one month of hospital admission while another study used three months.19 This problem can be compounded if, within the same study, different definitions of exposure are applied to members of the samples in the study. Precision also enables other epidemiologists to replicate the study. It further prevents an epidemiologist, during analysis of the data, from revising the definition of exposure in a way that partitions the data most favorably for the research hypothesis.20
If consensus is not reached and the various studies each apply a different measure of exposure or effect, the results could be said not to have been replicated from study to study.
2. Measures of Exposure
Ideally, the epidemiologist should specify precisely what variables are to be measured to establish an exposure. Exposure can be measured on a variety of variables, such as (1) intensity or dose of exposure; (2) duration or frequency of exposure; (3) route of administration of the putative toxin; and (4) timing of exposure.21
Information about these measurable variables can be obtained from a variety of sources. An epidemiologist may directly observe an exposure or she may obtain information about an exposure through interviews or questionnaires either with the person exposed or with witnesses.22 For instance, an epidemiologist may ask a Vietnam veteran whether he saw one or more of his comrades being killed in order to assess posttraumatic stress disorder. The epidemiologist may also review records to obtain information about exposure. For instance, she may review medical records to determine whether or not a patient received silicone breast implants. Or she may review prescription records to determine whether a woman bought Bendectin.
Whatever methods are adopted to assess exposure, the epidemiologist should attempt to minimize errors of measurement by obtaining information on exposure from more than one source.23 As Sir Austin Bradford Hill remarked, “one must go seek more facts, paying less attention to techniques of handling the data and far more to the … methods of obtaining them.”24
For the defense, ascertaining the method of assessing exposure is vital. What an epidemiologist asserts about an exposure must be valid. It must be in fact what it is purported to be. So always ask: what have the epidemiologists conducting the study done to ensure validity?25 For example, if the method of ascertaining exposure is a questionnaire, obtain a copy and review it for the variety of problems that might impair its effectiveness and objectivity. Does it contain ambiguous questions or leading questions or restricted categories of possible responses? Odds are, the questionnaire has significant flaws, often serious enough to compromise the validity of the results of the epidemiologic study.
Data on exposures are usually obtained from the following:
- Face to face interviews;
- Existing records;
- Self-administered questionnaires;
- Telephone interviews; and
- Databases.
3. Biological Markers of Exposure
At times, biological markers are used to identify exposure. These markers are called “exposure markers.”26 For instance, when the ambient exposure fluctuates over time, measurement of a more stable biological marker will provide a more reliable measure of exposure. For example, measuring levels of fasting plasma glucose to estimate hyperglycemia is unreliable because those levels vary daily; however, that variability is avoided by measuring instead nonenzymatically glycosylated hemoglobin, an index of glycemia stable over several weeks.27
Even so, this method of identification has potential for error. This is so simply because tests for biological markers rarely predict perfectly whether someone has the marker. These tests for biomarkers are often merely economical or efficient surrogates for a gold standard test. As a result, they have a certain “sensitivity” (defined as the true positive rate) and “specificity” (defined as the true negative rate). Obviously, the more sensitive the test, the more reliably it will identify the marker of exposure. But to say that a test with, say, 95% sensitivity is positive is to say only that, assuming this patient has the target marker, this test will be truly positive 95 times in a hundred tests. What the positive test does not do is identify this patient as having a 95% probability of having the target marker.
What is wanted, given a positive test, is the probability that the patient has the target marker. This is called the “predictive value” of the test. To determine the predictive value, the epidemiologist needs to know the prevalence of the marker in the relevant population.
PPV = (Prevalence × True Positive Rate) / [(Prevalence × True Positive Rate) + ((1 − Prevalence) × False Positive Rate)]
For example, if the prevalence of the marker were 1%, the true positive rate 95%, and the false positive rate 5%, the predictive value would be about 16%. That is, if the patient has a positive test, he has roughly a 16%, not a 95%, probability of having the target marker.
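A minimal sketch of this computation, using the same assumed numbers as the example above:

```python
# Positive predictive value (PPV) from prevalence, sensitivity (true
# positive rate), and false positive rate, per the formula above.
def ppv(prevalence: float, sensitivity: float, false_positive_rate: float) -> float:
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# 1% prevalence, 95% true positive rate, 5% false positive rate:
print(round(ppv(0.01, 0.95, 0.05), 3))  # 0.161 -- about 16%, not 95%
```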
So when an exposure has been measured through a biological marker, the defense needs to become aware of the sensitivity and specificity of the test, of the gold standard used to establish sensitivity and specificity, and of the prevalence of the marker in the relevant population. Knowing the gold standard test is important because it provides the most accurate measurement of exposure. Unless the gold standard test is prohibitively expensive, the epidemiologist should explain why the less accurate surrogate test was used in its place.
Misclassification of exposure will tend to make more people test positive for the marker than truly have it, with the result that more people are classified as having been exposed. This is a tendency the defense will want to limit. To limit this potential error, the epidemiologist should ensure that such biological markers fulfill the following criteria: (1) be specific for a given exposure; (2) persist or degrade over time in a way preserving the order of cumulative exposures; (3) be detectable with accurate and reliable assays; and (4) have a small ratio of intra-subject to inter-subject variation.44
4. Misclassification Bias
If what counts as an exposure is not defined carefully and in a way permitting reasonably reliable identification, then what is apt to result is misclassification of what constitutes exposure and non-exposure. That is, a true exposure will be classified as a non-exposure and a true non-exposure will be classified as an exposure.
Of these possible kinds of misclassification, the kind most likely to occur outside litigation is that in which true exposures are classified as non-exposures. But in litigation the reverse is apt to occur, because litigants will lie about the nature of their experience in order to be classified as exposed. In either event, the rate of misclassification differs between cases and controls; in the litigation example, for instance, cases are more often misclassified as exposed than controls are. For the defense, this is a dangerous phenomenon. It is known as “differential misclassification.”45
The plaintiff, of course, will have the effect. When the effect is not a signature disorder, proof of exposure becomes critically important. Yet the plaintiff will often have difficulty proving exposure, and when the stakes in proving it are great, the plaintiff can be expected to lie, testifying to an exposure.
5. Case Control Studies
In case-control studies, controls, when selected, should be classified as exposed or unexposed. Cases should be classified at the time of diagnosis. The processes for identifying the rate of exposure for cases should be comparable to those for identifying the rate of exposure for controls.46 This is vital. For instance, if the subjects are being interviewed, the interviewer should be blinded to disease status. The interviewer should also use a “structured” interview, asking each subject exactly the same question in the same manner. If a record is being reviewed, the information should be equally likely to have been recorded for both cases and controls. Questionnaires are very often used to obtain information about exposures. Yet this method of gathering information is somewhat imprecise. As a result, the epidemiologist should conduct sub-studies to validate the self-reports of those responding to the questionnaire.
Accurately identifying the exposure is critical to the validity of these studies. The major threat to validity is differential misclassification of exposure. That kind of misclassification is an important problem for the defense if cases are classified as exposed when, in fact, they are not exposed. When members of the study groups are in litigation, obtaining information about exposure through interview or questionnaire is fraught with a potential for differential misclassification of exposure. Yet, in litigation, many investigators approach subjects in the study naively, as though the subjects had no motive to malinger. Anyone in litigation has at least a monetary interest in being classified as someone exposed. So when an epidemiologist seems unreasonably naïve, her motives should be questioned. For instance, when the study base is small, epidemiologists could obtain more precise information through structured interviews and clinical examinations. If, instead, they obtain information through less precise methods such as questionnaires, ask them what justifies use of the less precise method.
In case-control studies, identifying an exposure often occurs not through interviews with or questionnaires to the subjects but through a review of the subjects’ medical records. This process can result in differential misclassification of exposure, biasing the results of the study in favor of the defendant. For example, in the silicone breast implant cases, a plaintiffs’ epidemiologist argued that, in an important case-control study favoring defendants, exposed women might be misclassified as unexposed. In that study, determination of exposure was made strictly by reviewing Mayo Clinic medical records. But women with records there may have had their SBIs implanted at other hospitals without that fact being recorded in the Mayo Clinic records. As a result, women who had medical problems and SBIs may have been misclassified as not having SBIs, thereby underestimating their risk from SBIs.
IV. Effect
1. Identify and Define the Effect
Identifying accurately the effect of an exposure is critical. Often the effect will be easily identified. For instance, if the exposure is alleged to cause a form of cancer, that form of cancer can be identified by trained pathologists. Yet, at other times, the effect can be identified only provisionally. This typically occurs when the effect is a syndrome constituted of unspecified subsets of a vast array of non-specific symptoms. Then the defense should be very much concerned with ensuring that the diagnostic criteria for such a syndrome be adequately defined before the epidemiologist begins the study.31
To adequately define the diagnostic criteria, qualified clinicians must identify a cluster of related signs and symptoms with a characteristic evolution indicative of pathology. To begin, they will consider plaintiffs’ alleged complaints. Usually plaintiffs allege a variety of complaints. Invariably, these complaints are subjective and “non-specific.” That is, they are symptoms which may result from any number of ordinary and extraordinary causes.
Plaintiffs typically implicate scores of symptoms from abasia to zoster. Implicating so many symptoms is troublesome in that it proves too much. With that many symptoms, trillions of combinations of symptoms exist, trillions of varied manifestations of a so-called disorder such that every one of us could have it as long as we were exposed. If n represents the total number of signs and symptoms, the number of possible subsets is 2^n; backing out the empty subset leaves 2^n − 1 possible combinations of signs and symptoms. Obviously, if the only thing that distinguishes those with the alleged disorder from those without it is having the exposure, no effect has been identified in a cause and effect relationship; all that has been identified is a potential cause.
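To give a sense of scale, a minimal sketch of how quickly 2^n − 1 grows (the symptom counts are hypothetical):

```python
# Number of non-empty symptom combinations for n candidate symptoms
# is 2**n - 1. Even a modest symptom list yields trillions of possible
# manifestations of a so-called disorder.
for n in (10, 20, 40):
    print(n, 2**n - 1)
# 10 -> 1,023; 20 -> 1,048,575; 40 -> 1,099,511,627,775 (about a trillion)
```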
To identify a unique pathology, plaintiffs need to do some more analysis. First, they should analyze all these symptoms and identify which, if any, tend to occur together. “There are statistical means,” said a prominent plaintiffs’ epidemiologic expert, “to look for clusters of symptoms and patterns in symptoms.” If these symptoms occur together in some pattern, they are events which are “statistically dependent.” That is, when a symptom occurs, its occurrence predicts, by a value greater than chance, the presence of another symptom and another, and so on. The extent to which a symptom predicts another can be determined through “correlation studies.” These studies will generate a quantitative measure of the strength of that relationship called the “correlation coefficient.” Of course, that these variables correlate does not necessarily mean they are caused by the exposure. They could be caused by other events each plaintiff has in common. Not surprisingly, plaintiffs’ experts will probably not have conducted correlation analyses of the signs and symptoms allegedly associated with the exposure. Indeed, until forced by a court to do so, plaintiffs’ experts will probably not even attempt to fashion a hypothesis about what that alleged disorder is, let alone test it.
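By way of illustration only, a minimal sketch of such a correlation analysis; the symptom names and subject data are invented:

```python
# Pairwise correlation coefficients across reported symptoms. For
# binary (present/absent) symptoms, Pearson's r computed on 0/1 data
# is the phi coefficient.
import pandas as pd

symptoms = pd.DataFrame({
    "fatigue":    [1, 1, 0, 1, 0, 1, 1, 0],
    "joint_pain": [1, 1, 0, 1, 0, 1, 0, 0],
    "rash":       [0, 1, 1, 0, 1, 0, 0, 1],
})
print(symptoms.corr())  # symptoms that cluster will correlate strongly
```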
Some plaintiffs’ experts may concoct a hypothesis about the diagnostic criteria of the alleged disorder. But even this hypothesis is apt to be ignored by many plaintiffs’ experts; they will continue as though no hypothesis exists. So, as a practical matter, each expert has concocted his own private set of criteria. (This is reminiscent of Wittgenstein’s remark about a silly method of verification: “Imagine someone saying, ‘But I know how tall I am!’ and laying his hand on top of his head to prove it.”)32 As a result, this alleged “disorder” always will be an “essentially contested concept” like the concepts “social justice,” “the good life,” and “the most beautiful dog.”33 All are “soft end points” – notions that lack necessary and sufficient criteria or that cannot otherwise be defined by reference to facts about the world.
Issues about case definition include the following:
- Does the health condition exist along a continuum of severity?
- Is the disease label a catchall for a variety of conditions, possibly of different origins?
- Is there disagreement over the criteria for diagnosing the disease?
- Does the disease develop over a short or a long period of time?
- Are subjective or objective criteria used to diagnose the disease?
2. Diagnostic Reliability and Validity
Reliable and valid diagnosis of the effect or disorder is key to a valid epidemiologic study.34 Yet achieving that is notoriously difficult. First, the process of diagnosis is not particularly reliable. The same diagnostician may diagnose the same phenomena correctly one day, but incorrectly the next. Or different diagnosticians may diagnose the same phenomena differently the same day or the next day. Indeed, diagnosticians often do disagree about all aspects of the process of diagnosis: They disagree about the same patient’s history, the physical findings on examination, the interpretation of diagnostic tests and the diagnosis.35
For example, when two experienced surgeons, using the same diagnostic criteria, independently interviewed the same group of patients who had had operations for peptic ulcer, they agreed on whether the operation had been successful in less than two-thirds of the cases. In another instance, when three cardiologists interviewed the same 57 men with chest pain, at least one clinician judged 54% to have angina pectoris. Yet all three cardiologists agreed about the history in only 75% of cases, and when one cardiologist concluded that a given patient had angina pectoris, the other two agreed with him only 55% of the time.36
For a diagnosis to be reliable, the same diagnostician should reach the same diagnosis for the same clinical phenomena at different times (intra-diagnostician reliability), and different diagnosticians should reach the same diagnosis for the same clinical phenomena (inter-diagnostician reliability). Yet, obviously, this is merely a utopian dream, an ideal. The more clinical phenomena there are to assess, the more likely a diagnostician will evaluate those phenomena differently from one day to the next. The more diagnosticians involved in assessing a single set of clinical phenomena, the more likely each will reach a different diagnosis. Reliability could be increased if the number of diagnosticians were limited. Yet even then, if a limited set of diagnosticians were hired, say, by plaintiffs’ attorneys, and if they agreed upon the diagnostic criteria to apply to the clinical phenomena, reliability would increase but so would the risk of bias. Then, although the reliability of the diagnoses may increase, the validity of those diagnoses may be altogether absent.
Second, apart from the reliability of the diagnosis, the diagnostic criteria used to identify a disorder may not be particularly valid, even though all the diagnosticians agree on the diagnosis. Establishing the validity of diagnostic criteria is complicated.37 Strategies available for establishing the validity of a clinical syndrome include the following: (1) identification and description of the syndrome either by clinical intuition or by cluster analysis; (2) demonstration of boundaries between related syndromes by techniques such as discriminant function analysis and latent class analysis; (3) follow-up studies establishing a distinctive course or outcome; and (4) therapeutic trials establishing a distinctive response to treatment.38
Often, no reliable gold standard exists to verify the classification criteria. For example, no reliable gold standard validates the classification of certain rheumatic diseases owing to the lack of a set of unique diagnostic findings. In that event, diagnostic criteria would have to be developed with prospective studies and through methods using statistics and consensus (for example, the Delphi method or the expert panel). This process often involves the following steps: (1) a committee of experts isolates a set of historical, physical and laboratory features for further consideration; (2) the committee then determines the sensitivity and specificity of these features by the Delphi method, a technique designed to use consensus of expert opinion in situations of uncertainty by assuring anonymity, feedback, and iteration; (3) the committee then conducts a prospective study enrolling patients diagnosed with the target disorder and a comparison or control group with “confusable” signs and symptoms.
Initially, the committee uses a set of diagnostic criteria greater than those assessed through the Delphi method. This enlarged set of criteria is used by clinicians, blinded to the status of the case or control group, to diagnose the target disorder. A criterion is included in subsequent analyses if it discriminates between the target disorder and the comparison group. (Alas, the gold standard is the clinical diagnosis of the target disorder.) Using this approach, the committee narrows the enlarged set of criteria to be further considered as important in the diagnosis of the target disorder; (4) the committee then develops classification criteria using two different statistical techniques: (a) stepwise logistic regression and Boolean algebra and (b) classification trees or recursive partitioning.39
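A toy sketch of step (4)(b), recursive partitioning, assuming the scikit-learn library; the candidate criteria and diagnoses below are invented for illustration:

```python
# Deriving classification criteria by recursive partitioning: a
# decision tree learns which candidate criteria discriminate the
# target disorder from the "confusable" comparison group.
from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [morning_stiffness, swollen_joints, positive_serology]
X = [[1, 3, 1], [1, 5, 1], [0, 1, 0], [0, 0, 0], [1, 4, 0], [0, 2, 1]]
y = [1, 1, 0, 0, 1, 0]  # 1 = target disorder, 0 = comparison group

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=[
    "morning_stiffness", "swollen_joints", "positive_serology"]))
```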
When the classification criteria are established, they will reflect a trade-off between stringency and laxity. For the defense, where the line is drawn is important. Stringent criteria are apt to misclassify some mild cases as normals. This is what the defense would want. Less stringent criteria may misclassify some normals as diseased. This is what plaintiffs want.
3. Timing of Diagnosis
If, after an exposure, the subject in a study group does not manifest symptoms of the disorder for some time, the process of identifying cases may be imprecise and result in misclassifying cases as non-cases. (Generally, this is not a problem for the defense.) As a result, an epidemiologic study should consider this possible bias by considering the “induction period” and the “latent period” of the disorder. The “induction period” is the period of time from causal action until initiation of the disease. The “latent period” is the interval of time between occurrence of the disease and detection of the disease.40
If the latency period is long, then early case-control studies will misclassify cases as controls and the results of the study will be biased, demonstrating falsely that the exposure is not associated with the effect. Sometimes, the latency period can be reduced by improved methods of detecting disease. As a result, if many diagnosticians are involved in identifying cases, some with more sophisticated ways to detect disease than others, the result is an uneven identification of cases, and that too can result in bias.
4. Misclassification Bias
If what counts as an effect is not defined carefully and in a way permitting reasonably valid and reliable identification, then what is apt to occur is misclassification of what constitutes an effect.41 That is, a true case will be classified as a non-case and a true non-case will be classified as a case. In the context of litigation, the likelihood is that a true non-case will be classified as a case. This is termed “differential misclassification.” For example, when a physician is uncertain about which of several possible diagnoses is appropriate for a patient, she may select that diagnosis widely publicized as having resulted from a notorious exposure. She may do so simply because of the siren song of the publicity and not because of sound clinical inference.42 For the defense, this is a problem. Simply, this bias will result in a finding that the effect is associated with the exposure. Indeed, if the effect is rare, misclassifying non-cases as cases will increase the strength of that spurious association significantly.
5. Prevalence of Effect
Generally, the prevalence of an effect is defined as the ratio of (1) the number of those manifesting the effect in a specified population at a specified time to (2) the population at that specified time.43 Two species of prevalence are “point prevalence” and “period prevalence.” “Point prevalence” refers to a static picture or snapshot of the number of persons who have the disease in a population at one point in time.44 “Period prevalence” refers to the number of people who have the disease in a population during a specified period of time; it is customary to use the actual or estimated size of the population at the midpoint of the period. (Period prevalence is now infrequently used.)45
Point Prevalence = Number of Existing Cases / Total Population
Period Prevalence = All Cases Existing During the Period / Total Population at Midpoint of Period
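A minimal sketch of these two measures, with hypothetical counts:

```python
# Point and period prevalence, as defined above.
existing_cases_at_t = 150        # cases existing at one point in time
population_at_t = 100_000

cases_during_period = 240        # all cases existing during the period
population_at_midpoint = 102_000

print(existing_cases_at_t / population_at_t)                   # 0.0015
print(round(cases_during_period / population_at_midpoint, 5))  # 0.00235
```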
6. Incidence of Effect
The “incidence” of an effect is an important concept in cohort studies. For it is the variable needed to compute measures of association. The incidence of an effect is defined broadly as the number of new instances of the effect occurring during a specified period of time in a specified population.46 Incidence can be measured in four basic ways:
Incidence as a Count of Events. Incidence may be measured as merely the number of newly observed events. If n represents the number of newly observed events, the “incidence” is n.
Incidence as Events per Unit of Time. Incidence may be measured as the ratio of the number of events to the time of observation for those events. If n represents the number of observed events, and t represents the period of time for observation, then the incidence is the quotient of n divided by t.
Incidence as Events per Unit of Amount of Observation. Incidence may be measured as the ratio of the number of events observed to the time of observation for those events and the number of observed people in the population. If n represents the number of observed events, t represents the period of time for observation, and e represents the number of observed people in the population, then the incidence is the quotient of n divided by the product of e and t. Because this is a measure of the density of events occurring during the observation period, it is often referred to as the “incidence density.”
Incidence as a Probability. Incidence may be measured as the ratio of the number of events in a given period to the number of observed people at risk in the population. This is the probability of the event occurring, over a given interval, to a member of the observed population at risk. It’s known as the “cumulative incidence rate.” If n represents the number of events in a given period and e represents the number of observed people at risk in the population, then the incidence is the quotient of n divided by e.
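A minimal sketch of the four measures, with hypothetical numbers (n events among e people at risk observed for t years):

```python
# The four measures of incidence described above.
n, e, t = 12, 1_000, 2.0

count = n                         # incidence as a count of events
events_per_time = n / t           # events per unit of time
incidence_density = n / (e * t)   # events per unit of person-time at risk
cumulative_incidence = n / e      # probability of the event over the period

print(count, events_per_time, incidence_density, cumulative_incidence)
# 12 6.0 0.006 0.012
```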
V. Association between Exposure and Effect
1. Association Defined.
An “association” is defined as a relation between an exposure and an effect such that the exposure and effect occur together more (or less) frequently than they would strictly by chance. For example, in 1981, in the Women’s Health Study, a relative risk of 1.6 (95% CI 1.4 – 1.9) was reported for users of intrauterine devices (IUD) compared to non-IUD users for pelvic inflammatory disease (PID). This finding is a report of an association, beyond that due to chance, between use of IUDs and PID.47
That an exposure is statistically associated with an effect is a necessary, but not a sufficient condition for inferring that the exposure causes the effect. For example, in 1994, an epidemiologic study reported a relative risk of 1.38 for breast cancer from exposure to electromagnetic fields. But, today, few would conclude from this association that electromagnetic fields cause breast cancer.
2. Measures of Association
Epidemiologists have two basic measures of association: quotients and differences.
Quotients
Measures of association expressed as a quotient are the relative risk (“rate ratio” and “risk ratio”) and the “odds ratio.”48 The relative risk is used as a measure of association in cohort studies.49 The odds ratio is used as a measure of association in both cohort and case-control studies.50
In epidemiologic studies, the data needed to measure an association are often represented in a 2 by 2 (or fourfold) table. A 2 by 2 table consists of two columns representing the presence or absence of the effect (disease or disorder) and two rows representing the presence or absence of the exposure.
              Diseased               Healthy
Exposed          A                      B           Total Exposed: A + B
Unexposed        C                      D           Total Unexposed: C + D
              Total Diseased: A + C    Total Healthy: B + D
Relative Risk (Risk Ratio and Rate Ratio)
The relative risk is defined as the ratio of two incidence rates. An “incidence rate” is the expected proportion of a fixed population-at-risk that develops the disease over some specified period. So the rate ratio is the ratio of the incidence of the effect in those exposed to the incidence of the effect in those unexposed.
Relative Risk = Incidence of effect in exposed / Incidence of effect in unexposed
When “incidence density rates” are compared, the quotient is referred to as the “rate ratio.” When “cumulative incidence rates” are compared, the quotient is referred to as the “risk ratio.” Both the rate ratio and the risk ratio are referred to as the relative risk.51 The domain of values for the variables in this formula will result in a quotient equal to 1 or greater than 1 or less than 1. If the quotient is equal to 1, the exposure is not associated with the effect. If the quotient is greater than 1, then the exposure is positively associated with the effect. If the quotient is less than 1, then the exposure is negatively associated with the effect.
Relative risk is distinguished from “absolute risk.” “Absolute risk” is the incidence of disease in a population.52 Measures of absolute risk include, for instance, incidence rates and prevalence. The absolute risk for the exposed is A/[A+B]. The absolute risk for the unexposed is C/[C+D]. The relative risk is the ratio of these two risks, and is expressed as follows:
RR = [A/(A+B)] / [C/(C+D)]
Odds Ratio
The odds ratio is the ratio of the odds of the effect in the exposed to the odds of the effect in the unexposed.
The odds of the effect in the exposed is as follows:
A/B = [A/(A+B)] / [B/(A+B)]
The odds of the effect in the unexposed is as follows:
C/D = [C/(C+D)] / [D/(C+D)]
So the ratio of these two odds (the odds ratio) is expressed as follows:
OR = (A/B) / (C/D)
The odds ratio often overstates the relative risk, especially if the effect is common.53 In case-control studies, subjects are sampled conditioned on whether or not they have the effect. As a result, in case-control studies, one cannot obtain a direct estimate of relative risk. Yet sometimes the odds ratio does adequately approximate the relative risk. This occurs if the cases are “incident” cases and the controls are concurrently selected from the same study group.54 More specifically, it occurs: (1) when the cases are representative as to history of exposure of all with the effect in the population from which the cases were drawn; (2) when the controls are representative as to history of exposure of all without the effect in the population from which the cases were drawn; and (3) when the disease or effect does not occur frequently; however, if people who become cases over the risk period are also eligible for inclusion in the control group, the rare-disease assumption can be discarded.71
When those conditions exist, the relative risk will approximate the odds ratio as A and C approach 0.
- Differences
The basic measure of association expressed as a difference is the attributable proportion of risk (or attributable risk).72 The attributable risk is the incidence rate among those exposed less the incidence rate among those not exposed:
A/[A+B] – C/[C+D].
The attributable risk indicates the amount of the total risk attributable to the exposure. For example, if the attributable risk were 0.60 for myocardial infarction from being a defense attorney, then being a defense attorney is the exposure or risk which accounts for 0.60 of the total risk for having a myocardial infarction.
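A minimal sketch computing the quotient and difference measures from hypothetical 2 by 2 counts (A, B, C, D as in the table above):

```python
# Measures of association from a 2x2 table: A exposed-diseased,
# B exposed-healthy, C unexposed-diseased, D unexposed-healthy.
A, B, C, D = 30, 970, 10, 990

risk_exposed = A / (A + B)        # absolute risk in the exposed
risk_unexposed = C / (C + D)      # absolute risk in the unexposed

relative_risk = risk_exposed / risk_unexposed
odds_ratio = (A / B) / (C / D)    # equivalently (A * D) / (B * C)
attributable_risk = risk_exposed - risk_unexposed

print(round(relative_risk, 2))      # 3.0
print(round(odds_ratio, 2))         # 3.06 -- near the RR, since the effect is rare
print(round(attributable_risk, 2))  # 0.02
```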
3. Strength of Association
When the quotient of a ratio measure of association is greater than 1, the exposure is positively associated with the effect. But because the range of values for positive associations runs from just greater than 1 to infinity, not all positive associations are the same. As a rule, the greater the value of the quotient, the stronger the association. The stronger the association, the less likely it is due to confounding variables.57 And the more persuasive it is as a necessary, but not sufficient, indicium of general causation.58 Unfortunately, no reliable way exists of identifying a dividing line between significant and insignificant associations. Some epidemiologists would offer the following guidelines:
0.9-1.1 No effect
1.2-1.6 Weak hazard
1.7-2.5 Moderate hazard
> 2.6 Strong hazard
Some would require a quotient of 2 or more to credit an association as an indicium of causation,59 and some courts have imposed the same requirement. Yet many epidemiologists would suggest that, as a rule, only a quotient of three or four is significant.60
[Timeline: the study period runs from start to finish across j intervals, t0, t1, t2, t3, … tj−1, tj.]
Under incidence-density sampling, one or more controls are selected for each case from those still at risk at the time of the case’s onset. Under cumulative-incidence sampling, controls are selected from those still unaffected at time t1, while cases are selected from those developing the disease in the same population over the period t0 to t1.
“Incidence-density ratio” is the ratio of two incidence densities. The “incidence density” is the expected number of new cases per unit of person-time at risk.
“Risk ratio” is the ratio of two cumulative incidence rates. A “cumulative incidence rate” is the expected proportion of a fixed population-at-risk that develops the disease over some specified period.
Under incidence-density sampling, as opposed to under cumulative-incidence sampling, the odds ratio is generally a better approximation to the risk ratio.
Incidence-density sampling is a more appropriate model in chronic disease epidemiology.
VI. Systematic Error or Bias
1. Systematic Error Defined
Systematic error or bias is defined generally as “any process at any stage of inference which tends to produce results that differ systematically from the truth.”61 When error occurs randomly, it is called “measurement error”; when it occurs non-randomly, that is, systematically, it is called “systematic error.”
The “validity” of an epidemiologic study is determined by the degree to which it lacks systematic error. As a result, the dogged hunt for systematic error is the crux of evaluating epidemiologic studies. For the defense, this is particularly important. In litigation, plaintiffs will often support their conclusions about causation with the results of case-control studies, a form of epidemiologic study highly susceptible to systematic error. These are the studies the defense will be vetting for systematic errors that result in a positive association. Errors to find are those which overstate the numerator and understate the denominator in the odds ratio.
Unfortunately, many kinds of systematic error may infect an epidemiologic study. And, often, merely reading the study will not reveal these biases. For instance, bias can occur when the epidemiologist reads up on the subject of interest; or specifies and selects the study sample; or executes the study; or measures exposures or effects; or analyzes the data; or interprets the analysis. To identify these biases, the defense will need the raw data of the study and will need to depose, whenever possible, the epidemiologist who conducted the study. Always remember: in epidemiology, the devil’s in the details.
2. Confounding
Systematic bias, besides being distinguishable from measurement error, is also distinguishable from “confounding.” A confounder is a variable causally related to the effect and associated with, but not a consequence of, the exposure. So to be a confounder, a factor must satisfy these three criteria: (1) it must be a risk factor for the effect; (2) it must be associated with the exposure; and (3) it must not be affected by the exposure or the effect.62 For instance, suppose that women infected with Lyme disease at younger ages tend to have longer incubation times. Suppose also that the women in the study groups are, on average, younger than women in the general population. These members of the study groups will, on average, have longer incubation times than members of the general population. In that event, age is a confounder for estimating risk in the general population, and confounding by age would result in underestimation of the proportion of women in the general population who will develop Lyme disease.
Following are three phenomena that may be confused with confounding: “intervening variables,” “effect modification” and “interaction.”
- Intervening Variables
Intervening or intermediate variables are defined as variables that lie on the causal pathway from the exposure to the ultimate effect.63 These intervening variables are not confounders.
- Effect Modification
“Effect modification” (statistical interaction) is different from “confounding” and occurs when the magnitude of the rate or odds ratio varies owing to the value of a third variable.64 For example, suppose that eating ten donuts a day elevated the myocardial infarction rate of men in the study by a factor of 1.70, but elevated the myocardial infarction rate of women by a factor of only 1.20. Gender modified the rate ratio. So gender is the effect modifier.
- Interaction
Interaction is also different from confounding, and is defined as that situation in which the incidence rate of disease resulting from two or more risk factors differs from the incidence rate expected to result from those risk factors individually.65 The exposure is associated with an effect, but the effect is lesser or greater depending on the interaction of the variables.
3. Major Types of Bias
The major types of bias are selection bias, information bias, and uncontrolled confounding.66
- Selection Bias
Selection bias can occur when the samples are selected for the epidemiologic study. When the procedure for selecting the samples is not random, the result is apt to be selection bias. Selection bias results in an observed relation between exposure and effect that is different among those in the study from among those who would have been eligible but were not chosen for the study.67 For instance, suppose an epidemiologist wanted to determine the proportion of people in the general population who enjoyed professional baseball games. To determine this, she stood at the turnstile of a baseball park and asked every other person through the turnstile whether or not they enjoyed professional baseball games. Most would say they enjoyed them. But they are not likely representative of the general population.
Be careful to distinguish selection bias from information bias. In a case-control study, selection bias is a product of the process of selecting cases and controls. Information bias is the product, once the relevant samples are selected, of determining who was exposed before onset of the effect. In cohort studies, selection bias is the product of the process of selecting those who were exposed and those who were unexposed. Information bias is the product, once the relevant samples are selected, of determining who developed the effect during the period of follow up. Selection bias results if, for example, the control group was selected in such a way that it would have proportionately fewer members exposed than a sample of controls drawn randomly from the population. Selection bias results if, for example, the case group was selected in such a way that it would have proportionately more members exposed than a sample of cases drawn randomly from the population. For example, selection bias will occur when the exposure in question becomes a defining characteristic of the effect. For instance, in the silicone breast implant litigation, plaintiffs and their experts fashioned a definition of the effect–“silicone breast implant disease”–which included the criteria of having silicone breast implants.
- “Berkson’s Bias”
Berkson’s bias, a species of selection bias, occurs when a group of cases drawn from a hospital population has a greater probability of having more than one medical problem than a group drawn from the general population.68 If the epidemiologist selects controls from only one of the following three populations: (1) the hospital population, (2) the general population or (3) a military population, the result may be selection bias. The hospital population has the least healthy members; the military population has the most healthy members. So if the epidemiologist selects the unexposed group from the hospital population, the incidence of the effect may be disproportionately smaller.
- “Neyman’s Fallacy”
Neyman’s fallacy, another species of selection bias, occurs when “prevalent cases” are used as representative of “incident cases.” Prevalent cases, unlike incident cases, are affected by the duration of the disease which in turn is affected by treatment, cures and mortality.69
- Post Hoc Selection Bias
Post hoc selection bias results after the samples have been selected for the epidemiologic study.70 As a result of events that occur during the study, the composition of those samples is altered. For instance, members of a sample may drop out of the study or die or fail to respond to requests for information relevant to the study, or they may disclose information during the study which the epidemiologist needed to know before the study was underway.
- Ascertainment Bias
Ascertainment bias results when the epidemiologist fails to ascertain cases or controls who were exposed at the same rate as would have occurred had the cases or controls been selected randomly from the population.71 For instance, ascertainment bias occurs if the epidemiologist selects the cases from the clinic of a physician to whom plaintiffs’ counsel have referred all their prescreened clients with symptoms of fatigue and silicone breast implants, but then selects the controls from a clinic in a Shaker community.
- Diagnostic Bias
Diagnostic bias occurs when, in selecting cases for a case-control study, the clinician fails to diagnose cases at the same rate as would have occurred had true cases been randomly selected from the study population.72 Watch for the physician who cannot evaluate whether or not the patient has the effect when blinded to whether or not the patient was exposed. In that event, the result is apt to be over-diagnosis of that effect putatively associated with the exposure, that is, misclassifying a non-case as a case.
- Response Bias
Response bias occurs when those who agree to participate are different from those who decline to participate on the characteristic of exposure or effect.73 For instance, when those asked to be controls frequently decline to participate, the result may be selection bias if they have a greater prevalence of the effect than those who agree to participate. This kind of selection bias will harm the defendant. Simply, it will increase the strength of an association between exposure and effect.
- Information Bias
Once the appropriate samples have been selected, the epidemiologist must obtain information from the members of these samples to determine, for example, whether, in case-control studies, a member of the sample is exposed or unexposed or whether, in cohort studies, a member has the effect or does not have the effect.90 Errors in obtaining that kind of information result in misclassifications, and are called “misclassification errors.” Misclassification errors are of two types: differential and non-differential.
- Differential Misclassification
Differential misclassification occurs when the subjects who fall into one category are misclassified at a rate greater than are the subjects who fall into another category.75 That is, in case-control studies, cases are misclassified as exposed at higher rates than controls are misclassified as exposed. And in prospective cohort studies, the exposed are misclassified as cases at higher rates than the unexposed are misclassified as cases. For example, in a prospective cohort study, the effect was under-diagnosed in the unexposed group because, relative to the exposed group, members of that group visited physicians less often.
* Differential Misclassification of Exposure: This kind of misclassification results when cases and controls are misclassified on exposure at different rates. For instance, in a case-control study, epidemiologists determined exposure to silicone breast implants (SBIs) by reviewing medical records from a particular clinic. Yet, arguably, women may have had SBIs without that fact having been recorded in that clinic’s medical records. As a result, women with medical problems and SBIs may have been misclassified as not having SBIs, thereby underestimating the odds ratio.
* Differential Misclassification of Effect: This kind of misclassification results, for example, when exposed people are misclassified as having the disorder more often than unexposed people are. For instance, women with silicone breast implants (SBIs) were referred for diagnosis to physicians hired at considerable expense to testify for plaintiffs because these physicians believed a priori that SBIs caused autoimmune disease. These physicians diagnosed all women with SBIs with “SBI disease” and, of course, diagnosed none of the women without SBIs with the disease.
- Non-differential Misclassification
Non-differential misclassification is misclassification that is random.76 As a result, if the samples are large enough, the proportions of subjects misclassified are approximately equal in all study groups. Non-differential misclassification of exposure tends to shift the relative risk or odds ratio towards 1. In studies with a measure of association much greater than 1, concern about non-differential misclassification of exposure or effect is rarely warranted. This is so because the measure of association, absent the misclassification, would be even greater, provided the misclassification probabilities apply uniformly to all subjects.
*Non-differential Misclassification of Exposure: This kind of misclassification results, for example, when some in the study fail to candidly acknowledge they were exposed. As a result, they are classified as non-exposed.
*Non-differential Misclassification of Effect: This kind of misclassification results, for example, when some in the study fail to acknowledge symptoms that would cause them to be diagnosed with the effect. As a result, they are classified as controls.
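A minimal sketch, with hypothetical counts, of why non-differential misclassification of exposure shifts the odds ratio toward 1:

```python
# Randomly flipping the same share of exposure classifications in every
# group mixes the exposed and unexposed cells and attenuates the
# association toward the null.
A, B, C, D = 30, 970, 10, 990   # true 2x2 counts; true OR is about 3.06
p = 0.20                        # misclassification probability, all groups

# Expected cell counts after random exposure misclassification:
A2 = (1 - p) * A + p * C
B2 = (1 - p) * B + p * D
C2 = (1 - p) * C + p * A
D2 = (1 - p) * D + p * B

print(round((A / B) / (C / D), 2))      # 3.06
print(round((A2 / B2) / (C2 / D2), 2))  # 1.88 -- shifted toward 1
```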
- Interviewer Bias
Interviewer bias occurs when an epidemiologist fails to gather information in the same way from the different study groups.77 In case-control studies, this bias is particularly at play in determining exposure. Simply, when the subject has the effect, that fact may prompt the interviewer to probe the subject’s past more thoroughly to find an exposure. For instance, in a case-control study, nurses reviewing the medical records knew which women had silicone breast implants (SBIs). As a result, if, on one hand, these nurses were biased against the idea that SBIs cause disease, they would look less intently for disease. If, on the other hand, they had sympathy for women with SBIs, they may have looked more intently for medical problems in the women with SBIs than in women without SBIs.
- Detection Bias
Detection bias occurs when in a cohort study, for example, those who were exposed are more likely to be examined for and diagnosed with the effect than those who were unexposed.78 For instance, women who took estrogen (the exposure of interest) might develop uterine bleeding as a result of the estrogen, and thereby visit a physician more often than those not taking estrogen. Because those taking estrogen have more gynecological examinations, they are more likely to be diagnosed with endometrial cancer (the effect of interest).
- Recall Bias
Recall bias occurs when members of the study groups recall past events with different rates of accuracy or completeness.79 In case-control studies, for example, epidemiologists need to assess whether or not the subjects were exposed. In that situation, recall bias is particularly likely when subjects are asked to recall whether or not they were exposed, especially if the subject is in litigation. This bias creates the potential for differential misclassification. For instance, if a member of the case group with the effect of fatigue was asked to recall whether she ingested an over-the-counter laxative in the last year, she may recall that she did more often than a subject in the control group without fatigue or any other worrisome symptoms.
- Reporting Bias
Reporting bias results when those in one group tend to be more or less likely to report information than those in another group.80 For instance, if those in the case group tend to report an exposure more often than those in the control group, the result may be reporting bias if the case group and the control group have the same prevalence of exposure. The different rates of reporting exposure could be due to the fact that the case group is in litigation and so has more carefully searched the past for exposure.
- Response Bias
Response bias results when those in one group tend to fail to respond to requests for significant information more often than those in another group.81 For instance, if those in the exposed group tend to fail to respond to questions on a questionnaire more often than those in the unexposed group, the result may be response bias.
- Uncontrolled Confounding
A confounder is a variable causally related to the effect and associated with the exposure in the study population, but not a consequence of the exposure.82 Confounders, when identified, are often controlled through the design of the study or through analysis of data. But when not identified, they remain uncontrolled, and become a source for a false positive association.
For example, in the polio-vaccine trials, the incidence of polio was clearly lower among unvaccinated children whose parents refused permission for injection than among children who received the placebo injection after their parents gave permission. Was the placebo injection of a purportedly harmless substance causing polio?
The answer is no: a confounder was responsible for the difference. As it turned out, families who gave permission differed from those who did not in ways related to susceptibility to polio. Here are the reasons: (1) higher-income parents would consent to an injection more often than lower-income parents, and (2) children of higher-income parents are more vulnerable to polio than children of lower-income parents. This is because polio is a disease of hygiene. Children who live in less hygienic surroundings tend to contract mild cases of polio early in childhood, while still protected by antibodies from their mothers. After being infected, they generate their own antibodies, which protect them against more severe infection later. Children who live in more hygienic surroundings do not develop these protective antibodies.
4. Methods For Controlling Bias
For a valid epidemiologic study, it is essential to control for bias and confounding. But doing so effectively is difficult. More than technical expertise is required; also required is substantial insight into how the variables being studied might interact. So each epidemiologic study is only as reliable as the level of insight possessed by the epidemiologists who conduct that study. Unfortunately, before beginning a study, epidemiologists cannot always effectively specify all the variables they should control. As a result, they usually control for biases and potential confounders not through the design of the study but through what is called “analysis” of the data.83 For instance, they may analyze the data using techniques called “stratification” and “adjustment.” But even then they may find that their analyses work only when they have additional data beyond those from the study. Those additional data are usually absent or very limited. To meet this problem, they may have to resort to less satisfactory partial analyses or forgo any analysis for errors beyond sampling errors.
- Study Design
Before a study begins, experienced epidemiologists recognize that they need to control some potential confounders, such as gender or age. Usually, they will attempt to control these variables through the “design” of the study. For instance, they may design the study using “randomization,” “restriction,” or “matching.”
- Randomization
“Randomization” is a process by which subjects are “randomly” assigned from the population to the relevant study group.84 A random sample is one drawn in such a way that each member of the total group to be sampled has an equal chance of being selected. The goal of randomization is to create samples representative of the population. For instance, in cohort studies, randomization involves randomly selecting exposed members of the population for the exposed group and unexposed members for the unexposed group. In case-control studies, randomization involves randomly selecting members who have the effect for the case group and members who do not have the effect for the control group.
Unfortunately, “randomization” has limitations. For instance, it is usually effective only in large studies. That is, the distribution of risk factors tends to become identical across the various study groups only as the size of the study groups increases.
- Restriction
“Restriction” is a technique to control for potential confounding by admitting only certain subjects into the study.85 For example, if the potential confounder is gender, only females might be admitted into the study. Unfortunately, like randomization, restriction also has limitations. For instance, it can compromise the external validity of a study. Simply, it tends to homogenize the study group. This tendency limits the ability to apply the results of the study beyond the study group to the more diversely constituted general population.
- Matching
“Matching” is the process of selecting controls in a case-control study (or unexposed subjects in a cohort study) so that they are similar to the cases (or exposed subjects in a cohort study) on characteristics considered to be potential confounders.86 Matching may be of two types: (1) group (or frequency) matching and (2) individual matching. In group matching, the controls are selected in a way that the proportion of controls with a certain characteristic is identical to the proportion of cases with that characteristic. In individual matching, a control is selected similar to the case with respect to the characteristics considered to be potential confounders. The result of matching is a series of groups of cases and controls, each group matched on potential confounders.
Matching also has limitations. First, and most importantly, the characteristics matched should be probable risk factors for the effect. Matching on characteristics beyond those produces a problem called “overmatching.”87 Overmatching can lead to an underestimate of the relative risk or odds ratio. For the defense, this result is generally not a problem. Second, in case-control studies, matching introduces a bias toward the null value (OR=1) that can be removed only through stratification. For the defense, this result is also generally not a problem.
- Analysis of Data
The analysis of raw data has several stages.88 First, the data are edited. Editing involves checking the data for accuracy, consistency and completeness. Second, the data are summarized. Summarizing usually involves pigeonholing the data using 2 by 2 tables. Third, measures of effect or association are estimated. This involves using confidence intervals and significance testing. It also involves controlling for systematic error and potential confounders, using techniques known as “stratification” and “adjustment.”
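To illustrate the second and third stages, here is a minimal sketch in Python; the counts are the benzene figures used in the worked example later in this section:

    # Summarize raw counts into a 2 by 2 table and estimate the
    # odds ratio as the cross-product ratio.
    a, b = 24, 6     # cases: exposed, not exposed
    c, d = 90, 60    # controls: exposed, not exposed

    odds_ratio = (a * d) / (b * c)
    print(f"estimated odds ratio: {odds_ratio:.2f}")   # 2.67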
- Stratification
“Stratification” is a way to assess the effects of possible confounders, to control for certain forms of selection bias, and to evaluate and describe effect-measure modification.89 Stratification involves grouping the data from subjects into sub-samples based on whether a subject has or does not have a characteristic that is a potential confounder. For example, if the epidemiologic study is attempting to ascertain the association between diet and heart attacks, and gender is considered a potential confounder, the data can be stratified on the basis of gender. So all males will be in one stratum; all females will be in another stratum. For each stratum is calculated the measure of association between diet and heart attacks. That measure of association is termed “stratum-specific.” Contrast that measure with the measure of association calculated from the unstratified data, termed the “crude” or “pooled” estimate of association.
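For a concrete picture of stratification, here is a minimal sketch in Python; all counts are hypothetical, with gender as the potential confounder:

    # Each stratum is a 2x2 table:
    # (exposed cases, unexposed cases, exposed controls, unexposed controls)
    strata = {"male": (40, 10, 20, 10), "female": (10, 20, 10, 40)}

    def odds_ratio(a, b, c, d):
        return (a * d) / (b * c)

    # Stratum-specific estimates of association
    for name, cells in strata.items():
        print(name, odds_ratio(*cells))          # 2.0 in each stratum

    # Crude ("pooled") estimate from the unstratified data
    a, b, c, d = (sum(col) for col in zip(*strata.values()))
    print("crude", round(odds_ratio(a, b, c, d), 2))   # 2.78

Here each stratum-specific odds ratio is 2.0 while the crude estimate is 2.78; the discrepancy is the signature of confounding by the stratifying variable.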
- Adjustment
“Adjustment” is a statistical procedure to minimize the effects of differences in the composition of the samples in the study.90 Adjustment usually occurs by means of “standardization” or “regression modeling.”
- Standardization
“Standardization” is a way to remove, to some extent, the effects of confounding variables, such as age or gender, when comparing two or more populations.91 For instance, suppose an epidemiologist wants to compare the risk of death in various groups of people in different geographical areas while controlling for the difference in the risk of death resulting when one group in one geographical area (Palm Springs, California) is older than those in another group in a different geographical area (Portland, Oregon). To do this, the epidemiologist standardizes the distributions of age in both Palm Springs, California and Portland, Oregon by using the distribution of age in a “standard area” instead of the actual distributions of age in Palm Springs and Portland. This process “adjusts” the rates of death in those two geographical areas by controlling the confounder of age. The adjusted rates of death are then compared, and any difference in the rates is attributed to something other than age.
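Here is a minimal sketch in Python of direct standardization; the age-specific death rates and the standard population’s age distribution are hypothetical, not actual figures for Palm Springs or Portland:

    # Direct age standardization: weight each area's age-specific
    # death rates (per 1,000) by a common "standard" age distribution.
    rates_palm_springs = {"<40": 1.0, "40-64": 5.0, "65+": 30.0}
    rates_portland     = {"<40": 1.2, "40-64": 5.5, "65+": 31.0}
    standard           = {"<40": 0.50, "40-64": 0.30, "65+": 0.20}

    def adjusted_rate(rates):
        # Use the standard population's age distribution, not the
        # area's own age distribution.
        return sum(rates[age] * weight for age, weight in standard.items())

    print(adjusted_rate(rates_palm_springs))  # 8.0 per 1,000
    print(adjusted_rate(rates_portland))      # 8.45 per 1,000

Because both areas’ rates are weighted by the same standard age distribution, any remaining difference between the adjusted rates (8.0 versus 8.45 per 1,000) cannot be attributed to age.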
- Regression Modeling
“Regression modeling” is a statistical technique used to control confounding when stratification becomes impractical.92 For instance, as the number of strata increases, the subjects in the sample may become sparsely distributed across each stratum. When this occurs, regression modeling becomes a more practical way to estimate association. In case-control studies, for example, the most popular regression model is “linear logistic regression.”93 Linear logistic regression quantifies the association between an exposure and an effect after adjusting for other potential confounders. That is, it finds an equation of those independent variables that best predicts the effect.
In the logistic model, the conditional probability that disease (D) will occur given an exposure (E) is represented by the following linear function:

logit P(D|E) = ln [P(D|E) / (1 − P(D|E))] = α + βE

where logit is an abbreviation of logarithmic unit; ln is the natural logarithm; P(D|E) is the probability of disease given exposure; α and β are the regression coefficients; and E may be dichotomized as 0 for no exposure and 1 for exposure, or expressed as a continuous variable. From the preceding equation is derived the following expression for the odds ratio:

OR = e^β

This simple model can be extended to account for potential confounders (Vi) and effect modifiers (Wj):

logit P(D|E, Vi, Wj) = α + βE + Σi γiVi + Σj δj(E × Wj)

where α and β are the regression coefficients; γi are the regression coefficients for the potential confounders and δj are the regression coefficients for the effect modifiers. (These regression coefficients may be estimated by either of two approaches: discriminant analysis or, the preferred approach, maximum likelihood estimation.) From the preceding equation is derived the following expression for the odds ratio:

OR = exp (β + Σj δjWj)

That is, the odds ratio is the exponential of the sum of the main-effect and effect-modifier terms.
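To make the model concrete, here is a minimal sketch in Python, assuming the numpy and statsmodels libraries; the data are simulated for illustration and come from no actual study:

    # Minimal sketch of linear logistic regression on simulated data.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 2000
    exposure = rng.integers(0, 2, n)        # E: 0 = unexposed, 1 = exposed
    age = rng.normal(50, 10, n)             # V: a potential confounder
    true_logit = -3.0 + 0.7 * exposure + 0.02 * age
    disease = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

    X = sm.add_constant(np.column_stack([exposure, age]))
    fit = sm.Logit(disease, X).fit(disp=0)  # maximum likelihood estimation
    print("adjusted OR for exposure:", np.exp(fit.params[1]))
    # The estimate should fall near exp(0.7), about 2.0.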
- Post-Study Reanalysis
After a study is completed, an epidemiologist (usually other than the one who originally conducted the study) may reanalyze the data to correct for bias or confounders.94 For example, reanalysis may involve a change of control groups, stratification or re-stratification of the data, a change from a two-tailed to a one-tailed significance test, or an analysis of additional confounders. Reanalysis does require access to the raw data. That access will often be denied for a variety of reasons, including the need to preserve the independence of scientific investigation, confidentiality, and the need to preserve the integrity of the data.
Analysis of data is as much art as it is science. This is so because no one has a gold standard for choosing between alternative explanations of the data. One epidemiologist will favor one analysis of the data; another epidemiologist will favor another analysis. As a result, reanalysis of the same set of data in epidemiology is, some say, similar to replication of experiments.95 Obtaining similar results helps validate the results. Obtaining different results from one analysis to another may be due to different data, or it may be due to flaws in the original analysis. Reanalysis is a way to ferret out those analytical errors. Those errors include, for example, failure to control for certain confounders, failure to test for interactions of variables, and failure to use samples of sufficient size to provide sufficient power to detect weak associations. Unfortunately, conflicting analyses of the same data will cast doubt on all analyses of those data.
In litigation, plaintiffs are fond of reanalysis. These re-analyses are rarely published, and so rarely subjected to peer review. So for the defense, when plaintiff mentions the word “reanalysis”, suspect the worst.96 Even so, re-analysis of data is not per se objectionable. For example, an epidemiologist may validate a reanalysis by explaining why she chose only certain of the available data and disregarded the remainder. But, although not objectionable per se, reanalysis is objectionable when used as an epidemiologic sleight-of-hand to bamboozle a judge or jury. In that context, it often becomes a form of “data dredging.” That is, plaintiff’s forensic epidemiologist reanalyzes the data in a variety of permutations until an association is revealed (as it always will be) and then stops, declaring in stentorian tones the importance of this association in proving plaintiff’s claims.
VII. Sampling Error
1. Sample and Population
Epidemiologists strive to identify what exposures or risks in the population are associated with what effects or diseases. Obviously, if they could measure each relevant variable in every member of the population, the result would be relatively precise epidemiologic conclusions about what exposures or risks are associated with what effects in the population. But, unfortunately, they usually cannot practicably measure every member of the population. It’s too expensive, and not everyone would cooperate. And so, epidemiologists look at subsets or samples of the population and measure instead the aspect of interest in just the members of those samples. From that limited number of measurements, they then attempt to generalize those findings to the population.
2. Sampling Error
If everyone in the population were the same (such as the population of electrons), what could be said about anyone could also be said about everyone. But experience demonstrates that this is rare. Usually, in some respects, one member of a population differs from every other member. As a result, epidemiologists recognize that when members of the population are randomly sampled, each member of the sample may differ from each other member of the sample and from the remaining members of the population. And if a number of samples are randomly selected from the population, each sample will differ, in some respects, from each other sample.
For example, of two hundred fifty random samples (each of 100 people) drawn from a population constituted of 46% of men and 54% of women in a health study, the number of men in each sample was as follows:
51 40 49 34 36 43 42 45 48 47 51 47 50 54 39 42 47 43 46 51 43 53 43 51 42 49 46 44 55 36 49 44 43 45 42 45 43 55 53 49 46 45 42 48 44 43 41 44 47 54 39 52 43 36 39 43 46 47 44 55 50 53 55 45 43 47 40 47 40 51 45 56 40 49 47 45 49 41 43 45 54 49 50 44 46 48 52 45 47 50 43 46 44 47 46 54 42 44 47 36 52 50 51 48 46 45 54 48 46 41 49 37 49 45 50 43 54 39 55 38 49 44 43 47 51 46 51 49 42 50 48 52 54 47 51 49 44 37 43 41 48 39 50 41 48 47 50 48 46 37 41 55 43 48 44 40 50 58 47 48 45 52 35 45 41 35 38 44 50 44 35 48 49 35 41 37 46 49 42 53 47 48 36 51 45 43 52 46 49 51 44 51 39 45 44 40 50 46 50 49 47 45 49 39 44 48 42 47 38 53 47 48 51 49 45 42 46 49 45 42 45 53 54 47 43 41 49 48 35 55 58 35 47 52 43 45 44 46
That each sample did not contain 46 men is due to a phenomenon called “chance”, the luck of the draw, that part of experience which is unpredictable. From this example is drawn an important lesson: a measurement on members of a sample will differ from a measurement on the members of the population. The difference is called “sampling error.”97 Sampling error is equal to the sample value minus the population value.
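The sampling experiment described above is easy to reproduce. Here is a minimal sketch in Python, assuming the numpy library:

    # Draw 250 random samples of 100 people from a population that
    # is 46% men and count the men in each sample.
    import numpy as np

    rng = np.random.default_rng(1)
    men_per_sample = rng.binomial(n=100, p=0.46, size=250)

    print(men_per_sample.min(), men_per_sample.max())  # spread around 46
    print(men_per_sample.mean())                       # close to 46
    # The deviation of each sample count from 46 is the sampling error.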
Most, if not all, epidemiologic studies involve measurements on some aspect of members of a sample and not on all members of the population. So if inferences from those measurements are to be accurate, an adjustment must occur for sampling error. If no adjustment occurs, an epidemiologist may find an association when, in fact, none exists (a false positive finding) or may fail to find an association when, in fact, one does exist (a false negative finding).
Even so, it is important to recognize that statistical techniques designed to assess sampling error do not assess whether the claimed results are due to systematic error or bias. As one epidemiologist remarked, “full epidemiologic analysis assesses bias, confounding, causation and chance. Of these, chance is least important but still receives most attention.”
When considering sampling, identify the following:
- Sample size;
- Number of samples of that sample size drawn from the population; and
- All possible random samples of sample size x from a population of size y.
3. Probability
Accounting for sampling error involves the concept of “probability.” Namely, what is the “probability” that what is true for the sample, given the phenomenon of sampling error, is also true for the population? But pinning down the meaning of the concept of probability is somewhat problematic. “Probability,” as a concept, has two basic conceptions: the conception of probability known as “relative frequency” and the conception of probability known as “subjective probability.”98 Each describes a method for assigning a probability to an event.
The conception of probability known as “relative frequency” is a method for assigning a probability to an event based on how often that particular event occurs in a process that generates a variety of events.99 Statisticians who favor this conception of probability are sometimes called “frequentists.” Frequentists believe that if a process (called a statistical experiment) is repeated n times and an event A is observed f times, then the probability of A is the quotient of f divided by n. But this quotient, under this conception of probability, is only an approximation. Fortunately, the approximate value is considered to approach the actual value if the statistical experiment is repeated many times. This is said to occur owing to the “Law of Large Numbers.” This law states that the probability that the arithmetic mean Sn/n will differ from its expected value μ by more than any fixed ε > 0 approaches zero as n → ∞.
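A minimal sketch in Python of the frequentist idea, using a simulated coin flip as the repeatable statistical experiment:

    # The relative frequency f/n of an event approaches its
    # probability as the number of repetitions n grows.
    import numpy as np

    rng = np.random.default_rng(2)
    flips = rng.binomial(1, 0.5, 100_000)
    for n in (10, 100, 1_000, 100_000):
        print(n, flips[:n].mean())   # drifts toward 0.5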
But what is the use of the frequentist conception of probability when the process of sampling is not repeated or not repeated often enough? For example, what is the probability that plaintiff Smith will lose her lawsuit against defendant Jones? A clever response to this concern is the conception of probability known as “subjective probability.” By this conception of probability, the probability assigned to an event is based not on the law of large numbers, but on subjective judgment or experience, limited, importantly, by the basic laws of probability.100 Those basic laws include, for instance, that the probability of an event lies in the range 0 to 1 and that the sum of the probabilities of all simple events is 1. Statisticians who favor this conception of probability are sometimes called “subjectivists” or, more esoterically, “Bayesians.”
“Probability is the very guide of life.”
Cicero, De Natura Deorum
4. Type I Error
Epidemiologists, to carry out their work, compare samples to determine whether these samples are different. That is, in case-control studies, they compare samples of cases with samples of controls. And in cohort studies, they compare samples of exposed subjects with samples of unexposed subjects.
In assessing these comparisons, epidemiologists can decide as follows:
(1) A difference exists, beyond that due to chance, between the samples. This assessment can be correct or incorrect. If it is correct, the epidemiologist has committed no error. If it is incorrect, the epidemiologist has committed what is called a Type I error.101 That is, the epidemiologist has falsely concluded that a difference exists, beyond that due to chance, between the samples (a false positive). This is a problem defendants typically dislike. The probability of committing this type of error is called “alpha.”
alpha = Pr (H is rejected | H is true)
Epidemiologists conventionally set limits on that value of alpha which they consider acceptable for purposes of drawing valid inferences from the sample about the population. The value of alpha is conventionally set at .01, .025, .05 or .10. Most epidemiologists favor an alpha set at .05 owing to an overriding desire to avoid false positive results.102 Those who want alpha to be set at .10 or even .20 are more concerned with avoiding false negative results, and more tolerant of the risk of false positive results–that is, that sampling error will be counted as falsely representing that the exposure is positively associated with the effect. This is a risk plaintiffs are, of course, happy to accept.
(2) No difference exists, beyond that due to chance, between the samples. This assessment can be correct or incorrect. If it is correct, the epidemiologist has committed no error. If it is incorrect, the epidemiologist has committed what is called a Type II error.103 That is, the epidemiologist has falsely concluded that no difference exists, beyond that due to chance, between the samples (a false negative). This is a problem plaintiffs typically dislike.
5. Type II Error
A Type II error occurs, as mentioned, when the epidemiologist erroneously fails to reject the null hypothesis. The probability of committing this type of error is called “beta.”104
Beta = Pr(H is not rejected | H is false)
The values of alpha and beta are interdependent. More precisely, they are inversely related: for a fixed sample size, the value of alpha cannot be lowered without raising the value of beta, and the value of beta cannot be lowered without raising the value of alpha. Beta is conventionally set at .20, a value higher than the value conventionally set for alpha (.05). This means it is more difficult to commit a Type I error than it is to commit a Type II error. Obviously, this is a situation defendants favor and plaintiffs disfavor.
Following is an illustration of a Type I and Type II error. Plaintiff’s expert assayed the serum of a case group of 249 women with silicone breast implants (SBIs), and two control groups, one of 47 healthy women without SBIs, and another of 39 women with autoimmune disorders without SBIs. The expert claimed that 9 of the 249 women with SBIs in the case group had elevated levels or “titers” of antibodies to “protein-silicone complexes,” but none of the women in the control groups did. From these results, he concluded that he had developed a serologic test that detected antibodies to protein-silicone complexes.
Was this expert’s conclusion supported by the results of nine positive results out of 249 women with SBIs and zero positive results in his control groups? Actually, it was not. The expert failed to account for the effect of chance in interpreting the results of his serology test. He could have had nine positive results by chance alone from a sample size of 249. He could have had no positive results by chance alone in the control groups owing to their much smaller sizes. Because the sample size of the control groups was so small, the statistical test on these samples had too low a “power” to rule out the possibility that the negative results were false negatives.
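The chance argument can be made concrete with a short calculation. Here is a minimal sketch in Python, assuming (purely for illustration) that the 9/249 rate observed in the case group is simply a background rate common to all groups:

    # Probability of observing zero positives in a small control
    # group even if its underlying rate equals the case group's.
    background_rate = 9 / 249            # about 3.6%

    for n_controls in (47, 39):
        p_zero = (1 - background_rate) ** n_controls
        print(n_controls, round(p_zero, 2))
    # 47 -> 0.18, 39 -> 0.24: zero positives would occur by chance
    # alone roughly one time in five; the control groups are too
    # small to rule out chance.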
6. Assessing Sampling Error
Epidemiologists assess sampling error in two basic ways: (1) through “hypothesis testing” and (2) through “estimation.”105
- Hypothesis Testing
Hypotheses about the role of sampling error may be “tested” using either “critical values” or “probability values.”106
- Critical Values
Hypothesis testing using critical values is analogous to a test graded as either pass or fail. If sampling error likely accounts for the results of the study, the results are reported as “not statistically significant,” a failing grade. If sampling error does not likely account for the results of the study, the results are reported as “statistically significant,” a passing grade.
Hypothesis testing is designed to determine statistically whether the two samples (the case and control groups or the exposed and unexposed groups) are from the same population or, instead, from different populations. (If they are from the same population, the exposure is not associated with the effect.) This task is complicated by the phenomenon of “sampling error.” That is, two samples may appear different, yet be the same, with the apparent difference being due to sampling error. The epidemiologist tackles this problem by making an assumption: she assumes that the two samples are simply random samples from the same population. That is, she assumes, as true, the “null hypothesis.”
To test this assumption, she will then compute what is called a “test statistic.” This test statistic, a random variable, is a function of two basic variables: (1) the difference between the relevant values observed in the sample in the study and those values that would have been expected if the null hypothesis were true and (2) the amount of variability of the values from the sample in the study.107 For example, when rates and proportions are analyzed, involving discrete, binary variables (as they are in cohort and case-control studies), the test statistic is chi-square when the expected frequency of observations in each cell of the fourfold table is at least 5; when an expected cell frequency is less than 5, the Fisher exact test is used instead, with the total number of exposed cases observed in the study serving as the test statistic.108
χ² = Σ (observed − expected number of individuals in cell)² / (expected number of individuals in cell), summed over all cells of the table
This “test statistic” is used to test the null hypothesis. First, in the process of this test, the epidemiologist will construct a “sampling distribution” of the “test statistic.” To do this, she will draw all samples that can possibly be drawn from the population, calculating the test statistic for each such sample. So for that population, this sampling distribution will represent the range of values of the test statistic. When the epidemiologist plots the values of the test statistic, for the sampling distribution, the y axis is calibrated with the relative frequency of a particular value of the test statistic (from 0 to 1) and the x axis with the value of the test statistic. (This plotted distribution will indicate the probability of obtaining a particular value of a test statistic on the basis of sampling error alone since all samples are assumed to be drawn from the same population.)
This range of values provides a standard of comparison. Against this standard is compared the value of the test statistic computed from the samples being investigated, for example, the case and control groups in the study. This is called the “observed value” of the test statistic. (Again, the epidemiologist is comparing the observed test statistic for the sample to determine whether it likely came from the same population or from a different population.) If the value of this test statistic is “small,” the epidemiologist concludes that the sample giving rise to that test statistic was drawn from the same population and so accepts the null hypothesis. If the value of the test statistic is “big,” she concludes that the sample giving rise to that test statistic was not drawn from the same population and so rejects the null hypothesis.
What is the boundary between small and big? This boundary is established by convention. The epidemiologist selects a cut-off value on the x axis of this sampling distribution. The points on the x axis on that side of the critical value nearest the tail of the sampling distribution are called the “rejection region.” This region usually contains only 2.5% of the possible values of the test statistic plotted in the sampling distribution (2.5% in each tail for a two-sided test with alpha set at .05). This rejection region represents a probability value called “alpha.” (Commonly used values for alpha are .01, .025, .05 and .10.) A value of a test statistic (plotted on the x axis) is considered “big” if it is larger than this cut-off value (also plotted on the x axis) and falls within this rejection region. For instance, if the cut-off value is 1.9 and the value of the observed test statistic is 2.4, the test statistic is considered “big.” Then the epidemiologist would conclude that the value of the observed test statistic was “statistically significant,” and reject the null hypothesis.
On what is meant by “statistically significant” and “statistically non-significant,” consider the remark of a renowned statistician:
“The proper inference from a statistically significant result is that a nonzero association or difference has been established; it is not necessarily strong, sizable or important, just different from zero. Likewise, the proper conclusion from a statistically non-significant result is that the data have failed to establish the reality of the effect under investigation. Only if the study had adequate power would the conclusion be valid that no practically important effect exists. Otherwise, the cautious “not proven” is as far as one ought to go.”109
- Probability Values
Besides testing hypotheses with critical values, an epidemiologist or statistician may test hypotheses using the “probability-value” (or P-value).110 This method is similar to the preceding method using critical values. In this process, the first step is to state the null and alternate hypotheses. The second step is to select the appropriate sampling distribution for the test statistic. The third step is to determine the rejection and non-rejection regions by setting “alpha.” The fourth step is to calculate the value of the test statistic from the observed sample, and translate that value on the x axis into a probability or P-value. The P-value is the probability of obtaining a value of the test statistic as large as, or larger than, the one computed from the data when the null hypothesis is true. The fifth step is to reject the null hypothesis if the P-value is less than the value of alpha or accept the null hypothesis if the P-value is greater than the value of alpha. With this method, the decision to reject or accept the null hypothesis is based on comparing the P-value (a probability value for the test statistic computed from the observed sample) with alpha (another probability value conventionally established before the comparison).
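Both variants of the test can be illustrated with the benzene/leukemia table used in the confidence-interval example below. Here is a minimal sketch in Python, assuming the numpy and scipy libraries, and using the uncorrected chi-square statistic (a continuity correction would give a slightly different value):

    # Critical-value and P-value testing on a 2 by 2 table.
    import numpy as np
    from scipy.stats import chi2

    observed = np.array([[24, 6],     # leukemia: exposed, not exposed
                         [90, 60]])   # no leukemia
    row = observed.sum(axis=1, keepdims=True)
    col = observed.sum(axis=0, keepdims=True)
    expected = row @ col / observed.sum()

    stat = ((observed - expected) ** 2 / expected).sum()
    critical = chi2.ppf(0.95, df=1)   # cut-off for alpha = .05
    p_value = chi2.sf(stat, df=1)

    print(round(stat, 2), round(critical, 2), round(p_value, 3))
    # 4.31 exceeds 3.84, and P = .038 is less than alpha = .05:
    # reject the null hypothesis by either route.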
Hypothesis testing is widely used in epidemiologic studies, but it is subject to the following criticisms: (1) hypothesis testing using critical or P-values is all-or-nothing decision-making; (2) the critical and P-values convey no information about the extent to which two groups differ or two variables are associated; and (3) highly significant P-values can accompany negligible differences (if the sample sizes are large) and unimpressive P-values can accompany strong associations (if the sample sizes are small).111
- One-Sided or Two-Sided Hypothesis Test
The hypothesis test may be either one-sided (one-tailed) or two-sided (two-tailed).112 If the hypothesis test is one-sided, the competing hypotheses would be (1) the null hypothesis that either a negative or no association exists between exposure and effect and (2) the alternate hypothesis that the exposure is positively associated with the effect. In testing these hypotheses, if alpha is set at .05, the entire .05 would be allocated to the right side or “tail” of the sampling distribution.
If the hypothesis test is two-sided, the competing hypotheses would be (1) the null hypothesis that no association exists between exposure and effect and (2) the alternate hypothesis that an association, either positive or negative, exists between exposure and effect. In testing these hypotheses, if alpha is set at .05, the .05 would be split, with .025 being allocated to the left side or tail of the sampling distribution and .025 being allocated to the right side of the sampling distribution. If alpha is set at .10, the .10 would be split, with .05 being allocated to the left tail and .05 being allocated to the right tail. Some epidemiologists may set alpha at .10 considering that, at that level, it is equivalent to a one-sided test with alpha set at .05.
If the direction of the tested association can be predicted in advance, use of the one-sided test is justified. Otherwise, the epidemiologist should use the two-sided test. The defense prefers the two-sided test, particularly when the outcome of interest is strictly one-sided. Plaintiffs, of course, prefer the one-sided test, arguing that no one in the litigation is asserting that the exposure has a potentially helpful effect, only a harmful one. For the defense, this can be a sensitive issue when the result of the epidemiologic study is a positive association, statistically significant with a one-sided test but not with a two-sided test. In that event, the reasons for using the two-sided test should be at the ready.
- Estimation
Estimation is a statistical procedure to account for sampling error. By means of estimation, a value of a sample statistic is used to estimate the corresponding value of the population parameter.113 An estimate may be a “point estimate” or an “interval estimate.” A “point estimate” is the value of the measure of association calculated from the samples used to estimate the measure of association for the population. For instance, the point estimate of the odds ratio for the sample might be 2.67. The point estimate, being an estimate of the value for the population or “parameter,” may or may not be accurate. It has a margin of error.
To quantify this margin of error, the epidemiologist constructs around the point estimate an estimated “interval.” This interval, called the “confidence interval,” brackets the parameter, to some degree of probability. This degree of probability is called the “level of confidence.” The level of confidence is expressed as a percentage as (1 minus alpha) 100%. This confidence interval provides a range of values within which, it is hoped, lies the true value of the parameter. For instance, a 95% confidence interval for the odds ratio might be 1.03 – 6.9. The confidence level of 95% gives the percentage of all such possible intervals that will actually include the true parameter. Commonly used confidence levels are 99%, 95% and 90%. The width of the confidence interval (1.03 to 6.9) reflects the precision of the estimate. The narrower the interval, the more precise the estimate. The wider the interval, the less precise the estimate. The width of the confidence interval will narrow if the confidence level is lowered or if the size of the sample is increased. Of course, the preferred alternative, to improve precision, is not to lower the confidence level but to increase the size of the sample.
The following formula indicates how many possible samples (and hence how many possible intervals) exist for a population of a certain size when a sample of a certain size is drawn from that population:

Population size! / [Sample size! × (Population size − Sample size)!]
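In Python, this count is the binomial coefficient, as a quick sketch shows:

    # "N choose n": the number of distinct samples of size n that
    # can be drawn from a population of size N.
    import math

    print(math.comb(5, 2))      # 10 possible samples of 2 from 5
    print(math.comb(250, 100))  # astronomically many possible samples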
The specific 95% confidence interval associated with a given set of data may or may not actually include the true parameter. No one can know for sure one way or the other. The specific 95% confidence interval obtained depends on the specific random sample drawn from the population. So each sample will have a different 95% confidence interval. But one can be sure that in the long run, 95% of all possible 95% confidence intervals will include the true parameter.
For example, if the exposure is benzene and the effect is leukemia, the results of a case-control study may be as represented in the following 2 by 2 table:
Benzene
Leukemia | Exposed | Not Exposed | Total
Yes | 24 | 6 | 30
No | 90 | 60 | 150
Total | 114 | 66 | 180
For these data, the estimated odds ratio is (24)(60)/(90)(6) = 2.67. The sampling distribution of the estimated odds ratio will be non-normal with a positive skewness. For skewed distributions, the log transformation will result in a distribution with a more normal shape. For this reason ln (OR) is used for calculating confidence intervals.114
A confidence interval estimate for the population odds ratio is obtained by first taking the loge of the estimated odds ratio, using the standard error of the ln (OR) to construct a confidence interval for ln (OR), and finally exponentiating to obtain a confidence interval for the odds ratio.
The log odds is ln (2.67) = 0.981. The estimated variance of the log odds is:
Var [ln (OR)] = 1/24 + 1/90 + 1/6 + 1/60 = 0.236.
The 95% confidence interval for ln (OR) is:
0.981 – 1.96√0.236 ≤ ln (OR) ≤ 0.981 + 1.96√0.236
0.029 ≤ ln (OR) ≤ 1.93
Exponentiating to retransform the odds ratio to original units:
e^0.029 ≤ OR ≤ e^1.93
1.03 ≤ OR ≤ 6.9
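A minimal sketch in Python verifying the arithmetic of this confidence interval:

    # 95% confidence interval for the odds ratio via the log
    # transformation (the variance formula is Woolf's method).
    import math

    a, b, c, d = 24, 6, 90, 60
    log_or = math.log((a * d) / (b * c))      # ln(2.67) = 0.981
    var = 1/a + 1/b + 1/c + 1/d               # 0.236
    half_width = 1.96 * math.sqrt(var)

    low = math.exp(log_or - half_width)
    high = math.exp(log_or + half_width)
    print(round(low, 2), round(high, 2))      # 1.03, 6.91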
Confidence intervals can also be used to test hypotheses.115 (This is an alternative to hypothesis testing). If the confidence interval associated with a set of data includes the null value for that statistic (e.g. 95% CI .90 –2.5), the epidemiologist concludes that insufficient evidence exists to reject the hypothesis of no effect with P > alpha. If the confidence interval does not include the null value for that statistic (e.g. 95% CI 2.5-6.9), the epidemiologist concludes that sufficient evidence exists to reject the hypothesis of no effect with P< alpha. Even so, the epidemiologist cannot say where within that interval lies the true parameter. So, if the confidence interval contains a null value for that statistic, the evidence is not sufficient to rule out the possibility that the null hypothesis is true.
Two advantages exist for testing hypotheses with confidence intervals: (1) the null hypothesis can be rejected when the confidence interval does not include the null value of the statistic; and (2) information about the magnitude of the value of the statistic is provided; the interval identifies small but non-null values for the statistic arising from large samples.116
Confidence intervals have a limitation. Namely, they are based on the assumptions of the relative frequency conception of probability. That is, for example, if the observation in the study was repeated 100 times, 95 of the observations would produce 95% confidence limits bracketing the true parameter. So the confidence limits of a single study do not allow one to say that the 95% confidence limits contain the true parameter with 95% probability.117
Today, most epidemiologists provide both confidence intervals and P-values. For example, in a retrospective cohort study about the relationship between exposure to silicone in silicone breast implants and scleroderma, the statistical values were reported as follows:
Relative Risk | 95% Confidence Interval | P |
1.84 | 0.98 – 3.46 | .060 |
This study indicates that the 95% confidence interval contains the null value of the statistic (RR=1), and so the null hypothesis cannot be rejected. The study also indicates that the relative risk of 1.84 was not statistically significant, with a P-value of .060.
Each epidemiologic study should describe its statistical methods. Yet, typically, this description will be brief, too sketchy perhaps to help assess whether or not those methods are appropriate. Even with more detail, the reader may not be able to double check the statistical analysis without having the raw data of the study. In the typical case-control study, for example, the description of the statistical method is elliptical. For instance, in a study exploring the relationship between coffee consumption and pancreatic cancer, the authors described the statistical method as follows:
“Tests of significance and estimates of adjusted relative risks and their confidence limits were derived with the method of Mantel and Haenszel and its extension. The data were stratified by age in 10-year groups and by sex where appropriate. All confidence limits are 95 per cent intervals.”118
The test statistic is the Mantel-Haenszel test statistic following a chi-squared distribution with one degree of freedom.
Logarithms are used when the values of the independent variables are so significantly larger than the values of the dependent variables that it would be difficult to work with resulting disparately different magnitudes between the scales on the x and y axes.
The likelihood, when evaluated for a particular value of the parameter, can turn out to be a very small number. So it is more convenient to use the natural logarithm of the likelihood in place of the likelihood.
Hypothesis testing of the odds ratio using the chi-squared statistic follows a chi-squared distribution with one degree of freedom. That distribution is non-normal and positively skewed. If alpha is set at .05 the hypothesis test would be one-sided.
The confidence limits of an estimated odds ratio can be calculated using one of several methods.
To recap, with this method, the decision to reject or accept the null hypothesis is based on comparing the value of the test statistic for the observed sample with the value of the test statistic at that point on the x axis of the sampling distribution known as the “critical value.” The critical value sets a boundary point on the x axis. From that critical value along the x axis to the closest tail of the sampling distribution lie the values for which the null hypothesis will be rejected. These values on the x axis lie under the area of the sampling distribution called “alpha.” If the value of the test statistic computed from the observed sample falls on that portion of the x axis in the rejection region, then the epidemiologist will reject the null hypothesis. Otherwise, she will conclude that the value of the observed test statistic was not statistically significant, and not reject the null hypothesis.
7. Power and Sample Size
Sometimes, epidemiologic studies will have the weakness of low “power.” “Power” is defined as the probability of correctly rejecting the null hypothesis.119 It is equal to one minus “beta.” Beta is the probability of erroneously accepting the null hypothesis. The power of a statistical test should be as high as possible. The higher the power, the greater the likelihood that small effects, if they exist, will be detected.120
Power can be increased by reducing beta. The smaller the value of beta, the greater the value of one minus beta, or power. But because the value of beta is interdependent with the value of alpha, given a fixed sample size, reducing the value of beta increases the value of alpha. The greater the value of alpha, the greater the probability of erroneously rejecting the null hypothesis. This is a dilemma, but a dilemma with a solution. The solution is to increase the size of the sample. Only by doing that can power be increased without sacrificing precision.121 Yet herein lies a further dilemma. When the effect and the exposure are both rare, achieving adequate power requires very large samples.122
The optimal size of a sample is usually determined by formulas. To determine the smallest value for a measure of association that can be detected with a specific power, given a fixed sample size, consult S.D. Walter, Determination of Significant Relative Risks and Optimal Sampling Procedures in Prospective and Retrospective Comparative Studies of Various Sizes. Am. J. Epidemiology, 105:387-397 (1977). The size of the sample should be assessed not only at the time of its selection but also throughout the course of the study.
An epidemiologic study will often refer to its power to detect measures of association of various strengths. For instance, in an epidemiologic study on breast implants was the following remark about the power of the study:
“For breast implants, considering only white cases and controls, we had 80% power to detect an OR of 4.0, assuming alpha = 0.05 (2 sided) and a prevalence of breast implants of 10/1000. When considering all cases and controls in the analysis, we had 80% power to detect an OR of 2.0 or greater for a given risk factor, or an OR of 0.29 or smaller for a given protective factor, assuming a prevalence of exposure of 5/100 among controls (e.g., the approximate prevalence we observed for exposure to silicone caulk, glues, or sealants was 4/100).”
What is the appropriate ratio of controls to cases? When the number of cases is limited, increases in the number of controls increases power until a ratio is reached of 4 to 1 or 5 to 1. After that, gains in power usually become too small to be worthwhile. Under equal allocations to the case and control groups, when the power is extremely small (<0.1) or large (>0.9), increasing the number of controls will be unhelpful.123
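The diminishing return from extra controls can be seen by simulation. Here is a minimal sketch in Python, assuming the numpy and scipy libraries; the parameters (50 cases, an exposure prevalence of .20 among controls, a true odds ratio of 2.0, alpha = .05) are hypothetical:

    # Estimate power by simulating case-control studies at several
    # control-to-case ratios.
    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(3)
    n_cases, trials = 50, 2000
    p_control = 0.20
    p_case = 1 / 3          # control odds .25 doubled to case odds .50

    def power(ratio):
        n_controls, hits = ratio * n_cases, 0
        for _ in range(trials):
            a = rng.binomial(n_cases, p_case)         # exposed cases
            c = rng.binomial(n_controls, p_control)   # exposed controls
            t = np.array([[a, n_cases - a], [c, n_controls - c]])
            expected = t.sum(1, keepdims=True) @ t.sum(0, keepdims=True) / t.sum()
            stat = ((t - expected) ** 2 / expected).sum()
            hits += chi2.sf(stat, df=1) < 0.05
        return hits / trials

    for ratio in (1, 2, 4, 8):
        print(ratio, power(ratio))   # the gain flattens past about 4:1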
8. Meta-Analysis
To overcome the flaw of low power, an epidemiologist may resort to a technique that combines the results of several epidemiologic studies. This technique is called “meta-analysis.”124 Meta-analysis involves the following six basic steps: (1) defining the problem and the criteria for admitting epidemiologic studies into the meta-analysis; (2) locating these epidemiologic studies; (3) classifying and coding important characteristics of those studies; (4) quantitatively measuring those characteristics on a common scale; (5) aggregating the findings of those studies and relating those findings to the characteristics of the studies; and (6) reporting the results of the meta-analysis. Good examples of meta-analysis are the studies by Heyland, D.K. et al., Total Parenteral Nutrition in the Critically Ill Patient, JAMA, 280:2013-2019 (1998) and by Janowsky, E.C. et al., Meta-Analyses of the Relation Between Silicone Breast Implants and the Risk of Connective-Tissue Diseases, NEJM, 342:781-790 (2000).
Statistical analysis of the results of those various studies would include the following: (1) calculating summary descriptive statistics across the epidemiologic studies and averaging those statistics; (2) calculating the variance of a statistic across studies; (3) correcting the variance by subtracting sampling error; (4) correcting the mean and variance for study artifacts other than sampling and measurement errors; and (5) comparing the corrected standard deviation to the mean to assess the size of potential variation across those studies.125
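As a concrete illustration of step (1) of that statistical analysis, here is a minimal sketch in Python of inverse-variance pooling, one common way of averaging study results; the per-study odds ratios and variances are hypothetical:

    # Fixed-effect pooling of log odds ratios, weighting each study
    # by the inverse of its variance (its precision).
    import math

    studies = [(1.2, 0.04), (0.9, 0.02), (1.5, 0.10)]  # (OR, Var[ln OR])

    weights = [1 / var for _, var in studies]
    pooled = sum(w * math.log(or_) for (or_, _), w
                 in zip(studies, weights)) / sum(weights)
    se = math.sqrt(1 / sum(weights))

    print("pooled OR:", round(math.exp(pooled), 2))               # 1.04
    print("95% CI:", round(math.exp(pooled - 1.96 * se), 2),
          "-", round(math.exp(pooled + 1.96 * se), 2))            # 0.84 - 1.29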
Meta-analysis of observational studies has its detractors.126 As a prominent epidemiologist remarked, “the meta-analysis of non-randomized observational studies resembles the attempt of a quadriplegic person to climb Mount Everest unaided.” The reason for this pessimism is that in observational studies, particularly case-control studies, are likely found biases and unmeasured confounders. These serious deficiencies cannot be overcome by carefully combining the data. Moreover, in meta-analysis, it’s important to have identified all relevant studies. That task, unfortunately, cannot always be accomplished owing to what is called “publication bias.”127 Publication bias refers to the phenomenon in which studies with positive results are more likely to be published than those with null or negative results. This bias concerns the defense. Simply, it tends to result in finding, through meta-analysis, that the exposure is positively associated with the effect.
The accuracy of meta-analysis can be evaluated by comparing its results to a gold standard. In medical research, the gold standard is the large randomized, controlled clinical trial. So meta-analysis would become more credible if the results of meta-analyses agreed with the subsequent outcomes, on the same topic, of large randomized, controlled clinical trials. That prospect was recently studied, and it was found that meta-analyses failed to predict the outcomes of twelve large randomized, controlled trials 35% of the time.128 Obviously, that result raises serious questions about the accuracy of meta-analyses.
Yet that study does not entirely undermine use of meta-analysis. It does suggest that summarizing information from these various epidemiologic studies into a single odds ratio may be unproductive. But meta-analysis may be productive to the extent it facilitates a careful analysis of the relevant epidemiologic studies, evaluates the consistency of their results, and finds these results to be consistent.129 So the defense should expect an epidemiologist conducting a meta-analysis to individually evaluate each study for possible biases and confounders, and, on the basis of those evaluations and the quality of the study, consider the evidential weight of each study in the overall analysis and then identify the probable direction and magnitude of the biases in the estimate of the summary odds ratio or relative risk.
9. “Alpha” and the Burden of Proof
The level of alpha is sometimes equated with the legal burden of proof.130 Those who equate the two often complain that to set alpha at .05 is too stringent a requirement in light of a much less stringent requirement for the legal burden of proof set at .5+. But this criticism misconstrues both alpha and the legal burden of proof. The two simply do not equate.131
The legal burden of proof is that weight of evidence, of all the evidence introduced into evidence, which plaintiff must have in its favor to prevail on that issue of fact. In a civil trial, that weight is to be “the greater weight.” So if the total weight of all the evidence introduced is 100% of the weight, then the greater weight is more than 50% of that total weight.
That legal burden of proof entails a measure of probability which the jury applies subjectively. But the jury does not apply that measure according to the conception of probability known as “subjective probability.” Nor does it apply that burden to each piece of evidence. Instead, it applies that standard after all the evidence is introduced, and then only to the corpus of evidence for each party. Moreover, the jury is typically instructed not to base its decision on speculation.
The level of alpha is different. Alpha is a variable in a calculation to determine sampling error. It means, if set at .05, that if the exposure is assumed not to cause the effect, less than a 5% chance exists of observing data generating a value of the test statistic this large. Obviously, then, a concern about alpha is a concern about the extent to which sampling error (that is, speculation) underlies a conclusion about an association between exposure and effect. Again, in that context, alpha is a concern about one piece of evidence. If many pieces of evidence were introduced into evidence, each with alpha set at .05, the corpus of evidence would most likely be utterly speculative. That is, if the probability of any piece of evidence being due to chance or speculation is .05, the probability of at least one piece of the corpus of evidence being due to chance is 1 − (.95)^n, with n representing the total of the various pieces of evidence (for example, with n = 14, 1 − (.95)^14 ≈ .51).
Sampling error, moreover, is merely one variable among many that may affect the validity of a conclusion about an association and, derivatively, about causation. Other variables affecting validity include, for example, systematic bias, confounding, and the use of inappropriate statistical analyses. The legal burden of proof is concerned indirectly with alpha, but also with these other variables. So the legal burden of proof is better equated with the probability of having certain values for all these variables than solely with the value of alpha.
VIII. Causation
Once an exposure has been revealed to be associated with an effect, epidemiologists are often interested in building upon that association in an effort to argue that the exposure caused the effect. In this effort epidemiologists are often guided by criteria known as the Bradford-Hill criteria. If an epidemiologic study reports that the exposure is positively associated with the effect, plaintiff will certainly assert that this reported association demonstrates that the exposure causes the effect. If the study is a case-control study, that assertion will almost always be premature and probably ultimately incorrect. Of course, if an epidemiologic study reports that the exposure is not positively associated with the effect, defendant will assert that the study does truly demonstrate that the exposure does not cause the effect. In so doing, defendant will likely be justified.
1. Causation Defined
Causation is a concept with two important conceptions: the legal and the scientific conceptions of causation.
- Legal Conception of Causation
A prima facie element of all personal injury cases, to be proved by a preponderance of the evidence, is that the alleged tortious conduct caused the alleged harm or injury. Legal causation is assessed with either the “but for” or the “substantial factor” rule.
- “But For” Rule
The “but for” rule provides that defendant’s conduct is a cause of the event if that event would not have occurred but for defendant’s conduct, that is, if that event would not have occurred without defendant’s conduct.148 The “but for” rule has uncertain application when, for example, two defendants each initiate a cause and those two causes concur to bring about an event and neither cause operating alone would have been sufficient to cause that event.
- Substantial Factor Rule
The “substantial factor” rule was developed to avoid that uncertainty. It provides that defendant’s conduct is a cause of the event if that conduct was a material element and substantial factor in bringing it about.149 This rule can be also stated as providing that when the conduct of two or more actors is so related to an event that their combined conduct, viewed as a whole, is a “but for” cause of the event, and application of the “but for” rule to them individually would absolve all of them, the conduct of each is a cause of the event.
- Scientific Conception of Causation
The scientific conception of causation is varied and invariably complex.150 Given that, the aim here is not to outline in detail various philosophically sophisticated explanations of the concept of causation. Instead, it is merely to provide notice that the concept of causation in science, because it is varied and complex, should not be taken for granted. Yet, however nuanced, that scientific conception certainly contemplates that the cause of any event or effect must consist of a constellation of components acting in concert. A cause of an effect, then, is a set of minimal conditions and events that inevitably produce the effect.151
- Conditional Theory of Causation
Some conceive of causation in terms of “sufficient” and “necessary” conditions. Cause A is a “necessary” condition of effect B if, whenever A does not exist, B does not exist. Cause A is a “sufficient” condition of effect B if, whenever A exists, B exists. A causes B just when A is necessary and sufficient for B. On its face, this definition of causation has the merit of logical precision but, ultimately, it is too narrow.152
- J. L. Mackie’s Modification of the Conditional Theory
J.L. Mackie, a philosopher at Oxford, refined the conditional theory of causation.153 He proposed that A is the cause of B when, given certain other conditions, A is sufficient for B. That is, “in the circumstances,” A is sufficient for B. While A is itself not necessary for B, it is a necessary part of a wider condition. And while A is not sufficient for B, this wider condition is sufficient for B. That is, A is an insufficient but necessary part of a set of conditions which is sufficient though not necessary for B.
- Counterfactual Theory of Causation
Some want to highlight the fact that an event could be caused by any number of other events, but that that event would not have occurred in the circumstances at hand had a particular event not occurred.154 This particular causative event is a necessary condition in these circumstances for the effect. This necessary condition in these circumstances is termed a “counterfactual” condition. That is, if A caused B, then, given the circumstances, if A had not occurred, B would not have occurred. A caused B when a chain of counterfactually dependent events link A and B. An epidemiologist would say, then, that “a cause of a disease event is an event, condition, or characteristic that preceded the disease event and without which the disease event either would not have occurred at all or would not have occurred until some later time.”
The “cause” of an effect is an antecedent event, condition, or characteristic necessary for the occurrence of the effect given that other conditions are fixed.
“Causal co-action” or “joint action” is the participation of two component causes in the same sufficient cause to produce the effect.
The cause is a part of a wider set of conditions which suffices for its effect.
The cause of any effect must consist of a constellation of component causes acting in concert. For biological effects, most of the components are unknown.
A “sufficient” cause is a set of minimally necessary conditions and events inevitably producing the effect.
The “causal complement” of a factor is the set of conditions necessary and sufficient for the factor to produce the effect.
2. General and Specific Causation
“General causation” is distinguished from “specific” causation. General causation refers to the causal relationship between the exposure and the effect: Does the exposure cause this effect in those exposed? For instance, does L-tryptophan manufactured by Showa Denko KK cause connective tissue disorders in people who ingested Showa Denko KK’s L-tryptophan? Proof of general causation often requires epidemiologic evidence.155 When epidemiologic studies are proffered to prove general causation, plaintiff must show that these studies “fit” the issues of causation in this particular case.156 That is, she must establish, for example, that she was exposed to the same substance as the subjects in the epidemiologic studies, that the exposure or dose levels were comparable, the exposure occurred before onset of the disorder and onset of the injury was similar to those in the studies.
“Specific causation” refers to the causal relationship between an exposure and an effect, given a relationship of general causation, in a particular individual: Did this particular exposure of this plaintiff cause this particular effect? For instance, did the L-tryptophan that plaintiff bought from this store and consumed by her on this date cause these particular signs and symptoms diagnosed by her physician as the connective tissue disease known as scleroderma?
Epidemiological studies cannot be proffered to directly prove “specific causation.”157 This is so simply because epidemiologic studies indicate what on the average is occurring in the study population. In that population, for some people, the exposure is associated with the effect; for others, it is not. That is, some people have a predisposition which, when they are exposed, results in their susceptibility to produce the effect. Others lack that predisposition. Indeed, some would have developed the effect even if not exposed.
These kinds of distinctions for specific people are not sorted out in epidemiologic studies. Instead, epidemiologic studies can only demonstrate that, generally, an exposure is associated or is not associated with an effect. So, in keeping with that, an epidemiologist should not be allowed to testify about specific causation except to acknowledge that she has no expertise in clinical medicine to assess whether or not a particular exposure caused a particular effect.
3. Proof of General Causation
Proof of general causation involves two basic considerations. First is, what rules of inference are appropriate in assessing whether the epidemiologic data are a sign that this kind of exposure can cause this kind of effect? Second is, at trial, what legal rules apply in proving causation?
- Rules of Inference
How does the epidemiologist bridge the gap between the results of an epidemiological study reporting a positive association and the conclusion that the exposure caused the effect? One way is by the process of induction.158 Adherents of this process are called “inductivists.” Another way is by the process of deduction.159 Adherents of this process are called “deductivists.” Epidemiologists, in approaching the issue of assessing causation, tend to be either inductivists or deductivists.
- Inductive Criteria
Inductivism is the doctrine that science begins with observations and then moves to generalizations about those observations in the form of laws and theories and then to predictions entailed by those theories and then to tests of those predictions with further observations to determine whether the theory is valid. For example, after observing that all ravens they have ever seen are black, ornithologists infer the law that all ravens are black, and predict that all ravens anyone will ever see will be black, and then confirm or disconfirm that prediction through further observations of ravens.
To infer causation, epidemiologists often rely upon inductive criteria. The original inductive criteria fashioned to establish causation from exposure to biological agents are known as the Henle-Koch postulates. They are: (1) the parasite occurs in every case of the disease in question and under circumstances which can account for the pathological changes and clinical course of the disease; (2) it occurs in no other disease as a fortuitous and nonpathogenic parasite; and (3) after being fully isolated from the body and repeatedly grown in pure culture, it can induce the disease anew.160
Over the years these Henle-Koch postulates were adapted to include a variety of exposures. Currently they have been generalized into a widely used set of inductive criteria termed the “Bradford-Hill” criteria.161 Here are those nine criteria:
(1) Strength of Association: The stronger the association, the more likely the exposure caused the effect. A strong association is unlikely due to one weak unmeasured confounder or other source of modest bias. This criterion is neither necessary nor sufficient for causation.
(2) Consistency: Repeated observation of an association in different populations in different circumstances suggests causation. Ideally, many studies with different architecture should produce results that converge.162 When that occurs, the corpus of epidemiologic evidence can be said to be reliable. Consistency, some say, is not a necessary criterion of causation, but serves only to rule out hypotheses that the association is attributable to some factor that varies across studies.
(3) Specificity: Specificity requires that an exposure, if causal, produce a single effect, not multiple effects. This criterion, many epidemiologists recognize, is neither necessary nor sufficient for causation. Simply, single events or conditions may have many effects.
(4) Temporality: The cause must precede the effect. This criterion is necessary for causation. But it is not sufficient: As Shakespeare wrote, “I have heard the cock, that is the trumpet to the morn, doth with his lofty and shrill-sounding throat awake the god of day… .” The cock’s crow reliably precedes the dawn, yet no one would say it causes the sun to rise.
(5) Biologic Gradient: Biologic gradient refers to the presence of a monotonic, that is, unidirectional, dose-response curve. This criterion is neither necessary nor sufficient for causation.
(6) Plausibility: Plausibility refers to the biologic likelihood that the exposure caused the effect. This criterion is neither necessary nor sufficient for causation.
(7) Coherence: Coherence, like the criterion of biologic plausibility, requires that the hypothesis about causation not conflict with what is known about the natural history and biology of the disease. This criterion is neither necessary nor sufficient for causation.
(8) Experimental Evidence: Experimental evidence from available sources corroborates the hypothesis of causation. This criterion is neither necessary nor sufficient for causation.
(9) Analogy: Analogy refers to the similarity between the association at issue and other associations which are considered more firmly to be cause-and-effect relationships. This criterion is neither necessary nor sufficient for causation.
Some epidemiologists have taken these Bradford-Hill criteria and categorized them by the weight of their importance.163 This scheme of categorization is as follows:
“Guidelines for Evaluating the Evidence of a Causal Relationship. (In Each Category, Studies are Listed in Descending Priority Order).
1. Major Criteria
a. Temporal Relationship: An intervention can be considered evidence of a reduction in risk of disease or abnormality only if the intervention was applied prior to the time the disease or abnormality would have developed.
b. Biological Plausibility: A biologically plausible mechanism should be able to explain why such a relationship would be expected to occur.
c. Consistency: Single studies are rarely definitive. Study findings that are replicated in different populations and by different investigators carry more weight than those that are not. If the findings of studies are inconsistent, the inconsistency must be explained.
d. Alternative Explanations (confounding): The extent to which alternative explanations have been proposed is an important criterion in judging causality.
2. Other Considerations
a. Dose-response relationship: If a factor is indeed the cause of a disease, usually (but not invariably) the greater the exposure to the factor, the greater the risk of the disease. Such a dose-response relationship may not always be seen because many important biologic relationships are dichotomous, and must reach a threshold level for observed effects.
b. Strength of the Association: The strength of the association is usually measured by the extent to which the relative risk or odds depart from unity, either above 1 (in the case of disease-causing exposures) or below 1 (in the case of preventive interventions).
c. Cessation Effects: If an intervention has a beneficial effect, then the benefit should cease when it is removed from a population (unless a carryover effect is operant).”
“Despite the apparent simplicity of many of these criteria, many epidemiologists would probably agree that [the Bradford-Hill] criteria are not totally adequate, that they provide few hard and fast rules for making causal inferences.”
- Deductive Criteria
Deductivism is the doctrine that science begins with hypotheses and then moves to observations that can either confirm or disconfirm those hypotheses. For instance, the most influential modern deductivist, Karl Popper, believed that scientists postulate a hypothesis, an uncorroborated conjecture, and then compare its predictions with observations obtained through testing to see whether it is confirmed or disconfirmed. If the test produces data inconsistent with the conjecture, then the conjecture is refuted or falsified. If the test produces data consistent with the conjecture, then scientists continue to favor it, not as proven but as not yet refuted.164 This notion is the crux of deductivism. Simply, deductivism aims at finding the truth only in a limited sense. That is, it only proceeds to rule out false theories. According to Popper, it is conjectures all the way down. So what matters in science is not the foundation of a conjecture but the quality of a conjecture. And what distinguishes conjectures in science from conjectures in other disciplines is that conjectures in science are falsifiable.
Some epidemiologists, the inductivists, resort to positive evidence to establish causation. When an epidemiologist has controlled for all sources of error she can identify, she still faces the possibility of unidentified sources of error. Positive evidence in the form of inductive criteria such as the Henle-Koch or Bradford-Hill criteria addresses these unspecified sources of error in the process of assessing causation. But these inductive criteria cannot eliminate the possibility of such unspecified bias; they are designed merely to help evaluate its likelihood. So, in the face of unspecified sources of error, these inductive criteria help provide a basis for concluding that the exposure causes the effect.
But, say the deductivists, these positive criteria are nothing more than untestable tautologies.165 This is so because any statement about the existence of an unspecified source of error is untestable. Both inductivists and deductivists would agree that inductive criteria cannot establish that an association is certainly causal. But inductivists part ways with deductivists when deductivists maintain that inductive criteria cannot establish even that an association is probably causal, owing to the fact that the inductive criteria are nothing more than untestable tautologies. To infer probabilistically, argue the deductivists, an epidemiologist needs to know something about the universe or sample space to which the inference applies; in this case, that universe would be all possible associations between exposures and effects. How could any epidemiologist hope to ascertain that? And even assuming all possible associations could be identified, how could the epidemiologist then determine which are causal and which non-causal? She cannot point to a list of “established” causal associations to assess causation. Obviously, if she could, she would not need the inductive criteria as a surrogate for her gold standard. Given this apparent dilemma, deductivists are quite skeptical of any claim that the results of an epidemiologic study demonstrate that an exposure caused an effect.166 Of course, this skepticism is quite disconcerting to plaintiffs seeking to prove that an exposure caused an effect.
But, of concern to the defense, just as no positive evidence exists to support an assertion of causation, no “negative” evidence exists, say the deductivists, to support an assertion that an association is not causal. Suppose an epidemiologic study results in a positive association between exposure and effect. The deductivist would assert that this association may be due to errors in the study. But the deductivist would further assert that the mere fact that error is possible is not a basis for concluding that no true association exists. Some of these errors may be subsequently identified. But even then, that discovery is not a basis for concluding that no true association exists. The direction of those errors would also need to be identified. But even then, that the direction of the error is identified is not a basis for concluding that no true association exists. The effect of the error on the magnitude of the odds ratio would also need to be ascertained. And even then, a showing that the magnitude of the error is insufficient to account for the entire magnitude of the association is not a basis for concluding that the association is thereby proof of causation. Unidentified errors may still exist in the study. Even so, say the deductivists, this possibility is not a basis for concluding that no true association exists or that the association is not representative of causation.
The deductivist’s alternatives to the inductive criteria of Bradford-Hill are simply “predictability” and “testability.”167
(1) Predictability: Predictability means that once a hypothesis about causation has been proposed, certain kinds of predictions can be deduced from it in order to compare those predictions with empirical observations.
(2) Testability: Testability means that the predicted consequences of the hypothesis are capable of conflicting with observations and that everything has been done to improve the opportunity for those conflicts.
But again, all that these criteria can hope to accomplish, if satisfied, is to allow one to say that a theory is not an untestable tautology but instead a conjecture that can be falsified if the relevant data come to be observed.
- Legal Rules
As a matter of policy, courts must adopt the view of the inductivists in order to afford plaintiffs an opportunity to prove causation through epidemiologic studies and other types of evidence. But what a court may grant as a matter of policy in one realm it may take away as a matter of public policy in another. An increasing number of courts have ruled that the results of an epidemiologic study cannot be admitted into evidence unless the measure of association has a quotient greater than 2.168 They reason that when that quotient is greater than two, the probability is “more likely than not,” in keeping with plaintiff’s burden of proof, that the exposure “caused” the effect.
This judicial inference needs parsing. A quotient (RR or OR) greater than 2 indicates that of the total number of individuals with the effect subject to study, more than half (51%) have effects associated with the exposure and less than half (49%) have effects associated with other background risks. Obviously, epidemiologic evidence satisfying this requirement should not be considered, by that fact alone, “sufficient” proof of general causation. It should only be considered, by that fact alone, a “necessary” constituent of a greater body of evidence introduced to prove general causation.
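The arithmetic behind this split can be made concrete with the attributable fraction, (RR − 1)/RR, the proportion of exposed cases attributable to the exposure. What follows is a minimal sketch in Python, using hypothetical quotients and assuming a valid, unconfounded measure of association; it merely illustrates why 2 is the break-even point and is not drawn from any study or case discussed here:

    def attributable_fraction(rr):
        # Proportion of exposed cases attributable to the exposure: (RR - 1) / RR
        return (rr - 1.0) / rr

    for rr in (1.5, 2.0, 2.1, 3.0):
        print("RR =", rr, "-> attributable fraction =",
              round(attributable_fraction(rr), 3))

    # RR = 2.0 is the break-even point (fraction = 0.50); RR = 2.1 yields
    # roughly 0.52, only just past "more probable than not."

Note that the sketch takes the quotient at face value; it embodies the very assumptions of internal validity which, as discussed below, the defense should challenge.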
This judicial rule excluding epidemiologic studies with a quotient of 2 or less is considered a boon to the defense. It keeps from the jury epidemiologic studies with weak associations which may otherwise be difficult to rebut. Yet the inferential leap implicit in this judicial rule – equating a measure of association with a quotient greater than two with the standard of proof by a preponderance of the evidence – carries with it some subtle assumptions, assumptions which the defense needs to occasionally challenge. A working assumption is that the study is internally valid–that is, the strength of the association is not due to sampling error, systematic bias or unaccounted-for confounders. Certainly the defense will argue that one or more of these problems is at play, and so the quotient cannot be taken at face value. Another subtle assumption is that a quotient greater than two is the sine qua non of causation. This rule, if the jury becomes aware of it, also tends to lend the imprimatur of the court to the validity of the epidemiologic study: “This study has met the benchmark of the court and merits your undivided attention as proof of causation.” The defense should let neither of these assumptions go unchallenged. Another nettlesome assumption is that a quotient of 2.1 is more than adequate proof of causation. Yet many epidemiologists believe that the quotient should be three or four or even greater in order to suggest that an association indicates causation. This is a chorus which the defense should join.
And although this judicial rule is overall a boon to the defense, it rests more on judicial policymaking than on logic. Simply, the standard of proof in the courtroom is not conceptually equivalent to a measure of association with a quotient greater than 2.169 Proponents of this judicial rule appear to equate the conception of probability inherent in the particular standard of proof known as “by a preponderance of the evidence” with the conception of probability which some extract from a measure of association with a quotient greater than two.
Neither conception appears to track with any formal conception of probability. First, plaintiff’s standard of proof is not semantically equivalent to any formal conception of probability. Jurors, when instructed on the concept of burden of proof, are not told to apply that concept in a way consistent with “subjective” (or Bayesian) probability. The jury instruction on the burden of proof is couched in a general way: for instance, “the preponderance of evidence is such evidence that, when weighed with that opposed to it, has more convincing force and is more probably true.” In that context, the jury considers plaintiff’s evidence relative to defendant’s evidence, not relative to a wider context of beliefs. The jury is instructed to consider only that information “in evidence.” No jury is instructed on Bayes’ formula170 or about the basic rules of formal probability, such as (1) that the probability of an event occurring, measured by some number between 0 and 1, equals 1 minus the probability of the event not occurring or (2) that the probability of several independent events all occurring is the product of their respective probabilities.
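For illustration only, those formal rules can be written out. The following Python sketch uses purely hypothetical numbers and is drawn from no jury instruction or case:

    # Rule (1): the complement rule, P(not A) = 1 - P(A)
    p_event = 0.3
    p_complement = 1 - p_event                    # 0.7

    # Rule (2): independent events multiply, P(A and B) = P(A) * P(B)
    p_a, p_b = 0.5, 0.4
    p_both = p_a * p_b                            # 0.2

    # Bayes' formula: P(H | E) = P(E | H) * P(H) / P(E)
    prior = 0.10         # prior probability of the hypothesis (hypothetical)
    likelihood = 0.80    # probability of the evidence if the hypothesis is true
    p_evidence = 0.25    # overall probability of the evidence (hypothetical)
    posterior = likelihood * prior / p_evidence   # 0.32

    print(p_complement, p_both, posterior)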
Second, a measure of association with a quotient greater than two does not equate, by itself, with any formal conception of probability. (It could, with additional data, become a variable in an assessment of subjective probability.) By itself, it merely means that of the total number of individuals with the effect subject to study, more than half (51%) have effects associated with the exposure and less than half (49%) have effects associated with other background risks. That quotient from this particular epidemiologic study is but one piece of evidence out of many pieces the jury will likely consider. The only relation the strength of an association bears to the jury’s task of determining whether the plaintiff’s evidence on general causation is more probably true than not is this: as one indicium among the many Bradford-Hill criteria, the greater the strength of association, the more likely that the association is not due to systematic bias or unaccounted-for confounders. But, as a renowned epidemiologist notes, that single indicium is neither necessary nor sufficient for causation:
“The fact that an association is weak does not rule out a causal connection. A commonly cited counterexample is the relation between cigarette smoking and cardiovascular disease: One explanation for this relation being weak is that cardiovascular disease is common, making any ratio measure of effect comparatively small compared with ratio measures for diseases that are less common. Nevertheless, cigarette smoking is not seriously doubted as a cause of cardiovascular disease. * * * A strong association serves only to rule out hypotheses that the association is entirely due to one weak unmeasured confounder or other source of modest bias.” 171
Despite these criticisms of the judicial rule, it is a rule the defense should vigorously continue to support. An epidemiologic study reporting a weak association can be unduly persuasive to a jury. Simply, a jury is apt to equate a finding of a positive association, no matter how weak or unreliable, with a finding of causation. Sadly, it is the lax kind of thinking to which juries are prone.172
Two further points bear noting. The standard for admitting proffered evidence is distinct from the standard of proof the jury applies in resolving factual issues. And most courts instruct the jury not to decide issues of fact on the basis of guesswork, conjecture or speculation.
IX. Endnotes
1. Liddell, H.G. and Scott, R. Greek-English Lexicon (Oxford, 1983). Some would say it is the study (“logos”) of what is among (“epi”) the people (“demos”).
2. Epidemiology is the study of the incidence, prevalence, distribution and etiology of states of health in the population. Lilienfeld, D.E. Definitions of Epidemiology. Am.J. of Epidemiology, 107: 87-90 (1978); Last, J. L. A Dictionary of Epidemiology (2d ed Oxford, 1988).
3. Haggerty v. Upjohn Co., 950 F. Supp. 1160 (S.D. Fla. 1996) (“epidemiological studies analyze the incidence, distribution and etiology of diseases in the human population, and are an important factor in determining the admissibility of an expert’s opinion on causation”).
4. Epidemiologic evidence is not armor plated; it has chinks that at times make it less than reliable or persuasive. The gold standard epidemiologic study is the prospective cohort study. Unfortunately, this kind of study is expensive and requires lengthy follow up. This is especially true if the exposure and the effect are both rare. Then the size of the cohort will have to be extremely large. And that means enormous and probably prohibitive expense. So, in lieu of the prospective cohort study, the results of case-control studies are apt to be proffered. But this kind of epidemiologic study is notoriously unreliable, and often has power too low to rule out false negative findings. To solve the problem of low power, an epidemiologist might undertake meta-analysis, but this kind of analysis is also unreliable, as demonstrated by the comparison of its results with those of the platinum standard of the controlled clinical trial.
5. Hoffman, R.E. The Use of Epidemiologic Data in the Courts. Am. J. of Epidemiology, 120: 190-202 (1984); Kubs v. United States, 537 F. Supp. 560 (E.D. Wis. 1982) (no epidemiologic studies established a relationship between the swine flu vaccine and polymyalgia rheumatica; plaintiffs failed to prove causation by a preponderance of the evidence); Sorenson v. Shaklee Corp., 31 F.3d 638, 643 n.8 (9th Cir. 1994) (“epidemiology is an accepted scientific discipline dealing with the integrated use of statistics and biological/medical science”); DeLuca v. Merrell Dow Pharm., Inc., 911 F2d 941, 954 (3d Cir. 1990), aff’d, 6 F.3d 778 (3d Cir. 1993), cert. denied, 510 U.S. 1044 (1994) (“the reliability of expert testimony founded on reasoning from epidemiological data is generally a fit subject for judicial notice; epidemiology is a well-established branch of science and medicine, and epidemiological evidence has been accepted in numerous cases”); Wilson v. Merrell Dow Pharmaceuticals, Inc., 893 F.2d 1149, 1154 (10th Cir. 1990) (epidemiologic evidence is the best evidence of general causation in mass toxic tort cases); Hall v. Baxter Healthcare Corp., 947 F.Supp. 1387 (D. Or. 1996) (epidemiology is the medical science devoted to determining the cause of disease in human beings; the existence or nonexistence of relevant epidemiology can be a significant factor in proving general causation in toxic tort cases); Kelly v. American Heyer-Schulte Corp., 957 F. Supp. 873 (W.D. Tex. 1997) (stating that “while epidemiological evidence is not a necessary element in every toxic tort case, it is certainly a very important element, especially when there is no evidence of the biological mechanism which links the product to the complained-of condition”); Hopkins v. Dow Corning Corp., Inc., 33 F3d 1116 (9th Cir. 1994) (proof of general causation may be based on animal studies and biophysical data absent a solid body of epidemiological data).
6. National Bank of Commerce v. Dow Chemical Co., 965 F. Supp. 1490 (E.D. Ark. 1996) (defining cohort and case-control studies); Szklo, M. Design and Conduct of Epidemiologic Studies. Preventive Medicine, 16: 142-149 (1987).
7. Abramson, J.H. Cross-sectional Studies in Detels, et al. (eds) Oxford Textbook of Public Health, chpt. 8 (3d ed. Oxford, 1997).
8. Feinleib, M. et. al. Cohort Studies in Holland, W.W. et. al. (eds) Oxford Textbook of Public Health, chapter 11 (2d ed. Oxford Univ. Press, 1991).
9. Ibrahim, MA and Spitzer, WO. The Case-Control Study: the Problem and the Prospect. J Chronic Dis, 32: 139-144 (1979); Cole, P. The Evolving Case-Control Study. J Chronic Dis, 32: 15-27 (1979); Lilienfeld, AM and Lilienfeld, DE. A Century of Case-Control Studies: Progress? J Chronic Dis, 32: 5-13 (1979); Feinstein, AR. Methodologic Problems and Standards in Case-Control Research. J Chronic Dis, 32: 35-41 (1979); Schlesselman JJ. Case-control Studies: Design, Conduct and Analysis. (Oxford University Press, 1982.); Breslow, N. Design and Analysis of Case-Control Studies. Ann Rev Public Health, 3: 29-54 (1982); Weinberg, CR and Wacholder, S. The Design and Analysis of Case-Control Studies with Biased Sampling. Biometrics, 46: 963-975 (1990); Austin, H. et. al. Limitations in the Application of Case-Control Methodology. Epidemiologic Reviews, 16: 65-76 (1994); Greenberg, R.S. and Ibrahim, M.A. The Case-Control Study in Holland, W.W. et. al. (eds) Oxford Textbook of Public Health, Chapter 9 (2d ed. Oxford Un. Press, 1991); the gold standard for determining the effect of an exposure when the exposure is potentially harmful is not the controlled clinical trial but the prospective cohort study. Yet when the effect or disease is rare, a cohort study, with adequate power to detect an appropriate relative risk, would require very sizable samples. The cost of such a study is apt to be prohibitively expensive and so impracticable. As a result, as a practical matter, no gold standard is apt to exist for case-control studies assessing rare exposures and rare potentially harmful effects.
10. Mayes, L.C. et. al. A Collection of 56 Topics with Contradictory Results in Case-Control Research. International J. of Epidemiology, 17: 680-685 (1988); Esdaile, J.M. & Horwitz, R.I. Observational Studies of Cause-Effect Relationships: An Analysis of Methodologic Problems As Illustrated By Conflicting Data for the Role of Oral Contraceptives in the Etiology of Rheumatoid Arthritis. J. Chronic Dis., 39: 841-852 (1986); Demissie, K. et al. Empirical Comparison of the Results of Randomized Controlled Trials and Case-Control Studies in Evaluating the Effectiveness of Screening Mammography. J. Clin. Epidemiol., 52: 81-91 (1998).
11. Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 895 (N.D. Iowa 1982), aff’d, 724 F.2d 613 (8th Cir. 1983) (when the epidemiologic experts use accepted statistical procedures and methods but their opinions differ, then the proper course is to limit the studies and leave the weight of the testimony to the jury; epidemiologic studies prepared by professional, disinterested public officials according to statistical research techniques accepted in the field of epidemiology fall within the public records hearsay exception); Lakie v. Smithkline Beecham, 965 F.Supp. 49 (D.D.C. 1997) (a crucial distinction exists between the admissibility of expert scientific testimony and the weight such testimony should be afforded by the trier-of-fact).
12. Sutera v. Perrier Group of America, Inc., 986 F Supp. 655 (D. Mass. 1997) (motion for summary judgment for defendant because, among other reasons, no epidemiologic evidence links exposure to low levels of benzene to acute myeloid leukemia).
13. Note, Confronting the New Challenges of Scientific Evidence. 108 Harvard Law Review 1481-1605 (1995). Given the same architecture, epidemiologic studies with results of a positive association are more persuasive to a jury than studies with results of no association. As a result, for the defense the preferred strategy is to find a reason to have those epidemiologic studies ruled inadmissible.
14. If a particular trial court is generally averse to excluding proffered evidence before trial, the defense may tactically prefer to wait until trial is underway before challenging the admissibility of proffered epidemiologic evidence under FRE 703. The timing of the challenge helps prevent plaintiff’s experts from tailoring their testimony to circumvent the defendants’ critique of that proffered evidence. That is, had defendants challenged this proffered evidence before trial and that challenge failed owing to the trial court’s judicial philosophy, then plaintiff’s experts would likely have had time, when it was their turn to testify, to adjust their opinions to conform to the evidentiary requirements of FRE 702.
15. Daubert v. Merrell Dow Pharm. Inc., 509 US 579, 113 S Ct 2786, 125 LE 2d 469 (1993).
16. Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 113 S Ct 2786, 125 L Ed 2d 469 (1993); just as an opinion based on a methodology that is inherently unreliable should be inadmissible, so should an opinion based on a methodology that, although generally accepted in the scientific community as reliable, is imperfectly executed. E.g., Daubert v. Merrell Dow Pharmaceuticals, Inc., 43 F3d 1311 (9th Cir. 1995).
17. E.g., McKendall v. Crown Control Corp., 122 F3d 803 (9th Cir. 1997).
18. See Tyus v. Urban Search Management, 102 F3d 256, 263 (7th Cir. 1996), quoting from Braun v. Lorillard, Inc., 84 F3d 230, 234 (7th Cir. 1996).
19. Searle, J. The Construction of Social Reality. p. 151 (Free Press, 1995).
20. Greenland, S. Concepts of Validity in Epidemiological Research in Holland, W.W. et al. (eds) Oxford Textbook of Public Health, chapter 14 (2d ed. Oxford Un.Press, 1991); Rose, G and Barker, DJP. Epidemiology for the Uninitiated: Repeatability and Validity. Br Med J, 2: 1070-1071 (1979); Lust v. Merrell Dow Pharmaceuticals, Inc., 89 F3d 594 (9th Cir. 1996) (inadmissible under FRE 702 was testimony of epidemiologist who proposed to testify that clomid causes hemifacial microsomia based on his published epidemiological research which was prepared in anticipation of trial but which was not peer-reviewed).
21. Mayes, L.C. et. al. A Collection of 56 Topics with Contradictory Results in Case-Control Research. International J. of Epidemiology, 17: 680-685 (1988). Ideally, from the perspective of the defense, no epidemiologist should be allowed to offer an opinion that the exposure is associated with the effect based on the results of a single case-control study. Because case-control studies tend to be unreliable—susceptible to bias and confounders—an epidemiologist should be allowed to offer such an opinion only on the basis of a series or program of case-control studies conducted by different epidemiologists operating independently, when the results of those studies converge to the conclusion that the exposure is positively associated with the effect.
22. Wittgenstein, L. On Certainty 243 (Harper Torchbooks, 1969). For instance, the case-control study, as a methodology, is accounted to have a significant rate of error. Ordinarily quantifying the rate of error requires a gold standard against which to compare the results of the method being assessed. In clinical medicine, the gold standard is the controlled clinical trial. So to quantify the rate of error of case-control studies, an investigator would need to compare the results of a controlled clinical trial with the results of a series of case-control studies on the same topic.
23. Malcolm, N. The Groundlessness of Belief in Malcolm, N. Thought and Knowledge (Cornell, 1977).
24. Id.
25. General Electric Company v. Joiner, 522 US 136, 118 S Ct 152, 139 L Ed 2d 508 (1997).
26. Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 US 579, 113 S Ct 2786, 125 LE2d 469 (1993).
27. Black, B, et. al. Guide to Epidemiology in Black, B & Lee, P.W. (editors) Expert Evidence: A Practitioners’ Guide to Law, Science, and the FJC Manual (West, 1997).
28. Id. at 112.
29. Padgett v. United States, 553 F. Supp. 794 (W.D. Tex 1982) (an economist could not testify about epidemiology).
30. Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 US 579, 113 S Ct. 2786, 125 LEd 2d 469 (1993); Daubert v. Merrell Dow Pharmaceuticals, Inc., 43 F3d 1311 (9th Cir. 1995).
31. Walls v. Armour Pharmaceutical Co., 832 F. Supp. 1467 (M.D. Fla. 1993) (plaintiff’s expert was a medical doctor with a specialty in infectious diseases who testified about the significance of results of epidemiologic studies; defendant’s expert was a professor of statistics who testified about the significance of the results of epidemiologic studies).
32. Sutera v. Perrier Group of America, Inc., 986 F. Supp. 655 at 667 (D. Mass 1997) (an oncologist and hematologist with no expertise in epidemiology or biostatistics and with no familiarity with the epidemiologic studies undermining his opinion of causation was ruled not qualified to opine on causation); In re Agent Orange Prod. Liab. Litig., 611 F. Supp. 1223 (E.D.N.Y. 1985), aff’d, 818 F.2d 187 (2d Cir. 1987) (a medical doctor should be allowed to testify on toxic tort causation only if he can demonstrate knowledge of epidemiology; an expert’s failure to consider the most relevant epidemiologic studies and other possible causes of disease resulted in their proffered testimony being ruled inadmissible); Wells ex rel. Maihafer v. Ortho Pharmaceutical Corp., 615 F. Supp. 262 (N.D. Ga. 1985), aff’d in part and modified, 788 F2d 741 (11th Cir. 1986), cert. denied, 479 US 950 (1986) (plaintiff’s expert replied: “I am sorry sir, I am not a statistician… I don’t understand confidence levels. I never use them. I have to use the author’s conclusions;” “it does not matter in terms of deciding the case that the medical community might require more research and evidence before conclusively resolving the question; what matters is that [the] fact finder found sufficient evidence of causation in a legal sense in this particular case”).
33. Last, J.M. A Dictionary of Epidemiology p. 46 (2d ed. Oxford, 1988); Rothman, K.J. and Greenland, S. Modern Epidemiology, p. 48 (2d ed. Lippincott-Raven, 1998) (“exposure can refer to a behavior (e.g., needle sharing), a treatment (e.g., genotype), or an exposure in the ordinary sense (e.g., injection of contaminated blood)”).
34. Consider, for instance, measurement of exposure to electromagnetic fields. What problems arise if all the following modes are used: (1) wire configuration codes; (2) spot or 24-hour measurements of the fields; (3) self reports of use of electrical appliances? Of course, the result would be chaos.
35. Kronmal, R.A. et. al. The Intrauterine Device and Pelvic Inflammatory Disease: The Women’s Health Study Reanalyzed. J. Clin. Epidem., 44: 109-122 (1991).
36. The concept of “analysis of the data” is discussed infra at section VI.
37. Correa, A. et. al. Exposure Measurement in Case-Control Studies: Reported Methods and Recommendations. Epidemiologic Reviews, 16: 18-31 (1994); White, E et. al. Exposure Measurement in Cohort Studies: The Challenges of Prospective Data Collection. Epidemiologic Reviews, 20: 43-56 (1998).
38. MacMahon, B. and Trichopoulos, D. Epidemiology: Principles and Methods, pps. 179-180 (2d ed. Little, Brown & Co. 1996).
39. Gordis, L. Assuring the Quality of Questionnaire Data in Epidemiologic Research. Am. J. of Epidemiology, 109: 21-24 (1979); Olsen, J. Epidemiology Deserves Better Questionnaires. Int’l J. of Epidemiology, 27:935 (1998).
40. Hill, A.B. Observation and Experiment. NEJM, 248: 995-1001 (1953).
41. Colditz, G.A. et. al. Validation of Questionnaire Information on Risk Factors and Disease Outcomes in a Prospective Cohort Study of Women. Am. J. of Epidemiology, 123: 894-900 (1986).
42. Hulka, B.S. et al. Biological Markers in Epidemiology. (Oxford U Press, 1990); Hulka, B.S. & Margolin, B.H., Methodological Issues in Epidemiologic Studies Using Biologic Markers. Am. J. of Epidemiology, 135: 200-298 (1992); Schulte, P.A. A Conceptual Framework for the Validation and Use of Biologic Markers. Environ. Res., 48: 129-144 (1989).
43. Duncan, B.B. and Heiss, G. Nonenzymatic Glycosylation of Proteins – A New Tool for Assessment of Cumulative Hyperglycemia in Epidemiologic Studies, Past and Future. Am. J. of Epidemiology, 120: 169-189 (1984).
44. Correa, A. et. al. Exposure Measurement in Case-Control Studies: Reported Methods and Recommendations. Epidemiologic Reviews, 16: 18-31 (1994); McMichael, AJ. Molecular Epidemiology: New Pathway or New Travelling Companion? Am. J. Epidemiology, 140: 1-11 (1994); Perera, F.P. & Weinstein, I.B. Molecular Epidemiology and Carcinogen-DNA Adduct Detection: New Approaches to Studies of Human Cancer Causation. J. Chronic Disease, 35: 581-600 (1982).
45. Flegal KM, et. al. Differential Misclassification Arising from Nondifferential Errors in Exposure Measurement. Am J Epidemiol, 134: 1233-1244 (1991).
46. Poole, C. Exposure Opportunity in Case-Control Studies. Am J Epidemiol, 123: 352-358 (1986).
47. Explaining what is an “exposure” and what is an “effect” is an exercise in identification through observation and then through classification, or through classification and then through observation; it is a process often presenting the proverbial problem of which came first, the chicken or the egg? Whatever comes first, it is a process that entails a potential for error. The magnitude of error can often be reduced through consensus about what is the exposure or the effect. Hyams, K.C. Developing Case Definitions for Symptom-based Conditions: the Problems of Specificity Epidem. Reviews, 20: 148-156 (1998); Rempel, D. et. al. Consensus Criteria for the Classification of Carpal Tunnel Syndrome in Epidemiologic Studies. Am. J. of Public Health, 88: 1447-1451 (1998); Westbrook, J.I. et. al. Agreement between Medical Record Data and Patient’s Accounts of Their Medical History and Treatment for Dyspepsia. J. Clin. Epidemiology, 51: 237-244 (1998).
48. Wittgenstein, L. Philosophical Investigations ¶ 279 (trans. GEM Anscombe, MacMillan, 1953).
49. Gallie, W.B. Essentially Contested Concepts. Proceedings of the Aristotelian Society, LVI: 167-199 (1955-56); to the dissatisfaction of the defense, clinicians, as they will explain, do not strictly adhere to the classification criteria for a disorder. Some patients who fail to satisfy the criteria will be diagnosed with the disorder because, in the clinician’s judgment, their clinical profile seems most consistent with the disorder. Some patients who satisfy the criteria may not be diagnosed with the disorder because, in the clinician’s judgment, their clinical profile better fits the criteria of another disorder.
50. In re Swine Flu Immunization Prod. Liab. Litig., 508 F.Supp. 897, 903 (D. Colo. 1981), aff’d sub nom. Lima v. United States, 708 F.2d 502 (10th Cir. 1983) (the court critically evaluated a study relied on by an expert whose testimony was stricken because in that study, determination of whether a patient had Guillain-Barré syndrome was made by medical clerks, not physicians who were familiar with diagnostic criteria).
51. Sackett, D.L. et. al. Clinical Epidemiology (2d edition Little, Brown, 1991).
52. Id.
53. Wacholder, S, et. al. Validation Studies Using an Alloyed Gold Standard. Am J Epidemiol, 137: 1251-1258 (1993); Brenner, H. and Savitz, DA. The Effects of Sensitivity and Specificity of Case Selection on Validity, Sample Size, Precision, and Power in Hospital-Based Case-Control Studies. Am J Epidemiol, 132: 181-192 (1990).
54. Kendell, R.E. Clinical Validity. Psychological Medicine, 19: 45-55 (1989).
55. Bloch, D.A. et. al. Statistical Approaches to Classification. Arthritis and Rheumatism, 33: 1137-1144 (1990); Fries, JF et. al. Criteria for Rheumatic Disease. Arthritis and Rheumatism, 37: 454-462 (1994); Altman, R.D. et. al. An Approach to Developing Criteria for the Clinical Diagnosis and Classification of Osteoarthritis; A Status Report of the American Rheumatism Association Diagnostic Subcommittee on Osteoarthritis. The J. of Rheumatology, 10: 180-183 (1983).
56. Rothman, K. J. Induction and Latent Periods. Am J Epidemiol, 14: 253-259 (1981); Rothman, K. J. and Greenland, S. Modern Epidemiology (2d ed. Lippincott-Raven, 1998).
57. Brenner, H and Gefeller, O. Use of Positive Predictive Value to Correct for Disease Misclassification in Epidemiologic Studies. Am J Epidemiol, 138: 1007-1015 (1993).
58. Harvey, M. et. al. Toxic Shock and Tampons: Evaluation of the Epidemiologic Evidence. JAMA, 248: 843 (1982).
59. Gordis, L. Epidemiology, pps. 32-34 (Saunders, 1996); Last, J.M. A Dictionary of Epidemiology, p. 103 (2d ed. Oxford, 1988).
60. MacMahon, B. and Trichopoulos, D. Epidemiology: Principles and Methods, Chpt 4 (2d ed. Little, Brown & Co., 1996).
61. MacMahon, B. and Trichopoulos, D. Epidemiology: Principles and Methods, Chpt 4 (2d ed. Little, Brown & Co., 1996).
62. Flanders, W.D. and O’Brien, T. R. Inappropriate Comparisons of Incidence and Prevalence in Epidemiologic Research. Am. J. Public Health, 79: 1301-1303 (1989); Freeman, J and Hutchison, GB. Prevalence, Incidence and Duration. Am J Epidemiol, 112:707-723 (1980). MacMahon, B and Trichopoulos, D. Epidemiology: Principles and Methods, Chapter 4, (2d ed. Little, Brown & Co., 1996); Feinstein, A.R. and Esdaile, J.M. Incidence, Prevalence, and Evidence. The Am. J. of Medicine, 82: 113-123 (1987); Morgenstern, H. et. al. Measures of Disease Incidence Used in Epidemiologic Research. International J. of Epidemiology, 9: 97-104 (1980); Tapia Granados, J.A. On the Terminology and Dimensions of Incidence. J. Clin. Epidemiology, 50: 891-897 (1997); Elandt-Johnson, RC. Definition of Rates: Some Remarks on Their Use and Misuse. Am J Epidemiol, 102: 267-271 (1975).
63. Burkman, R.T. and The Women’s Health Study. Association Between Intrauterine Device and Pelvic Inflammatory Disease. Obstetrics & Gynecology, 57: 269-276 (1981).
64. Wade-Greaux v. Whitehall Laboratories, Inc., 874 F.Supp. 1441, 1485 (D. Virgin Isl. 1994) (positive epidemiologic findings are, standing alone, insufficient to permit a conclusion that a particular agent is teratogenic, the court discusses relative risk, odds ratio, case-control studies, cohort studies, confounders, and statistical evaluations); Gaul v. United States, 582 F. Supp. 1122, 1125 n. 9 (D. Del. 1984); Marshall RJ. Validation Study Methods for Estimating Exposure Proportions and Odds Ratios with Misclassified Data. J Clin Epidemiol, 43: 941-947 (1990); Godley, P. & Schell, M.J. Adjusted Odds Ratios Under Nondifferential Misclassification: Application to Prostate Cancer. J. Clin. Epidemiol., 52: 129-136 (1999); Tarone, RE. On Summary Estimators of Relative Risk. J Chronic Dis, 34: 463-468 (1981); Wallenstein, S. and Bodian, C. Inferences on Odds Ratios, Relative Risks, and Risk Differences Based on Standard Regression Programs. Am J Epidemiol, 126: 346-355 (1987); Greenland, S. and Engelman, L. Re: “Inferences on Odds Ratios, Relative Risks, and Risk Differences Based on Standard Regression Programs.” Am J Epidemiol, 128: 145 (1988); Chêne, G. and Thompson, SG. Methods for Summarizing the Risk Associations of Quantitative Variables in a Consistent Form. Am J Epidemiol, 144: 610-621 (1996).
65. Zhang, J. and Yu, K.F. What’s the Relative Risk? JAMA, 280: 1690-1691 (1998).
66. Breslow, NE. Odds Ratio Estimators When the Data are Sparse. Biometrika, 68: 73-84 (1981).
67. Silman, A.J. Epidemiological Studies: A Practical Guide (Cambridge, 1995); Martin, DO and Austin, H. Exact Estimates for a Rate Ratio. Epidemiology, 7: 29-33 (1996).
68. Last, J.M. A Dictionary of Epidemiology (2d ed. Oxford, 1988)
69. Gordis, L. Epidemiology, pps. 148-149 (Saunders, 1998).
70. Greenland, S. Interpretation and Choice of Effect Measures in Epidemiologic Analyses. Am J Epidemiol, 125: 761-768 (1987); Newman, SC. Odds Ratio Estimation in a Steady-State Population. J Clin Epidemiol, 41: 59-65 (1988); Schouten, EG, et. al. Risk Ratio and Rate Ratio Estimation in Case-Cohort Designs. Stat Med, 12: 1733-1745 (1993); Siegel, DG and Greenhouse, SW. Validity in Estimating Relative Risk in Case-Control Studies. J Chronic Dis, 26: 219-225 (1973).
71. Greenland, S. Thomas, DC, and Morgenstern, H. The Rare-Disease Assumption Revisited. A Critique of “Estimators of Relative Risk for Case-Control Studies.” Am J Epidemiol, 124: 869-883 (1986); Greenland, S. and Thomas, DC. On the Need for The Rare Disease Assumption in Case-Control Studies. Am J Epidemiol, 116: 547-553 (1982).
72. Johnston v. United States, 597 F.Supp. 374 (D. Kan. 1984) (attributable risk calculations [or probability of causation calculations]: “while this is a proper mathematical formula for calculating the probability of events which have happened, and if well founded, * * * may be of some interest as regards the risk assessments relating to any exposure, its results are only as valid as the assumptions which go into it”); Whiting v. Boston Edison Co., 891 F. Supp. 12 (D. Mass 1995) (“excess risk is calculated by epidemiologists in the form of a ratio derived by dividing the number of cases of a disease observed within a defined group by the number of cases expected in the general population, epidemiologists generally agree that excess risks of less than 50% are difficult to interpret causally”); Whittemore, AS. Statistical Methods for Estimating Attributable Risk from Retrospective Data. Stat Med, 1: 229-243 (1982); Coughlin, SS, Benichou, J. and Weed, DL. Attributable Risk Estimation in Case-Control Studies. Epidemiol Rev, 16: 51-64 (1994); Cole, P. and MacMahon, B. Attributable Risk Percent in Case-Control Studies. Br J Prev Soc Med, 25: 242-244 (1971); Greenland, S. and Robins, J. Conceptual Problems in the Definition and Interpretation of Attributable Fractions. Am J Epidemiol, 128: 1185-1197 (1988); Walter, SD. The Estimation and Interpretation of Attributable Risk in Health Research. Biometrics, 32: 829-849 (1976); Whittemore, AS. Estimating Attributable Risk from Case-Control Studies. Am J Epidemiol, 117: 76-85 (1983).
73. Rothman, K.J. and Greenland, S. Modern Epidemiology (2d ed. Lippincott-Raven, 1998).
74. Id. at 24.
75. E.g., Daubert v. Merrell Dow Pharmaceuticals, Inc., 43 F3d 1311 (9th Cir. 1995).
76. Angell, M. The Interpretation of Epidemiologic Studies. (Editorial) New England J. of Medicine, 323: 823-825 (1990); Taubes, G. Epidemiology Faces its Limits. Science, 269: 164-169 (1995); Greenberg, R.S. and Ibrahim, M.A. The Case-Control Study in Holland, W.W. et al. (eds) Oxford Textbook of Public Health, chapter 9 at p. 130 (2d ed. Oxford Un. Press, 1991).
77. Schlesselman, J.J. Case-Control Studies: Design, Conduct and Analysis. (Oxford Un. Press, 1982); Gordis, L. Epidemiology (Saunders, 1996).
78. Grassis v. Johns-Manville Corp., 591 A 2d 671, 675 (N.J. Super. Ct. App. Div. 1991); Merrell Dow Pharmaceuticals, Inc. v. Havner, 953 SW 2d 706 (Tex. 1997); Wickramaratne, P.J. and Holford, T. R. Confounding in Epidemiologic Studies: The Adequacy of the Control Group as a Measure of Confounding. Biometrics, 43: 751-765 (1987); Rothman, K.J. and Greenland, S. Modern Epidemiology (2d ed. Lippincott-Raven, 1998); Miettinen, OS and Cook, EF. Confounding: Essence and Detection. Am J Epidemiol, 114: 593-603 (1981).
79. Rothman, K.J. and Greenland, S. Modern Epidemiology, p. 122 (Lippincott-Raven, 1998).
80. Greenland, S. and Morgenstern, H. Ecological Bias, Confounding, and Effect Modification. Int J Epidemiol, 18:269-274 (1989); Thompson, WD. Effect Modification and the Limits of Biological Inference from Epidemiologic Data. J Clin Epidemiol, 44: 221-232 (1991).
81. Koopman, JS. Causal Models and Sources of Interaction. Am J Epidemiol, 106: 439-444 (1977); Kupper, LL and Hogan, MD. Interaction in Epidemiologic Studies. Am J Epidemiol, 106: 447-453 (1978); Rothman KJ, et. al. Concepts of Interaction. Am J Epidemiol 112: 467-470 (1980); Greenland, S. Elementary Models for Biological Interaction. J Hazardous Materials, 10: 449-454 (1985).
82. In Re TMI Litigation Cases Consol. II, 922 F.Supp. 1038 (M.D. Pa. 1996) (an expert’s failure to include a discussion of the design of an epidemiologic study and particularly the selection criteria for the subjects studied creates an enormous potential rate of error and results in the proffered evidence being inadmissible under FRE 702); Valentine v. Pioneer Chlor Alkali Co., 921 F. Supp. 666 (D. Nev. 1996) (association is defined; relative risk is defined, and attributable proportion of relative risks is defined, cohort study defined, case-control study defined, epidemiologic study ruled inadmissible because author failed to control for important bias and confounders); Padgett v. United States, 553 F.Supp. 794 (W.D. Tex. 1982) (in evaluating causation, epidemiologists must exclude three alternative explanations: chance, confounding and bias); Sackett, DL. Bias in Analytic Research. J Chronic Dis, 32: 51-63 (1979); Feinleib, M. Biases and Weak Associations. Preventive Medicine, 16: 150-164 (1987); Kopec, J.A. and Esdaile, J.M. Bias in Case-Control Studies: A Review. Journal of Epidemiology and Community Health, 44: 179-186 (1990).
83. In re “Agent Orange” Prod. Liab. Litig., 597 F. Supp. 740, 783 (E.D.N.Y. 1985), aff’d, 818 F.2d 145 (2d Cir. 1987) (the court expressed concern about selection bias); Gaul v. United States, 582 F.Supp. 1122, 1125 n. 9 (D. Del. 1984) (epidemiologist analyzes the epidemiologic data and concludes that an association is due to selection bias; “relative risk describes the relationship between the risk of an occurrence, such as contracting a disease in a population exposed to a certain stimulus, and the risk of occurrence in a population not exposed to the stimulus, it is the ratio of the former risk to the latter”); Austin, H, Flanders, WD, and Rothman, KJ. Bias Arising in Case-Control Studies from Selection of Controls from Overlapping Groups. Int J Epidemiol, 18: 713-716 (1989); Lubin, JH and Gail, MH. Biased Selection of Controls for Case-Control Analyses of Cohort Studies. Biometrics, 40: 63-75 (1984); Robins, JM and Pike, M. The Validity of Case-Control Studies with Non-Random Selection of Controls. Epidemiology, 1: 273-284 (1990); Wacholder, S, et. al. Blind Assignment of Exposure Does Not Always Prevent Differential Misclassification. Am J Epidemiol, 134: 433-437 (1991); Wacholder, S, et. al. Selection of Controls in Case-Control Studies, I: Principles. Am J Epidemiol, 135: 1019-1028 (1992); Wacholder, S, et. al. Selection of Controls in Case-Control Studies, II: Types of Controls. Am J Epidemiol, 135: 1029-1041 (1992); Wacholder, S, et. al. Selection of Controls in Case-Control Studies, III: Design Options. Am J Epidemiol, 135: 1042-1050 (1992); Flanders, WD and Austin, H. Possibility of Selection Bias in Matched Case-Control Studies Using Friend Controls. Am J Epidemiol, 124: 150-153 (1986); Lasky, T and Stolley, PD. Selection of Cases and Controls. Epidemiol Rev, 16: 6-17 (1994); Lubin, JH and Hartge, P. Excluding Controls: Misapplications in Case-Control Studies. Am J Epidemiol, 120: 791-793 (1984); Miettinen, OS. The “Case-Control” Study: Valid Selection of Subjects. J Chronic Dis, 38: 543-548 (1985); Paltiel, O. et al. Two-Way Referral Bias: Evidence from a Clinical Audit of Lymphoma in a Teaching Hospital. J. Clin. Epidemiol., 51: 93-98 (1998).
84. Walter, SD. Berkson’s Bias and its Control in Epidemiologic Studies. J Chronic Dis, 33: 721-725 (1980).
85. Neyman, J. Statistics—Servant of all Sciences. Science, 122: 401 (1955); Sackett, D.L. Bias in Analytic Research. J. Chron. Dis., 32: 51-63 (1979).
86. MacMahon, B. and Trichopoulos, D. Epidemiology: Principles & Methods, p. 190-192 (Little, Brown & Co., 1996).
87. Harvey, M. et. al. Toxic Shock and Tampons: Evaluation of Epidemiologic Evidence. JAMA, 248: 840-846 (1982)
88. Rothman, K.J. and Greenland, S. Modern Epidemiology, p. 120 (Lippincott-Raven, 1998).
89. Criqui, M.H. Response Bias and Risk Ratios in Epidemiologic Studies. Am J. Epidemiol, 109: 394-399 (1979); Greenland, S. Response and Follow-up Bias in Cohort Studies. Am J. Epidemiol., 106: 184-187 (1977).
90. Wynder, E. L. Investigator Bias and Interviewer Bias: The Problem of Reporting Systematic Error in Epidemiology. J. Clin. Epidemiology, 47: 825-827 (1994).
91. Wacholder, S. et. al. Blind Assignment of Exposure Does Not Always Prevent Differential Misclassification. Am J of Epidemiol, 134: 433-437 (1991); Barron, BA. The Effects of Misclassification on the Estimation of Relative Risk. Biometrics, 33: 414-418 (1977); Copeland, KT, et. al. Bias Due to Misclassification in the Estimate of Relative Risk. Am J Epidemiol, 105: 488-495 (1977); Dosemeci M, et. al. Does Nondifferential Misclassification of Exposure Always Bias a True Effect toward the Null Value? Am J Epidemiol, 132: 746-749 (1990); Drews, CD and Greenland, S. The Impact of Differential Recall on the Results of Case-Control Studies. Int J Epidemiol, 19: 1107-1112 (1990); Gullen, WH, et. al. Effects of Misclassification in Epidemiologic Studies. Public Health Rep, 53: 1956-1965 (1968).
92. Rothman, K.J. and Greenland, S. Modern Epidemiology, p. 127 (Lippincott-Raven, 1998).
93. Rose, G and Barker, DJP. Epidemiology for the Uninitiated. Observer and Variation. Br Med J, 2 : 1006-1007 (1978).
94. Greenland, S and Neutra, R. An Analysis of Detection Bias and Proposed Corrections in the Study of Estrogens and Endometrial Cancer. J Chronic Dis, 34: 433-438 (1981).
95. Brock v. Merrell Dow Pharmaceuticals, Inc., 874 F.2d 307, 311-12 (5th Cir. 1989), cert. denied, 494 U.S. 1046 (1990) (discussion of recall bias among women who bear children with birth defects); Coughlin, S.S. Recall Bias in Epidemiologic Studies. J. Clin. Epidemiology, 43: 87-91 (1990); Drews, CD, Kraus, JF and Greenland, S. Recall Bias in a Case-Control Study of Sudden Infant Death Syndrome. Int J Epidemiol, 19: 405-411 (1990).
96. Swan, S. et. al. Reporting and Selection Bias in Case-Control Studies of Congenital Malformations. Epidemiology, 3: 356-363 (1992).
97. Greenland, S. Response and Follow-Up Bias in Cohort Studies. Am J Epidemiol, 106: 184-187 (1977).
98. Kelly v. American Heyer-Schulte Corp., 957 F.Supp. 873 (W.D. Tex. 1997) (holding that it is contrary to scientific method to rely upon an epidemiologic study with a weak association and a low level of statistical significance and whose results are apt to be influenced by confounders); Stellman, S.D. Confounding. Preventive Medicine, 16: 165-182 (1987); Schlesselman, JJ. Assessing Effects of Confounding Variables. Am J Epidemiol, 108: 3-8 (1978); Greenland, S and Robins, JM. Confounding and Misclassification. Am J Epidemiol, 122: 495-506 (1985); Savitz, DA and Baron, AE. Estimating and Correcting for Confounder Misclassification. Am J Epidemiol, 129: 1062-1071 (1989); Smith, PG and Day, NE. The Design of Case-Control Studies: the Influence of Confounding and Interaction Effects. Int J Epidemiol, 13: 356-365 (1984); Yanagawa, T. Case-Control Studies: Assessing the Effect of a Confounding Factor. Biometrika, 71: 191-194 (1984).
99. Hutchison, GB. and Rothman, KJ. Correcting a Bias? N Engl J Med, 299: 1129-1130 (1978); Rothman, K.J. and Greenland, S. Modern Epidemiology (2d ed. Lippincott-Raven 1998); MacMahon, B and Trichopoulos, D. Epidemiology: Principles and Methods (2d ed. Little, Brown & Co. 1996).
100. Weinberg, CR and Sandler, DP. Randomized Recruitment in Case-Control Studies. Am J Epidemiol, 134: 421-432 (1991).
101. Rothman, K.J. and Greenland, S. Modern Epidemiology. pp. 143-144 (Lippincott-Raven, 1998).
102. Greenland, S and Kleinbaum, DG. Correcting for Misclassification in Two-Way Tables and Matched-Pair Studies. Int J Epidemiol, 12:93-97 (1983); Miettinen OS. Matching and Design Efficiency in Retrospective Studies. Am J Epidemiol, 91: 111-118 (1970); Karon, J.M. and Kupper, L.L. In Defense of Matching. Am J Epidemiol, 116: 852-866 (1982); Greenland, S. The Effect of Misclassification in Matched-Pair Case-Control Studies. Am J Epidemiol, 116: 402-406 (1982).
103. Brookmeyer, R, Liang, KY and Linet, M. Matched Case-Control Designs and Overmatched Analyses. Am J Epidemiol, 124: 693-701 (1986).
104. Rothman, K.J. and Greenland, S. Modern Epidemiology, p. 201 (Lippincott-Raven, 1998).
105. Pike, MC et. al. Bias and Efficiency in Logistic Analyses of Stratified Case-Control Studies. Int J Epidemiol, 9: 89-95 (1980).
106. Cochran, WG. The Effectiveness of Adjustment by Subclassification in Removing Bias in Observation Studies. Biometrics, 24: 295-313 (1968); Day, NE, Byar, DP and Green, SB. Overadjustment in Case-Control Studies. Am J Epidemiol, 112: 696-706 (1980).
107. Kahn, H.A. and Sempos, C.T. Statistical Methods in Epidemiology, pp. 87-105 (Oxford, 1989).
108. Greenberg, R.S. and Ibrahim, M.A. The Case-Control Study in Holland, W.W. et. al. (eds) Oxford Textbook of Public Health (2d edition Oxford, 1991).
109. Greenland, S. Limitations of the Logistic Analysis of Epidemiologic Data. J. of Epidemiol, 110: 693-698 (1979); Rosner, B. et. al. Correction of Logistic Regression Relative Risk Estimates and Confidence Intervals for Measurement Error: The Case of Multiple Covariates Measured with Error. Am. J. Epidemiol, 132: 734-745 (1990).
110. Sempos, C.T. et. al. The Influence of Cigarette Smoking on the Association Between Body Weight and Mortality. The Framingham Heart Study Revisited. Am J Epidemiol, 8: 289-300 (1998); Flegal, K.M. Deja Vu All Over Again: The Re-Analysis of Epidemiologic Data. (Editorial) Am J Epidemiol, 8: 286-288 (1998).
111. Cohen, A.J. Replication. Epidemiology, 8: 341-343 (1997).
112. Re-analysis of epidemiologic data often occurs in anticipation of or during litigation. It usually is neither published nor peer-reviewed. As a result, it is likely to be ruled inadmissible under rules such as FRE 702 and under holdings such as Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 US 579, 113 S Ct. 2786, 125 LEd 2d 469 (1993); but see, Daubert v. Merrell Dow Pharmaceuticals, Inc., 43 F3d 1311, 1320 (9th Cir. 1995); Lynch v. Merrell-National Laboratories, 646 F.Supp. 856 (D. Mass 1986) (plaintiffs’ epidemiologic evidence was a re-analysis of epidemiologic studies; the court could not accept result-oriented re-analysis of epidemiological studies and criticisms of others’ methodology as reliable data upon which to base an opinion on causation; plaintiffs cannot rely on criticisms of the defendant’s studies to establish causation); Lynch v. Merrell-National Laboratories, 830 F2d 1190 (1st Cir. 1987) (a reanalysis of an epidemiologic study by plaintiffs’ epidemiologist was legally insufficient owing to selection bias and its failure to be published and subjected to peer review).
113. DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F2d 941, 955 (3d Cir 1990).
114. Howson, C. Theories of Probability. Brit. J. Phil. Sci, 46 : 1-32 (1995).
115. Von Mises, R. Probability, Statistics and Truth (George Allen & Unwin, 1939); Von Mises, R. Mathematical Theory of Probability and Statistics (Academic Press, 1964).
116. Bayes, T. “An Essay Towards Solving a Problem in the Doctrine of Chances. Philosophical Transactions of the Royal Society, 53: 370-418 (1763); deFinetti, B. Theory of Probability (Wiley, 1974); Ramsey, F.P. Truth and Probability in D.H. Mellor (ed.) Philosophical Papers (Cambridge Univ. Press, 1990); Howson, C. and Urbach P. Scientific Reasoning: the Bayesian Approach (2nd ed. Open Court, 1993).
117. Fisher, L.D. and Van Bell, G. Biostatistics, p. 108 (Wiley, 1993); Rothman, K.J. and Greenland, S. Modern Epidemiology, pps. 186-187 (Lippincott-Raven, 1998); Glantz, S. Primer of Biostatistics, pps. 105, 160-161 (3d edition McGraw Hill, 1992).
118. Merrell Dow Pharmaceuticals, Inc. v. Havner, 953 SW 2d 706 (Tex. 1997) (significance testing requires use of an alpha of .05).
119. Fisher, L.D. and Van Belle, G. Biostatistics, p. 108 (Wiley, 1993); Rothman, K.J. and Greenland, S. Modern Epidemiology, pps. 186-187 (Lippincott-Raven, 1998); Glantz, S. Primer of Biostatistics, pps. 160-161 (3d edition McGraw Hill, 1992).
120. Glantz, S.A. Primer of Biostatistics. pps. 161-165 (3d ed. McGraw Hill, 1992); Freiman, J.A. et. al. The Importance of Beta, The Type II Error, and Sample Size in the Design and Interpretation of the Randomized Controlled Trial in Bailar, J.C. and Mostellar, F. (eds) Medical Uses of Statistics. Chapter 19 (2d ed. NEJM Books, 1992).
121. Kelly v. American Heyer-Schulte Corp., 957 F. Supp. 873 (W.D. Tex. 1997) (holding that it is contrary to scientific method to rely upon an epidemiologic study with a weak association and a low level of statistical significance and whose results are apt to be influenced by confounders); In re TMI Litigation Cases Consol. II, 922 F.Supp. 997, 1016 (M.D. Pa. 1996) (significance testing in nonexperimental settings is a matter that goes to the weight of the evidence); Christophersen v. Allied-Signal Corp., 902 F2d 362 (1990), rev’d, cert. denied, 503 US 912 (1992) (plaintiffs need not have statistically significant studies to establish causation); Thompson, WD. Statistical Criteria in the Interpretation of Epidemiologic Data. Am J Public Health, 77: 191-194 (1987); Cox, DR. Regression Models and Life Tables (with discussions). J R Stat Soc B, 34: 187-220 (1972); Clayton, D. and Hills, M. Statistical Models in Epidemiology. (Oxford University Press, 1993); Breslow, NE. and Day, NE. Statistical Methods in Cancer Research. Vol II: The Design and Analysis of Cohort Studies. (IARC, 1987).
122. Mann, P.S. Introductory Statistics, pps. 432-454 (2d ed. Wiley, 1995).
123. Mantel, N. and Haenszel, WH. Statistical Aspects of the Analysis of Data from Retrospective Studies of Disease. J Nat’l Cancer Inst, 22: 719-748 (1959); Mantel, N. and Fleiss, JL. Minimum Expected Cell Size Requirements for the Mantel-Haenszel One-Degree-of-Freedom Test and a Related Rapid Procedure. Am J Epidemiol, 112: 129-134 (1980); Mantel, N. Chi-Square Tests with One Degree of Freedom: Extensions of the Mantel-Haenszel Procedure. J Am Stat Assoc, 58: 690-700 (1963).
124. Fleiss, JL. Statistical Methods for Rates and Proportions. (Wiley, 1973.); Yates, F. Contingency Tables involving Small Numbers and the Chi-Square Test. J R Stat Soc Suppl, 1: 217-235 (1934).
125. Fleiss, J.L. Significance Tests Have a Role in Epidemiologic Research: Reactions to A.M. Walker. Am. J. Public Health, 76: 559-560 (1986).
126. Ware, JH, et. al. P Values. In: Medical Uses of Statistics (NEJM Books, 1986).
127. Goodman, SN. P Values, Hypothesis Tests, and Likelihood: Implications for Epidemiology of a Neglected Historical Debate. Am J Epidemiol, 137: 485-496 (1993); Gardner, MJ. and Altman, DG. Confidence Intervals Rather than P Values: Estimation Rather than Hypothesis Testing. BMJ, 292: 746-750 (1986); Rothman, K.J. Significance Questing (Editorial). Annals of Internal Medicine, 105: 445-447 (1986).
128. Kelly v. American Heyer-Schulte Corp., 957 F.Supp. 873 (W.D. Tex. 1997) (holding that a two-tailed significance test is required for epidemiologic studies).
129. Turpin v. Merrell Dow Pharmaceuticals, Inc., 959 F2d 1349 (6th Cir. 1992) (the concept of confidence interval is explained).
130. Breslow, NE and Day, NE. Statistical Methods in Cancer Research. Vol. I: The Analysis of Case-Control Data. (IARC, 1980).
131. Glantz, S.A. Primer of Biostatistics p. 198 (3d ed McGraw Hill, 1992); Rothman, K.J. and Greenland, S. Modern Epidemiology, pps. 189-190 (Lippincott-Raven, 1998).
132. Id.
133. Id. at 195.
134. When the data are stratified on the basis of potential confounders, the association between exposure and effect is assessed within each stratum. If the association is the same in each stratum–that is, if none of the presumed confounders is in fact a confounder–the strata or data are considered “homogeneous.” When the strata are homogeneous, a summary measure of these stratum-specific associations can be obtained. The most popular statistical technique for obtaining this summary measure is that devised by Mantel and Haenszel. This summary measure is a weighted average of the stratum-specific values, and is called the “Mantel-Haenszel summary odds ratio.” Mantel, N. and Haenszel, W. H. Statistical Aspects of the Analysis of Data from Retrospective Studies of Disease. J. Nat’l Cancer Inst., 22: 719-748 (1959); Mantel, N. Chi-Square Tests With One Degree of Freedom: Extensions of the Mantel-Haenszel Procedure. J. Am. Stat. Assoc., 58: 690-700 (1963).
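For reference, the estimator takes a simple closed form. Writing the 2×2 table for stratum i in the conventional notation (assumed here for illustration) with exposed cases a_i, exposed controls b_i, unexposed cases c_i, unexposed controls d_i, and stratum total n_i, the Mantel-Haenszel summary odds ratio is

$$\widehat{OR}_{MH} = \frac{\sum_{i} a_i d_i / n_i}{\sum_{i} b_i c_i / n_i},$$

a weighted average of the stratum-specific odds ratios $a_i d_i / (b_i c_i)$ in which larger strata receive greater weight.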
135. Turpin v. Merrell Dow Pharmaceuticals, Inc., 959 F2d 1349 (6th Cir 1992) (the concept of “power” is explained); Rosenbaum, PR. Case Definition and Power in Case-Control Studies. Stat Med, 3: 27-34 (1984); Smith, AH and Bates, M. Confidence Limit Analyses Should Replace Power Calculations in the Interpretation of Epidemiologic Studies. Epidemiology, 3: 449-452 (1992).
136. Freiman JA, et. al. The Importance of Beta, The Type II Error and Sample Size in the Design and Interpretation of the Randomized Control Trial: Survey of 71 “Negative” Trials. N Engl J Med, 299: 690-694 (1978); Greenland, S. Power, Sample Size, and Smallest Detectable Effect Determination for Multivariate Studies. Stat Med, 4: 117-127 (1985); Greenland, S. On Sample-Size and Power Calculations for Studies Using Confidence Intervals. Am J Epidemiol, 128: 231-237 (1988); Walter, SD. Determination of Significant Relative Risks and Optimal Sampling Procedures in Prospective and Retrospective Comparative Studies of Various Sizes. Am J Epidemiol, 105: 387-397 (1977).
137. Howe GR and Choi BCK. Methodological Issues in Case-Control Studies: Validity and Power of Various Design/Analysis Strategies. Int J Epidemiol, 12: 238-245 (1983).
138. Schlesselman JJ. Sample Size Requirements in Cohort and Case-Control Studies of Disease. Am J Epidemiol, 99: 381-384 (1974).
139. Greenberg, R.S. and Ibrahim, M.A. The Case-Control Study in Holland, W.W. et. al. Oxford Textbook of Public Health, p. 129 (Oxford U. Press, 1991); MacMahon, B. and Trichopoulos, D. Epidemiology: Principles and Methods p. 252 (2d ed. Little, Brown & Co., 1996).
140. For instance, if both the exposure and the effect are rare, only an incredibly large prospective cohort study would have the power needed to assess whether the exposure is associated with the effect. But the cost of a cohort study that large may be prohibitive, or the period of its follow-up may be impracticably long. In that event, an epidemiologist may resort to “meta-analysis,” pooling the results of existing studies. In re Paoli R.R. Yard PCB Litig., 916 F.2d 829, 856-57 (3d Cir. 1990), cert. denied, 499 U.S. 961 (1991) (the court discussed the admissibility of meta-analysis as a scientific technique); Tobin v. Astra Pharmaceutical Prods., Inc., 993 F.2d 528, 538-39 (6th Cir. 1992), cert. denied, 114 S. Ct. 304 (1993) (identifying an error in the performance of a meta-analysis, in which the Food and Drug Administration (FDA) pooled data from control groups in different studies, some of which gave the control group a placebo while others gave it an alternative treatment); Dickersin, K. and Berlin, J.A. Meta-Analysis: State-of-the-Science. Epidemiol Rev, 14: 154-176 (1992); Einarson, T.R. et. al. A Method for Meta-Analysis of Epidemiologic Studies. Drug Intell. Clin. Pharm., 22: 813-824 (1988); L’Abbé, KA, Detsky, AS and O’Rourke, K. Meta-Analysis in Clinical Research. Ann Intern Med, 107: 224-233 (1987); DerSimonian, R. and Laird, N. Meta-Analysis in Clinical Trials. Control Clin Trials, 7: 177-188 (1986).
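The mechanics of pooling can be sketched briefly. In the standard fixed-effect, inverse-variance approach (one common method among several; the DerSimonian-Laird random-effects method cited above additionally inflates each study’s variance by an estimate of the between-study variance), each study’s estimate $\hat{\theta}_i$, such as a log relative risk, is weighted by the reciprocal of its variance:

$$\hat{\theta}_{pooled} = \frac{\sum_i w_i \hat{\theta}_i}{\sum_i w_i}, \qquad w_i = \frac{1}{\operatorname{Var}(\hat{\theta}_i)}.$$

Precise (usually larger) studies therefore dominate the summary, which is why a meta-analysis can achieve power that no single constituent study possesses.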
141. Olkin, I. Statistical and Theoretical Considerations in Meta-Analysis. J. Clin. Epidemiology, 48: 133-146 (1995).
142. Thompson, SG. and Pocock, SJ. Can Meta-Analyses Be Trusted? Lancet, 338: 1127-1130 (1991); Shapiro, S. Meta-Analysis/Shmeta-Analysis. Am J Epidemiol, 140: 771-778 (1994); Fleiss, J.L. and Gross, A.J. Meta-analysis in Epidemiology, with Special Reference to Studies of the Association between Exposure to Environmental Tobacco Smoke and Lung Cancer: A Critique. J. Clin. Epidemiology, 44: 127-139 (1991); Greenland, S. A Critical Look at Some Popular Meta-Analytic Methods. Am J Epidemiol, 140: 290-296 (1994).
143. Dear, KBG and Begg, CB. An Approach for Assessing Publication Bias Prior to Performing a Meta-Analysis. Stat Sci, 7: 237-245 (1992); Begg, CB. and Berlin, JA. Publication Bias: a Problem in Interpreting Medical Data. J R Stat Soc A,151:419-463 (1988); Dickersin, K. The Existence of Publication Bias and Risk Factors for its Occurrence. JAMA, 263: 1385-1389 (1990).
144. LeLorier, J. et. al. Discrepancies between Meta-Analysis and Subsequent Large Randomized, Controlled Trials. NEJM, 337: 536-542 (1997).
145. Slavin, R.E. Best Evidence Synthesis: An Intelligent Alternative to Meta-Analysis. J. Clinical Epidemiology, 48: 9-18 (1995); Spitzer, W.O. The Challenge of Meta-Analysis. J. Clinical Epidemiology, 48: 1-4 (1995); Greenland, S. Can Meta-Analysis be Salvaged? Am J Epidemiol, 140: 783-787 (1994).
146. Ethyl Corp. v. United States Envtl. Protection Agency, 541 F.2d 1, 28 n. 58 (D.C. Cir.), cert. denied 426 U.S. 941 (1976).
147. Allen v. United States, 588 F. Supp. 247, 417 (D. Utah 1984), rev’d on other grounds, 816 F2d 1417 (10th Cir. 1987), cert. denied, 484 US 1004 (1988) (“the cold statement that a given relationship is not ‘statistically significant’ cannot be read to mean ‘there is no probability of relationship’”; whether a correlation between a cause and a group of effects is more likely than not–particularly in a legal sense–is a different question from that answered by tests of statistical significance, which often distinguish narrow differences in degree of probability); Turpin v. Merrell Dow Pharmaceuticals, Inc., 959 F2d 1349 (6th Cir. 1992) (confidence intervals are not the same as the preponderance of the evidence standard of proof).
148. Hoffman v. Merrell Dow Pharmaceuticals, Inc., 857 F2d 290 (6th Cir. 1988), cert. denied, 488 US 1006 (1989) (describes “substantial factor” and “but for” criteria for proving legal causation).
149. Restatement (Second) of the Law of Torts §§ 431-433 (ALI, 1965).
150. Lewis, D. Causation. J Philos, 70: 556-567 (1973); Sosa, E & Tooley, M (ed) Causation (Oxford, 1993); Rizzi, DA. and Pedersen, SA. Causality in Medicine: Towards a Theory and Terminology. Theor Med,13: 233-254 (1992); Greenland, S. et. al. Causal Diagrams for Epidemiologic Research. Epidemiology, 10: 37-48 (1999).
151. Rothman, KJ. Causes. Am J Epidemiol, 104: 587-592 (1976); Krieger, N. Epidemiology and the Web of Causation: Has Anyone Seen the Spider? Soc Sci Med, 39: 887-903 (1994).
152. Pearce, N. White Swans, Black Ravens, and Lame Ducks: Necessary and Sufficient Causes in Epidemiology. Epidemiology, 1: 47-50 (1990).
153. Mackie, J.L. The Cement of the Universe: A Study of Causation (Oxford, 1980); Rothman, K.J. and Greenland, S. Modern Epidemiology chapter 2 (2d ed Lippincott-Raven, 1998).
154. Lewis, D. Counterfactuals (Oxford, 1973).
155. Raynor v. Merrell Pharmaceuticals, Inc., 104 F3d 1371, 1376 (D.C. Cir. 1997) (discussing the distinction between general and specific causation); Casey v. Ohio Medical Products, 877 F. Supp. 1380 (N.D. Cal. 1995) (“the term causation has two meanings…, the first is general causation…, the second is specific causation….”); Thomas v. Hoffman-LaRoche Inc., 731 F.Supp. 224, 226 (N.D. Miss 1989) (while experts testified that ingestion of Accutane caused plaintiffs’ seizures, they lacked epidemiologic data or studies to support their opinions); In re Agent Orange Prod. Liab. Litig., 611 F.Supp. 1223, 1243 (E.D.N.Y. 1985), aff’d 818 F.2d 187 (2d Cir. 1987), cert. denied, 487 U.S. 1234 (1988) (“in a mass tort case such as Agent Orange, epidemiologic studies on causation assume a role of critical importance”); Lee v. Richardson-Merrell, Inc., 772 F.Supp. 1027 (W.D. Tenn. 1991) (fatal to plaintiff’s Bendectin case was the lack of supportive epidemiologic evidence in the face of much non-supportive epidemiologic evidence); Graham v. Playtex Products, Inc., 993 F.Supp. 127 (NDNY 1998) (ruling that the opinion of plaintiff’s experts on general causation was admissible despite the absence of epidemiologic evidence supporting the opinion); Benedi v. McNeil-P.P.C. Inc., 66 F3d 1378 (4th Cir. 1995) (under the Daubert standard, epidemiologic studies are not necessarily required to prove causation, so long as the methodology employed by the expert in reaching his or her conclusion is sound); In re Breast Implant Litigation, 11 F. Supp. 2d 1217 (D. Colo. 1998) (the most important evidence relied upon by scientists to determine whether an agent causes disease is controlled epidemiologic studies; epidemiologic studies are necessary to determine the cause and effect between breast implants and allegedly associated diseases); Bowers v. Northern Telecom Inc., 905 F.Supp. 1004, 1010 (N.D. Fla. 1995) (“a cause-effect relationship need not be clearly established by . . . epidemiological studies before a doctor can testify that, in his opinion, such a relationship exists”); Grimes v. Hoffman-LaRoche Inc., 907 F.Supp. 33, 35 n.2 (D. N.H. 1995) (“no epidemiological studies have been done which establish any relationship between Accutane and cataracts, and [plaintiff’s causation expert] did not contend that causation can be proved by anecdotal evidence alone”); Sanders, J. From Science to Evidence: The Testimony on Causation in the Bendectin Cases, 46 Stanford Law Review 1-86 (1993); Bert Black & David Lilienfeld, Epidemiologic Proof in Toxic Tort Litigation, 52 Fordham L. Rev. 732 (1984); Vincent M. Brannigan, et. al., Risk, Statistical Inference, and the Law of Evidence: The Use of Epidemiological Data in Toxic Tort Cases, 12 Risk Analysis 343 (1992); Michael Dore, A Commentary on the Use of Epidemiological Evidence in Demonstrating Cause-in-Fact, 7 Harv. Envtl. L. Rev. 429 (1983); Note, Causation in Toxic Torts: Burdens of Proof, Standards of Persuasion and Statistical Evidence, 96 Yale L.J. 376 (1986).
156. Note, Proof of Cancer Causation in Toxic Waste Litigation: The Case of Determinacy Versus Indeterminacy, 61 S. Cal. L. Rev. 2075 (1988).
157. Robinson v. United States, 533 F. Supp. 320 (E.D. Mich. 1982) (epidemiologic evidence cannot establish specific causation; “at most, one can examine statistical correlation and then, within a chosen interval of error, determine whether GBS is more likely than not associated with the swine flu vaccine in a particular period after receipt of the vaccination,” “statistical evidence cannot establish cause and effect”); Smith v. Ortho Pharmaceutical Corp., 770 F. Supp. 1561 (N.D. Ga. 1991) (epidemiology is based on the study of populations not individuals); DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F.2d 941, 945 & n.6 (3d Cir. 1990) (“epidemiological studies do not provide direct evidence that a particular plaintiff was injured by exposure to a substance”); Merrell Dow Pharmaceuticals, Inc. v. Havner, 953 SW 2d 706 (Tex. 1997) (epidemiologic studies cannot establish specific causation).
158. Rothman, KJ (ed.) Causal Inference. (Epidemiology Resources, 1988); Susser, M. What is a Cause and How Do We Know One? A Grammar for Pragmatic Epidemiology. Am J Epidemiol, 133: 635-648 (1991); Weed, DL. On the Logic of Causal Inference. Am J Epidemiol, 123: 965-979 (1986); Greenland, S. Probability Logic and Probabilistic Induction. Epidemiology, 9: 322-332 (1998). An argument can be made that epidemiologists cannot testify that an exposure generally causes an effect unless the epidemiologist is an expert in the potential biological mechanisms that would plausibly account for that exposure causing that effect; without that expertise, all the epidemiologist is qualified to discuss is that the exposure is or is not associated with the effect.
159. Maclure, M. Popperian Refutation in Epidemiology. Am. J. Epidemiol., 121: 343-350 (1985). This process of deduction is often called the “hypothetico-deductive” method. This method blends both the processes of induction and deduction. An initial hypothesis leads by the process of deduction to certain testable consequences. When these consequences are tested and the data from the test fail to support the deduced consequence, the initial hypothesis is modified by the process of induction. The process is then repeated.
160. Kelly v. American Heyer-Schulte Corp., 957 F.Supp. 873 (W.D. Tex. 1997) (holding that an epidemiologic study is inadmissible on the issue of causation unless it satisfies the Koch-Henle postulates); Evans, A.S. Causation and Disease: The Henle-Koch Postulates Revisited. The Yale Journal of Biology and Medicine, 49:175-195 (1976).
161. Smith v. Ortho Pharmaceutical Corp., 770 F. Supp. 1561 (N.D. Ga. 1991) (there are five criteria used to assess causation: (1) consistency of association; (2) strength of association; (3) specificity of association; (4) temporal relationship of association; and (5) coherence of association); Evans, A.S. Causation and Disease: A Chronological Journey. Am J Epidemiol, 108: 249-258 (1978); Hill, AB. The Environment and Disease: Association or Causation? Proc Roy Soc Med, 58: 295-300 (1965); Renton, A. Epidemiology and Causation: A Realist View. J. of Epidemiology and Community Health, 48: 79-85 (1994); Morabia, A. On the Origin of Hill’s Causal Criteria. Epidemiology, 2: 367-369 (1991); Renton, A. and Whitaker, L. Proof of Causation and Relative Risk (Letter). Lancet, 339: 1058 (1992).
162. The basic unit of proof of general causation is not the single epidemiologic study but the collection of studies constituting the program of research on the issue of general causation.
163. Burch, P.R.J. The Surgeon General’s “Epidemiologic Criteria for Causality”: A Critique. J. Chronic Disease, 36: 821-836 (1983); Lilienfeld, A.M. The Surgeon General’s “Epidemiologic Criteria for Causality”: A Criticism of Burch’s Critique. J. Chronic Disease, 36: 837-845 (1983).
164. Maclure, M. Popperian Refutation in Epidemiology. Am J Epidemiol, 121: 343-350 (1985); Susser, M. Falsification, Verification and Causal Inference in Epidemiology: Reconsiderations in the Light of Sir Karl Popper’s Philosophy. In: Rothman, KJ (ed.), Causal Inference, pps. 33-57 (Epidemiology Resources, 1988); Buck, C. Popper’s Philosophy for Epidemiologists. Int J Epidemiol, 4: 159-168 (1975); Pearce, N. and Crawford-Brown, D. Critical Discussion in Epidemiology: Problems with the Popperian Approach. J Clin Epidemiol, 42: 177-184 (1989); Popper, KR. Conjectures and Refutations (4th ed. Routledge & Kegan Paul, 1972); Popper, KR. The Logic of Scientific Discovery (2nd ed. Harper & Row, 1968); Susser, M. The Logic of Sir Karl Popper and the Practice of Epidemiology. Am J Epidemiol, 124: 711-718 (1986); Karhausen, L.R. The Poverty of Popperian Epidemiology. Int’l J. of Epidemiology, 24: 869-874 (1995); Greenland, S. Induction versus Popper: Substance versus Semantics. Int’l J. of Epidemiology, 27: 543-548 (1998). In Daubert, the United States Supreme Court ruled that “‘scientific methodology today is based on generating hypotheses and testing them to see if they can be falsified….’”, citing K. Popper, Conjectures and Refutations: The Growth of Scientific Knowledge 37 (5th ed. 1989). That captures the essence of the deductivist philosophy. But does it mean that, in the contest between inductivism and deductivism, Daubert valorizes deductivism? If so, what becomes of the prospect of proving causation given epidemiologic studies reporting an association? No doubt the United States Supreme Court did not consider the implications of this reference to Karl Popper for the dispute between inductivists and deductivists in epidemiology.
165. Lanes, S.F. Error and Uncertainty in Causal Inference. In Rothman, K.J. (ed). Causal Inference (Epidemiology Resources, Inc., 1988).
166. Rothman, KJ. and Poole, C. Science and Policy Making. Am J Public Health, 75: 340-341 (1985); Hertz-Picciotto, I. Epidemiology and Quantitative Risk Assessment: A Bridge from Science to Policy. Am J Public Health, 85: 484-491 (1995).
167. Lanes, SF. The Logic of Causal Inference in Medicine. In: Rothman K.J. (Ed.). Causal Inference (Epidemiology Resources Inc., 1988.)
168. Grassis v. Johns-Manville Corp., 591 A2d 671 (N.J. Super Ct. App. Div. 1991) (when a group of plaintiffs fails to meet the requirement of a RR>2, an individual plaintiff may prevail by demonstrating that his or her RR is greater than 2); In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014 (S.D.N.Y. 1993), rev’d in part, 52 F.3d 1124 (2d Cir. 1995) (“[a]t least a two-fold increase in the incidence of disease attributable to… exposure is required to permit recovery if epidemiologic studies alone are relied upon”; even though epidemiologic evidence regarding the relationship between exposure to c and the development of d may fall short of the 2.0 threshold, if this evidence is combined with clinical or experimental evidence which eliminates confounding factors and strengthens the connection between c and d specifically in the circumstances surrounding the plaintiff’s case of d, then the plaintiff’s causation proof may be sufficient to support a jury’s finding that it was more likely than not that the plaintiff’s case of d was caused by his exposure to c; the Bradford-Hill criteria must be assessed to determine the sufficiency of epidemiologic evidence to establish causation); DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F.2d 941, 955, 958 (3d Cir. 1990) (“in order to avoid summary judgment, the relative risk of limb reduction defects arising from the epidemiological data Done relies upon will, at a minimum, have to exceed ‘2’”); Pollock v. Johns-Manville Sales Corp., 685 F. Supp. 489, 491 (D.N.J. 1988) (“issues of the inherent reliability of statistics aside, a 43 percent risk factor, although tangible, is clearly not ‘more probable than not’”); Manko v. United States, 636 F. Supp. 1419, 1434 (W.D. Mo. 1986), aff’d, 830 F.2d 831 (8th Cir. 1987) (“[a] relative risk of ‘2’ means that, on the average, there is a fifty percent likelihood that a particular case of the disease was caused by the event under investigation and a fifty percent likelihood that the disease was caused by chance alone; a relative risk greater than ‘2’ means that the disease more likely than not was caused by the event”); Marder v. G.D. Searle & Co., 630 F.Supp. 1087, 1092 (D. Md. 1986), aff’d sub nom., Wheelahan v. G.D. Searle & Co., 814 F.2d 655 (4th Cir. 1987) (“in epidemiological terms, a two-fold increased risk is an important showing for plaintiffs to make because it is the equivalent of the required legal burden of proof—a showing of causation by the preponderance of the evidence or, in other words, a probability of greater than 50%”); Cook v. United States, 545 F.Supp. 306, 308 (N.D. Cal. 1982) (“whenever the relative risk to vaccinated persons is greater than two times the risk to unvaccinated persons, there is a greater than 50% chance that a given GBS case among vaccinees of that latency period is attributable to vaccination, thus sustaining the plaintiff’s burden of proof on causation”); Padgett v. United States, 553 F.Supp. 794, 801 (W.D. Tex. 1982) (“a relative risk of ‘2’ or greater, then, means that the probability that vaccination caused a particular case of GBS is better than 50%”; hence, a relative risk of 2 or greater would indicate that it was more likely than not that vaccination caused a case of GBS); Landrigan v. Celotex Corp., 127 N.J.
404, 605 A.2d 1079, 1087 (1992) (“the significance of a relative risk greater than 2.0 representing a true causal relationship is that the ratio evidences an attributable risk of more than fifty percent, which means that more than half of the cases of the studied disease in a comparable population exposed to the substance are attributable to that exposure; this finding could support an inference that the exposure was the probable cause. . . . [However,] under certain circumstances a study with a relative risk of less than 2.0 could support a finding of specific causation. . .”); Daubert v. Merrell Dow Pharmaceuticals, Inc., 43 F.3d 1311 (9th Cir. 1995) (“for an epidemiologic study to show causation under a preponderance standard, the relative risk . . . will, at minimum, have to exceed 2”; a statistical study showing a relative risk of less than two could be combined with other evidence to show it is more likely than not that the accused cause is responsible for a particular plaintiff’s injury); Hall v. Baxter Healthcare, Corp., 947 F.Supp. 1387 (D. Or. 1996) (the burden of proof requires plaintiffs to demonstrate that exposure to breast implants more than doubled the risk of their alleged injuries; plaintiffs must be able to show a relative risk greater than 2.0); In re Breast Implant Litigation, 11 F. Supp. 2d 1217 (D. Colo. 1998) (plaintiffs must present expert testimony demonstrating that exposure to breast implants more than doubled the risk of their alleged injuries).
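The arithmetic underlying the RR > 2 threshold is worth making explicit. The attributable fraction among the exposed is

$$AF_e = \frac{RR - 1}{RR},$$

which exceeds one-half exactly when RR exceeds 2. For example, RR = 3 gives $AF_e = 2/3$ (two of every three exposed cases attributable to the exposure), while RR = 1.5 gives $AF_e = 1/3$, short of the preponderance threshold.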
169. Oxendine v. Merrell Dow Pharmaceuticals, Inc., 506 A2d 1100 (D.C. 1986), cert. denied, 493 US 1074 (1990) (an epidemiologic study with a risk ratio less than 2 was sufficient evidence); Parascandola, M. Evidence and Association: Epistemic Confusion in Toxic Tort Law. Philosophy of Science, 63 (Proceedings): S168-S176 (1996).
170. Bayes’ theorem provides a rational way to modify beliefs based on subjective conditional probabilities. It states that the conditional probability of a hypothesis, given some new piece of evidence, is equal to the product of (1) the initial probability of the hypothesis before the evidence and (2) the conditional probability of the evidence given the hypothesis, divided by (3) the probability of the new evidence.
$$\Pr(H_1 \mid E) = \frac{\Pr(E \mid H_1)\,\Pr(H_1)}{\sum_{i=1}^{n} \Pr(E \mid H_i)\,\Pr(H_i)}$$
Rosner, B. Fundamentals of Biostatistics (3d ed. Duxbury, 1990); Nozick, R. The Nature of Rationality (Princeton, 1993).
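A worked illustration, using purely hypothetical numbers: suppose a prior probability of 0.2 is assigned to the hypothesis H_1 that an exposure causes a disease (and so 0.8 to its denial H_2), and a newly reported positive study E is judged to have probability 0.9 of arising if H_1 is true but 0.3 if H_2 is true. Then

$$\Pr(H_1 \mid E) = \frac{(0.9)(0.2)}{(0.9)(0.2) + (0.3)(0.8)} = \frac{0.18}{0.42} \approx 0.43.$$

The study raises the probability of causation from 0.2 to roughly 0.43, but does not by itself make causation more likely than not.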
171. Rothman, K.J. and Greenland, S. Modern Epidemiology, p. 24-25 (2d ed. Lippincott-Raven, 1998).
172. Characterizing an epidemiologic study as an algorithm–an almost infallible method of processing observations into a value quantifying an association between exposure and effect–is a plaintiffs’ ploy. For if an epidemiologic study were an algorithm, it would be very persuasive, especially if the odds ratio were greater than 2.0. That result would then be argued to be diamond-hard fact, resistant to the erosive forces of reasoned critique. This ploy the defense must, at every turn, subtly undermine. The fact is that an epidemiologic study, even one whose methodology is impeccable, is not an algorithm–a mechanized method for processing inputs into outputs–but merely an argument, resting on all the discretionary premises and inferences characteristic of non-algorithmic arguments.