Posted: Wed Sep 20, 2006 09:07 pm Post subject: Statisticians not wanted
It appears Social Security's administrative courts are not the only ones who like to use bogus statistical analyses.
Quote:
Statisticians not wanted
"On August 16, 2006, the California Supreme Court made it official: in certain legal cases that hinge on statistical calculations, it is not the business of professional statisticians to decide how to evaluate the statistical data and to judge what method is most suited to analyze that data. From now on, in California at least, the courts will decide what statistical analysis is appropriate and what is not.
It gets worse - especially if you are a professional statistician. By upholding a ruling by a lower state court, the California Supreme Court also affirmed that, in their view, the proper job for statisticians is simply to plug numbers into a formula and turn the crank to produce an answer. Not any formula will do, mind; if you want your calculation to play a role in a California legal proceeding, it will have to be the formula chosen by the court. As a professional statistician, you may believe that it is precisely your job to make that call, and that no other profession has the knowledge, expertise, experience, and skill to make such a decision in your stead. But the California Supreme Court says otherwise. They say the decision is theirs to make. You don't believe me? Read on. . ."
Joined: 11 Jun 2004 Posts: 146 Location: Montpelier, Vermont
Posted: Thu Sep 21, 2006 07:05 pm Post subject:
After reading Devlin and the Johnson case, I find it hard to take Devlin's concerns seriously. Devlin suggests that there is a statistical controversy as to how to calculate the probability of "cold hit" DNA searches, i.e. searches in which the police don't have a suspect, and that because the court in Johnson let one method in as evidence, it was somehow deciding science instead of leaving that to statisticians. I don't buy it.
In the Johnson case the defendant tried to exclude testimony from an expert concerning the probability of a match using the method that she employed. That method, known as random match probability (RMP), produced a probability of a random match of approximately 1 in 4 x 10^15. Devlin admits that this is a method accepted by many statisticians. Johnson tried to exclude this evidence because there is a controversy as to what the correct statistical method for calculating the probability should be. The case doesn't say whether Johnson tried to admit evidence based on the other statistical method. Apparently he did not. The court says nothing about whether it would have admitted this other method had Johnson tried to admit it as evidence.
Devlin suggests in his piece that the court and not statisticians are deciding the correct method, but I don't see that the court has done that at all. It is simply permitting one method, which has been accepted by statisticians, to be used.
I also think that I know why Johnson never tried to admit evidence based on the other method. According to Devlin the better formula to use in cold hit cases is N x RMP, where N is the size of the database used. When I applied this formula to the facts in Johnson I got odds of 1 in 2 x 10^10, a denominator that is still astronomically high, several times the population of the earth (and these are just the statistics for Hispanics).
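The arithmetic behind that figure can be checked with a short Python sketch. It assumes the numbers quoted in this thread (an RMP of one in 4.3 x 10^15 and a database of 200,000 profiles); the function name and the rough 2006 world population of 6.5 billion are illustrative assumptions, not figures from the case.

```python
# Figures as quoted in this thread for People v. Johnson (assumed accurate as quoted).
RMP_ONE_IN = 43 * 10**14      # RMP: one in 4.3 x 10^15
DATABASE_SIZE = 200_000       # N: number of profiles searched

# Rough world population around 2006; an outside assumption used only for scale.
WORLD_POPULATION = 6_500_000_000


def database_match_one_in(rmp_one_in: int, database_size: int) -> int:
    """Return X such that N x RMP equals 'one in X'.

    RMP is 1 / rmp_one_in, so N x RMP = database_size / rmp_one_in,
    which is the same as one in (rmp_one_in / database_size).
    """
    return rmp_one_in // database_size


one_in = database_match_one_in(RMP_ONE_IN, DATABASE_SIZE)
multiples_of_population = one_in / WORLD_POPULATION

print(f"N x RMP is about one in {one_in:,}")
print(f"That is roughly {multiples_of_population:.1f} times the assumed world population")
```

With these inputs the result is one in 21,500,000,000, i.e. 2.15 x 10^10, which matches the hand calculation.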
As a postscript, I am no statistician, but I have a hard time understanding why the size of the database would have any bearing on the probability of a random match. For instance, suppose that you had a database that contained every person on earth. If you have one match in that database, then there would be a 100% probability that that is the right match. However, N x RMP suggests that as N gets larger the probability of a match being accurate diminishes. Intuitively, this does not seem right.
Posted: Mon Sep 25, 2006 08:02 pm Post subject:
Keith Devlin's Response:
Quote:
Craig,
Thanks for the feedback. The sole request we made
in our amicus brief was that the court treat
statisticians the same as other scientists and
allow the testimony of professional statisticians
in determining which statistics to admit. The
court denied that request. We did not propose
any particular statistic. Our concern was simply
that professional statisticians should be consulted.
In my column I did indeed select particular statements
by the court to make my case. I was not striving
to give a balanced report of the court's ruling.
I did however try to make sure I did not quote
the court in a misleading way, and the fact is,
what I wrote is what the court said. And frankly,
what they said is so wrong, it scares me to death.
I agree that using the DMP instead of the RMP
would also amount to overwhelming evidence against
the accused in the Johnson case, and this would be
the case in all cold hit cases to date. Which raises
the question, why the FBI and the various prosecutors
consistently insist on using the RMP. Why not sidestep
all those appeals and simply go with the equally
persuasive DMP? The answer, presumably, is that they
are looking to the future when cold hit searches
are made over databases with many millions of entries.
But then the NRC v Balding/Donnelly disagreement
becomes statistically significant. The statistics
community (which I am not a member of) really needs
to resolve this matter once and for all. My column
was intended as much as anything to raise awareness
of the issue.
As it happens, I think that the RMP is a dangerously
misleading figure to quote in court. I would feel far
happier for courts to admit the DMP - and only the DMP.
It is technically true that the RMP is, as the FBI
keeps saying, "relevant", but in my opinion the high
likelihood that it would mislead laypersons should
render it inadmissible. (BTW, I was originally drawn
into this business because of my experience in how
laypeople understand and misunderstand mathematical
data/reasoning.)
Again, thanks for writing.
KD
------------------------------------------------
Keith Devlin
Media X, Executive Committee
Executive Director, CSLI
Stanford University
------------------------------------------------
On Sat, 23 Sep 2006, Craig Jarvis wrote:
> Dear Keith,
>
> I came upon your article "Statisticians not wanted,"
> when someone posted it on a disability web site which
> I follow. It led me to read the People v. Johnson
> case which you discuss.
>
> As a lawyer, I certainly know that judges do not
> always include the most important points of a contrary
> argument when they write an opinion. However, when I
> read People v. Johnson, I don't see the reasons for
> the concerns that you express in your article. I
> don't see that the judges are acting as scientists as
> you say. To my mind, they are doing what judges do,
> deciding on the admissibility of evidence.
>
> From my reading of People v. Johnson the defense was
> trying to exclude the statistical numbers that result
> from the "RMP" method which you describe in your
> article. They appear to have tried to use the
> existence of a controversy to exclude statistical
> numbers completely. That is, they do not try to admit
> statistical numbers using the other possible method.
> Rather, they appear to be using the existence of a
> controversy in order to prevent a jury from
> considering any statistical numbers at all.
>
> I think I can understand why. The court says in
> Johnson that the RMP method produced a probability of
> a random match of one in every 4.3 X 10^15 Hispanics.
> A huge number. Obviously no defense attorney wants
> that number coming in as evidence. You suggest, I
> think, a better method for calculating the reliability
> of a cold hit match would be N x RMP, where N is the
> size of the database. The database used in Johnson
> was 200,000 people. Not even adjusting the database
> for Hispanics this still produces a number of one in
> every 2.15 X 10^10. Even this number is several
> multiples of the earth's population. A method that
> produces this number hardly helps a defense attorney
> in winning his case.
>
> It is pretty clear from the facts of People v. Johnson
> that the case was not winnable for the state without
> the DNA statistical evidence of the probability of a
> random match. However, substituting one method for
> another is of no practical help to the defense.
> Either number is damning. They have to keep the
> numbers out entirely.
>
> The court really doesn't say much about the issues
> which concern you in your article. It says nothing
> about whether it would have permitted numbers to be
> admitted using a different formula. It was apparently
> not presented with that situation. If it were, I
> would think that it probably would have allowed that
> issue to be vetted by expert statisticians either
> before the judge or before the jury. I just don't see
> that they were faced with that problem.
>
> My sense is that you are defending the right of
> statisticians to argue over methodology in court.
> But, I think that the defense lawyers want something
> else. They want the DNA testing in cold hit cases to
> have the same status as a polygraph test. Certainly
> there is a scientific basis for the reliability of
> polygraphs, but the reliability has not been
> established to such an extent that they are admissible
> in court. To achieve that goal, I think you would
> need more than just a controversy over statistical
> methods. I think you would need to show fallibility
> in the "independence assumption" that is part of
> genetic science.
>
> Craig Jarvis
Posted: Tue Sep 26, 2006 12:29 am Post subject: DNA RMP DMP
I think the "Devlin's Angle" piece is marred by misunderstandings and/or distortions of the legalities that distract seriously from the substantive message of the piece. I also think the substantive message is suspect, though I don't understand the technical aspects well enough to be certain it's wrong.
First, the legal misunderstandings and/or distortions. The California Supreme Court did nothing except exercise its discretion to refuse to entertain an appeal. That decision is without precedential force, and it doesn't constitute an adoption of the lower appellate court's opinion. Any gripe is therefore really with the Court of Appeal -- not with the Supreme Court, as Devlin suggests. In addition, no court excluded any statistician's testimony in this case, nor ruled that contrary testimony from a different statistician would be inadmissible. Moreover, the courts apparently entertained an amicus brief from statisticians. That does not constitute a refusal to listen to statisticians' views, as Devlin contends. The worst of which the courts can be accused is hearing and then rejecting some statisticians' views, including Devlin's -- with the result that the jury did hear from one statistician, rather than hearing from none. All the professional huffiness on this score, therefore, seems as unwarranted as it does graceless. It is, in addition, the sort of argument-from-authority that any rational judicial system rightly discounts.
Moreover, one legal view espoused by Devlin, on which his arguments against admissibility of the RMP test seem largely to be based, happens to be incorrect -- i.e., his belief that under California's Kelly / Frye test, the existence of debate within some discipline is sufficient to preclude the admissibility of the evidence. As a matter of doctrine, the "general acceptance" test applies, as the Court of Appeal's opinion explains, only to novel scientific methods or techniques. Moreover, even if that test did apply to all expert evidence, whether novel or not, "general acceptance," though not a sharply defined term, has never been understood to demand unanimity. As a matter of policy, a unanimity standard, or anything resembling it, would effectively bar all expert evidence from the courtroom. (On the policy front, it might also be felt that research on lay understanding of technical concepts, such as Devlin describes, should focus on how to communicate those concepts in a way that will enable a jury or other lay audience to follow the concepts and spot their misuse, as opposed to locating occasions to censor what the lay audience hears -- the latter approach being one that suggests a distrust of rational discourse, to my ears.)
As for the substantive question: Although people in the statistical community seem very convinced that the DMP statistic is the more "correct" number to present, and that presentation of the RMP statistic is less correct or altogether "incorrect," the pertinence of various statistics to the relevant inquiry is not a question exclusively within the province of statisticians. Some lawyers and judges may have thought very hard about this problem too, and they certainly have intellectual and institutional standing to do so.
In that vein, I will admit that I may have failed to follow Devlin's explanation, because the intuition persists that the method by which the authorities first identify their suspect simply is not germane to the question whether the RMP statistic should be admissible at trial. As a lawyer, I approach the latter problem by first noting the legal definition of relevance: evidence is relevant if it makes the existence of a fact more likely or less likely than it would be in the absence of the evidence. That is not how the word "relevance" is usually used in ordinary discourse, nor perhaps in academic dialogue, but it is the definition that the law has adopted. Under it, the RMP statistic seems plainly relevant to the question of guilt or innocence. This is especially so if the statistic is evaluated by the jury as one piece of evidence in a larger set of evidentiary facts, including nonstatistical facts, whose collective weight must be assessed as a matter of juror judgment by essentially nonstatistical means. In the California case, those included matching tattoos, the defendant's having a daughter of the same age as the one the perp mentioned to the victim, and a truck with a missing stereo. To me, for what it's worth, those facts seem analogous to the television reporter in Devlin's lottery example. The question here, for me, isn't about the odds that somebody would win the lottery; it's about the odds that this guy with the tattoos would.
Indeed, there were enough corroborative facts in the California case that a world may easily be imagined in which the suspect was first identified not through a DNA database, but through (say) his tattoos. Largely for that reason, I remain unable to follow why the admissibility of an RMP number should depend on how the suspect was first hauled in. To suppose that it should so depend seems to me an example of the genetic fallacy. The procedure followed after the California suspect's identification as a suspect was the same one that would have been followed had the suspect been fingered by Ouija board or a psychic. A blood sample was drawn, a DNA comparison with a sample from the crime scene was made, and a match was found. Why should the RMP number be admissible in the psychic situation but not when the suspect came to the authorities' attention via database? What if, unbeknownst to the authorities, the psychic cheated and consulted the database? The analytical problem confronting the jury in both cases seems the same.
(Or should the RMP number never be used in court? Should we always present the DMP number? To say the latter would almost be to say that we were under some sort of epistemological obligation to have a DNA database in the first place -- whereas, I take it, we are not.)
To me, the real mystery is why the DMP number should ever be thought admissible at all. (Here I leave to one side Donnelly's point that the database run has actually added information by excluding some potential suspects.) For one thing, the Court of Appeal seems correct that the question confronting the jury isn't the probability that there would be some match in the database. That might be the pertinent question for a statistician to whom we had delegated the entire task of determining guilt or innocence by exclusively statistical means and measures, based entirely on the intelligence that we had found the suspect through a database match. But that is not how our system has divided the intellectual labor, nor how it structures the substantive inquiry. In our system, a suspect is first identified, on grounds that may or may not afford a good epistemic or evidentiary basis to believe the suspect guilty. Material of evidentiary quality is then separately presented to an independent trier of fact (usually including evidence above and beyond a DNA match), and the jury weighs that evidence as a whole.
There are, meanwhile, independent reasons to be concerned about presenting the database number. Not all of our evidentiary rules spring from a truth-seeking animus. Some are positively truth-repelling. One such rule concerns the inadmissibility of prior arrests or convictions to prove guilt. To be sure, there are exceptions to that rule. In federal court and some state jurisdictions, the rule doesn't apply in ... well, the site's spam filter won't let me use the phrase that identifies the relevant class of cases, but perhaps the reader will be able to deduce it. Details involving prior offenses may also be admissible, sometimes, to show a common modus operandi, or for similar reasons. The general rule of exclusion remains just that, however. We follow this exclusionary rule not because a previous conviction has no logical bearing on the likelihood that the defendant committed this new crime. It is for other reasons, perhaps involving the notion that we should convict people of crimes one by one, because some people can overcome their criminal past and lead noncriminal lives. Or at least we want to give them that chance.
The rule against offering prior convictions seems starkly implicated, if the jury hears about DNA match probabilities based on criminal databases. I'm not a criminal defense lawyer, but if I were, I wouldn't think my clients would want the jury to hear that they had turned up in such a database. I've heard it suggested that we could simply keep the jury in the dark about who populates the database. But it would take only a few CSI episodes before jurors had figured it out. I gather that defense lawyers are nevertheless keen to tout the DMP statistics. But at a time when those DMP numbers still leave the odds against a chance match at stratospheric levels, that keenness baffles.