Welcome to Bioblog
Dedicated to biology and music
On biotunes.org

Saturday, March 1, 2008

Adapt public education to individuals, not demographics

The genetics-solves-everything crowd is continuing to have an influence on society that threatens to set Americans' notions of equality back decades. I still believe these attitudes are cyclical, but it is always depressing and disturbing to be in the regressive part of the cycle, with no hint of change in sight. The target now of course is public education - always in the sights of extremists, whether it involves adding prayer, subtracting science, or the current fad, teaching kids their gender roles, as if society weren't taking care of all of these things adequately outside the classroom.

Dr. Leonard Sax's website is called "Why Gender Matters". His publications have such objective scholarly titles as "Reclaiming Kindergarten: making kindergarten less harmful to boys" (Psychology of Men and Masculinity, American Psychological Association, 2(1):3-12, 2001), which like his other writings sets up an absurd dichotomy between boys and girls as if they are unrelated species. He claims that for boys, but not girls, kindergarten is "a series of alienating failures and humiliations" and implies it is thus the end of their academic careers. Many women competing for professional jobs (requiring extended education) with men would be surprised to hear that all males' spirits were crushed in kindergarten, given that they are still pretty much running society.

The problem with our educational system is not that "no one is teaching them how to be men and women" (from Sax's website) but that we are using blunt instruments, such as standardized testing, which saps what little autonomy teachers had in the classroom before NCLB. This means they are unable to address differences among individual students in development times of different skills. Yes, that variation exists, but using gender as the blunt instrument to guide education reform is even worse than using a standardized test. On top of it being a pointless exercise to assume anyone's academic strengths and weaknesses at a given age can be assessed using their appearance, it also reinforces so many stereotypes that so many of us had finally begun to move past, and furthermore gives them false "scientific" credibility. This type of "science" is no different from attempts a century ago to demonstrate through physical qualities that blacks were less intelligent than whites.

This blog has previously summarized the alarming trend of claiming genetic origin for every trait anyone can think of, and why the papers supporting these ideas tell us absolutely nothing. The problem of the other type of research cherry-picked by Sax to support his agenda is that it studies already-developed human beings. Anyone who has raised a child should understand the intellectual dishonesty of claiming that behavioral traits possessed by a baby or toddler are clearly genetic. Humans are social creatures, programmed from birth to learn from other humans how they should behave. That includes identification with a particular gender, and all the traits associated with it in a particular society. Brain development does not occur in a vacuum, but is affected by experience. Brain-scan differences even in a newborn cannot be determined to be genetic, because the newborn's brain started developing nine months before.

Most important though, the differences found are minor and slight - meaning it is unlikely that they are biologically significant. From the Times magazine article:

Sax initially built his argument that girls hear better than boys on two papers published in 1959 and 1963 by a psychologist named John Corso. Mark Liberman, a linguistics professor at the University of Pennsylvania, has spent a fair amount of energy examining the original research behind Sax's claims. In Corso's 1959 study, for example, Corso didn't look at children; he looked at adults. And he found only between one-quarter and one-half of a standard deviation in male and female hearing thresholds. What this means, Liberman says, is that if you choose a man and a woman at random, the chances are about 6 in 10 that the woman's hearing will be more sensitive and about 4 in 10 that the man's hearing will be more sensitive. Sax uses several other hearing studies to make his case that a teacher who is audible to boys will sound too loud to girls. But Liberman says that if you really look at this research, it shows that girls' and boys' hearing is much more similar than different. What's more, the sample sizes in those studies are far too small to make meaningful conclusions about gender differences in the classroom.
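Liberman's "6 in 10" figure can be checked with a few lines of arithmetic. The sketch below assumes that hearing thresholds in each sex are normally distributed with the same spread, which is my simplifying assumption and not something stated in the article; the function names are made up for illustration. Under that assumption, a gap of d standard deviations between the two means gives a probability of Phi(d / sqrt(2)) that a randomly chosen woman outscores a randomly chosen man.

```python
import math


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_superiority(d: float) -> float:
    """Chance that a random draw from the higher-mean group beats a random
    draw from the other group, when the means differ by d standard
    deviations (assumes two normal distributions with equal variance)."""
    return normal_cdf(d / math.sqrt(2.0))


# The range quoted from Corso's 1959 study: one-quarter to one-half of a SD.
for d in (0.25, 0.5):
    print(f"difference of {d} SD -> probability {prob_superiority(d):.2f}")
```

For the quoted range this works out to roughly 0.57 to 0.64, which is where "about 6 in 10" comes from. A coin flip would give 5 in 10.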


Why is it now acceptable to use "science" to foster people's underlying prejudices about gender, but no longer about race? Apparently there is some sort of hair-splitting going on in the minds of these "scientists" that of course skin color and other associated traits tell you nothing about what is going on in someone's brain, we know that now, so forget about that. But different genitals, now that clearly must be correlated with brain function. Especially the genitals of pre-pubescent humans!

It is especially insidious that the idea being promoted is just a new version of "separate but equal", which as anyone knows who is at all familiar with history, means anything but. Sax's motivation is clear. He has been on a crusade for years to convince people that public education is biased against boys because most of the teachers are women. (Of course, who is responsible for that? Surely not the men who over the ages told women that the only profession they could have was teaching, since obviously it is such an undesirable job. Surely not the principals and superintendents who for some reason are still overwhelmingly male, and oversee overwhelmingly female teaching staffs. But I digress.) He does a clever job of convincing people that he cares about girls too, but this concern is nothing but pandering to get people to buy into his system of segregation.

It's truly a shame, because for completely opposite reasons, single sex classrooms in public schools can be a good idea. For instance, in the context in which many kids are more interested in what the kids of the other gender think of them than the academics going on in class, single-sex classrooms can remove a major distraction. Because it's a good idea for kids to learn to relate to the other gender socially, it seems that the best situation is some, perhaps not all, single-sex classes in coed schools. It also does help remove some teacher biases which have usually been documented to favor boys (not girls, as Dr. Sax claims) in their participation. But if, as Dr. Sax claims, the majority of schools going to single-sex classes are basing their new paradigm on his "genetics" theories, then we are in big trouble, because it will make many of the gender prejudices that have sunk below consciousness openly acceptable again.



Thursday, February 28, 2008

Teen sex - is it bad or good for society?

Apparently it has been an assumption for a long time in some circles that early sex by teenagers results in their later delinquency. Two recent papers demonstrate just how muddled this theory is (along with most theories generalizing about human behavior), because they differ in their conclusions based on how the data were analyzed. The purpose of the first paper (Armour, S. and D.L. Haynie, 2007. Adolescent sexual debut and later delinquency. Journal of Youth and Adolescence 36:141-152) was to use data to support the theory, which it does. The second paper (Harden, K.P., J. Mendle, J.E. Hill, E. Turkheimer and R.E. Emery, 2008. Rethinking timing of first sex and delinquency. Journal of Youth and Adolescence, in press) uses the same dataset to reach the opposite conclusion, that earlier sex reduces future delinquency.

The second group of authors of course claim that their analysis is the better one, and in this case it is true. These papers, in fact, are a good demonstration of one of the major problems of large-dataset human studies, which is that they only control for factors (in this case, survey responses about race, income, parent's education, GPA, drug use, etc.) that the researchers imagine could affect the data, and not all the other hundreds of factors that also could but are ignored out of practicality or researcher bias. The authors' hope is that their use of a giant dataset will obscure the fact that important information is lacking.

(Once again, we will put aside the first major problem of such studies, the use of self-reporting data. Of course since both groups of authors rely on them, neither mentions how unreliable they are, especially, one might assume, with regard to sexual experience. And one might also imagine that the group of people who are most likely to lie about sexual experience is teenagers.)

The reason the second study is the better analysis is that the authors recognize that pooling all the data loses important information. Meaningless averages are calculated by pooling teenagers from all cultures and walks of life. To repeat a very nice analogy used by the authors of the second paper: if you wish to correlate meat consumption with life expectancy, and you compare two countries, one primarily meat-eating and another not, you find a positive relationship - higher meat-eating correlates with higher life expectancy. But a third ignored variable also correlates positively with meat-eating, and that is level of industrialization. So to truly understand the relationship between meat-eating and life expectancy, you must control for industrialization. When the analysis is rerun within one country, the correlation between meat-eating and life expectancy is negative.
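The meat analogy is easy to reproduce with toy numbers. Everything in the sketch below is invented for illustration (the two "countries", the consumption figures, the life expectancies); none of it comes from either paper. It only shows that a correlation can be negative within every group and still come out positive once the groups are pooled.

```python
from statistics import mean


def pearson_r(points):
    """Pearson correlation for a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in points)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Invented data: (meat eaten in kg per year, life expectancy in years).
# Within each country, more meat goes with a shorter life.
industrial = [(80, 80), (90, 79), (100, 78), (110, 77)]
agrarian = [(10, 64), (20, 63), (30, 62), (40, 61)]
pooled = industrial + agrarian

print("industrial country:", round(pearson_r(industrial), 2))
print("agrarian country:  ", round(pearson_r(agrarian), 2))
print("both pooled:       ", round(pearson_r(pooled), 2))
```

The pooled correlation is strongly positive only because the industrialized country sits higher on both axes. Restricting the comparison to identical twin pairs, as the second paper does, is the same move as analyzing one country at a time.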

In addition, what is found in both papers is simply correlation, not causation (a trap that first-year undergraduates are taught to avoid, and yet catches so many human-behavior researchers). That is, the only information one has after the meat study is that meat-eating is associated with lower life expectancy. The study has not shown that meat-eating causes lower life-expectancy.

These were the two main problems with the first paper. The authors pool individuals across a wide range of cultural norms, which gives them a spurious result, and then conclude that early teen sex causes delinquency when the two are only correlated. Even though they use a crude control for cultural influence (average reported age of first sex for a given teenager's high school) they ignore any potential unstudied factor that could cause both (just as industrialization causes both higher life expectancy, and more meat-eating), obscuring the results for individuals.

The second paper solves that problem by analyzing only the identical twins in the dataset (which was large enough for them to have data for 289 twin pairs), and therefore controlling for both genetics (which the twins share exactly) and environment (which twins living in the same household largely share). This is an appropriate twin analysis because (for this main point at least) the authors don't care about trying to separate genetics and environment to answer their question. (Twin studies that do confound objective data with subjective assumptions.)

On top of all this, though, is another major flaw in the dataset, which the second group of authors strangely acknowledge despite their analysis. The supposedly "independent" (time of first sex) and "dependent" (delinquency) variables are by definition related from the start, because in much of American society, teen sex itself is considered delinquent behavior. What they are doing is a bit like asking whether or not shoplifting is correlated with delinquency. This certainly confounds the first study.

What does it mean that the second study found that identical twins who have their first sexual experience earlier than their siblings are less likely to engage in delinquent behavior? The authors seem to feel they have no choice but to conclude that there is probably no relationship between these factors at all. Perhaps that is exactly what they would have found statistically if they had used a Bonferroni correction for their dozen or so analyses. Either that, or delinquency is caused by sexual frustration, and the problem of misbehaving teens is now solved.
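For readers who have not met the Bonferroni correction: when you run many tests, the chance that at least one comes up "significant" by luck grows quickly, so the correction divides the significance threshold by the number of tests. The sketch below takes "a dozen or so" to mean exactly 12 and treats the tests as independent; both are my assumptions, not figures from the paper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests independent
    tests, each run at level alpha, when no real effect exists."""
    return 1.0 - (1.0 - alpha) ** n_tests


N_TESTS = 12  # assumed stand-in for "a dozen or so analyses"
print("uncorrected chance of a fluke:", round(familywise_error_rate(0.05, N_TESTS), 2))
print("corrected per-test threshold: ", round(bonferroni_threshold(0.05, N_TESTS), 4))
```

With 12 tests at the usual 0.05 level, the chance of at least one spurious hit is close to a coin flip, and a result would need a p-value below about 0.004 to survive the correction.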



Thursday, December 13, 2007

The Continuing Misuse of Body Mass Index

Some researchers (Flegal et al., 2005 and 2007) have claimed, using data for Body Mass Index (BMI) and death rates, that it is somehow beneficial to be "overweight." This is the type of announcement that will always make headlines in the popular press. As usual, these studies are fraught with flaws that would only be accepted in a publication about humans, and are meaningless in the terms that really matter to all of us, which is what decisions individuals should make about their lifestyles in order to have the best chance of living long, healthy lives.

The major reason why studies like these are meaningless is because of the arbitrary nature of definitions: "underweight" = BMI <18.5; "normal" = BMI 18.5-25; "overweight" = BMI 25-30; and "obese" = BMI > 30. There may not be much reason to quibble with the extremes on this scale. As expected, the majority of "excess deaths" were associated with the extremes in weight - both "underweight" and "obesity grade 2" (BMI>35). But of course what makes the headlines is the fact that over the time period studied, "overweight" people had fewer excess deaths of most diseases than "normal" people.

The first question anyone should ask is, what exactly does BMI measure? BMI is nothing more than your weight divided by the square of your height. It takes nothing else into account. As originally conceived, the BMI was not intended for assessing individuals medically. It has become a popular measure in the media because it is somewhat useful for describing population-level trends - such as the well known trend in the U.S. toward more obesity over the last few decades. BMI data in this context can show us that there is something about the lifestyle of Americans which is causing us to gain weight, and in a rational world would lead to measures to provide plenty of decent food and opportunities for reasonable levels of physical activity for everyone in the country.
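To show how little goes into the number, here is the whole calculation, using the standard metric form (kilograms divided by meters squared) and the category cutoffs listed above. How to treat a value that lands exactly on a boundary is my own choice, and the person in the example is hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body Mass Index: weight in kilograms over height in meters squared."""
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Label a BMI value using the ranges quoted in this post.
    Boundary handling (e.g. exactly 25 counts as 'overweight') is assumed."""
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal"
    if value < 30:
        return "overweight"
    return "obese"


# Hypothetical example: a muscular person, 1.85 m tall, weighing 95 kg.
value = bmi(95, 1.85)
print(round(value, 1), bmi_category(value))
```

Two inputs, one division. Nothing in it can tell muscle from fat, which is why this hypothetical person lands in the "overweight" bin regardless of conditioning.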

But for an individual, BMI is not particularly useful, because it does not take into account the individual variation in body type (for example, bone density or muscle density, which contribute far more to weight than fat), not to mention activity level or overall conditioning, or even gender. All the talk of BMI has emphasized calculating one's own BMI, for example at the federal government's CDC site, which, based on the arbitrary, government-created ranges listed above, spits out an assessment that you are "normal," "overweight," etc. Of course the caveat on most of these sites is given that for "highly trained athletes" BMI may be "high"; it is implied that this condition is sufficiently rare that the great majority of us need not take it into account.

On any football team, even at the lowest level (such as my NAIA college), every player would be considered "overweight," and many are classified as "obese." Perhaps this is a straw man, but I would challenge someone who seriously believes in the BMI as an individual health measure to calculate it for anyone who regularly goes to a gym or just has more than moderate physical activity. Based on my own experience, I suspect that a lot of Americans who ended up in the "overweight" category were fit; hence, the "earth-shattering" results that it is beneficial to be "overweight."

Of course, these papers are indeed using the typical huge sample sizes to make their point, so aren't they using BMI correctly, to study general population trends? If only they said so in the discussion, or at least included the caveat about the arbitrary divisions in BMI classes, one could more easily put these papers into the proper context, which is that there are health risks associated with the extremes of BMI. But of course they don't. The truth is that only you and your doctor can assess your personal health risks and the benefits to changing your lifestyle (if that is even realistically possible for most people). The media saying to everyone, "hey, it's better to be overweight!" are missing the point. If you are interested in maintaining or improving your health, stop using BMI calculators. Just eat well and get enough exercise to feel good.

References

Flegal, K.M., B.I. Graubard, D.F. Williamson, and M.H. Gail, 2005. Excess deaths associated with underweight, overweight, and obesity. Journal of the American Medical Association 293:1861-1867.

Flegal, K.M., B.I. Graubard, D.F. Williamson, and M.H. Gail, 2007. Cause-specific excess deaths associated with underweight, overweight, and obesity. Journal of the American Medical Association 298:2028-2037.



Saturday, November 24, 2007

The problems with twins

A paper discussed in an earlier post (Alford et al., 2005) relies for much of its justification on a series of papers by Thomas J. Bouchard and coauthors. An expansive claim for genetic heritability of all sorts of behavior and attitudes is found in Bouchard and McGue (2003). Frankly, the arguments made are even more disturbing in this paper.

Why disturbing? What is truly the harm in scientifically separating out the genetic and environmental influences on everything from social attitudes to "vocational interests"? The harm is that such papers serve no function other than to fan the flames of bigotry.

We are likely not yet at the crest of the pro-genetics wave. There are more and more studies being published which claim genetic bases for all sorts of traits (e.g. politics, aversion to new foods, obesity). One walks a precarious path interpreting the actual significance of this work for the general public, who is under the mistaken impression that scientists know far more about human genetics than they actually do. The mainstream media certainly tends to work more as a blunt instrument than as a nuanced filter of published scientific studies, happily being used by authors and journals who have a strong self-promotion agenda.

The straw man constructed by the pro-genetics crowd is that obviously genetic variation exists or we would all be the same. But of course it is the biological significance of genetic traits that matters, not whether or not the variation actually exists. Even the most rigorous study that shows valid statistical significance for variation in a trait does not necessarily demonstrate biological significance - in fact, the huge sample sizes often used in these studies, which have the intended effect of increasing the chance of reaching statistical significance, undermine arguments for biological significance (see figs. 1 and 2). If it takes a survey of 10,000 people to detect a difference, then there is clearly so much overlap in the groups being compared that the authors should be hard-pressed to convince anyone that it matters. A slight fluctuation about the mean is much more likely to be amplified into statistical significance, although the direction of the significant difference is determined randomly. This was brilliantly demonstrated in a pair of papers about birth order and IQ.
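The point about sample size can be made concrete. The sketch below holds a tiny difference between two groups fixed at 0.05 standard deviations (an arbitrary value I picked for illustration) and asks what p-value a simple two-sample z-test would report at different sample sizes. It assumes normally distributed scores with equal spread in both groups, and it is not the analysis used in any of the papers discussed here.

```python
import math


def normal_cdf(x: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_sample_p_value(d: float, n_per_group: int) -> float:
    """Two-sided p-value of a z-test when the observed difference in means
    is d standard deviations and each group has n_per_group members."""
    z = d * math.sqrt(n_per_group / 2.0)
    return 2.0 * (1.0 - normal_cdf(abs(z)))


def overlap(d: float) -> float:
    """Shared area of two equal-variance normal curves whose means are
    d standard deviations apart (1.0 means identical distributions)."""
    return 2.0 * normal_cdf(-abs(d) / 2.0)


D = 0.05  # assumed tiny effect, in standard-deviation units
for n in (50, 500, 5000):
    print(f"{2 * n:>6} people surveyed: p = {two_sample_p_value(D, n):.3f}")
print(f"overlap between the two groups: {overlap(D):.1%}")
```

The same negligible difference is nowhere near significant with 100 people and clears the 0.05 bar with 10,000, while the two distributions overlap by about 98 percent at every sample size. The p-value changed; the biology did not.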



Where do all the heritability estimates come from? For example, Bouchard and McGue report "heritability of IQ is about 50%." This of course is based on twin studies. But the logic used by all of these authors (none of whom I have encountered so far are actual geneticists) to produce such an exact value of genetic heritability is flawed. They make the assumption that if you compare differences in scores on surveys between fraternal twins with the differences between identical twins, that difference is the genetic component of the trait. This assumes that the environment in which fraternal twins are reared is as similar as the environment for identical twins. This assumption has naturally been both challenged and defended, but for the moment, let us concede it as valid. That leaves us with the corollary assumption that the difference between fraternal differences and identical differences is therefore entirely genetic. This is where the problems with these studies lie (discounting the obvious problems with using tests or surveys - which are biased by authors, affected by mood of the taker, etc. - to make sweeping statements about genetics).
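For reference, the textbook version of this calculation is usually called Falconer's formula, and it doubles the gap between the identical-twin and fraternal-twin correlations, on the reasoning that fraternal twins share on average half of their variable genes. Whether Bouchard and McGue used exactly this shortcut or a more elaborate model is not something I am asserting; the sketch only illustrates the logic, and every correlation in it is hypothetical.

```python
def falconer_h2(r_identical: float, r_fraternal: float) -> float:
    """Falconer's heritability estimate: twice the difference between the
    identical-twin and fraternal-twin correlations for a trait. It assumes
    both kinds of twins experience equally similar environments."""
    return 2.0 * (r_identical - r_fraternal)


# Hypothetical correlations in test scores.
r_mz, r_dz = 0.75, 0.50
print("naive estimate:", falconer_h2(r_mz, r_dz))

# Suppose 0.10 of the identical-twin correlation actually comes from
# being treated more alike (an assumed figure). Remove it and re-estimate.
extra_shared_treatment = 0.10
print("after removing that:", round(falconer_h2(r_mz - extra_shared_treatment, r_dz), 2))
```

With these made-up numbers, a modest amount of extra similarity in how identical twins are treated moves the "heritability" from 50% to 30%. The formula has no way to tell the two sources apart, so everything rests on the equal-environments assumption.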

Identical twins not only share a genotype, but also a phenotype - they look the same. As explained in the previous post on this topic, how you look is going to affect your social attitudes, not to mention self image, mostly because of your interactions with other people, who clearly behave differently to people who look different. Until a study is conducted in which one half of 30 identical twin pairs has a dramatically altered appearance (e.g. is in a wheelchair or has had major facial reconstructive surgery), all the twin studies (even those in which the identical twins were reared apart, long a mainstay of the pro-genetics camp) declaring the percent genetic contribution for any subjective phenotypic trait will be meaningless.

These papers harken back to the dark days of phrenology and craniometry - the methods employed are no more scientific, because we know no more about how genetics affects these traits now (more than mere speculation) than we did back then about how the brain functioned. But the results are used by those who are racist or sexist to defend their views. All traits are a unique combination of usually complex genetics and environment. There is no way to establish that a person has a particular "genotype" for intelligence or social attitudes, and even if there were, the expression of that trait will be dependent on the environment in complex ways that are not easily measured. Most important, because of the huge overlap in any trait associated with the brain across all types of people, for any given individual, there is no way to determine what part of their intelligence, personality, or skills is based on their appearance, even if there actually is a true statistical difference in these traits for different races or sexes (Fig. 2). But studies such as these are used by people who wish to have their stereotypes confirmed "scientifically," and frankly one has to wonder if the authors are not such people themselves.


References

Alford, J.R., C.L. Funk, and J.R. Hibbing, 2005. Are political orientations genetically transmitted? American Political Science Review, 99:153-167.

Bouchard, T.J., and M. McGue 2003. Genetic and environmental influences on human psychological differences. Journal of Neurobiology 54: 4–45.



Tuesday, November 6, 2007

Politics and Biology, Part 2

My last post covered a paper that found brain physiological correlates of political attitudes. That paper did not address the origin of the physiological differences, i.e. whether they might be genetic or environmental. An earlier paper (Alford, J.R., C.L. Funk, and J.R. Hibbing, 2005. Are political orientations genetically transmitted? American Political Science Review, 99:153-167), though, does claim a genetic origin for particular suites of political attitudes. While the authors acknowledge that on the surface this seems nonintuitive, they back up their case by citing numerous studies that other social attitudes have a genetic origin.

This is another example of a twin study being used to support genetic causes of differences among people previously assumed to be purely environmental. The survey data used were collected not as part of this study, but recycled from a previous study, and include the usual thousands of twin pairs, because these authors, like so many others, assume that more is better, when that is not in fact true. A weakness of the data set is exposed when they calculate a surprisingly high 40% heritability of educational attainment, without any mention of the demographics of the survey-takers. Could there be any correlation of socioeconomic and education level and the willingness to return an esoteric research survey? It seems likely, but there is no consideration here of the possible effect.

The authors are all political scientists whose paper, published in a political science journal, purports to teach other political scientists how genetics works. (One can't help but wonder how they would react to a bunch of biologists pontificating on a political topic in a biological journal.) It probably does not matter whether or not any biologists actually reviewed this paper; twin studies appear in plenty of biological journals as well.

All the arguments the authors make supporting their assumptions come from other twin studies, which have the same methodological issues, stemming from the ethical impossibility of manipulating human phenotype and social environment. The most important assumption upon which the results are based is of course the one apparently made in all twin studies: correlation in survey scores between identical twins minus correlation in scores between fraternal twins equals "heritability" of the survey scores. The authors attempt to address the obvious criticism that identical twins are likely to have a more shared environment than fraternal twins by citing studies that assert this is not so. Of course, if a weak study is published, all those that rely on it are weak as well. The problem with all of these studies is that they fail to separate the effects of genotype vs. phenotype. This is the major criticism of "twins reared apart" studies - similarities between identical twins reared apart aren't automatically genetic, because the twins look the same (and often have similar mannerisms). Researchers seriously underestimate the importance of visual cues in affecting human interpersonal interactions. Hence, the twins' social interactions will have a greater tendency to be similar even when they live in different environments.

In a similar vein, when the authors claim that a majority of political attitudes are genetic, do they honestly think this explains why black women are likely to be more liberal than white men? Clearly phenotype, which drives much life experience, is the important difference here, not genotype. The authors also do not address at all the fact that many people's political attitudes change over time, sometimes in an extreme fashion. Yet they assert with a straight face that when children have opposing political beliefs from their parents, it must be due to a genetic mutation. They actually even calculate a heritability for party affiliation at 0.14. Yes, 14% of your choice of party apparently is from those alleles for "republican" and "democrat" you have tucked away.

Oddly, considering the tone of most of the paper, the authors do manage to point out correctly (even going into some depth on the topic) that all people are a product of the interaction of their genotype with their environment. They seem to do this to bolster their point that political attitudes must be partly genetic, but the discussion shows that clearly they think genes are as important as the social scientists they are criticizing think environment is. They go so far as to express regret that the current polarizing political environment will not change because of assortative mating - those with overlapping political views are more likely to reproduce together (true), and thus the "political genes" of their offspring will make them more likely to be polarized. Then they go on to speculate about the evolution of these genes and their effect on society, as if they actually know something about how evolution works, after reading a few papers.

Looks like all of the people with Ph.D.'s in genetics and evolution just wasted their time. They could have had a different career and still published papers on the topic.



Friday, October 19, 2007

You are responsible for how your children eat

Once again, the mass media can't tell real science from a junior high fair project. Genetics is the new path to a guilt-free style of life and child-rearing; it has been mass-marketed to people who want to evade all responsibility for persistent unpleasant conditions ranging from obesity to bad behavior, not to mention sexism. This is especially true in the field of child-rearing, where there has been a backlash against the offensive "blame the mother" explanations for problem children espoused decades ago. As usual, though, the backlash has swung too far in the other direction, making parents completely blameless. Bad papers supporting this faddish view continue to be published simply because they generate positive press.

The latest condition to be given a free pass is picky eating by children, in a study (Cooke, L.J., C.M.A. Haworth, and J. Wardle, 2007. Genetic and environmental influences on children's food neophobia. American Journal of Clinical Nutrition 86:428-433) using the differences between the eating behavior in identical twin pairs and in fraternal twin pairs. The idea is that twins in each pair are both presumably reared in the same environment, and so if there is a bigger difference in fear of new foods (="neophobia") between fraternal twins than there is between identical twins, this behavior must be genetic, thus, it's not the parent's fault that a child is picky.

The press release demonstrates the first problem with this paper:
"People have really dismissed this as an idea because they have been looking at the social associations between parents and their children," Dr. Cooke said. "I came from a position of not wanting to blame parents."

This of course explains the absurd conclusions of this paper - apparently, the first author had decided what the results should be before the study was conducted.

As usual with human health papers, the authors commit the fallacy of believing that a bigger sample size is better. They admit up front that a previous paper failed to find that genetics played a role in picky eating, but discount it by asserting that its sample size of 91 was too small. This of course is a circular argument; the authors are defining 91 as too small a sample size because no effect was found.

The data from the study are from parents completing questionnaires on the pickiness of each of their twin children, answering four questions on a scale of 1 to 4. They determined that fraternal twins were more different in their food choice than identical twins, based on a difference in survey score of 0.03 on that scale. Such a minute difference could of course only be detected by an enormous sample size, which they had - 5390 twin pairs. When a huge sample size is needed to detect a difference, it is unlikely that difference is biologically meaningful. This result is thus consistent with that of the previous paper which was unable to detect a difference with its sample of 91 twin pairs. It is especially egregious that the authors try to pass off their result as meaningful, when two other analyses associated with the paper that did not contribute to the support of their theory, but were statistically significant, were dismissed because these results were "statistically significant only because of the large sample size."
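A rough calculation shows why the two studies do not actually disagree. The sketch below asks how big a difference, in standard-deviation units, a simple two-group comparison could reliably pick up with 91 versus 5390 subjects per group. It uses the conventional 5% significance level and 80% power, and it treats the twin design as a plain comparison of two group means, which is a simplification on my part and not the method used in the paper.

```python
import math

Z_ALPHA = 1.959964  # two-sided 5% significance level
Z_POWER = 0.841621  # 80% power


def minimum_detectable_effect(n_per_group: int) -> float:
    """Smallest difference in means (in standard-deviation units) that a
    two-sample z-test would detect 80% of the time at the 5% level."""
    return (Z_ALPHA + Z_POWER) * math.sqrt(2.0 / n_per_group)


for n in (91, 5390):
    print(f"n = {n:>4}: detectable difference is about "
          f"{minimum_detectable_effect(n):.2f} SD")
```

Under these assumptions the smaller study could only have seen a difference of roughly 0.4 standard deviations, while the larger one can pick up something near 0.05. Finding nothing at 91 and something tiny at 5390 is exactly what a negligible effect looks like.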

The other fundamental problem with the study is its reliance on evaluations of pickiness by parents. Think about it for a moment: parents with identical twins often dress them the same, give them the same toys, and refer to them as a unit, while parents with fraternal twins are much less likely to do so. The tiny difference that was found could be merely due to a bias by parents of identical twins, in their unconscious assumption that the children are naturally similar. This is separate from the possibility that there would probably be a greater correlation between identical twins than fraternal twins in personality traits that manifest themselves as food pickiness, apart from any genetic determination of pickiness per se. And, while the questions addressed the children's behavior toward new food, there were no data on how often the children actually encountered new food.

Obviously there are clear genetic differences in how individuals taste food. This is known from research on taste receptors. Given this, it is actually surprising that a study that set out to show genetic differences in food exploration found such a slight effect. Nevertheless, the publicity surrounding this non-result will lead people to believe that they cannot control their children's stated food preferences, and thus must give in to them. This is an unfortunate implication, because it could lead to an even larger number of adults in the next generation subsisting on McNuggets from the drive-thru as they drain the health system with their obesity-related conditions.

You do have a choice about whether or not your children will eat anything besides tater tots. Give them the food you have cooked for the family, and if they don't want to eat it, fine. They really will not starve to death between dinner and breakfast. (For a lot more advice in this vein, read Ellyn Satter.)



Thursday, October 4, 2007

Why the public doesn't get science

Finally, a paper that is fun to read: Sand-Jensen, K. 2007. How to write consistently boring scientific literature. Oikos 116: 723-727.

Dr. Sand-Jensen has hit the nail on the head with this one. The ten rules below are his for making a scientific paper as inaccessible as possible - a problem frequently encountered by those of us trying to make the scientific literature more accessible to a general public which is not science literate.

1. Avoid focus
2. Avoid originality and personality
3. Write l o n g contributions
4. Remove implications and speculations
5. Leave out illustrations
6. Omit necessary steps of reasoning
7. Use many abbreviations and terms
8. Suppress humor and flowery language
9. Degrade biology to statistics
10. Quote numerous papers for trivial statements

Sand-Jensen needs to make it clear he is being tongue-in-cheek, however; honestly, it would not be surprising for some of the scientists out there to take his advice seriously. So he summarizes his views:
Because science ought to be fun and attractive, particularly when many months of hard work with grant applications, data collections and calculations are over and everything is ready for publishing the wonderful results, it is most unfortunate that the final reading and writing phases are so tiresome.

Most of these problems could be alleviated by the authors themselves. Why are they not? Because papers that have these characteristics continue to get published. In fact, if one were to attempt to remedy "rule" no. 8, it is likely that most reviewers for most journals would send the paper back in order to have all entertaining language removed.

But this is the proximate cause of the problem. What is the ultimate cause? Most people are never taught to write, including Ph.D.'s. They adopt the absurd jargonist language of their field because they were taught to write when training for that field. The crisis in our educational system does not end at the university level. Students are not trained to write in high school; a literate 10-page paper turned in for a typical college course is currently so rare it can be considered an endangered species. College professors then are left with two options: either to try and make up for basic skills that students should have learned in high school, or to join the ranks of the bitter cynics and hand out passing grades and thus degrees as rewards for students showing up to class. And guess which professors get the better evaluations, and therefore, less hassle from administrators.

Thus, many students who are bright enough in a certain field and interested in going to grad school still lack basic communication skills, and the cycle is perpetuated when they become professors themselves and teach their students to write specifically in the jargon of their field.

So getting back to why all these terrible papers get published in the first place, is it simply because standards are so low for good writing? Or even that people so rarely see good writing that they don't recognize the bad? This is part of it, but while the existence of the bad writing in the first place was a catalyst, the whole equation also includes the ego factor - there is the distinct subtext in many unnecessarily complicated papers of, "if you don't understand my paper, it's because you are not as brilliant as me." There is no other possible explanation for the slew of poorly written and mistake-ridden modeling papers in ecology. Reviewers must be afraid to tell editors that a paper does not make any sense, not realizing that if this is the case, it is the writer's fault, not the reader's fault, when the reader is an educated and well trained professional in the field.

From this insecurity-soaked process then emerges a kind of code language for professionals within a narrow field (and the narrowness of some of these fields is suggested by some of the journal titles out there, such as "Journal of asynchronous learning networks" and "Journal of aquatic ecosystem stress and recovery" just to pull out a couple from the thousands of journals available from a typical university library). This spirals inward sometimes to the point where there are only four people out there who can read a particular paper, and it gets published because those people are the reviewers - because no one else can understand the paper.

This is a particularly important issue with human health because from papers that are often horribly written, leave out important methodological information, use bad statistics, etc., the public is spoon fed a misleading press release that makes bold new health claims that are not at all substantiated by the paper, but that give publicity to the journal and sell newspapers.

So the problem identified by Sand-Jensen - but always known to the few scientists who care about proper scientific communication - reaches much further than the frustration of a scholar having to wade through a morass of bad writing. It affects the public's attitude and education about science, which in the U.S. couldn't be much lower for an industrial country. And it makes even those of us who did go into science as a career feel sometimes like closing the journal, turning on the TV, and watching Law and Order reruns for the rest of the day. Dr. Sand-Jensen speaks for me when he says, "It has been a great relief from time to time to read and write essays and books instead." That is the raison d'etre for this blog.



Thursday, September 27, 2007

Alcohol and pregnancy - what do we really know?

One media-blasted notion that most women now seem to take at face value is the importance of not drinking a drop of alcohol during pregnancy. The most ironic part of the propaganda is that those passing it along know it is a double-edged sword: they try to keep an impossible balance between convincing you that one beer will cause irreparable harm to your fetus, and assuring you that if you had some drinks before you knew you were pregnant, you shouldn't actually worry.

This contradiction might make some women wonder where the truth lies. While it is clear that excessive alcohol consumption can result in Fetal Alcohol Syndrome (FAS), a suite of birth defects mainly affecting the brain (learning and memory), Wikipedia's definition of the condition is misleading in that it states: "It is unknown whether amount, frequency or timing of alcohol consumption during pregnancy causes a difference in amount of damage done to the fetus." A perusal of the current literature on the topic of alcohol and pregnancy makes it quite clear that generally there is a positive relationship between the amount of alcohol consumed during pregnancy and the likelihood and severity of defects. In addition, it is also known that the most important brain development occurs in the first trimester, and thus that is almost certainly the most critical period to avoid consumption of alcohol.

What has been unknown since the FAS-prevention media blitz started is whether or not there is a threshold level of alcohol consumption below which there is no damage to the child. Frankly, there doesn't seem to have been a lot of interest in this among medical professionals, because it is simpler just to tell women not to drink at all than it is to consider nuances that might cause women to think it is okay to drink, and then drink too much, whatever too much is. The one certainty is that if a woman does not drink at all, there is no danger of impairments to the fetus caused by alcohol.

There have been occasional papers, however, focusing on the moderate drinkers, in order to possibly establish a safety threshold for alcohol consumption. One research project in particular is interesting because it is an ongoing longitudinal study that began when 580 mothers involved joined the study at 4 months pregnant. The most recent paper I was able to access on the study has data from when the focal children were 14 years old (Wilford, J.A., Richardson, G.A., Leech, S.L., Day, N.L., 2004. Verbal and visuospatial learning and memory function in children with moderate prenatal alcohol exposure. Alcoholism: Clinical and Experimental Research. 28:497-507.) This study is a genuine attempt to uncover potentially subtle effects on children whose mothers had around three drinks or less a week during pregnancy.

The positive aspects of the paper are that 1) it separates drinking by trimester, which is developmentally relevant and 2) it compares continuous variables in a regression analysis - that is, drinking amounts are not placed into arbitrary categories; rather, the actual volume drunk is used for the data.
The problems with the paper are pretty much the same ones that plague so many medical papers. First, by necessity, drinking levels are self-reported. Given that the first assessment was at 4 months - after the first trimester - the accuracy of the data is not convincing. Not only was each woman required to remember amounts drunk months previously (questionable even for those not experiencing the undulating body chemistry of pregnancy) but at the point these women were pregnant there was already some media blitzing discouraging drinking during pregnancy. The authors do not address the problem of potential underreporting bias.

Mainly, though, the problems are statistical. The authors used a large suite of learning and memory tests to score the children. They then regressed every variable they could think of against the scores, because of the need to correct for all the other potential factors influencing the scores, for example: race, mother's IQ, child's age, child's current substance use, mother's current substance use, etc. in addition to mother's alcohol and other substance use over each trimester. This makes for an extremely large number of statistical tests performed, although the number is unreported and thus must be estimated. The "significant" results alone number 31. For the six learning and memory tests that had "significant" results, all included mother's use of alcohol in the first trimester (but not the second or third in any case). This leads to a long discussion of the mechanism by which moderate drinking has caused "learning and memory deficits" in these children, etc.

But assuming the learning and memory tests are valid evaluations, we need to take another look at what the statistics actually showed. (Table 2 with p-value results is reproduced below.) First of all, one of the most basic statistical rules, ignored by doctors in every case I have seen, is that there is a probability of being wrong about the result. It is generally accepted that if the statistics show 95% probability of an effect, it is a valid result. The problem arises when as part of a single study, a large number of statistical tests are run, as in this case. The flip side of being 95% sure is that out of every 20 statistical tests conducted, we can expect, on average, for one of them to show significance when in fact there is actually no effect. To control for this problem, many scientists employ a Bonferroni correction, which simply means dividing the base significance level (0.05) by the number of tests conducted - in this case, apparently something greater than 200, although the true number is never stated by the authors. The appropriately conservative significance level for this paper would then be 0.00025.
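The arithmetic behind that correction is short enough to sketch. The snippet below is only an illustration: the figure of 200 tests is the rough estimate used above (the paper never states the true count), the function names are made up for this example, and the last function assumes the tests are independent, which they almost certainly are not in a study like this.

```python
# Illustrative sketch of the multiple-testing arithmetic.
# ASSUMPTION: about 200 tests, the rough estimate from the post.
ALPHA = 0.05      # conventional per-test significance level
N_TESTS = 200     # estimated, not reported by the authors


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Average number of 'significant' results when no real effect exists."""
    return alpha * n_tests


def chance_of_any_false_positive(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** n_tests


threshold = bonferroni_threshold(ALPHA, N_TESTS)
spurious = expected_false_positives(ALPHA, N_TESTS)
any_false = chance_of_any_false_positive(ALPHA, N_TESTS)

print(f"Bonferroni threshold: {threshold:.5f}")
print(f"Expected false positives at {ALPHA}: {spurious:.0f}")
print(f"Chance of at least one false positive: {any_false:.4f}")
```

Under these assumptions, about ten "significant" results would be expected from chance alone, which is worth keeping in mind when reading a table of 31 of them.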

The highest significance level for maternal consumption of alcohol in the paper was at <0.001 for "word-pair learning," and three of the other four significant tests were at 0.05. Perhaps, though, since first trimester alcohol consumption cropped up as a relatively high probability effect for five different learning and memory indexes (out of a total of 12, I think, but again it is not clear), it could be considered a real effect. But what other effects were real? The only effect that nearly every single time passes the Bonferroni level of significance was mother's IQ, which was significant at the 0.00001 level 5 times out of 6. So, even if there is a barely measurable effect on learning and memory on the children whose mothers drank, the IQ of their mother is 5000 times more important in determining that score. Also, for a given index, equally or more important than whether the mother drank was the child's current use of alcohol or tobacco. Race was more significant for four indexes. In two of the indexes, the child's gender was more significant.

The take-home message boils down to this. A lot of factors affect your children's mental abilities. Many of these are out of your control. The first trimester of pregnancy is certainly the most sensitive period for brain development. For a lot of women, morning sickness during that time means the last thing they are interested in is alcohol. Although these results are about as sketchy as you can get, it is certainly rational for women to abstain during the first trimester, just in case. It took a lot of fishing for these authors to find a result they could publish, and they never discussed it in terms of relative importance to the other factors. Ironically, the mothers who seem to worry the most about alcohol during pregnancy are the high-powered, yuppie types with the big IQs to match - who it seems have the least to worry about. After all, their mothers went to cocktail parties while pregnant, and they still got into Ivy League schools.

So next time you see a visibly pregnant woman drinking a glass of wine in public (are there any of them left who have the guts to do it?), don't insist that it is your civic duty to give her a dressing down, because she is past the first trimester anyway. The stress you cause her is probably worse for her fetus than the wine.



Sunday, September 16, 2007

One Parrot a Career Makes

At the risk of sounding heartless, the death of Alex the Talking Parrot last week leaves some of us, who have worked in the same department with Dr. Irene Pepperberg, a bit relieved. It was certainly bordering on irritating when those of us studying animal behavior using less glamorous species were putting in the long weekends to raise our sample sizes in order to make our work acceptable to the reviewers of actual science journals, while Dr. Pepperberg cranked out dozens of papers in such publications as the Journal of Comparative Psychology and Language Sciences, and was recruited to various appointments in science and psychology departments at the University of Arizona, MIT and Harvard using data from a single, often disagreeable, bird.

Certainly Dr. Pepperberg played the media like a fiddle, making her attractive as a "researcher" because of all the publicity she drew to that one bird. She laments Alex's passing in many respects, surely, but partly because after over 25 years of training, Alex still had not had the chance to show the world what a genius he was:

Alex could pull together a few simple concepts. Show him a group of objects and he could tell you, "What color is wood and four-corner?" or, "What shape is paper and purple?" Dr. Pepperberg was hoping to train Alex to spin his own recursions, informing her that the nut was "in the blue cup that’s on the tray" or "in the yellow box on the chair."

"I wish we had gotten further," Dr. Pepperberg wrote in an e-mail message. "We were just beginning to get him to designate things like 'in' and 'on.' "

Fortunately, though, he did last long enough to have a human's entire career built around him. Perhaps the two other parrots Pepperberg is training will achieve greater heights of language skill than Alex did. Still, one cannot help but wonder why Dr. Pepperberg never seemed to last more than a few years at a given institution. Perhaps it was her uninterest in interacting with her colleagues, which seemed to indicate that she had nothing of value to learn from such an interaction. Perhaps when the novelty of her research wore off at a given institution, the realization finally dawned that her career's work has added little of value to our body of scientific knowledge.

Sample size is a critical issue in science. Data from one individual (or even three individuals) are marginally useful at best because there is so much variation among individuals. What if an alien came to earth and collected data about the linguistic abilities of humans based on conversations with only Franklin Roosevelt? What if its data were based on conversations with a high school dropout with an IQ of 60? (or President Bush? Sorry, couldn't help that one.) Its conclusions would be quite different in the two cases.

Another problem with teaching animals English in order to draw conclusions about how their brain works is that we are testing them in a context that has no evolutionary relevance for their species. Ecologists and evolutionary biologists would be interested in Alex's data, given more study of African grey parrots' social structure in their natural habitat - because "language" is all about communication with others of one's own species. Unfortunately, although Dr. Pepperberg gave lip service to exploring such research directions in the future when that is what her colleagues wanted to hear, she never seemed much interested in actually pursuing that avenue (and a glance through her long reference list does not indicate any publications devoted to the parrots in the wild).

So people can argue forever about Alex's true abilities, parsing his every word and knowing glance, but what biologist can really muster any interest in the conclusion of that discussion? It will tell us nothing about Comparative Psychology, because all we have is an inadequate sample size about an animal living in a cage in a lab, interacting mostly with humans. But perhaps the Language Scientists love all the hullabaloo for inspiring passionate discussions about whether and how any species other than humans can use language. Perhaps training Alex had some purpose after all, but it was not science.



Monday, July 16, 2007

Is the triple screen for Down Syndrome right for you?

What is a patient to do? The number of available medical treatments for every condition or disease under the sun appears to be growing exponentially for those of us fortunate enough to live in a prosperous country such as the United States. Paradoxically, one of the causes of our skyrocketing health insurance costs is the continuing appearance of new procedures, tests, and drugs which are supposed to improve our health, but have the side effect of squeezing many Americans out of health insurance altogether.

Those of us who have the dilemma of choice about our health care should indeed be grateful, but the security of health insurance can be quickly overrun with confusion and stress about health care decisions, which are increasingly made by patients rather than doctors. In this age of ever new technologies and drugs, a primary care physician cannot hope to be up to speed on all the remedies available for all diseases and conditions, even those which do not require a specialist. Even doctors that try to keep up must sort through a medical literature that is sloppy and misrepresented.

An example of this is in prenatal care. An OB/GYN provides a dizzying array of tests and procedures for those concerned about the health of their unborn children, while often not having the time to fully understand the implications for your decision-making of the tests that they offer. The blood test known as the "triple test" or "triple screen," seems now to be a prenatal standard. The test detects levels of three maternal blood components whose quantities vary in a way that correlates somewhat with the condition known as Down Syndrome (DS), in which a person has an extra copy of chromosome 21 (which causes many varied mental and physical impairments). The older the mother, the more likely that DS will occur (although a small proportion of cases do come from the father). For example, the risk of DS is roughly 1 in 365 for a 35-year-old mother, versus 1 in 120 for a 40-year-old mother (reports of these proportions vary depending on the data source).

In the best case scenario, the triple test will be offered with the doctor making clear that its variability results in both false positive results (5-10%), in which the test will indicate Down Syndrome when the baby is normal, and false negative results (10-15%), in which the test will indicate a normal baby when it actually does have DS. Unfortunately, most people - especially average patients, and often even the doctors themselves - do not know how to weigh these probabilities, and therefore do not truly understand what the test is telling them. When a patient has the triple screen done simply as a matter of routine, or because the doctor suggested it, but has not yet decided whether or not to undergo amniocentesis (the only definitive prenatal screen for several birth defects including DS), or whether or not to continue a pregnancy with an affected fetus, a false positive result can add a lot of stress to the already stressful condition of pregnancy itself.

Fortunately, one can calculate the various probabilities implied by the different possible test results using a probability tree, and make a truly informed decision about whether to have the test done or not. In the figure below are example calculations for both a 35-year-old and 40-year-old mother. Note that the numbers are probabilities, rather than percentages (.9975 = 99.75%). (Click to enlarge in a new window.)
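For readers who would rather compute than draw, the same tree can be sketched in a few lines of code. This is a rough illustration, not a reproduction of the figure: the function name is made up, the 5% false positive and 10% false negative rates are assumed from the low ends of the ranges quoted above, and each rate is read as conditional on the baby's true status (for example, the false positive rate is taken to be the chance of a positive test given a normal baby).

```python
# Illustrative probability tree for the triple screen.
# ASSUMPTIONS: false positive rate = P(positive | no DS) = 0.05,
#              false negative rate = P(negative | DS)    = 0.10,
#              priors of 1 in 365 (age 35) and 1 in 120 (age 40) from the post.

def triple_screen_tree(prior, false_positive_rate, false_negative_rate):
    """Return the chance of a positive test and the chance of DS after each result."""
    # The four leaves of the tree (joint probabilities).
    ds_and_positive = prior * (1 - false_negative_rate)
    ds_and_negative = prior * false_negative_rate
    normal_and_positive = (1 - prior) * false_positive_rate
    normal_and_negative = (1 - prior) * (1 - false_positive_rate)

    return {
        "p_positive": ds_and_positive + normal_and_positive,
        "p_ds_given_positive": ds_and_positive / (ds_and_positive + normal_and_positive),
        "p_ds_given_negative": ds_and_negative / (ds_and_negative + normal_and_negative),
    }


age_35 = triple_screen_tree(prior=1 / 365, false_positive_rate=0.05, false_negative_rate=0.10)
age_40 = triple_screen_tree(prior=1 / 120, false_positive_rate=0.05, false_negative_rate=0.10)

for label, result in (("35-year-old", age_35), ("40-year-old", age_40)):
    print(label)
    for name, value in result.items():
        print(f"  {name}: {value:.4f}")
```

With these assumed rates, a positive screen still leaves the chance of Down Syndrome at roughly 5% for the 35-year-old and about 13% for the 40-year-old, which is the point: most positives are false alarms. Changing the rates to the higher ends of the quoted ranges shifts the numbers but not that conclusion.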



Because the false positive rate is so high for this test (I used the conservative end of the scale for these calculations), every woman doing this test should decide before getting the results back what she will do if they are positive. If she is unwilling to do amniocentesis under any circumstances, she should not get this test because it could create undue stress if it comes back positive. If she is willing to do amniocentesis (bearing in mind that the procedure carries its own probability of miscarriage, from 0.5% to 2%), she should decide if a negative result would be satisfying enough to her to then choose not to do amniocentesis. If certain knowledge is paramount above all, women would be wise not to waste time and money with the blood test, and simply go ahead with amniocentesis.

It is tempting of course to do the test to reassure oneself that everything is normal, but a negative result does not completely rule out Down Syndrome, although it greatly reduces the probability. Another caveat is to note that all of these probabilities are estimates. False negative and positive rates vary with exact stage of the pregnancy, and different data sources give slightly different probabilities for DS based on age.

In a perfect world, every obstetrician would have a chart in his or her office with the probabilities of a Down baby based on age and the positive or negative results of the triple screen. Based on my own anecdotal evidence, they do not, and are thus unable to fully advise patients on the implications of this test.

It is difficult to navigate the world of medical decisions these days, but a little research on potential tests and procedures is worth the time. Do not expect your doctor to anticipate all the implications of your test results, with any test. A doctor's failure to do so, however, does not make him or her a bad doctor, just a busy person like you who does not have time to grasp the nuances of every medical treatment or test available.



Monday, June 25, 2007

Much ado about IQ

Moms, you don't need to send your second sons to remedial school just yet.

Once again, Science Magazine is a tool of publicity hounds who trumpet their incredible breakthroughs in press releases, knowing that almost nobody will read - let alone understand - the fine print and discover that their claims have all the foundation of a sand castle.

Petter Kristensen and Tor Bjerkedal (2007. Explaining the relation between birth order and intelligence. Science 316:1717) managed to squeak a one-pager in this week that gives even those of us trained to read scientific papers a run for our money. However, if one delves into the online supplements to the article, and an accompanying paper also newly available(1), it becomes clear that the results are biologically meaningless, despite the media's ever present desire to milk them for controversy. This is why:

First and foremost, this is yet another paper that uses an enormous sample size (over 250,000) to support its conclusions. Those who do not fully grasp the nature of statistics often believe that this makes results more credible, when in fact, it makes them less so. This is because in huge sample sizes, any tiny difference between groups is amplified to the point where it is likely to be statistically significant. However, statistical significance does not necessarily imply biological significance, especially when large sample sizes are required to identify a statistical effect. The authors of these papers themselves give the best demonstration of the fallacy of this technique when they cite a previous Science paper(2) that reached exactly the opposite conclusion using 400,000 Dutch men. If they repeat the study tomorrow using a different huge population, they could easily find the results reversed again.

If birth order has no correlation with IQ at all, then in a randomly selected Norwegian family, there is a 50% chance that the first son has a higher score than the second son. My statistician colleague did some rough calculations based on the data in the Science paper, and concluded that it shows that in a randomly selected family, there is a 52% chance the first son scores higher than the second son. The statistical standard for biological significance in most contexts is 95%. For a first son to have a 95% chance of scoring higher than a second son, the average difference in IQ points would have to be 80. (As far as I know, there is no data showing that second sons are more likely to be retarded.)

In addition, the Norwegian soldiers did not take a standard IQ test, which has a population mean of 100. They took a test given by the Norwegian military which gives single-digit scores of 1-9, with a mean of 5. The scores are therefore much less precise than those from an IQ test, and a difference of 3 IQ points after the data transformation corresponds to a difference of 0.25 point on the Norwegian test. If two brothers both have a score of 6, which one is a quarter point smarter?

The media love these stories and will continue to broadcast them, analyze them, and give people complexes about them forever. But it is disheartening that Science, an internationally respected journal that scientists throw elbows to get published in because of the instant fame it brings, is in it purely for the publicity as well. People should no longer assume that a paper in Science must be actual science.


(1)Bjerkedal T., Kristensen P., Skjeret G.A., and Brevik J.I., 2007. Intelligence test scores and birth order among young Norwegian men (conscripts) analyzed within and between families. Intelligence doi:10.1016/j.intell.2007.01.004 (in press)

(2)Belmont, L., and Marolla, F.A., 1973. Birth order, family size and intelligence. Science 182:1096-1101.



Friday, June 15, 2007

The misplaced Avandia hysteria

Lawyers are already circling like sharks over the recent concern about the drug rosiglitazone (brand name Avandia, produced by Glaxo-SmithKline) for treatment of type 2 diabetes. The concern was precipitated by an early press release of a paper (Nissen and Wolski, 2007. Effect of rosiglitazone on the risk of myocardial infarction and death from cardiovascular causes. New England Journal of Medicine 356:2457-2471) over three weeks before its actual publication date, which was yesterday (June 14). This means yesterday was the first day any of us except the reviewers of the paper and editors of the New England Journal of Medicine have had a chance to evaluate the actual data, rather than statements made by the authors and editors. Congressional hearings, at which a doctor said he was pressured by Glaxo-SmithKline not to reveal his concern about Avandia's safety, were underway well before this publication date.

Meta-analyses are studies that statistically analyze the results of all previous trials together to see if there is an overall trend that was not picked up in each individual trial. What the authors of this meta-analysis are saying is that based on 42 different trials that were done at various times, for different reasons, people taking Avandia are more likely to have heart attacks than people not taking it. There are several statistical problems with meta-analyses, but the authors did the best they could, using a test that should provide reasonably accurate results with the data they had, in which heart attacks were rare.

To put that rarity in perspective, their data overall showed that for a minimum 24-week period, 86 out of 15,565 diabetes patients taking Avandia had heart attacks, versus 72 out of 12,282 not taking the drug (this was a hodge-podge pool consisting of placebo groups, and groups taking various other diabetes drugs). This translates to 0.55% (5.5 out of a thousand) of those on Avandia having heart attacks, and 0.59% of those not on Avandia having heart attacks -- a slightly higher percentage. Of course, in a proper meta-analysis one cannot simply pool all the numbers and re-analyze them together. The statistical procedure the authors used did reveal a higher risk of heart attack in the group taking Avandia, but the difference remains slight.
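The crude percentages are easy to check. The snippet below only redoes the pooled arithmetic from the counts quoted above; the variable and function names are made up for the example, and, as already noted, pooling the raw counts like this is not how a proper meta-analysis is done, so the crude ratio is not the authors' result.

```python
# Crude (pooled) heart attack rates from the counts quoted in the post.
# NOTE: simple pooling is shown for perspective only; it is NOT the
# statistical procedure used in the meta-analysis itself.
avandia_events, avandia_patients = 86, 15_565
control_events, control_patients = 72, 12_282


def crude_rate(events: int, patients: int) -> float:
    """Fraction of patients in a group who had a heart attack."""
    return events / patients


avandia_rate = crude_rate(avandia_events, avandia_patients)
control_rate = crude_rate(control_events, control_patients)
crude_ratio = avandia_rate / control_rate

print(f"On Avandia:     {avandia_rate:.2%}")
print(f"Not on Avandia: {control_rate:.2%}")
print(f"Crude ratio (Avandia / not): {crude_ratio:.2f}")
```

The raw counts (86 versus 72) look alarming only until they are divided by the very different group sizes.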

One fallacy people commit in statistical analyses is to believe that the larger the data set, the more meaningful the result is. But in general, the opposite tends to be true. This is because the only way the authors could find a (barely) statistical difference in the two groups was to have a sample size of nearly 28,000. If you need a sample size that big to find a real difference between the two groups, the difference is not nearly as serious as if you had found it in a sample of a few hundred people. One gets the distinct impression that this paper was a fishing expedition, an impression that is not undermined by the disclosure that the first author receives research support from the following seven drug companies that are not Glaxo-SmithKline: Pfizer, AstraZeneca, Daiichi Sankyo, Roche, Takeda, Sanofi-Aventis, and Eli Lilly.

What makes the potential hysteria surrounding this drug especially egregious is the unscrupulous press release and author interviews three weeks ago that did not reveal the unbalanced sample sizes of the two groups:
In their analysis, Nissen and Wolski pooled data from 42 studies that have been conducted on the drug. Among more than 27,843 patients in the relatively brief studies, 86 of those taking Avandia had a heart attack, compared with 72 of those not taking the drug.

The implication of course is that overall, people taking Avandia were more likely to have heart attacks, when in the pooled numbers that is absolutely false: the Avandia group was simply larger.

But even assuming the results are completely valid, there is still the huge problem of the lumping of people with all sorts of medical history in these groups. The authors state themselves in the discussion:
The FDA-approved rosiglitazone product label reports a mean increase in low-density lipoprotein (LDL) cholesterol of 18.6% among patients treated for 26 weeks with an 8-mg daily dose, as compared with placebo.

This of course means that if you have a problem with high cholesterol, this drug is not for you, but your doctor knows this already. But what were the cholesterol levels and heart conditions for those having heart attacks in the meta-analysis? It is completely unknown.

Admittedly, the authors did not have access to most of the original data sets, many of which came from Glaxo-SmithKline clinical trials (though one has to wonder if they tried asking the company for the data), which means they cannot present a breakdown of heart attack victims by a priori risk. But presenting these results without that data is at best irresponsible. It has become very much clearer over time that individuals vary and a particular drug that works wonders for one person might be fatal for another. The implication of the congressional hearings and media coverage of this one paper with its weak result is that we are going to make the decision to keep that drug from those that it will benefit, rather than take the trouble to learn enough to figure out which few people will be harmed by it, and keep them off of it.
