In September 2018, Dr. Rachel Urrutia, along with five colleagues in the field of women’s reproductive health, published a systematic review of studies on the effectiveness of Fertility Awareness-Based Methods (“FABMs”) for preventing pregnancy. Read more about this landmark study on the Reply blog here. Since the study was published, some online responses have raised questions, which Dr. Urrutia and her co-authors address in the letter below.
July 22, 2019
Since the publication of our systematic review in August 2018, several internet critiques have been published (on the FACTS and Natural Womanhood blogs) that may misrepresent the specifics of our methodology and our findings. Our goal in conducting and publishing our review was to advance the science of fertility awareness-based methods. We wanted to acknowledge and summarize the body of work that had been previously conducted, as well as to facilitate a better understanding of the science, both within and beyond the scientific community. As such, we would like to respond to some of the critiques outlined in the internet publications above.
In preparation for our work, we read numerous key articles and reviews of FABM effectiveness. With respect to the review by Manhart et al. (the FACTS group), there were multiple differences in how we designed our review as compared to the Manhart review. First, the Manhart et al. review used a single database and limited both the years covered by the search and the language of the articles (English only). Our goal was to be as comprehensive as possible, so we evaluated studies from 5 databases and in 4 different languages. We looked at studies as far back as the inception of each database and were also able to evaluate several studies published after the Manhart et al. review. We also reviewed the reference lists of key articles to identify additional studies. This resulted in 53 included studies (versus 29 in the Manhart review).
A second difference was in the overall design of our quality framework. Though we shared some quality indicators (or partial quality indicators) in common with the review of Manhart et al., our framework was different in several key ways. The Manhart et al. review ranked studies according to a SORT taxonomy and then summarized the findings from only the 10 studies that met “SORT criteria of evidence Level 1 (score of 40 or more out of 56 points).” By contrast, we developed our quality ranking tool on a framework from the U.S. Preventive Services Task Force in which studies needed to adhere to a level of quality in order to be ranked high, moderate, or low quality. This means that if no studies adhered to the quality framework, it would be possible for there to be no studies ranked high quality. This is in fact what happened. None of the studies that we subsequently evaluated were ranked “high quality,” and 21 of 53 were ranked “moderate.” We are pleased that our quality ranking could be considered “extremely rigorous,” and we spent a great deal of time creating a ranking system that we thought was relevant and attainable. Though studies had to meet multiple criteria to be considered high quality, we did not expect that the studies would be perfect. We chose to exclude several additional criteria that we thought might make a high ranking unattainable. For example, we felt studies should measure emergency contraceptive use and take this into account in the analysis. However, since emergency contraception is a relatively new phenomenon, we did not want to hold studies to this standard. Also, it is important to note that our review recommends that clinicians advising patients interested in FABMs share the effectiveness estimates identified in the “moderate quality” studies in this review, with appropriate cautions.
Third, we described the effectiveness estimates from all of the “moderate” or higher quality studies. This allowed us, for most methods, to present a range of effectiveness estimates, rather than focusing on a single effectiveness estimate for each method from one high quality study. This is important because typical use pregnancy rates vary extensively with population differences (e.g., age, motivation, and education). Therefore, presenting only one typical use estimate for each method could be misleading. It is more useful for women and couples to understand that there is a range of typical use effectiveness that can vary between populations and individuals. As an extension of the initial review, we are currently working on an analysis to investigate the impact of population characteristics on pregnancy rates in more detail.
Regarding statements made about how our study will be interpreted by others: our brief overview statement in the systematic review—that “Prospective studies evaluating the effectiveness of specific fertility awareness-based methods to avoid pregnancy are of low to moderate quality; effectiveness estimates vary between and among methods”—was required by the editors of our paper. This is an accurate statement. The purpose of our article was not to convince more people to use FABMs. Our goal was to provide the most transparent information about effectiveness possible to support people in making their own informed decisions. We hope that our review will give more people the information needed to decide whether or not to use an FABM, and which one would be most suitable for them. In the meantime, we believe more data are needed in diverse populations before we can fully understand effectiveness, particularly in comparisons of effectiveness between FABMs and other contraceptive methods. Because of this, we are concerned about the critique appearing in the online FACTS commentary stating, “After all, about 10 ‘moderate-quality’ studies consistently demonstrate the potential for effectiveness rates rivaling hormonal contraceptives with none of the side effects and at significantly lower costs to users and insurers.” The comparison to effectiveness rates for hormonal contraceptives is not accurate or appropriate, nor is it one we made in our review. The most recent pregnancy estimates per 100 women over one year for hormonal methods (implants, hormonal IUDs, DMPA, ring, patch, and pills) range from 0.1 to 7 in typical use, and from 0.1 to 0.3 in perfect use—based on a large body of evidence for each. More importantly, these estimates triangulate with those from retrospective population-based studies like the National Survey of Family Growth. These studies have limitations but do provide “real world experience” estimates that may be more generalizable.
This type of triangulation to population-based data is currently impossible for specific FABMs because the number of users of each method is so low. However, estimates for most FABMs (setting aside sympto-thermal and Marquette) in the moderate quality studies in our review ranged from 9 to 33.6 in typical use, and from 1.1 to 12.1 in perfect use. The estimates for Marquette and Sensiplan are lower, and our review clearly states that these methods may be more promising, but they need to be triangulated with additional data before definitive statements can be made. Therefore, we believe that comparisons between FABMs and other methods at this stage must be made with appropriate caution and nuance. The cost argument is also incomplete and inappropriate: some FABMs are quite costly and, under the ACA, many contraceptive methods are free. To make an evidence-based statement on cost, a cost-benefit analysis would need to be done, which was outside the scope of our review.
Regarding the specific criterion of excluding cycles with no intercourse, leading experts in the field have recommended this quality criterion for nearly 3 decades. Trussell and Grummer-Strawn stated in their 1990 publication, “Women simply are not exposed to the risk of contraceptive failure unless they have intercourse, so cycles characterized by no exposure to the risk of pregnancy should be removed.” Not adhering to this criterion can lead to “immortal time bias.” If such a criterion is not used, a contraceptive method with exactly the same effectiveness would appear far more effective in a sample of women or couples who have sex 6 times a year than in a sample who have sex 6 times a month, because the baseline chance of pregnancy in each group is very different. Therefore, using this criterion helps to establish a more accurate baseline when comparing different studies. The authors of that landmark publication advocate removing cycles with no intercourse from the analysis for all methods of contraception or family planning, not just FABMs. Systematically reviewing whether studies of other methods excluded all cycles in which intercourse did not occur would be another enormous project that was not part of the scope of our review. If we were tasked with ranking the quality of studies of another method, we would certainly hold them to the same standard as we held FABMs to in this study. We are aware of several studies of other contraceptive methods where this process has been followed. More importantly, the FDA has recently required investigators to exclude cycles without intercourse for other types of family planning methods.
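The dilution effect described above can be illustrated with a minimal arithmetic sketch. All numbers here are invented for illustration only (they are not from the review), and the `pearl_index` helper is a hypothetical simplification that assumes 13 cycles per woman-year:

```python
# Illustration of why cycles with no intercourse should be excluded:
# keeping unexposed cycles in the denominator makes the same method
# look more effective than it is. All figures below are hypothetical.

def pearl_index(pregnancies, woman_cycles):
    """Pregnancies per 100 woman-years, assuming 13 cycles per year."""
    woman_years = woman_cycles / 13
    return 100 * pregnancies / woman_years

exposed_cycles = 1300   # cycles in which intercourse occurred
pregnancies = 10        # same number of pregnancies in both samples

# Sample A: intercourse in every observed cycle, so the denominator
# contains only exposed cycles.
rate_a = pearl_index(pregnancies, exposed_cycles)

# Sample B: the same method and the same per-exposed-cycle risk, but
# an equal number of no-intercourse cycles is (incorrectly) kept in
# the denominator.
rate_b = pearl_index(pregnancies, exposed_cycles + 1300)

print(rate_a)  # 10.0 pregnancies per 100 woman-years
print(rate_b)  # 5.0 -> the method spuriously appears twice as effective
```

The underlying risk per exposed cycle is identical in both samples; only the bookkeeping differs, which is why the authors argue for removing unexposed cycles before computing effectiveness.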
For FABMs, particularly in the context in which we can’t triangulate estimates with generalizable data such as that from the National Survey of Family Growth, we maintain that studies should have excluded cycles with no intercourse in order to produce estimates that could be as comparable as possible with those from other studies that have done the same. We acknowledge that many users of an FABM might abstain from intercourse during a subset of days during each menstrual cycle (in which case their information would not be excluded from calculations) and, in a small proportion of cases, may abstain from intercourse for an entire cycle or a few consecutive cycles at the start of using the method. We look forward to continued academic discussion about this unique feature of FABMs versus other contraceptive methods. However, for the purposes of estimating effectiveness, we feel that it is important to follow the methodology recommended by the FDA and contraceptive effectiveness experts for the reasons described above.
Regarding the ranking of specific studies in our review, at least one critique stated that only one study met the criterion of excluding cycles with no intercourse (the WHO Trussell reanalysis). However, several other studies did meet that specific part of the quality criterion, including Arevalo 2002, Arevalo 2004, and Bonnar 1999, as well as the Trussell reanalysis of WHO 1981. Furthermore, giving credence to the idea that this criterion is important, a few other studies (which did not meet our criteria) gave a nod to this concern: Johnston 1979 made a risk-based adjustment for cycles of low risk, Bartzen 1967 excluded cycles of women whose husbands were away with the military, and Doring 1967 implied (but did not explicitly state) that cycles without sexual intercourse were excluded. Finally, several studies took care to exclude people who identified themselves as not sexually active but never stated clearly whether this applied on a cycle-by-cycle basis, while still indicating that this is a criterion that should be considered. Perhaps most importantly, failing to meet this criterion did not “disqualify” ANY remaining studies from consideration as high quality. All of the studies ranked moderate quality fell short of the “high” quality bar on at least 2 quality criteria and, in most cases, more than 2. This is an important point that seems to have been misrepresented by the FACTS blog and in further coverage of the FACTS blog by Natural Womanhood.
We recognize that there can be legitimate differences of opinion about interpreting the body of evidence regarding the effectiveness of FABMs, and we welcome continued scientific discourse. At the same time, we desire that our work and positions be represented accurately, which was the impetus for this letter.
Rachel Peragallo Urrutia, MD, MS
Chelsea Polis, PhD
Elizabeth Jensen, PhD
Margaret Greene, PhD
Emily Kennedy, MA
Joseph Stanford, MD, MSPH