Wikileaks War Data Reveal Underreporting of Iraqi Civilian Casualties
The recent release of almost 400,000 secret US military files on the war in Iraq through Wikileaks has attracted wide media coverage.
These documents, officially known as the significant acts database (SIGACTS), add new insights to the ongoing debate on how many casualties have occurred in Iraq since the beginning of the war. The unofficial Iraq Body Count (IBC), which tracks civilian casualties in Iraq based on press reports and administrative records, has begun comparing its own data to the deaths documented in SIGACTS. In a commendable effort, IBC is recoding the SIGACTS data to correct coding errors and to match it with its own database. IBC has estimated that SIGACTS describes 15,000 civilian deaths previously undocumented by IBC (BBC’s report is here). Most of these previously unknown deaths occurred in small incidents, in which 1-3 people were killed. In academic articles, on blogs and at conferences about quantifying war casualties, there has been debate about whether deaths that occur in small incidents tend to be systematically underreported in IBC’s press sources. This new evidence is consistent with the hypothesis that IBC’s press data underreport small events in Iraq.
In reaction to the release of SIGACTS data, Les Roberts of Columbia University wrote that both IBC and SIGACTS are ‘systematically prone to under-report deaths’ and that both counts are likely to be well below the death tolls he and his colleagues previously estimated. Roberts has cautioned against the use of press sources for the purpose of counting casualties, as they fail to report a significant proportion of violent events. Jacob Shapiro of Princeton University presents a parallel argument. He reminds readers that SIGACTS includes every death that was recorded by the Multi-National forces in Iraq, not every death that occurred during the ongoing war. SIGACTS’s reporting standards changed over time and the reporting procedure varied across units. In particular, IBC has noted that the SIGACTS data do not include any civilian casualties from 2004 operations in Fallujah.
This underreporting is not surprising. Both SIGACTS and the press reports and other records compiled by IBC are convenience samples, i.e., they are not generated by a random selection process. Both SIGACTS and IBC are well-run, careful projects, but even very good direct observations that collect information on violence systematically will tend to accumulate data that are unrepresentative of the actual conflict patterns they are attempting to uncover. Lists of data that fall into this category are thus unsuitable for drawing inferences about any population apart from the list itself. Simple lists of deaths are inadequate to characterize an entire country in the midst of a highly politicized war.
The hypothesized reasons for under-reporting in Iraq mentioned by Roberts and Shapiro are not unique to Iraq or the IBC and SIGACTS databases: changing reporting patterns across time and space due to organizational changes (as was the case with SIGACTS), better coverage of events in urban areas (as was the case for both IBC and SIGACTS in Baghdad), and varying levels of victim visibility, depending on victim and perpetrator characteristics, are factors that influence almost every database that collects information on violent events. In our experience, reporting and recording bias varies dramatically and can rarely be distinguished from the actual patterns of violence. In the case of Iraq it is therefore important to keep in mind that a) no list is (or will be) complete and b) new, independent sources of data are needed to understand the reporting biases of any single source.
A possible solution to overcome the bias of single lists is to use a statistical method known as multiple systems estimation (MSE), which can provide estimates for those cases that weren’t recorded in any list. MSE, also known as the capture-tag-recapture method, corresponds to the idea that each death has the possibility of being recorded by one, two, or more data sources. Depending on the degree of overlap of cases between the sources, the number of deaths that were not reported to any source will differ. This method has been used for estimating large-scale killings in Guatemala, Kosovo, Perú, Srebrenica, East Timor, and Colombia, among others.
For the two lists available for Iraq (SIGACTS and IBC), a simple two-system MSE model (the Lincoln-Petersen estimator) can be applied. For the period covered by SIGACTS (2004-2009), IBC reports 15,000 civilian deaths found only in the SIGACTS data, 27,000 recorded only by IBC’s press sources, and 64,000 recorded in both, for a total of 106,000. The two-system estimate is slightly higher: (15k+64k)*(27k+64k)/(64k) ≈ 112,000 civilian deaths, including an estimated 6,000 deaths recorded by neither source. This estimate assumes that the collecting patterns of IBC and SIGACTS were independent of each other, which is unlikely.
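The arithmetic above can be reproduced in a few lines. The following is a minimal Python sketch, not IBC’s or anyone’s official procedure; the function name lincoln_petersen and its argument names are ours, chosen for illustration, and the inputs are the rounded counts quoted in the text.

```python
def lincoln_petersen(only_a: int, only_b: int, both: int) -> dict:
    """Two-list capture-recapture estimate from the three overlap counts.

    only_a: cases recorded only by list A (here: SIGACTS only)
    only_b: cases recorded only by list B (here: IBC only)
    both:   cases matched across the two lists
    """
    if both <= 0:
        raise ValueError("the estimator needs at least one case recorded by both lists")
    total_a = only_a + both            # everything list A recorded
    total_b = only_b + both            # everything list B recorded
    observed = only_a + only_b + both  # distinct cases seen by at least one list
    estimate = total_a * total_b / both
    return {
        "observed": observed,
        "estimate": estimate,
        "unrecorded": estimate - observed,  # estimated cases seen by neither list
    }


# Rounded counts quoted in the text (illustrative inputs, not a new analysis).
result = lincoln_petersen(only_a=15_000, only_b=27_000, both=64_000)
print(result["observed"])           # 106000
print(round(result["estimate"]))    # 112328, i.e. roughly 112,000
print(round(result["unrecorded"]))  # 6328, i.e. roughly 6,000
```

The result is only as good as the independence assumption behind it and the quality of the matching between the two lists.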
Roberts suspects that both sources cover similar cases, mostly coming from the Iraqi Government and focused on Baghdad, and that both under-report small events and incidents of single killings. In statistical terms, the two sources are likely to be positively correlated, which biases the estimate downward. The true number of deaths is thus likely to be larger than 112,000. In order to obtain more accurate estimates of the number of deaths, a third source would be required. Three or more sources would allow us to account for similar (or dissimilar) reporting patterns of the different sources.
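To see why positive correlation pushes the two-system estimate downward, consider a toy calculation. The sketch below is an illustration under invented assumptions, not a model of Iraq: we suppose a hypothetical population of 100,000 deaths split into a "high-visibility" and a "low-visibility" group, where both lists are good at recording the first group and poor at recording the second. All numbers and the function name expected_two_list_estimate are made up for this example. Within each group the lists record independently, but pooling the groups makes them positively correlated overall.

```python
def expected_two_list_estimate(strata):
    """Two-list estimate computed from expected counts.

    strata: list of (deaths, p_a, p_b) tuples, where p_a and p_b are the
    chances that list A and list B record a death in that stratum.
    The lists are assumed independent *within* each stratum.
    """
    total_a = sum(n * p_a for n, p_a, p_b in strata)    # expected size of list A
    total_b = sum(n * p_b for n, p_a, p_b in strata)    # expected size of list B
    both = sum(n * p_a * p_b for n, p_a, p_b in strata)  # expected overlap
    return total_a * total_b / both


# Hypothetical scenario: both lists favor the same kind of event.
correlated = [
    (50_000, 0.8, 0.8),  # e.g. large, urban incidents: well covered by both
    (50_000, 0.2, 0.2),  # e.g. small, remote incidents: poorly covered by both
]
# Comparison scenario: same average coverage, no difference between strata.
homogeneous = [(100_000, 0.5, 0.5)]

true_total = 100_000
print(round(expected_two_list_estimate(homogeneous)))  # 100000: recovers the truth
print(round(expected_two_list_estimate(correlated)))   # 73529: well below the truth
```

When both lists miss the same kinds of deaths, their overlap looks reassuringly large and the estimator concludes that little is missing. A third list with a different coverage pattern is what would allow this dependence to be modeled rather than assumed away.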
The revelation of the Wikileaks war log data has altered our understanding of civilian casualties in Iraq: We now know that there were more small-event casualties than previously thought and that such casualties are underreported. Further information about the ‘new deaths’ revealed in the war logs could improve our understanding with regard to victim characteristics and patterns of violence across time and space. We welcome the important work IBC is doing to correct errors in the SIGACTS coding and to match their database with the SIGACTS data. Perhaps a third source will emerge, and from the three datasets, estimates could be made which would correct for systematic underreporting across types of events, regions, religious sect or period. Hopefully, a better understanding of how violence is reported in Iraq will help us to better correct for reporting bias in casualty figures in other conflict situations.