Communicating uncertainties in GDP data

In recent years there has been considerable controversy over the accuracy and reliability of India's GDP estimates. In this article, Amey Sapre and Rajeswari Sengupta contend that much of this confusion stems from a lack of publicly available information about the sources and margins of error in the data, and about data quality. Communicating the limitations of official data to the public in a credible and transparent manner may reduce the uncertainty surrounding GDP estimates.

"Perhaps the greatest step forward that can be taken, even at short notice, is to insist that economic statistics be only published together with an estimate of their error. Even if only roughly estimated, this would produce a wholesome effect. Makers and users of economic statistics must both refrain from making claims and demands that cannot be supported scientifically. The publication of error estimates would have a profound influence on the whole situation." -- Oskar Morgenstern, 1962.

Good-quality data on gross domestic product (GDP) are a crucial input for effective policymaking. In India, while policy outcomes are debated extensively, the quality of the official GDP data that feed into those policies is rarely evaluated. Public discussion following the release of GDP estimates by the Central Statistics Office (CSO) focusses on growth rates or sector-specific performance. However, this singular focus on growth rates is problematic, as it overshadows the importance of assessing the quality of the underlying data.

GDP data are subject to multiple revisions over time before any estimate is considered final. However, periodic revisions are not errors: they result from a systematic process of updating the estimates as fresh data become available. Errors, on the other hand, are inherent in GDP estimates and stem from a variety of problems in collecting data through administrative systems and surveys. They may also arise from imprecise measurement, approximations, incorrect reporting, or outdated sampling frames. Such errors can introduce uncertainty into the GDP estimates at every stage of the revision cycle.

In this article, we take a closer look at possible sources of uncertainty in GDP data that are often overlooked when the estimates are interpreted. Since GDP data are used as inputs in policymaking, understanding these uncertainties from the perspective of both the data user and the statistical agency is critically important.

Understanding uncertainties in GDP data

Concerns about errors and uncertainties in national income statistics are not new. Morgenstern's (1962) seminal work on the accuracy of national income estimates remains a vital source for understanding the limitations of national income data. Morgenstern identifies three kinds of error.

Errors in the basic data (production or expenditure) arising from sampling difficulties or from mass enumerations. These are the usual statistical sampling errors or difficulties in taking a proper count.
Errors that result from the process of fitting the available statistics into the aggregate conceptual framework. Primary data are often not in a form that can be used directly to estimate GDP or its components, and the adjustments needed to make them usable may reduce the quality of the final estimates.
Errors that are introduced when filling gaps for sectors and years where data are unavailable or only partially available. Missing data are either filled using methods such as interpolation, extrapolation, or imputed weights, or are blown up (scaled up) using some statistical technique.
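A minimal sketch of the gap-filling step behind Morgenstern's third kind of error follows. It assumes the simplest of the methods named above, linear interpolation between known years and linear extrapolation beyond them; the function name `fill_gaps` and the series are hypothetical and purely illustrative, not official data or the CSO's methodology. The filled values rest entirely on an assumed straight-line trend, which is exactly where this kind of error enters.

```python
from typing import Optional


def fill_gaps(series: dict[int, Optional[float]]) -> dict[int, float]:
    """Fill missing years (None) in a hypothetical annual series.

    Interior gaps are filled by linear interpolation between the nearest
    observed years; leading or trailing gaps are filled by linear
    extrapolation from the two nearest observed years. Needs at least two
    observed years.
    """
    years = sorted(series)
    known = [y for y in years if series[y] is not None]
    if len(known) < 2:
        raise ValueError("need at least two observed years")

    filled: dict[int, float] = {}
    for y in years:
        value = series[y]
        if value is not None:
            filled[y] = value  # observed: keep as reported
            continue
        before = [k for k in known if k < y]
        after = [k for k in known if k > y]
        if before and after:      # interior gap: interpolate
            y0, y1 = before[-1], after[0]
        elif before:              # trailing gap: extrapolate forwards
            y0, y1 = before[-2], before[-1]
        else:                     # leading gap: extrapolate backwards
            y0, y1 = after[0], after[1]
        slope = (series[y1] - series[y0]) / (y1 - y0)
        filled[y] = series[y0] + slope * (y - y0)
    return filled


if __name__ == "__main__":
    # Illustrative numbers only (arbitrary units), not official data.
    print(fill_gaps({2011: 100.0, 2012: None, 2013: 110.0, 2014: 112.0, 2015: None}))
```

Here 2012 is filled with 105.0 and 2015 with 114.0. If actual 2015 output fell short of that trend, the extrapolated figure would be off by the shortfall, and nothing in the filled series would signal it.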

More recently, Manski (2015) has distinguished measurement error from uncertainty. He describes three types of uncertainty.

Transitory statistical uncertainty: This kind of uncertainty arises from the time lag in data collection. Since data collection is time-consuming, statistical agencies first release preliminary estimates and later update them as new data become available. The typical revision cycle of GDP estimates in India falls in this category. The uncertainty here stems from the fact that the reason, direction, and magnitude of revisions may be unknown or unpredictable, since they depend on data that become available gradually. Ideally, the uncertainty diminishes as more data become available.
Permanent statistical uncertainty: This kind of uncertainty arises from the incompleteness and inadequacy of the data collected, and it does not diminish with time. It may stem from finite sample sizes, non-response in surveys, or respondents providing inaccurate data. In the case of Indian GDP estimates, all data collected through surveys, such as the employment survey and the Annual Survey of Industries, are prone to this kind of uncertainty.
Conceptual uncertainty: This kind of uncertainty arises from a lack of understanding of the information that official statistics provide about economic concepts. Examples include conceptual problems in definitions, such as those of value addition, final expenditure, and capital formation, or in methods such as the commodity flow method.

The communication of uncertainties in data

In the Indian context, the history of the National Accounts shows that the First National Income Committee (Government of India, 1954) led the way by documenting the sources and margins of error. The Committee provided details of the errors that could be quantified, and of the limitations of the data in constructing statistically valid margins of error.

The importance of identifying the sources of error in national income statistics and communicating their influence on the final estimates was highlighted by Datta Roy Choudhury (1995). In her book on National Income Accounting, she listed three main factors that lead to errors in national income estimates. These overlap with the kinds of errors and uncertainties identified by both Morgenstern (1962) and Manski (2015).

Errors arising from conceptual limitations and efforts made to fit available statistics to the aggregate conceptual framework. These errors are a combination of Manski's conceptual uncertainty and Morgenstern's second kind of error.
Errors arising from coverage and quality of basic data used for estimation. This error is similar to Manski's permanent statistical uncertainty, and Morgenstern's statistical sampling and enumeration related error.
Errors resulting from adjustments to estimates when no current information is available. These correspond to Morgenstern's third kind of error.

Over the years, however, the task of identifying the sources of error and communicating their influence on the final estimates has not received the attention it requires.

Communicating errors and the resulting uncertainty in data requires efforts from the statistical agency on two fronts: (i) developing metrics to assess the accuracy and reliability of the data, and (ii) adopting a systematic communication policy to disseminate information about the sources and magnitudes of errors. Developing metrics to assess data quality is a long-drawn process, and it requires historical data on revisions and errors. While error magnitudes tell us the extent to which aggregates might be underestimated or overestimated, data on revisions tell us how the numbers have changed at each stage of data availability.

Countries such as the US, the UK, and Denmark have made substantial progress in developing revision metrics. In India, the scope for developing such metrics is limited because vintage data (the successive releases of each estimate) are not available. Sapre and Sengupta (2017) provide a general survey of revision metrics and approaches in the Indian context.
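To show why vintage data matter, the sketch below computes two summary statistics that are commonly used in revision analysis: the mean revision (the average of later estimate minus first estimate, a measure of systematic bias) and the mean absolute revision (the average size of revisions regardless of direction). These particular metrics, the function name `revision_metrics`, and the growth figures are illustrative assumptions, not the authors' method or actual Indian data. Neither metric can be computed unless each vintage of an estimate is kept.

```python
from statistics import mean


def revision_metrics(first: list[float], later: list[float]) -> tuple[float, float]:
    """Return (mean revision, mean absolute revision) between two vintages.

    `first` holds the initial estimates and `later` the revised estimates of
    the same periods, in the same order; revision = later - first.
    """
    if not first or len(first) != len(later):
        raise ValueError("vintages must be non-empty and of equal length")
    revisions = [new - old for old, new in zip(first, later)]
    mean_revision = mean(revisions)                       # sign shows direction of bias
    mean_abs_revision = mean(abs(r) for r in revisions)   # size, ignoring direction
    return mean_revision, mean_abs_revision


if __name__ == "__main__":
    # Hypothetical growth rates (%) for four periods: first release vs. a later vintage.
    first_release = [7.0, 6.5, 8.0, 7.0]
    later_vintage = [7.5, 6.0, 8.5, 7.5]
    print(revision_metrics(first_release, later_vintage))
```

With these hypothetical figures the mean revision is 0.25 percentage points and the mean absolute revision is 0.5. Reporting both is useful because a mean revision close to zero can conceal large revisions in opposite directions that cancel out, which the mean absolute revision would still reveal.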

Conclusion

Uncertainty in official GDP data can have unintended consequences for macroeconomic policy decisions and can limit how data users are able to rely on the estimates. In the absence of any information about potential errors, users may incorrectly assume that the errors are small or negligible and take the estimates at face value. If there is a lack of clarity about how the current estimates are likely to change and about the magnitude of possible errors, any analysis of the economy's health will be ambiguous.

An important step towards building awareness about data quality is to provide information on the magnitude of errors and the extent of the uncertainty inherent in official GDP estimates. In recent years there has been considerable controversy in popular discourse about the accuracy and reliability of India's GDP estimates. Much of this confusion stems from the lack of publicly available information about the sources and margins of error, and about data quality. Reviving past communication practices could give users a better understanding of national income statistics and inform them about the limitations of official data in a credible and transparent manner. This would help to reduce the uncertainty around GDP estimates.