Forecasting long-run Indian GDP using high-dimensional big data

  • Blog Post Date 06 September, 2023

Debajit Jha

O. P. Jindal Global University

Naveen Kumar

Delhi School of Economics

Dibyendu Maiti

Delhi School of Economics

This article describes an attempt to forecast India's long-run GDP using quarterly data on macroeconomic variables from 1996-2021 and a dynamic factor model to establish long-run trends. It explains how the model controls for exogenous shocks, including rising temperatures and oil prices, as well as changes in monetary and fiscal policies. It suggests that growth will depend either on the implementation of strategies to deal with these exogenous shocks, or on public investment and public service delivery.

While most scholars and planners have been optimistic about the growth acceleration of the Indian economy, especially after the shocks of the global financial crisis and the Covid-19 pandemic, there is reason to be cautious about this optimism. Before the pandemic, the economy was already crippled by a real sector crisis: a significant surge of non-performing assets in the financial sector, rising petroleum prices, and a huge unemployment problem. The lower repo rate and higher budget deficits have further squeezed the capacity to exercise expansionary fiscal and monetary policies to deal with these issues. Several fiscal and reform measures, such as privatisation, disinvestment strategies, and institutional, tax and labour reforms to boost industrial and economic activities, have been tried to counter these problems, but have not yielded much benefit. On top of this, the pandemic exposed, to some degree, the severity of the impact that climate change can have on human society.

Soon after the withdrawal of lockdown restrictions, scholars and policymakers, while observing Azadi-ka-Amritmahotsav (the 75th anniversary of Indian independence), attempted to forecast the recovery path over the relatively long run. In our own attempt (Maiti et al. 2023), we use big data to argue that the economy cannot grow at more than 5% in a business-as-usual scenario. How much the rate can be accelerated would depend on factors which are exogenous to the economy.

Methodological complexity in long-run forecasting

Those who are familiar with long-run forecasting of macroeconomic variables may agree that forecasting long-run growth is a challenging task for at least two reasons: (i) the unavailability of a robust methodology (or methodological complexities) and (ii) the lack of a clear strategy to deal with unobserved exogenous shocks and policy variables. In the literature, the most popular methods for short- and long-run forecasting include autoregressive integrated moving average (ARIMA) models, vector auto-regression (VAR) and the vector error-correction model (VECM). These methods rely on a limited number of variables to keep sufficient degrees of freedom for satisfactory inference. However, a limited set of variables contains little information and may miss information that plays a dominant role in forecasting: the dynamics of macroeconomic variables like GDP depend on various known and unknown variables, most of which are highly correlated. Traditional econometric methods ignore many of these variables and thereby lose a large information set, so the forecast may not be a meaningful exercise. Hence, there is growing interest in using high-dimensional big data to capture as much information as possible for forecasting macro variables.

However, using high-dimensional big data poses another challenge: it may reduce the precision of econometric models. The obvious solution is to apply factor models, which reduce the dimension of the data by condensing it into a few common factors that help explain business cycles to a greater extent. Such methods are increasingly used for forecasting macro variables. The larger the data set, the more information is available to improve the forecasts. As a result, factor-based time series modelling has gained popularity in the economics and finance literature during the last fifteen years because of its ability to model datasets of large dimensions and lengths (see Banerjee et al. 2015, Kurz-Kim 2016).
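As a rough illustration of this dimension-reduction step, the Python sketch below extracts common factors from a panel by principal components. It is a static stand-in for a dynamic factor model, not the estimator used in the study. The data are simulated, and the function name extract_factors, the panel shape (56 series and three factors, borrowed from the dimensions reported in this article) and the noise level are assumptions made for illustration.

```python
import numpy as np


def extract_factors(panel, n_factors):
    """Return (factors, explained_share) for a T x N panel via principal components."""
    # Standardise each series so that no variable dominates because of its units.
    z = (panel - panel.mean(axis=0)) / panel.std(axis=0)
    # SVD of the standardised panel: the left singular vectors scaled by the
    # singular values are the principal-component scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    factors = u[:, :n_factors] * s[:n_factors]
    # Share of total variance captured by the retained components.
    explained_share = (s[:n_factors] ** 2).sum() / (s ** 2).sum()
    return factors, explained_share


# Simulated data only (hypothetical): 100 quarters, 56 series, 3 latent factors.
rng = np.random.default_rng(0)
T, N, K = 100, 56, 3
latent = rng.standard_normal((T, K))
loadings = rng.standard_normal((K, N))
panel = latent @ loadings + 0.5 * rng.standard_normal((T, N))

factors, share = extract_factors(panel, K)
print("factor matrix shape:", factors.shape)
print("share of variance captured:", round(float(share), 3))
```

The explained share is the kind of statistic one would inspect when deciding how many factors to retain.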

The idea behind dynamic factor models is that a few latent factors can accommodate a lot of information across time series. Factor models effectively synthesise large sets of data, allowing the application of advanced models to big data. More importantly, the dynamics that exist in time-series data must be exploited. Among the most advanced methods that do so are the Factor-Augmented VAR (FAVAR) and the Factor-Augmented Time-Varying Coefficient Regression Model (FA-TVCRM). Following this tradition, we attempted to forecast Indian GDP over a relatively extended period using the Factor-augmented Error Correction Model (FECM). This was the first attempt at using this method for the Indian economy; we used quarterly data from Q1 of FY1996-97 to Q3 of FY2021-22. The model first extracts the information from big data using the dynamic factor model and then establishes a long-run relationship using the error-correction model for forecasting (Banerjee and Marcellino 2009, Banerjee et al. 2014). In our application, the dynamic factor model reduces 56 macro variables to three factors that capture more than 80% of the variation. These factors are then included in the error correction model to establish a long-run relationship. The co-integration results show that a statistically significant long-run relationship exists.
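The two stages described above can be sketched as follows. This is a minimal single-equation version (an Engle-Granger style two-step regression on one simulated factor), assumed here for illustration; it is not the FECM estimator of Banerjee and Marcellino (2009) or the specification in Maiti et al. (2023). The data, the function names ols and two_step_ecm, and all parameter values are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_step_ecm(y, f):
    """Two-step error-correction regression of y on a single factor f."""
    T = len(y)
    # Step 1: long-run (levels) regression y_t = c + b * f_t + u_t.
    X_long = np.column_stack([np.ones(T), f])
    b_long = ols(X_long, y)
    ect = y - X_long @ b_long  # error-correction term: deviation from the long run
    # Step 2: short-run regression of the change in y on the lagged
    # error-correction term and the contemporaneous change in f.
    dy, df = np.diff(y), np.diff(f)
    X_short = np.column_stack([np.ones(T - 1), ect[:-1], df])
    b_short = ols(X_short, dy)
    return b_long, b_short


# Simulated data only (hypothetical): a random-walk factor and a series
# tied to it in the long run through a stationary AR(1) deviation.
rng = np.random.default_rng(1)
T = 200
f = np.cumsum(rng.standard_normal(T))
shocks = 0.5 * rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.5 * u[t - 1] + shocks[t]
y = 2.0 + 1.5 * f + u

b_long, b_short = two_step_ecm(y, f)
print("long-run slope on the factor:", round(float(b_long[1]), 3))
print("adjustment coefficient:", round(float(b_short[1]), 3))
```

A negative adjustment coefficient indicates that the series moves back towards the long-run relationship after a deviation, which is what an error-correction mechanism requires.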

The second challenge concerns exogenous and policy shocks. While an exercise in short-run forecasting of macro variables may ignore them, an exercise in long-run forecasting cannot: ignoring these shocks may produce misleading results. Hence, our long-run prediction included a few known exogenous or policy shocks to infer their impacts. For example, rising temperatures may increase the frequency of extreme climatic events, and hence can be expected to limit income opportunities to a certain extent. The strategies adopted to deal with this issue will have implications for the growth momentum. Similarly, crude oil prices have started to rise, generating inflationary pressure in the economy. A number of international agencies and gasoline inventories have already forecast a sharp rise in oil prices over the next couple of decades, and prices are expected to rise even further as a result of geopolitical circumstances (such as the Russia-Ukraine war and commitments under the Paris Agreement to curtail fossil fuel consumption). Moreover, alternatives to petroleum can be costly.

In parallel, the Indian government is undertaking a series of institutional reforms through digitisation, which may improve the efficiency of fiscal capacity and instruments. Moreover, the Reserve Bank of India (RBI) is encouraging more digital transactions and has decided to launch a digital currency to deal with corrupt practices and improve the transmission mechanism of monetary policy instruments – this may reduce the need for cash holding and serve as an expansionary instrument. To account for this, we added four exogenous variables to the FECM to capture temperature change, oil price shocks, and changes in monetary policy (measured by the cash reserve ratio (CRR)) [1] and fiscal policy (measured by public expenditure on completed projects as a share of GDP) [2]. Both policy variables are expected to be influenced by policymakers.

After trying several alternative specifications, we applied an automatic ARIMA procedure to forecast these exogenous variables. The long-run relationship estimated by the FECM was then combined with the forecasted exogenous variables to forecast the long-run GDP of the Indian economy from 2022 to 2035. The model produces a negligible error in in-sample forecasting and satisfactorily passes all the criteria for estimating a robust forecasting trend.
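The forecasting of an exogenous variable can be illustrated with a simplified stand-in for automatic ARIMA: an autoregression on first differences whose lag order is chosen by the Akaike information criterion. The article does not say which implementation or search settings were used, so everything below (the simulated series, the function names fit_ar and forecast_ar, the maximum lag of four, and the 56-quarter horizon covering 2022 to 2035) is an assumption.

```python
import numpy as np


def fit_ar(x, p, max_p):
    """OLS fit of an AR(p) with intercept on a common sample; returns (coefs, aic)."""
    n = len(x)
    target = x[max_p:]  # same estimation sample for every candidate lag order
    lags = [x[max_p - k: n - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(n - max_p)] + lags)
    coefs = np.linalg.lstsq(X, target, rcond=None)[0]
    resid = target - X @ coefs
    sigma2 = resid @ resid / len(target)
    # Gaussian AIC up to an additive constant.
    aic = len(target) * np.log(sigma2) + 2 * (p + 1)
    return coefs, aic


def forecast_ar(x, coefs, steps):
    """Iterate the fitted AR forward, feeding each forecast back in as a lag."""
    p = len(coefs) - 1
    history = list(x[-p:])
    out = []
    for _ in range(steps):
        recent = history[::-1][:p]  # most recent value first, matching lag order
        nxt = coefs[0] + np.dot(coefs[1:], recent)
        out.append(nxt)
        history.append(nxt)
    return np.array(out)


# Simulated data only (hypothetical): 103 quarters of a drifting price-like series.
rng = np.random.default_rng(2)
level = 50 + np.cumsum(0.3 + rng.standard_normal(103))
d = np.diff(level)  # first differences, i.e. integration order 1 is assumed

max_p = 4
fits = {p: fit_ar(d, p, max_p) for p in range(1, max_p + 1)}
best_p = min(fits, key=lambda p: fits[p][1])  # lag order with the lowest AIC
coefs = fits[best_p][0]

# Cumulate the forecast differences to get the path in levels.
path = level[-1] + np.cumsum(forecast_ar(d, coefs, 56))
print("selected lag order:", best_p)
print("forecast level after 56 quarters:", round(float(path[-1]), 2))
```

A full automatic ARIMA search would also consider moving-average terms and test for the order of differencing, which this sketch fixes at one.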

Forecasting results for 2022 to 2035

The model results show that the coefficients of temperature, oil price and the cash reserve ratio are negative and statistically significant, while the coefficient of investment in completed projects (the proxy for public investment efficiency) is positive and statistically significant. This suggests that rising temperatures and crude oil prices adversely affect economic growth. The dampening effect of oil prices was found to be relatively stronger than that of the other factors. This may be because an increase in oil prices is expected to contribute to inflationary pressure by raising production costs and increasing import costs, which will shrink the fiscal space for the government. Similarly, the negative coefficient of CRR suggests that a drop in CRR, which serves as an expansionary monetary policy, boosts economic growth. On the other hand, the positive coefficient of investment in completed projects suggests that an improvement in the effective delivery of government projects boosts economic growth, since government spending on infrastructure and various other development schemes is likely to have produced a multiplier effect.

The results of out-of-sample GDP forecasting (that is, using projected variables for years in which actual information is unavailable) show that the economy would grow at 4-5% over the next decade. The growth rate produced by the FECM is lower than that of the automatic ARIMA model because the FECM captures the dynamics of the factors and exogenous variables. The rise in temperature and oil prices is expected to dampen the favourable effect of the expansionary fiscal and monetary policies expected from prescribed institutional reforms, e-governance and the expansion of digital transactions. If the government finds strategies to deal with rising oil prices and temperature and keeps them at their current level, economic growth could accelerate, with the growth rate approaching 8%. If not, growth acceleration would depend heavily on improvements in the efficiency of public investment and public service delivery.


  1. The RBI has already declared its intention to introduce a digital currency soon (Bhowmik 2021), and steps are being taken to move gradually towards digital transactions. The introduction of a digital currency would further reduce the demand for holding cash. This change may be reflected in the CRR requirement, which would serve as an expansionary strategy. In May 2016, India implemented flexible inflation targeting – the aim was to reduce cash transactions, which could smoothen the transmission mechanism of monetary policies and quicken the process. The introduction of a digital currency seems to be a further step in this direction, and hence, CRR has been included as one of the exogenous variables.
  2. While increasing government expenditure has a multiplier effect in the economy, the actual impact is limited by the effective delivery of projects and the disbursement of money under government schemes. The government has been planning to manage all activities through e-governance under the Digital India programme to reduce institutional inefficiencies. Public expenditure on completed projects as a share of GDP captures the efficacy of fiscal policy due to digitisation and institutional reforms. As a result, spending on completed projects has been treated as an institutional variable serving as an expansionary fiscal policy. If fiscal governance or institutional efficacy improves, one can argue that the share of completed projects would rise.

Further Reading

  • Banerjee, A and M Marcellino (2009), “Factor-augmented error correction models”, in Castle and Shephard (eds.), The Methodology and Practice of Econometrics: A Festschrift in Honour of David F. Hendry.
  • Banerjee, A, M Marcellino and I Masten (2014), “Forecasting with factor-augmented error correction models”, International Journal of Forecasting, 30(3): 589-612.
  • Banerjee, A, M Marcellino and I Masten (2015), “An Overview of the Factor-augmented Error-Correction Model”, University of Birmingham Department of Economics Discussion Paper 15-03.
  • Bhowmik, D (2021), “Monetary Policy Implications of Central Bank Digital Currency with Special Reference to India”, Asia-Pacific Journal of Management and Technology (AJMT), 2(3): 1-8.
  • Kurz-Kim, J-R (2016), “Macroeconomic Now- and Forecasting Based on the Factor Error Correction Model Using Targeted Mixed Frequency Indicators”, Bundesbank Discussion Paper No. 47/2016.
  • Maiti, D, N Kumar, D Jha and S Sarkar (2023), “Post-COVID Recovery and Long-run forecasting of Indian GDP with Factor-augmented Error Correction Model (FECM)”, Computational Economics.