Diving Deep Into Data For Policy Insight

Session Report

The Generation Alpha Data Centre at IMPRI Impact and Policy Research Institute, New Delhi, conducted a Four-Day Immersive Online Certificate Training Course on ‘Data Analytics for Policy Research’ from November 4 to November 25, 2023.

The course, spread over four days, helped to equip policymakers, researchers, and data enthusiasts with cutting-edge analytical skills. In this course, we went beyond theory and provided hands-on training in data analytics techniques, empowering participants to derive meaningful insights from complex datasets.

On the third day, our first speaker, Prof. Nilanjan Banik, Professor and Program Director (BA, Economics and Finance), Mahindra University, Hyderabad, and Visiting Consultant, IMPRI, opened with a fundamental question: why engage in data analysis? The discussion covered the significance of one-sample and two-sample analyses and emphasized choosing an appropriate outcome variable based on the research objective.

Cumulative Distribution and Density Functions:

Prof. Banik guided the audience through the concepts of cumulative distribution and density functions, using Excel for practical applications. The cumulative distribution function, which accumulates (adds up) probabilities up to a given value, was explained as always taking values between 0 and 1. Prof. Banik highlighted that, for a continuous variable, the density function is obtained by differentiating the cumulative distribution function.
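The link between the two functions can be checked numerically. Below is a minimal Python sketch, not the Excel workbook used in the session. It assumes a standard normal distribution as the example and does three things. It evaluates the CDF with `math.erf`, checks that the CDF stays between 0 and 1, and recovers the density by taking a finite-difference derivative of the CDF.

```python
import math

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Cumulative probability P(X <= x) for a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Closed-form normal density, used only to check the numerical derivative."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def density_from_cdf(cdf, x: float, h: float = 1e-5) -> float:
    """Approximate f(x) = dF/dx with a central finite difference."""
    return (cdf(x + h) - cdf(x - h)) / (2.0 * h)

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  F(x)={normal_cdf(x):.4f}  "
          f"dF/dx={density_from_cdf(normal_cdf, x):.4f}  f(x)={normal_pdf(x):.4f}")
```

The `dF/dx` column should match the closed-form `f(x)` column to four decimals. This is the "density is the derivative of the CDF" point in numerical form.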

The justification for such analyses was framed in terms of hypothesis testing, particularly when examining issues like income inequality. The discussion underscored that real-world data such as incomes often depart from the assumption of a normal distribution, which makes non-parametric tests relevant in certain scenarios.
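The session did not name a specific non-parametric test at this point. Choosing the Mann-Whitney U test here is therefore our assumption, and the incomes below are hypothetical. The sketch compares two income samples through their ranks alone, so it makes no normality assumption. It uses the large-sample normal approximation without a tie correction.

```python
import math

def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """U statistic for sample a and its large-sample z-score (no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = rank_with_ties(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return u1, (u1 - mean_u) / sd_u

# Hypothetical monthly incomes (illustrative only)
group_a = [12, 15, 18, 22, 30, 45, 120]
group_b = [8, 9, 11, 13, 14, 16, 25]
u, z = mann_whitney_u(group_a, group_b)
print(f"U = {u:.1f}, z = {z:.2f}")  # |z| > 1.96 suggests a difference at about the 5% level
```

Because only ranks enter the statistic, the extreme value of 120 in `group_a` carries no more weight than any other top-ranked observation. This robustness is what makes rank tests attractive for skewed income data.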

Normal Distribution and Income Inequality:

The lecture expanded on the normal distribution function, introducing its formula and illustrating how the number of standard deviations from the mean relates to the percentage of the area under the curve. Prof. Banik connected these concepts to the examination of income inequality, demonstrating how the shape of the distribution curve reveals disparities between income groups.
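For a normal variable with mean μ and standard deviation σ, the density is f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²)). The share of the area lying within k standard deviations of the mean is erf(k/√2), whatever μ and σ are. The short sketch below is our own illustration, not material from the session. It computes that coverage for k = 1, 2, 3, which gives the familiar 68-95-99.7 rule.

```python
import math

def coverage_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    # P(mu - k*sigma < X < mu + k*sigma) = erf(k / sqrt(2)), independent of mu and sigma
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} sd: {coverage_within(k):.2%}")
# within 1 sd: 68.27%
# within 2 sd: 95.45%
# within 3 sd: 99.73%
```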

Log Transformation and Non-Parametric Tests:

To address the heterogeneity of income levels, Prof. Banik advocated the use of log transformations. This approach, he argued, not only scales down incomes to make them comparable but also preserves the distribution characteristics during transfer payments. The distinction between parametric and non-parametric tests was further illustrated with examples from the paper shared with participants on the dynamics of income growth in India.
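Why do logs help compare groups at very different income levels? The hypothetical sketch below (made-up incomes, not data from the paper) gives one reason. Group B earns exactly ten times group A, so the raw standard deviation of group B is ten times larger. Because log(10x) = log(10) + log(x), however, the proportional gap turns into a constant shift. A standard deviation ignores constant shifts, so both groups show the same dispersion in logs.

```python
import math
import statistics

# Hypothetical monthly incomes (illustrative only); group B earns exactly 10x group A
group_a = [8_000, 9_500, 11_000, 12_500, 15_000]
group_b = [10 * v for v in group_a]

def log_sd(values):
    """Population standard deviation of log incomes (a scale-free dispersion measure)."""
    return statistics.pstdev([math.log(v) for v in values])

print("sd of raw incomes:", statistics.pstdev(group_a), statistics.pstdev(group_b))
print("sd of log incomes:", round(log_sd(group_a), 4), round(log_sd(group_b), 4))
```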

Jarque-Bera Test Statistic and Comparative Policy Analysis:

The session progressed to the Jarque-Bera test statistic as a tool for determining whether a dataset follows a normal distribution. Prof. Banik shared insights from the paper on income growth dynamics, emphasizing the use of non-parametric tests where the normality assumption does not hold.
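We take the normality test described above to be the Jarque-Bera test. It combines sample skewness S and kurtosis K into JB = (n/6) · (S² + (K − 3)² / 4), and under normality JB is approximately chi-square with 2 degrees of freedom. The sketch below is a plain-Python illustration under that reading, and the sample data are hypothetical. It is not the software output used in the session.

```python
def jarque_bera(data):
    """Jarque-Bera statistic: JB = n/6 * (S**2 + (K - 3)**2 / 4)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

CRITICAL_5PCT_DF2 = 5.991  # 5% critical value of chi-square with 2 df

symmetric = [1, 2, 3, 4, 5]
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 50]  # hypothetical: one extreme value
for label, sample in (("symmetric", symmetric), ("skewed", skewed)):
    jb = jarque_bera(sample)
    print(f"{label}: JB = {jb:.3f}, reject normality at 5%: {jb > CRITICAL_5PCT_DF2}")
```

The chi-square approximation is asymptotic, so with samples this small the comparison only illustrates the mechanics.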

The next segment centered on comparing investments, using Adani Power and Tata Power as examples. Prof. Banik clarified the formula for comparing standard deviations and reiterated that growth rates or log transformations are needed for meaningful comparisons.
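The sketch below shows why raw price volatility can mislead. The two price series are invented and are not Adani Power or Tata Power data. `stock_b` trades near 500 and `stock_a` near 100, so `stock_b` has the larger standard deviation of prices. Measured by log returns, ln(P_t / P_(t−1)), `stock_a` turns out to be the more volatile of the two.

```python
import math
import statistics

# Hypothetical closing prices (illustrative only, not actual market data)
stock_a = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]  # trades around 100
stock_b = [500.0, 505.0, 498.0, 510.0, 515.0, 512.0]  # trades around 500

def log_returns(prices):
    """Period-on-period growth rates as log differences: ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

for name, prices in (("stock_a", stock_a), ("stock_b", stock_b)):
    print(f"{name}: sd(prices)={statistics.stdev(prices):.2f}  "
          f"sd(log returns)={statistics.stdev(log_returns(prices)):.4f}")
```

Ranking the two series by the standard deviation of growth rates rather than of levels reverses the conclusion. This is the reason for working with growth rates or logs.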

Density Function and Skewness:

This part of the discussion closed with an exploration of density functions and moments, particularly skewness, which is the standardized third central moment. Prof. Banik illustrated how these measures describe the shape of a distribution and the extent of skewness in income distribution.
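Skewness is the third central moment divided by the cube of the standard deviation. The sketch below computes this population form for a hypothetical right-skewed income sample and for its logs. In this made-up sample, the log transformation reduces the right skew, which ties the moment back to the earlier discussion of log incomes.

```python
import math

def skewness(data):
    """Population skewness: third central moment divided by sd cubed."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed incomes: most are modest, one is very large
incomes = [10, 12, 15, 20, 30, 60, 200]
log_incomes = [math.log(x) for x in incomes]
print(f"skewness of incomes:     {skewness(incomes):.2f}")
print(f"skewness of log incomes: {skewness(log_incomes):.2f}")
```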

Autocorrelation and Diagnostic Tests:

Continuing the session, Prof. Nilanjan Banik turned to the practical application of diagnostic tests for identifying problems such as autocorrelation, heteroscedasticity, and multicollinearity. Using a housing dataset with sale prices, number of bedrooms and bathrooms, and square footage, he explained how to run diagnostic tests to evaluate the quality of a regression model.

The discussion focused on the Durbin-Watson test for autocorrelation, with an emphasis on interpreting the test statistic, which ranges from 0 to 4. Prof. Banik presented a rule of thumb: values between 1.8 and 2.2 indicate no autocorrelation, values below 1.8 suggest positive autocorrelation, and values above 2.2 suggest negative autocorrelation.
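The Durbin-Watson statistic is DW = Σ(e_t − e_(t−1))² / Σ e_t², computed from the regression residuals e_t. It is roughly 2(1 − ρ̂), where ρ̂ is the first-order autocorrelation of the residuals, so values near 2 mean no first-order autocorrelation. The sketch below computes DW and applies the 1.8 to 2.2 rule of thumb described above. A formal test would instead compare DW with tabulated lower and upper bounds that depend on the sample size and the number of regressors. The residuals in the example are hypothetical.

```python
def durbin_watson(residuals):
    """DW = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2; lies between 0 and 4."""
    num = sum((e1 - e0) ** 2 for e0, e1 in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den

def classify_dw(dw, lower=1.8, upper=2.2):
    """Rule-of-thumb reading of DW using the thresholds mentioned in the session."""
    if dw < lower:
        return "positive autocorrelation suggested"
    if dw > upper:
        return "negative autocorrelation suggested"
    return "no autocorrelation indicated"

# Hypothetical residuals that drift slowly (each close to the previous one)
resid = [0.5, 0.6, 0.4, 0.3, -0.1, -0.3, -0.4, -0.2, 0.1, 0.3]
dw = durbin_watson(resid)
print(f"DW = {dw:.2f}: {classify_dw(dw)}")
```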

Heteroscedasticity and White Test:

Moving on to heteroscedasticity, Prof. Banik introduced residual diagnostic tests and explained their significance. He demonstrated the Breusch-Pagan test for heteroscedasticity, emphasizing the interpretation of p-values. He also touched upon the White test, a more comprehensive diagnostic that adds the squares and cross-products (interactions) of the independent variables to the test regression. He clarified that while the White test provides valuable insights, it comes at the cost of reduced degrees of freedom.
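Both tests can be sketched as LM statistics of the form n · R², where the R² comes from regressing the squared OLS residuals on a set of auxiliary regressors. Under homoscedasticity the statistic is approximately chi-square, with degrees of freedom equal to the number of auxiliary regressors (excluding the constant). Using the original regressors gives Koenker's studentized Breusch-Pagan test. Adding squares and cross-products gives a White-style test, and the extra columns are where the degrees of freedom go. The NumPy sketch below uses simulated data, not the session's housing dataset. It is not the exact routine of any particular software package. With a single regressor there are no cross-products, so the White-style version adds only x².

```python
import numpy as np

def ols_residuals(y, X):
    """Residuals from an OLS fit of y on X (X must already contain a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def het_lm(y, X, Z=None):
    """LM statistic n * R^2 from regressing squared OLS residuals on Z.

    Z defaults to X (Koenker's studentized Breusch-Pagan test); passing X's
    columns plus squares/cross-products gives a White-style test.
    Degrees of freedom = number of columns of Z minus 1 (the constant).
    """
    Z = X if Z is None else Z
    n = len(y)
    e2 = ols_residuals(y, X) ** 2
    aux_resid = ols_residuals(e2, Z)  # residuals of the auxiliary regression
    r2 = 1.0 - (aux_resid @ aux_resid) / np.sum((e2 - e2.mean()) ** 2)
    return n * r2

# Simulated data (hypothetical)
rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])
Z_white = np.column_stack([np.ones(n), x, x ** 2])      # adds the squared regressor
y_homo = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, n)        # constant error variance
y_hetero = 2.0 + 3.0 * x + x * rng.normal(0.0, 1.0, n)  # error sd grows with x

# 5% chi-square critical values: 3.841 (1 df) and 5.991 (2 df)
for label, y in (("homoscedastic", y_homo), ("heteroscedastic", y_hetero)):
    bp = het_lm(y, X)
    white = het_lm(y, X, Z_white)
    print(f"{label}: BP LM = {bp:.2f} (reject: {bp > 3.841}), "
          f"White LM = {white:.2f} (reject: {white > 5.991})")
```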

Multicollinearity and Variance Inflation Factor (VIF):

Addressing multicollinearity, Prof. Banik highlighted the importance of assessing the variance inflation factor (VIF). He explained that a centered VIF above 10 indicates a multicollinearity problem. Using the dataset, he demonstrated how to obtain the VIF in EViews, confirming that the variables in the given data did not exhibit multicollinearity.
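The centered VIF for regressor j is VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing regressor j on all the other regressors plus an intercept. The NumPy sketch below computes this textbook definition from scratch. It is not the software's internal routine, and both small datasets are hypothetical. The first pair of regressors is uncorrelated, so each VIF is 1. In the second pair, square footage is almost a fixed multiple of the number of rooms, so both VIFs exceed the threshold of 10.

```python
import numpy as np

def centered_vif(X):
    """Centered VIF for each column of X (regressors only, no constant column)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        # regress column j on an intercept plus every other column
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return vifs

# Hypothetical regressors: uncorrelated columns
uncorrelated = np.column_stack([[1, 2, 3, 4], [1, -1, -1, 1]])
# Hypothetical regressors: square footage is roughly 500 x number of rooms
rooms = np.array([2, 3, 4, 5, 6, 7])
sqft = 500 * rooms + np.array([10, -10, 5, -2, 3, -5])
collinear = np.column_stack([rooms, sqft])

print("uncorrelated:", [round(v, 2) for v in centered_vif(uncorrelated)])
print("collinear:   ", [round(v, 1) for v in centered_vif(collinear)])
```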

Conclusion and Future Discussions:

In conclusion, the session provided practical insights into applying diagnostic tests in data analytics for policy research. Prof. Nilanjan Banik guided participants through the steps of identifying and interpreting common problems in regression analysis. The focus on real-world data and hands-on examples equipped attendees with valuable skills for conducting meaningful data analysis.

As Prof. Banik concluded the session, he opened the floor for questions and discussions, encouraging participants to seek clarification on the concepts covered. The interactive nature of the session allowed for a deeper understanding of the material, fostering an environment conducive to learning and engagement.

Disclaimer: All views expressed in the article belong solely to the author and not necessarily to the organisation.

Acknowledgment: Rahul Soni is a research intern at IMPRI.
