Key Insights From Data Analytics For Policy Research Training Course – Cohort 2

Event Report
Aasthaba Jadeja

A One-Month Immersive Online Hands-On Certificate Training Course on Data Analytics for Policy Research – Cohort 2 was organised by the IMPRI Generation Alpha Data Center (GenAlphaDC), IMPRI Impact and Policy Research Institute, New Delhi, from November 4 to 25, 2023.

Day 1|

Probability distributions: Density and Cumulative (Excel-based Session)-

The first session, “Probability distributions: Density and Cumulative (Excel-based Session)”, was led by Professor Nilanjan Banik of Mahindra University, Hyderabad. It focused on hands-on learning of probability distributions, with an emphasis on density and cumulative functions in Excel.

Professor Banik, who serves as Professor and Program Director for B.A. Economics and Finance, began by providing a fundamental understanding of probability and later delved into concepts such as density and cumulative functions. The session explored the ANOVA interpretation of the model and dummy variables, with a specific focus on the density function’s importance in assessing the performance of outcome variables, particularly in the context of policy-making.

To reinforce the theory, Professor Banik used real-life examples, such as calculating the probability of rainfall in Delhi, to illustrate how probability density functions apply in policy scenarios. The session then moved into a hands-on Excel training segment, where participants put the concepts into practice by drawing density distributions, generating random variables, and calculating statistical measures such as the mean and standard deviation.
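For readers who would like to reproduce the exercises described above outside a spreadsheet, here is a minimal Python sketch of the same three steps: evaluating a density and a cumulative probability, generating random draws, and summarising them. The rainfall figures and the choice of a normal distribution are illustrative assumptions, not values from the session, and the Excel functions named in the comments are common counterparts rather than the ones confirmed to have been used.

```python
import random
from statistics import NormalDist, mean, stdev

# Hypothetical assumption: daily rainfall (mm) modelled as normal with
# mean 5 and standard deviation 2. Real rainfall is rarely normal.
rain = NormalDist(mu=5.0, sigma=2.0)

# Density at 7 mm (Excel counterpart: NORM.DIST(7, 5, 2, FALSE)).
density_at_7 = rain.pdf(7.0)

# Cumulative probability of at most 7 mm (NORM.DIST(7, 5, 2, TRUE)).
prob_up_to_7 = rain.cdf(7.0)

# Probability of more than 7 mm is the complement of the cumulative value.
prob_above_7 = 1.0 - prob_up_to_7

# Generate random draws (Excel counterpart: NORM.INV(RAND(), 5, 2)).
random.seed(42)  # fixed seed so the run is repeatable
draws = [random.gauss(5.0, 2.0) for _ in range(10_000)]

# Summarise the sample (Excel counterparts: AVERAGE and STDEV.S).
sample_mean = mean(draws)
sample_sd = stdev(draws)

print(f"density at 7 mm: {density_at_7:.4f}")
print(f"P(rain <= 7 mm): {prob_up_to_7:.4f}")
print(f"P(rain >  7 mm): {prob_above_7:.4f}")
print(f"sample mean {sample_mean:.3f}, sample sd {sample_sd:.3f}")
```

The density is the height of the curve at a point, while the cumulative value is the area to the left of it, which is why only the cumulative value can be read directly as a probability.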

The session concluded with additional examples and a Q&A session to ensure clarity and understanding among the students. The integration of Excel-based exercises and real-life examples enriched the learning experience, empowering participants to apply their knowledge in data analysis and decision-making. 

To read a more elaborate session report: click here

Data Deluge and Public Policy: Promises and Perils-

In the session on “Data Deluge and Public Policy: Promises and Perils” led by Dr. Soumyadip Chattopadhyay, participants explored the intricacies of data deluge in the context of public policy, with a particular focus on regression analysis, time series analysis, and forecasting. The session emphasised the significance of data in policy evaluation, considering economic, institutional, environmental, and technological factors. Dr. Chattopadhyay discussed the evolution of data collection methods, touching on concepts such as digital footprint and highlighting associated challenges.

The exploration of the Data Revolution concept introduced participants to emerging challenges in data analysis, which call for new theories, methods, and tools. The five “V” frameworks of data (Volume, Variety, Velocity, Veracity, and Value) were detailed, with emphasis on their interlinkages. The session also covered the importance of data quality, discussing characteristics such as accuracy, completeness, timeliness, consistency, and uniqueness, and their application in public policy analysis. The practical implications of these frameworks and characteristics were outlined, emphasising their capacity to address individual problems, enhance macro-level quality of life, and facilitate readiness for change.

Additionally, the session touched on the economies of scale and scope of data, using examples from various sectors. The structure of government data, including administrative, survey, transactions, and institutional data, was detailed. The concept of an Integrated Data System and its relevance in the current public policy landscape were explored.

The session also covered the Integrated Command and Control Centre in Smart Cities, illustrating how data is generated, compiled, and utilised for policy making. Lastly, the discussion on the Revised National Policy on Official Statistics (NPOS) shed light on its characteristics, drawbacks, and alignment with the Digital India initiative, providing insights into the challenges of data-driven policymaking. The session not only presented the current state of data in public policy but also addressed future challenges in this dynamic landscape.

To read a more elaborate session report: click here

Research Ethics in Data Collection and Analysis-

In the session on “Research Ethics in Data Collection and Analysis” led by Dr. Amar Jesani, the focus was on unravelling the ethical dimensions of data collection and analysis, particularly in the context of policy analysis. Dr. Jesani, an Independent Researcher and Teacher in Bioethics, stressed the paramount importance of ethics in research, especially when dealing with data collected from the public.

The discussion delved into the intricate network of stakeholders involved in research, extending beyond researchers and participants to include funders and gatekeepers. Dr. Jesani underscored the ethical principles that must persist even when data is utilised by others as secondary data, emphasising the need for methodological rigour and cautioning against potentially harmful or misleading research.

The session emphasised ethical considerations related to informed consent, transparency, voluntary participation, and the protection of research participants, addressing issues such as privacy, confidentiality, and selection bias in data collection. Dr. Jesani also navigated through the ethical challenges associated with data management, including responsible sharing, anonymization, pseudonymization, and prevention of research misconduct.

The session seamlessly transitioned into a comprehensive exploration of Data Management, where Dr. Amar Jesani elucidated the intricacies of managing different types of data available to analysts. The discussion covered the prevention of research misconduct, encompassing topics such as plagiarism, authorship credits, data fabrications, and falsifications. Dr. Jesani concluded the session by highlighting the critical relationship between research integrity and ethical considerations in data collection and analysis. Overall, the session provided a rich understanding of the ethical nuances intertwined with the research and policy analysis process, underlining the significance of maintaining ethical standards for the credibility and reliability of research outcomes.

To read a more elaborate session report: click here

Day 2| 

How to Carry out an Empirical Project: A Step-by-Step Approach-

Dr. Soumyadip Chattopadhyay’s session on “How to Carry out an Empirical Project: A Step-by-Step Approach” provided participants with a comprehensive and systematic guide for navigating the intricacies of empirical research. The session began by emphasising the fundamental importance of approaching empirical projects with a systematic mindset, ensuring rigour and reliability in research outcomes.

Dr. Chattopadhyay underscored the significance of clear and focused research questions, setting the foundation for a well-structured empirical investigation. The step-by-step guide offered by the speaker spanned crucial stages such as conducting a thorough literature review, formulating testable hypotheses based on a solid theoretical framework, choosing appropriate research designs and methodologies, and ensuring ethical considerations throughout the research process.

Moreover, Dr. Chattopadhyay’s session stood out not only for its technical guidance but also for encouraging critical thinking at each stage. The emphasis on developing a robust theoretical framework provided researchers with a conceptual backbone, fostering a deeper understanding of the relationships between variables and guiding the formulation of hypotheses. The session also touched on ethical considerations, addressing participant privacy and responsible communication of findings. By emphasising the holistic nature of empirical research, Dr. Chattopadhyay’s session equips researchers not only with the procedural intricacies but also with the ethical and thoughtful approach necessary for making meaningful contributions to their academic or practical domains.

To read a more elaborate session report: click here

The Statistical System in India and an Introduction to Various Official and Other Databases-

In the illuminating session led by Dr. Arjun Kumar on “The Statistical System in India and an Introduction to Various Official and Other Databases,” participants gained valuable insights into the pivotal role played by the statistical landscape in shaping evidence-based policymaking, economic planning, and societal development in India. The session report meticulously outlines the multi-tiered structure of India’s statistical system, involving central and state agencies, with the Ministry of Statistics and Programme Implementation (MoSPI) at the helm. Dr. Kumar underscored the significance of institutions like the Central Statistical Office (CSO) and the National Sample Survey Office (NSSO) in collecting and analysing data critical for policy formulation.

The report provides an insightful overview of key official databases, starting with the National Sample Survey (NSS) and the Census of India, which offer granular insights into consumption patterns, employment, and demographic trends. Dr. Kumar highlighted other crucial databases such as the Economic Census, National Accounts Statistics (NAS), Reserve Bank of India (RBI) Database, and Ministry of Finance Databases, each contributing to a nuanced understanding of India’s economic and financial landscape. The Health Management Information System (HMIS) and National Crime Records Bureau (NCRB) were also spotlighted for their role in providing data on healthcare performance and crime statistics, respectively.

Moreover, the report delves into the broader roles of the statistical system in infrastructure development, monetary and fiscal policy formulation, and addressing crime-related challenges. Dr. Kumar acknowledged the existing issues and challenges within the statistical system, such as survey fatigue, coverage gaps, and data accessibility, emphasising the need for ongoing refinement. The session concluded by highlighting the empowering role of statistical data in tailoring policies to address specific challenges, fostering inclusive development, and providing a framework for monitoring and evaluating policy impact over time. Dr. Kumar stressed the continual evolution and refinement of the statistical system as crucial to meeting the dynamic needs of a rapidly changing society, encapsulating the essence of the session’s key takeaways.

To read a more elaborate session report: click here

Day 3|

Interpretation of Model-

On the third day of the program, Professor Nilanjan Banik, an expert in economics and finance, conducted a session on “Interpretation of Model.” The session primarily focused on statistical concepts, including cumulative distribution and density functions, normal distribution, log transformation, and non-parametric tests.

Professor Banik used practical examples and Excel applications to guide participants through these concepts, emphasising their relevance in hypothesis testing, particularly in the context of issues like income inequality. The lecture also delved into diagnostic tests, such as the Durbin-Watson test for autocorrelation, Breusch-Pagan test for heteroscedasticity, and the White test, providing participants with practical tools to assess the quality of regression models. The session concluded with a focus on practical applications of diagnostic tests in real-world datasets related to sales prices, beds, baths, and square footage.

The discussion on diagnostic tests continued, covering issues like autocorrelation, heteroscedasticity, and multicollinearity. Professor Banik provided insights into the interpretation of test statistics, emphasised the significance of p-values, and demonstrated the use of tools like the Variance Inflation Factor (VIF) to address multicollinearity concerns. The practical and hands-on approach equipped participants with valuable skills for conducting meaningful data analytics in the field of policy research. The session’s interactive nature allowed for questions and discussions, fostering a deeper understanding of the material and creating an engaging learning environment.
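Two of the diagnostics mentioned above are simple enough to compute by hand, which can help in reading the numbers that Excel or a statistics package reports. The sketch below, in Python, computes the Durbin-Watson statistic from a list of residuals and the Variance Inflation Factor for the special case of two regressors. The function names and the housing figures are hypothetical and chosen only for illustration; they are not the session's dataset.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def vif_two_regressors(x1, x2):
    """VIF = 1 / (1 - R^2), where R^2 comes from regressing one
    regressor on the other. With exactly two regressors this R^2 is
    the squared correlation between them."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    var1 = sum((a - m1) ** 2 for a in x1)
    var2 = sum((b - m2) ** 2 for b in x2)
    r_squared = cov ** 2 / (var1 * var2)
    return 1.0 / (1.0 - r_squared)


# Hypothetical housing data: bedrooms and square footage move together.
beds = [2, 3, 3, 4, 5]
sqft = [900, 1400, 1500, 2000, 2600]
housing_vif = vif_two_regressors(beds, sqft)

# Hypothetical residuals that flip sign every period.
alternating = [1.0, -1.0, 1.0, -1.0]
dw_alternating = durbin_watson(alternating)

print(f"VIF for beds vs sqft: {housing_vif:.1f}")
print(f"Durbin-Watson for alternating residuals: {dw_alternating:.2f}")
```

As a common rule of thumb, a Durbin-Watson value near 2 suggests little first-order autocorrelation, values towards 0 suggest positive autocorrelation, and values towards 4 suggest negative autocorrelation. A VIF above roughly 10 is often treated as a sign of serious multicollinearity, though the cut-off is a convention rather than a formal test.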

In conclusion, Professor Nilanjan Banik’s session on data analysis for policy research provided a comprehensive overview of statistical concepts and diagnostic tests. 

To read a more elaborate session report: click here

Regression with Time Series Analysis & Forecasting: A Primer-

Dr. Soumyadip Chattopadhyay, an Associate Professor of Economics, conducted a session on “Regression with Time Series Analysis & Forecasting: A Primer.” The focus was on non-forecasting aspects, with a particular emphasis on the importance of stationarity in ensuring the reliability of time series data. Dr. Chattopadhyay elucidated the concept of stationarity, emphasising its role in keeping regression results stable and reliable, especially when forecasting. He discussed methods for detecting non-stationarity, including graphical approaches, the autocorrelation function, and unit root tests.

The session transitioned to an explanation of the autocorrelation function, introducing the correlogram as a tool for assessing stationarity. Dr. Chattopadhyay discussed the decision rule for interpreting correlograms, providing insights into how the values of the lag-k autocorrelation coefficient (rho-k) indicate stationarity. Practical considerations, such as determining the maximum lag length and assessing the statistical significance of rho-k, were also covered. The session delved into cointegration as a sophisticated approach to regression analysis with non-stationary time series variables, discussing methods for addressing non-stationarity and introducing the Engle-Granger test for cointegration.

In the practical application segment, Dr. Chattopadhyay demonstrated the steps for data import and transformation, emphasising the necessity of transforming data into logarithmic form for analysis. The stationarity of time series variables was assessed using autocorrelation functions, correlograms, and unit root tests. Differencing was applied to make the variables stationary, and cointegration analysis was performed through regression and augmented Dickey-Fuller tests on residuals. The session provided participants with a comprehensive understanding of time series analysis, stationarity, and cointegration, with practical insights for applying these concepts in policy research. 
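The correlogram check and the differencing step described above can be illustrated with a short Python sketch. It simulates a random walk (a standard example of a non-stationary series, used here as a stand-in because the session's data is not reproduced in this report), computes the sample autocorrelation at lag 1 before and after first differencing, and compares both with the usual approximate 95% band of 1.96 divided by the square root of the sample size. The unit root and Engle-Granger tests are not implemented here; in practice they would be run in a statistics package.

```python
import math
import random


def autocorr(series, k):
    """Sample autocorrelation at lag k: lag-k autocovariance over variance."""
    n = len(series)
    m = sum(series) / n
    den = sum((x - m) ** 2 for x in series)
    num = sum((series[t] - m) * (series[t + k] - m) for t in range(n - k))
    return num / den


def difference(series):
    """First difference: each value minus the previous one."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


# Hypothetical data: a random walk, where each value is the previous
# value plus a fresh shock.
random.seed(0)
walk = [0.0]
for _ in range(499):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))

# Approximate 95% band for the autocorrelations of a white-noise series.
band = 1.96 / math.sqrt(len(walk))

rho1_level = autocorr(walk, 1)             # series in levels
rho1_diff = autocorr(difference(walk), 1)  # series after differencing

print(f"band: +/-{band:.3f}")
print(f"lag-1 autocorrelation, levels: {rho1_level:.3f}")
print(f"lag-1 autocorrelation, first difference: {rho1_diff:.3f}")
```

A correlogram is simply these coefficients plotted for lags 1, 2, 3 and so on. Coefficients that stay far outside the band and die out slowly point to non-stationarity, while coefficients that fall inside the band after differencing suggest the differenced series is stationary.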

To read a more elaborate session report: click here

Gender Mainstreaming of Data, Monitoring and Evaluation

On the third day of the program, Dr. Vibhuti Patel, a Visiting Distinguished Professor, conducted a session on “Gender Mainstreaming of Data, Monitoring and Evaluation” and led a comprehensive discussion on the pivotal role of accurate statistics in addressing gender-based differences and inequalities. Dr. Patel set the stage by highlighting the crucial need for precise statistics to effectively address problems related to men, women, and the gender spectrum across various aspects of life. She traced the historical context back to the Beijing Platform in 1995, where global representatives stressed the importance of gender-disaggregated statistics, laying the foundation for gender mainstreaming in data analytics.

Dr. Patel delved into the challenges associated with data collection, particularly gender stereotypes impacting the accurate representation of women’s economic activities in sectors like the workforce and agriculture. The discussion extended to the critical issue of unpaid care work, shedding light on how prevailing data collection methods often undervalue women’s contributions in household and caregiving activities, leading to biased policy interventions that perpetuate inequalities. The impact of gender stereotypes on policy interventions became evident as Dr. Patel highlighted the oversight of women-headed households, which, despite being among the poorest, often remain unnoticed in development and welfare schemes due to statistical biases.

The session also emphasised the importance of intersectional data collection, considering diverse characteristics such as disability, age, and gender. Dr. Patel illustrated this with examples from countries like Vietnam, where disability statistics played a crucial role in post-conflict economic planning. Emphasising the significance of concepts and definitions in data collection, Dr. Patel argued for inclusive criteria in defining the workforce, especially acknowledging the often overlooked contributions of women in unpaid work. The session concluded by underscoring the indispensable role of gender statistics in decision-making processes, asserting that accurate data is fundamental for governments to address gender-based violence, health outcomes, education, and representation in decision-making bodies. 

To read a more elaborate report: click here

Day 4|

Hands-on Data Learning Session on Dummy Variables

In a comprehensive Hands-on Data Learning Session on Dummy Variables, Professor Nilanjan Banik provided a thorough exploration of dummy variables within the context of regression analysis. Acknowledging the persistent challenge of integrating qualitative variables into statistical models, Prof. Banik elucidated the practical applications and intricacies of dummy variables. The presentation went beyond theoretical concepts, offering practical examples and emphasising the interpretation of coefficients associated with dummy variables. With a focus on their role in capturing structural breaks in data, Prof. Banik utilised the growth rate of the Indian GDP as a tangible example, showcasing the versatility of dummy variables beyond conventional uses.

The session emphasised the indispensable role of dummy variables in navigating the complexities introduced by qualitative variables in regression analysis. Prof. Banik adeptly addressed the nuances associated with shifts in intercepts and slopes, particularly when manifesting in economic parameters like the GDP growth rate.

The concept of structural breaks, representing significant shifts in the data-generating process, was seamlessly integrated into the discussion, with dummy variables emerging as crucial tools for their identification and capture. The interpretation of coefficients associated with these variables played a central role, offering attendees valuable insights into substantial alterations in economic conditions. The practical application of deseasonalizing sales data further highlighted the real-world utility of dummy variables beyond structural break detection.
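To make the interpretation of a dummy coefficient concrete, the sketch below regresses a short series of growth rates on a single break dummy that is 0 before the break and 1 after it. The growth figures and the break date are invented for illustration and are not Indian GDP data or results from the session. Only an intercept shift is shown; a change in slope would need an additional interaction term between the dummy and a regressor.

```python
def ols_simple(x, y):
    """Ordinary least squares for y = intercept + slope * x
    with a single regressor, using the closed-form solution."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical growth rates (%), with an assumed break after period 5.
growth = [3.0, 3.5, 2.5, 3.2, 2.8, 6.1, 5.9, 6.4, 6.0, 5.6]

# Break dummy: 0 for the first five periods, 1 for the last five.
dummy = [0] * 5 + [1] * 5

intercept, shift = ols_simple(dummy, growth)

# With only a dummy on the right-hand side, the intercept equals the
# pre-break average and the dummy coefficient equals the change in
# the average after the break.
pre_mean = sum(growth[:5]) / 5
post_mean = sum(growth[5:]) / 5

print(f"intercept (pre-break mean): {intercept:.2f}")
print(f"dummy coefficient (shift): {shift:.2f}")
print(f"post-break mean minus pre-break mean: {post_mean - pre_mean:.2f}")
```

Reading the output this way shows why the dummy coefficient is interpreted as the size of the structural shift: it measures how far the average level moved once the dummy switched on.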

The presentation reached advanced levels with the introduction of the “Spike” command in EViews, presenting a powerful tool to address changes in both intercepts and slopes. Prof. Banik demonstrated the application of this command, recognizing its significance even when coefficients may not achieve statistical significance in every instance. The practical implications of Prof. Banik’s insights extended beyond academic realms, emphasising the crucial role of dummy variables in applied research and data-driven decision-making across industries. The session not only equipped attendees with theoretical knowledge but also provided practical tools for analysts to navigate the intricacies of qualitative variables and extract meaningful insights from their data.

To read a more elaborate session report: click here

Regression Analysis with Qualitative variables – Categorical Dependent Variable Regression (including Logit and Probit Model)-

In a Hands-on Data Learning Session, Dr. Soumyadip Chattopadhyay provided a detailed exploration of “Regression Analysis with Qualitative variables – Categorical Dependent Variable Regression (including Logit and Probit Model).” The session aimed to bridge the gap between theoretical knowledge and practical application, offering the audience valuable insights into the challenges and methodologies associated with categorical dependent variables. Dr. Chattopadhyay navigated the intricacies of binary choice models, focusing on the inadequacies of linear probability models and steering the audience towards more nuanced alternatives, particularly the logit and probit models.

The heart of the discussion delved into the theoretical foundations of the logit model, with Dr. Chattopadhyay dissecting its intricacies and emphasising the indirect relationship between changes in independent variables and their impact on the dependent variable through an intermediate variable. A comparison between logit and probit models shed light on their distinctive characteristics, considering factors that influence the choice between the two.

The presentation delved into the practical application of these models, utilising an empirical dataset to investigate the determinants of employment mode among rural nonfarm workers in West Bengal. Dr. Chattopadhyay’s step-by-step demonstration of the logit regression model using EViews software, including the interpretation of results, assessment of variable significance, and calculation of marginal effects, provided a comprehensive understanding of these regression techniques in real-world scenarios.
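The indirect link between a regressor and the outcome in a logit model, and the marginal effects that follow from it, can be sketched in a few lines of Python. The coefficients and the worker profile below are hypothetical placeholders, not estimates from the West Bengal dataset, and the function names are illustrative. The sketch assumes the standard logit form, in which a linear index is passed through the logistic function to give a probability.

```python
import math


def logistic(z):
    """Logistic function: maps any real index z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit_probability(intercept, coefs, x):
    """Predicted probability: the regressors enter through the linear
    index z, which is the intermediate variable of the model."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return logistic(z)


def marginal_effects(intercept, coefs, x):
    """Marginal effect of each continuous regressor at the point x:
    coefficient * p * (1 - p), the derivative of the logistic curve."""
    p = logit_probability(intercept, coefs, x)
    return [b * p * (1.0 - p) for b in coefs]


# Hypothetical coefficients for two regressors (for example, years of
# schooling and age) and a hypothetical worker profile.
intercept = -2.0
coefs = [0.25, 0.01]
worker = [8.0, 30.0]

p = logit_probability(intercept, coefs, worker)
effects = marginal_effects(intercept, coefs, worker)

print(f"predicted probability: {p:.3f}")
print("marginal effects:", [round(e, 4) for e in effects])
```

Because the marginal effect depends on the predicted probability, the same coefficient implies a larger change in probability for a worker near 0.5 than for one near 0 or 1. This is why logit coefficients are not read directly as changes in probability.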

In conclusion, Dr. Chattopadhyay’s presentation served as a valuable resource for researchers, offering a holistic view of Regression Analysis with a focus on qualitative variables and categorical dependent variables. 

To read a more elaborate session report: click here

Acknowledgement: Posted by Reet Lath, Researcher at IMPRI.

  • IMPRI Desk

    IMPRI, a startup research think tank, is a platform for pro-active, independent, non-partisan and policy-based research. It contributes to debates and deliberations for action-based solutions to a host of strategic issues. IMPRI is committed to democracy, mobilization and community building.
