Biostatistics IV: Workshop on Data Management using Stata for Analyzing Secondary Data

Introduction   

Most biostatistics courses teach you how to analyze datasets that are already prepared for analysis, but not how to create those datasets. In real-world data analysis, creating the analysis datasets often requires more time and skill than conducting the statistical analyses. The purpose of this workshop is to teach participants the core data management skills needed to create datasets ready for statistical analysis. These skills will help researchers produce higher-quality data for their papers and reports. The workshop uses Stata, which offers an excellent combination of data manipulation capabilities and user-friendliness. Stata also provides a wide range of statistical analysis techniques, along with user-written programs contributed by many experts. In this workshop, we will use examples from the 2017 Indonesia Demographic and Health Survey (IDHS 2017), the Global Early Adolescent Study (GEAS), and the I-NAMHS data available at the Center for Reproductive Health. Participants must prepare a short proposal before attending this workshop.


Expected outcomes

By the end of the workshop participants will be able to:

  • Formulate the secondary data analysis process: 1) define the study question, 2) collect the data, 3) clean the data, 4) analyze the data, and 5) visualize the data and share the findings,
  • Appraise variables in datasets from external data sources (IDHS and GEAS) stored in various formats, including Excel spreadsheets, SPSS files, and ASCII files,
  • Formulate the main dependent and independent variables, as well as covariates, as an analytical framework diagram,
  • Investigate the data structure, identify errors in the data, fix those errors, and confirm that variables have been created correctly,
  • Create analysis datasets that merge data from multiple sources, such as parent and adolescent datasets,
  • Create longitudinal datasets that append data from multiple time periods,
  • Create variables that require calculations across observations and files,
  • Reshape analysis datasets, for example by converting a dataset that has one row per person and one column per year into a dataset that has one row per person-year,
  • Increase the efficiency and reproducibility of results by conducting all steps of the data analysis within Stata do-files (reading in data; investigating and cleaning data; creating analysis variables; running analyses; and presenting results),
  • Increase productivity by learning to automate iterative tasks rather than writing separate commands for each task, and
  • Demonstrate how to produce reproducible analyses and reports acceptable to scientific journals.


About the instructor

Siswanto Agus Wilopo is a Professor of Population Health (retired) and a Senior Researcher at the Center for Reproductive Health, Faculty of Medicine, Public Health, and Nursing, Universitas Gadjah Mada, Yogyakarta, Indonesia. He is also an adjunct/visiting Full Professor at the College of Health and Agricultural Sciences, University College Dublin, Ireland. In global health, his main current interests are health systems and financing, including financing for reproductive health services, and gender-based violence (GBV). His current research addresses issues affecting adolescents, including multi-country studies on early adolescent health (GEAS) and adolescent mental health (NAMHS) conducted with researchers from more than 35 countries.


Teaching Assistants:

  1.  Althaf Setiawan, S.Si, MPH (DHS)
  2.  dr. Ifta Choiriyyah, MSPH, PhD (GEAS)
  3.  Mustikanintyas, S.Psi, Psi, MPH (I-NAMHS)
  4.  Heru Subekti, S.Kep., Ners, MPH (I-NAMHS)