
Hybrid Format

The Human Migration Database (HMigD)

Maciej Dańko
Laboratory of Digital and Computational Demography, November 09, 2022

Maciej Dańko from the Laboratory of Digital and Computational Demography overviews the Human Migration Database.

Abstract

Migration has become a significant source of population change at the global level, with broad societal implications. Unfortunately, reliable migration statistics are not available for most countries, including the majority of European countries. In developed countries, these data are usually collected by national statistical institutes (NSIs) and/or other governmental agencies. Nevertheless, their ability to track migration flows (especially emigration) is extremely limited. The alternative sources used to estimate migration stocks and flows are highly heterogeneous (e.g., population censuses, administrative registers, household surveys, residence and work permits, and registers of foreigners). Their use in assessing migration patterns may therefore be hampered by inconsistent definitions and by problems with data quality and availability.

The Human Migration Database (HMigD) is designed to fill the existing data gap by providing high-quality data on migration in developed countries. Its main goal is to provide reliable evidence on international migration flows. The database follows four guiding principles first formulated in the Human Mortality Database (HMD) project: comparability, flexibility, accessibility, and reproducibility. In this way, the HMigD fully adheres to the concept of Open Data. Unlike the HMD, however, the HMigD is a synthetic database, i.e., its main output data are produced by statistical modeling. It nevertheless provides the original input data, exhaustive country-specific metadata and documentation, and the scripts used for the calculations, to ensure full reproducibility of the results.

One of the key aspects of migration data to be considered in migration models is their quality. In general, quality depends on the ability of governmental agencies to trace migration flows, that is, on the legal incentives for registering a migration event and on the methodology used to measure migration. For this reason, migration estimates produced by NSIs and by other sources (e.g., the Labour Force Survey, LFS) are not directly comparable. The major migration data quality problems can be classified into several groups:

  1. Accuracy issues, related to random rather than systematic errors made in the data collection process;
  2. Undercounting, reflecting a non-systematic bias in migration estimates;
  3. Coverage, a special case of undercounting that reflects systematic bias. It arises from the rules governing the data collection process, which may exclude certain population segments, such as nationals who are return migrants, or foreigners, from the official immigration and emigration counts;
  4. Inconsistencies in the definition of an international migrant, caused by national migration criteria (the minimum duration of stay) that deviate from international (UN/Eurostat) standards.

In the first stage of the project, the researchers systematically evaluate and classify data quality problems, an important task for creating a reliable evidence base for the later stages of the project. Ignoring potential systematic errors or misinterpreting problematic data can lead to misleading conclusions or estimates. The quality of migration data is assessed using available metadata, expert opinion, and data-driven methods. For the undercounting of administrative data, the researchers extend earlier approaches based on expert assessment by establishing more objective, data-driven criteria. First, the new approach, tested on Eurostat/UN/NSI data, relies on the outcomes of a bilateral flow ratio model that compares the same migration flows as reported by the country under study and by a set of “gold standard” countries with reliable register-based data. Second, more detailed metadata and new expert opinions are used as complementary information for deriving a country-period score. An additional important component of this work is a freely available visualization toolkit (a Shiny application) that can be used to estimate undercount scores under different expert-opinion inputs.
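The exact form of the bilateral flow ratio model is not given here, so the following is only a minimal sketch of the underlying idea, under stated assumptions. For one country-period, it compares emigration reported by the country under study with immigration reported by gold-standard partners for the same flows, summarizes the log ratios by their median, and then blends the implied coverage with an expert score. The function names `undercount_log_ratio` and `country_period_score`, the median summary, the 0-1 scales, and the equal weighting are all hypothetical illustration choices, not the HMigD method.

```python
import math
from statistics import median


def undercount_log_ratio(pairs):
    """Median log ratio of sender-reported to receiver-reported flows (hypothetical).

    `pairs` holds (sent, received) tuples for one country-period: `sent` is the
    emigration to a partner as reported by the country under study, `received`
    is the immigration from that country as reported by a gold-standard partner.
    Values near 0 mean agreement; negative values suggest undercounted emigration.
    """
    log_ratios = [
        math.log(sent / received)
        for sent, received in pairs
        if sent > 0 and received > 0  # skip zero cells, where the log is undefined
    ]
    if not log_ratios:
        raise ValueError("no usable flow pairs")
    return median(log_ratios)


def country_period_score(log_ratio, expert_score, weight=0.5):
    """Blend data-driven coverage with an expert score, both on a 0-1 scale.

    The equal default weight is an arbitrary assumption for illustration.
    """
    data_coverage = min(1.0, math.exp(log_ratio))  # implied share of the flow recorded
    return weight * data_coverage + (1.0 - weight) * expert_score


# Made-up toy flows: the country under study records roughly half of its emigration.
pairs = [(500, 1000), (250, 500), (900, 1000)]
ratio = undercount_log_ratio(pairs)
print(round(math.exp(ratio), 2), round(country_period_score(ratio, 0.7), 2))
```

Using the median rather than the mean keeps a single unusual partner country from dominating the score; a real model would also need to account for differing migrant definitions between the two reporting countries.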

In the second stage of the project, the researchers integrate the available data within a Bayesian modeling framework. Extending previous work on estimating international migration, they develop a hierarchical Bayesian model that integrates and harmonizes different migration data sources (such as data produced by NSIs or collected through the LFS), taking into account differences in data quality and in the definitions used, as well as socioeconomic and demographic information. More specifically, the statistical model includes information on individual countries and on the relationships between pairs of countries, both of which may be predictive of migration patterns. The purpose is to obtain new estimates of migration flows between pairs of countries and over time, improving our understanding of the causes and consequences of migration.
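The hierarchical model itself is not specified in this abstract. As a deliberately simplified, non-hierarchical stand-in, the sketch below illustrates the core idea of Bayesian data integration: two reports of the same flow, each with an assumed known bias (undercount or definitional offset) and an assumed known precision, are combined with a vague prior on the log scale through a conjugate normal-normal update. The function `combine_reports`, the bias values, and the standard deviations are hypothetical.

```python
import math


def combine_reports(reports, prior_mean, prior_sd):
    """Posterior mean and sd of log(true flow) under a normal-normal model.

    Each report is (reported_flow, log_bias, sd) and is assumed to satisfy
    log(reported_flow) ~ Normal(log(true_flow) + log_bias, sd**2), with
    log_bias and sd treated as known (a strong simplifying assumption).
    """
    precision = 1.0 / prior_sd**2          # prior precision
    weighted_sum = prior_mean * precision  # precision-weighted prior mean
    for flow, log_bias, sd in reports:
        p = 1.0 / sd**2
        precision += p
        weighted_sum += (math.log(flow) - log_bias) * p  # bias-corrected report
    return weighted_sum / precision, math.sqrt(1.0 / precision)


# Made-up example: one flow reported by both ends of the corridor.
reports = [
    (1000, 0.0, 0.1),           # receiving country: reliable register, no bias
    (500, math.log(0.5), 0.3),  # sending country: assumed to record half the flow
]
# Vague prior on the log scale (mean 0, sd 10).
mean, sd = combine_reports(reports, prior_mean=0.0, prior_sd=10.0)
print(round(math.exp(mean)), round(sd, 3))
```

The more reliable source dominates because its precision is larger, and the posterior uncertainty is smaller than that of either report alone. A hierarchical model would instead estimate the biases and error variances from the data, sharing information across countries and years.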

In the final stage, the researchers perform exhaustive data quality checks (including plausibility checks). The validated data are used to produce the final output, which is accompanied by a Shiny application that allows users to access the results under various alternative modeling assumptions.
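The specific plausibility checks used for the HMigD are not listed, so the following is only an illustrative sketch of the kind of automated screening such a stage might include: flagging duplicate country-pair-year entries, self-flows, negative flows, and flows that exceed an assumed share of the origin population. The `FlowEstimate` class, the `plausibility_issues` function, the 5% ceiling, and the placeholder country codes and populations are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowEstimate:
    origin: str
    destination: str
    year: int
    flow: float


def plausibility_issues(estimates, populations, max_share=0.05):
    """Return human-readable problems found in a list of FlowEstimate objects.

    `populations` maps (country, year) to population size; `max_share` is a
    hypothetical ceiling on annual emigration as a share of the origin population.
    """
    issues = []
    seen = set()
    for e in estimates:
        key = (e.origin, e.destination, e.year)
        if key in seen:
            issues.append(f"duplicate estimate for {key}")
        seen.add(key)
        if e.origin == e.destination:
            issues.append(f"self-flow {e.origin} in {e.year}")
        if e.flow < 0:
            issues.append(f"negative flow {e.origin}->{e.destination} in {e.year}")
        pop = populations.get((e.origin, e.year))
        if pop is not None and e.flow > max_share * pop:
            issues.append(f"implausibly large flow {e.origin}->{e.destination} in {e.year}")
    return issues


# Placeholder countries A-D with made-up numbers.
estimates = [
    FlowEstimate("A", "B", 2019, 50_000),
    FlowEstimate("A", "A", 2019, 10),
    FlowEstimate("C", "D", 2019, -5),
    FlowEstimate("D", "C", 2019, 200_000),
]
populations = {("A", 2019): 38_000_000, ("C", 2019): 10_000_000, ("D", 2019): 1_300_000}
for issue in plausibility_issues(estimates, populations):
    print(issue)
```

Returning a list of messages rather than raising on the first problem lets all issues in a data release be reviewed at once.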

Initially, the project focuses only on EU countries, but the researchers plan to extend the database to include more countries. By design, however, the database is limited to countries with developed statistical systems, so the likely upper limit is 45-50 countries. The HMigD will be updated regularly to include the most recent data.

About

Maciej Dańko has an interdisciplinary background in mathematics, statistics, biology, and demography. He completed his PhD at the Jagiellonian University in Krakow, working on optimal resource allocation models of ageing. From 2008 to 2017, he was a postdoc and research scientist in Jim Vaupel's Laboratory of Evolutionary Biodemography, where he was involved in various projects related to formal demography, ageing, and statistics. From 2017 to 2020, he was a statistical analyst and research scientist in Anna Oksuzyan's Laboratory of Gender Gaps in Health and Survival, working, among other things, on gender gaps in health reporting styles. Maciej is currently a data scientist in Emilio Zagheni's Laboratory of Digital and Computational Demography at MPIDR, with a research interest in human migration. His main project is building the Human Migration Database.

The Max Planck Institute for Demographic Research (MPIDR) in Rostock is one of the leading demographic research centers in the world. It's part of the Max Planck Society, the internationally renowned German research society.