Abstract
This document provides technical documentation for the Dynamic Gravity dataset. The Dynamic Gravity dataset provides extensive country and country pair information for a total of 285 countries and territories, annually, from 1948 to 2016. This documentation describes the methodology used to create each variable and the information sources on which each is based. Additionally, it provides a large collection of summary statistics to aid in understanding the resulting dataset.
This documentation is the result of ongoing professional research of USITC Staff and is solely meant to represent the opinions and professional research of individual authors. It is not meant to represent in any way the views of the U.S. International Trade Commission or any of its individual Commissioners. It is circulated to promote the active exchange of ideas between USITC Staff and recognized experts outside the USITC, professional development of Office Staff and increase data transparency by encouraging outside professional critique of staff research. Please address all correspondence to Tamara.Gurevich@usitc.gov or Peter.Herman@usitc.gov.
The Dynamic Gravity dataset contains a collection of variables describing aspects of countries and territories as well as the ways in which they relate to one another.1 Each record in the dataset is defined by a pair of countries or territories and a year. The records themselves are composed of three basic types of variables: identifiers, unilateral characteristics, and bilateral characteristics. The dataset spans the years 1948–2016 and reflects the dynamic nature of the globe by following the ways in which countries have changed during that period. The resulting dataset covers 285 countries and territories, some of which exist in the dataset for only a subset of covered years.2
The identifying variables (see section 2) describe the countries or territories to which each record applies. Because some of the variables in the dataset are bilateral and directional while others are not, it is often necessary to specify how certain variables relate to the two countries in the record. To do so, each record has a designated “origin” and “destination” country, through which the directionality or applicability of each variable is specified. Throughout, the dataset uses the notation identifier_o and identifier_d to define the “origin” and “destination” country, respectively.3 Other variables, reflecting either country-specific information or directional information, utilize a similar nomenclature. The dataset is square in the sense that for every record in which a country such as Argentina is listed as the origin and a second, Brazil, is listed as the destination, there is a corresponding record in which the designations are reversed and Argentina is the destination while Brazil is the origin.
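The "square" property described above can be checked mechanically. The following is a minimal sketch, assuming records are held as plain dictionaries keyed by the iso3_o, iso3_d, and year variables; the function name and the sample rows are illustrative and are not taken from the dataset.

```python
def missing_reverse_pairs(records):
    """Return (origin, destination, year) keys whose reversed record is absent."""
    keys = {(r["iso3_o"], r["iso3_d"], r["year"]) for r in records}
    return sorted((o, d, y) for (o, d, y) in keys if (d, o, y) not in keys)


# Illustrative records only; these are not actual rows from the dataset.
square_sample = [
    {"iso3_o": "ARG", "iso3_d": "BRA", "year": 2000},
    {"iso3_o": "BRA", "iso3_d": "ARG", "year": 2000},
    {"iso3_o": "ARG", "iso3_d": "CHL", "year": 2000},  # reverse record missing
]

print(missing_reverse_pairs(square_sample))
```

An empty result would indicate that every origin-destination record has its mirrored counterpart in the same year.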
The unilateral variables are those that pertain exclusively to a single country. For example, these include variables describing GDP, membership in an international organization, and political stability. Because each record conveys information for two countries, each unilateral variable is listed twice with one series pertaining to each of the two countries following the “origin” and “destination” convention. In general, unilateral variables can be quickly identified by their name, which ends in _o or _d.
The bilateral variables are those that reflect information specific to a pair of countries. For example, bilateral variables include the distance between countries, whether they belong to a common trade agreement, or whether either is in some form of conflict with the other. In some cases—such as distance—variables are not directional, in which case the data does not change based on which country is designated as “origin”. In other cases, the data is directional wherein the relationship of the “origin” country to the “destination” country is not necessarily the same as the “destination” to the “origin”. For example, the variables reflecting colonial relationships are typically one-way such that only one of the two countries was a colony of the other. In these cases, the naming of the variable specifies to which country the information applies.
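The naming convention suggests a simple way to sort column names into the three variable types. The sketch below is a heuristic only: it relies on the statement that unilateral variables generally end in _o or _d, and the function name and the set of identifier stems are assumptions made for illustration.

```python
# Stems of the identifier variables listed in this documentation.
IDENTIFIER_STEMS = {"country", "iso3", "dynamic_code"}


def classify_variable(name):
    """Classify a column name as 'identifier', 'unilateral', or 'bilateral'."""
    if name == "year":
        return "identifier"
    stem, sep, suffix = name.rpartition("_")
    if sep and suffix in ("o", "d"):
        return "identifier" if stem in IDENTIFIER_STEMS else "unilateral"
    # Anything without an _o/_d suffix is treated as describing the pair.
    return "bilateral"


for column in ["iso3_o", "gdp_wdi_cur_d", "distance", "colony_of_origin_ever"]:
    print(column, "->", classify_variable(column))
```

Directional bilateral variables such as the colony indicators carry their direction inside the name rather than in a suffix, so they fall into the bilateral group under this rule.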
Table 1 below presents a brief description of variables and page numbers corresponding to additional details about those variables. Appendix A1 expands on this list to provide the reader the exact variable names, a brief description of each variable, and page numbers corresponding to additional details on definitions, sources, assumptions, and construction procedures for each variable.
Variable | Description | Page |
Country Identifiers | ||
country | Name of origin/destination country | 9 |
iso3 | 3-digit ISO code of origin/destination country | 9 |
dynamic_code | Year appropriate 3-digit code of origin/destination country | 9 |
year | Year of observation | 9 |
Macroeconomic Indicators | ||
pop | Population of origin/destination country | 11 |
capital_cur | Capital stock at current PPP of origin/destination country | 11 |
capital_const | Capital stock at constant prices of origin/destination country | 11 |
gdp_pwt_const | Real, inflation-adjusted, PPP-adjusted GDP of origin/destination country (PWT) | 15 |
gdp_pwt_cur | Real, current, PPP-adjusted GDP of origin/destination country (PWT) | 15 |
gdp_wdi_const | Real GDP of origin/destination country (WDI) | 15 |
gdp_wdi_cur | Nominal GDP of origin/destination country (WDI) | 15 |
gdp_wdi_cap_const | Real GDP per capita of origin/destination country (WDI) | 15 |
gdp_wdi_cap_cur | Nominal GDP per capita of origin/destination country (WDI) | 15 |
Geographic Variables | ||
lat | Latitude coordinate of origin/destination country | 20 |
lng | Longitude coordinate of origin/destination country | 20 |
distance | Population weighted distance between country pair | 20 |
contiguity | Country pair shares a common border | 23 |
landlocked | Origin/destination country is landlocked | 23 |
island | Origin/destination country is an island | 23 |
region | Geographic region of origin/destination country | 23 |
Cultural Variables | ||
common_language | Residents of country pair speak at least one common language | 25 |
colony_of_destination_current | Origin country is a colony of the destination country | 26 |
colony_of_origin_current | Destination country is a colony of the origin country | 26 |
colony_of_destination_ever | Origin country was ever a colony of the destination country | 26 |
colony_of_origin_ever | Destination country was ever a colony of the origin country | 26 |
colony_of_destination_after45 | Origin country was a colony of the destination country after 1945 | 26 |
colony_of_origin_after45 | Destination country was a colony of origin country after 1945 | 26 |
Trade Facilitation Variables | ||
agree_pta | Country pair is in at least one active preferential trade agreement | 29 |
agree_pta_goods | Country pair is in at least one active preferential trade agreement covering goods | 29 |
agree_pta_services | Country pair is in at least one active preferential trade agreement covering services | 29 |
agree_cu | Country pair is in at least one customs union | 29 |
agree_eia | Country pair is in at least one economic integration agreement | 29 |
agree_fta | Country pair is in at least one free trade agreement | 29 |
agree_psa | Country pair is in at least one partial scope agreement | 29 |
member_eu | Origin/destination country is a European Union member | 37 |
member_wto | Origin/destination country is a World Trade Organization member | 37 |
member_gatt | Origin/destination country is a General Agreement on Tariffs and Trade member | 37 |
member_eu_joint | Country pair are both members of the European Union | 37 |
member_wto_joint | Country pair are both members of the World Trade Organization | 37 |
member_gatt_joint | Country pair are both members of the General Agreement on Tariffs and Trade | 37 |
Measures of Institutional Stability | ||
polity | Polity (political stability) score of origin/destination country | 54 |
polity_absolute | Absolute value of the Polity score of the origin country | 54 |
hostility_level | Level of the origin/destination country’s hostility toward the destination/origin country | 44 |
sanction_threat | There exists a threat of sanction between one country in a record towards the other | 59 |
sanction_threat_trade | There exists a threat of trade sanction between one country in a record towards the other | 59 |
sanction_imposition | There exists a sanction between one country in a record towards the other | 59 |
sanction_imposition_trade | There exists a trade sanction between one country in a record towards the other | 59 |
The remainder of this documentation is divided into six sections, each devoted to a group of related variables. Each of these sections provides extensive details on the data sources on which the variables were based; more thorough descriptions of the variables themselves; and detailed notes on the methods and assumptions that were used in the construction of the variables. Section 2 describes the variables that identify observations in the dataset such as country codes and years. Section 3 describes variables that reflect macroeconomic conditions such as GDP, capital stocks, and population. Section 4 describes variables that reflect geographic characteristics of countries such as distance and borders. Section 5 describes variables that reflect cultural characteristics such as shared languages or colonial relationships. Section 6 describes variables that reflect country or country pair specific trade facilitation measures such as trade agreements or the World Trade Organization. Finally, section 7 describes variables that reflect institutional aspects of countries such as political stability and economic sanctions.
Additionally, many of the six primary sections include appendices located at the end of this document. These appendices largely contain tables that describe in detail certain aspects of some variables that we considered important to document thoroughly but that may not be particularly pertinent for typical readers or users. For example, appendix E provides a detailed list of all trade agreements reflected in the corresponding variables.
In each year t, a record is uniquely identified by a combination of the ISO alpha-3 code of the origin country, iso3_o, and the destination country, iso3_d, assigned to each country and territory by the International Organization for Standardization.4 While these identifiers are unique within each year, in some instances countries significantly change their geographic and political characteristics while retaining the same ISO alpha-3 codes over time. To better track changes occurring to countries over time, we developed an additional country identifier, dynamic_code_o/d, described in detail below.
The universe of coverage of this dataset was constructed by identifying ISO alpha-3 codes of all countries and territories that appear as participants in a reported trade flow within WTO Trade Databases or the UN Comtrade database for at least one year starting in 1948.5 These countries and territories were then tracked through the period of 1948–2016 using the CIA World Factbook.6 We then used Hammond’s atlases of the world for 1948–1972, Rand McNally atlases of the world for 1973–1995, and National Geographic Society atlases of the world for 1996–2014 to identify any geographic changes that occurred in those countries during that period. In situations where a country or territory changed its geographic boundaries, but retained its ISO alpha-3 code, a modification of its ISO alpha-3 code—dynamic_code variant—was assigned to keep track of such changes.
country_o: Name of the origin country.
country_d: Name of the destination country.
iso3_o: ISO alpha-3 of the origin country.
iso3_d: ISO alpha-3 of the destination country.
dynamic_code_o: A dynamic code of the origin country that reflects changes to a country’s composition that are not indicated by the corresponding iso3_o code.
dynamic_code_d: A dynamic code of the destination country that reflects changes to a country’s composition that are not indicated by the corresponding iso3_d code.
year: Year upon which the record is based.
ISO alpha-3 codes were assigned to countries using the official designations by the International Organization for Standardization for each year a country or a territory exists in the dataset.
There are several cases in which a country experienced a series of geopolitical events that resulted in substantial changes to the geographic and political composition of the country, but a new ISO alpha-3 code indicating that the country had fundamentally changed was not assigned. For example, following the split of the territory formerly known as East Bengal and later East Pakistan from Pakistan in 1971, a new country, Bangladesh, was formed. This country was assigned the ISO alpha-3 code “BGD”. At the same time, Pakistan, which had previously claimed East Pakistan as part of its territory, lost nearly 16% of its land area and over 50% of its population when Bangladesh declared independence. In addition, Pakistan lost a land border with Myanmar. It did, however, retain its pre-split ISO alpha-3 code “PAK”.
In order to keep track of changes like these, we developed an extension of the ISO alpha-3 codes that we call dynamic_code. In cases in which a country retains a previously used ISO alpha-3 code following a major change in its geography, the dynamic_code identifies this change by appending an additional indicator to the ISO alpha-3 code. By convention, this addition appends a ‘.X’ to the ISO code. For example, following the separation of Bangladesh from Pakistan in 1971, a modified dynamic_code “PAK.X” is assigned to denote the smaller post-split Pakistan. Appendix table B2 has a full list of countries that underwent significant changes but retained their original ISO alpha-3 codes.
Importantly, in these cases, we make no modifications to the original ISO alpha-3 codes assigned by the International Organization for Standardization, which are reflected in the iso3 variables. We recognize that the original ISO alpha-3 codes are likely the most useful identifiers when matching to other data sources and have protected their integrity. Nonetheless, we believe having the additional dynamic_code is valuable as well.
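The ‘.X’ convention can be illustrated with a short sketch. This is not the procedure used to build the dataset: the function name and the lookup table are hypothetical, the authoritative list of affected countries is the one in appendix table B2, and treating the change year itself as the first year of the modified code is an assumption.

```python
def dynamic_code(iso3, year, change_years):
    """Append '.X' once a retained ISO alpha-3 code has a recorded major change."""
    change_year = change_years.get(iso3)
    if change_year is not None and year >= change_year:
        return iso3 + ".X"
    return iso3


# Hypothetical lookup built from the Pakistan example in the text.
# Assumption: the modified code applies from the change year onward.
CHANGE_YEARS = {"PAK": 1971}

print(dynamic_code("PAK", 1965, CHANGE_YEARS))  # pre-split Pakistan
print(dynamic_code("PAK", 1980, CHANGE_YEARS))  # post-split Pakistan
print(dynamic_code("BGD", 1980, CHANGE_YEARS))  # unaffected code
```

Because the iso3 variables are left untouched, merging with external data can continue to use the plain ISO code while the dynamic code separates the pre-change and post-change entities.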
In addition to these concerns, several other special circumstances arose that required further considerations and special treatment:
Byelorussian SSR overlaps with the Soviet Union: UN Comtrade reports data for Byelorussian SSR (iso3: BYS, dynamic_code: BYS) while it is a part of the Soviet Union. For this reason, we have decided to include an identifier for Byelorussian SSR during appropriate years (1950–1990).7
Western Sahara: Western Sahara (iso3: ESH, dynamic_code: ESH) is a disputed territory under Moroccan (iso3: MAR, dynamic_code: MAR) control. Nonetheless, we have decided to include an identifier for Western Sahara because UN Comtrade does report trade for it. Prior to 1975 this region was known as Spanish Sahara (iso3: ESH, dynamic_code: ESH), an occupied territory under Spain’s (iso3: ESP, dynamic_code: ESP) control.8
Omitted territories: There are three territories for which Comtrade reports trade data: Belgium-Luxembourg (iso3: BLX), European Union (iso3: EUN), and Free Zones (iso3: FRE). These territories do not have corresponding gravity data and were omitted from this dataset.
This section describes population and economic aggregates of countries and regions that may influence cross-country trade. Currently we include 18 variables measuring macroeconomic performance of the reporting economies. Data sources and construction methodology of these variables are discussed in the following subsections.
The variables covering population and capital stock were sourced from the Penn World Tables (PWT), version 9.0 [Feenstra et al., 2015]. This dataset provides information for the years 1950–2014 for a large set of countries. However, while our dataset covers 126 countries and territories in 1950, PWT only covers 55. By 2014, our dataset has information on 251 countries and territories, while PWT covers only 182. Coverage also varies by variable, driven by the availability of underlying data used by the PWT. Figure 1 below shows the percent of countries and territories in our dataset that have information about population and capital stock in a given year.
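A coverage share of the kind plotted in Figure 1 could be computed along the following lines. This is a sketch under stated assumptions: records are dictionaries carrying year, iso3_o, and the variable of interest, missing values are represented by None, and the country codes and values in the sample are placeholders rather than real data.

```python
def coverage_by_year(records, variable):
    """Percent of origin countries per year with a non-missing value of `variable`."""
    # Collapse pair-level records to one flag per (year, origin country).
    seen = {}
    for r in records:
        seen[(r["year"], r["iso3_o"])] = r.get(variable) is not None
    totals, present = {}, {}
    for (year, _), has_value in seen.items():
        totals[year] = totals.get(year, 0) + 1
        present[year] = present.get(year, 0) + int(has_value)
    return {year: 100.0 * present[year] / totals[year] for year in sorted(totals)}


# Placeholder records; codes and values are invented for illustration.
coverage_sample = [
    {"year": 1950, "iso3_o": "AAA", "iso3_d": "BBB", "pop_o": 1.5},
    {"year": 1950, "iso3_o": "AAA", "iso3_d": "CCC", "pop_o": 1.5},
    {"year": 1950, "iso3_o": "BBB", "iso3_d": "AAA", "pop_o": None},
    {"year": 1950, "iso3_o": "CCC", "iso3_d": "AAA", "pop_o": None},
    {"year": 1950, "iso3_o": "DDD", "iso3_d": "AAA", "pop_o": None},
    {"year": 2014, "iso3_o": "AAA", "iso3_d": "BBB", "pop_o": 2.0},
    {"year": 2014, "iso3_o": "BBB", "iso3_d": "AAA", "pop_o": 3.0},
]

print(coverage_by_year(coverage_sample, "pop_o"))
```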
Capital stock variables are derived by the PWT from investment data by asset using information about four assets: residential and non-residential structures, machinery, transport equipment, and other assets such as software, intellectual property, and cultivated assets.9 Capital stock data is reported using two sets of prices: current prices adjusted for purchasing power parity (PPP) and constant prices that are not adjusted for the differences in PPP but use 2011 domestic prices for each country to calculate inflation-adjusted value of capital stock. The PPP adjustment takes into account the fact that the real prices in developing countries are often lower than those in the developed countries. Therefore, using PPP-adjusted measures of capital stock may provide a more accurate comparison of countries within a given year. On the other hand, the constant value of capital stock may facilitate within-country comparisons over time.
pop_o: Population (in millions) of country_o in year t.
pop_d: Population (in millions) of country_d in year t.
capital_cur_o: Capital stock at current PPP-adjusted prices (in millions US$) of country_o in year t, with 2011 as the base year.
capital_cur_d: Capital stock at current PPP-adjusted prices (in millions US$) of country_d in year t, with 2011 as the base year.
capital_const_o: Capital stock at constant 2011 national prices (in millions US$) of country_o in year t.
capital_const_d: Capital stock at constant 2011 national prices (in millions US$) of country_d in year t.
We have not altered the data reported by the source in any way other than to correctly map them to the identifiers in the Dynamic Gravity dataset.
GDP and GDP per capita come from two datasets: the Penn World Tables (PWT) version 9.0 [Feenstra et al., 2015] and the World Bank’s World Development Indicators (WDI) [World Bank 2016]. Two of the variables, gdp_pwt_const and gdp_pwt_cur, were sourced from the PWT. This dataset provides information for the years 1950–2014, covering 55 countries in 1950 and increasing to 182 countries in 2014. The remaining GDP and GDP per capita variables, gdp_wdi_const, gdp_wdi_cur, gdp_wdi_cap_const, and gdp_wdi_cap_cur, were sourced from the WDI. This dataset provides information for the years 1960–2015, beginning with 135 countries in 1960 and expanding the coverage to 233 countries by 2015. While neither PWT nor WDI provides coverage for all years and countries in our dataset, together they cover nearly 90% of the gravity dataset observations overall and over 95% of the observations from 1995–2015. Figure 2 below shows the percent of countries and territories in our dataset that have information about some GDP measures in a given year.
The two data sources allow for different comparability of countries’ GDP over time and across borders. The PWT computes GDP as the value of a country’s output at prices adjusted for purchasing power parity (PPP). Such adjustment takes into account the fact that the real prices in developing countries are often lower than those in developed countries. Therefore, using PPP-adjusted measures of GDP may provide a more accurate comparison between countries in a given year. On the other hand, the WDI measures real and nominal GDP without price adjustments, thereby facilitating within-country comparisons over time.10
gdp_pwt_const_o: Real GDP measured at inflation-adjusted and PPP-adjusted prices (in millions US$) of country_o in year t, with 2011 as the base year.
gdp_pwt_const_d: Real GDP measured at inflation-adjusted and PPP-adjusted prices (in millions US$) of country_d in year t, with 2011 as the base year.
gdp_pwt_cur_o: Real GDP measured at current PPP-adjusted prices (in millions US$) of country_o in year t, with 2011 as the base year.
gdp_pwt_cur_d: Real GDP at current PPP-adjusted prices (in millions US$) of country_d in year t, with 2011 as the base year.
gdp_wdi_const_o: Real GDP (in US$) of country_o in year t, with 2010 as the base year.
gdp_wdi_const_d: Real GDP (in US$) of country_d in year t, with 2010 as the base year.
gdp_wdi_cur_o: Nominal GDP (in US$) of country_o in year t, measured at current prices in year t.
gdp_wdi_cur_d: Nominal GDP (in US$) of country_d in year t, measured at current prices in year t.
gdp_wdi_cap_const_o: Real GDP per capita (in US$) of country_o in year t, with 2010 as the base year.
gdp_wdi_cap_const_d: Real GDP per capita (in US$) of country_d in year t, with 2010 as the base year.
gdp_wdi_cap_cur_o: Nominal GDP per capita (in US$) of country_o in year t, measured at current prices in year t.
gdp_wdi_cap_cur_d: Nominal GDP per capita (in US$) of country_d in year t, measured at current prices in year t.
We have not altered the data reported by the source in any way other than to correctly map them to the identifiers in the Dynamic Gravity dataset.
Geographic variables describe the physical characteristics of a country that affect its level of trade with other countries. There are seven geographic variables in the current dataset that provide measures of geographic determinants of bilateral trade. In particular, they reflect location and connectedness, and are often used as a proxy for shipping or other transport costs. Of these measures, two are bilateral variables that measure relative proximity (distance and contiguity), while the remaining describe a country’s location and features (latitude, longitude, region, island, and landlocked).
In order to account for the potentially large geographic area that many countries cover, and in recognition that their economic activity and trade occur in multiple places within their borders, extra care was taken in the definition of the physical location and proximity variables latitude, longitude, and distance. To more accurately capture the location of economic activity within each country, these variables were based on city-level data. The latitude and longitude values reflect the simple midpoint between these cities. Meanwhile, the geographic distance between countries is based on the methodology developed by Mayer and Zignago [2005] and reflects the distance between pairs of cities, weighted by the proportion of the country’s population residing in each city, in kilometers. Defining distance in this way is meant to more accurately capture the distance economic activity must travel between two countries. To illustrate, New York City and Shanghai are 11,861 km apart, while Los Angeles is only 10,072 km from Beijing, a difference of 1,789 km. The value provided in the Dynamic Gravity dataset, 11,454 km, takes into consideration not only these four cities but numerous others throughout the United States and China in order to provide an economically meaningful average distance between the two countries. This city-based methodology also permits the calculation of an internal distance within a country, reflecting the distance that domestically produced and consumed goods and services must travel.
The majority of the latitudinal, longitudinal, and population data used to calculate latitude, longitude, and distance was collected from the basic version of Simplemaps.com’s “World Cities Database”.11 Simplemaps.com has compiled data for a set of about 7,300 cities, drawing on multiple sources. The data pertaining to U.S. cities stems from the U.S. Census Bureau and the U.S. Geological Survey, and the non-U.S. data is from the National Geospatial-Intelligence Agency. This one dataset covers 221 countries, or 78 percent of all countries in the Dynamic Gravity dataset. However, for two countries, Samoa and Gibraltar, the populations reported by Simplemaps.com were erroneously large compared to other sources’ estimates. In these cases, we substituted the city populations with data from Brinkhoff and CIA [2017], respectively.
Despite the relatively thorough coverage provided by Simplemaps.com, there were thirty countries present in the Dynamic Gravity dataset not covered by Simplemaps.com. For those countries, latitudes and longitudes were collected from the Geohack website [Geohack]. Additional city-level population data were collected from census reports (16 countries), the U.N. Statistics Division’s World Statistics Pocketbook [World Statistic Pocketbook, 2016] (6 countries), and the C.I.A.’s World Factbook [CIA, 2017] (8 countries).12 The U.N. and C.I.A. websites were consulted first; if they too lacked city-level data, we searched for census reports. In some cases for small islands, it was not possible to find city-level data, so the entire island’s population was used. Table 4 in Appendix B provides further details on the cities and sources used.
Similarly, there were forty-three countries that no longer exist and were not present in the Simplemaps.com data. For twenty-two of these countries, population data was sourced from UNDESA [2016], which tracks cities with populations over 300,000 back to 1950. If the country did not have any cities in that data set, we relied on historical census data found individually or through one of two websites: Brinkhoff and Lehmeyer [1999–2006]. There are also eight countries that were renamed but did not otherwise change their geography; for these, we applied the data from Simplemaps.com [2015].13 Table 5 in Appendix B provides further details on the cities and sources used for each country.
All data from Simplemaps.com is from 2015; all other data is for the available year closest to 2015 (either before or after). For countries that no longer exist, the data is from the year closest to the country’s last year in existence in our data set.
lat_o: The variable lat_o is the average of the latitudes of major cities in country_o.
lat_d: The variable lat_d is the average of the latitudes of major cities in country_d.
lng_o: The variable lng_o is the average of the longitudes of major cities in country_o.
lng_d: The variable lng_d is the average of the longitudes of major cities in country_d.
distance: The variable distance is the population-weighted average of city-to-city bilateral distances in kilometers between each major city in country_o and country_d.
The variables latitude and longitude are based on the respective locations of the largest city or cities in each country, as described in section 4.1.1. The reported values in the Dynamic Gravity dataset are the simple average of the city-level coordinates. As a result, coordinates reported for each country represent an average location for each country.
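The simple averaging described above amounts to an unweighted mean of city coordinates. The sketch below assumes each city is given as a (latitude, longitude) pair in decimal degrees; the function name and the coordinates are illustrative and do not correspond to any country in the dataset.

```python
def average_location(cities):
    """Unweighted mean of (latitude, longitude) pairs in decimal degrees."""
    lats = [lat for lat, _ in cities]
    lngs = [lng for _, lng in cities]
    return sum(lats) / len(lats), sum(lngs) / len(lngs)


# Invented coordinates for two cities of a hypothetical country.
example_cities = [(10.0, 20.0), (20.0, 40.0)]

print(average_location(example_cities))
```

Because the mean is arithmetic, a country whose cities straddle the 180° meridian could receive a longitude far from any of its cities; whether the dataset adjusts for such cases is not stated here.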
The variable distance is the population-weighted distance between country_o and country_d. This variable was calculated by weighting the distance between major cities of each country by each city’s population. The following formula, adapted from Mayer and Zignago [2011], was used to calculate the distance:

$$\text{distance}_{od} = \left( \sum_{i \in o} \sum_{j \in d} \frac{pop_i}{pop_o} \, \frac{pop_j}{pop_d} \, d_{ij}^{\theta} \right)^{1/\theta}$$

where i is a city in country_o, j is a city in country_d, and $d_{ij}$ is the distance between cities i and j. Note that $pop_o$ and $pop_d$ reflect only the total population residing in the cities of country_o and country_d that were used in the calculation of distance, not the total population of the countries overall. We set $\theta$, which is the sensitivity of trade flows to bilateral distance, equal to 1, which is the sensitivity used by Mayer and Zignago [2011].14
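The population weighting can be illustrated with a small sketch. It assumes the Mayer and Zignago form of the calculation, in which each city-pair distance is weighted by the two cities' shares of their countries' included city population and the sensitivity parameter is set to 1. The function name, city labels, populations, and distances below are invented for illustration.

```python
def weighted_distance(cities_o, cities_d, city_distance, theta=1.0):
    """Population-weighted distance between two countries.

    cities_o, cities_d: lists of (city_name, population) pairs.
    city_distance: function (name_i, name_j) -> distance in km.
    """
    # Totals cover only the cities included in the calculation.
    pop_o = sum(pop for _, pop in cities_o)
    pop_d = sum(pop for _, pop in cities_d)
    total = 0.0
    for name_i, pop_i in cities_o:
        for name_j, pop_j in cities_d:
            weight = (pop_i / pop_o) * (pop_j / pop_d)
            total += weight * city_distance(name_i, name_j) ** theta
    return total ** (1.0 / theta)


# Invented example: two origin cities, one destination city.
KM = {("A1", "B1"): 100.0, ("A2", "B1"): 200.0}
origin_cities = [("A1", 3.0), ("A2", 1.0)]
destination_cities = [("B1", 1.0)]

print(weighted_distance(origin_cities, destination_cities, lambda i, j: KM[(i, j)]))
```

With the sensitivity parameter equal to 1, the result is a plain weighted average: here 0.75 × 100 + 0.25 × 200 = 125 km.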
The distance between cities is calculated using a great-circle distance formula. Great-circle distance, or geodesic distance, is the shortest distance between two points on a sphere.15 The formula uses the Spherical Law of Cosines to determine distance, which assumes the Earth is a perfect sphere with a constant radius of 6,372.795 km [Weisstein, 2017]. One potential drawback of this method is that it takes the inverse of a cosine, which is ill-conditioned and can lose accuracy if the distance is small. The distances between cities are sufficiently large to avoid this issue [Chamberlain, 1996]. The Law of Cosines has the advantage of being computationally efficient and clear. The great-circle distance formula is:

$$d_{ij} = r \arccos\left( \sin\varphi_i \sin\varphi_j + \cos\varphi_i \cos\varphi_j \cos(\lambda_j - \lambda_i) \right)$$

where $\varphi$ is the latitude, $\lambda$ is the longitude, and $r$ is the earth’s radius.

Internal Country Observations: In line with gravity trade theory, if there is only one city-level observation in our data, we have assumed that internal distance is one so that the natural log of distance is well defined and equal to zero for internal trade. There are 80 countries with an internal distance of one, primarily islands or small countries (e.g. Andorra, Liechtenstein, Qatar).
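A city-to-city distance based on the Spherical Law of Cosines can be sketched as follows, using the radius quoted in the text. The function name is illustrative, and the clamp on the cosine value is an added safeguard against floating-point rounding rather than something described in this documentation.

```python
import math

EARTH_RADIUS_KM = 6372.795  # constant radius stated in the text


def great_circle_km(lat1, lng1, lat2, lng2, radius=EARTH_RADIUS_KM):
    """Distance between two points given in decimal degrees, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_lambda = math.radians(lng2 - lng1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(delta_lambda))
    # Rounding can push the value slightly outside [-1, 1], where acos fails.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius * math.acos(cos_angle)


# A quarter of the way around the equator.
print(great_circle_km(0.0, 0.0, 0.0, 90.0))
```

For countries with a single city observation, the text instead fixes internal distance at one, so that its natural logarithm is zero.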
4.2 Border Characteristics
The border characteristics variables describe the geographic features of each country’s borders. These features include the countries with which each shares a border, the types of geographic borders of each country, and the general geographic region to which each country belongs.
4.2.1 Data Sources
For the variables that describe geographic features of the country, contiguity, landlocked, island, and region, the information was collected from maps at the United States Library of Congress. We were unable to identify a single publisher or source covering all years in the Dynamic Gravity dataset. Instead, we were forced to segment the years based on the availability of atlases from three prominent publishers. For the years 1948–1972, we used annual C.S. Hammond & Company (Hammond’s Incorporated) world atlases and gazetteers.16 Hammond’s Incorporated stopped publishing the world atlases in 1972. For the years 1973–1995, we used periodic illustrated atlases of the world published by Rand McNally.17 Finally, for 1996 to the present, we used National Geographic Society atlases of the world.18 In many cases, particularly ones involving short, difficult to see borders, we used Google LLC [2017] maps to confirm contiguity.
4.2.2 Variables
contiguity: The variable contiguity is a binary indicator that is equal to 1 if country_o and country_d share a border in year t. A border is defined as a stretch of land or river. Countries that jointly border a lake or other large body of water but are otherwise noncontiguous are defined as not sharing a border, and contiguity takes the value 0.
landlocked_o: The variable landlocked_o is a binary indicator that is equal to 1 if country_o is landlocked. A country or territory is considered landlocked if it does not border an ocean or a body of water directly connected to an ocean.19
landlocked_d: The variable landlocked_d is a binary indicator that is equal to 1 if country_d is landlocked. A country or territory is considered landlocked if it does not border an ocean or a body of water directly connected to an ocean.19
island_o: The variable island_o is a binary indicator that is equal to 1 if country_o is an island. A country or territory is considered an island if it does not share any land borders with another country or territory.
island_d: The variable island_d is a binary indicator that is equal to 1 if country_d is an island. A country or territory is considered an island if it does not share any land borders with another country or territory.
region_o: The variable region_o defines the geographic region of country_o. The potential regions are: Africa, Caribbean, Central America, Central Asia, East Asia, Eurasia, Europe, Middle East, North America, Pacific, South America, South Asia, Southeast Asia, and Southern Pole.
region_d: The variable region_d defines the geographic region of country_d. The potential regions are: Africa, Caribbean, Central America, Central Asia, East Asia, Eurasia, Europe, Middle East, North America, Pacific, South America, South Asia, Southeast Asia, and Southern Pole.
4.2.3 Variable Construction
The border variables were constructed based on the analysis of a time series of atlases at the United States Library of Congress. For each country or territory in the dataset, we examined its representation in these atlases at several different points in time between 1948 and 2014. If a border changed for any country during the time frame of the database, we identified the year in which that change was reflected in the atlases and altered its status in the dataset accordingly.
Countries and territories are defined as being contiguous if they share a land or river border of any length. We do not, however, recognize water borders beyond rivers, such as lakes or seas. For example, we do not consider Turkmenistan and Azerbaijan to be contiguous despite their being situated on opposite sides of the Caspian Sea. As a result, the variable contiguity can reasonably be used to reflect two countries’ ability to trade directly with one another via land transport.
We defined a country or territory as being landlocked if it did not border an ocean or other major body of water with direct access to the ocean or a major sea. For example, this definition does not consider countries on the Mediterranean Sea as being landlocked. However, countries or territories for which the only access to the ocean is through a river controlled by other parties are considered landlocked. Similarly, countries bordering only a large inland sea such as the Caspian Sea are considered landlocked because such a sea does not provide significant access to other countries via water. As a result of this definition, the variables landlocked_o and landlocked_d reflect the ability of a country to widely and directly import or export using water transport.
We define a country or territory as an island if it does not share a common land or river border with any other country or territory. This definition implies that even countries such as Indonesia, which intuitively seem like island nations, are not technically considered islands in the dataset. As a result, this variable definition reflects the inability of a country to trade with any foreign parties via land transport.
The region variables were defined according to general, consistent categorizations in the atlases consulted during the construction of the other border variables. The variables can reasonably be used as a general measure of regional proximity. However, no official source for regional location was consulted, so users should be careful about the types of inference drawn from these variables, as they may not accurately reflect other, more substantial definitions of any of the regions into which each country or territory was placed.
Special Considerations for Contiguity: In general, we have chosen not to extend common borders across colonial relationships. That is, countries or territories do not share borders with any parties that are contiguous to their colonies or hegemons unless they, themselves, are contiguous with the party. For example, Mexico does not share a border with Great Britain during the period when British Honduras exists as a British territory.20
Western Sahara: Western Sahara (iso3: ESH, dynamic_code: ESH) is a disputed territory under Moroccan (iso3: MAR, dynamic_code: MAR) control. Because Comtrade features trade data for Western Sahara, we have chosen to recognize this territory. If you do not wish to treat Western Sahara separately from Morocco, set contiguity between Morocco and Mauritania (iso3: MRT, dynamic_code: MRT) to 1.
Byelorussian SSR and Soviet Union: Despite being a member of the Soviet Union (iso3: SVU, dynamic_code: SVU), the Byelorussian SSR (iso3: BYS, dynamic_code: BYS) features trade reported independently while the Soviet Union existed. For this reason, we have chosen to include independent observations for it as well. Contiguity for the Byelorussian SSR reflects not only the countries or territories to which it is contiguous but also any to which the Soviet Union is contiguous. Similarly, although it is itself landlocked, we have defined it as not being landlocked because the Soviet Union is not landlocked.
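For users who prefer to fold Western Sahara into Morocco, the adjustment described above can be sketched as follows. This is a minimal illustration rather than part of the dataset's tooling: the record layout (a list of dictionaries with the keys iso3_o, iso3_d, year, and contiguity) and the helper name merge_western_sahara are assumptions made for the example. Because the dataset is square, the override is applied in both directions.

```python
def merge_western_sahara(records):
    """Return a copy of records with Morocco-Mauritania contiguity set to 1.

    Each record is assumed to be a dict with the hypothetical keys
    'iso3_o', 'iso3_d', 'year', and 'contiguity'.
    """
    pair = {"MAR", "MRT"}
    adjusted = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are left untouched
        if {rec["iso3_o"], rec["iso3_d"]} == pair:
            rec["contiguity"] = 1  # applies to both MAR->MRT and MRT->MAR
        adjusted.append(rec)
    return adjusted


# Illustrative records only; the values are not taken from the dataset.
sample = [
    {"iso3_o": "MAR", "iso3_d": "MRT", "year": 2000, "contiguity": 0},
    {"iso3_o": "MRT", "iso3_d": "MAR", "year": 2000, "contiguity": 0},
    {"iso3_o": "MAR", "iso3_d": "ESH", "year": 2000, "contiguity": 1},
]
adjusted = merge_western_sahara(sample)
```

Whether to also drop the records that involve ESH is a separate choice that this sketch leaves to the user.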
Special Considerations for Landlocked: Serbia is technically landlocked but has access to a sea through Montenegro’s Port of Bar.21 Observations in the dataset assume it is landlocked. Anyone wishing to recognize this access to the sea could set landlocked_o and landlocked_d equal to 0 for Serbia (iso3: SRB).
Internal Country Observations: We have assumed that countries or territories are not contiguous to themselves. The variable contiguity reflects this assumption by setting the variable equal to zero whenever country_o is equal to country_d.
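The island definition used in this section can be restated in terms of contiguity: a country is an island when it appears in no contiguous pair. The sketch below illustrates that logic only. The dataset's island values were coded from atlases, not derived this way, and the function name island_flags and its inputs are hypothetical.

```python
def island_flags(countries, contiguous_pairs):
    """Flag countries that share no land or river border with any other country.

    countries: iterable of country codes.
    contiguous_pairs: iterable of (code_a, code_b) pairs that share a border.
    Both inputs are hypothetical structures used for illustration.
    """
    has_border = set()
    for a, b in contiguous_pairs:
        has_border.update((a, b))
    return {c: int(c not in has_border) for c in countries}


# Indonesia shares a land border with Malaysia, so it is not flagged as an island.
flags = island_flags(["IDN", "MYS", "JPN"], [("IDN", "MYS")])
```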
5 Cultural Variables
This section describes the set of variables that characterize cultural and historical relationships between country pairs. The measures included are those reflecting common spoken languages and former or current colonial ties. The current release contains seven cultural indicators described below.
5.1 Common Language
Common spoken languages are thought to be trade facilitating. The variable defined below, common_language, indicates whether at least one language is spoken by some residents of both countries in a pair. In order to develop this indicator, we identify the languages “commonly spoken” by residents of each country and territory using the CIA World Factbook definition of languages:
“[...] a listing of languages spoken in each country and specifies any that are official national or regional languages. When data is available, the languages spoken in each country are broken down according to the percent of the total population speaking each language as a first language. For those countries without available data, languages are listed in rank order based on prevalence, starting with the most-spoken language.”22
5.1.1 Data Sources
The data used for construction of the common_language variable come from the CIA World Factbook [CIA, 2017].23 The current edition of the CIA World Factbook lists 375 languages, one or more of which are commonly spoken in 276 countries and territories.24
5.1.2 Variables
common_language: The variable common_language takes a value equal to 1 if residents of both country_o and country_d speak at least one common language, as listed in the CIA World Factbook [CIA, 2017].
5.1.3 Variable Construction
The common_language variable was constructed by cross-referencing the languages listed for each country in the CIA World Factbook and setting the binary indicator for common language equal to 1 for all country pairs that share at least one listed spoken language. A considerable limitation of this approach is that the source does not always list the proportion of each country’s population that speaks a given language, implying that it is not generally possible to identify the extent to which two countries speak similar languages. Nonetheless, it was the most complete data source that we were able to identify.
Dissolved Countries: The most recent edition of the CIA World Factbook at the time of data construction does not list countries that have ceased to exist, such as the Soviet Union or Yugoslavia. For those countries, the list of languages was supplemented by earlier editions of the CIA World Factbook, taking the languages in the last year of each country’s existence as commonly spoken in that country. Note that in the current version of the Dynamic Gravity dataset, common_language does not change over time. It is current as of 2015 for the countries that exist in that year. For the countries that do not exist in that year, it is current as of the last year of that country’s existence.
Internal Country Observations: We have assumed that for observations in which the origin and destination country are the same, there is at least one spoken language common to residents of that country. The variable common_language reflects this assumption by setting the value of the variable equal to one whenever country_o is equal to country_d.
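The cross-referencing step described above amounts to testing whether two countries' language lists intersect. The following is a minimal sketch under stated assumptions: the input is a hypothetical mapping from country code to a set of language names, the country codes AAA, BBB, and CCC are placeholders, and the function name common_language_pairs is invented for the example.

```python
from itertools import product


def common_language_pairs(languages):
    """Build the common_language indicator for every ordered country pair.

    languages: dict mapping a country code to the set of languages listed
    for that country (a hypothetical input layout).
    """
    indicator = {}
    for origin, destination in product(languages, repeat=2):
        if origin == destination:
            # Internal observations are set to one by assumption.
            indicator[(origin, destination)] = 1
        else:
            shared = languages[origin] & languages[destination]
            indicator[(origin, destination)] = int(bool(shared))
    return indicator


# Placeholder countries and language lists, for illustration only.
langs = {
    "AAA": {"English", "French"},
    "BBB": {"French"},
    "CCC": {"Spanish"},
}
indicator = common_language_pairs(langs)
```

Because the source does not always report speaker shares, the indicator treats any shared listing as sufficient, regardless of how widely the language is spoken in either country.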
5.2 Colonial Relationships
The set of variables describing colonial relationships consists of six bilateral variables that describe historical colonial relationships between countries. These variables exclusively reflect relationships in which one country was a “colony” of its trading partner and, therefore, do not account for “protectorates”, “possessions”, or other types of entities lacking full sovereignty. These six variables are all directional. Three of them indicate that the origin country in a country pair was the colony of the destination country, implying that the destination country was the hegemon or colonizer. The remaining three variables reflect the reversal of that relationship so that the destination country is the colony. Additionally, the six variables refer to three potential time frames. The first indicates whether the country pair was in a colonial relationship during the year of the record, the second indicates whether they were ever in a colonial relationship, and the third indicates whether they were in a colonial relationship at any point after 1945. Below are definitions for each of these six variables, along with the methods and data sources used to construct them.
5.2.1 Data Sources
The colonial relationship variables are based primarily on two sources, which were supplemented with several other sources in order to fill in some relationships missing from the two main sources. These combined sources were used to extract data for 325 unique colonial relationships (i.e., colonies with a combination of a distinct colonizer and distinct period of colonization), 263 unique entities that have been colonies, and 23 unique colonizers. The start and end dates for each of these colonial relationships were also extracted from these data sources.
The primary data source is the Colonial Contiguity Data, 1816-2016, Version 3.1 dataset provided by the Correlates of War (CoW) Project [Correlates of War Project, 2017]. The dataset provides an extensive historical list of dependency relations between hegemons and their respective dependencies between 1816 and 2016. While the Dynamic Gravity dataset utilizes only those dependencies considered “colonies”, the source dataset contains other types of relationships such as protectorates and occupations. Each record provides the names of the two entities in a dependency relationship, the start and end year of the relationship, and the type of dependency. In total, the CoW data contains 2,697 such records.
In order to accurately include colonies that gained independence from their colonizer prior to 1816, and thus are not listed in the CoW dataset, we incorporate some additional data from WorldStatesmen.org [Cahoon, 2001–2017], Kammen [1996], Johnson [1915], and Weber [1992]. Specifically, these sources were used to identify the colonial histories of Haiti, Morocco, Oman, Paraguay, Tunisia, and the United States.
5.2.2 Variables
For each variable, a country or territory is considered a “colony” if it is defined as a colony (i.e., lacks any sovereignty from its hegemon) but is not an occupied territory in the CoW Entities dataset.25
Given that most of the source data provides a start year and end year for colonial relationships, but not the exact day and month on which the relationship began and ended, all of the variables below were set equal to 1 for a specific year of the panel if the relationship existed at any time during that year, even if only for a day.
colony_of_destination_current: The binary variable colony_of_destination_current denotes whether country_o was a colony of country_d in year t.
colony_of_origin_current: The binary variable colony_of_origin_current denotes whether country_d was a colony of country_o in year t.
colony_of_destination_ever: The binary variable colony_of_destination_ever denotes whether country_o has ever been a colony of country_d.
colony_of_origin_ever: The binary variable colony_of_origin_ever denotes whether country_d has ever been a colony of country_o.
colony_of_destination_after45: The binary variable colony_of_destination_after45 denotes whether country_o was a colony of country_d for at least one year after 1945.
colony_of_origin_after45: The binary variable colony_of_origin_after45 denotes whether country_d was a colony of country_o for at least one year after 1945.
5.2.3 Data Construction
Considerable care was taken to align the 3-letter CoW country codes with ISO 3-alpha codes to line up bilateral country pairs with the rest of our dataset. In many cases, the codes used by the CoW Project did not match those used by the Dynamic Gravity dataset or the International Organization for Standardization, requiring the development of an extensive concordance.
Records for dependency relationships in which the dependent entity is not a colony (e.g., protectorates, possessions, etc.) have been dropped from the CoW Entities dataset. This partly explains why there are 1,142 unique dependencies in the raw, unfiltered data from the CoW Entities dataset, a number that exceeds the 300 total countries in our gravity dataset. Furthermore, entities that are not currently a nation-state or have never been a nation-state (e.g., U.S. states such as Alaska or cities such as Aleppo) were excluded when constructing colonial relationship variables for this gravity dataset from the CoW Entities dataset. Therefore, all records in the CoW Entities dataset that include an entity that is not part of our gravity dataset were excluded.
The list of CoW countries was supplemented by data from WorldStatesmen.org and three books (described in section 5.2.1) for six countries that gained independence prior to 1816 and were not covered by the CoW Project dataset. Although WorldStatesmen.org includes information on protectorates and other forms of dependencies beyond colonies, the only information extracted for this gravity dataset was for entities that were missing from the CoW data. These countries and their hegemons are Tunisia, independent from Turkey in 1591; Morocco, independent from France in 1666; Oman, independent from Portugal in 1741; the United States of America, independent from Great Britain in 1783; Haiti, independent from France in 1804; and Paraguay, independent from Spain in 1811.
Internal Country Observations: We have assumed that countries cannot be their own colonies or hegemons. The colonial variables reflect this assumption by setting each of the variables equal to zero whenever country_o is equal to country_d.
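The six colonial indicators can be derived from a list of colonial relationships with start and end years, as the following sketch shows. It rests on several assumptions that are not stated in the documentation: the tuple layout of the input, the function name colonial_flags, the reading of “after 1945” as an end year later than 1945, and the treatment of the “ever” variables as constant across years. The codes COL and HEG are placeholders.

```python
def colonial_flags(relationships, origin, destination, year):
    """Return the six colonial indicators for one origin-destination-year record.

    relationships: iterable of (colony, colonizer, start_year, end_year)
    tuples; this input layout is hypothetical.
    """
    names = [
        f"colony_of_{side}_{frame}"
        for side in ("destination", "origin")
        for frame in ("current", "ever", "after45")
    ]
    flags = dict.fromkeys(names, 0)
    if origin == destination:
        return flags  # a country cannot be its own colony or hegemon
    for colony, colonizer, start, end in relationships:
        if (colony, colonizer) == (origin, destination):
            side = "destination"  # origin was the colony of the destination
        elif (colony, colonizer) == (destination, origin):
            side = "origin"  # destination was the colony of the origin
        else:
            continue
        flags[f"colony_of_{side}_ever"] = 1
        if start <= year <= end:  # any part of the year counts
            flags[f"colony_of_{side}_current"] = 1
        if end > 1945:  # assumption: relationship lasted beyond 1945
            flags[f"colony_of_{side}_after45"] = 1
    return flags


# Hypothetical relationship: COL was a colony of HEG from 1900 to 1960.
relationships = [("COL", "HEG", 1900, 1960)]
during = colonial_flags(relationships, "COL", "HEG", 1950)
later = colonial_flags(relationships, "HEG", "COL", 1970)
```

Comparing whole years on both ends mirrors the rule that a relationship counts for a year if it existed at any time during that year.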
6 Trade Facilitation Variables
The trade facilitation variables are those that reflect policies put in place by nations for the sake (at least partially) of influencing aspects of international trade. The current data release includes seven variables describing preferential trade agreements between countries and territories and three sets of three variables, each set describing membership in the General Agreement on Tariffs and Trade (GATT), the World Trade Organization (WTO), and the European Union (EU), respectively. These variables, the data sources they were based on, and the details of their construction are discussed in the following subsections.
6.1 Preferential Trade Agreements
6.1.1 Data Sources
The variables describing active preferential trade agreements (agree_pta, agree_pta_goods, agree_pta_services, agree_cu, agree_eia, agree_fta, and agree_psa) are based on the WTO RTA-IS. In some cases, the information provided by the WTO RTA-IS is supplemented with additional sources to better account for changes in agreement membership over time. These cases are detailed below in section 6.1.3 and in appendix E, table E6.
6.1.2 Variables
agree_pta: The variable agree_pta takes a value equal to 1 if country_o and country_d are engaged in a preferential trade agreement of any type within the given year. The WTO defines such an agreement as one that grants more favorable conditions to the agreement members than those faced by other WTO members [WTO User Guide]. Specifically, agree_pta is equivalent to the maximum of agree_pta_goods, agree_pta_services, agree_cu, agree_eia, agree_fta, and agree_psa.
agree_pta_goods: The variable agree_pta_goods takes a value equal to 1 if country_o and country_d are engaged in a preferential trade agreement that covers goods within the given year.
agree_pta_services: The variable agree_pta_services takes a value equal to 1 if country_o and country_d are engaged in a preferential trade agreement that covers services within the given year. Additional information about services provisions in trade agreements can be found in Article V of GATS [General Agreement on Trade in Services (GATS), 1994].28
agree_cu: The variable agree_cu takes a value equal to one if country_o and country_d are engaged in a customs union within the given year. Paragraph 8(a) of Article XXIV of GATT 1994 defines a customs union as a territory in which (i) duties and other regulations of commerce are eliminated for substantially all trade between members and (ii) substantially the same duties and regulations are extended by all members to non-member territories [General Agreement on Tariffs and Trade (GATT 1994), 1994].29
agree_eia: The variable agree_eia takes a value equal to one if country_o and country_d are engaged in an economic integration agreement within the given year. Article V of GATS defines an EIA as an agreement pertaining to services that (a) features substantial sectoral coverage and (b) eliminates substantially all discrimination for these sectors [General Agreement on Trade in Services (GATS), 1994].30
agree_fta: The variable agree_fta takes a value equal to one if country_o and country_d are engaged in a free trade agreement within the given year. The RTA-IS defines a “free trade agreement” using Paragraph 8(b) of Article XXIV of GATT 1994, which defines a “free-trade area”. This definition describes an FTA as a group of customs territories where duties and regulations of commerce for all products originating within those territories are suspended for all members of the agreement [General Agreement on Tariffs and Trade (GATT 1994), 1994].31
agree_psa: The variable agree_psa takes a value equal to one if country_o and country_d are engaged in a partial scope agreement within the given year. A partial scope agreement is defined as one in which only certain products are covered [WTO User Guide].
6.1.3 Variable Construction
Each agreement listed by the WTO was expanded across its listed members based primarily on those listed as “original signatories” to the agreement. For each country pair in which both countries were listed as signatories, the agreement was considered active in a given year so long as the agreement was active and the members were signatories for at least one day during that year. An agreement was considered active based on the listed “Date of entry into force” and, if applicable, the “Inactive date”.
A considerable limitation of the WTO RTA database is that it often lacks clear information about member countries that enter or exit trade agreements after they have been signed and entered into force. In the database, active agreements report member countries at only two points in time: the point at which the agreement was signed (original signatories) and the current period (current signatories). Agreements that have become inactive report only original signatories. Thus, from the dataset alone, it can be difficult to identify changes in member countries over time. In many cases, the addition of new member countries is accounted for through the addition of new trade agreements that reflect accessions, such as the “Central American Common Market (CACM) - Accession of Panama” of 2013 that followed the original “Central American Common Market (CACM)” agreement of 1961. In some other cases, the accession or exit of countries can only be identified by comparing the original and current signatories. Of the 289 active trade agreements recognized by the WTO, 48 featured differences in original and current signatories that needed to be addressed. Because agreements that have become inactive do not report a second list of members, we are unable to identify changes in members for those agreements using the RTA-IS.
For the active agreements in which we are able to identify differences, we used additional sources to determine the years in which countries entered or exited and edited agreements or added accession agreements to capture this variation. A full list of the agreements that required changes to membership can be found in Table E6 in appendix E.
Of the 48 agreements that required special attention, each one was addressed in order to improve its accuracy to the best of our ability. In 31 of these cases, the difference in original and current members was a result of European Union (EU) accessions, requiring only the introduction of new accession agreements that reflect the progressive growth in EU membership. In these cases, new EU members were added to trade agreements in the year in which they entered the EU. Other agreements required additional information to establish the timeline of countries exiting and entering. This process has corrected many of these issues, but it is likely that there still exist situations in which some members are either missing from agreements or included in agreements erroneously.
In addition to these issues, several other special circumstances arose that required additional considerations and special treatment:
Partially Missing Trade Agreements: The WTO lists trade agreements in two ways. The first is through a table of trade agreements that provides some information such as the date of entry into force and the type of coverage. This table does not, however, list signatories. Thus, we must turn to a second collection of “cards” that correspond to each of the trade agreements. The cards provide much of the same information as the aforementioned table but also include information on member countries. The construction of the data required the extraction of member countries from these cards to be combined with the list of trade agreements. In three cases, there are trade agreements listed that do not have corresponding cards.
These agreements are “EU - Colombia and Peru”, “Eurasian Economic Community (EARC)”, and “Russian Federation - Tajikistan”. Because there was no listing of member countries, we have not included these agreements in the data set. Nonetheless, in the first two cases, there do exist similarly scoped trade agreements. Specifically, “EU - Colombia and Peru and Ecuador” and several “Eurasian Economic Union” agreements are included.
Issues With Listed Member Countries: In several cases, the WTO recognizes agreements that do not list member countries or list member countries that did not exist when the card suggests they did. These situations are described below.
The “Borneo Free Trade Area” agreement does not specify any member countries. The original text of the agreement states that it existed between North Borneo and Sarawak between 1962 and 1969. We do not include this agreement in the data because neither member was independent prior to joining Malaysia in 1963 and neither is included in our collection of countries.
The WTO does not appear to list Czechoslovakia as a member country during years in which it existed but does include Slovakia and the Czech Republic prior to their existence. For example, the “EFTA - Czechoslovakia” agreement, which lasted from 1992 to 1993, does not list Czechoslovakia as a member but does list the Czech Republic, despite it not yet existing. To address this, we have classified both Slovakia and the Czech Republic as Czechoslovakia between 1948 and 1992.
Several countries appear to be listed with inappropriate country names for certain periods of time. In each case, we have attempted to correct this where it appears to be appropriate. Between the years 1949 and 1990, we have assumed that “Germany”, as listed by the WTO, is “West Germany”. “Saint Kitts and Nevis” was renamed “Saint Christopher-Nevis-Anguilla” for 1967–1982. “Burkina Faso” was renamed “Upper Volta” for 1960–1983. “Congo, Democratic Republic” was renamed “Zaire” for 1971–1997.
Several agreements list member countries prior to their gaining independence. In these cases, we have not included countries as members until they become independent and appear in our data set.
The “Arab Common Market” agreement lists Yemen as a member prior to the unification of North and South Yemen. The original text of the agreement lists the Yemenite Arab Republic as an original member, so we include North Yemen as a member and exclude South Yemen during the relevant years (1948–1989).
Internal Country Observations: Observations in which the origin and destination country are the same are included in the data set. We have assumed that countries cannot be in a trade agreement with themselves by definition. The trade agreement variables reflect this assumption by setting each of the variables equal to zero whenever country_o is equal to country_d.
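The expansion of agreements into bilateral indicators can be sketched as follows. This is an illustration under simplifying assumptions, not the procedure used to build the dataset: each agreement is a hypothetical dictionary with a fixed member list, an entry year, an optional inactive year, and the set of component variables it switches on. The names pta_flags and active_in_year and the country codes are invented for the example, and membership changes over time are ignored.

```python
from itertools import permutations

COMPONENTS = (
    "agree_pta_goods", "agree_pta_services", "agree_cu",
    "agree_eia", "agree_fta", "agree_psa",
)


def active_in_year(entry_year, inactive_year, year):
    """An agreement counts for any year in which it was in force for at least a day."""
    return entry_year <= year and (inactive_year is None or year <= inactive_year)


def pta_flags(agreements, year):
    """Map each ordered member pair to its agreement indicators for one year.

    Pairs absent from the result, including internal pairs where origin
    equals destination, implicitly have every indicator equal to zero.
    """
    flags = {}
    for agreement in agreements:
        if not active_in_year(agreement["entry_year"], agreement["inactive_year"], year):
            continue
        # permutations yields both directions and never pairs a country with itself
        for pair in permutations(sorted(agreement["members"]), 2):
            row = flags.setdefault(pair, dict.fromkeys(COMPONENTS, 0))
            for name in agreement["types"]:
                row[name] = 1
    for row in flags.values():
        row["agree_pta"] = max(row[name] for name in COMPONENTS)
    return flags


# Hypothetical agreements among placeholder countries.
agreements = [
    {"members": {"AAA", "BBB", "CCC"}, "entry_year": 1995, "inactive_year": None,
     "types": {"agree_fta", "agree_pta_goods"}},
    {"members": {"AAA", "BBB"}, "entry_year": 1960, "inactive_year": 1970,
     "types": {"agree_psa", "agree_pta_goods"}},
]
flags_2000 = pta_flags(agreements, 2000)
flags_1965 = pta_flags(agreements, 1965)
```

Taking the maximum over the six component indicators reproduces the stated definition of agree_pta.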
6.2 EU, WTO, & GATT Membership
6.2.1 Data Sources
Data describing membership in the European Union was based on information made available directly by the European Union. The European Union provides a list of the 28 members and the dates on which they joined the Union between 1958 and 2016.32 Data describing membership in the World Trade Organization and the General Agreement on Tariffs and Trade was based on information made available by the World Trade Organization.33 For both groups, the WTO provides a comprehensive list of member countries and their date of membership [The World Trade Organization, a].34 Table E7 in appendix E lists the members and accession dates of the GATT and WTO, respectively.
6.2.2 Variables
member_eu_o: The variable member_eu_o takes the value one if country_o is a member of the European Union in the given year.
member_eu_d: The variable member_eu_d takes the value one if country_d is a member of the European Union in the given year.
member_wto_o: The variable member_wto_o takes the value one if country_o is a member of the World Trade Organization in the given year.
member_wto_d: The variable member_wto_d takes the value one if country_d is a member of the World Trade Organization in the given year.
member_gatt_o: The variable member_gatt_o takes the value one if country_o is a member of the General Agreement on Tariffs and Trade in the given year.
member_gatt_d: The variable member_gatt_d takes the value one if country_d is a member of the General Agreement on Tariffs and Trade in the given year.
member_eu_joint: The variable member_eu_joint takes the value one if both country_o and country_d are members of the European Union in the given year.
member_wto_joint: The variable member_wto_joint takes the value one if both country_o and country_d are members of the World Trade Organization in the given year.
member_gatt_joint: The variable member_gatt_joint takes the value one if both country_o and country_d are members of the General Agreement on Tariffs and Trade in the given year.
Some summary information for these variables is depicted in figures 5 and 6.
6.2.3 Variable Construction
Using the sources described above, countries were assigned membership in each of these groups based on the dates on which they joined. Countries are considered members in a given year so long as they belonged to the organization for at least one day during that year.
Internal Country Observations: For observations in which the origin and destination countries are the same, the “joint” variables member_eu_joint, member_wto_joint, and member_gatt_joint are assumed to take the value of 1 if the country is a member of the respective organization and 0 otherwise. In doing so, we have implicitly assumed that internal, domestic trade qualifies as within-organization trade.
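The membership rules above reduce to a comparison of years, as this sketch shows. The input (a hypothetical dictionary from country code to joining year), the function names, and the placeholder codes are assumptions made for the example, and exits from an organization are not modeled.

```python
def is_member(join_year, year):
    """Return 1 if the country had joined by the given year, else 0.

    Comparing whole years means a country that joined on any day of a
    year counts as a member for that entire year.
    """
    return int(join_year is not None and join_year <= year)


def joint_membership(join_years, origin, destination, year):
    """Return 1 only if both origin and destination are members in the year.

    For internal observations (origin == destination) this reduces to the
    country's own membership status.
    """
    return (is_member(join_years.get(origin), year)
            * is_member(join_years.get(destination), year))


# Hypothetical joining years for placeholder countries; CCC never joins.
join_years = {"AAA": 1958, "BBB": 1986}
before = joint_membership(join_years, "AAA", "BBB", 1980)
after = joint_membership(join_years, "AAA", "BBB", 1986)
internal = joint_membership(join_years, "AAA", "AAA", 1960)
```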
7 Measures of Institutional Stability
The institutional stability variables are those that measure various events a country may be involved in or characteristics of a country that may influence its propensity or desire to conduct international trade with a given trading partner due to different aspects of the country’s stability. The current data release includes three sets of variables, each describing a different form of institutional stability in either a bilateral (hostility level towards another country or sanctions with another country) or unilateral (level of political stability) manner. These variables, the data sources they were based on, and the details of their construction are discussed in the following subsections.
7.1 Hostility
7.1.1 Data Sources
Source data for the variables reflecting hostility (hostility_level) come from the Correlates of War (CoW) Project’s Militarized Interstate Disputes dataset (MIDB v4.01) [Palmer et al., 2015].35 The dataset is organized by dispute, with each observation identifying a country that participated in a particular dispute. A variable denoting the “side” of the conflict on which a given participant stood indicates that country’s allies and enemies in the conflict. From that, we are able to derive bilateral lists of country pairs exhibiting some level of conflict within a given year as well as the level of hostility of each conflict.
7.1.2 Variableshostility_level_o: The variable hostility_level_o is coded 1–5 and denotes the level of hostility of country_o towards country_d in year t.36
hostility_level_d: The variable hostility_level_d is coded 1–5 and denotes the level of hostility of country_d towards country_o in year t.
Figure ?? shows the number of conflicts by hostility_level by year. 7.1.3 Variable ConstructionData was constructed by expanding the information provided by the source data into a bilateral format over time to fit the structure of the Dynamic Gravity dataset. Care was then given to properly concord the country-identifying codes used by the CoW Project to those used in the Dynamic Gravity dataset. Some pairs of countries feature more than one conflict in a given year. In these cases, hostility_level takes maximum of the levels of hostility a country has towards another country in that given year. For example, Israel has three distinct conflicts with Syria in 1950; two of which are coded as having a hostility level of 4 while the other reflects a hostility level of 1. Therefore, hostility_level_o = 4 for the observation where Israel is the origin country, Syria is the destination country, and the year is 1950. It is worth noting that hostility_level is not necessarily symmetrical with respect to a country
pair, reflecting the possibility that the level of hostility one country extends towards another in a
given year is not the same is it experiences from that country. For example, in 1948 the CoW data
observes the hostility level of France towards Thailand as a 4, while the hostility level of Thailand
towards France is a 1. Assigning Conflicts to Countries: Considerable care was taken to align CoW country codes with ISO3 codes to line up bilateral country pairs with the rest of our dataset, and then various observations needed additional special attention.37 In total, 122 countries had CoW and ISO codes that differed. In these cases manual changes needed to be made to the CoW data for merging purposes. Additionally, since the CoW data is intended to list participants in a given dispute, it often
lists participants that do not immediately concord to a country. For example, participants
can take part in a conflict before or after their country exists (e.g. civil wars), or the
country can change identity during conflicts (USSR conflicts post 1992). For example,
North Vietnam and South Vietnam were separately “at conflict” with various other
members of the Vietnam War even after the country officially reunited as Vietnam in
the CoW data. Similarly, there were still conflicts involving the Soviet Union after
it dissolved. For instances where a participant in a conflict logically concorded to a
different country that we observe in that particular year due to changes in the geopolitical
landscape, we have accounted for these and assigned them appropriately. However, for conflicts
where a participant did not logically concord to a country that we observe as existing in
a given year, we chose to exclude those participant-year pairs from our
dataset.
Internal Country Observations: The CoW MIDB dataset does not include civil wars, explicitly describing the disputes it catalogs as “historical cases of conflict in which the threat, display or use of military force short of war by one member state is explicitly directed towards the government, official representatives, official forces, property, or territory of another state.”38 As a result, we do not observe civil wars, domestic unrest, or other such situations and set internal hostility levels equal to zero.
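The maximum rule described in section 7.1.3 can be illustrated with a minimal sketch. The record layout (origin, destination, year, level) and the function name are assumptions made for illustration and are not taken from the CoW files; the values mirror the Israel/Syria and France/Thailand examples in the text.

```python
# Hypothetical dispute-level records: (origin, destination, year, hostility level).
# Each row is the level of hostility the origin directs towards the destination.
disputes = [
    ("ISR", "SYR", 1950, 4),
    ("ISR", "SYR", 1950, 4),
    ("ISR", "SYR", 1950, 1),
    ("FRA", "THA", 1948, 4),
    ("THA", "FRA", 1948, 1),
]


def max_hostility(records):
    """Collapse dispute-level rows to one value per directed pair and year.

    When a directed pair has several disputes in a year, the highest
    hostility level is kept.
    """
    levels = {}
    for origin, destination, year, level in records:
        key = (origin, destination, year)
        levels[key] = max(levels.get(key, level), level)
    return levels


hostility_level = max_hostility(disputes)
print(hostility_level[("ISR", "SYR", 1950)])
print(hostility_level[("FRA", "THA", 1948)], hostility_level[("THA", "FRA", 1948)])
```

Because the key is directional, the France/Thailand pair keeps two different values in 1948, which is the asymmetry noted above.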
7.2 Polity
7.2.1 Data Sources
The variables describing political stability (polity and polity_absolute) are based on data from the Polity IV Project [Marshall et al., 2016]. The Polity data scores countries annually on an ordinal scale ranging from -10 to 10 denoting their level of democracy or autocracy.39 A score of -10 denotes a country that is strongly autocratic, a 10 denotes a country that is strongly democratic, and a 0 denotes a country neither autocratic nor democratic (i.e. lacking in government), with intermediate values describing situations within that spectrum.40 In total, the current dataset includes 194 countries, although not all countries exist each year. For example, countries like Germany, Japan, and East Germany are all omitted in certain years, typically in the years following conflicts or changes in the state (e.g. West Germany and East Germany unifying). In their most recent release, Polity scores cover 1948 to 2016. Appendix tables F8 and F9 list countries that exist in our dataset but are not tracked by the Polity IV project for any year, and countries that are tracked in some years but not the entire span of our data.
7.2.2 Variables
polity_o: The variable polity_o is the polity score in year t of country_o.
polity_d: The variable polity_d is the polity score in year t of country_d.
polity_absolute_o: The variable polity_absolute_o is the absolute value of the polity score in year t of country_o.
polity_absolute_d: The variable polity_absolute_d is the absolute value of the polity score in year t of country_d.
7.2.3 Variable Construction
In addition to reporting the scores published by the Polity Project, we also provide an alternative measure (polity_absolute) that reflects a slightly different interpretation of the information. A considerable limitation of the way in which Polity indexes stability is the lack of a clear understanding of the relative impact of moving from one value to the next (e.g. moving from -10 to 1 means that the country is now more democratic but less stable). In light of this, we have also provided a second measure, polity_absolute, that is the absolute value of these scores. The motivation for this measure is the notion that a higher value of this measure corresponds to a higher level of stability within a country regardless of its form of governance.
There are also a few instances in the data where we observe a country twice within a single year. These cases are due to Polity identifying the country under multiple different regimes within a year that concord to the same country in the Dynamic Gravity dataset. In these situations, we had to make special decisions about which of the two polity scores to assign. For example, the Polity Project assigns polity scores to Sudan, South Sudan, and North Sudan in 2011, while we only observe Sudan from 1956-2016 and South Sudan from 2011-2016. We therefore used North Sudan’s polity scores for “Sudan” from 2011 onwards, coinciding with the birth of South Sudan. Similar issues arise with Ethiopia in 1993, West Germany/Germany in 1990, and Yugoslavia/Serbia and Montenegro in 1991.
The Polity Project group has both increased its country coverage over time and updated past
polity scores to reflect changes in their institutions. In general, these updates are indicative of
countries trending towards the “strongly democratic” end of the spectrum. Table 2 shows the spread
of polity and polity_absolute scores across several years where the Polity Project made substantial
updates.41
As shown in the table, countries appear to be becoming more stable—and more democratically
stable in particular—over time.
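As a small illustration of how polity_absolute relates to polity, the sketch below applies the absolute-value rule from this section. The function name and the use of None for country-years without a score are assumptions of the sketch, not part of the dataset's published tooling.

```python
def polity_absolute(polity):
    """Return the absolute value of a polity score.

    None stands in for a country-year with no Polity score and is passed
    through unchanged (an assumption of this sketch).
    """
    if polity is None:
        return None
    return abs(polity)


# A move from -10 to 1 raises polity but lowers polity_absolute.
before, after = -10, 1
change_in_polity = after - before
change_in_absolute = polity_absolute(after) - polity_absolute(before)
print(change_in_polity, change_in_absolute)
```

The two changes have opposite signs, which is the situation the text describes: the country is more democratic but, under this interpretation, less stable.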
As was the case with the hostility_level variables, the Polity Project uses CoW codes for its country classification system, so the same careful considerations had to be taken here as noted previously. Along with aligning differences in labeling conventions, if a polity score logically belonged to a country we observe under a different name in a given year, those adjustments have been made. In the case that a country received a polity score before we observe its existence, its scores in those years have not been included.
7.3 Economic Sanctions
7.3.1 Data Sources
There are four variables describing economic sanctions directed towards partners. These variables are derived from the Threat and Imposition of Economic Sanctions (TIES) dataset [Morgan et al., 2014]. The TIES dataset records economic sanctions imposed between 1945 and 2005 and provides end dates for these sanctions through 2012 (when the dataset was last updated). For sanctions still in place in 2012, it provides an “as of” date for the last time the sanction was verified to remain in place. We used these “as of” dates as the end year for sanctions for which TIES does not provide a confirmed end date.
7.3.2 Variables
sanction_threat: The binary variable sanction_threat denotes whether or not a threat to impose any sort of sanction existed between country_o and country_d in year t.
sanction_threat_trade: The binary variable sanction_threat_trade denotes whether or not a threat to impose a trade sanction existed between country_o and country_d in year t.
sanction_imposition: The binary variable sanction_imposition denotes whether or not any sort of sanction existed between country_o and country_d in year t.
sanction_imposition_trade: The binary variable sanction_imposition_trade denotes whether or not any sort of trade sanction existed between country_o and country_d in year t.
7.3.3 Variable Construction
The variables sanction_threat and sanction_imposition correspond to dummy variables in the
TIES dataset and indicate whether or not a pair of countries has a sanction threatened or in place
between them in a given year. The trade specific sanction variables, sanction_threat_trade
and sanction_imposition_trade, are a refinement of this information, taking a value
of one if the TIES variables for the type of sanction threatened and/or imposed
correspond to trade specific policy decisions. Specifically, we coded sanction_threat_trade = 1
if the “sanction type threatened” variable in TIES was 2, 3, 4, 5, 6, 9, or 10. For
sanction_imposition_trade, we coded a value of 1 if “sanction type” in TIES took any of the
values 1, 2, 3, 4, 5, 8, or 9 in the source data.42
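The coding rule for the two trade specific variables can be sketched as follows. The code sets are the values listed in the text; the function names, and the assumption that a case's type codes arrive as an iterable of integers, are hypothetical and do not describe how the TIES file itself stores them.

```python
# Type codes treated as trade specific, as listed in section 7.3.3.
TRADE_THREAT_TYPES = frozenset({2, 3, 4, 5, 6, 9, 10})
TRADE_IMPOSITION_TYPES = frozenset({1, 2, 3, 4, 5, 8, 9})


def sanction_threat_trade(threat_types):
    """Return 1 if any threatened sanction type is trade specific, else 0."""
    return int(any(code in TRADE_THREAT_TYPES for code in threat_types))


def sanction_imposition_trade(imposed_types):
    """Return 1 if any imposed sanction type is trade specific, else 0."""
    return int(any(code in TRADE_IMPOSITION_TYPES for code in imposed_types))


print(sanction_threat_trade([1]), sanction_threat_trade([1, 9]))
print(sanction_imposition_trade([6]), sanction_imposition_trade([8]))
```

Note that the two code lists differ, so the same numeric code can be trade specific for a threat but not for an imposition (for example, code 6).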
Figure 8 demonstrates the frequency of sanction threats and impositions over time. It’s
worth noting here that the threat of sanction and imposition of sanction variables are
constructed independently of one another and that threat is not a requirement for eventual
imposition. In total, the TIES dataset has 1,412 unique cases of a sanction either being
threatened or imposed (or both). In 567 cases a sanction was only threatened, in 359 cases a
sanction was only imposed, and in 486 cases a sanction was both threatened and imposed.
Assigning Sanctions to Countries: Careful attention had to be paid to lining up countries in TIES with countries in the Dynamic Gravity dataset due to differences in the naming conventions between the two. Conveniently, the TIES dataset identifies countries using the same three-digit code as the Correlates of War datasets. This permitted us to use the same concordance developed and described in section 7.1.3. Many sanctions in the TIES database feature not only a principal sender or senders, but also sanctioning institutions. The Dynamic Gravity dataset currently only pairs the senders specifically observed in the TIES dataset to the target of a given instance of economic sanctions.43 TIES includes three instances in which a sender sanctions a single country twice within the same year. In these cases, we assigned the maximum, as we did with hostility_level above. Additionally, there were 38 instances in which a country involved in a sanction in a given year
did not exist in the Dynamic Gravity dataset. An example of this is the United States
sanctioning East Germany in 1948. In these cases, the observations were not included.
Internal Country Observations: The source data does not report sanctions threatened or imposed by parties on themselves for the countries we have identified in the Dynamic Gravity dataset. Therefore, all sanctions variables for observations where country_o is equal to country_d are set to zero.
References
The World Factbook 2017. Central Intelligence Agency, Washington, DC, 2017.
Thomas Brinkhoff. City Population. URL https://www.citypopulation.de. (Last accessed on February 7, 2018).
Ben M. Cahoon. World Statesmen, 2001–2017. URL https://www.worldstatesmen.org/COLONIES.html. (Last accessed on February 7, 2018).
CBS—Statistics Netherlands. Bevolkingsontwikkeling Caribisch Nederland; geboorte, sterfte, migratie, 2012.
Central Intelligence Agency. Cocos (Keeling) Islands. In The World Factbook. 2012.
Robert G. Chamberlain. Q5.1: What is the best way to calculate the great circle distance between 2 points? Geographic Information Systems FAQ, October 1996. URL https://www.faqs.org/faqs/geography/infosystems-faq/. (Last accessed on February 7, 2018).
William A. Cleveland, editor. Britannica Atlas. Encyclopaedia Britannica, Inc, Chicago, 1994.
Correlates of War Project. Colonial Contiguity Data, 1816–2016. Version 3.1. 2017.
Department of Economic and Social Affairs, Statistics Division. World Statistics Pocketbook 2016 edition. United Nations, 2016.
European Union. Countries. URL https://europa.eu/european-union/about-eu/countries_en. (Last accessed on February 7, 2018).
Robert C. Feenstra, Robert Inklaar, and Marcel P. Timmer. The Next Generation of the Penn World Table. American Economic Review, 105(10):3150–3182, 2015. Available for download at www.ggdc.net/pwt.
French National Institute of Statistics and Economic Studies. Base communale des aires urbaines 2012, 2012.
General Agreement on Tariffs and Trade (GATT 1994). 1867 U.N.T.S. 187, 1994. URL https://www.wto.org/english/tratop_e/region_e/regatt_e.htm. (Last accessed on February 8, 2018).
General Agreement on Trade in Services (GATS). 1869 U.N.T.S. 183, 1994. URL https://www.wto.org/english/docs_e/legal_e/26-gats_01_e.htm. (Last accessed on February 8, 2018).
Geohack. URL https://tools.wmflabs.org/geohack/.
Google LLC. Google Maps, 2017. URL https://www.google.com/maps. (Last accessed on February 7, 2018).
Tamara Gurevich and Peter Herman. The Dynamic Gravity Dataset: 1948–2016, 2018. USITC Working Paper 2018–02–A.
Hammond Incorporated. Hammond’s atlas of the world. C.S. Hammond & Company, New York, 1948.
Hammond Incorporated. Hammond’s atlas of the world. Hammond Incorporated, Maplewood, New Jersey, 1972.
K. Head, T. Mayer, and J. Ries. The erosion of colonial trade linkages after independence. Journal of International Economics, 81(1):1–14, 2010.
Keith Head and Thierry Mayer. Illusory Border Effects: Distance Mismeasurement Inflates Estimates of Home Bias in Trade. CEPII, 2002.
Amandus Johnson. The Swedes on the Delaware, 1638–1664. Swedish Colonial Society, Philadelphia, 1915.
Michael G. Kammen. Colonial New York: A History. Oxford University Press on Demand, New York, 1996.
Michael R Kenwick, Matthew Lane, Benjamin Ostick, and Glenn Palmer. Codebook for the Militarized Interstate Dispute Data, Version 4.0. December 2013.
Jan J. Lehmeyer. Population Statistics, 1999–2006. URL www.populstat.info. (Last accessed on February 7, 2018).
Monty G. Marshall, Ted Robert Gurr, and Keith Jaggers. Polity IV Project Dataset User’s Manual, v.2016. Polity IV Project, 2016.
Thierry Mayer and Soledad Zignago. Market Access in Global and Regional Trade. CEPII Working Paper No 2005–02, January 2005.
Thierry Mayer and Soledad Zignago. Notes on CEPII’s Distances Measures: The GeoDist Database. CEPII Working Paper No. 2011–25, December 2011.
T. Clifton Morgan, Navin Bapat, and Yoshi Kobayashi. The Threat and Imposition of Economic Sanctions 1945–2005: Updating the TIES dataset. Conflict Management and Peace Science, 31(5):541–558, 2014.
National Geographic Society. National Geographic atlas of the world. National Geographic Society, 6th edition, 1996.
National Geographic Society. Atlas of the world. National Geographic Society, Washington, D.C., 10th edition, 2014.
Palestinian Central Bureau of Statistics. Estimated Population in the Palestinian Territory Mid-Year by Governorate, 1997–2016. URL https://pcbs.gov.ps/Portals/_Rainbow/Documents/gover_e.htm. (Last accessed on February 7, 2018).
Glenn Palmer, Vito D’Orazio, Michael Kenwick, and Matthew Lane. The MID4 Dataset 2002–2010: Procedures, Coding Rules, and Description. Conflict Management and Peace Science, 32(2):222–242, 2015.
M L Pinkovskiy and X Sala-i Martin. Lights, camera, ... income! Illuminating the national accounts-household surveys debate. Quarterly Journal of Economics, 131(2):579–631, 2016a.
M L Pinkovskiy and X Sala-i Martin. Newer need not be better: Evaluating the Penn World Tables and the World Development Indicators using night-time lights. 2016b.
Pitcairn Islands Study Center, Pacific Union College. Census Data. URL https://library.puc.edu/pitcairn/pitcairn/census.shtml. (Last accessed on February 7, 2018).
Rand McNally. Rand McNally illustrated world atlas. Rand McNally and Company, Chicago, 1973.
Rand McNally. Picture atlas of the world. Rand McNally, Skokie, IL, 1995.
Republic of Kiribati Ministry of Home Affairs. Republic of Kiribati: Report of the 1978 Census of Population and Housing, 1980.
Simplemaps.com. World Cities Database (Basic), 2015. URL https://simplemaps.com/data/world-cities. (Last accessed on February 7, 2018).
St Helena Statistics Office. St Helena 2016 Population and Housing Census: Summary Report, 2016.
States of Jersey Statistics Unit. Jersey Census 2011. 2011.
Douglas Stinnett, Jaroslav Tir, Philip Schafer, Paul Diehl, and Charles Gochman. The Correlates of War Project Direct Contiguity Data, Version 3. Conflict Management and Peace Science, 19(2):59–67, 2002.
The World Bank, 2016. URL https://data.worldbank.org/data-catalog/world-development-indicators.
The World Trade Organization. The 128 countries that had signed GATT by 1994, a.
The World Trade Organization. The GATT years: from Havana to Marrakesh, b. URL https://www.wto.org/english/thewto_e/whatis_e/tif_e/fact4_e.htm. (Last accessed on February 8, 2018).
The World Trade Organization. Members and Observers, c.
The World Trade Organization. Regional Trade Agreements Information System (RTA-IS); User Guide, d. URL https://rtais.wto.org/UserGuide/RTAIS_USER_GUIDE_EN.html#_Toc201649641. (Last accessed on February 7, 2018).
The World Trade Organization. Regional Trade Agreements Information System (RTA-IS), e. URL rtais.wto.org. (Last accessed on February 7, 2018).
Tokelau National Statistics Office. 2006 Tokelau Census of Population and Dwellings: 2006 Census of Tokelau Analytical Report, 2006. Produced by Statistics New Zealand and the Office of the Council for the Ongoing Government of Tokelau.
United Nations, Department of Economic and Social Affairs, Population Division. World urbanization prospects: The 2014 revision cd-rom edition. 2014.
U.S. Census Bureau. 1980 Census of Population and Housing. Washington, DC, 1983.
U.S. Census Bureau. 2000 Census of Population and Housing, Population and Housing Unit Counts, United States Summary. Washington, DC, 2004.
David J. Weber. The Spanish Frontier in North America. Yale University Press, 1992.
Eric W. Weisstein. Great Circle. In MathWorld - A Wolfram Web Resource. 2017.
Appendices
A List of Variables
B List of Unique dynamic_code’s
C Matching Dynamic Gravity to Comtrade and WITS
D List Supplemental Cities for Geographic Variables
E List of Trade Agreements and International Organization Members
F List of Countries Without Polity Scores
∗ We thank Renato Barreda, Fernando Gracia, Nuhami Mandefro, and Richard Nugent for research assistance in completion of this project.
1 Throughout this documentation, we often abbreviate “countries and territories” with “countries” when referring to a record for the sake of readability.
2 The dataset is available for download at gravity.usitc.gov. For comparisons of this dataset with other existing gravity datasets, see Gurevich and Herman [2018].
3 For example, country_o and country_d or iso3_o and iso3_d.
4 For details about ISO alpha-3 codes see https://www.iso.org/home.html. When matching this dataset with trade or other data, the user should use a combination of iso3_o, iso3_d, and year to uniquely identify each data row. Appendix table C3 provides a concordance between the Dynamic Gravity dataset, the Comtrade dataset, and the WITS dataset for countries where those identifiers differ.
5 WTO data can be accessed at https://stat.wto.org/Home/WSDBHome.aspx?Language=E; UN Comtrade data are available at https://comtrade.un.org/data/.
6 For the countries that currently exist, we used https://www.cia.gov/library/publications/the-world-factbook/fields/2088.html. For those that dissolved or changed names between 1948 and 2016, we used the CIA World Factbook website to search for dissolution dates and dates of name changes.
7 See section 4.2 for notes on special treatment.
8 Western Sahara and Morocco are in dispute over independence. See section 4.2 for a note on special treatment.
9 More detailed information about derivations of capital stock can be obtained from the PWT documentation at https://www.rug.nl/ggdc/docs/user_guide_to_pwt90_data_files.pdf
10 For a more detailed discussion of differences between the PWT and the WDI measures see Pinkovskiy and Sala-i Martin [2016a] and Pinkovskiy and Sala-i Martin [2016b].
12 Palestinian Central Bureau of Statistics, Tokelau National Statistics Office [2006], CIA [2017], St Helena Statistics Office [2016], Pitcairn Islands Study Center, Pacific Union College, Department of Economic and Social Affairs, Statistics Division [2016], U.S. Census Bureau [2004], States of Jersey Statistics Unit [2011], Central Intelligence Agency [2012], French National Institute of Statistics and Economic Studies [2012], Brinkhoff, CBS—Statistics Netherlands [2012].
13 These countries (and their former names) are: Sri Lanka (Ceylon), Vanuatu (New Hebrides), the Democratic Republic of the Congo (Zaire), Myanmar (Burma), Benin (Dahomey), Burkina Faso (Upper Volta), and Zimbabwe (Rhodesia). The last country, Romania, did not change its name, but changed its ISO code from ROM to ROU in 2002.
14 The U.S. Miscellaneous Pacific Islands, Bouvet Island, Heard and McDonald Islands, and the Neutral Zone between Iraq and Saudi Arabia have no permanent residents, but we assumed a population of 1.
15 This approach does not take topography into consideration.
16 The first edition used is Hammond’s, 1948; the last edition used is Hammond’s, 1972.
17 The first edition used is Rand McNally, 1973; the last edition used is Rand McNally, 1995.
18 The first edition used is National Geographic Society, 1996; the last edition used is National Geographic Society, 2014.
19 See the section on variable construction for a more thorough definition.
20 British Honduras was renamed and gained independence as Belize in 1981.
21 https://serbianna.com/blogs/michaletos/archives/633
22 https://www.cia.gov/library/publications/the-world-factbook/docs/notesanddefs.html accessed on Dec-20-2017
23 The CIA World Factbook is available at https://www.cia.gov/library/publications/the-world-factbook/.
24 Full list of these languages by country is available at https://www.cia.gov/library/publications/the-world-factbook/fields/2098.html
25 We have also included several known colonial relationships predating the coverage of the CoW data. Please see the preceding subsection for details.
27 See WTO User Guide for more complete documentation on the RTA-IS.
28 https://www.wto.org/english/docs_e/legal_e/26-gats_01_e.htm
29 GATT Article XXIV - https://www.wto.org/english/tratop_e/region_e/regatt_e.htm
32 https://europa.eu/european-union/about-eu/countries_en
33 The GATT was an agreement signed in 1948 and later replaced with the formation of the WTO in 1995 that governed international trade among its members. The agreement featured 23 original members and grew to 128 by 1994 [The World Trade Organization, b,a].
34 https://www.wto.org/english/thewto_e/whatis_e/tif_e/org6_e.htm# and https://www.wto.org/english/thewto_e/gattmem_e.htm
35 For more information, the MID data, or the MID codebooks, see: https://www.correlatesofwar.org/data-sets/MIDs.
36 MIDB data aggregates another variable, HiAct, to create their HostLev variable. For more information, see MID v4.0 codebook here: https://www.correlatesofwar.org/data-sets/MIDs.
37 For the complete list of codes see the CoW country codes dataset here: https://www.correlatesofwar.org/data-sets/cow-country-codes.
38 For more information see: https://www.correlatesofwar.org/data-sets/MIDs.
39 For more information on the Polity Project, see: https://www.systemicpeace.org/polityproject.html. Worth noting is that a Polity V version of the dataset is currently in development as of the time of writing.
40 For more information on polity score coding, see: https://www.systemicpeace.org/inscr/p4manualv2016.pdf
41 For more information on specific country coverage in a certain year, see: https://www.systemicpeace.org/polityproject.html.
42 For more information on TIES sanction types see: https://www.unc.edu/~bapat/TIES.htm.
43 The variable institutionid in the TIES database lists any international institution involved in a given international sanction using a CoW institution code. We intend to incorporate the members of these various international institutions on a time-variant basis in a future update. For the complete list of codes see the CoW intergovernmental organizations dataset here: https://www.correlatesofwar.org/data-sets/IGOs.