
Construction

This section describes the principal steps involved in the construction of the ITPD-E and discusses the database's key features and dimensions. These steps are common to all four broad sectors included in the ITPD-E. Further detail on the data sources for each sector, and on any sector-specific steps taken to construct the data, is provided in the accompanying paper.

Construction of the International Trade Data

Each international trade flow is recorded and reported separately by the two parties to the transaction: the exporter and the importer. To take full advantage of all reported international trade data, we use the mirroring procedure described below. In goods trade, import data are considered more reliable than export data because governments oversee imports in order to enforce their tariff schedules and other import regulations.[1] In addition, importer-reported trade values, which are reported on a c.i.f. basis, are consistent with the theoretical gravity framework. Therefore, we primarily use importer-reported values for goods trade, as is also done, for example, by Feenstra et al. (2005).

Because no tariffs are levied on services trade, importing countries lack a fiscal incentive to track services imports carefully. Services export data, in turn, are often collected through mandatory surveys run by national statistical agencies and/or central banks, and are therefore considered more accurate than import data. For these reasons, and because there is no c.i.f./f.o.b. distinction in services trade, we primarily use exporter-reported values for services trade.

In our mirroring procedure for goods, we use exports reported by partner countries to fill in missing import values. For services, we use reported imports to fill in missing export values. Mirrored observations are identified in the ITPD-E by a flag variable named flag_mirror. In Section 5 we demonstrate that the additional observations obtained through mirroring do not affect the estimated coefficients of standard gravity variables. Nevertheless, including them is potentially important for ensuring proper country and industry coverage.
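The mirroring rule can be illustrated with a minimal Python sketch. It is not the code used to build the ITPD-E and rests on illustrative assumptions: flows for a single industry are held in plain dictionaries keyed by a hypothetical (exporter, importer, year) tuple, both reports are in the same currency unit, and flag_mirror is represented as a Python boolean, which may differ from its coding in the released data. The sketch also ignores the c.i.f./f.o.b. valuation gap between the two reports of a goods flow.

```python
from typing import Dict, Tuple

# Hypothetical key for one industry: (exporter, importer, year).
Key = Tuple[str, str, int]


def mirror(primary: Dict[Key, float], partner: Dict[Key, float]) -> Dict[Key, Tuple[float, bool]]:
    """Combine two reports of the same bilateral flows.

    primary: values from the preferred reporter (importers for goods,
             exporters for services).
    partner: values reported by the other side of the transaction.
    Returns {key: (value, flag_mirror)}; flag_mirror is True when the
    value had to be taken from the partner's report.
    """
    combined: Dict[Key, Tuple[float, bool]] = {}
    for key in primary.keys() | partner.keys():
        if key in primary:
            combined[key] = (primary[key], False)  # preferred report available
        else:
            combined[key] = (partner[key], True)   # gap filled by mirroring
    return combined


# Goods: importer reports are preferred, exporter reports fill the gaps.
goods_import_reports = {("DEU", "FRA", 2015): 100.0}
goods_export_reports = {("DEU", "FRA", 2015): 97.0, ("USA", "CAN", 2015): 50.0}
goods = mirror(goods_import_reports, goods_export_reports)
```

For services the same function would be called with the roles swapped, passing exporter reports as `primary` and importer reports as `partner`.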

Construction of the Domestic Trade Data

Domestic trade is calculated as the difference between the (gross) values of total production and total exports. Total exports are constructed as the sum of bilateral trade, as reported in the ITPD-E, for each exporting country. In the relatively few instances in which our procedure results in negative domestic trade values, we delete those observations from the database. The sources of output and trade data are described in the accompanying paper.
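For a single industry, the domestic-trade calculation might be sketched as follows. The function name, the dictionary layout, and the country codes are hypothetical; output and trade are assumed to be measured in the same units, and total exports are summed over international flows only.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical containers for one industry.
Production = Dict[Tuple[str, int], float]      # (country, year) -> gross output
Bilateral = Dict[Tuple[str, str, int], float]  # (exporter, importer, year) -> trade


def domestic_trade(production: Production, bilateral: Bilateral) -> Dict[Tuple[str, int], float]:
    """Domestic trade = gross production minus total exports.

    Country-years where the difference is negative are dropped.
    """
    total_exports: Dict[Tuple[str, int], float] = defaultdict(float)
    for (exporter, importer, year), value in bilateral.items():
        if exporter != importer:  # sum international flows only
            total_exports[(exporter, year)] += value

    domestic: Dict[Tuple[str, int], float] = {}
    for (country, year), output in production.items():
        value = output - total_exports[(country, year)]
        if value >= 0:  # negative domestic trade is deleted
            domestic[(country, year)] = value
    return domestic


production = {("AAA", 2010): 1000.0, ("BBB", 2010): 200.0}
bilateral = {
    ("AAA", "BBB", 2010): 300.0,
    ("AAA", "CCC", 2010): 100.0,
    ("BBB", "AAA", 2010): 250.0,
}
domestic = domestic_trade(production, bilateral)
```

In this example AAA keeps 1000 - (300 + 100) = 600 as domestic trade, while BBB's exports (250) exceed its output (200), so its negative domestic value is dropped.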

Final Procedures

We combine the domestic and international trade flows for each of the 170 ITPD-E industries into a single database. Then, we create a balanced database across all dimensions of the ITPD-E by filling all missing international trade observations with zeroes. In order to distinguish between the trade zeroes that exist in the original raw data and the new zeroes that are added to balance the data, we create a flag variable called flag_zero, which is equal to ‘r’ for observations with zeroes coming from original data sources, ‘p’ for observations with positive trade flows, and ‘u’ for observations filled with zeroes in this step. We do this for international trade observations only, not for domestic trade observations.
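The balancing step and the three flag_zero codes can be sketched for one industry as below. The function name and data layout are hypothetical, reported values are assumed to be non-negative, and domestic pairs (exporter equal to importer) are skipped because the flag applies to international trade only.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, int]  # hypothetical (exporter, importer, year), one industry


def balance(
    flows: Dict[Key, float],
    countries: Iterable[str],
    years: Iterable[int],
) -> Dict[Key, Tuple[float, str]]:
    """Fill every missing international exporter-importer-year cell with zero.

    Returns {key: (value, flag_zero)} where flag_zero is
    'p' for positive reported flows, 'r' for zeroes in the raw data,
    and 'u' for zeroes added here to balance the data.
    """
    countries = list(countries)
    balanced: Dict[Key, Tuple[float, str]] = {}
    for exporter, importer, year in product(countries, countries, years):
        if exporter == importer:  # domestic trade is not flagged
            continue
        key = (exporter, importer, year)
        if key not in flows:
            balanced[key] = (0.0, "u")        # zero added by balancing
        elif flows[key] > 0:
            balanced[key] = (flows[key], "p")  # positive reported flow
        else:
            balanced[key] = (flows[key], "r")  # zero reported in the raw data
    return balanced


flows = {("A", "B", 2000): 5.0, ("B", "A", 2000): 0.0}
balanced = balance(flows, ["A", "B", "C"], [2000])
```

With three countries and one year, the balanced result has 3 × 2 = 6 international cells: one flagged 'p', one 'r', and four 'u'.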

Balancing the data adds many zeroes that are irrelevant for gravity estimation. To remove them, to eliminate outliers (e.g., countries with so few observations that they would be dropped in standard gravity regressions), and to ensure that the ITPD-E is suitable for even the most demanding gravity specifications, we estimate gravity for each of the 170 ITPD-E industries using the Poisson pseudo-maximum likelihood (PPML) estimator with a demanding set of fixed effects (i.e., exporter-time, importer-time, and directional country-pair fixed effects). We then retain the estimating sample for each industry as our final industry-level data. This procedure eliminates all irrelevant zeroes from the sample: for example, if a country does not export in a given industry in a given year, the corresponding zeroes are perfectly predicted by that country's exporter-time fixed effect and therefore drop out of the estimating sample. Thus, even after the rectangularisation described in the previous paragraph, the ITPD-E remains an unbalanced database, as some countries do not appear in some years and/or ITPD-E industries.
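The sketch below does not estimate PPML. It only illustrates the kind of sample pruning described above, under a simplified rule assumed for illustration: an observation is dropped when any of its exporter-time, importer-time, or directional pair groups contains only zeroes (so that fixed effect predicts it perfectly) or only a single observation (a singleton), and the check is repeated until nothing more is dropped. Dedicated routines such as the Stata command ppmlhdfe apply related but more thorough checks, including detection of other forms of separation, so the sample they retain can differ. The function `prune_for_ppml` and the data layout are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str, int]  # hypothetical (exporter, importer, year), one industry

# The three fixed-effect dimensions named in the text.
FE_GROUPS: List[Callable[[Key], tuple]] = [
    lambda k: ("exporter-time", k[0], k[2]),
    lambda k: ("importer-time", k[1], k[2]),
    lambda k: ("pair", k[0], k[1]),  # directional country pair
]


def prune_for_ppml(flows: Dict[Key, float]) -> Dict[Key, float]:
    """Iteratively drop observations that these fixed effects leave unusable:
    members of an all-zero group (perfectly predicted) and singleton groups."""
    sample = dict(flows)
    while True:
        drop = set()
        for group_of in FE_GROUPS:
            members: Dict[tuple, List[Key]] = defaultdict(list)
            for key in sample:
                members[group_of(key)].append(key)
            for keys in members.values():
                if len(keys) == 1 or all(sample[k] == 0 for k in keys):
                    drop.update(keys)
        if not drop:  # fixed point reached: nothing left to remove
            return sample
        for key in drop:
            del sample[key]


countries = ["A", "B", "C", "D"]
years = [2000, 2001, 2002]
# Country C reports zero exports to every partner in 2000.
flows = {
    (e, i, y): 0.0 if (e == "C" and y == 2000) else 1.0
    for e in countries for i in countries for y in years if e != i
}
kept = prune_for_ppml(flows)
```

Here the three zero flows exported by C in 2000 are removed because C's exporter-time group for 2000 contains only zeroes, while C's flows in 2001 and 2002 remain. With fewer years, removing those zeroes could leave singleton pair groups, which the loop would then remove as well; this cascade is why the result is not a balanced panel.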


1. See, for example, the World Bank WITS documentation and Timmer et al. (2012). For developing countries specifically, Rozanski and Yeats (1994) show that export data are less reliable than import data.