EstimationModel

Class

gme.EstimationModel(estimation_data: gme.EstimationData = None, lhs_var: str = None, rhs_var: List[str] = None, sector_by_sector: bool = False, drop_imp_exp: List[str] = [ ], drop_imp: List[str] = [ ], drop_exp: List[str] = [ ], keep_imp_exp: List[str] = [ ], keep_imp: List[str] = [ ], keep_exp: List[str] = [ ], drop_years: List[str] = [ ], keep_years: List[str] = [ ], drop_missing: bool = True, variables_to_drop_missing: List[str] = None, fixed_effects:List[Union[str,List[str]]] = [ ], omit_fixed_effect:List[Union[str,List[str]]] = ['exporter','exporter-year', 'year'], std_errors:str = 'HC1', iteration_limit:int = 1000, drop_intratrade:bool = False, retain_modified_data:bool = False, full_results:bool = False)

Description

The class used to specify and run an gravity estimation. A gme.EstimationData must be supplied along with a collection of largely optional arguments that specify variables to include, fixed effects to create, and how to perform the regression, among other options. After the definition of the model, additional methods such as .estimate(), which performs the PPML estimation, or .combine_sector_results(), which combines the results for each sector (if applicable) can be called.

Arguments

estimation_data: gme.EstimationData
A GME EstimationData to use as the basis of the gravity model.

spec_name: (optional) str
A name for the model.

lhs_var: str
The column name of the variable to be used as the dependent or 'left-hand-side' variable in the regression.

rhs_var: List[str]
A list of column names for the independent or 'right-hand-side' variable(s) to be used in the
regression.

sector_by_sector: bool
If true, separate models are estimated for each sector, individually. Default is False. If True,
a sector_var_name must have been supplied to the EstimationData.

drop_imp_exp: (optional) List[str]
A list of country identifiers to be excluded from the estimation when they appear as an
importer or exporter.

drop_imp: (optional) List[str]
A list of country identifiers to be excluded from the estimation when they appear as an
importer.

drop_exp: (optional) List[str]
A list of country identifiers to be excluded from the estimation when they appear as an
exporter.

keep_imp_exp: (optional) List[str]
A list of countries to include in the estimation as either importers or exporters. All others not
specified are excluded.

keep_imp: (optional) List[str]
A list of countries to include in the estimation as importers. All others not specified are
excluded.

keep_exp: (optional) List[str]
A list of countries to include in the estimation as exporters. All others not specified are
excluded.

drop_years: (optional) list
A list of years to exclude from the estimation. The list elements should match the dtype of
the year column in the EstimationData.

keep_years: (optional) list
A list of years to include in the estimation. The list elements should match the dtype of the
year column in the EstimationData.

drop_missing: bool
If True, rows with missing values are dropped. Default is true, which drops if observations
are missing in any of the columns specified by lhs_var or rhs_var.

variables_to_drop_missing: (optional) List[str]
A list of column names for specifying which columns to check for missing values when
dropping rows.

fixed_effects: (optional) List[Union[str,List[str]]]
A list of variables to construct fixed effects based on. Can accept single string entries, which
create fixed effects corresponding to that variable or lists of strings that create fixed effects
corresponding to the interaction of the list items. For example,
fixed_effects = ['importer',['exporter','year']] would create a set of importer fixed effects
and a set of exporter-year fixed effects.

omit_fixed_effect: (optional) List[Union[str,List[str]]]
The fixed effect category from which to drop a fixed effect to avoid collinearity. The entry
should be a subset of the list supplied for fixed_effects. In each case, the last fixed effect
is dropped. If not specified, the colinearity diagnostics will identify a column to drop on its
own.

std_errors: (optional) str
Specifies the type of standard errors to be computed. Default is HC1, heteroskedascticity
robust errors. See statsmodels documentation for alternative options.

iteration_limit: (optional) int
Upper limit on the number of iterations for the estimation procedure. Default is 1000.

drop_intratrade: (optional) bool
If True, intra-national trade flows (importer == exporter) are excluded from the regression.
Default is False.

retain_modified_data: (optional) bool
If True, the estimation DataFrames for each sector after they have been (potentially) modified
during the pre-diagnostics for collinearity and convergence issues. Default is False.
WARNING: these object sizes can be very large in memory so use with caution.

full_results: bool
If True, estimate() returns the full results object from the GLM estimation. These results can
be quite large as each estimated sector's results will contain a full copy of the data used for
its estimation, vectors of predicted values, and other memory intensive pieces of data.
If False, estimate() returns a smaller subset of the results that are likely most useful (e.g.
.params, .nobs, .bse, .pvalues, .aic, .bic). For a list of these attributes, see the documentation
for the function SlimResults.

Attributes

estimation_data:
Return the EstimationData.

results_dict:
Return the dictionary of regression results (after applying estimate method).

modified_data:
Return data modified data after removing problematic columns (after applying estimate
method)

ppml_diagnostics:
Return PPML estimation diagnostic information (after applying estimate method). See
estimate.

Methods

estimate:
Estimate a PPML model. See estimate.

combine_sector_results:
Combine multiple result_dict entries into a single DataFrame. See combine_sector_results.

format_regression_table:
Format regression results into a text, csv, or LaTeX table for presentation. See
format_regression_table

Examples

# Declare an EstimationModel
>>> sample_estimation_model = gme.EstimationModel(data_object = gme_data,
...                                              lhs_var = 'trade_value',
...                                              rhs_var = ['log_distance',
...                                              'agree_pta',
...                                              'common_language',
...                                              'contiguity'])

# Estimate the model
>>> sample_estimation_model.estimate()

# Extract the results
>>> results_dictionary = sample_estimation_model.results_dict

# Write the estimates, p-values, and std. errors from all sectors to a .csv file.
>>> sample_estimation_model.combine_sector_results("c:\\folder\\saved_results.csv") 

# Create and export a formatted table of estimation results
>>> sample_estimation_model.format_regression_table(format = 'csv',
...                                                 path = "c:\\folder\\saved_results.csv")