3 Merge Emissions Datasets
This chapter describes the process used to merge emissions datasets.
3.1 Datasets
Two emissions datasets, obtained from emLab, were used in this analysis:
1. Broadcasting emissions: meds_capstone_ais_emissions_data_v20241121.csv
2. Non-broadcasting emissions: meds_capstone_non_broadcasting_emissions_data_v20250116.csv
The data were pre-filtered by emLab from a larger emissions dataset to select for fishing vessels.
The following columns are required:
- month
- flag
- vessel_class
- lon_bin
- lat_bin
- emissions_{pollutant}_mt
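A quick check along these lines could confirm that a dataset carries the required columns before it enters the pipeline. This is a minimal sketch, not the project's code: the helper `missing_required_cols()` and the toy data frame are hypothetical, and it assumes column names have already been converted to snake case.

```r
# Fixed column names from the list above; pollutant columns are matched by pattern
required_cols <- c("month", "flag", "vessel_class", "lon_bin", "lat_bin")

# Hypothetical helper: returns the names of any required columns that are absent
missing_required_cols <- function(df) {
  missing <- setdiff(required_cols, names(df))
  # At least one emissions_{pollutant}_mt column is expected
  if (!any(grepl("^emissions_.+_mt$", names(df)))) {
    missing <- c(missing, "emissions_{pollutant}_mt")
  }
  missing
}

# Toy data frame (invented values) that lacks vessel_class
toy <- data.frame(
  month = "2016-01-01",
  flag = "USA",
  lon_bin = -120,
  lat_bin = 34,
  emissions_co2_mt = 10.5
)

missing_required_cols(toy)
```

An empty result means every required column is present.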
define broadcasting vs. non-broadcasting?
3.2 Packages
- {tidyverse}
- {janitor}
- {lubridate}
3.3 Methods
3.3.1 Join Emissions Data
Both emissions datasets (datasets 1 and 2 above) were read into the pipeline, their column names were converted to snake case, and a new year-month column was created in each. In the broadcasting dataset, NA values in the flag column were filled with “UNK” to represent an unknown flag, and vessel_class was filtered to the gear types identified with a high degree of confidence (i.e., “squid_jigger”, “drifting_longlines”, “pole_and_line”, “trollers”, “pots_and_traps”, “set_longlines”, “set_gillnets”, “trawlers”, “dredge_fishing”, “tuna_purse_seines”, “other_purse_seines”, “other_seines”). This removed vessel classes such as “passenger”, which likely reflect vessels that GFW’s machine learning algorithm misclassified, either as fishing vessels or in their assigned vessel class.
[1] "trawlers" "set_longlines" "drifting_longlines"
[4] "trollers" "squid_jigger" "pots_and_traps"
[7] "other_seines" "pole_and_line" "other_purse_seines"
[10] "tuna_purse_seines" "set_gillnets" "dredge_fishing"
[13] NA
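The broadcasting steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's actual code: the toy tibble stands in for the broadcasting CSV, the month column is assumed to hold ISO date strings, and `year_month` is assumed to be the first day of each month. Note that `%in%` drops rows with a missing vessel_class, which may differ from how the original pipeline treats them.

```r
library(dplyr)
library(tidyr)
library(janitor)
library(lubridate)

# Gear types retained in the broadcasting dataset (taken from the text above)
fishing_gear <- c(
  "squid_jigger", "drifting_longlines", "pole_and_line", "trollers",
  "pots_and_traps", "set_longlines", "set_gillnets", "trawlers",
  "dredge_fishing", "tuna_purse_seines", "other_purse_seines", "other_seines"
)

# Toy stand-in for the broadcasting CSV; all values are invented
broadcasting_raw <- tibble::tibble(
  Month = c("2016-01-01", "2016-01-01", "2016-02-01"),
  Flag = c("USA", NA, "ESP"),
  Vessel_Class = c("trawlers", "squid_jigger", "passenger"),
  Lon_Bin = c(-120, -119, 3),
  Lat_Bin = c(34, 35, 41),
  emissions_co2_mt = c(10.5, 2.0, 7.25)
)

broadcasting <- broadcasting_raw %>%
  clean_names() %>%                                   # snake_case column names
  mutate(
    year_month = floor_date(ymd(month), unit = "month"),  # assumed date format
    flag = replace_na(flag, "UNK")                        # unknown flag
  ) %>%
  filter(vessel_class %in% fishing_gear)              # keep high-confidence gear types
```

In this toy example the “passenger” row is dropped and the row with a missing flag is kept with flag “UNK”.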
In the non-broadcasting dataset, the emissions estimate columns for each of the nine pollutants (CO2, CH4, N2O, NOX, SOX, CO, VOCS, PM2.5, PM10) were renamed to match the broadcasting dataset, and a flag column was created and populated with “DARK” to distinguish non-broadcasting emissions from broadcasting emissions. The two datasets were then concatenated.
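A minimal sketch of the rename, flag, and concatenate steps is shown below. The original non-broadcasting column names are not given in this chapter, so the `_non_broadcasting` suffix used here is purely hypothetical, as are the toy values; only the target pattern emissions_{pollutant}_mt and the “DARK” flag come from the text.

```r
library(dplyr)

# Toy broadcasting data, already cleaned (invented values)
broadcasting <- tibble::tibble(
  year_month = as.Date(c("2016-01-01", "2016-02-01")),
  flag = c("USA", "UNK"),
  vessel_class = c("trawlers", "squid_jigger"),
  lon_bin = c(-120, 3),
  lat_bin = c(34, 41),
  emissions_co2_mt = c(10.5, 7.25)
)

# Toy non-broadcasting data; the column suffix is a hypothetical placeholder
non_broadcasting_raw <- tibble::tibble(
  year_month = as.Date(c("2016-01-01", "2017-03-01")),
  lon_bin = c(-120, 3),
  lat_bin = c(34, 41),
  emissions_co2_non_broadcasting_mt = c(1.5, 0.75)
)

non_broadcasting <- non_broadcasting_raw %>%
  # Rename emissions columns to the broadcasting pattern emissions_{pollutant}_mt
  rename_with(
    function(x) sub("_non_broadcasting", "", x, fixed = TRUE),
    starts_with("emissions_")
  ) %>%
  # Mark every non-broadcasting row with the "DARK" flag
  mutate(flag = "DARK")

# Concatenate; columns absent from one dataset (e.g. vessel_class) become NA
emissions <- bind_rows(broadcasting, non_broadcasting)
```

Because `bind_rows()` matches columns by name, the rename has to happen before concatenation; otherwise the two sets of emissions estimates would land in separate columns.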
A year column was created, and the combined dataset was filtered to 2016 onward to match the years available in the non-broadcasting dataset. Emissions estimates were then aggregated (summed) by year and flag for each one-by-one degree pixel (identified by lat_bin and lon_bin).
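The filtering and aggregation can be sketched as below, again with invented toy data. Dropping missing values with `na.rm = TRUE` is an assumption of this sketch; the chapter does not say how missing estimates were handled.

```r
library(dplyr)
library(lubridate)

# Toy combined dataset: three broadcasting rows and one non-broadcasting row,
# all in the same one-by-one degree pixel (invented values)
emissions <- tibble::tibble(
  year_month = as.Date(c("2015-06-01", "2016-01-01", "2016-05-01", "2016-01-01")),
  flag = c("USA", "USA", "USA", "DARK"),
  lon_bin = c(-120, -120, -120, -120),
  lat_bin = c(34, 34, 34, 34),
  emissions_co2_mt = c(5, 10.5, 4.5, 1.5)
)

emissions_annual <- emissions %>%
  mutate(year = year(year_month)) %>%
  filter(year >= 2016) %>%                      # match non-broadcasting coverage
  group_by(year, flag, lon_bin, lat_bin) %>%    # one row per year, flag, and pixel
  summarise(
    across(starts_with("emissions_"), ~ sum(.x, na.rm = TRUE)),
    .groups = "drop"
  )
```

Here the 2015 row is removed by the filter, and the two 2016 “USA” rows in the same pixel collapse into a single row with their emissions summed.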
3.3.2 Assumptions
By filtering out certain gear types,… implications for non-broadcasting