3  Merge Emissions Datasets

This chapter describes the process used to merge emissions datasets.

3.1 Datasets

Two emissions datasets, obtained from emLab, were used in this analysis:

  • Broadcasting emissions: meds_capstone_ais_emissions_data_v20241121.csv

  • Non-broadcasting emissions: meds_capstone_non_broadcasting_emissions_data_v20250116.csv

The data were pre-filtered by emLab from a larger emissions dataset to select for fishing vessels.

The following columns are required:

  • month
  • flag
  • vessel_class
  • lon_bin
  • lat_bin
  • emissions_{pollutant}_mt
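A quick way to confirm that a cleaned dataset carries these columns is sketched below. The pollutant suffixes and the helper name missing_cols are assumptions made for illustration, not names taken from the emLab files. Note that the non-broadcasting file only gains its flag column during the merge (Section 3.3.1), so a check like this applies after that step.

```r
# Assumed pollutant suffixes; the actual column names may differ.
pollutants <- c("co2", "ch4", "n2o", "nox", "sox", "co", "vocs", "pm2_5", "pm10")

required_cols <- c(
  "month", "flag", "vessel_class", "lon_bin", "lat_bin",
  paste0("emissions_", pollutants, "_mt")
)

# Hypothetical helper: returns the required columns absent from a data frame.
missing_cols <- function(df, required = required_cols) {
  setdiff(required, names(df))
}

# Toy data frame that carries only the CO2 emissions column
toy <- data.frame(
  month = "2016-01-01", flag = "USA", vessel_class = "trawlers",
  lon_bin = -120, lat_bin = 34, emissions_co2_mt = 1.5,
  stringsAsFactors = FALSE
)

missing <- missing_cols(toy)
```

An empty result from missing_cols() means every required column is present.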

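Broadcasting emissions are attributed to vessels that transmit Automatic Identification System (AIS) signals, as the broadcasting file name indicates. Non-broadcasting emissions are attributed to vessels that do not transmit AIS; these are flagged as “DARK” when the datasets are merged (Section 3.3.1).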

3.2 Packages

  • {tidyverse}
  • {janitor}
  • {lubridate}

3.3 Methods

3.3.1 Join Emissions Data

Both emissions datasets (Section 3.1) were read into the pipeline, their column names were converted to snake case, and a new year-month column was created in each. In the broadcasting dataset, NA values in the flag column were replaced with “UNK” (flag unknown), and vessel_class was filtered to the gear types identified with a high degree of confidence (i.e., “squid_jigger”, “drifting_longlines”, “pole_and_line”, “trollers”, “pots_and_traps”, “set_longlines”, “set_gillnets”, “trawlers”, “dredge_fishing”, “tuna_purse_seines”, “other_purse_seines”, “other_seines”). This removed vessel classes such as “passenger”, which indicate that a vessel was likely misclassified by GFW’s machine learning algorithm, either as a fishing vessel or as a passenger vessel.

 [1] "trawlers"           "set_longlines"      "drifting_longlines"
 [4] "trollers"           "squid_jigger"       "pots_and_traps"    
 [7] "other_seines"       "pole_and_line"      "other_purse_seines"
[10] "tuna_purse_seines"  "set_gillnets"       "dredge_fishing"    
[13] NA                  
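The cleaning steps for the broadcasting dataset can be sketched as follows. This is a minimal illustration on toy data rather than the pipeline's actual code: the raw column names, the format of the month column, and the "%Y-%m" year-month format are all assumptions. Note that %in% drops rows with a missing vessel_class; the listing above includes NA, so the original pipeline may handle missing classes differently.

```r
library(dplyr)
library(janitor)
library(lubridate)

# Gear types identified with a high degree of confidence (from the text above)
gear_types <- c(
  "squid_jigger", "drifting_longlines", "pole_and_line", "trollers",
  "pots_and_traps", "set_longlines", "set_gillnets", "trawlers",
  "dredge_fishing", "tuna_purse_seines", "other_purse_seines", "other_seines"
)

# Toy stand-in for the broadcasting CSV; raw column names are hypothetical
raw_broadcasting <- data.frame(
  Month = c("2016-01-01", "2016-02-01", "2016-02-01"),
  Flag = c("USA", NA, "CHN"),
  Vessel_Class = c("trawlers", "squid_jigger", "passenger"),
  Lon_Bin = c(-120, 150, 10),
  Lat_Bin = c(34, -10, 55),
  Emissions_CO2_MT = c(1.5, 2.0, 9.9),
  stringsAsFactors = FALSE
)

broadcasting <- raw_broadcasting %>%
  clean_names() %>%                              # column names to snake case
  mutate(
    year_month = format(ymd(month), "%Y-%m"),    # assumed year-month format
    flag = coalesce(flag, "UNK")                 # NA flag becomes "UNK"
  ) %>%
  filter(vessel_class %in% gear_types)           # keep high-confidence gear types
```

In this toy example the “passenger” row is removed and the row with a missing flag is retained with flag “UNK”.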

In the non-broadcasting dataset, the emissions estimate columns for each of the 9 pollutants (CO2, CH4, N2O, NOX, SOX, CO, VOCS, PM2.5, PM10) were renamed to match the broadcasting dataset, and a flag column was created and populated with “DARK” to distinguish non-broadcasting emissions from broadcasting emissions. The two datasets were then concatenated.
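A minimal sketch of the rename, flag, and concatenate steps is shown below on toy data. The original non-broadcasting column names are not given in this chapter, so the non_broadcasting_ prefix used here is purely hypothetical, and only one pollutant column is shown.

```r
library(dplyr)

# Toy broadcasting data, already cleaned
broadcasting <- data.frame(
  year_month = "2016-01", flag = "USA", vessel_class = "trawlers",
  lon_bin = -120, lat_bin = 34, emissions_co2_mt = 1.5,
  stringsAsFactors = FALSE
)

# Toy non-broadcasting data; the column prefix is a hypothetical example
non_broadcasting_raw <- data.frame(
  year_month = "2016-01", lon_bin = -120, lat_bin = 34,
  non_broadcasting_emissions_co2_mt = 0.7,
  stringsAsFactors = FALSE
)

non_broadcasting <- non_broadcasting_raw %>%
  # Strip the assumed prefix so names match the broadcasting dataset
  rename_with(
    ~ sub("^non_broadcasting_", "", .x),
    starts_with("non_broadcasting_")
  ) %>%
  # Mark every row as non-broadcasting
  mutate(flag = "DARK")

# Row-wise concatenation; columns are matched by name
emissions <- bind_rows(broadcasting, non_broadcasting)
```

bind_rows() fills columns that are absent from one input with NA, so in this sketch vessel_class is NA for the “DARK” rows.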

A year column was created, and the combined dataset was filtered to 2016 onward to match the period covered by the non-broadcasting dataset. Emissions estimates were then summed by year and flag for each one-by-one degree pixel (identified by lat_bin and lon_bin).
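The filtering and aggregation can be sketched as below, again on toy data. Deriving year from a "%Y-%m" year-month string and dropping missing values in the sums (na.rm = TRUE) are assumptions; the chapter does not state how either is handled.

```r
library(dplyr)

# Toy combined dataset with a single pollutant column
emissions <- data.frame(
  year_month = c("2015-12", "2016-01", "2016-02", "2016-01"),
  flag = c("USA", "USA", "USA", "DARK"),
  lon_bin = c(-120, -120, -120, -120),
  lat_bin = c(34, 34, 34, 34),
  emissions_co2_mt = c(5, 1.5, 2.5, 0.7),
  stringsAsFactors = FALSE
)

annual <- emissions %>%
  # Assumes year_month is a "YYYY-MM" string
  mutate(year = as.integer(substr(year_month, 1, 4))) %>%
  filter(year >= 2016) %>%
  # One group per year, flag, and one-by-one degree pixel
  group_by(year, flag, lat_bin, lon_bin) %>%
  summarise(
    across(starts_with("emissions_"), ~ sum(.x, na.rm = TRUE)),
    .groups = "drop"
  )
```

Here the 2015 row is dropped and the two 2016 “USA” rows in the same pixel collapse into one row, leaving one row per flag.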

3.3.2 Assumptions

By filtering out certain gear types,… implications for non-broadcasting