Chapter 10 Quality check and errors

The last chapter of this book presents two aspects that any practitioner interested in developing emissions inventories must consider. The first part summarizes the EMEP/EEA air pollutant emission inventory guidebook 2016, Technical guidance to prepare national emission inventories (McGlade and Vidic 2009), which is based on the Intergovernmental Panel on Climate Change (IPCC) 2006 Good Practice Guidance (Change 2006). The second part consists of advice and specific considerations for avoiding errors when running VEIN in each part of the inventory.

10.1 Guidance from EMEP/EEA air pollutant emission inventory guidebook

I believe that this part of the chapter delivers knowledge that you will really acquire with experience. However, even with experience you can get lost, and returning to this part may save you. It is based on the EMEP/EEA air pollutant emission inventory guidebook (McGlade and Vidic 2009).

10.1.1 Key categories

Key categories are the most important source categories. This section is based on the chapter Key category analysis and methodological choice (J 2013) of the EMEP/EEA air pollutant emission inventory guidebook 2013. They are important because they are responsible for most of the pollution in a defined region. For instance, vehicles have been described as the most important source of pollution in mega-cities (Molina and Molina 2004). Moreover, in the case of São Paulo, the official emissions inventory reports that vehicles are responsible for more than 97 % of \(CO\), 79 % of \(HC\), 68 % of \(NO_X\), and only 29 % of \(PM\) and 22 % of \(SO_X\) (CETESB 2015). This shows another characteristic of key categories: they depend on the specific pollutant.

As another example, consider a city that suffers cold weather and uses biomass burning for cooking and heating. The primary source of \(PM_{2.5}\) will almost certainly be biomass burning.

Now consider an industrial region with an industrial process that emits large amounts of \(SO_X\). The key categories might be industrial sources.

As one more example, consider a massive mega-city with electric mobility where the primary source of electricity is thermoelectric plants burning carbon-based fuels. Those plants then become the key category.

This implies that there should be a nomenclature for categorizing and naming sources. The IPCC, for instance, has such a nomenclature.

The idea behind key categories is that most effort must be applied to these categories, obtaining the highest level of detail with the least uncertainty. Hence, the emissions guidelines (McGlade and Vidic 2009) propose three levels of estimation methods of increasing complexity, known as Tier methods. The complexity increases from Tier 1 to Tier 3. VEIN is a Tier 3 model, meaning that it incorporates the most detailed functions for estimating vehicular emissions, except for evaporative emissions, which I still need to improve.

Tier 1: A more straightforward method which uses freely available activity data and emission factors.
Tier 2: Similar to Tier 1, but includes more specific emission factors.
Tier 3: Most complex methodologies with a higher level of detail, temporal and spatial.

10.1.2 Uncertainty

This is an essential part of any emissions inventory, and a dedicated chapter or section should be included in any report or paper. This section is based on the chapter Uncertainties (Pulles T 2013) of the EMEP/EEA air pollutant emission inventory guidebook 2013. The 2006 IPCC Guidelines state that estimating the uncertainty of an inventory is needed (Change 2006).

It is recommended to use a 95% confidence interval. This means that:

There is a 95% probability that the actual value of the estimated quantity is within the interval defined by the confidence interval.
The probability that the actual value is outside this range is 5%.

The general form for estimating emissions was shown in Eq. (1.1) (Pulles T 2013). This equation is the basis for calculating the uncertainty. This section applies when measurements are made of activity or emission factors. In those cases, it is possible to calculate the required confidence intervals.
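As a minimal sketch of this calculation, assume a small set of repeated emission factor measurements. The values below are invented for illustration and do not come from any guidebook. The 95% confidence interval of the mean and the corresponding percentage uncertainty (half the interval divided by the mean) can be obtained in base R:

```r
# Hypothetical emission factor measurements (g/km); invented values
ef <- c(0.52, 0.61, 0.48, 0.57, 0.66, 0.50, 0.59, 0.55)

n  <- length(ef)
m  <- mean(ef)
se <- sd(ef) / sqrt(n)            # standard error of the mean

# Half-width of the 95% confidence interval (Student t, n - 1 df)
half <- qt(0.975, df = n - 1) * se

ci <- c(lower = m - half, upper = m + half)
U  <- 100 * half / m              # percentage uncertainty

ci
U
```

The same interval is returned by `t.test(ef)$conf.int`. The Student t distribution is used here because the sample is small; other distributions may be more appropriate for skewed data.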

10.1.2.1 Default uncertainty ranges

Activity data

The following table is taken directly from McGlade and Vidic (2009) and proposes indicative ranges that could be applied in cases where no independent data are available.

National official statistics: - .
Update of last year’s statistics using Gross Economic Growth factors: 0 - 2%.
International Energy Agency (IEA) statistics: OECD: 2 - 3%, non-OECD: 5 - 10%.
United Nations databases: 5 - 10%.
Default values, other sources: 30 - 100%.

Emission factors

The following table is taken directly from McGlade and Vidic (2009) and proposes rating definitions for emission factors.

A: An estimate based on a large number of measurements made at a large number of facilities that adequately represent the sector. Error: 10 to 30%.
B: An estimate based on a large number of measurements made at a large number of facilities that represent a large part of the sector. Error: 20 to 60%.
C: An estimate based on some measurements made at a small number of representative facilities, or an engineering judgment based on some relevant facts. Error: 50 to 200%.
D: An estimate based on single measurements, or an engineering calculation derived from some relevant facts. Error: 100 to 300%.
E: An estimate based on an engineering calculation derived from assumptions only. Error: Order of magnitude.

The ratings for road transport emission factors in McGlade and Vidic (2009) differ from those in Ntziachristos and Samaras (2016). To preserve consistency in this book, the precision indicators from Table 4-1 of Ntziachristos and Samaras (2016) are presented here.

Table 10.1: Precision indicators (Ntziachristos and Samaras 2016)
Category	NOx	CO	NMHC	CH4	PM	CO2
PC G w/o Catalyst	A	A	A	A	-	A
PC G w Catalyst	A	A	A	A	-	A
PC D	A	A	A	A	A	A
PC LPG	A	A	A	-	-	A
PC LPG w/o Catalyst	A	A	A	A	D	A
PC LPG w Catalyst	D	D	D	D	D	A
PC 2 strokes	B	B	B	D	-	B
LCV G	B	B	B	C	-	A
LCV D	B	B	B	C	A	A
HDV G	D	D	D	C	-	D
HDV D	A	A	A	D	A	A
MC cc <50	A	A	A	B	-	A
MC cc > 50 2 strokes	A	A	A	B	-	A
MC cc > 50 4 strokes	A	A	A	B	-	A
Cold-start PC G P-Euro	B	B	B	-	-	B
Cold-start PC G Euros	B	B	B	A	-	A
Cold-start PC D P-Euro	C	C	C	-	C	B
Cold-start PC D Euros	A	A	A	A	A	A
Cold-start PC LPG	C	C	C	-	-	B
Cold-start LCV G	D	D	D	-	-	D
Cold-start LCV F	D	D	D	-	D	D

Uncertainties can be aggregated with two approaches:

Rule A: uncertainties are combined by addition, as shown in Eq. (10.1):

\[\begin{equation} U_{total}=\frac{\sqrt{\sum_{i = 1}^{n}(U_i \cdot x_i )^2}}{\sum_{i = 1}^{n} x_i} \tag{10.1} \end{equation}\]

Where \(x_i\) and \(U_i\) are the uncertain quantities and the percentage uncertainties (half the 95% confidence interval) associated with them, respectively. \(U_{total}\) is the percentage uncertainty in the sum of the quantities (half the 95% confidence interval divided by the total (i.e., mean) and expressed as a percentage).

Rule B: uncertainties are combined by multiplication, as shown in Eq. (10.2):

\[\begin{equation} U_{total}=\sqrt{\sum_{i = 1}^{n}(U_i)^2} \tag{10.2} \end{equation}\]

Where \(U_i\) are the percentage uncertainties (half the 95% confidence interval) associated with each of the quantities. \(U_{total}\) is the percentage uncertainty in the product of the quantities (half the 95% confidence interval divided by the total and expressed as a percentage).
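The two rules can be sketched as small R functions. The function names `combine_sum` and `combine_product` and the example values are hypothetical, chosen only for illustration, and positive quantities are assumed:

```r
# Rule A: percentage uncertainty of a sum of quantities, Eq. (10.1)
# x: quantities; U: their percentage uncertainties (half the 95% CI)
combine_sum <- function(x, U) {
  sqrt(sum((U * x)^2)) / sum(x)
}

# Rule B: percentage uncertainty of a product of quantities, Eq. (10.2)
combine_product <- function(U) {
  sqrt(sum(U^2))
}

# Hypothetical example: two source categories emitting 100 and 50 t/y
# with uncertainties of 10% and 20%
combine_sum(x = c(100, 50), U = c(10, 20))   # about 9.4%

# Hypothetical example: activity known within 5%, emission factor within 20%
combine_product(U = c(5, 20))                # about 20.6%
```

Note that under Rule A the uncertainty of the total is lower than that of the most uncertain category, because independent errors partially cancel when quantities are added.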

Alternatively, a Monte Carlo simulation can be done.
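A minimal Monte Carlo sketch in base R is shown here, assuming that emissions are the product of one activity value and one emission factor, that both are independent and normally distributed, and that their percentage uncertainties represent half the 95% confidence interval. All numbers are hypothetical:

```r
set.seed(1)                       # reproducible random draws
n <- 1e5                          # number of Monte Carlo draws

# Hypothetical inputs: mean value and percentage uncertainty
# (half the 95% confidence interval), assumed normally distributed
act_mean <- 1e6;  act_U <- 5      # activity (vehicle-km)
ef_mean  <- 0.5;  ef_U  <- 20     # emission factor (g/km)

# Convert half the 95% interval to a standard deviation
act <- rnorm(n, act_mean, act_mean * act_U / 100 / qnorm(0.975))
ef  <- rnorm(n, ef_mean,  ef_mean  * ef_U  / 100 / qnorm(0.975))

emis <- act * ef                  # emissions for each draw (g)

# Percentage uncertainty: half the simulated 95% interval over the mean
q <- quantile(emis, c(0.025, 0.975))
U_mc <- 100 * unname(diff(q)) / 2 / mean(emis)
U_mc
```

With these inputs the simulated value should be close to the one obtained by combining 5% and 20% in quadrature, about 20.6%. The advantage of the simulation is that it also works with skewed distributions or correlated inputs, where the simple rules no longer hold.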

10.1.3 Quality Assurance and Quality Check

This section is mostly based on the chapter Inventory management, improvement and QA/QC (Goodwin et al. 2013) of the EMEP/EEA air pollutant emission inventory guidebook 2013.

According to Wikipedia:

Quality assurance (QA) is a way of preventing mistakes and defects in manufactured products and avoiding problems when delivering solutions or services to customers (Wikipedia contributors 2018 b).
Quality control (QC) is a process by which entities review the quality of all factors involved in production (Wikipedia contributors 2018 c).

The primary reference in QA/QC is the International Organization for Standardization ISO 9000 (Wikipedia contributors 2018 a).

The idea is to avoid errors in the development of the inventory. Hence, it is essential that the objective of the inventory, its frequency, and its spatial and temporal resolution are clearly defined. As mentioned before, the purpose of the inventory can be policy application or science. In any case, the QA/QC procedures should be explicitly stated, covering five data quality objectives: transparency, consistency, comparability, completeness and accuracy.

In practice, this means that another researcher or practitioner should be able to reproduce the results. However, this is not always the case. Moreover, it has been shown that there is currently a crisis of reproducibility in science, where ‘more than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments’ (Baker 2016). I believe that emissions inventories should guarantee the reproducibility of results, both in scientific and in policy application inventories.

10.1.4 Inventory management cycle

Another important aspect is the inventory management cycle. The inventory manager is responsible for institutional arrangements, meeting deadlines and the inventory management cycle. The inventory compiler gets the data and estimates the emissions with the respective Tier.

The cycle is:

Prioritization of improvements: As resources are limited, priority must be given to critical categories.
Data collection: It is important to establish formal agreements with data providers using protocols. The protocols must clearly state the data needed, its format, content, and dates of delivery.
Inventory compilation. The inventory compiler estimates the emissions.
Consolidation. The inventory manager consolidates all the emissions ensuring quality in data and methods for estimating emissions.
Data Quality Review.
Reporting.
Lessons learned and improvement review.
Inventory Management Report.
Quality Assurance and Quality Control Plan.

10.2 Avoiding errors with VEIN

The first part of this chapter covered some fundamental aspects of emissions inventory management for quality assurance and quality control. This part covers typical errors that I’ve faced using VEIN, and I hope it helps users avoid them.

Currently, VEIN does not have a graphical user interface (GUI) and runs on R (R Core Team 2017), which can be intimidating for new users. Also, even experienced users can make mistakes. If you find a mistake, write it down to avoid it in the future. Be organized. I recommend coding your inventory keeping in mind that you will have to check your scripts in the future, so it pays to write clean code from the first try. And also, don’t panic.

10.2.1 Traffic flow

One of the first inputs is the traffic flow data. You must have an agreement with data providers to get the data. Also, the compiler must understand the format of the data. In my experience, data providers come from government agencies with limited resources and less time. This means that, if there is no formal agreement, they won’t spend much time processing their data to meet your needs (because it is not part of their jobs). And even if there is a formal agreement, the data processing must be done by the inventory compiler.

The data provided in this book covers a travel demand output for the Metropolitan Area of São Paulo (MASP) made by the Traffic Engineering Company of São Paulo for the year 2012. It has the same content as the data inside the VEIN package, except that it covers the whole MASP. The data is originally in MapInfo format. Therefore, the package sf must be used to read the data (results not shown).

library(sf)
# Read the MapInfo file, keeping text columns as character (not factor)
net <- st_read("data/masp.TAB", stringsAsFactors = FALSE)

The section @ref(#traffic) shows details about this data.

Have a meeting with the data providers to check and resolve any doubts regarding the data.
Check if your data is projected or not. The data will probably be projected in a UTM zone, for instance, EPSG code 31983 http://spatialreference.org/ref/epsg/sirgas-2000-utm-zone-23s/. VEIN imports functions from sf and lwgeom, which depend on the GEOS libraries. This means that VEIN can work with data projected or not. Nevertheless, I suggest working with data projected for your location.
Ensure that the data is correctly read and that there are no objects of class factor in the columns.
Make sure you know the units of the traffic flow, and remember that most of the VEIN functions are designed to work with hourly traffic flow.
Plot the data. Check if the data looks fine or has some mistakes.
Calculate the length of each road. Make the proper transformations and make sure that the resulting length of the road has units of km. I suggest naming this variable lkm. The calculation can be done within R with the packages sf and units installed. Let’s say that your data is named net, your data is already projected, and your traffic flow is named ldv for light-duty vehicles and hdv for heavy-duty vehicles.

plot(net["ldv"], axes = TRUE)
plot(net["hdv"], axes = TRUE)
net$lkm <- sf::st_length(net) # length in meters
net$lkm <- units::set_units(net$lkm, km) # convert to km

Although the same numbers can be obtained by dividing lkm by 1000, this would be wrong because the units would still be meters and they must be km. Hence, using the functions of the units package is the recommended way.
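The checks above can be sketched on a small synthetic network. The single road link, its coordinates and its attribute values are hypothetical; only the column names ldv, hdv and lkm and the EPSG code 31983 follow the text:

```r
library(sf)

# Hypothetical road link in longitude/latitude (EPSG:4326)
line <- st_linestring(rbind(c(-46.65, -23.55), c(-46.64, -23.55)))
net  <- st_sf(ldv = 1000, hdv = 50, type = "arterial",
              geometry = st_sfc(line, crs = 4326))

# Is the data unprojected? If so, project to SIRGAS 2000 / UTM zone 23S
if (st_is_longlat(net)) net <- st_transform(net, 31983)

# Are there factor columns? All values should be FALSE
sapply(st_drop_geometry(net), is.factor)

# Length of each link in km, with units attached
net$lkm <- units::set_units(st_length(net), km)
net$lkm
```

Running the same three checks (projection, column classes, units of length) right after reading real data catches most of the input problems described in this section.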

10.2.2 Vehicular composition

The vehicular composition is a critical part of emissions inventories for vehicles. It consists of the percentage of each type of vehicle and technology. Although it seems pretty straightforward, a mistake in this part of the emissions inventory can cost lots of time. Developing an emissions inventory is a complex task, and you must take great care even in simple calculations because, if the results are not consistent, it can take LOTS of time to find the error, usually when the deadlines have already passed.

The function inventory needs the vehicular composition to create the respective directories to run VEIN. I have the feeling that most model users like structured models with clear inputs and outputs. VEIN is not like that, and this is in part due to the nature of emissions inventories. I could design a model that works perfectly with one type of input, but in real life the desired input is not always easy to get. For instance, the traffic simulation used inside the model is not easy to get, even in cities of developed countries. Hence, inventory compilers must struggle to do their job. Therefore, VEIN was designed with flexibility and versatility in mind.

In any case, the vehicular composition is the number of types of each of the following vehicles: PC, LCV, HGV, BUS, MC. Then, each subcategory is divided by age of use. For instance, the vehicular emissions inventory for São Paulo State considers 4 types of passenger cars (PC):

Passenger Cars using gasohol (gasoline with 25% of ethanol, PC_25).
Passenger Cars Flex using gasohol (gasoline with 25% of ethanol, PC_F25).
Passenger Cars Flex using ethanol (PC_FE100).
Passenger Cars using ethanol (PC_E100).

This means that the number of PC types is 4.

Then each of these vehicles is divided by age of use. As a consequence, we must know when each type of vehicle entered and left the market. Again, for São Paulo conditions, flex engines entered the market in 2003 and nowadays most new cars are flex. Vehicles with engines designed for burning ethanol had a peak in the early 80s, but they went off the market in 2006. Gasoline vehicles have been in circulation from the beginning, and the CETESB inventory considers a lifespan of 40 years of use.

The same analyses must be done with each of the vehicle categories.

The distribution by the age of use indirectly shows the technology associated with emission standards by the age of use.
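As an illustration of this bookkeeping, the sketch below derives the youngest and oldest age of use still in circulation for each type of passenger car. The base year 2014 is an assumption, and the first model years of PC_25 and PC_E100 are placeholders, not data. Only the years 2003 and 2006 and the 40-year lifespan come from the text above:

```r
base_year <- 2014   # assumed base year of the inventory
lifespan  <- 40     # maximum age of use considered (years)

# First and last model year sold; NA means still on sale.
# 2003 and 2006 come from the text; the other years are placeholders.
comp <- data.frame(
  veh        = c("PC_25", "PC_F25", "PC_FE100", "PC_E100"),
  first_year = c(1975, 2003, 2003, 1979),
  last_year  = c(NA, NA, NA, 2006)
)

last <- ifelse(is.na(comp$last_year), base_year, comp$last_year)

# Youngest and oldest age of use still circulating in the base year
comp$age_min <- base_year - last + 1
comp$age_max <- pmin(lifespan, base_year - comp$first_year + 1)
comp
```

A table like this makes it explicit which ages of use must be zero for each vehicle type, which is an easy consistency check before estimating emissions.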

10.2.3 Units

Currently, VEIN checks that the variable lkm (length) is in km. This prevents the user from wrongly entering the length in meters or without units, which would make the emissions very large and wrong.
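The kind of check described here can be sketched with the units package. The helper `check_km` is hypothetical and only illustrates the idea; the check inside VEIN may be implemented differently:

```r
library(units)

# Hypothetical helper: stop unless x carries units of km
check_km <- function(x) {
  if (!inherits(x, "units")) stop("lkm has no units")
  if (deparse_unit(x) != "km") stop("lkm must be in km")
  invisible(TRUE)
}

lkm_m  <- set_units(c(1200, 850), m)   # lengths in meters
lkm_km <- set_units(lkm_m, km)         # converted to km

check_km(lkm_km)                        # passes silently
try(check_km(lkm_m))                    # reports an error: must be in km
try(check_km(c(1.2, 0.85)))             # reports an error: no units
```

Failing early with a clear message is cheaper than discovering later that emissions are a thousand times too large.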

In the case of vehicles, there is currently no unit of “vehicles” in VEIN. Transportation studies use the unit “equivalent vehicle”, which standardizes each type of vehicle by its size. For instance, an LCV is sometimes equivalent to 1.2 vehicles, and buses or HGV can be equivalent to 2 or more vehicles. However, there is nothing of this kind in VEIN, and units management is designed to control emission factors and length.

The temporal dimension is also difficult to handle regarding units. This is because sometimes the mileage is correctly in km, but the timespan is one year. Therefore, the management of units is designed to avoid errors with emission factors and the calculation of emissions. This means that a future version of VEIN will remove all temporal dimensions from the units to ensure the right calculation of mass only. In other words, VEIN will not check whether the data is hourly or annual.

10.2.4 Emission factors

When I started this book, VEIN contained few emission factors. Now, version 0.5.7 has all the 2016 emission factors of CETESB, almost 100 emission factors from the European emission guidelines, and all the BASE emission factors from the International Vehicle Emissions (IVE) model. The functions to access this data are:

ef_cetesb.
ef_ldv_speed.
ef_hdv_speed.
ef_ive.

I will be working on importing emission factors from the HBEFA model, which are based on traffic situations (very cool).

10.2.5 Deterioration

VEIN currently covers the deterioration factors from the European emission guidelines. The result is a simple numeric vector depending on emission standard, mileage, type of vehicle and pollutant. However, it is possible to use any other deterioration factors.

Make sure to use the data correctly and to cite the literature it comes from.

10.2.6 Fuel evaluation

It is good practice to ensure that the mass of fuel consumed by the vehicles in your study area matches the fuel sales in that area. If not, calibrate your emissions by the number of vehicles (if bottom-up) or by a combination of vehicles and mileage (if top-down) to match fuel sales.
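A minimal sketch of this calibration in base R follows. All numbers are hypothetical, and the sketch assumes that fuel consumption is proportional to the number of vehicles when everything else is unchanged:

```r
# Hypothetical annual values for the study area (t/year)
fuel_sales     <- 950000    # fuel sold, from official statistics
fuel_estimated <- 1100000   # fuel consumption estimated by the inventory

# Calibration factor: ratio of fuel sold to fuel estimated
k <- fuel_sales / fuel_estimated

# Hypothetical fleet by age of use; scaling it by k scales fuel
# consumption by the same factor if everything else is unchanged
veh <- c(12000, 10500, 9000, 7500)
veh_cal <- veh * k

k
sum(veh_cal) / sum(veh)
```

After the fleet is scaled, fuel consumption should be estimated again and compared with sales, because a factor far from 1 usually points to a problem in the input data rather than a simple scaling issue.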

10.2.7 Fuel quality

The chemical composition of the fuel has a direct effect on emissions. New fuels designed for clean vehicular technologies also produce fewer pollutants when used in vehicles with older technologies. Hence, when COPERT emission factors are used, the effect of the chemical composition of the fuel must ALWAYS be included. VEIN includes the function fuel_corr to account for the impact of fuel composition on emissions.

10.2.8 Emissions estimation

Most of the emission functions return an array of emissions, that is, a multidimensional matrix. The idea was that each dimension has a meaning, but I now think that each dimension should be of a different nature. For instance, two dimensions for time are not necessary, even though one indicates hours and the other days. So I might change that in the future. Don’t panic; I always try to change the functions without breaking older code.
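To make the idea of dimensions concrete, the sketch below builds a random array in base R and collapses it. The order of dimensions (streets, ages of use, hours, days) and the sizes are assumptions for illustration, and the values are random numbers, not emissions produced by VEIN:

```r
# Hypothetical emissions array: 3 streets x 4 ages of use x 24 hours x 7 days
set.seed(1)
e <- array(runif(3 * 4 * 24 * 7), dim = c(3, 4, 24, 7))
dim(e)

# Total emissions by street (sum over ages, hours and days)
by_street <- apply(e, 1, sum)

# Merge hour and day into a single time dimension of 168 hours
e_hourly <- array(e, dim = c(3, 4, 24 * 7))
dim(e_hourly)

# The total mass is unchanged by the reshape
all.equal(sum(e), sum(e_hourly))
```

Because R stores arrays in column-major order, hour 5 of day 2 becomes position 29 of the merged time dimension, so a single time dimension carries the same information as the two original ones.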

I hope you liked this book.

References

McGlade, J, and S Vidic. 2009. “EMEP/EEA Air Pollutant Emission Inventory Guidebook 2009: Technical Guidance to Prepare National Emission Inventories.” Technical report 9/2009, EEA, Copenhagen, Denmark.

Change, IPOC. 2006. “2006 IPCC Guidelines for National Greenhouse Gas Inventories.” [2013-04-28]. http://www.ipcc-nggip.iges.or.jp/public/2006gl/index.html.

J, Goodwin. 2013. “EMEP/EEA Air Pollutant Emission Inventory Guidebook 2013: Key Category Analysis and Methodological Choice.” Technical report 2013, EEA, Copenhagen, Denmark.

Molina, Mario J, and Luisa T Molina. 2004. “Megacities and Atmospheric Pollution.” Journal of the Air & Waste Management Association 54 (6). Taylor & Francis: 644–80.

CETESB. 2015. “Emissões Veiculares No Estado de São Paulo 2014.”

Pulles T, Kuenen J. 2013. “EMEP/EEA Air Pollutant Emission Inventory Guidebook 2013: Uncertainties.” Technical report 2013, EEA, Copenhagen, Denmark.

Ntziachristos, L, and Z Samaras. 2016. “EMEP/EEA Emission Inventory Guidebook; Road Transport: Passenger Cars, Light Commercial Trucks, Heavy-Duty Vehicles Including Buses and Motorcycles.” European Environment Agency, Copenhagen.

Goodwin, Pulles T, Ardenne J, Tooly L, and Rypdal K. 2013. “EMEP/EEA Air Pollutant Emission Inventory Guidebook 2013: Inventory Management, Improvement and QA/QC.” Technical report 2013, EEA, Copenhagen, Denmark.

Wikipedia contributors. 2018b. “Quality Assurance — Wikipedia, the Free Encyclopedia.” https://en.wikipedia.org/w/index.php?title=Quality_assurance&oldid=847736226.

Wikipedia contributors. 2018c. “Quality Control — Wikipedia, the Free Encyclopedia.” https://en.wikipedia.org/w/index.php?title=Quality_control&oldid=850408288.

Wikipedia contributors. 2018a. “ISO 9000 — Wikipedia, the Free Encyclopedia.” https://en.wikipedia.org/w/index.php?title=ISO_9000&oldid=849979440.

Baker, Monya. 2016. “Is There a Reproducibility Crisis? A Nature Survey Lifts the Lid on How Researchers View the ‘Crisis’ Rocking Science and What They Think Will Help.” Nature 533 (7604). Nature Publishing Group: 452–55.

R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.