Air quality data analysis: life after r²

Presented by John Saffell, Alphasense Ltd.

Summary: Air quality (AQ) is increasingly being measured using high-density, low-cost sensor networks rather than low-density, equivalent or transfer-standard monitoring stations. The next step for these AQ networks is to improve sensor performance so that data analysis is more robust.

Companies that assemble, calibrate and maintain AQ networks can improve AQ sensor data quality with proprietary algorithms, but recent field tests of the same AQ networks, conducted by different universities and research organisations in different locations, have reached conflicting conclusions. Why?

Data quality is usually judged by r², the coefficient of determination. This is not surprising, since it is an easy statistical tool that has been used successfully for many physical sensors. However, the use of r² should be reviewed when analysing chemical sensors, which have more degrees of freedom. r² assumes a simple linear regression, which means that the reference and the low-cost sensor either share the same regressors or have none. Also, spikes in the data due to local pollution events distort r² calculations, and good-quality data collected during a stable period can yield a poor r² simply because of the limited range of pollutant concentrations.
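The two distortions described above can be sketched numerically. The data below is entirely hypothetical (a synthetic sensor that tracks a reference with ~2 ppb of noise): over a narrow, stable concentration range r² is poor even though the sensor is accurate, while adding a single pollution spike inflates r² towards 1 without the sensor having improved.

```python
import numpy as np

rng = np.random.default_rng(0)

def r_squared(ref, sensor):
    """Coefficient of determination from a simple linear fit: sensor ~ ref."""
    slope, intercept = np.polyfit(ref, sensor, 1)
    pred = slope * ref + intercept
    ss_res = np.sum((sensor - pred) ** 2)
    ss_tot = np.sum((sensor - np.mean(sensor)) ** 2)
    return 1 - ss_res / ss_tot

# Stable period: true concentration varies only 20-25 ppb; sensor noise ~2 ppb.
ref_stable = rng.uniform(20, 25, 200)
sensor_stable = ref_stable + rng.normal(0, 2, 200)

# Same sensor, same noise, plus one local pollution event at 200 ppb.
ref_spike = np.append(ref_stable, 200.0)
sensor_spike = np.append(sensor_stable, 200.0 + rng.normal(0, 2))

print(f"stable period r2: {r_squared(ref_stable, sensor_stable):.2f}")  # poor
print(f"with one spike r2: {r_squared(ref_spike, sensor_spike):.2f}")   # close to 1
```

The sensor's intrinsic accuracy is identical in both cases; only the spread of the reference concentrations changed, which is exactly why a single r² figure from one deployment can contradict another.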

Conflicting studies using r² as the measure of data quality point to an ignored problem: measurements are affected by the specific environment. Is it a roadside, urban background, suburban or rural location? Is the climate equatorial, desert, arctic, coastal or temperate?

We propose a different approach to validating AQ data quality. Each AQ network remembers its environment, so we must deconvolve the specific environment to obtain better results. This can be achieved mathematically, but it needs good-quality data to operate correctly, more so than the simpler r² calculation. We discuss this approach and ask whether results should be expressed with their 95% confidence intervals rather than r².
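As a minimal sketch of the reporting style proposed above (the data and the bootstrap method are illustrative assumptions, not the abstract's specific algorithm), one can express a co-located sensor's performance as a mean bias with a 95% confidence interval rather than an r² value:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical co-location data: reference vs low-cost sensor readings (ppb),
# with a small gain error, an offset, and measurement noise.
ref = rng.uniform(10, 60, 500)
sensor = 1.05 * ref + 1.5 + rng.normal(0, 3, 500)

errors = sensor - ref

# Bootstrap the mean error (bias) to obtain a 95% confidence interval.
boot_means = np.array([
    rng.choice(errors, size=errors.size, replace=True).mean()
    for _ in range(2000)
])
ci_lo, ci_hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean bias: {errors.mean():.2f} ppb, 95% CI: [{ci_lo:.2f}, {ci_hi:.2f}] ppb")
```

Unlike r², this statement of uncertainty does not collapse when the concentration range is narrow, and a single pollution spike widens the interval rather than flattering the result.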

