Network Development: Web Tools for Sensor Data

Presented by: Graeme Carvlin, Puget Sound Clean Air Agency

Summary: At the Puget Sound Clean Air Agency, one of our main goals is to characterize and communicate air quality with the active participation of the public. Lower-cost air sensors are now a large part of that discussion, and we are interested in communicating their benefits and drawbacks and how to use them to get meaningful data.

To this end, we offer air sensors to individuals and groups through our air sensor lending program (https://pscleanair.gov/539/Air-Quality-Sensors). First, we contact the applicant and discuss their air quality concerns. This discussion is often enough to answer their questions or point them toward appropriate resources. If sensors would help answer their questions, we lend them the sensors along with documentation on how to operate them and interpret the data.

We often have members of the public ask us why the sensor closest to them is reading much higher than our reference monitors. Purple Air and other sensor manufacturers’ data displays often show the public uncalibrated and unfiltered sensor data. To lower the barriers to correctly interpreting sensor data, I have developed a map that combines reference monitors with quality-controlled and calibrated Purple Air data (Sensor Map: map.pscleanair.org and description page: https://pscleanair.gov/570/Air-Quality-Sensor-Map). The Sensor Map has a Health view, which shows a health-based PM estimate, and an Instant view, which shows 1-minute data and is useful during air quality events, such as wildfires.
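
As a rough illustration of the difference between the two views, here is a minimal sketch in Python with pandas. The example values, the 1-minute input series, and the use of a plain 24-hour rolling mean as a stand-in for the health-based estimate are all assumptions for illustration, not the Sensor Map's actual method.

    import pandas as pd

    # Hypothetical 1-minute PM2.5 readings (ug/m3) from one sensor.
    minute_pm25 = pd.Series(
        [8.0, 9.5, 35.0, 12.0, 10.5],
        index=pd.date_range("2019-08-01 00:00", periods=5, freq="1min"),
    )

    # "Instant"-style value: the most recent 1-minute reading.
    instant_value = minute_pm25.iloc[-1]

    # "Health"-style value: a longer averaging window smooths short spikes.
    # A 24-hour rolling mean is only a placeholder for whatever health-based
    # estimate the Sensor Map actually computes.
    health_value = minute_pm25.rolling("24h", min_periods=1).mean().iloc[-1]

    print(f"Instant: {instant_value:.1f} ug/m3, Health: {health_value:.1f} ug/m3")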

One of the biggest challenges for community groups who want to work with sensors is taking the sensor data and creating a summary report. The Community Reporter is a tool I developed to ingest raw data from a variety of air sensors; QC and average the data; and create a summary report with graphs, maps, and text. It is our hope that these tools and the discussions that arise from their use will help the public interpret air sensor data and answer their air quality questions.
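
A minimal sketch of that kind of ingest, QC, average, and summarize pipeline, in Python with pandas, is shown below. The file name, column names, QC thresholds, and hourly averaging interval are assumptions for illustration only; the Community Reporter's actual QC rules are sensor-specific.

    import pandas as pd

    # Hypothetical raw sensor export with "time" and "pm25" (ug/m3) columns.
    raw = pd.read_csv("sensor_export.csv", parse_dates=["time"]).set_index("time")

    # Basic QC: drop physically implausible values (thresholds are illustrative).
    qc = raw[(raw["pm25"] >= 0) & (raw["pm25"] < 1000)]

    # Average to hourly values for reporting.
    hourly = qc["pm25"].resample("1h").mean()

    # Simple summary statistics that could feed a report's text and graphs.
    summary = {
        "hours_of_data": int(hourly.notna().sum()),
        "mean_pm25": round(hourly.mean(), 1),
        "max_pm25": round(hourly.max(), 1),
    }
    print(summary)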

Follow-Up Questions and Answers:

  • For the community sensor data (Purple...), how do you handle data provenance?  For example, if you learn that one sensor was failing or was sitting next to a pollution source that corrupted the data, how do you let users know that the data are suspect?
    • Data are flagged by the QC and calibration algorithms.  Those flags are added to the output Excel files when using the Purple Air Downloader and Community Reporter.  Currently we do not show those flags on the Sensor Map, but we would be open to exploring how that might be done.  Also, we hope to make our scripts and algorithms available online (e.g., on GitHub).
  • If we have a Dylos, can we contribute data?
    • Dylos data can be used with the Community Reporter to help analyze collected data.  The Sensor Map currently shows only Purple Air sensors.
  • How do you calibrate PM sensors to ug/m3?
    • Purple Airs can be calibrated by placing them next to a reference monitor, collecting data for a period of time, and then comparing the measurements.  Using linear regression, the Purple Air data can be adjusted to read similarly to the reference monitor.  For the Sensor Map, sensors that are near reference monitors are calibrated to them.  Sensors that are not close to a reference monitor are calibrated using the US EPA's national equation with temperature and relative humidity.  "Nearby" is based on a semi-variogram, which measures how similarly sensors respond as a function of distance.  The basic form of the calibration equation is Ref = Purple Air + temperature + relative humidity, where “Purple Air” is the QC’d PM2.5 ATM output from the A and B sensors.  (A short regression sketch illustrating this calibration appears after this Q&A list.)
    • If you have a Purple Air that you want to calibrate, a basic method to see adjusted data would be to go to the Purple Air map and select a calibration equation in the conversion dropdown (bottom left of the screen).  If you want the highest accuracy, I would suggest locating it close to a reference monitor in your area.  You could contact your local or state air agency to see if they would be willing to help you with this.
  • How are these sensors regarded by the science community? Can they be used for purposes other than public awareness (e.g. enforcement action or research)? Also, can you expand on calibration and accuracy?
    • The EPA has a framework to understand what sensors can be used for in their Air Sensors Guidebook (https://www.epa.gov/air-sensor-toolbox/how-use-air-sensors-air-sensor-guidebook).  I have analyzed Purple Airs against this framework (for ASIC 2018) and, if properly calibrated, they can be used for education as well as supplemental monitoring.  They cannot be used for enforcement action or regulatory purposes.  However, community science can be an important tool in effecting government change.
    • The sensors are calibrated to the nearest reference monitor if the R2 of that calibration is greater than 0.5; otherwise they are calibrated to the US EPA national equation with temperature and relative humidity.
    • Angela / Safecast: In our monitor development we’ve only considered sensors that are already accepted and in use within the science community. Safecast isn't about "public awareness" but rather creating useful datasets. It's specifically because these sensors are accepted that we use them.
  • Does the hypothesis testing function check for statistical significance and take variability into account?
    • Yes, a t-test is used to compare groups, and the p-value is used to determine the result.  (A brief t-test sketch appears after this Q&A list.)
  • How do you reconcile differences between sensor readings, such as Dylos and Purple Air? Also, how often do you see problematic sensors with sudden failures or attenuation?
    • The Community Reporter has specific QC filtering functions for each type of sensor that are based on how those sensors fail.
    • The sudden failures are more common than attenuation and both together account for about 5% of all data.  Very rarely are both sensors broken at the same time.
    • Angela / Safecast: As we haven't deployed air sensors en masse, we've not had this problem. With our radiation sensors, if a sensor goes bad we take it down.
  • We saw 45,000 on our calibrated Dylos during the wildfires of Sept 2018... one could taste the smoke on those days.
    • Wow!  Yes, very high concentrations (above 1000 ug/m3 or even 3000) can definitely be seen, especially close to a source.  I would like to explore making the data removed by QC, which normally does not appear on the Sensor Map, available.  The Community Reporter and Purple Air Downloader include both the raw data and the QC'd data in the output Excel file.
  • Are the PurpleAir sensors sited/deployed by your agency, or do these also include sensors deployed by members of the public?
    • We have deployed about a dozen Purple Air sensors.  The rest of the sensors were deployed by the public or other groups.  People who borrow a Purple Air from our lending program would see it pop up on the map within a few hours.
  • Are the Purple Air Downloader and Community Reporter publicly available?
    • The Purple Air Downloader and Community Reporter are not publicly available yet, but I hope to make them available soon.
  • What kind of safeguards are placed within the data screening protocols to prevent automatic discarding of real elevated data from one sensor compared to those in its vicinity?
    • Great question!  If there is a sensor next to a source, say a firepit, recording real elevated data, then the two sensors inside the Purple Air will agree with each other (pass the intra-monitor QC) but won't agree with nearby Purple Airs (fail the inter-monitor QC).  If you want to cut out extreme but valid data, you can have the comparison between monitors take precedence over the comparison between the two sensors within the monitor.  However, if the inter-monitor QC is applied only when the two sensors don't compare well with each other, then these real elevated data can be preserved.  It really depends on what your goal is.  (A short QC sketch illustrating this logic appears after this Q&A list.)
  • Really nice framework for QA/QC; more networks need to incorporate all these steps. Follow-up questions: Are all the sensors calibrated to a reference monitor? Are data from both A and B invalidated when there are large differences between the two measurements or when failure of one is obvious?
    • Sensors that have an R2 of >0.5 with the nearest reference monitor are calibrated to that monitor.  All other sensors are calibrated using the US EPA's national equation with temperature and relative humidity.
    • About 5% of the time the A and B sensors don't agree with each other and neither is obviously wrong (very low or very high).  When this happens, they are compared to the sensors of nearby monitors; about 2/3 of the time both sensors turn out to be valid, and 1/3 of the time one sensor is preferred over the other.
  • Are you considering chemometric analysis of the real-time data from the network of sensors? Something in the area of principal component analysis, for example.
    • Not for our work since the sensors only measure particle counts and do not take samples.
  • Have you taken temperature or RH into account for any correction algorithms for purple air sensors?
    • Yes, temperature and RH are included in the calibration equation.  The basic form of the calibration equation is Ref = Purple Air + temp + RH, where “Purple Air” is the PM2.5 ATM output.
  • AWS Shiny Interface
    • The AWS Shiny Interface is in beta and is not publicly available yet.  But we hope to release it in the near future!
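
Calibration sketch (referenced in the calibration answers above). This is a minimal illustration, in Python with NumPy, of fitting Ref = Purple Air + temperature + relative humidity by ordinary least squares against a collocated reference monitor, and of falling back to a separate national-style correction when the local fit is weak (R2 of 0.5 or less). The example data are made up, and the national-equation function is only a placeholder; the actual US EPA coefficients are not reproduced here.

    import numpy as np

    def fit_local_calibration(ref, pa, temp, rh):
        """Fit Ref ~ Purple Air + temperature + RH by ordinary least squares."""
        X = np.column_stack([np.ones_like(pa), pa, temp, rh])
        coefs, _, _, _ = np.linalg.lstsq(X, ref, rcond=None)
        predicted = X @ coefs
        ss_res = np.sum((ref - predicted) ** 2)
        ss_tot = np.sum((ref - ref.mean()) ** 2)
        return coefs, 1.0 - ss_res / ss_tot

    def national_equation(pa, temp, rh):
        # Placeholder for the US EPA national correction with temperature and
        # relative humidity; the real coefficients are not reproduced here.
        raise NotImplementedError

    # Hypothetical collocated hourly data (ug/m3, deg C, percent RH).
    ref = np.array([10.0, 22.0, 35.0, 18.0, 27.0])
    pa = np.array([14.0, 30.0, 50.0, 25.0, 38.0])
    temp = np.array([18.0, 20.0, 25.0, 22.0, 24.0])
    rh = np.array([60.0, 55.0, 40.0, 50.0, 45.0])

    coefs, r2 = fit_local_calibration(ref, pa, temp, rh)
    if r2 > 0.5:
        # Apply the local fit: intercept + b1*Purple Air + b2*temp + b3*RH.
        calibrated = coefs[0] + coefs[1] * pa + coefs[2] * temp + coefs[3] * rh
    else:
        calibrated = national_equation(pa, temp, rh)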
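
Hypothesis-test sketch (referenced in the t-test answer above). A minimal example with SciPy comparing PM2.5 averages from two groups; the data and the 0.05 significance level are illustrative, and Welch's unequal-variance t-test is used here as one reasonable choice rather than the Community Reporter's exact settings.

    from scipy import stats

    # Hypothetical PM2.5 averages (ug/m3) for two groups of hours or locations.
    group_a = [12.1, 14.3, 13.8, 15.0, 12.9, 14.7]
    group_b = [17.5, 16.2, 18.9, 17.0, 19.3, 16.8]

    # Welch's t-test does not assume equal variances between the groups.
    t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

    alpha = 0.05  # illustrative significance level
    if p_value < alpha:
        print(f"Groups differ (t = {t_stat:.2f}, p = {p_value:.4f})")
    else:
        print(f"No significant difference (t = {t_stat:.2f}, p = {p_value:.4f})")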
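
QC sketch (referenced in the data-screening answers above). A minimal illustration, in Python, of the intra-monitor (A vs. B) and inter-monitor (vs. nearby Purple Airs) checks described above. The relative-difference thresholds and the neighbor-median comparison are assumptions for illustration, not the Sensor Map's actual rules.

    import statistics

    def qc_purpleair(a_pm25, b_pm25, neighbor_pm25, intra_tol=0.3, inter_tol=0.5):
        """Return (value, flag) for one Purple Air monitor at one time step."""
        pair_mean = (a_pm25 + b_pm25) / 2.0

        # Intra-monitor check: do the A and B sensors agree with each other?
        # When they agree, the value is kept even if neighbors disagree, so
        # real local sources (e.g. a firepit) are not screened out.
        if abs(a_pm25 - b_pm25) <= intra_tol * max(pair_mean, 1.0):
            return pair_mean, "valid"

        # Inter-monitor check: A and B disagree, so compare each channel to
        # the median of nearby Purple Airs and keep whichever channel(s) track it.
        neighbor_median = statistics.median(neighbor_pm25)
        agrees = [x for x in (a_pm25, b_pm25)
                  if abs(x - neighbor_median) <= inter_tol * max(neighbor_median, 1.0)]
        if agrees:
            return sum(agrees) / len(agrees), "valid_inter"
        return None, "flagged"

    # Example: A and B agree on an elevated value that nearby monitors do not see.
    print(qc_purpleair(a_pm25=42.0, b_pm25=44.0, neighbor_pm25=[12.0, 15.0, 11.0]))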