What can machine learning teach us about water that we don’t already know?

By Sasha Breygina, Lizi Imedashvili, Victor Lima, Kenji Okura, Charles Metayer

First, what is machine learning?

Here in the 21st century, water is rarely just water. It can include human waste, microplastics, industrial chemicals, agricultural byproducts, pathogens, pharmaceuticals, storm-flushed highway residues, and more.

But how much of each? Under what conditions? And if we could collect the necessary information in real time, how would we manage and analyze it without drowning in data?

Machine learning uses data and algorithms to mimic the way humans learn from experience, but at a scale and speed beyond human capability. Traditional manual analysis cannot keep up with the vast amount of data the world throws at us. Machine learning lets us categorize, predict, and uncover patterns that would otherwise go undetected.

At Blue CoLab, our mission is clear: the public has a right to know what is in their water, however complex the challenge, just as they have a right to know the ingredients in their favorite bag of chips or the contents of a medical prescription. As a machine learning team, we “Data Divas” are committed to using data-driven solutions to revolutionize water monitoring and ensure public health and safety.

What insights can machine learning provide?

For example, urbanization near bodies of water can affect water quality in unforeseen ways. To fully understand the human impact on water, we have to distinguish water quality changes caused by humans from those caused by natural variation. A group of scientists led by Benjamin Schäfer tackled this problem by applying machine learning to the River Chess in South-East England. Using machine learning, his team found a correlation between the output of a wastewater treatment plant and changes in water conductivity and temperature.

The study found that conductivity levels fluctuated with discharge from the treatment plant. Using conductivity as a proxy for that discharge, the study also found that higher discharge from the plant contributed to lower pH, and that additional factors such as sunlight amplified these effects. The study also found a 1°C increase in water temperature correlated with the wastewater treatment plant. Recognizing that human activity is correlated with water quality can motivate changes in that activity.
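To make “correlation” concrete, the simplest measure of the kind of relationship described above is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below is not the method Schäfer’s team used; it is a minimal Python illustration, the helper name `pearson_r` is our own, and the discharge and conductivity values are made-up numbers, not data from the River Chess study.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, without the common 1/n factor (it cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical 15-minute readings (made-up numbers for illustration only).
discharge = [10.0, 12.0, 15.0, 14.0, 18.0, 20.0]              # plant discharge, arbitrary units
conductivity = [410.0, 418.0, 431.0, 427.0, 445.0, 452.0]     # conductivity, µS/cm

r = pearson_r(discharge, conductivity)
print(f"r = {r:.3f}")
```

A value near 1 means the two series rise and fall together. A single coefficient cannot say which variable drives the other, which is exactly why separating human causes from natural variation takes more than this. (Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity.)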

While the above study is promising, other studies urge caution. Machine learning requires large amounts of high-quality data, which in turn requires more advanced water monitoring sensors. Additionally, implementing machine learning requires interdisciplinary talent drawn from diverse fields.

L to R: Ali Tejeda is a senior in her second semester at Blue CoLab, and she is excited to continue working on Choate Pond’s Water Quality Index; Lizi Imedashvili is a sophomore taking Blue CoLab for the first time this semester; Victor Lima is a junior taking Blue CoLab for the first time this semester, though he volunteered with the machine learning team last semester; Kenji Okura is a senior who has taken Blue CoLab for three semesters and is excited to continue with Blue CoLab to finish the AquaWatch Mobile app his team started; Sasha Breygina is a senior who has interned with Blue CoLab for two years; Charles Metayer is a senior taking Blue CoLab for the first time.

Why Blue CoLab?

At Blue CoLab, we collect data every fifteen minutes from two sets of five sensors deployed in Choate Pond on the Pace University Pleasantville campus. That totals 350,400 data points annually. While this may not be large by big-data standards, it is simply too much to analyze manually, particularly while those sensors keep cranking out another 40 data points for every hour we spend working (960 every day). Our brains would explode like cheap lithium batteries.
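The arithmetic behind these figures fits in a few lines of Python, assuming each of the ten sensors contributes exactly one data point per fifteen-minute reading. It also shows that 960 points corresponds to one day of readings, or 40 per hour.

```python
SENSOR_SETS = 2
SENSORS_PER_SET = 5
READINGS_PER_HOUR = 60 // 15  # one reading every fifteen minutes

sensors = SENSOR_SETS * SENSORS_PER_SET   # 10 sensors in Choate Pond
per_hour = sensors * READINGS_PER_HOUR    # data points per hour
per_day = per_hour * 24                   # data points per day
per_year = per_day * 365                  # data points per (non-leap) year

print(per_hour, per_day, per_year)  # 40 960 350400
```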

(Fun fact: the River Chess study also used sensor readings taken every fifteen minutes! You can view the data yourself here. We are also working on doing the same for Blue CoLab’s data in our upcoming app, AquaWatch Mobile, coming soon to a cell phone near you!)

Our team is also working to implement machine learning algorithms that predict the state, or health, of the Choate Pond sensors as “Good,” “Warning,” or “Bad.” This predictive capability matters because it lets us address operational issues proactively, before they escalate, saving valuable time, resources, and potentially even lives. Instead of spending hours diagnosing issues after they have occurred, we want a program that flags, in advance, the sensors that need attention or maintenance. This will help us ensure the integrity of our water monitoring system.
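As a rough illustration of what such a classifier could look like, here is a minimal Python sketch. It is not Blue CoLab’s implementation: the three health features (share of missed readings, share of out-of-range values, share of repeated “stuck” values), the labeled training examples, and the choice of a k-nearest-neighbors classifier are all hypothetical assumptions made for this example. It also labels the current window of readings rather than forecasting ahead; flagging problems in advance would need features that capture trends over time.

```python
import math
from collections import Counter
from typing import Optional


def health_features(readings: list[Optional[float]],
                    valid_range: tuple[float, float]) -> tuple[float, float, float]:
    """Hypothetical features: summarize a window of 15-minute readings
    (None = a missed reading) as three fractions between 0 and 1."""
    n = len(readings)
    if n == 0:
        raise ValueError("window must contain at least one reading")
    present = [r for r in readings if r is not None]
    missing_frac = (n - len(present)) / n
    lo, hi = valid_range
    out_of_range_frac = sum(1 for r in present if not lo <= r <= hi) / n
    # A "stuck" sensor keeps reporting the same value: count how often a
    # reported reading exactly repeats the previous reported reading.
    repeats = sum(1 for a, b in zip(present, present[1:]) if a == b)
    flatline_frac = repeats / max(len(present) - 1, 1)
    return (missing_frac, out_of_range_frac, flatline_frac)


def knn_predict(train, features, k=3):
    """Label a feature vector by majority vote among its k nearest examples."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], features))[:k]
    # Counter.most_common orders equal counts by first appearance,
    # so a tie goes to the label of the closest example.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled examples: (features, label). Real labels would come
# from records of which sensors actually needed attention or maintenance.
TRAIN = [
    ((0.00, 0.00, 0.02), "Good"),
    ((0.01, 0.00, 0.03), "Good"),
    ((0.02, 0.01, 0.05), "Good"),
    ((0.10, 0.05, 0.20), "Warning"),
    ((0.15, 0.02, 0.30), "Warning"),
    ((0.12, 0.08, 0.25), "Warning"),
    ((0.50, 0.30, 0.60), "Bad"),
    ((0.70, 0.10, 0.90), "Bad"),
    ((0.40, 0.40, 0.80), "Bad"),
]

PH_RANGE = (0.0, 14.0)  # pH readings outside this range are treated as faults
healthy = [7.1, 7.2, 7.0, 7.1, 7.3, 7.2, 7.1, 7.0]
stuck = [7.1, None, None, 7.1, 7.1, 7.1, None, 99.0]

print(knn_predict(TRAIN, health_features(healthy, PH_RANGE)))  # Good
print(knn_predict(TRAIN, health_features(stuck, PH_RANGE)))    # Bad
```

k-nearest neighbors is used here only because it fits in a few lines without external libraries; with a real labeled history of sensor health, a dedicated library and a proper train/test split would be the more usual route.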

Conclusion

Our ultimate goal is to deliver accurate and actionable data. By developing our sensor health dashboard and integrating machine learning technologies, we are laying the groundwork for a more efficient, effective, and transparent approach to water monitoring.

Connected to the Blue CoLab mission, our work has the potential to make a significant impact on the world. By providing individuals with access to accurate and timely information about their water quality, we empower them to make informed decisions about their health and well-being. By continuing to identify and address potential issues with water quality, we contribute to the larger goal of protecting human and environmental health. Our dual goal is to revolutionize water monitoring while advancing the broader mission of Blue CoLab to promote transparency, accountability, and public health.