AI Incident Database

Report 2414

Associated Incidents

Incident 2903 Report
False Negatives for Water Quality-Associated Beach Closures

Toronto Tapped Artificial Intelligence to Warn Swimmers. The Experiment Failed
theinformation.com · 2022

Earlier this year, Toronto's public health department quietly flipped the switch on an experiment targeting the city's most pollution-prone beaches.

Instead of relying on day-old laboratory tests to ensure that people don't swim in unsafe water, the city tapped the magic of artificial intelligence, contracting with Cann Forecast, a Montreal-based startup whose predictive modeling products use AI and machine learning to forecast water quality. Beginning in June, officials used Cann's model to decide when two of Toronto's most frequently contaminated beaches should be open to the public and when the water was unsafe.

Almost immediately, the experiment began to go awry. The model regularly declared the beaches safe to swim on days when history or the water's appearance suggested otherwise. City officials waved off concerns from residents and a local environmental monitoring group over the discrepancies, arguing that the AI tool was more accurate than traditional testing methods.

But an analysis of water quality data from the two beaches by The Information found that the predictive model was less accurate than the city's previous method—and less accurate than a coin flip. In all, the model flagged a little more than one of every three days when the water was unsafe, leading to nearly 50 instances this summer when beaches with dangerous bacteria levels were open to the public for swimming.

Toronto's struggles highlight the challenges for governments as they deploy predictive models and other AI tools. In recent years, public agencies from Michigan to Australia have turned to automation to improve the efficiency of processes like debt collection and benefits fraud, with mixed results.

Child welfare agencies across the U.S. have deployed algorithms to help social workers prioritize investigations, despite little evidence that they work and multiple examples of racially biased results. One such tool, developed by a child welfare agency in Pennsylvania, was found to disproportionately flag Black children for neglect investigations; some social workers tried to manually adjust the scores to reduce bias. Officials in Oregon, which had adopted a similar system, stopped using their version of the tool in June, following academic research and a report from the Associated Press highlighting concerns about the system used in Pennsylvania.

"It's automation bias," said Shea Brown, founder and CEO of Babl AI, a consulting firm focused on the ethical deployment of algorithms. "In the absence of lots of scrutiny, people tend to default to the automated tool," in part to shift responsibility to a computer program, Brown said.

Part of the problem lies in how these technologies are marketed, said Arvind Narayanan, a computer science professor at Princeton University who is writing a book about the rise of what he calls bogus AI. "They sell these tools based on the promise of full automation, but when concerns are raised about bias, catastrophic failure or other well-known limitations of AI, they retreat to the fine print, which says that the tool shouldn't be used on its own," he said. That's problematic, Narayanan said, "because human oversight would negate the cost savings that [users] were hoping for when they bought the tool."

Ordered by Congress to reduce recidivism, the U.S. Department of Justice in 2018 developed an algorithm, known as Pattern, to determine the likelihood that someone in prison would be arrested or otherwise come in contact with the criminal justice system after release. People the algorithm categorized as lower risk are more likely to get access to rehabilitative programs that can shorten their prison sentences, while those categorized as higher risk face significant barriers.

Bugs and clerical errors have plagued Pattern since its deployment in 2019, according to reports published by the Justice Department and Congressional testimony. Recent reviews of the tool's efficacy found that it overpredicts the risk of recidivism for Black, Hispanic and Asian people, and has led to the incorrect classification of 14,000 incarcerated people.

"It's perfectly OK to use a predictive model when the failure of that model doesn't result in anything that's hugely bad," said Brown. "But the higher the consequences are of failure, the higher the stakes and the more accurate the model needs to be."

Testing the Waters

Toronto has been trying to shake off its reputation as home to one of the most polluted waterfronts in the Great Lakes region for three decades. The issue stems from the city's aging and easily overwhelmed sewer system, which collects both residential wastewater and stormwater for treatment.

When rain or overuse pushes too much water into the city's sewer system, it can overflow, sending untreated sewage into Lake Ontario and nearby rivers. In a typical summer, Toronto's sewage system discharges more than 1.6 billion gallons of raw sewage into the lake and nearby rivers, according to government analysis of wastewater overflow volumes between 2017 and 2020.

Untreated sewage contains high levels of the E. coli bacteria, which can cause mild to severe sickness in humans and potentially life-threatening illness in children, the elderly and others at elevated risk.

Toronto has spent hundreds of millions of Canadian dollars in recent years to reduce the amount of waste and dangerous bacteria at its 11 public beaches and increase public confidence in water quality. That investment has had mixed results.

A key part of the plan is testing: During the summer swimming season, the city collects daily water samples at each of its beaches and sends them to a laboratory to be tested for E. coli. If the samples average more than 100 E. coli clusters per 100 milliliters of water, the city marks the beach as unsafe for swimming. An E. coli level of 100 clusters per 100 milliliters puts 32 out of 1,000 bathers at risk of developing gastrointestinal illness, according to studies cited in Canadian water quality guidelines.
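The posting rule described above reduces to a simple threshold check on the daily samples. As a minimal sketch (the function name and sample values are illustrative, not the city's actual implementation):

```python
# Hypothetical sketch of Toronto's posting rule: a beach is marked unsafe
# when the day's water samples average more than 100 E. coli clusters
# per 100 mL, the guideline limit cited in the article.
GUIDELINE_LIMIT = 100  # clusters per 100 mL

def beach_is_safe(sample_counts):
    """sample_counts: E. coli clusters per 100 mL for each daily sample."""
    average = sum(sample_counts) / len(sample_counts)
    return average <= GUIDELINE_LIMIT

print(beach_is_safe([80, 95, 110]))   # average 95.0 -> True (posted safe)
print(beach_is_safe([250, 400, 90]))  # average ~246.7 -> False (posted unsafe)
```

Note that because the rule averages the samples, a single very high reading can tip the whole beach to unsafe, which is consistent with the guideline's focus on aggregate bacteria levels rather than any one sampling site.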

The laboratory tests Toronto uses can take up to 24 hours to provide results. So the city historically based its water-safety decisions on beach conditions from the prior day. Despite the lag, this testing method has produced reliable results for most Toronto beaches, because the bacteria levels typically change slowly. Since 2014, eight of the city's beaches have regularly received international recognition for their cleanliness, water quality and consistently low E. coli counts.

But the city has struggled to produce similar results at two key beaches. Toronto's investment in predictive modeling technology was an attempt to better gauge water quality in those areas. On its website, Toronto Public Health wrote that the technology would enable it to "forecast water quality with much greater confidence in its accuracy" than the day-old lab tests.

The model Toronto Public Health used was developed by Cann Forecast, a startup born out of a 2016 water-tech hackathon where participants used data from the city of Montreal to create technological solutions addressing freshwater issues. A team of university students and researchers won top prize at the event for developing an application that used predictive modeling to estimate water quality on the banks of Montreal's St. Lawrence River. Some members of the team later formed Cann Forecast, which has contracted with Montreal since at least 2018 to provide water quality prediction services.

On its website, Cann Forecast describes its beach water monitoring system as a high-tech "artificial intelligence algorithm" that uses "machine learning" to provide "real-time water quality advisories that are 90% accurate on average."

Toronto Public Health awarded a noncompetitive contract to Cann Forecast for CA$30,000 last year to use the model during the 2022 swimming season as part of a one-year pilot program. City officials believed the company "was the only known vendor" of predictive modeling tools for beach water quality, so they did not open the contract to competing bids, a city spokesperson told The Information.

Ultimately, the predictive model proved to be worse at identifying when the water was unsafe than the city's previous methods, according to The Information's analysis of water quality data collected by Toronto Public Health. Between June and September, the predictive model declared one of the two beaches to be safe 46 times when E. coli levels exceeded guidelines.

At one beach, Sunnyside, there were 50 days where the water contained hazardous levels of E. coli bacteria and 44 days where the water was safe for swimming, according to an analysis of daily water sample results. Of those 50 unsafe days, the predictive model correctly identified only 19. That meant officials opened Sunnyside Beach on 31 days when bacteria levels were dangerously high. Had the city stuck with its previous methods for assessing water quality, it would have mistakenly declared the beach safe to swim on 23 days with high E. coli counts during the same period, The Information's analysis found.
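The Sunnyside figures above can be restated as standard classification metrics. A short sketch, using only the numbers reported in the article (the variable names are ours):

```python
# Sunnyside Beach, June-September 2022, per The Information's analysis.
unsafe_days = 50        # days with hazardous E. coli levels
caught_unsafe = 19      # unsafe days the model correctly flagged

missed_unsafe = unsafe_days - caught_unsafe  # days wrongly left open
recall = caught_unsafe / unsafe_days         # sensitivity on unsafe days

print(missed_unsafe)    # 31 days open despite dangerous bacteria levels
print(f"{recall:.0%}")  # 38% - below the ~50% a coin flip would average
```

On this measure the model caught 19 of 50 unsafe days (38% recall), which is what the article means by "less accurate than a coin flip": random guessing would be expected to catch about half of the unsafe days.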

A spokesperson for Toronto Public Health confirmed the accuracy of The Information's analysis of Sunnyside's data. The agency said the predictive model was just one factor in decisions ultimately made by people.

"During the pilot project, decisions to post or not to post a beach as safe were made by Toronto Public Health staff," a department spokesperson said. "Artificial intelligence predictive modeling forecasts were reviewed by staff who considered any anomalies before posting or unposting each beach."

The spokesperson did not respond to a question about whether officials ever overrode the model's forecast. But data published by the city show that the posted swimming flags at the two beaches never differed from the model's predictions.

The department is still assessing how the model performed and expects to report by the end of the year, the spokesperson said. If the city decides to keep using the model, it will work to improve it, the spokesperson said. "The experience of other jurisdictions suggest significant improvements in accuracy can be achieved with continual analysis and improvements of the models over time."

Cann Forecast declined to comment and referred questions about the model to Toronto Public Health.

'The Water Just Looked Gross'

Isabel Fleisher, a manager at Toronto-based water quality monitoring group Swim Drink Fish, noticed that something was wrong with the city's predictions a couple of weeks after the model went live in early June.

A Toronto native and recreational swimmer, Fleisher lives near Sunnyside Beach and regularly checks the water quality before swimming. Swim Drink Fish analyzes the water using a laboratory test similar to the city's and posts the results to its website.

In June, she noticed that the city regularly listed Sunnyside as safe for swimming on days when the city's own laboratory tests reported high E. coli levels. Sometimes the beach was open on the days immediately following a rainstorm, when bacteria levels are often high in areas near sewer outflows, like Sunnyside.

"The water just looked gross," she said.

As the summer went on, Fleisher grew more concerned about the discrepancies, which persisted even as results the city posted showed the model regularly making errors. In mid-July, she met with the Toronto Public Health manager in charge of the experiment, Mahesh Patel, to discuss the issue, Fleisher said.

During the meeting, Patel said Toronto Public Health was aware of the issues with the model's prediction capabilities, according to Fleisher. "He admitted it was wrong to have [an E. coli level of] 400 and say it was safe," recalled Fleisher. Patel did not respond to a request for comment.

Fleisher brought her concerns to the Toronto Star, which published a report questioning the model's efficacy a few weeks later, on August 10. In a statement to the Star, public health officials defended the model, saying it wasn't expected to be "100% accurate," but called it "a significant improvement" over the prior testing method.

Five days later, public health officials discovered that the model was using faulty weather data to issue predictions at Sunnyside Beach, a Toronto Public Health spokesperson told The Information in a statement. "This inaccuracy was addressed, but was too late to affect the overall results of the pilot project," the spokesperson said.
