London Office of Data Analytics pilot – now for the hard part

by Eddie Copeland

For the past few months, Nesta has been working with the GLA, more than a dozen London boroughs, and data science specialists the ASI to develop an algorithm that predicts which of London’s many thousands of properties are unlicensed HMOs – “Houses in Multiple Occupation”.

I outlined the rationale for the project in this interview with UK Authority.

The motivation for addressing the specific issue of HMOs is twofold.

First, the fact that only 10 to 20 per cent of London’s HMOs are currently licensed is a missed revenue opportunity for local authorities at a time when public sector budgets are tight. Second, unlicensed HMOs are the likely locations of some of the capital’s worst and most exploitative housing conditions. Identifying more of them could raise money and help protect vulnerable tenants.

Our aim has been to use data to help councils’ building teams prioritise inspections of the properties most likely to be unlicensed HMOs. In early January, representatives from each organisation involved in the pilot gathered at Nesta to review progress and figure out the next steps.

The process has gone like this:

Step 1: Ask building inspectors to spell out the features of a likely HMO. Like many front-line workers, building inspectors can provide a long list of mental risk criteria, honed over many years of experience. In the case of HMOs, they might suggest judging risk based on features such as the height of a property, its age, location, or whether the living accommodation is above a shop or restaurant.

Step 2: Identify datasets that relate to that list of mental criteria. To build from inspectors’ gut instincts and expertise to a data-driven approach, it’s necessary to determine which datasets held by the public sector can confirm whether a particular property has any of the features highlighted in Step 1. In this, we’ve benefited from work already carried out by the London Borough of Westminster, with 40 datasets identified as being linked to risk factors associated with HMOs.

Step 3: Use machine learning to weight the criteria. Using a technique called Balanced Random Forest (commonly used in fraud detection), the ASI compared the 40 datasets to known cases of HMOs in Westminster to see which of them correlated most strongly. In the first version of the model, 10 per cent of the datasets were shown to have 90 per cent of the predictive power, creating a model that’s 500 per cent more accurate than picking a property at random. Further iterations of the model are expected to increase its accuracy significantly.
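
The ASI’s actual model isn’t published here, but the core idea – a random forest with class balancing, because known HMOs are rare compared with ordinary properties – can be sketched with standard tooling. Everything below is illustrative: the synthetic data stands in for the 40 risk-factor datasets, and scikit-learn’s `class_weight="balanced_subsample"` is used as an approximation of a Balanced Random Forest rather than the ASI’s exact implementation:

```python
# Illustrative sketch only -- not the ASI's model. Shows how a
# class-balanced random forest ranks candidate features (datasets)
# by predictive power for HMO status.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-ins for the risk-factor datasets (8 features here).
# Only the first two actually drive HMO status, mimicking the finding
# that a small fraction of the datasets held most of the predictive power.
X = rng.normal(size=(n, 8))
logits = 2.5 * X[:, 0] + 1.8 * X[:, 1] - 3.0   # offset makes HMOs rarer
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# class_weight='balanced_subsample' reweights each bootstrap sample so
# the rarer positive class (known HMOs) isn't swamped by the majority.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced_subsample", random_state=0
)
model.fit(X, y)

# Feature importances reveal which "datasets" carry the predictive power.
ranked = np.argsort(model.feature_importances_)[::-1]
print("Most predictive features:", ranked[:2])
```

On this synthetic data the model correctly surfaces the two informative features, which is the same shape of result as the pilot’s “10 per cent of datasets, 90 per cent of predictive power” finding.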

In addition to using Westminster’s data, the ASI added open data from the 2011 census. In this specific case, the open data was found to add next to no predictive value to the model, highlighting the importance of local authorities using their own data to reform services.

This is as far as we’ve got to date. Next comes:

Step 4: Add data from other boroughs to the model. I’d originally assumed that once a model was created for one local authority, all the rest would have to provide identical datasets in the same format to benefit from the algorithm’s insights. As it turns out, this couldn’t be further from the truth.

The wonder of using machine learning is that, other than a few essential datasets (UPRNs of known HMOs, UPRNs of all property addresses in the borough, and number of occupants per address), each borough can provide whatever datasets they believe are relevant. (Intuitively this makes sense. The features of a likely HMO in Camden may not be identical to those in Ealing.) This is possible because algorithms are no longer solely coded by humans. Machines can explore the data, identify which datasets are important, weight them and produce results themselves. Rather than having one HMO prediction model for the whole of London, we’ll end up with a bespoke model per borough, but all based on a common foundation. Best of all, the boroughs can provide their data in almost any format.
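
The mechanism described above can be sketched as a pipeline that insists only on the few essential fields and treats everything else a borough supplies as candidate features. The column names, helper function and synthetic data below are illustrative assumptions, not the pilot’s actual schema:

```python
# Illustrative sketch: each borough supplies the essentials (UPRNs of
# known HMOs, all property addresses, occupants) plus whatever extra
# datasets it believes are relevant; the model works out what matters.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

REQUIRED = {"uprn", "is_known_hmo", "occupants"}

def fit_borough_model(df: pd.DataFrame) -> pd.Series:
    """Fit a bespoke model on whatever columns a borough provides and
    return each feature's importance weight."""
    missing = REQUIRED - set(df.columns)
    if missing:
        raise ValueError(f"Borough data missing essentials: {missing}")
    features = df.drop(columns=["uprn", "is_known_hmo"])
    model = RandomForestClassifier(
        n_estimators=100, class_weight="balanced_subsample", random_state=0
    )
    model.fit(features, df["is_known_hmo"])
    return pd.Series(model.feature_importances_, index=features.columns)

# A borough with one optional extra dataset ("above_shop") still works;
# another borough could pass entirely different optional columns.
camden = pd.DataFrame({
    "uprn": range(200),
    "is_known_hmo": [i % 10 == 0 for i in range(200)],
    "occupants": [3 + (i % 10 == 0) * 5 for i in range(200)],
    "above_shop": [i % 2 for i in range(200)],
})
importances = fit_borough_model(camden)
print(importances.sort_values(ascending=False))
```

Because the machine learns the weights itself, two boroughs can pass completely different optional columns and each still gets a model tuned to its own data.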

Step 5: Implement real-world testing and evaluation. Supported by economists and expert evaluators at the GLA and the Behavioural Insights Team, the final step will be for each borough to get their building inspectors to visit properties suggested by the algorithm.

This is not straightforward. To provide meaningful results, we cannot simply compare the hit rate of each team (per cent of inspections resulting in positive identification of HMOs) using the algorithm-generated list with their historic performance. Instead, we need to run randomised controlled trials where not even the building inspectors know which list they are following.
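
In outline, the blinding requirement means randomising at the property level before any list reaches an inspector. This is a hypothetical sketch of that trial design, not the GLA or Behavioural Insights Team’s actual protocol – the function names and arm labels are assumptions:

```python
# Illustrative sketch: properties are randomly assigned to the
# algorithm-generated arm or the business-as-usual (control) arm
# before the lists are handed out, so inspectors can't tell which
# arm a given visit belongs to.
import random

def assign_arms(uprns, seed=42):
    """Randomly split property UPRNs into two trial arms."""
    rng = random.Random(seed)
    shuffled = list(uprns)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "algorithm": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }

def hit_rate(inspections):
    """Share of inspections that found an unlicensed HMO (1/0 flags)."""
    return sum(inspections) / len(inspections)

arms = assign_arms(range(100))
assert not set(arms["algorithm"]) & set(arms["control"])  # no overlap
# After the visits, the evaluation compares hit_rate() between the
# two arms rather than against historic performance.
```

Sorting each arm’s list before handing it over is one simple way to avoid leaking the assignment through list order.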

This is complicated by the fact that each borough has different teams and ways of working. Some use mobile devices to record their visits, others still use paper records. In some cases, boroughs only respond reactively to HMOs (i.e. to complaints from tenants), so there’s no proactive inspection model to compare against. Creating a comparable method for all boroughs to work to is therefore extremely challenging. These are the issues that will occupy our attention in the weeks ahead.

Early insights

In the meantime, it’s worth drawing out three lessons from what we’ve seen so far:

1. The different technology and data standards used by local authorities are no barrier to working on joint data initiatives. While conforming to common data standards may be necessary in some niche areas, technology is lessening their importance. To emphasise: legacy IT systems are not a legitimate excuse to delay using data.

2. Data analytics does not merely refine existing ways of working – it makes new ones possible. Some boroughs conduct no proactive inspections of suspected HMOs because they are too resource intensive. However, if an algorithm can raise the probability that such inspections will result in positive findings of HMOs, proactive inspection becomes not just financially viable, but desirable.

3. Implementation and evaluation are the hardest but most necessary part. If the public sector is serious about using data to deliver deep reform, the process cannot be left to the data teams. Data is useful to the extent that it leads to action. That means being willing to adapt processes and ways of working. It also means committing to measuring the results. These experiments with data – and they are experiments – will only win hearts and minds and lead to wider reform if they can stand up to rigorous evaluation.

The proof of the pudding…

Ultimately, the success of this pilot hinges on that final point. Its entire purpose has been to show that joining up, analysing and acting upon data at a city scale is not just an interesting idea, but delivers real-world results. Thanks to the expertise of the ASI, the data science has turned out to be the easy bit. Moving from theory to action is the hard part.

Interested in how cities can use data to improve services?
I highly recommend The Responsive City by Stephen Goldsmith and Susan Crawford.

Follow Eddie on Twitter  

Image credit: CC0 Public Domain, diego_torres on Pixabay