I have frequently written and spoken about the amazing work of New York City’s Mayor’s Office of Data Analytics (MODA), originally set up and led by Mike Flowers (read my interview with Mike).
A common question I receive is: How does MODA actually apply data analytics to help a service delivery team (SDT) – i.e. a public sector organisation offering a service – solve a problem?
In this article I outline MODA’s 10-step approach. These details were kindly provided by the MODA team and can be found in Big Data In the Big Apple – a report for Policy Exchange.
Step 1. Understand how day-to-day operations work
The MODA team spends time shadowing front-line service delivery team (SDT) staff to understand: a) the nature of the service they provide; b) how resources are allocated, scheduled and delivered; c) the factors that go into the prioritisation of delivery of the service; and d) how data is recorded in the SDT’s IT system(s). When he first arrived at City Hall, Mike Flowers spent six months with front-line staff to experience their activities for himself and to understand the data they used and recorded.
Step 2. Identify areas where data could help
MODA examines the data that is used and recorded during the process of delivering a public service. The team then considers how the service could be improved (for example, by being better able to allocate a scarce resource, such as inspectors’ time) and tries to identify what information it would take to achieve that aim.
Step 3. Form a project plan
A project plan is put in place so that the SDT and MODA can agree: a) an approach that works for each party; b) the data that will be used; and c) the timeline for the project.
Step 4. Understand data context
To understand the value of the data that is used and collected during the course of providing a particular service, MODA analysts need to understand how it is created and what it means in its original context.
Step 5. Create a Memorandum of Understanding (MOU)
Much like writing a contract, MODA establishes a formal written agreement with the SDT’s organisation (e.g. the Fire Department). The MOU details the purpose of data sharing and the privacy and data protections that will be applied by MODA and the SDT. It also ensures that there is transparency and commitment about what is required from each side.
Step 6. Integrate data
MODA sets up the technical connection to take the SDT’s data so it can be stored and analysed. To combine records with other datasets, MODA insists that all records are geo-tagged – in other words, given a location such as a street address, ZIP code or grid reference. It is this that allows different datasets to be mapped together so that new correlations can be identified.
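To illustrate why a shared location matters, here is a minimal sketch of joining two datasets on a normalised street address. It is not MODA’s actual integration code: the `normalise_address` and `join_by_location` functions, the field names and the sample records are all hypothetical, and real address matching is considerably messier than this.

```python
from collections import defaultdict


def normalise_address(address: str) -> str:
    """Build a crude join key: upper-case, drop punctuation, collapse spaces."""
    cleaned = "".join(ch for ch in address.upper() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def join_by_location(*datasets: list[dict]) -> dict[str, dict]:
    """Merge records from several datasets that share the same location key."""
    merged: dict[str, dict] = defaultdict(dict)
    for dataset in datasets:
        for record in dataset:
            key = normalise_address(record["address"])
            # Copy every field except the raw address into the combined record.
            merged[key].update({k: v for k, v in record.items() if k != "address"})
    return dict(merged)


# Hypothetical records, invented purely for illustration.
complaints = [{"address": "12 Example St.", "complaint": "illegal conversion"}]
finance = [{"address": "12 example st", "tax_delinquent": True}]

combined = join_by_location(complaints, finance)
```

Once both records resolve to the same key, the complaint and the tax record sit side by side, which is what makes it possible to look for correlations between them.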
Step 7. Test hypotheses
Working with the SDT, MODA creates several hypotheses regarding which pieces of information will be useful in improving the service outcome. For example, when investigating what information could help predict illegal building conversions, MODA discovered that the most likely sources of violations are single family homes that are less than the average home value and smaller than 3,000 square feet. The homes within that subset that have histories of tax delinquency, mortgage liens and especially a history of building violations are the ones that are most likely to contain illegal conversions. (See full details in case study below.)
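The subset described above can be expressed as a simple rule-based score. This is a sketch under stated assumptions, not MODA’s actual model: the `Property` fields, the `risk_score` function and the weights (violation history counted double to reflect ‘especially’) are invented for illustration. Only the 3,000 square feet threshold comes from the text, and the average home value has to be supplied by the caller.

```python
from dataclasses import dataclass

MAX_SQUARE_FEET = 3_000  # size threshold quoted in the article


@dataclass
class Property:
    single_family: bool
    value: float
    square_feet: float
    tax_delinquent: bool = False
    mortgage_lien: bool = False
    prior_violations: int = 0


def in_candidate_subset(p: Property, average_home_value: float) -> bool:
    """Single family home, below average value, smaller than the size threshold."""
    return (
        p.single_family
        and p.value < average_home_value
        and p.square_feet < MAX_SQUARE_FEET
    )


def risk_score(p: Property, average_home_value: float) -> int:
    """Score a property; the weights below are assumptions, not MODA's."""
    if not in_candidate_subset(p, average_home_value):
        return 0
    score = 0
    score += 1 if p.tax_delinquent else 0
    score += 1 if p.mortgage_lien else 0
    score += 2 if p.prior_violations > 0 else 0  # assumed heavier weight
    return score
```

For example, with a hypothetical average home value of 500,000, `risk_score(Property(True, 400_000, 2_000, tax_delinquent=True, prior_violations=2), 500_000)` falls inside the subset and picks up points for tax delinquency and violation history, while any property outside the subset scores zero.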
Step 8. Service delivery team review
Once MODA has run its pilot data model, the SDT needs to check the analysis to make sure that MODA has interpreted the information correctly. If needed, MODA updates the model to correct any misunderstandings.
Step 9. Automate the process
To deliver sustainable savings and improvements in performance, the processes designed by MODA must not depend on human analysts (who act as single points of failure when they are late for work, sick or on holiday), but should be automated and integrated into SDT systems so that they become part of the normal workflow.
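One way to picture an automated step that does not become its own single point of failure is a wrapper that always passes the complaint on, even if the model is unavailable. The sketch below is hypothetical: the `flag_complaint` function, the stand-in models and the choice to fall back to a ‘normal’ priority are assumptions for illustration, not a description of MODA’s system.

```python
from typing import Callable

HIGH, NORMAL = "high", "normal"


def flag_complaint(complaint: dict, risk_model: Callable[[dict], bool]) -> dict:
    """Return the complaint with a priority flag, never blocking its delivery."""
    try:
        priority = HIGH if risk_model(complaint) else NORMAL
    except Exception:
        # Assumed stop-gap: if the model fails, forward the complaint as normal.
        priority = NORMAL
    return {**complaint, "priority": priority}


def stand_in_model(complaint: dict) -> bool:
    """Hypothetical model: treats a recorded violation history as high risk."""
    return complaint.get("prior_violations", 0) > 0


def broken_model(complaint: dict) -> bool:
    """Simulates the model being offline."""
    raise RuntimeError("model offline")


flagged = flag_complaint({"id": 1, "prior_violations": 3}, stand_in_model)
fallback = flag_complaint({"id": 2, "prior_violations": 3}, broken_model)
```

The design choice here is that a model failure degrades the service to its previous behaviour rather than stopping it, so front-line staff still receive every complaint.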
Step 10. Implement solution
The final step is for MODA to roll out its solution so that it becomes a permanent fixture of the service.
(And just for luck…) Step 11. Delegate responsibility for the data model
An eleventh step could be added: the model can be passed on to, and managed by, the department itself.
Interested in how cities can use data to improve services?
I highly recommend The Responsive City by Stephen Goldsmith and Susan Crawford.
The following table outlines in detail the 10-step process used by MODA to apply data analytics to improve a service, including the key questions the team asks at each stage. In the third column, each step is explained using the specific example of how MODA helped the Department of Buildings (DOB) prioritise the inspection of illegally converted apartments. (The table is adapted from ‘Memorandum on MODA Project Process Flow’ by Mike Flowers.)
|Steps||Key questions asked by MODA||Case study: NYC illegal conversions|
Spend time in the field with front-line staff to understand how their day-to-day operations work.
|What is the service being provided?||The Department of Buildings (DOB) inspects illegal conversion complaints to ensure that NYC residents are living in safe conditions. When conditions are not safe, orders (known as ‘violations’) are issued to property owners to remedy the apartment. In extreme conditions DOB will vacate the living space.|
|How is the service allocated, scheduled and delivered?||In each of NYC’s five boroughs, DOB has a Borough Command office with a team of inspectors. When a new illegal conversion complaint comes in (via 311), it is printed at the relevant Borough Command office. Typically, complaints are investigated in the same order they are received. DOB has a goal of inspecting every non-prioritised complaint within 40 days.|
|What factors go into the prioritisation of delivery?||Complaints are inspected in the order they are received. However, priority is given to those that include phrases such as ‘no exit’ and ‘exposed boiler’, which suggest higher risk.|
|How is the delivery recorded in the organisation’s IT system(s)?||DOB tracks the complaint number through the final disposition in the Building Information System (BIS).
The Environmental Control Board (ECB) records and adjudicates DOB violations.
Identify what part(s) of the service can be improved through data analysis.
Check assumptions with the team(s) delivering the service.
|What type of problem is this?||The main challenge is identifying which complaints to prioritise given the limited number of inspectors.|
|What data exists around the operation?||The wording and details of the complaint in 311.
BIS holds the inspection history of each property.
ECB holds the violation history of each property.
|What other data would be helpful (hypotheses)?||Potentially:
Department of Finance property records;
Tax liens and lis pendens;
The age of building.
|What is the desired end goal of the data use?||A priority flag that can be added against the highest risk complaints on the list printed out each morning at each DOB Borough Command office.|
|What’s the commitment from the agency and MODA?||That DOB will provide expert guidance on how their service is delivered; review and pilot MODA’s data-driven prioritisation model; and then work with MODA to automate the process. MODA will test and create a risk filter.|
Form a project plan for the delivery team and MODA to agree an approach that works for each party.
|What data will be used, and what new data is needed?||No new data required.|
|What is the timeline for the project?||Three months: one month for analysis; two months for the pilot; one month for automating the IT processes (concurrent with the second month of the pilot).|
|What are the check points during the project?||Two check points during development of the risk filter: one after two weeks and the second after one month. End of month checks on pilot results. Weekly checks on IT development once launched.|
Understand data context.
To appreciate the value of the data, MODA analysts have to understand how it is created, and what it means in its original context.
|What are the datasets and what are they measuring?||Records of inspections and violations;
Records of every visit to a property;
Records of when access is granted and the inspection is completed;
Records of violation notices, by type, which are found in inspections.
|How is the data generated? What road bumps should MODA anticipate?||Data is generated by inspectors or Borough Command staff who manually enter records.
In the case of DOB complaints and inspections, MODA learned that any new complaint on an existing scheduled inspection is ‘administratively closed’. This was important to understand why some complaints showed fast resolution but no history.
|How is the data interpreted?||The results of an inspection are recorded. Often the most serious violation is the violation that is written.|
|How is the data set stored?||Inspections are stored in BIS. Violations are adjudicated through the Environmental Control Board (ECB).|
Creating a Memorandum of Understanding (MOU).
|What is the purpose of the project?||The purpose of the DOB project is to use DOB inspection and violation data to perform an analysis of historical outcomes, find common traits of illegal apartments that are vacated, and use that data to risk-analyse new complaints to prioritise future inspections.|
|What data security guarantees are provided?||In the case of the DOB, the information shared is available in the public record, therefore no special care was needed for DOB records. However, while data from DOB BIS was not sensitive, MODA agreed not to use or share the DOB data in other projects without notification to DOB.|
Data Integration. MODA sets up the technology required to take agency data.
Data must be matched to geo-located records in MODA’s system.
|What sort of system records the data?||BIS records data in a mainframe system.|
|What is the most appropriate method for transmitting the data to MODA?||A paging server sits on top of the BIS mainframe. Every day, the paging service automatically extracts data from BIS using a NiemXML transfer protocol. The data is transferred using DEEP to DataShare. DataShare then pushes the data to MODA using an ETL workflow. MODA loads the data into DataBridge using Informatica.|
Testing hypotheses. Working with the department, MODA chooses several hypotheses on which variables will be useful in improving the service outcome.
|What variables will we test?||Property tax records; lien record; building age; building size; building value; violation history; neighbourhood conditions.|
|What’s the most appropriate analytical technique for this analysis?||A decision tree was used to identify the relative value of variables in predicting an illegal conversion.|
|What do the preliminary results show?||Single family homes that are less than the average NYC home value and smaller than 3,000 square feet are the most likely sources of illegal conversions. The homes within that subset that have histories of tax delinquency, mortgage liens and especially a history of DOB violations are the ones that are most likely to now contain illegal conversions.|
|How do we communicate these results to the delivery team?||A series of slides was used to graphically convey the value of the variables in predicting the historical outcomes.|
Client review or pilot. The delivery team needs to check the analysis to make sure that the information is interpreted correctly. MODA updates its model if required.
|Does any of the analysis surprise the delivery team? If so, why?||The delivery team were surprised that the age of the building was important. Initial MODA analysis had attempted to rank buildings’ risk by their age, with older buildings being more dangerous than newer ones. After discussion with DOB, it was apparent that age was binary: buildings constructed after the implementation of the 1938 Building Code are significantly safer than the buildings constructed prior to that year.|
|What agency procedures could account for data surprises?||The change in the New York City building code accounted for the importance of pre- and post-1938 safety.|
|How can the analysis be altered to produce a more accurate result?||Rather than apply a scale by age, MODA switched to a binary analysis that gave more risk weighting to buildings constructed prior to 1938.|
|How should the analysis be tested in the field (pilot)?||MODA and DOB agreed on a 60-day field pilot in Queens. MODA emailed a daily list (in Excel) of complaints prioritised by their data analysis. The Borough Command staff manually flagged the preceding day’s complaints for inspection.|
|How will we know if the pilot is successful?||After 30 days, and again after 60 days, MODA reviewed the history of inspections and violations to determine if the pilot was achieving its goal of reducing time-to-vacate for dangerous structures. Reducing the time between complaint filing and vacate was important to measure.
MODA also checked the number of vacates to make sure they were not simply going up due to the greater exposure being given to the project by the pilot.
MODA also observed the number of inspection attempts to make sure that improved results were not simply the outcome of increased effort.
|What systems are necessary to support the pilot and pilot measurement?||MODA needed to manually run the script each morning and email the results to Queens Borough Command.
Borough Command staff needed to manually annotate the prior day’s complaints based on the MODA list.
Automating the process.
The process should be reviewed at least twice a year to ensure that the data model created by MODA takes account of changes in the field.
|What system needs to be changed and how?||The DOB mainframe system could not use the MODA logic. Instead, MODA’s tech team developed a web-based service that caught the complaint upstream, between 311 and the delivery of the complaint to DOB.
The web service would analyse the complaint (and the relevant information about the property in DataBridge) to determine whether the complaint met MODA’s risk priority. The web service provided a priority flag (high or normal) and forwarded the information to DOB BIS.
BIS was updated to include a new field for the flag.
The DOB Borough Command prints lists of complaints with the priority flag included. Stop-gaps were built into the system to ensure that any significant downtime in the data model would not delay or disrupt the delivery of 311 complaints to DOB Borough Commands.
|How often will the solution be reviewed and calibrated?||Twice a year. In the case of the DOB filter, a community board inquiry led to an earlier review of the filter; however, a detailed review revealed no need to change the logic.|
|Are we confident that this is not disruptive to the field?||DOB confirmed that no change in field operations was required.|
|How do we maintain the solution on an ongoing basis?||MODA’s tech team established a notification process with DOB to make sure that the system in place for the MODA filter is updated along with any underlying change to the DOB BIS technology.|
Operational Implementation. The new automated system is launched.
|What education needs to be provided to staff in the field?||MODA and DOB needed to explain the significance of the flag on the complaint to Borough Commanders and building inspectors.|
|How will success be measured over time?||The Mayor’s Management Report (MMR) was changed to include a ‘time-to-vacate’ measurement to ensure that the filter is leading to the desired policy outcome of reducing the number of days that a dangerous apartment remains at risk.|
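A ‘time-to-vacate’ measurement of the kind described in the case study can be sketched as the number of days between a complaint being filed and the vacate order. The function names, the use of the median as the summary statistic and the sample dates below are all assumptions made for illustration; none of them are DOB data or the official definition of the measure.

```python
from datetime import date
from statistics import median


def days_to_vacate(filed: date, vacated: date) -> int:
    """Days between the complaint being filed and the vacate order."""
    return (vacated - filed).days


def median_time_to_vacate(cases: list[tuple[date, date]]) -> float:
    """Median days-to-vacate across (filed, vacated) pairs."""
    return median(days_to_vacate(filed, vacated) for filed, vacated in cases)


# Invented dates, purely to show the calculation.
sample_cases = [
    (date(2024, 1, 1), date(2024, 1, 11)),
    (date(2024, 1, 5), date(2024, 1, 25)),
    (date(2024, 2, 1), date(2024, 3, 2)),
]

summary = median_time_to_vacate(sample_cases)
```

Comparing this figure before and after the filter is introduced, alongside the number of vacates and inspection attempts, is one way to check that any improvement comes from better prioritisation rather than extra effort.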