Don’t Let Anything Fall Through the Cracks

In Shehzad’s recent post on model risk management, he mentioned some of the technical means that we use to support RDC’s control and oversight of our predictive models. Model risk management at RDC begins and ends with the Model Risk Governance function, under Shehzad’s oversight.

Technology and data science provide tools that help the governance function do its job more effectively and efficiently. Predictive models are statistical in nature, and the tools we use to control them are also based on statistics. The first of these tools is model auditing, also called model monitoring.

Ground Truth vs Model Prediction for AI Review: Large European Bank

Above is the “V-Graph” that I described in the webinar How AI Review Works (the explanation begins at 4:30). A machine learning model only produces numbers; in AI Review, the numbers represent alert or no alert. On the graph, green = no alert, blue = alert sent to a level-2 analyst.

To realize the efficiency gains promised by AI Review, we define a lower threshold. Every potential alert whose AI Review prediction falls below the lower threshold is a predicted negative; that is, the model predicts, at a certain level of confidence, that a level-1 risk analyst at the client would have classified it as a “No Alert.” Just as the model’s prediction is based on human activity, model auditing is analogous to oversight of a human-powered risk analysis operation: a sample of the predicted negatives is passed through and reprocessed by more senior risk analysts. The choice of how many predicted negatives to pass through distills the tradeoff between risk management and cost: passing more through drives down the probability that a false negative escapes review, but it also creates more work for operations.
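To make the mechanics concrete, here is a minimal sketch in Python of how predicted negatives could be separated out and an audit sample drawn from them. The names (split_and_audit, lower_threshold, audit_rate) are illustrative, not RDC’s actual implementation:

```python
import random

def split_and_audit(scored_cases, lower_threshold, audit_rate, seed=None):
    """Separate predicted negatives and draw a random audit sample.

    scored_cases: list of (case_id, score) pairs, where a lower score means
        the model is more confident the case is a "No Alert".
    Returns (predicted_negatives, audit_sample): the cases below the lower
    threshold, and the randomly chosen subset passed through to more
    senior risk analysts.
    """
    predicted_negatives = [case for case, score in scored_cases
                           if score < lower_threshold]
    rng = random.Random(seed)
    sample_size = round(len(predicted_negatives) * audit_rate)
    audit_sample = rng.sample(predicted_negatives, sample_size)
    return predicted_negatives, audit_sample

# Example: audit half of the cases scored below 0.2.
cases = [("case-1", 0.05), ("case-2", 0.35), ("case-3", 0.12), ("case-4", 0.91)]
negatives, to_audit = split_and_audit(cases, lower_threshold=0.2, audit_rate=0.5)
```

Raising the audit rate shrinks the chance that a false negative goes unreviewed, at the cost of more reprocessing work, which is exactly the tradeoff described above.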

During AI Review setup, the user interface includes a feedback message that helps the customer understand the tradeoff:

An administrator fills in the values for lower threshold and audit rate for both person and organization. The setup interface updates the residual false-negative probability based on these values and the test data set used to create the customer’s AI Review model.
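One plausible way to compute that feedback number from the labeled test set, sketched below in Python: a true alert slips through only if it scores below the lower threshold and is not among the randomly audited cases. The function name and parameters are assumptions for illustration, and the formula RDC actually uses may differ:

```python
def residual_false_negative_rate(labels, scores, lower_threshold, audit_rate):
    """Estimate the probability that a true alert slips through unreviewed.

    labels: list of booleans, True where a level-1 analyst labeled the
        test case an alert.
    scores: parallel list of model scores for the same test cases.
    """
    total = len(labels)
    # True alerts that the model would classify as predicted negatives.
    missed_alerts = sum(1 for label, score in zip(labels, scores)
                        if label and score < lower_threshold)
    # Audited cases are reprocessed by senior analysts, so on average only
    # a (1 - audit_rate) share of the missed alerts goes unreviewed.
    return (missed_alerts / total) * (1 - audit_rate)
```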

RDC tracks the audited negatives and alerts the Model Risk Governance function if an analyst labels any of the predicted negatives an alert, that is, a positive. This model audit event is reviewed by RDC’s Model Risk Governance function, possibly in conjunction with the client’s senior risk analysts. If they determine that the auditor was in error, and that the predicted negative was in fact a true negative, then the model audit event is labeled a false audit event and the process ends, at least from RDC’s point of view. If, however, they determine that the predicted negative was in fact an alert, and therefore a false negative, then the model audit event is labeled a verified audit event. A verified audit event indicates that the lower threshold may be set too high, or that the model needs to be re-trained and re-verified, or both. In either case, the customer and RDC may agree to retroactively audit a larger sample of the predicted negatives that the model has produced up to that point.
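The workflow above is essentially a small state machine, sketched here in Python. The enum values mirror the two labels in the paragraph; the function name, parameters, and the 25% retroactive audit rate are hypothetical:

```python
import random
from enum import Enum

class AuditOutcome(Enum):
    FALSE_AUDIT_EVENT = "false audit event"        # auditor was in error; true negative
    VERIFIED_AUDIT_EVENT = "verified audit event"  # confirmed false negative

def handle_model_audit_event(governance_confirms_alert, predicted_negatives_so_far,
                             retro_audit_rate=0.25, seed=None):
    """Triage a model audit event after Model Risk Governance review.

    governance_confirms_alert: True if governance (possibly with the
        client's senior risk analysts) agrees the predicted negative
        was in fact an alert.
    Returns the outcome and, for a verified event, a retroactive sample
    of the predicted negatives made so far for re-review.
    """
    if not governance_confirms_alert:
        return AuditOutcome.FALSE_AUDIT_EVENT, []
    rng = random.Random(seed)
    sample_size = round(len(predicted_negatives_so_far) * retro_audit_rate)
    retro_sample = rng.sample(predicted_negatives_so_far, sample_size)
    return AuditOutcome.VERIFIED_AUDIT_EVENT, retro_sample
```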

Model auditing is just one of the statistical tools that RDC uses to manage model risk. In the next blog post, I’ll cover model retraining, concept drift, and semantic velocity.