popmon: Analysis Package for Dataset Shift Detection

Simon Brugman
ING Analytics Wholesale Banking

Tomas Sostak

Pradyot Patil
ING Analytics Wholesale Banking

Max Baak
ING Analytics Wholesale Banking


popmon is an open-source Python package to check the stability of a tabular dataset. popmon creates histograms of features binned in time slices and compares the stability of their profiles and distributions using statistical tests, both over time and with respect to a reference dataset. It works with numerical, ordinal and categorical features, on both pandas and Spark dataframes. The histograms can also be higher-dimensional, so popmon can, for example, track correlations between sets of features. popmon can automatically detect and alert on changes observed over time, such as trends, shifts, peaks, outliers, anomalies and changing correlations, using monitoring business rules that are either static or dynamic. The results are presented in a self-contained report.
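To make the pipeline described above more concrete, here is a minimal, dependency-free Python sketch of the underlying idea. It is not popmon's API. The function names, the fixed-width integer time slices, the choice of the Jensen-Shannon divergence as the comparison statistic, and the yellow/red thresholds are all illustrative assumptions. The sketch bins a categorical feature per time slice, compares each slice with a reference dataset, and applies a static traffic-light rule to raise alerts.

```python
# Illustrative sketch only: hypothetical helpers, not popmon's API.
from collections import Counter
import math


def time_sliced_histograms(records, time_width):
    """Bin (timestamp, value) records into one categorical histogram per time slice.

    A record with timestamp t falls into slice t // time_width.
    """
    slices = {}
    for t, value in records:
        slices.setdefault(t // time_width, Counter())[value] += 1
    return dict(sorted(slices.items()))


def to_probabilities(hist, categories):
    """Normalize a histogram to probabilities over a fixed category order."""
    total = sum(hist.values())
    return [hist.get(c, 0) / total for c in categories]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, at most 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0 here.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def traffic_light(score, yellow, red):
    """Static business rule (hypothetical thresholds): map a score to an alert level."""
    if score >= red:
        return "red"
    if score >= yellow:
        return "yellow"
    return "green"


def monitor(records, reference_values, time_width, yellow, red):
    """Compare each time slice against a reference dataset and raise alerts."""
    reference = Counter(reference_values)
    slices = time_sliced_histograms(records, time_width)
    # Use one shared category order so all probability vectors line up.
    categories = sorted(set(reference).union(*slices.values()))
    q = to_probabilities(reference, categories)
    results = []
    for index, hist in slices.items():
        score = jensen_shannon(to_probabilities(hist, categories), q)
        results.append((index, score, traffic_light(score, yellow, red)))
    return results


if __name__ == "__main__":
    # Slices 0 and 1 match the reference mix of "a"/"b"; slice 2 has shifted to all "b".
    records = [(0, "a"), (3, "a"), (5, "b"), (8, "b"),
               (11, "a"), (14, "b"), (16, "a"), (19, "b"),
               (21, "b"), (24, "b"), (26, "b"), (29, "b")]
    reference_values = ["a"] * 5 + ["b"] * 5
    for index, score, light in monitor(records, reference_values, 10, yellow=0.05, red=0.2):
        print(index, round(score, 3), light)
    # prints:
    # 0 0.0 green
    # 1 0.0 green
    # 2 0.311 red
```

A dynamic rule would instead derive the thresholds from the scores of preceding time slices, for example a rolling mean plus a multiple of the rolling standard deviation. That variant is omitted here to keep the sketch short.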


dataset shift detection, population shift, covariate shift, histogramming, profiling



