Counterfactual evaluation of learning systems

Haven't read the paper yet, but I'm interested in learning how to handle off-policy learning. E.g., you train a classifier on a dataset X0 that was produced by a previous classifier. Once deployed, the new classifier will select examples from a distribution different from X0. How will that shift impact the error?
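
To make the question concrete, here is a minimal sketch of one standard way to estimate the error under the new distribution: importance weighting (inverse propensity scoring). I'm not claiming this is what the paper does. It assumes you know, for each logged example, the probability that the old classifier selected it and the probability that the new one would. The names `naive_error`, `ips_error`, `p_old`, `p_new` and all the numbers are made up for illustration.

```python
def naive_error(losses):
    """Plain average of the logged losses; ignores how the data was selected."""
    return sum(losses) / len(losses)


def ips_error(losses, p_old, p_new):
    """Importance-weighted average loss.

    Each logged loss is scaled by p_new / p_old, so examples the new
    classifier selects more often than the old one count for more.
    Requires p_old > 0 wherever p_new > 0.
    """
    total = 0.0
    for loss, p, q in zip(losses, p_old, p_new):
        total += (q / p) * loss
    return total / len(losses)


# Hypothetical toy log produced by the old classifier:
# 80 "easy" examples (loss 0.1) and 20 "hard" examples (loss 0.5).
losses = [0.1] * 80 + [0.5] * 20
# Assumed selection probabilities: the old classifier favored easy
# examples, the new one favors hard examples.
p_old = [0.8] * 80 + [0.2] * 20
p_new = [0.3] * 80 + [0.7] * 20

naive = naive_error(losses)
reweighted = ips_error(losses, p_old, p_new)
print("error measured on X0:", round(naive, 3))
print("estimated error under the new selection:", round(reweighted, 3))
```

In this toy setup the error measured on X0 is about 0.18, while the reweighted estimate for the new distribution is about 0.38, so the shift alone roughly doubles the error. The catch is that the weights blow up when the old classifier rarely selected something the new one selects often, which makes the estimate noisy.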