API Reference

Detectors (skclean.detectors)

skclean.detectors.KDN([n_neighbors, weight, …])

For each sample, the percentage of its nearest neighbors sharing its label serves as its conf_score.
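
The idea behind a KDN-style conf_score can be sketched with scikit-learn's NearestNeighbors. This is an illustrative re-implementation of the concept, not skclean's actual code; the function name is hypothetical:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors

def kdn_conf_score(X, y, n_neighbors=5):
    """Fraction of each sample's k nearest neighbors sharing its label (sketch)."""
    # query k+1 neighbors because each point is returned as its own nearest neighbor
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    _, idx = nn.kneighbors(X)
    neighbor_labels = y[idx[:, 1:]]            # drop the point itself
    return (neighbor_labels == y[:, None]).mean(axis=1)

X, y = make_classification(n_samples=100, random_state=0)
conf = kdn_conf_score(X, y)                    # low conf suggests a possibly noisy label
```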

skclean.detectors.ForestKDN([n_neighbors, …])

Like KDN, but a trained Random Forest is used to compute pairwise similarity.

skclean.detectors.RkDN([n_neighbors, …])

Like KDN, but uses reverse nearest neighbors: conf_score is based on the samples that count this sample among their own nearest neighbors and share its label.

skclean.detectors.PartitioningDetector([…])

Partitions dataset into n subsets, trains a classifier on each.

skclean.detectors.MCS([classifier, n_steps, …])

Detects noise using a sequential Markov Chain Monte Carlo sampling algorithm.

skclean.detectors.InstanceHardness([…])

A set of classifiers are used to predict labels of each sample using cross-validation.

skclean.detectors.RandomForestDetector([…])

Trains a Random Forest first; for each sample, only trees that didn't select it for training (via bootstrapping) are used to predict its label.
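
This out-of-bag idea can be illustrated with scikit-learn's RandomForestClassifier, whose `oob_decision_function_` holds class probabilities predicted only by trees that did not see each sample during bootstrap training. A conceptual sketch, not skclean's implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=1)
rf = RandomForestClassifier(n_estimators=100, oob_score=True,
                            random_state=1).fit(X, y)
# probability the out-of-bag trees assign to each sample's *given* label;
# a low value suggests the label may be noisy
conf_score = rf.oob_decision_function_[np.arange(len(y)), y]
```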

Handlers (skclean.handlers)

skclean.handlers.Filter(classifier[, …])

Removes from the dataset the samples most likely to be noisy.
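
Conceptually, filtering amounts to dropping low-confidence samples before training the final classifier. A minimal sketch of that flow (the stand-in conf_score and the 0.5 cutoff are illustrative, not skclean defaults):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=0)
# stand-in conf_score: agreement of a simple k-NN with the given label
knn_pred = KNeighborsClassifier(n_neighbors=5).fit(X, y).predict(X)
conf_score = (knn_pred == y).astype(float)

keep = conf_score >= 0.5                       # illustrative cutoff
clf = LogisticRegression().fit(X[keep], y[keep])
```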

skclean.handlers.FilterCV(classifier[, …])

For quickly finding the best cutoff threshold for Filter using cross-validation.

skclean.handlers.CLNI(classifier, detector)

Iteratively detects and filters out mislabelled samples until a stopping criterion is met.

skclean.handlers.IPF(classifier, detector[, …])

Iterative Partitioning Filter: iteratively detects and filters out mislabelled samples until a stopping criterion is met.

skclean.handlers.SampleWeight(classifier[, …])

Simply passes conf_score (computed with detector) as sample weight to the underlying classifier.
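
The underlying mechanism is scikit-learn's standard `sample_weight` argument to `fit`. A sketch with a placeholder conf_score (random here purely for illustration; in practice it would come from a detector):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)
conf_score = np.random.default_rng(0).uniform(size=100)   # placeholder scores
# scikit-learn estimators accept per-sample weights directly:
clf = LogisticRegression().fit(X, y, sample_weight=conf_score)
```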

skclean.handlers.WeightedBagging([…])

Similar to regular bagging, except that cleaner samples are more likely to be drawn during bootstrap sampling.

skclean.handlers.Costing([classifier, …])

Implements costing, a method combining cost-proportionate rejection sampling and ensemble aggregation.

Models (skclean.models)

skclean.models.RobustForest([method, K, …])

Uses a random forest to compute pairwise similarity/distance, then a simple K-Nearest-Neighbors classifier that operates on that similarity matrix.

skclean.models.RobustLR([PN, NP, C, …])

Modifies the logistic loss using class-dependent (estimated) noise rates for robustness.

Pipeline (skclean.pipeline)

The skclean.pipeline module implements utilities to build a composite estimator as a chain of transforms and estimators.

skclean.pipeline.Pipeline(**kwargs)

Sequentially applies a list of transforms and a final estimator.

skclean.pipeline.make_pipeline(*steps, **kwargs)

Construct a Pipeline from the given estimators.

Noise Simulation (skclean.simulate_noise)

skclean.simulate_noise.flip_labels_uniform(Y, …)

All labels are equally likely to be flipped, irrespective of their true label or features.
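
Uniform flipping is straightforward to sketch with NumPy. This is an illustrative re-implementation of the idea, not skclean's function; the name and signature here are hypothetical:

```python
import numpy as np

def flip_uniform(y, noise_level, seed=None):
    """Flip a `noise_level` fraction of labels, each to a different
    class chosen uniformly at random (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    y_noisy = y.copy()
    classes = np.unique(y)
    n_flip = int(noise_level * len(y))
    flip_idx = rng.choice(len(y), size=n_flip, replace=False)
    for i in flip_idx:
        others = classes[classes != y[i]]      # never "flip" to the same label
        y_noisy[i] = rng.choice(others)
    return y_noisy

y = np.array([0, 1] * 50)
y_noisy = flip_uniform(y, 0.3, seed=0)         # exactly 30% of labels flipped
```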

skclean.simulate_noise.flip_labels_cc(y, lcm)

Class-Conditional Noise: a general version of flip_labels_uniform; a sample's probability of being mislabelled and its new (noisy) label depend on its true label, but not on its features.
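
The `lcm` argument is a label confusion (transition) matrix whose row i gives the distribution of noisy labels for samples whose true label is i. A conceptual NumPy sketch, with a hypothetical function name:

```python
import numpy as np

def flip_cc(y, lcm, seed=None):
    """Draw each sample's noisy label from the row of the label
    confusion matrix `lcm` indexed by its true label (sketch)."""
    rng = np.random.default_rng(seed)
    classes = np.arange(lcm.shape[0])
    return np.array([rng.choice(classes, p=lcm[label]) for label in y])

# e.g. class 0 keeps its label 90% of the time, class 1 only 70%
lcm = np.array([[0.9, 0.1],
                [0.3, 0.7]])
y = np.array([0] * 100 + [1] * 100)
y_noisy = flip_cc(y, lcm, seed=0)
```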

skclean.simulate_noise.UniformNoise(noise_level)

All labels are equally likely to be flipped, irrespective of their true label or features.

skclean.simulate_noise.CCNoise([lcm, …])

Class-Conditional Noise: a general version of flip_labels_uniform; a sample's probability of being mislabelled and its new (noisy) label depend on its true label, but not on its features.

skclean.simulate_noise.BCNoise(classifier, …)

Boundary-Consistent Noise: instances closer to the decision boundary are more likely to be mislabelled.

References