API Reference
Detectors (skclean.detectors)

KDN: For each sample, the percentage of its nearest neighbors with the same label serves as its conf_score.

ForestKDN: Like KDN, but a trained Random Forest is used to compute pairwise similarity.

RkDN: For each sample, the percentage of its nearest neighbors with the same label serves as its conf_score.

PartitioningDetector: Partitions the dataset into n subsets and trains a classifier on each.

MCS: Detects noise using a sequential Markov Chain Monte Carlo sampling algorithm.

InstanceHardness: A set of classifiers is used to predict the label of each sample using cross-validation.

RandomForestDetector: Trains a Random Forest first; for each sample, only the trees that did not select it for training (via bootstrapping) are used to predict its label.
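To make the conf_score idea concrete, here is a minimal, stdlib-only sketch of the KDN computation described above. It is illustrative only, not skclean's implementation; the function name `kdn_conf_scores` and its signature are hypothetical.

```python
import math

def kdn_conf_scores(X, y, k=3):
    """Illustrative sketch of the KDN idea (not skclean's implementation):
    each sample's conf_score is the fraction of its k nearest neighbors
    (by Euclidean distance) that share its label."""
    scores = []
    for i, xi in enumerate(X):
        # Distance from sample i to every other sample
        dists = [(math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i]
        neighbors = [j for _, j in sorted(dists)[:k]]
        same = sum(1 for j in neighbors if y[j] == y[i])
        scores.append(same / k)
    return scores

# Two well-separated, cleanly labelled clusters
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
y = [0, 0, 0, 1, 1, 1]
print(kdn_conf_scores(X, y, k=2))  # clean data -> all scores 1.0
```

A mislabelled sample surrounded by neighbors of the other class would receive a conf_score near 0, which is what downstream handlers use to flag it.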
Handlers (skclean.handlers)

Filter: Removes from the dataset the samples most likely to be noisy.

FilterCV: For quickly finding the best cutoff point (i.e. threshold) for Filter via cross-validation.

CLNI: Iteratively detects and filters out mislabelled samples until a stopping criterion is met.

IPF: Iteratively detects and filters out mislabelled samples until a stopping criterion is met.

SampleWeight: Simply passes conf_score (computed with a detector) as sample weight to the underlying classifier.

WeightedBagging: Similar to regular bagging, except that cleaner samples are chosen more often during bootstrap sampling.

Costing: Implements costing, a method combining cost-proportionate rejection sampling and ensemble aggregation.
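The simplest handler behavior, threshold-based filtering, can be sketched in a few lines. This is an illustrative sketch of the idea only, not skclean's Filter; the name `filter_noisy` and the default threshold are hypothetical.

```python
def filter_noisy(X, y, conf_scores, threshold=0.5):
    """Illustrative sketch of the Filter idea (not skclean's implementation):
    keep only samples whose conf_score is at or above the threshold."""
    kept = [(xi, yi) for xi, yi, s in zip(X, y, conf_scores) if s >= threshold]
    X_clean = [xi for xi, _ in kept]
    y_clean = [yi for _, yi in kept]
    return X_clean, y_clean

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
conf = [0.9, 0.2, 0.8, 0.7]   # sample 1 looks mislabelled
X_clean, y_clean = filter_noisy(X, y, conf, threshold=0.5)
print(X_clean, y_clean)  # [[0], [2], [3]] [0, 1, 1]
```

Iterative handlers like CLNI and IPF repeat a detect-then-filter step of this kind until a stopping criterion is met, while SampleWeight keeps all samples and instead down-weights the suspicious ones.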
Models (skclean.models)

RobustForest: Uses a random forest to compute pairwise similarity/distance, and then a simple K Nearest Neighbor classifier that works on that similarity matrix.

RobustLR: Modifies the logistic loss using class-dependent (estimated) noise rates for robustness.
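One standard way to build such a noise-rate-corrected loss is the unbiased-estimator construction of Natarajan et al.; the sketch below shows that construction for the logistic loss. It is illustrative only and may differ in detail from this class's actual implementation.

```python
import math

def logistic_loss(margin):
    # Standard logistic loss: l(m) = log(1 + exp(-m))
    return math.log1p(math.exp(-margin))

def corrected_logistic_loss(score, y, rho_pos, rho_neg):
    """Noise-rate-corrected logistic loss in the style of Natarajan et al.
    (an illustrative construction, not necessarily skclean's RobustLR).
    `score` is f(x), `y` is the observed label in {+1, -1}; rho_pos/rho_neg
    are the estimated flip rates of the positive/negative class."""
    rho_y = rho_pos if y == 1 else rho_neg          # flip rate of observed label
    rho_other = rho_neg if y == 1 else rho_pos      # flip rate of the other class
    num = (1 - rho_other) * logistic_loss(y * score) \
        - rho_y * logistic_loss(-y * score)
    return num / (1 - rho_pos - rho_neg)

# With zero noise rates the correction reduces to the plain logistic loss
print(corrected_logistic_loss(1.0, 1, 0.0, 0.0), logistic_loss(1.0))
```

The construction is unbiased: taking the expectation of the corrected loss over the label-flipping process recovers the clean-label logistic loss, which is why minimizing it on noisy data approximates minimizing the true risk.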
Pipeline (skclean.pipeline)¶
The imblearn.pipeline
module implements utilities to build a
composite estimator, as a chain of transforms, samples and estimators.
|
Sequentially applies a list of transforms and a final estimator. |
|
Construct a Pipeline from the given estimators. |
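The chaining behavior described above can be sketched in miniature: each intermediate step transforms the data, and the final step consumes the result. This is a conceptual sketch only, not skclean's Pipeline, and the class name `MiniPipeline` is hypothetical.

```python
class MiniPipeline:
    """Illustrative sketch of the pipeline idea (not skclean's Pipeline):
    sequentially apply each transform, then hand the result to a final step."""
    def __init__(self, steps):
        *self.transforms, self.final = steps

    def run(self, data):
        for transform in self.transforms:
            data = transform(data)      # each step feeds the next
        return self.final(data)         # final step produces the result

pipe = MiniPipeline([
    lambda xs: [x * 2 for x in xs],   # transform 1: scale
    lambda xs: [x + 1 for x in xs],   # transform 2: shift
    sum,                              # final step: aggregate
])
print(pipe.run([1, 2, 3]))  # [1,2,3] -> [2,4,6] -> [3,5,7] -> 15
```

In skclean's setting the intermediate steps can be noise detectors and handlers rather than plain transforms, but the sequential-composition structure is the same.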
Noise Simulation (skclean.simulate_noise)

flip_labels_uniform: All labels are equally likely to be flipped, irrespective of their true label or features.

flip_labels_cc: Class Conditional Noise, the general version of flip_labels_uniform: a sample's probability of being mislabelled, and its new (noisy) label, depend on its true label but not on its features.

UniformNoise: All labels are equally likely to be flipped, irrespective of their true label or features.

CCNoise: Class Conditional Noise, the general version of flip_labels_uniform: a sample's probability of being mislabelled, and its new (noisy) label, depend on its true label but not on its features.

BCNoise: Boundary Consistent Noise: instances closer to the decision boundary are more likely to be noisy.
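Uniform noise injection is simple enough to sketch directly: each label flips with probability p, and the replacement class is drawn uniformly from the remaining classes. This is an illustrative sketch with a hypothetical name (`flip_uniform`), not skclean's flip_labels_uniform.

```python
import random

def flip_uniform(y, p, classes, rng=None):
    """Illustrative sketch of uniform label noise (not skclean's
    flip_labels_uniform): each label is flipped with probability p
    to a different class chosen uniformly at random."""
    rng = rng or random.Random()
    noisy = []
    for label in y:
        if rng.random() < p:
            # Replacement drawn uniformly from the other classes
            noisy.append(rng.choice([c for c in classes if c != label]))
        else:
            noisy.append(label)
    return noisy

y = [0, 0, 1, 1, 0, 1]
# With p=1.0 and two classes every label deterministically swaps
print(flip_uniform(y, p=1.0, classes=[0, 1]))  # [1, 1, 0, 0, 1, 0]
```

Class-conditional noise generalizes this by replacing the single probability p with a full transition matrix over classes, and boundary-consistent noise makes p depend on each sample's distance to the decision boundary.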