Classification
Before starting any of these examples, please ensure that you have installed Pycaret >= 2.2
in your working environment. For example, you can install it with pip install pycaret==2.3.8
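If you would rather check the installed version from Python, a minimal sketch could look like the following. The helper names parse_version and pycaret_is_recent_enough are hypothetical, and the version parsing is deliberately simple (leading digits of each dotted part only), so treat it as a convenience check rather than a full version parser.

```python
import re
from importlib import metadata

# Minimum version stated in this guide (2.2).
MINIMUM = (2, 2)


def parse_version(text):
    """Turn a string such as '2.3.8' into a tuple such as (2, 3, 8).

    Parsing stops at the first dotted part that does not start with a digit.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def pycaret_is_recent_enough(minimum=MINIMUM):
    """Return True if pycaret is installed and its version is >= minimum."""
    try:
        installed = metadata.version("pycaret")
    except metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= minimum
```

Calling pycaret_is_recent_enough() returns False both when Pycaret is missing and when it is older than 2.2.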
Basic Pycaret AutoML Binary classification
Example: AutoMLBasicBinaryClassification.xircuits
In this example, you will learn how to build a basic Pycaret classification application that reads a tabular dataset, sets up the environment, compares multiple ML models, fine-tunes a model, plots results and saves the trained model.
- To start the workflow, first get a dataset with GetData. Here we chose the credit dataset. Additionally, SampleTestData can be used to set aside a testing dataset.
- To set up the Pycaret AutoML environment, you will need SetupClassification. This component initializes the training environment and creates the transformation pipeline. The SetupClassification component must be present before executing any other component. It takes two mandatory parameters: in_dataset and target (the label column). All other parameters are optional.
- CompareModelsClassification: This component trains and evaluates the performance of all estimators available in the model library using cross-validation. The output of this component is a score grid with average cross-validated scores. Additionally, it outputs a list of the top-performing models; the number of models returned can be controlled by the num_top input.
Compare Model Output
- CreateModelClassification: This component trains and evaluates the performance of a given model using cross-validation. The output of this component is a score grid with CV scores by fold and the created model.
- TuneModelClassification: This component tunes the hyperparameters of a given model, in this case the output model from CreateModelClassification. The output of this component is a score grid with CV scores by fold for the best model, selected according to the optimize parameter, and the tuned model.
- PlotModelClassification: This component analyzes the performance of a trained model on the holdout set. The type of plot can be set with plot_type.
Plot Feature Graph
- PredictModelClassification: This component predicts the Label and Score (probability of the predicted class) using a trained model. When the predict_dataset input is None, it predicts the label and score on the holdout (validation) set.
- FinalizeModelClassification: This component trains a given estimator on the entire dataset, including the holdout set.
- AutoMLClassification: This component returns the best model out of all models trained in the current session, based on the optimize input (the default is Accuracy).
- SaveModelClassification: This component saves the transformation pipeline and the trained model object into the current working directory as a pickle file for later use.
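For readers who know Pycaret's own Python API, the component chain above roughly corresponds to the calls sketched below. This assumes Pycaret 2.3.x and assumes that each component wraps the Pycaret function with the similar name; the function name run_binary_classification, the target column "default", the model ID "lr", the session_id and the file name "credit_model" are illustrative choices, not values taken from the example workflow.

```python
def run_binary_classification(num_top=3, model_id="lr", plot_type="auc"):
    """Sketch of the workflow above using Pycaret's functional API."""
    # Imported inside the function so the sketch can be loaded
    # even where Pycaret is not installed yet.
    from pycaret.classification import (
        automl,
        compare_models,
        create_model,
        finalize_model,
        plot_model,
        predict_model,
        save_model,
        setup,
        tune_model,
    )
    from pycaret.datasets import get_data

    # GetData: load the credit dataset (downloads it on first use).
    data = get_data("credit")

    # SetupClassification: dataset and target are the two mandatory inputs.
    # "default" is assumed to be the label column of the credit dataset.
    setup(data=data, target="default", silent=True, html=False, session_id=123)

    # CompareModelsClassification: n_select plays the role of num_top.
    top_models = compare_models(n_select=num_top)

    # CreateModelClassification, then TuneModelClassification.
    model = create_model(model_id)
    tuned = tune_model(model, optimize="Accuracy")

    # PlotModelClassification: save=True writes the plot to a file.
    plot_model(tuned, plot=plot_type, save=True)

    # PredictModelClassification: no dataset given, so the holdout set is used.
    holdout_predictions = predict_model(tuned)

    # FinalizeModelClassification: refit on the full dataset, holdout included.
    final_model = finalize_model(tuned)

    # AutoMLClassification: best model of the session by the chosen metric.
    best_model = automl(optimize="Accuracy")

    # SaveModelClassification: writes credit_model.pkl to the working directory.
    save_model(final_model, "credit_model")

    return top_models, holdout_predictions, best_model
```

Running it requires Pycaret and network access for the dataset download.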
Pycaret AutoML Model Operation
Example: AutoMLClassificationBlendModels.xircuits
In this example, you will learn how to apply transformations to the dataset, blend the top-performing models into one model and calibrate the model.
As with the previous example, start with GetData and SampleTestData.
To apply transformations to the dataset, pass parameters to the transformation inputs of SetupClassification. The available dataset transform operations are:
- normalize: When set to True, it transforms the numeric features by scaling them to a given range.
- transformation: When set to True, it applies the power transform to make the data more Gaussian-like.
- ignore_low_variance: When set to True, all categorical features with insignificant variances are removed from the data.
- remove_multicollinearity: When set to True, features with inter-correlations higher than the defined threshold are removed.
- multicollinearity_threshold: Threshold for correlated features. Ignored when remove_multicollinearity is not True.
- bin_numeric_features: Converts numeric features into categorical ones. It takes a list of strings with the names of the columns to be binned.
- group_features: When the dataset contains features with related characteristics, the group_features parameter can be used for feature extraction. It takes a list of strings with column names that are related.
More data transformation techniques from Pycaret can be added to the SetupClassification component simply by adding new inputs to the component script.
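As a rough illustration of how these inputs reach Pycaret, the sketch below collects the transformation options into keyword arguments for Pycaret's setup function. It assumes Pycaret 2.3.x, where setup accepts parameters with these names; the helpers build_transform_options and setup_with_transforms are hypothetical and are not part of the component library.

```python
def build_transform_options(
    normalize=False,
    transformation=False,
    ignore_low_variance=False,
    remove_multicollinearity=False,
    multicollinearity_threshold=0.9,
    bin_numeric_features=None,
    group_features=None,
):
    """Collect the transformation inputs into keyword arguments for setup()."""
    options = {
        "normalize": normalize,
        "transformation": transformation,
        "ignore_low_variance": ignore_low_variance,
        "remove_multicollinearity": remove_multicollinearity,
    }
    # The threshold only matters when remove_multicollinearity is True.
    if remove_multicollinearity:
        if not 0 < multicollinearity_threshold <= 1:
            raise ValueError("multicollinearity_threshold must be in (0, 1]")
        options["multicollinearity_threshold"] = multicollinearity_threshold
    # Column lists are forwarded only when they were actually supplied.
    if bin_numeric_features:
        options["bin_numeric_features"] = list(bin_numeric_features)
    if group_features:
        options["group_features"] = list(group_features)
    return options


def setup_with_transforms(data, target, **transform_flags):
    """Call Pycaret's setup() with the selected transformations (needs Pycaret)."""
    from pycaret.classification import setup

    return setup(
        data=data,
        target=target,
        silent=True,
        html=False,
        **build_transform_options(**transform_flags),
    )
```

For instance, build_transform_options(normalize=True, bin_numeric_features=["AGE"]) produces the arguments for a normalized setup that bins a column named AGE; the column name here is only a placeholder.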
- BlendModelsClassification: This component trains a Soft Voting / Majority Rule classifier for the models passed in the top_model list. Here, you can pass a list of top models from the CompareModelsClassification component, or create multiple models and link them to the model_1, model_2 and model_3 inputs.
- CalibrateModelClassification: This component calibrates the probability of a given estimator using isotonic or logistic regression. The output of this component is a score grid with CV scores by fold and the calibrated model.
- logging: This component saves the logs of all trained models to the MLflow dashboard, which can be accessed at localhost:5000. To activate logging, set log_experiment in the SetupEnvironmentClassification component to True.
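To make the difference between Soft Voting and Majority Rule concrete, here is a small pure-Python illustration. It is not Pycaret code: the functions majority_vote and soft_vote are hypothetical, and the three probability values are made up for the example.

```python
from collections import Counter


def majority_vote(labels):
    """Majority Rule: each model casts one vote, the most common label wins."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probabilities):
    """Soft Voting: average the class probabilities, then pick the largest.

    `probabilities` is a list with one dict per model, mapping class -> probability.
    """
    classes = probabilities[0].keys()
    averaged = {
        cls: sum(model[cls] for model in probabilities) / len(probabilities)
        for cls in classes
    }
    winner = max(averaged, key=averaged.get)
    return winner, averaged


# Three models score the same sample; only the first is confident in class 1.
model_outputs = [
    {0: 0.10, 1: 0.90},
    {0: 0.60, 1: 0.40},
    {0: 0.55, 1: 0.45},
]

# Each model's hard prediction is its most probable class.
hard_labels = [max(p, key=p.get) for p in model_outputs]

hard_winner = majority_vote(hard_labels)       # two of three models vote 0
soft_winner, _ = soft_vote(model_outputs)      # mean P(class 1) is about 0.58
```

Here the two modes disagree: Majority Rule picks class 0, while Soft Voting picks class 1 because one confident model outweighs two lukewarm ones. In Pycaret 2.3.x's own API, this choice is made through the method argument of blend_models ("soft" or "hard"), and calibrate_model takes method="sigmoid" (logistic) or method="isotonic".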
Basic Pycaret AutoML Multiclass classification
Example: AutoMLBasicMulticlassClassification.xircuits
In this example, we use the same components to build a basic Pycaret multiclass classification application that reads a tabular dataset. We set up the environment, compare multiple ML models, fine-tune models, then pick and save the best-performing trained model.
- For this example we also fetch the dataset with GetData, but chose the iris dataset, which has multiple classes.
- In SetupClassification, we need to ensure the target input is supplied with the dataset label column name (species).
- PlotModelClassification gives us insights on the model performance. The following are some plots that can be generated: