Logistic regression is a fundamental machine learning algorithm used for binary classification tasks. In this tutorial, we'll walk through binary classification using logistic regression with the Scikit-Learn LogisticRegression class. We'll cover the following topics:
- Introduction to logistic regression
- Preparing data
- Training the model
- Prediction and accuracy check
- Conclusion
- Source code listing
Let's get started.
Introduction to logistic regression
Logistic regression is a statistical method used for binary classification tasks. It models the probability that a given input belongs to a certain class, typically denoted as 1 or 0. Despite its name, logistic regression is a classification algorithm, not a regression algorithm.
In logistic regression, the input features are combined linearly using weights, and then the logistic function (also known as the sigmoid function) is applied to the result. The logistic function transforms the linear combination of inputs into a probability score between 0 and 1. This probability score represents the likelihood that the input belongs to the positive class.
Mathematically, the logistic regression model can be expressed as:
$$P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$
In this formula:
- $P(y = 1 \mid x)$ is the probability of the dependent variable (y) being 1 given the value of the independent variable (x).
- $e$ is the base of the natural logarithm.
- $\beta_0$ and $\beta_1$ are coefficients (weights) that the model learns from the data.
This formula calculates the probability that the outcome is 1 (or true) given the value of the independent variable $x$.
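To make the formula concrete, here is a minimal sketch that evaluates the logistic function for a single feature. The function names and the coefficient values (called b0 and b1 here) are hypothetical and chosen only for illustration; a trained model would learn its own values from the data.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: maps a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x: float, b0: float, b1: float) -> float:
    """P(y = 1 | x) for a logistic regression model with one feature."""
    return sigmoid(b0 + b1 * x)


# Hypothetical intercept and weight, not learned from any dataset.
b0, b1 = -1.0, 2.0

for x in (-2.0, 0.5, 3.0):
    p = predict_probability(x, b0, b1)
    print(f"x = {x:+.1f} -> P(y=1|x) = {p:.4f}")
```

With these values the linear part b0 + b1 * x is exactly zero at x = 0.5, so the probability there is 0.5. Inputs below that point get a probability under 0.5, and inputs above it get a probability over 0.5.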
Preparing data
We'll start by loading the necessary libraries for this tutorial. Make sure you have the scikit-learn library (imported as sklearn) installed.
Next, we load the Breast Cancer dataset bundled with Scikit-Learn and split it into training and testing sets using the train_test_split function from Scikit-Learn. We then apply StandardScaler to standardize the features in the dataset.
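The data preparation described above might look like the following sketch. The test_size and random_state values are assumptions, since the text does not state them, and fitting the scaler on the training split only is a common convention rather than something the text confirms.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the feature matrix and the binary labels (0 = malignant, 1 = benign).
X, y = load_breast_cancer(return_X_y=True)

# Hold out part of the data for testing.
# test_size and random_state are assumed values, not taken from the tutorial.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training features only, then reuse it on the test
# features so that no information from the test set leaks into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Training set:", X_train_scaled.shape)
print("Test set:", X_test_scaled.shape)
```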
Training the model
We create an instance of the logistic regression model using the LogisticRegression() constructor. Here, we set the max_iter parameter to 200, which caps the number of iterations the solver may take to converge.
After initializing the model, we train it using the training data. The fit() method is called on the model object, where we pass the scaled training features X_train_scaled and corresponding labels y_train.
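A minimal sketch of the training step is shown below. It repeats the data preparation so that it runs on its own; the split parameters are assumed values, while max_iter=200 follows the text.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Prepare the data (test_size and random_state are assumed values).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# Create the model with an iteration cap of 200 and fit it to the training data.
model = LogisticRegression(max_iter=200)
model.fit(X_train_scaled, y_train)

# Inspect what the model learned: one weight per feature plus an intercept.
print("Coefficient array shape:", model.coef_.shape)
print("Intercept:", model.intercept_)
print("Iterations used by the solver:", model.n_iter_)
```

If the solver reaches max_iter before converging, scikit-learn emits a ConvergenceWarning; raising max_iter or scaling the features, as done here, are the usual remedies.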
Prediction and accuracy check
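We use the trained logistic regression model to make predictions on the scaled test features (X_test_scaled). Because the model was trained on scaled data, the test data must go through the same scaler first. The predict() method is applied to the model object with the test features as input, resulting in predicted class labels y_pred.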
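The prediction step can be sketched as follows. The split parameters are assumed values, and the comparison against predict_proba() is an addition for illustration: it shows that the predicted label corresponds to thresholding the positive-class probability at 0.5.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Prepare the data and train the model (split parameters are assumed values).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
model = LogisticRegression(max_iter=200).fit(X_train_scaled, y_train)

# Predicted class labels (0 or 1) for every test sample.
y_pred = model.predict(X_test_scaled)

# Probability of the positive class (label 1) for every test sample.
proba_pos = model.predict_proba(X_test_scaled)[:, 1]

# Thresholding the probability at 0.5 should reproduce the predicted labels.
y_from_proba = (proba_pos > 0.5).astype(int)

print("First 10 predicted labels:", y_pred[:10])
print("First 10 P(y=1):", np.round(proba_pos[:10], 3))
print("Agreement with 0.5 threshold:", np.mean(y_pred == y_from_proba))
```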
We compute the accuracy of the model predictions by comparing the predicted class labels with the actual class labels from the test set. The accuracy_score() function from scikit-learn is used to calculate the accuracy as the fraction of correctly predicted labels over the total number of samples.
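To illustrate what accuracy_score() computes, the sketch below uses a small set of made-up labels (not model output) and compares the library result with a manual count of correct predictions.

```python
from sklearn.metrics import accuracy_score

# Made-up labels for illustration only; they do not come from the dataset.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Manual computation: correctly predicted labels divided by total samples.
correct = sum(t == p for t, p in zip(y_true, y_pred))
manual_accuracy = correct / len(y_true)

# Library computation.
library_accuracy = accuracy_score(y_true, y_pred)

print("Correct predictions:", correct, "of", len(y_true))
print("Manual accuracy:", manual_accuracy)
print("accuracy_score:", library_accuracy)
```

Six of the eight made-up labels match, so both computations give 0.75.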
The classification report includes metrics such as precision, recall, F1-score, and support for each class, providing insights into the model's ability to correctly classify instances of each class.
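As a small worked example of these metrics, the sketch below runs scikit-learn's classification_report() on made-up labels (not model output). The output_dict=True option returns the same numbers as a dictionary, which makes individual values easy to check by hand.

```python
from sklearn.metrics import classification_report

# Made-up labels for illustration only: five positives and three negatives.
y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0]

# Human-readable text report.
print(classification_report(y_true, y_pred))

# Same metrics as a nested dictionary keyed by the class label as a string.
report = classification_report(y_true, y_pred, output_dict=True)

# Class 1: 5 true positives, 1 false positive, 0 false negatives.
print("Class 1 precision:", report["1"]["precision"])  # 5 / 6
print("Class 1 recall:", report["1"]["recall"])        # 5 / 5
# Class 0: 2 correct, 1 missed, none wrongly assigned to it.
print("Class 0 precision:", report["0"]["precision"])  # 2 / 2
print("Class 0 recall:", report["0"]["recall"])        # 2 / 3
print("Class 1 support:", report["1"]["support"])      # number of true class-1 samples
```

Precision answers "of the samples predicted as this class, how many were right?", recall answers "of the samples that truly belong to this class, how many were found?", and support is the number of true samples per class.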
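For the breast cancer test set, the output consists of the accuracy value and the per-class classification report; the exact numbers depend on how the data was split.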
Conclusion
In this tutorial, we learned how to perform binary classification using logistic regression on a binary dataset. We split the dataset into training and testing sets, scaled the feature data, trained a logistic regression model, and evaluated its performance on the test set. Logistic regression is a simple yet powerful algorithm for binary classification tasks, and it can be easily implemented using Scikit-Learn. The full source code is listed below.
Source code listing
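The listing below is a consolidated sketch of the steps covered in this tutorial, not necessarily the exact original code. The test_size and random_state values are assumptions, so the printed metrics may differ from run to run if you change them.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the Breast Cancer dataset.
data = load_breast_cancer()
X, y = data.data, data.target

# Split into training and testing sets (assumed split parameters).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize the features: fit on the training set, apply to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create and train the logistic regression model.
model = LogisticRegression(max_iter=200)
model.fit(X_train_scaled, y_train)

# Predict class labels for the scaled test data.
y_pred = model.predict(X_test_scaled)

# Evaluate the predictions.
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred, target_names=data.target_names))
```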