
Cat or not a cat with Logistic Regression: Part 1 - Understanding the Basics

Tejas Mahajan
@the_screenager

Binary classification, a fundamental concept in machine learning, involves categorizing data into one of two distinct classes: predicting whether an email is spam or not, whether an image is of a cat or not, or whether a tumor is malignant or benign.

In this blog post, we will explore the concept of binary classification, focusing on logistic regression. We will cover the mathematical foundations, practical implementation, and provide interactive examples to illustrate the concepts.

*Logistic regression banner, generated with xAI Grok*

What is Binary Classification?

Let's say we have a $64 \times 64$ image of a cat and we want to classify it as either a cat or not a cat. This is a binary classification problem, where the two classes are "cat" and "not cat". The goal is to develop a model that can accurately predict the class of new images based on the features extracted from them.

*(Interactive demo: click to decompose the image into its R, G, and B channels, 12,288 values in total.)*

Every image can be represented as a vector of pixel values. For example, a $64 \times 64$ image can be represented as a vector of $64 \times 64 \times 3 = 12288$ values, where each value represents the intensity of a pixel in one of the three color channels (red, green, blue).

So, the feature vector for a $64 \times 64$ image can be represented as:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_{12288} \end{bmatrix}$$

where $x_i$ is the intensity of the $i^{th}$ pixel in the image. Let $n = n_x = 12288$ be the number of features (pixel values), so the feature vector $\mathbf{x}$ is an $n_x \times 1$ vector.
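In NumPy, this flattening is a one-line reshape. A minimal sketch, using a random array as a stand-in for a real image:

```python
import numpy as np

# A random 64x64 RGB image standing in for a real cat photo;
# each entry is a pixel intensity in [0, 255].
image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Flatten the 64x64x3 array into a single (12288, 1) feature vector x.
x = image.reshape(-1, 1).astype(np.float64)

print(x.shape)  # (12288, 1)
```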

The target variable can be represented as:

$$y = \begin{cases} 1 & \text{if the image is a cat} \\ 0 & \text{if the image is not a cat} \end{cases}$$

So if we have $m$ training examples in our dataset, we can represent the training set as:

$$\mathbf{X} = \begin{bmatrix} x^{(1)}_1 & x^{(2)}_1 & \cdots & x^{(m)}_1 \\ x^{(1)}_2 & x^{(2)}_2 & \cdots & x^{(m)}_2 \\ \vdots & \vdots & \ddots & \vdots \\ x^{(1)}_{n_x} & x^{(2)}_{n_x} & \cdots & x^{(m)}_{n_x} \end{bmatrix}$$

where $\mathbf{X} \in \mathbb{R}^{n_x \times m}$ is the feature matrix with `X.shape=(n_x, m)`, and $m$ is the number of training examples. The target variable can be represented as:

$$\mathbf{Y} = \begin{bmatrix} y^{(1)} & y^{(2)} & \cdots & y^{(m)} \end{bmatrix}$$

where $\mathbf{Y} \in \mathbb{R}^{1 \times m}$ is the target vector with `Y.shape=(1, m)`, containing the output label for each training example.
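These shapes can be sketched in NumPy, again with random arrays standing in for real images and made-up labels:

```python
import numpy as np

m = 5              # number of training examples (toy value)
n_x = 64 * 64 * 3  # 12288 features per image

# A stack of m random images standing in for a real dataset.
images = np.random.randint(0, 256, size=(m, 64, 64, 3), dtype=np.uint8)

# Each column of X is one flattened example, giving X shape (n_x, m).
X = images.reshape(m, -1).T.astype(np.float64)

# One label per example: 1 = cat, 0 = not cat (made-up labels).
Y = np.array([[1, 0, 1, 1, 0]])

print(X.shape)  # (12288, 5)
print(Y.shape)  # (1, 5)
```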

Logistic Regression

Logistic regression is a statistical method used for binary classification. It models the probability that a given input belongs to a particular class. The logistic regression model uses the logistic function (also known as the sigmoid function) to map any real-valued number into the range of 0 and 1.

In simple words: given $\mathbf{x}$ (the input image), we want $\hat{y} = P(y = 1 \mid \mathbf{x})$, the probability that the image is a cat given its pixel values.

Here, $\mathbf{x} \in \mathbb{R}^{n_x}$ and $0 \le \hat{y} \le 1$, and we have parameters $w \in \mathbb{R}^{n_x}$ and $b \in \mathbb{R}$, where $w$ is the weight vector and $b$ is the bias term.

The output is

$$\hat{y} = \sigma(w^T \mathbf{x} + b), \text{ where } \sigma(z) = \frac{1}{1 + e^{-z}}$$

is the sigmoid function. The output $\hat{y}$ is interpreted as the probability that the input $\mathbf{x}$ belongs to class 1 (cat). If $\hat{y} \ge 0.5$, we classify the input as class 1 (cat); otherwise we classify it as class 0 (not cat).

*(Interactive demo: the sigmoid function $\sigma(z) = \frac{1}{1 + e^{-z}}$ maps any input to a value between 0 and 1, making it ideal for binary classification.)*
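The forward pass described above can be sketched in NumPy. This is a minimal illustration, not the full model (training comes in the next post), and the zero-initialized parameters are just a placeholder:

```python
import numpy as np

def sigmoid(z):
    """Map any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, X):
    """Probabilities and 0/1 labels for each column of X.

    w: (n_x, 1) weight vector, b: scalar bias, X: (n_x, m) feature matrix.
    """
    y_hat = sigmoid(w.T @ X + b)         # (1, m) probabilities
    labels = (y_hat >= 0.5).astype(int)  # threshold at 0.5
    return y_hat, labels

# With zero weights and bias, z = 0 for every example, so
# sigmoid(0) = 0.5 and each example is classified as class 1.
n_x, m = 12288, 3
w = np.zeros((n_x, 1))
b = 0.0
X = np.random.randn(n_x, m)
probs, labels = predict(w, b, X)
print(probs)   # [[0.5 0.5 0.5]]
print(labels)  # [[1 1 1]]
```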

This is the basic idea of logistic regression. In the next post, we will discuss the cost function, gradient descent optimization, and implement a complete logistic regression model from scratch.

Stay tuned for the next part of this series!