The rising occurrence of melanoma has led to the development of computer-aided diagnosis systems for classifying dermoscopic images. The PH² dataset, a dermoscopic image database, was created to enable comparative studies on segmentation and classification algorithms.
Dataset:
The dataset contains 200 images, each with its lesion mask, divided into 3 classes (80 common nevi, 80 atypical nevi, and 40 melanomas).
This code is a Python script for detecting skin lesions and classifying them into three types: Common Nevus, Atypical Nevus, and Melanoma. The script imports necessary libraries, defines functions to process and analyze images, and extracts various features from the images. These features include color values, symmetry, irregularity, and average phase.
The dataset used in this script is the PH2 dataset, and the file paths are specific to Google Colab's file structure. The script reads in the dataset, processes the images, and extracts features for each image. The features are then used to calculate average values for each class. The code also visualizes the data in 3D scatter plots for better understanding of the differences between classes.
At the end, the script calculates and prints the differences between the average values of each class to provide insight into the classification process.
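As one way to make the 3D scatter-plot step above concrete, here is a minimal sketch; the function name, the `features_by_class` layout, and the three axis labels are illustrative assumptions, not the exact code from the script.

```python
# Hedged sketch of the 3D scatter-plot visualisation described above.
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (enables the 3D projection)

def plot_three_features(features_by_class,
                        axis_labels=("symmetry", "irregularity", "avg phase")):
    """features_by_class: {class_name: list of (f1, f2, f3) tuples}."""
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    for class_name, points in features_by_class.items():
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        zs = [p[2] for p in points]
        ax.scatter(xs, ys, zs, label=class_name)
    ax.set_xlabel(axis_labels[0])
    ax.set_ylabel(axis_labels[1])
    ax.set_zlabel(axis_labels[2])
    ax.legend()
    plt.show()
```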
Read the file “PH2_dataset.txt” and extract the classes and all the images belonging to each class. It returns a dictionary of {class: [files of that class]}.
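A minimal sketch of that parsing step is below. The exact column layout of PH2_dataset.txt is an assumption here (image name in the first field, a numeric class label in the next non-empty field); adjust the field indices to the real file.

```python
# Hedged sketch: build {class_name: [image names]} from PH2_dataset.txt.
def read_ph2_classes(path="PH2_dataset.txt"):
    class_names = {0: "Common Nevus", 1: "Atypical Nevus", 2: "Melanoma"}
    classes = {name: [] for name in class_names.values()}
    with open(path) as f:
        for line in f:
            fields = [field.strip() for field in line.split("||") if field.strip()]
            if not fields or not fields[0].startswith("IMD"):
                continue  # skip header / separator rows
            image_name = fields[0]
            label = int(fields[1])  # assumed position of the class label; adjust to the real columns
            classes[class_names[label]].append(image_name)
    return classes
```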
The file is available on my GitHub for your use.
At the start I implemented a lot of candidate parameters, then judged each parameter on the basis of its quality (the number of correct classifications it makes). I then removed the bad parameters and kept only the good ones; there is a lot of room for improvement in this part.
It’s a nested loop that visits each image, applies the parameter-extraction techniques to it, and at the end calculates the average of the parameters for each class. We can see that the separation between Atypical and Common Nevus is not that good.
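A hedged sketch of that nested loop is below, assuming a `classes` dictionary like the one from the reader above and an `extract_features` callable that returns one feature vector per image; the directory layout and the `.bmp` extension are assumptions.

```python
import cv2
import numpy as np

# Sketch: per-class feature averages from a nested loop over classes and images.
def class_averages(classes, image_dir, extract_features):
    """classes: {class_name: [image names]}; extract_features(image) -> list of numbers."""
    averages = {}
    for class_name, image_names in classes.items():
        feature_rows = []
        for name in image_names:
            image = cv2.imread(f"{image_dir}/{name}.bmp")  # assumed path layout
            if image is None:
                continue  # skip missing files
            feature_rows.append(extract_features(image))
        averages[class_name] = np.mean(feature_rows, axis=0)
    return averages
```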
These are the 16 parameters I used in my final code; all the other parameters were giving errors. The classification is based on a voting mechanism rather than distance, so I can check the quality of each parameter separately.
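Below is a sketch of one plausible form of the voting rule (the exact rule in the script may differ): each parameter votes for the class whose class-average is closest to the image’s value, and the class with the most votes wins.

```python
from collections import Counter

# Sketch of a per-parameter voting classifier.
def classify_by_voting(feature_vector, class_averages):
    """feature_vector: the 16 parameter values for one image.
    class_averages: {class_name: list of per-parameter averages}."""
    votes = Counter()
    for i, value in enumerate(feature_vector):
        best_class = min(class_averages,
                         key=lambda cls: abs(class_averages[cls][i] - value))
        votes[best_class] += 1
    return votes.most_common(1)[0][0]
```

Because each parameter votes independently, counting how often an individual parameter’s vote matches the true class gives the per-parameter quality score mentioned earlier.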
This function splits the image at the vertical center and the horizontal center, mirrors one half, and XORs it with the other half, which gives all the points that are not symmetric about the given axis.
By XOR-ing we can calculate the amount of curvature of the mole, and the number of white pixels defines the shape.
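A minimal sketch of that symmetry measure, assuming the input is a binary lesion mask; the function name and the use of OpenCV’s `flip`/`bitwise_xor` are illustrative choices, not necessarily the exact calls in the script.

```python
import cv2
import numpy as np

# Sketch: count pixels that are not mirror-symmetric about a chosen axis.
def asymmetry_score(mask, axis="vertical"):
    h, w = mask.shape[:2]
    if axis == "vertical":
        left, right = mask[:, : w // 2], mask[:, w - w // 2 :]
        mirrored = cv2.flip(right, 1)        # mirror about the vertical center
        diff = cv2.bitwise_xor(left, mirrored)
    else:
        top, bottom = mask[: h // 2, :], mask[h - h // 2 :, :]
        mirrored = cv2.flip(bottom, 0)       # mirror about the horizontal center
        diff = cv2.bitwise_xor(top, mirrored)
    return int(np.count_nonzero(diff))       # number of non-symmetric (white) pixels
```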
All the color operations are handled within this function: it takes the image, splits it into BGR channels, then calculates the histogram of each channel and finds the average, mode, and standard deviation of the given image.
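A sketch of that color step is below; the function name and the return layout (mean, mode, standard deviation per channel) are illustrative assumptions.

```python
import cv2
import numpy as np

# Sketch: per-channel mean, mode and standard deviation from the BGR histograms.
def color_features(image):
    features = []
    for channel in cv2.split(image):  # B, G, R channels
        hist = cv2.calcHist([channel], [0], None, [256], [0, 256]).flatten()
        mean = float(channel.mean())
        mode = int(np.argmax(hist))   # most frequent intensity value
        std = float(channel.std())
        features.extend([mean, mode, std])
    return features
```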
Irregularity is computed by applying a threshold to the image and counting the white pixels, which gives the overall gray-level distribution over the image. Phase is calculated by applying the horizontal and vertical Sobel operators and returning the magnitudes and average phases of the image as the parameters.
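A sketch of the irregularity and phase computations under those descriptions; the threshold value (127) and the Sobel kernel size are assumptions.

```python
import cv2
import numpy as np

# Sketch: white-pixel count after thresholding, plus Sobel magnitude/phase averages.
def irregularity_and_phase(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)  # assumed threshold
    irregularity = int(np.count_nonzero(binary))                  # white-pixel count

    sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)  # horizontal gradients
    sobel_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)  # vertical gradients
    magnitude = np.sqrt(sobel_x ** 2 + sobel_y ** 2)
    phase = np.arctan2(sobel_y, sobel_x)
    return irregularity, float(magnitude.mean()), float(phase.mean())
```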
The code is sensitive to the parameters, so it may not reproduce the exact accuracy, but it ranges between 58-75% for different training and testing sets. The final accuracy over the whole dataset was 63.3%.
This was a really interesting and challenging task, and I would like to thank Dr Usman Akram for assigning it.