Brain-Tumor

The project focuses on analyzing tumor data using machine learning techniques, including logistic regression and random forest classification. This README file provides an overview of the project and explains each aspect of the code.

Introduction

In this project, we aim to analyze tumor data and predict the presence or absence of a tumor using machine learning algorithms. We employ logistic regression and random forest classification techniques to train models on the provided dataset. The project involves several steps, including data exploration, preprocessing, model training, and evaluation.

Data Overview

This is a brain tumor feature dataset including five first-order features and eight texture features with the target level (in the column Class).

Image: Image name
Class: value Tumor = 1 Non tumor =0
Mean: First order feature mean
Variance: First order feature variance
Standard Deviation: First order feature std deviation
Entropy: Second order feature entropy
Skewness: First order feature skewness
Kurtosis: First order feature kurtosis
Contrast: Second order feature contrast
Energy: Second order feature energy
ASM: Second order feature ASM
Homogeneity: Second order feature homogeneity
Dissimilarity: Second order feature dissimilarity
Correlation: Second order feature correlation
Coarseness: Second order feature coarness

Image column defines image name and Class column defines either the image has tumor or not (1 = Tumor, 0 = non-Tumor)

Installation

To run the code locally, please ensure you have the following dependencies installed:

pandas
seaborn
matplotlib
pandasgui
scikit-learn

You can install these dependencies using pip:

pip install pandas seaborn matplotlib pandasgui scikit-learn

Usage

To use the code, follow these steps:

Clone the repository or download the code files.
Make sure the required libraries are installed (see Installation section).
Place the tumor data file (tumor.csv) in the same directory as the code files.
Run the code in a Python environment or Jupyter Notebook.

Data

The tumor data is stored in a CSV file named tumor.csv. It contains information about different features of tumors and their corresponding classes (presence or absence of a tumor). The data is loaded into a pandas DataFrame and displayed using the pandasgui library, which provides an interactive interface for exploring the dataset.

Exploratory Data Analysis

The code begins with an exploratory data analysis section, where various aspects of the dataset are examined. The following operations are performed:

Displaying the loaded DataFrame using pandasgui to visualize the data.
Describing the dataset to obtain summary statistics.
Checking for missing values in the dataset.
Dropping the 'Coarseness' column from the DataFrame due to its negligible average and no effect on the dataset.
Creating a correlation matrix and generating a heatmap using seaborn to visualize the correlations between different features.

Data Preprocessing

In the data preprocessing section, the dataset is split into input features (X) and the target variable (y). The 'Class' and 'Image' columns are excluded from the input features. The data is further divided into training and testing sets using a 90:10 ratio.

Logistic Regression

A logistic regression model is created using a pipeline that includes standard scaling of the features and logistic regression as the classifier. Hyperparameter tuning is performed using grid search with cross-validation. The best hyperparameters are determined, and a new logistic regression model is trained with these optimal hyperparameters. The model is then evaluated using a classification report and a confusion matrix plot.

Random Forest Classification

A random forest classifier model is created using a pipeline that includes standard scaling of the features and a random forest classifier. Hyperparameter tuning is performed using grid search with cross-validation. The best model obtained from the grid search is selected. The model is evaluated using a classification report and a confusion matrix plot.

Results

The results of the logistic regression and random forest classification models are displayed using classification reports and confusion matrix plots. These results provide insights into the performance of the models in predicting tumor presence or absence.

Conclusion

In this project, we successfully analyzed tumor data using logistic regression and random forest classification. The models were trained and evaluated on the dataset, providing classification reports and confusion matrix plots as performance metrics. The code and results presented in this README can serve as a guide for future analysis or improvements in the field of tumor classification.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
LICENSE		LICENSE
README.md		README.md
classification.ipynb		classification.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Brain-Tumor

Table of Contents

Introduction

Data Overview

Installation

Usage

Data

Exploratory Data Analysis

Data Preprocessing

Logistic Regression

Random Forest Classification

Results

Conclusion

About

Releases

Packages

Languages

License

Nishef/Brain-Tumor

Folders and files

Latest commit

History

Repository files navigation

Brain-Tumor

Table of Contents

Introduction

Data Overview

Installation

Usage

Data

Exploratory Data Analysis

Data Preprocessing

Logistic Regression

Random Forest Classification

Results

Conclusion

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages