Predicts the breed of cats and dogs by combining Transformer models for feature extraction with deep learning classifiers whose hyperparameters are optimized using a Genetic Algorithm.


Bevinaa/Breed-Classification-of-Cats-and-Dogs-Using-Transformer-Architecture



Overview

This project introduces a deep learning framework for fine-grained classification of cat and dog breeds, a task made difficult by high inter-class similarity (between breeds) and intra-class variation (within the same breed). The pipeline uses recent computer vision backbones, specifically the Transformer-based Vision Transformer (ViT), Swin Transformer, DeiT, and BEiT, together with ConvNeXt (a convolutional network modernized along Transformer design lines), to extract high-level semantic features from pet images. Self-attention lets the Transformer models capture long-range dependencies alongside fine-grained details, which helps separate visually similar breeds.

The extracted features are fused and passed into several deep neural classifiers, with a focus on a customized Residual Multi-Layer Perceptron (Residual MLP) architecture. This classifier includes skip connections inspired by ResNet, which facilitate stable and efficient training of deeper models by preserving gradient flow.
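As a rough illustration of the skip-connection idea, a residual MLP over fused features could look like the sketch below. The hidden width, number of blocks, and the use of BatchNorm and ReLU are assumptions made for the example, not the repository's actual configuration; the class names `ResidualBlock` and `ResidualMLP` are hypothetical.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Two linear layers wrapped in a skip connection: out = act(x + f(x))."""

    def __init__(self, dim: int, dropout: float) -> None:
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim),
            nn.BatchNorm1d(dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(dim, dim),
            nn.BatchNorm1d(dim),
        )
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The identity path lets gradients bypass the block's layers.
        return self.act(x + self.body(x))


class ResidualMLP(nn.Module):
    """Projects fused features to a hidden width, applies residual blocks, classifies."""

    def __init__(
        self,
        in_dim: int,
        num_classes: int,
        hidden_dim: int = 512,   # assumed width
        num_blocks: int = 2,     # assumed depth
        dropout: float = 0.3,    # assumed default
    ) -> None:
        super().__init__()
        self.stem = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.blocks = nn.Sequential(
            *[ResidualBlock(hidden_dim, dropout) for _ in range(num_blocks)]
        )
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.blocks(self.stem(x)))
```

Here `in_dim` would be the width of the fused feature vector and `num_classes` the number of breeds in the dataset; the output is one logit per breed.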

To further improve classification performance and generalization, we employ a Genetic Algorithm (GA) for hyperparameter optimization. This metaheuristic searches the hyperparameter space (learning rate, dropout rate, and weight decay) for the configuration that gives the best validation accuracy. The tuned configuration improves validation accuracy while limiting overfitting.


Key Features

  • Transformer-based feature extraction using:

    • ViT-B/16
    • Swin Transformer
    • BEiT
    • DeiT
    • ConvNeXt
  • Classifier Architectures:

    • Basic MLP
    • Deep MLP
    • Residual MLP (Best Accuracy)
    • ResNet-style, EfficientNet-style, DenseNet-style classifiers
  • Metaheuristic Optimization:

    • Genetic Algorithm (GA) for hyperparameter tuning
  • Robust Evaluation:

    • Accuracy, Precision, Recall, F1-score, Confusion Matrix
    • CAM-based feature visualization
    • t-SNE plots for class separability
    • Edge-based shape and texture analysis
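The listed metrics can all be computed with scikit-learn. The sketch below assumes macro averaging across breeds, which the source does not specify; `summarize` is a hypothetical helper name.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)


def summarize(y_true, y_pred):
    """Return accuracy, macro precision/recall/F1, and the confusion matrix."""
    # Macro averaging weights every breed equally (an assumption for this sketch).
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Rows are true classes, columns are predicted classes.
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
```

Calling `summarize(val_labels, val_predictions)` on integer-encoded labels returns a dictionary that can be printed directly or passed to a Seaborn heatmap for the confusion matrix.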

Visual Architecture

image

Figure: Proposed Feature Extraction and Classification Pipeline


Dataset

  • Oxford-IIIT Pet Dataset
    • 7,349 images across 37 cat and dog breeds
    • Approx. 200 images per class
    • Includes bounding boxes, labels, and segmentation masks
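Breed labels can be recovered from the image file names. The sketch below assumes the dataset's usual naming scheme, where a file such as `Abyssinian_100.jpg` carries the breed before the final underscore and cat breeds start with a capital letter while dog breeds are lowercase. Both helper names are hypothetical.

```python
from pathlib import Path


def parse_pet_filename(filename: str) -> tuple[str, str]:
    """Split e.g. 'american_bulldog_10.jpg' into ('american_bulldog', 'dog')."""
    stem = Path(filename).stem
    # Everything before the last underscore is the breed; the rest is an index.
    breed, _, _index = stem.rpartition("_")
    # Assumed convention: capitalized breed names are cats, lowercase are dogs.
    species = "cat" if breed[0].isupper() else "dog"
    return breed, species


def build_label_encoder(breeds) -> dict[str, int]:
    """Map each distinct breed name to a stable integer id."""
    return {breed: idx for idx, breed in enumerate(sorted(set(breeds)))}
```

Sorting before enumerating keeps the integer ids reproducible across runs, which matters when features and labels are saved to disk separately.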

Tools and Technologies

  • Python, PyTorch
  • NumPy, Scikit-learn, Matplotlib, Seaborn
  • DEAP (for Genetic Algorithm)
  • Pretrained models from timm or huggingface

How It Works

  1. Preprocessing
    • Resize to 224×224
    • Normalize using ImageNet mean and std
    • Label encode breed classes

image
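The resize and normalization steps can be sketched with plain PyTorch tensor operations. This is an illustration only: it uses bilinear interpolation without antialiasing, so it will not be pixel-identical to a torchvision transform pipeline, and the function name `preprocess` is hypothetical. The mean and standard deviation are the standard ImageNet statistics.

```python
import torch
import torch.nn.functional as F

# Standard ImageNet channel statistics, shaped for broadcasting over (C, H, W).
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def preprocess(image_hwc_uint8: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Convert an (H, W, 3) uint8 image to a normalized (3, size, size) float tensor."""
    # Channels first, scaled to [0, 1].
    x = image_hwc_uint8.permute(2, 0, 1).float() / 255.0
    # interpolate expects a batch dimension, so add and remove one.
    x = F.interpolate(
        x.unsqueeze(0), size=(size, size), mode="bilinear", align_corners=False
    ).squeeze(0)
    return (x - IMAGENET_MEAN) / IMAGENET_STD
```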

  2. Feature Extraction
    • Use pretrained transformer backbones
    • Concatenate features to form a composite matrix of shape (7349 × 4352)

image
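The concatenation step amounts to stacking per-backbone feature matrices side by side. In the sketch below the per-backbone widths are an assumption: the source gives only the total of 4352, and the values shown are one combination of common base-model widths that sums to it. With timm, pooled features of this kind are typically obtained by creating a model with `num_classes=0`; random tensors stand in for them here so the example runs offline.

```python
import torch

# Assumed feature widths per backbone (not stated in the source); they sum to 4352.
FEATURE_DIMS = {"vit": 768, "swin": 1024, "beit": 768, "deit": 768, "convnext": 1024}


def fuse_features(per_backbone: dict[str, torch.Tensor]) -> torch.Tensor:
    """Concatenate (N, d_i) feature matrices along the feature axis."""
    row_counts = {feats.shape[0] for feats in per_backbone.values()}
    if len(row_counts) != 1:
        raise ValueError("every backbone must describe the same set of images")
    # Iterate in a fixed order so columns line up across runs.
    return torch.cat([per_backbone[name] for name in FEATURE_DIMS], dim=1)


# Stand-in features for 8 images; real features would come from the backbones.
demo = {name: torch.randn(8, dim) for name, dim in FEATURE_DIMS.items()}
fused = fuse_features(demo)  # shape: (8, 4352)
```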

  3. Classification
    • Train classifiers on extracted features
    • Residual MLP yields highest validation accuracy (96.41%)

image
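Because the backbones are only used as fixed feature extractors, training a classifier reduces to fitting a small network on a feature matrix. The loop below is a minimal sketch that assumes AdamW and cross-entropy loss, neither of which is confirmed by the source, and uses full-batch updates for brevity where a real run would more likely use mini-batches. The function names are hypothetical.

```python
import torch
from torch import nn


def train_classifier(model, features, labels, lr=1e-3, weight_decay=1e-4, epochs=20):
    """Full-batch training on precomputed features; returns the loss per epoch."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    criterion = nn.CrossEntropyLoss()
    history = []
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()
        history.append(loss.item())
    return history


@torch.no_grad()
def accuracy(model, features, labels) -> float:
    """Fraction of rows whose highest-scoring class matches the label."""
    model.eval()
    predictions = model(features).argmax(dim=1)
    return (predictions == labels).float().mean().item()
```

Validation accuracy would be obtained by calling `accuracy` on a held-out split of the feature matrix that was not passed to `train_classifier`.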

  4. Optimization
    • GA tunes learning rate, dropout, and weight decay
    • Best configuration:
      • LR: 0.00138 | Dropout: 0.2869 | WD: 0.00025
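The general shape of such a search is sketched below using only the Python standard library, so that it runs without extra dependencies; the project itself lists DEAP, and this is not its implementation. The search bounds, population size, operators, and the `toy_fitness` function are all illustrative assumptions. In a real run the fitness function would train a classifier with the candidate hyperparameters and return its validation accuracy.

```python
import random

# Illustrative search bounds (assumptions, not the project's settings).
BOUNDS = {"lr": (1e-5, 1e-2), "dropout": (0.0, 0.5), "weight_decay": (1e-6, 1e-2)}


def random_individual(rng):
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one parent at random.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}


def mutate(ind, rng, rate=0.3, scale=0.1):
    child = dict(ind)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            child[k] += rng.gauss(0.0, scale * (hi - lo))
            child[k] = min(hi, max(lo, child[k]))  # clamp back into bounds
    return child


def tournament(pop, scores, rng, k=3):
    # Pick k random individuals and return the fittest of them.
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def run_ga(fitness, pop_size=20, generations=15, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        best_history.append(max(scores))
        children = [
            mutate(
                crossover(tournament(pop, scores, rng), tournament(pop, scores, rng), rng),
                rng,
            )
            for _ in range(pop_size - 1)
        ]
        pop = [elite] + children  # elitism: the best individual always survives
    scores = [fitness(ind) for ind in pop]
    return pop[max(range(pop_size), key=lambda i: scores[i])], best_history


def toy_fitness(ind):
    # Hypothetical stand-in for validation accuracy: peaks at an arbitrary point.
    return -(
        (ind["lr"] - 1e-3) ** 2
        + (ind["dropout"] - 0.3) ** 2
        + (ind["weight_decay"] - 2e-4) ** 2
    )
```

Elitism is used here so that the best score never decreases from one generation to the next, which makes progress easy to monitor when each fitness evaluation is an expensive training run.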

Deep Learning Model Performance

| Classifier | Accuracy (%) |
|---|---|
| Basic MLP | 96.14 |
| Deep MLP | 96.35 |
| Residual MLP | 96.35 [HIGHEST] |
| MLP + Dropout/BatchNorm | 96.14 |
| DenseNet-style | 96.32 |
| EfficientNet-style | 96.28 |
| ResNet-style | 95.94 |

Evaluation Visuals

Class Activation Maps (CAM)

image

Figure: Class Activation Map showing important features of an Abyssinian cat

t-SNE Plot

image

Figure: Class-wise separability using t-SNE
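A two-dimensional embedding of this kind can be produced with scikit-learn's `TSNE`. The sketch below is a minimal example; the perplexity, PCA initialization, and random seed are assumptions, and `embed_2d` is a hypothetical helper name.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_2d(features: np.ndarray, perplexity: float = 30.0, seed: int = 0) -> np.ndarray:
    """Project (N, D) features to (N, 2) with t-SNE for plotting."""
    # Perplexity must be smaller than the number of samples.
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(features)
```

The two output columns can be passed to a Matplotlib scatter plot, colored by the integer breed label, to inspect how well the classes separate.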

Model Predictions

image

Prediction: Abyssinian | Confidence: 100.00%

image

Prediction: Beagle | Confidence: 99.98%
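A confidence percentage like the ones above is commonly the softmax probability of the top-scoring class. The source does not say how the score is computed, so the sketch below shows that common approach as an assumption; `predict_with_confidence` is a hypothetical name.

```python
import math


def predict_with_confidence(logits, class_names):
    """Return the top class name and its softmax probability as a percentage."""
    # Subtract the max logit for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], 100.0 * probs[best]
```

A confidence near 100% means the top logit exceeds every other logit by a wide margin; it says nothing on its own about whether the model is well calibrated.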


Future Work

  • AutoML tools (Optuna, Ray Tune)
  • Expand to wildlife and ecological datasets
  • Use of Explainable AI (Grad-CAM, SHAP, LIME)
  • Ensemble learning and data augmentation
  • Video-based temporal classification (CNN+LSTM)

How to Run

    git clone https://github.com/Bevinaa/Breed-Classification-of-Cats-and-Dogs-Using-Transformer-Architecture
    cd Breed-Classification-of-Cats-and-Dogs-Using-Transformer-Architecture

    # Run feature extraction
    python extract_features.py

    # Train classifier
    python train_classifier.py

    # Evaluate model
    python evaluate_model.py

    # Predict the breed
    python predict_pet.py

Contact

Bevina R.
Email: bevina2110@gmail.com
GitHub: Bevinaa

