Python, PyTorch, YOLOv11, Computer Vision, ML Engineering, OpenCV

Computer Vision Fashion Detection

I built a continuous machine learning system for real-time fashion detection across 10 garment categories. I trained custom YOLOv11 models on Kaggle fashion datasets with multi-device support (Apple Silicon MPS, NVIDIA CUDA, or CPU) and deployed real-time inference that combines garment detection with pose estimation.

Role

Solo Developer: Training pipeline, model evaluation, real-time inference

Duration

2 months

Timeline

2024

The Challenge

I set out to build a real-time clothing detection system that could identify which garments someone is wearing, locate them in the frame, and classify them by type. The core challenge split into two problems: sourcing quality labeled fashion data (Kaggle had garment type labels but almost nothing for style attributes), and building a training pipeline that could run on consumer hardware without requiring expensive cloud GPU time for every iteration.

The Stack

The system uses YOLOv11 for detection, PyTorch as the ML framework, and OpenCV for video processing. Training runs on Google Colab for cloud GPU access, with local fallback support for Apple Silicon (MPS), NVIDIA (CUDA), or CPU.

Software

Python 3.12
YOLOv11 (Ultralytics)
PyTorch (MPS / CUDA / CPU)
OpenCV (real-time video processing)
Google Colab (cloud GPU training)
Kaggle Datasets (10 garment classes)
Jupyter Notebooks (experimentation)
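
Ultralytics reads the class list and image folders from a small dataset YAML file. The sketch below generates one for the 10 garment classes this project detects. The folder layout, the class index order, and the `build_data_yaml` helper are assumptions for illustration; the real index order has to match the class ids in the Kaggle label files.

```python
from pathlib import Path

# The 10 garment classes named in this write-up. The index order is an
# assumption; it must match the class ids used in the dataset's label files.
GARMENT_CLASSES = [
    "jacket", "shirt", "pants", "dress", "skirt",
    "shorts", "hat", "sunglasses", "bag", "shoe",
]


def build_data_yaml(dataset_root, class_names=GARMENT_CLASSES):
    """Return the text of an Ultralytics-style dataset YAML file."""
    lines = [
        f"path: {Path(dataset_root).as_posix()}",  # dataset root folder
        "train: images/train",  # hypothetical layout, relative to path
        "val: images/val",
        "names:",
    ]
    # One "index: name" entry per class, indented under "names:".
    lines += [f"  {index}: {name}" for index, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file (for example a hypothetical `fashion.yaml`) gives something that can be passed as the `data=` argument when training.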

The Logic

The training script handles automatic device detection (MPS for Apple Silicon, CUDA for NVIDIA, or CPU fallback), checkpoint management to avoid redundant training, and model validation with mAP metrics. Because the device is selected at runtime, the same script trains the fashion model on any of these hardware environments without code changes.

Python

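import shutil

import torch
from ultralytics import YOLO

# PROJECT_ROOT, DATA_YAML, PRETRAINED_MODEL, TRAINED_MODEL_PATH, EPOCHS,
# BATCH_SIZE, and IMG_SIZE are module-level settings defined elsewhere
# in the script (not shown here).

def check_device():
    """Return the best available device: "mps", "cuda", or "cpu"."""
    if torch.backends.mps.is_available():
        device = "mps"
        print("[INFO] Using Apple Silicon GPU (MPS)")
    elif torch.cuda.is_available():
        device = "cuda"
        print(f"[INFO] Using torch {torch.__version__} "
              f"({torch.cuda.get_device_properties(0).name})")
    else:
        device = "cpu"
        print(f"[WARNING] Using torch {torch.__version__} "
              f"(CPU - slow)")
    return device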

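def train_fashion_model():
    """Train once, then reuse the saved weights on later runs."""
    # Checkpoint check: skip training if a trained model already exists.
    if TRAINED_MODEL_PATH.exists():
        print(f"[INFO] Trained model found at: "
              f"{TRAINED_MODEL_PATH}")
        return YOLO(str(TRAINED_MODEL_PATH))

    device = check_device()
    model = YOLO(PRETRAINED_MODEL)

    model.train(
        data=str(DATA_YAML),
        epochs=EPOCHS,
        batch=BATCH_SIZE,
        imgsz=IMG_SIZE,
        device=device,
        # Pin the output folder so the best.pt path below is predictable.
        project=str(PROJECT_ROOT / "runs" / "detect"),
        name="fashion_training",
        exist_ok=True,
        patience=50,
        save=True,
        plots=True,
        val=True,  # validate during training and report mAP
        amp=(device != "mps"),  # mixed precision is turned off on MPS
    )

    best_weights = (PROJECT_ROOT / "runs" / "detect"
        / "fashion_training" / "weights" / "best.pt")
    shutil.copy(best_weights, TRAINED_MODEL_PATH)
    # Return the best checkpoint, matching the early-return path above.
    return YOLO(str(TRAINED_MODEL_PATH))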
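
The training call validates as it trains (`val=True`), but the listing does not show a standalone evaluation step. A minimal sketch of one, assuming the Ultralytics `val()` API and its `box.map50` / `box.map` metrics, might look like this. `validate_model` and `print_report` are hypothetical helpers, not part of the original script.

```python
def validate_model(model, data_yaml, device="cpu"):
    """Run validation and return the headline detection metrics.

    `model` is expected to be an ultralytics.YOLO instance and `data_yaml`
    the same dataset file that was used for training.
    """
    metrics = model.val(data=str(data_yaml), device=device)
    return {
        "mAP50": float(metrics.box.map50),   # mAP at IoU 0.50
        "mAP50-95": float(metrics.box.map),  # mAP averaged over IoU 0.50-0.95
    }


def print_report(scores):
    """Print the metrics in the same [INFO] style as the training script."""
    for name, value in scores.items():
        print(f"[INFO] {name}: {value:.3f}")
```

Passing the model returned by `train_fashion_model()` together with `DATA_YAML` and the detected device would give a quick before/after comparison between training runs.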

Failure Log

Three critical iterations shaped the final architecture.

v1: TensorFlow on Mac

Issue

Started training with TensorFlow on a MacBook Pro but consistently ran out of RAM during model training. Local training would crash after a few epochs, making iteration impossible.

Resolution

Migrated to Google Colab for cloud GPU access and switched to PyTorch/YOLOv11. This gave me access to high-memory GPUs and faster training cycles. Kept local inference capability for deployment.

v2: Real-time Performance

Issue

Real-time webcam inference was stuttering badly on the local machine. Frame rate dropped below usable levels with full-resolution frames.

Resolution

Downscaled the displayed video while keeping the model input size unchanged, and optimized the inference loop. Together these changes restored smooth real-time detection.
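
One plausible reading of this fix, sketched under assumptions: the model still runs at its usual input size, and only the annotated frame is shrunk before it is shown. The function names, the 640-pixel preview width, and the window title are illustrative choices, not the original implementation.

```python
def display_size(width, height, max_width=640):
    """Return a (width, height) for display that keeps the aspect ratio."""
    if width <= max_width:
        return width, height
    scale = max_width / width
    return max_width, max(1, round(height * scale))


def run_webcam(weights_path, imgsz=640, max_display_width=640):
    """Detect on captured frames but show a smaller preview window."""
    # Imported here so display_size() stays usable without these packages.
    import cv2
    from ultralytics import YOLO

    model = YOLO(weights_path)
    cap = cv2.VideoCapture(0)  # default webcam
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Model input size is set by imgsz, independent of the display.
            result = model.predict(frame, imgsz=imgsz, verbose=False)[0]
            annotated = result.plot()  # boxes, labels, confidence scores
            height, width = annotated.shape[:2]
            preview = cv2.resize(
                annotated,
                display_size(width, height, max_display_width),
                interpolation=cv2.INTER_AREA,
            )
            cv2.imshow("Fashion detection", preview)
            if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

Resizing only the preview keeps detection quality tied to `imgsz` while reducing the cost of drawing large windows on every frame.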

v3: Dataset Limitations

Issue

Public Kaggle datasets provided good garment type labels but lacked fine-grained style attribute data. This limited what the model could classify beyond basic clothing categories.

Resolution

Accepted this limitation for the proof of concept. A production version would need a proprietary dataset built from user-generated data with custom attribute labeling.

The Outcome

The final system trains in Colab and runs real-time inference locally from the terminal. Garment detection and human pose estimation run simultaneously on a webcam feed, and each detected garment is drawn with a bounding box and confidence score. The model detects 10 garment classes (jacket, shirt, pants, dress, skirt, shorts, hat, sunglasses, bag, shoe) reliably in well-lit environments, though style attribute classification remains unsolved without better training data. This project built my foundation in ML engineering: PyTorch training pipelines, cloud compute workflows, and the gap between 'model works in a notebook' and 'model runs in real-time on live video.'
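
As a rough illustration of running detection and pose estimation on the same frame, the sketch below calls two Ultralytics models in turn and layers their overlays. It assumes both models follow the Ultralytics call and `plot()` conventions; the pose checkpoint name `yolo11n-pose.pt` and both helper functions are assumptions, since the write-up does not say which pose model was used.

```python
def annotate_frame(frame, garment_model, pose_model):
    """Draw garment boxes first, then pose keypoints on the same image."""
    garments = garment_model(frame, verbose=False)[0]
    annotated = garments.plot()  # garment boxes with class and confidence

    pose = pose_model(frame, verbose=False)[0]
    # Draw the skeleton on top of the garment overlay; boxes=False hides
    # the person boxes so they do not clutter the garment boxes.
    return pose.plot(img=annotated, boxes=False)


def load_models(garment_weights, pose_weights="yolo11n-pose.pt"):
    """Load both models. The pose checkpoint name is an assumption."""
    from ultralytics import YOLO  # imported lazily; only needed at runtime

    return YOLO(garment_weights), YOLO(pose_weights)
```

Running the two models one after another roughly doubles the per-frame inference cost, which is one reason display downscaling and a lean loop matter for live video.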
