Hate Speech Detection Project

Dec 1, 2023

Overview

Built and compared machine learning models for automated hate speech detection on social media, implementing both traditional ML approaches and state-of-the-art transformer models.

Problem Statement

Hate speech on social media platforms:

  • Harms targeted individuals and communities
  • Spreads rapidly and is difficult to moderate manually
  • Requires automated detection at scale
  • Needs nuanced understanding of context and language

Dataset

Worked with a dataset of 40,000+ social media text samples containing:

  • Hate speech examples
  • Offensive but non-hateful content
  • Neutral content

The multi-class nature of the problem required careful handling of class boundaries and imbalanced distributions.

Approach

Model 1: TF-IDF + Naive Bayes (Baseline)

Implemented a traditional ML pipeline:

  1. Text Preprocessing

    • Tokenization
    • Stop word removal
    • Lemmatization
  2. Feature Extraction

    • TF-IDF vectorization
    • N-gram features (unigrams and bigrams)
  3. Classification

    • Multinomial Naive Bayes classifier
    • Fast training and inference
    • Interpretable feature importance
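The baseline pipeline above maps directly onto scikit-learn; a minimal sketch (the toy corpus and labels are illustrative, not from the project's dataset, and scikit-learn's built-in English stop-word list stands in for the full preprocessing step):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# TF-IDF over unigrams and bigrams, feeding a Multinomial Naive Bayes classifier.
baseline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), stop_words="english")),
    ("nb", MultinomialNB()),
])

# Tiny illustrative corpus covering the three classes from the dataset section.
texts = [
    "I hate group X they should disappear",  # hate speech
    "this movie was absolute garbage",       # offensive, non-hateful
    "the weather is lovely today",           # neutral
]
labels = ["hate", "offensive", "neutral"]

baseline.fit(texts, labels)
pred = baseline.predict(["what a lovely sunny day"])[0]
```

One advantage of this setup is interpretability: the fitted vectorizer's vocabulary plus the NB feature log-probabilities show which n-grams drive each class.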

Model 2: Fine-tuned BERT (Advanced)

Implemented a transformer-based approach:

  1. Pre-trained Model

    • BERT base model
    • Pre-trained on large text corpora
  2. Fine-tuning Strategy

    • Task-specific classification head
    • Learning rate scheduling
    • Early stopping to prevent overfitting
  3. Optimization

    • Weights & Biases hyperparameter sweep
    • Systematic search over learning rates, batch sizes, epochs
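The early-stopping rule from the fine-tuning strategy can be sketched framework-agnostically; the patience and loss values below are illustrative choices, not the project's actual settings:

```python
class EarlyStopping:
    """Stop training when validation loss fails to improve for `patience` epochs."""

    def __init__(self, patience: int = 2, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
losses = [0.9, 0.7, 0.72, 0.71, 0.75]  # improvement stalls after the second epoch
stopped_at = next(i for i, loss in enumerate(losses) if stopper.step(loss))
```

In a real fine-tuning loop, `step()` would be called once per epoch after evaluation, and the checkpoint from the best epoch restored on stop.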

Hyperparameter Optimization

Used Weights & Biases for:

  • Automated hyperparameter sweeps
  • Experiment tracking and comparison
  • Visualization of training metrics
  • Model version control

Key Hyperparameters Tuned

| Parameter     | Search Range | Optimal Value |
|---------------|--------------|---------------|
| Learning Rate | 1e-5 to 5e-5 | 2e-5          |
| Batch Size    | 16, 32       | 32            |
| Epochs        | 3-5          | 4             |
| Warmup Steps  | 0-500        | 100           |
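The search ranges in the table translate into a W&B sweep configuration along these lines (the search method and metric name are assumptions; the project may have used grid or Bayesian search and a different target metric):

```python
# Sweep configuration in the dict format accepted by wandb.sweep().
sweep_config = {
    "method": "random",  # assumption: could equally be "grid" or "bayes"
    "metric": {"name": "val_f1", "goal": "maximize"},  # hypothetical metric name
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 5e-5},
        "batch_size": {"values": [16, 32]},
        "epochs": {"values": [3, 4, 5]},
        "warmup_steps": {"min": 0, "max": 500},
    },
}

# Launching it would look like:
#   sweep_id = wandb.sweep(sweep_config, project="hate-speech-detection")
#   wandb.agent(sweep_id, function=train)
```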

Results

Performance Comparison

| Model             | Test Accuracy | F1 Score |
|-------------------|---------------|----------|
| Naive Bayes       | 72.3%         | 0.69     |
| BERT (Fine-tuned) | 79.5%         | 0.77     |

Key Findings

  • BERT outperformed the Naive Bayes baseline by a clear margin (+7.2 points test accuracy, +0.08 F1)
  • Early stopping was crucial to prevent overfitting
  • Class imbalance required careful handling through weighted loss
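The class weighting mentioned above typically means inverse-frequency weights passed to the loss function; a minimal stdlib sketch (the label counts are illustrative, and the formula matches scikit-learn's `compute_class_weight("balanced")`):

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: rarer classes get a larger loss weight,
    normalized so a perfectly balanced dataset yields weight 1.0 per class."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}

# Illustrative imbalanced label distribution (not the project's real counts).
labels = ["hate"] * 2 + ["offensive"] * 5 + ["neutral"] * 3
weights = class_weights(labels)
```

The resulting per-class weights would be passed to the loss (e.g. the `weight` argument of a cross-entropy loss) so that errors on the minority "hate" class cost more.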

Error Analysis

Conducted comprehensive error analysis revealing:

  • False Positives: Often triggered by offensive-but-not-hateful content
  • False Negatives: Missed subtle or coded hate speech
  • Challenging Cases: Sarcasm, context-dependent meaning
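Categorising errors like this usually starts from paired gold labels and predictions; a small helper over (text, gold, pred) triples (the example sentences are hypothetical, chosen to mirror the failure modes above):

```python
def bucket_errors(samples):
    """Split (text, gold, pred) triples into false positives and false
    negatives with respect to the 'hate' class."""
    false_pos = [t for t, gold, pred in samples if pred == "hate" and gold != "hate"]
    false_neg = [t for t, gold, pred in samples if gold == "hate" and pred != "hate"]
    return false_pos, false_neg

samples = [
    ("that referee is an idiot", "offensive", "hate"),   # FP: offensive but not hateful
    ("go back where you came from", "hate", "neutral"),  # FN: coded hate speech missed
    ("great match yesterday", "neutral", "neutral"),     # correct
]
fp, fn = bucket_errors(samples)
```

Reading through each bucket by hand is what surfaces patterns like the sarcasm and coded-language failures noted above.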

Documented Limitations

  • Difficulty with emerging slang and coded language
  • Context window limitations
  • Potential bias in training data

Research Paper

Documented all findings in a full research paper covering:

  • Literature review
  • Methodology
  • Experimental results
  • Error analysis
  • Future work recommendations

Skills Applied

  • BERT (Language Model)
  • Naive Bayes Classification
  • Hyperparameter Optimization
  • Natural Language Processing
  • Experiment Tracking (W&B)
  • Research Writing