## Overview
Built and compared machine learning models for automated hate speech detection on social media, implementing both a traditional ML baseline and a state-of-the-art transformer model.
## Problem Statement
Hate speech on social media platforms:
- Harms targeted individuals and communities
- Spreads rapidly and is difficult to moderate manually
- Requires automated detection at scale
- Needs nuanced understanding of context and language
## Dataset
Worked with a dataset of 40,000+ social media text samples labeled as one of three classes:
- Hate speech examples
- Offensive but non-hateful content
- Neutral content
The multi-class nature of the problem required careful handling of class boundaries and imbalanced distributions.
## Approach
### Model 1: TF-IDF + Naive Bayes (Baseline)
Implemented a traditional ML pipeline (see the sketch after these steps):
**Text Preprocessing**
- Tokenization
- Stop word removal
- Lemmatization
**Feature Extraction**
- TF-IDF vectorization
- N-gram features (unigrams and bigrams)
**Classification**
- Multinomial Naive Bayes classifier
- Fast training and inference
- Interpretable feature importance
### Model 2: Fine-tuned BERT (Advanced)
Implemented a transformer-based approach (see the fine-tuning sketch after these steps):
**Pre-trained Model**
- BERT base model
- Pre-trained on large text corpora
**Fine-tuning Strategy**
- Task-specific classification head
- Learning rate scheduling
- Early stopping to prevent overfitting
**Optimization**
- Weights & Biases hyperparameter sweep
- Systematic search over learning rates, batch sizes, epochs
## Hyperparameter Optimization
Used Weights & Biases (sketched below) for:
- Automated hyperparameter sweeps
- Experiment tracking and comparison
- Visualization of training metrics
- Model version control
### Key Hyperparameters Tuned
| Parameter | Search Range | Optimal Value |
|---|---|---|
| Learning Rate | 1e-5 to 5e-5 | 2e-5 |
| Batch Size | 16, 32 | 32 |
| Epochs | 3-5 | 4 |
| Warmup Steps | 0-500 | 100 |
## Results
### Performance Comparison
| Model | Test Accuracy | F1 Score |
|---|---|---|
| Naive Bayes | 72.3% | 0.69 |
| BERT (Fine-tuned) | 79.5% | 0.77 |
### Key Findings
- BERT outperformed the Naive Bayes baseline by 7.2 accuracy points and 0.08 F1
- Early stopping was crucial to prevent overfitting
- Class imbalance required careful handling through a class-weighted loss (sketched below)
## Error Analysis
Conducted a comprehensive error analysis (see the inspection sketch after this list), revealing:
- False Positives: Often triggered by offensive-but-not-hateful content
- False Negatives: Missed subtle or coded hate speech
- Challenging Cases: Sarcasm, context-dependent meaning
### Documented Limitations
- Difficulty with emerging slang and coded language
- Context window limitations (BERT truncates inputs beyond 512 tokens)
- Potential bias in the training data
## Research Paper
Documented all findings in a full research paper covering:
- Literature review
- Methodology
- Experimental results
- Error analysis
- Future work recommendations
## Skills Applied
- BERT (Language Model)
- Naive Bayes Classification
- Hyperparameter Optimization
- Natural Language Processing
- Experiment Tracking (W&B)
- Research Writing