Data Analyst Intern at Bluebono

Overview

At Bluebono, I designed and implemented a pipeline for loan-to-value estimation and market-score evaluation of residential properties. The project covered the full data science lifecycle, from project scoping and data acquisition through model deployment.

Key Contributions

Data Acquisition & Vendor Management

Directed dataset acquisition efforts by:

Evaluating multiple data vendors and their offerings
Negotiating contracts for optimal pricing and data access
Integrating a real-time data feed from the California Regional Multiple Listing Service (CRMLS) via the Trestle API

This established a reliable, up-to-date data pipeline for property information across California.
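
As a rough illustration of the incremental pull such a feed relies on, the sketch below builds an OData-style query URL for listings modified after a given time. It is not the production integration. The base URL is a placeholder, authentication is omitted, and the paging parameters are assumptions. `ModificationTimestamp` is a standard RESO Data Dictionary field, but whether this pipeline filtered on it is also an assumption.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode

# Placeholder only: the real Trestle base URL and auth flow are not given here.
BASE_URL = "https://example.invalid/odata"


def build_property_query(since: datetime, page_size: int = 1000, skip: int = 0) -> str:
    """Return an OData URL for Property records modified after `since`.

    `since` should be timezone-aware; it is converted to UTC before formatting.
    """
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    params = {
        "$filter": f"ModificationTimestamp gt {stamp}",
        "$orderby": "ModificationTimestamp asc",
        "$top": page_size,   # assumed page size
        "$skip": skip,       # assumed offset-based paging
    }
    # quote_via=quote encodes spaces as %20; "$" and ":" are left readable.
    query = urlencode(params, quote_via=quote, safe="$:")
    return f"{BASE_URL}/Property?{query}"


if __name__ == "__main__":
    print(build_property_query(datetime(2024, 1, 1, tzinfo=timezone.utc)))
```

Ordering by the modification timestamp lets a scheduled job resume from the last record it saw instead of re-downloading the full listing set.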

Large-Scale Data Engineering

Engineered and processed a dataset consisting of:

11.3 million property records
1,000+ market features per property

This required building robust ETL pipelines capable of handling data at scale while maintaining data quality and consistency.
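
A minimal sketch of the chunked extract-validate-load pattern this kind of pipeline typically uses, so that the full dataset never has to sit in memory at once. The column names (`property_id`, `list_price`), the validation rules, and the in-memory "load" step are hypothetical stand-ins, not the actual Bluebono schema or pipeline.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator


def read_in_chunks(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield successive lists of at most `size` rows from any row iterable."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def clean_chunk(chunk: list[dict], seen_ids: set[str]) -> list[dict]:
    """Drop rows with a missing id, an invalid price, or a duplicate id."""
    cleaned = []
    for row in chunk:
        pid = (row.get("property_id") or "").strip()
        try:
            price = float(row.get("list_price") or "")
        except ValueError:
            continue  # missing or non-numeric price
        # Reject empty ids, non-positive, NaN, or infinite prices, and repeats.
        if not pid or not (0 < price < float("inf")) or pid in seen_ids:
            continue
        seen_ids.add(pid)
        cleaned.append({"property_id": pid, "list_price": price})
    return cleaned


def run_etl(source, chunk_size: int = 50_000):
    """Stream a CSV file object through cleaning; return rows and row counts."""
    seen_ids: set[str] = set()
    loaded: list[dict] = []
    stats = {"read": 0, "kept": 0}
    for chunk in read_in_chunks(csv.DictReader(source), chunk_size):
        cleaned = clean_chunk(chunk, seen_ids)
        stats["read"] += len(chunk)
        stats["kept"] += len(cleaned)
        loaded.extend(cleaned)  # stand-in for a database or warehouse write
    return loaded, stats


if __name__ == "__main__":
    demo = io.StringIO("property_id,list_price\nA1,500000\nA2,\nA1,510000\n")
    print(run_etl(demo, chunk_size=2))
```

Tracking read and kept counts per run gives a simple data-quality signal: a sudden drop in the kept ratio usually points to an upstream schema or feed change.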

Feature Engineering & Dimensionality Reduction

Applied unsupervised feature clustering techniques to:

Group highly correlated variables together
Reduce dimensionality by 65%
Improve model interpretability without sacrificing predictive power

This approach made the resulting models more explainable for business stakeholders while maintaining strong performance.
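
One simple way to group correlated variables is sketched below: features whose absolute Pearson correlation meets a threshold are linked, connected groups become clusters (equivalent to single-linkage clustering), and one representative per cluster is kept. The specific algorithm, the 0.9 threshold, and the feature names are assumptions for illustration; the method actually used in the project is not specified here.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def cluster_features(columns: dict[str, list[float]], threshold: float = 0.9) -> list[list[str]]:
    """Group features linked by |correlation| >= threshold (connected components)."""
    names = list(columns)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                root_a, root_b = find(a), find(b)
                if root_a != root_b:
                    parent[root_b] = root_a

    clusters: dict[str, list[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


def pick_representatives(clusters: list[list[str]]) -> list[str]:
    """Keep the first feature of each cluster as its representative."""
    return [cluster[0] for cluster in clusters]


if __name__ == "__main__":
    # Hypothetical toy features; real inputs would be the market feature columns.
    features = {
        "sqft": [1000, 1500, 2000, 2500, 3000],
        "rooms": [2, 3, 4, 5, 6],
        "age": [30, 5, 22, 11, 40],
    }
    print(pick_representatives(cluster_features(features)))
```

Keeping a named representative per cluster, rather than a projected component as in PCA, is what preserves interpretability: each retained input is still an original, explainable variable.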

Technical Stack

Data Processing: Python, Pandas, SQL
APIs: Trestle API (CRMLS)
ML Techniques: Unsupervised Clustering, Feature Selection
Scale: 11.3M records, 1,000+ features per property

Impact

The pipeline I built enables Bluebono to provide accurate property valuations and market scores, helping clients make informed decisions in the competitive California real estate market.