Resume


213-248-9142 | qiang_zeng@berkeley.edu | Berkeley, CA | LinkedIn | GitHub

Experience

Cmind AI | Associate Data Scientist

May 2024 – Sep 2025

Architected and deployed modular machine-learning microservices for feature selection, class imbalance handling, and model training, enabling independent scaling and reducing production failure risk.
Enhanced the Earnings Per Share (EPS) Surprise Prediction system by implementing automated retraining pipelines and training XGBoost, SVM, and RNN models on 15+ years of financial data, achieving 90% quarterly forecast accuracy.
Led development of a customizable end-to-end sentiment-analysis pipeline combining FinBERT (a finance-domain BERT model) with OpenAI APIs to deliver tailored Q&A with explanations; designed Oracle Cloud ingestion for cost-efficient compatibility with legacy systems and AWS S3 storage to support a smooth migration to modern client-facing analytics workflows.
Leveraged MLflow for experiment tracking, model version control, and deployment monitoring, ensuring reproducibility, transparency, and compliance with internal MLOps standards.

Bluebono | Data Analyst Intern

Jun – Aug 2025

Designed and implemented a loan-to-value estimation and market-score evaluation pipeline for residential properties, covering the full workflow from project scoping and data acquisition to model deployment.
Directed dataset acquisition efforts by evaluating vendors, negotiating contracts, and integrating a real-time data feed from the California Regional Multiple Listing Service (CRMLS) via the Trestle API.
Engineered and processed 11.3M property records with over 1,000 market features; applied unsupervised feature clustering to group correlated variables, reducing dimensionality by 65% and improving model interpretability.

Guangfa Futures | Quantitative Analyst Intern

Jun – Aug 2023

Applied regression, time-series, and k-means clustering models to identify predictive patterns in futures markets.
Performed sensitivity analyses to quantify volatility impacts, developed monthly and quarterly forecasting models, and delivered actionable reports to support trading strategy decisions.

Education

University of California, Berkeley

Master of Analytics | Expected Aug 2026

University of Southern California

B.S. in Computer Science & B.A. in Applied and Computational Mathematics | May 2025
GPA: 3.94/4.0

Research and Project Experience

AI-Based Career Advisor

Jan – May 2025

Designed and developed an interactive AI-based tool that helps users plan and explore career paths, drawing on their self-reported skills and interests to surface relevant, real-time links; deployed a Streamlit UI for user interaction.
Utilized large language models (GPT APIs and Llama 3) and datasets totaling 1.3 million entries (O*NET, LinkedIn) for career-path recommendations and real-time skill-gap analysis, providing specific learning resources to bridge the identified gaps.

Super Computing In Pocket (SCIP) Lab

Jan 2024 – May 2025

Conducted research on privacy-preserving knowledge distillation for large language models, aligning attention and hidden-layer outputs between GPT-2 and DistilGPT2 and integrating Differentially Private SGD (DP-SGD) to strengthen data confidentiality without significant performance loss.
Re-engineered and optimized a text dataset distillation framework using Hugging Face models; redesigned data-loading pipelines and initiated a full algorithm reimplementation to enable integration with differential privacy mechanisms.

Hate Speech Detection Project

Aug – Dec 2023

Implemented and optimized Naive Bayes and BERT models for hate speech detection on social media, applying k-fold cross-validation, early stopping, and extensive preprocessing to achieve 92% accuracy on a dynamically generated dataset.

Technical Skills

Programming & Tools: Python (TensorFlow, PyTorch), SQL, R, C++, Java, MLflow, Git/GitHub, Streamlit, RESTful API integration (OpenAI, Trestle), Web Scraping, ETL/ELT pipelines

Machine Learning: SVM, Tree Ensembles (DT/RF/XGBoost), kNN, Naive Bayes, Unsupervised (k-Means, GMM, PCA), Deep Learning (LSTM, Transformers/GPT, BERT/FinBERT, seq2seq+attention, word2vec/doc2vec), GAT, Q-Learning

Cloud & Databases: AWS (S3, Glue), Oracle Cloud, SQL Databases (MySQL, Redshift, Oracle DB), NoSQL Databases (MongoDB)