Economist, Data Scientist

Hao-Che Hsu

I'm Howard, a Ph.D. candidate in Economics and a Noyce Fellow at the University of California, Irvine.

My research focuses on applied econometrics and the intersection of data science and machine learning.


  Work in Progress

"Decision Parity on Alternative Credit with Shape Constraints" ML
Abstract: We use Experian's proprietary alternative credit data to investigate classification algorithms with asymmetric losses. Using a deep neural network to assess the risk of loan applicants, we aim to understand algorithmic fairness in credit ratings and construct an optimal decision rule for loan inquiries.
"Understanding Household Choice of Leisure with Time Allocation and Expenditure Measurements" DS
Abstract: People engage in leisure activities, but leisure comes at a cost. Accounting for both time allocation and the price of leisure, we combine data from the American Time Use Survey (ATUS) and retail market scanners to investigate the geography of leisure. To gain insight into consumer behavior, we compare price indexes and Engel curves across leisure activities. We then estimate the elasticity and substitution patterns of leisure with double machine learning and causal forests.
"Choice and Backward Spillovers in the Film Industry" IO
Abstract: This study uses weekly video sales data and a demand model to investigate the factors that influence consumers' movie purchasing decisions and to recover the market's cost structure. The findings provide evidence of a director's backward spillover effect on sales. A counterfactual merger simulation then demonstrates the market's reaction to a structural adjustment.

  Data Visualization

Map of ALT-credit Loan Inquiries (ALT Loan Lab) Vue.js Highcharts

Please view this interactive map on desktop browsers.


"Product Level Hierarchy Classification with Transformer-based Clustering" ML
I use a BERT-based sentence transformer to embed product names. The products gathered from online stores are projected into the embedding space and grouped into finer COICOP categories to calculate price indexes. Unlike traditional sequential NLP models, the Transformer-based model evaluates all tokens in a sentence simultaneously. The inputs are represented by a vector of embeddings that incorporate position and attention information. The high-dimensional embeddings are reduced to ten principal components and clustered with the EM algorithm.
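As a rough illustration of the final clustering step, the sketch below runs the EM algorithm for a two-component Gaussian mixture. It is a toy example under stated assumptions, not the project's code: the data are simulated one-dimensional points rather than ten principal components of product embeddings, and the function names (`normal_pdf`, `em_gmm_1d`) and the initialization rule are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    # Crude initialization (an assumption): start the means at the extremes.
    means = [min(data), max(data)]
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    resp = []

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])

        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)  # guard against a collapsing component

    # Hard cluster labels taken from the responsibilities of the last E-step.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, variances, weights, labels


# Simulated data: two well-separated groups (purely illustrative).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(6.0, 1.0) for _ in range(200)]
means, variances, weights, labels = em_gmm_1d(data)
```

In a multivariate setting the same E-step and M-step apply, with mean vectors and covariance matrices replacing the scalar means and variances.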
"Community Detection on a Social Network" Graphical Models
This project investigates various community detection techniques using network data from the Hornet social platform. The approaches compared include Mixed Membership Stochastic Blockmodels and K-means clustering on both node edges and individual demographic features. The likelihoods of different sampling methods for the stochastic model are evaluated and the network topology is visualized using Gephi.
"Random Coefficient Logit Model with MCMC Algorithms" metrics
Numerous techniques have been developed to estimate the random coefficients logit model. Following an existing method, I modify the prior distribution assumption on the aggregate demand shocks and estimate demand by sequentially updating the market share inversion process with Gibbs and Metropolis-Hastings sampling methods. In particular, I present a practitioner's guide that details the implementation of the algorithms.
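For readers unfamiliar with the sampling step, here is a minimal sketch of a random-walk Metropolis-Hastings sampler. It is not the estimator described above: the target is a standard normal chosen only for illustration, and the names `log_target` and `metropolis_hastings`, the step size, and the burn-in length are all assumptions.

```python
import math
import random


def log_target(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def metropolis_hastings(log_density, n_samples, step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # The proposal is symmetric, so the acceptance ratio is the density ratio.
        log_alpha = log_density(proposal) - log_density(x)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = proposal
            accepted += 1
        samples.append(x)  # a rejected proposal repeats the current state
    return samples, accepted / n_samples


samples, acceptance_rate = metropolis_hastings(log_target, 20000)
kept = samples[2000:]  # discard burn-in draws
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
```

In a Metropolis-within-Gibbs scheme, an update of this kind would be applied to one parameter block at a time while the other blocks are held at their current draws.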
"Image Generation with an Introspective Deep Learning Algorithm" ML
This project re-implements the introspective variational autoencoder (IntroVAE) to synthesize realistic images. IntroVAE repurposes the inference model to additionally act as a discriminator, enabling the model to self-estimate differences between generated and real images in an adversarial manner. We replicate the results, deliver image quality comparable to that presented in the original research, and confirm the advantages of this model over standard VAEs and GANs.
"Natural Language Processing with three Supervised Learning Methods" ML
This project trains three machine learning models, a Naive Bayes classifier, a random forest classifier, and a convolutional neural network, to predict the sentiment of individual words or a review statement. The models are trained on IMDb movie reviews, with parameters tuned to attain the highest prediction accuracy.
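To make the first of these models concrete, the sketch below implements a multinomial Naive Bayes classifier with Laplace smoothing. It is only an illustration under assumptions: the four training sentences are made-up toy reviews rather than IMDb data, and the function names `train_naive_bayes` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Collect per-class word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        class_counts[label] += 1
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict(text, word_counts, class_counts, vocab):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior from class frequencies.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities for unseen class/word pairs.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy reviews, not IMDb data.
train = [
    ("a great and moving film", "pos"),
    ("great acting and a wonderful story", "pos"),
    ("a boring and terrible film", "neg"),
    ("terrible acting and a dull story", "neg"),
]
model = train_naive_bayes(train)
print(predict("a wonderful film", *model))
print(predict("dull and boring", *model))
```

Working in log space keeps the product of many small word probabilities from underflowing on longer reviews.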