Economist, Data Scientist

Hao-Che Hsu

I'm Howard, a Ph.D. candidate in Economics and a Noyce Fellow at the University of California, Irvine.

My research focuses on applied econometrics and the intersection of data science and machine learning.


  Work in Progress

"Decision Parity on Alternative Credit with Shape Constraints" ML
Abstract: In contrast to the traditional credit data that underlies standard credit reports, we use unique proprietary alternative credit data from Experian to study classification algorithms under asymmetric loss. Using a deep neural network to assess applicant risk, we examine algorithmic fairness in credit ratings and construct an optimal decision rule for loan inquiries.
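The paper's actual decision rule is not spelled out above; as an illustration, a standard cost-sensitive classification result says the Bayes-optimal cutoff on a predicted default probability depends only on the two misclassification costs. A minimal sketch, with hypothetical cost names and values:

```python
def optimal_threshold(cost_deny_good: float, cost_approve_bad: float) -> float:
    """Bayes-optimal cutoff on the predicted default probability.

    Denying a loan when p(default) exceeds this cutoff minimizes expected
    loss when the two misclassification costs differ.
    """
    return cost_deny_good / (cost_deny_good + cost_approve_bad)


def decide(p_default: float, cost_deny_good: float, cost_approve_bad: float) -> str:
    """Apply the cost-sensitive rule to one applicant's predicted risk."""
    cutoff = optimal_threshold(cost_deny_good, cost_approve_bad)
    return "deny" if p_default > cutoff else "approve"
```

For example, if approving a defaulter costs nine times as much as denying a good applicant, the cutoff falls from 0.5 to 0.1.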
"Understanding Household Choice of Leisure with Time Allocation and Expenditure Measurements" DS
Abstract: Everyone spends time on leisure, but leisure is not costless. We combine time-use data from the ATUS with scanner data from retail markets. Accounting for both time allocation and the price of leisure, we investigate the geography of leisure consumption with principal component analysis and compare price indexes and Engel curves across leisure activities. To study consumer behavior, we estimate leisure elasticities and substitution patterns with double machine learning and causal forests.
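Double machine learning rests on orthogonalization: residualize both the outcome and the treatment on the controls, then regress residual on residual. A minimal sketch using plain least squares as a stand-in for the ML nuisance learners (real applications use flexible learners with cross-fitting; all data below is synthetic):

```python
import numpy as np

def partial_out(y, X):
    """Residualize y on X by least squares (stand-in for an ML learner)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def dml_effect(y, t, X):
    """Orthogonalized (partialled-out) effect of treatment t on outcome y."""
    y_res, t_res = partial_out(y, X), partial_out(t, X)
    return float(t_res @ y_res / (t_res @ t_res))

# Synthetic data: true effect of t on y is 1.5, confounded by X.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
t = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)
y = 1.5 * t + X @ np.array([1.0, 0.4, -0.7]) + rng.normal(size=n)

effect = dml_effect(y, t, X)          # close to the true 1.5
naive = float(t @ y / (t @ t))        # biased by the confounding
```

The orthogonalized estimate recovers the true effect, while the naive regression of y on t alone is pulled away by the confounders.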
"Choice and Backward Spillovers in the Film Industry" IO
Abstract: There are wide discrepancies in consumer preferences between theatrical releases and home video. Rather than working with box office revenues, I use weekly video sales data and a demand model to investigate the factors that influence consumer purchasing decisions and to recover the cost structure of the market. In particular, I find evidence of a director's backward spillover effect. A counterfactual merger simulation demonstrates how the market responds to a structural adjustment.
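The demand model is not specified in detail above; as a generic illustration, the plain multinomial logit delivers closed-form market shares and own-price elasticities of the kind any merger simulation builds on (the utilities below are made up):

```python
import numpy as np

def logit_shares(delta):
    """Market shares in a multinomial logit with an outside good (delta_0 = 0)."""
    e = np.exp(delta)
    return e / (1.0 + e.sum())

def own_price_elasticity(alpha, price, share):
    """Own-price elasticity in the plain logit model: -alpha * p_j * (1 - s_j)."""
    return -alpha * price * (1.0 - share)

delta = np.array([1.0, 0.5, -0.2])  # hypothetical mean utilities of three titles
shares = logit_shares(delta)        # inside shares; the remainder is the outside good
```

Richer models (random coefficients, nests) relax the logit's restrictive substitution patterns, but the share and elasticity mechanics are the same.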

  Data Visualization

Map of ALT-credit Loan Inquiries (ALT Loan Lab) Vue.js Highcharts

Please view this interactive map in a desktop browser.


"Product Level Hierarchy Classification with Transformer-based Clustering" ML
I use a sentence transformer to embed product names with BERT models. Products collected from online stores are projected onto the embedding space and grouped into finer COICOP categories to compute price indexes. Unlike classic sequential NLP models, the Transformer-based model evaluates all sentence tokens simultaneously: the inputs are represented as embedding vectors after positional information is injected and attention is applied. The high-dimensional embeddings are reduced to ten principal components and clustered with the EM algorithm.
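The last two steps, dimension reduction and EM clustering, can be sketched without the BERT embeddings themselves. A minimal illustration on synthetic vectors: PCA via the SVD, followed by EM for a spherical Gaussian mixture (a simplification of the general full-covariance case):

```python
import numpy as np

def pca(X, k):
    """Project rows of X onto the top-k principal components via the SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def gmm_em(Z, k, iters=50):
    """EM for a spherical Gaussian mixture; returns hard cluster labels."""
    n, d = Z.shape
    # Farthest-point initialization of the k means.
    mu = [Z[0]]
    for _ in range(k - 1):
        dist = np.min([((Z - m) ** 2).sum(axis=1) for m in mu], axis=0)
        mu.append(Z[int(np.argmax(dist))])
    mu = np.array(mu)
    var = np.full(k, Z.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities under spherical Gaussians.
        d2 = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        logp = np.log(pi) - 0.5 * d * np.log(var) - d2 / (2.0 * var)
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and variances.
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r.T @ Z) / nk[:, None]
        d2 = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        var = np.maximum((r * d2).sum(axis=0) / (d * nk), 1e-9)
    return r.argmax(axis=1)
```

In the project the rows of `X` would be sentence-transformer embeddings; here any high-dimensional vectors with cluster structure will do.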
"Community Detection on a Social Network" Graphical Models
Working with network data from the Hornet social platform, this project compares community detection approaches, including Mixed Membership Stochastic Blockmodels and K-means clustering on node edges and individual demographic attributes. For the stochastic model, the likelihoods under different sampling methods are examined. Finally, the network topology is visualized with Gephi.
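For a concrete sense of the blockmodel likelihood being compared: given an adjacency matrix, community labels, and a block-probability matrix, the model treats each edge as an independent Bernoulli draw. A toy sketch (the graph and probabilities below are illustrative, not Hornet data):

```python
import numpy as np

def sbm_loglik(A, z, B):
    """Log-likelihood of undirected adjacency A under a stochastic blockmodel
    with community labels z and block-probability matrix B."""
    n = len(z)
    ll = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = B[z[i], z[j]]
            ll += np.log(p) if A[i, j] else np.log(1.0 - p)
    return ll

# Toy graph: two communities {0, 1} and {2, 3}, with edges only within them.
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])
B = np.array([[0.9, 0.1],
              [0.1, 0.9]])
```

Community detection then amounts to searching for the labeling that maximizes this likelihood; the mixed-membership variant replaces hard labels with per-node membership vectors.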
"Random Coefficient Logit Model with MCMC Algorithms" metrics
Numerous techniques have been developed to estimate the random coefficients logit model. Building on an established method, I modify the prior distribution assumed for the aggregate demand shocks and estimate demand by sequentially updating the market-share inversion with Gibbs and Metropolis-Hastings sampling. In particular, I present a practitioner's guide covering the details of the algorithms' implementation.
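The Metropolis-Hastings step can be sketched generically: propose a random-walk move and accept it with probability given by the posterior ratio. The target below is a standard normal stand-in, not the market-share-inversion posterior itself:

```python
import numpy as np

def metropolis_hastings(log_post, x0, n_draws, step=2.0, seed=0):
    """Random-walk Metropolis-Hastings for a scalar parameter: propose
    x' = x + step * eps, accept with probability min(1, post(x') / post(x))."""
    rng = np.random.default_rng(seed)
    x = x0
    draws = np.empty(n_draws)
    for t in range(n_draws):
        proposal = x + step * rng.normal()
        # Accept/reject on the log scale to avoid under/overflow.
        if np.log(rng.uniform()) < log_post(proposal) - log_post(x):
            x = proposal
        draws[t] = x
    return draws

# Stand-in target: a standard normal log-posterior for one coefficient.
draws = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_draws=20000)
```

In the Gibbs scheme, a step like this updates the blocks of parameters whose conditional posteriors have no closed form, while conjugate blocks are drawn directly.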
"Image Generation with an Introspective Deep Learning Algorithm" ML
This project re-implements the introspective variational autoencoder (IntroVAE) to synthesize realistic images. IntroVAE repurposes the inference model to act additionally as a discriminator, enabling the model to self-estimate the difference between generated and real images in an adversarial manner. Our replication delivers image quality comparable to that reported in the original paper and confirms the model's advantages over standard VAEs and GANs.
"Natural Language Processing with Three Supervised Learning Methods" ML
This project trains three machine learning models: a Naive Bayes Classifier, a Random Forest Classifier, and a Convolutional Neural Network to predict the sentiment of a specific word or review. The models are trained on the IMDb movie reviews dataset, with parameters tuned for the highest prediction accuracy.
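Of the three models, the Naive Bayes classifier is simple enough to sketch from scratch: multiply per-class word likelihoods (with Laplace smoothing) by the class prior and pick the larger posterior. The tiny corpus in the usage example is illustrative, not IMDb:

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model with Laplace smoothing."""
    classes = sorted(set(labels))
    priors = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        counts[c].update(doc.split())
    totals = {c: sum(counts[c].values()) for c in classes}
    vocab = {w for doc in docs for w in doc.split()}
    return priors, counts, totals, vocab

def predict_nb(model, doc):
    """Return the class with the highest log posterior for a document."""
    priors, counts, totals, vocab = model
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for w in doc.split():
            score += math.log((counts[c][w] + 1) / (totals[c] + len(vocab)))
        scores[c] = score
    return max(scores, key=scores.get)
```

Usage: `model = train_nb(["great fun movie", "awful boring plot"], ["pos", "neg"])`, then `predict_nb(model, "great acting")`. On real IMDb reviews the same pipeline applies, with tokenization and stop-word handling added.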