Email: sj693 at cornell dot edu

I am an Assistant Research Professor (non-tenure-track) in the Center for Data Science for Enterprise and Society (CDSES) at Cornell University. Broadly speaking, my academic focus is primarily centered on applied probability and discrete optimization problems inspired by real-world business scenarios. Specifically, my recent work revolves around *online controlled experiments*, often referred to as *A/B testing*. In terms of applications, I am interested in revenue mangement and healthcare.

My research has been recognized by the INFORMS Pierskalla Best Paper Award in Health Applications in 2021 and the Dantzig Dissertation Award in Operations Research and Management Science in 2022. Here is my CV.

I received my BS degree in Mathematics from Tsinghua University and MS degree from Stony Brook University where I worked with Joseph Mitchell and
Jie Gao. I earned my PhD degree in Algorithms, Combinatorics and Optimization (ACO) from CMU, advised by
R. Ravi and
Andrew Li.

**PhD Thesis. **
My doctoral research is primarily concentrated on *online learning and optimization* which studies strategies for making well-informed decisions with a *stream* of data. Here is a 25-page condensed version of my thesis:

Learning and Earning Under Noise and Uncertainty.

Committee: R. Ravi (Chair), Andrew A. Li, Alan Scheller-Wolf and Sridhar Tayur.

**Publications and Preprints**

- From Stream to Pool: Dynamic Pricing with Unit-Demand Customers.

*Titing Cui, Su Jia and Thomas Lavastida.

Under review

## Abstract

Dynamic pricing under uncertainty is often formulated as a multi-armed bandit model where a**stream**of customers arrive sequentially, each with an i.i.d. valuation drawn from an unknown distribution. However, this formulation is not entirely reflective of the real world. In numerous retail scenarios such as fashion clothing and electronic goods, customers with high valuations tend to make purchases earlier and subsequently leave the market, leading to a gradual*shift*in the valuation distribution towards the lower end. Thus motivated, we consider a single-item revenue maximization problem where a**pool**of unit-demand, non-strategic customers interact repeatedly with a seller. Each customer monitors the price intermittently according to an independent Poisson process and makes a purchase if she observes a price lower than her valuation, whereupon she leaves the market permanently. As the main challenge, the valuations of the customers are unknown but come from a given family of instances. We present an efficient algorithm that computes a non-adaptive policy whose expected revenue is at least 1/k times the optimum where k is the number of prices. This result is*optimal*: We show that there exists a family of instances on k prices such that no algorithm guarantees more than $1/k$ fraction of optimal revenue. Furthermore, the output of our algorithm also guarantees a $1/log ρ$ fraction of the optimum where ρ is the aspect ratio, i.e., the ratio of the highest and lowest prices. Finally, we propose a learn-then-earn type policy with n^3/4 regret, i.e., the loss due to not knowing the true valuations.**Key words:**revenue mangement, dynamic pricing, optimization under uncertainty, Poisson process, multi-armed bandits - Smooth Non-stationary Bandits.

Su Jia, Qian Xie, Nathan Kallus and Peter Frazier.

Preliminary version appeared in the proceedings of ICML'23.

Under review,*Operations Research*

## Abstract

In many applications of online decision making, the environment is non-stationary and it is therefore crucial to use bandit algorithms that handle changes. Most existing approaches are designed to protect against non-smooth changes, constrained only by total variation or Lipschitzness over time, where they guarantee T^2/3 regret. However, in practice environments are often changing smoothly, so such algorithms may incur higher-than-necessary regret in these settings and do not leverage information on the rate of change. We study a non-stationary two-arm bandit problem where we assume each arm's mean reward is a β-Holder function over (normalized) time, meaning it is (β- 1)-times Lipschitz-continuously differentiable. We show the first separation between the smooth and non-smooth regimes by presenting a policy with T^3/5 regret for β = 2. We complement this result by a T^{(β+1)/(2β+1)} lower bound for any integer β ≥ 1, which matches our upper bound for β = 2.**Key words:**non-stationary bandits, Holder class, non-parametric statistics, smoothness - Short-Lived High-Volume Multi-A(rmed)/B(andit) Testing.

Su Jia, Nishant Oli, Ian Anderson, Paul Duff, Andrew Li and R. Ravi.

Preliminary version appeared in the proceedings of ICML'23.

Under review,*Operations Research*

## Abstract

Modern platforms leverage randomized experiments to make informed decisions from a given set of items or ''treatments''. As a particularly challenging scenario, these treatments may (i) arrive in high volume, with thousands of new items being released per hour, and (ii) have short lifetime, say, due to the contents' transient nature or underlying non-stationarity that impels the learner to perceive the same item as distinct copies over time.

Thus motivated, we study a Bayesian multiple-play bandits problem that encapsulates the key features of multi-variate testing with a high-volume of short-lived treatments. In each round, a set of k actions (treatments) arrive, each available for $w$ rounds. Without knowing the mean reward of the actions, the learner selects a multiset of n actions and immediately observes their realized rewards. We aim to minimize the loss due to not knowing the mean rewards, averaged over instances generated from a given prior. We show that when k = O(n^ρ) for some constant ρ>0$, our proposed policy has n^{-\min {ρ, 1/2 + O(1/w)} loss on a sufficiently large class of prior distributions. We complement this result by showing that every policy suffers n^{-\min {ρ, 1/2}} loss on the same class of distributions.

We further validate the effectiveness of our policy through a large-scale field experiment on Glance, a content card-serving platform that faces exactly the above challenge. A simple variant of our policy outperforms their incumbent DNN-based recommender by 4 - 8% in user engagement.**Key words:**A/B testing, Bayesian bandits, field experiment, experimental design, recommender systems, ephemeral marketing, website optimization -
Markdown Pricing for Unknown Parametric Demand Models.

Su Jia, Andrew Li and R. Ravi.

Preliminary version appeared in the proceedings of NeurIPS'22.

Under review,*Management Science*

## Abstract

We consider a Continuum-Armed Bandit problem with an additional monotonicity constraint (or ''markdown'' constraint) on the actions selected. This problem faithfully models a natural revenue management problem, called ''markdown pricing'', where the objective is to adaptively reduce the price over a finite horizon to maximize the expected revenues. Prior to this work, a tight T^3/4 regret bound is known under minimal assumptions of unimodality and Lipschitzness in the reward function. This bound shows that markdown pricing is strictly harder than unconstrained dynamic pricing (i.e., without the monotonicity constraint), which admits T^2/3 regret under the same assumptions. However, in practice, demand functions are usually assumed to have certain functional forms (e.g. linear or exponential), rendering the demand learning easier and suggesting better regret bounds. In this work we introduce a concept, markdown dimension, that measures the complexity of a parametric family, and present optimal regret bounds that improve upon the previous T^3/4 bound under this framework.**Key words:**markdown pricing, multi-armed bandits, monotonicity, parametric demand model -
Toward A Liquid Biopsy: Greedy Approximation Algorithms for Adaptive Hypothesis Testing.

*Kyra Gan, Su Jia, Andrew Li and Sridhar Tayur.

Preliminary version appeared in the proceedings of NeurIPS'21

**Winner, INFORMS Pierskalla Best Paper Award 2021**

Major revision,*Management Science*## Abstract

This paper addresses a set of active learning problems that occur in the development of liquid biopsies via the lens of active sequential hypothesis testing (ASHT). In the problem of ASHT, a learner seeks to identify the true hypothesis from among a known set of hypotheses. The learner is given a set of actions and knows the random distribution of the outcome of any action under any true hypothesis. Given a target error \delta > 0, the goal is to sequentially select the fewest number of actions so as to identify the true hypothesis with probability at least 1-\delta. Motivated by applications in which the number of hypotheses or actions is massive (e.g., genomics-based cancer detection), we propose efficient greedy algorithms and provide the first approximation guarantees for ASHT, under two types of adaptivity. Both of our guarantees are independent of the number of actions and logarithmic in the number of hypotheses. We numerically evaluate the performance of our algorithms using both synthetic and real-world DNA mutation data, demonstrating that our algorithms outperform previously proposed heuristic policies by large margins.**Key words:**active learning, sequential hypothesis testing, approximation algorithms, cancer detection -
Conservative Price Experimentation: Markdown Pricing with Unknown Demand.

Su Jia, Andrew Li and R. Ravi.

**Egon Balas Award**for best CMU student Operations Research paper, 2020

Major revision,*Management Science*## Abstract

We consider a variant of the Continuum-Armed Bandit problem where the arm sequence is required to be non-increasing. This problem faithfully models a single-product revenue management problem where the objective is to*reduce*the price over a finite sales horizon to maximize expected revenue. A policy that satisfies this monotonicity constraint is often called a*markdown*policy. We focus on the scenario where the demand model is unknown. A policy's performance is measured by the*regret*, i.e., the revenue loss due to not knowing the true demand function. We first observe that to achieve sublinear regret, it is necessary to assume that the revenue function (defined as the product of the price and the mean demand) is unimodal and Lipschitz. We then present a markdown policy that explores prices monotonically before a certain stopping condition is satisfied, and show that it has L^{1/4} \min(n,m)^{3/4} regret for any inventory level m, number of time periods n, and Lipschitz constant L. This bound is nearly optimal: We prove that no markdown policy achieves o(L^{1/4}\min(m,n)^{3/4}) regret for all m, n. In contrast, under the same assumptions, the optimal regret is L^{1/3} n^{2/3}**without**monotonicity for m=\infty, which is asymptotically lower. We also consider a variant where price increases are allowed but subject a limited budget. We present a policy with O(log n) price increases and L^{1/3} \min(m,n)^{2/3} regret, which matches the optimal regret without monotonicity.**Key words:**Markdown Pricing, Multi-armed Bandits, Monotonicity, Unimodal Bandits -
Optimal Decision Tree and Submodular Ranking with Noisy Outcomes.

Su Jia, Fatemeh Navidi, Viswanath Nagarajan and R.Ravi.

Preliminary version appeared in the Proceedings of NeurIPS'19

Under review,*Journal of Machine Learning Research*## Abstract

A fundamental task in active learning involves performing a sequence of tests to identify an unknown hypothesis that is drawn from a known distribution. This problem, known as optimal decision tree induction, has been widely studied for decades and the asymptotically best-possible approximation algorithm has been devised for it. We study a generalization where certain test outcomes are noisy, even in the more general case when the noise is persistent, i.e., repeating a test gives the same noisy output, disallowing simple repetition as a way to gain confidence. We design new approximation algorithms for both the non-adaptive setting, where the test sequence must be fixed a-priori, and the adaptive setting where the test sequence depends on the outcomes of prior tests. Previous work in the area assumed at most a logarithmic number of noisy outcomes per hypothesis and provided approximation ratios that depended on parameters such as the minimum probability of a hypothesis. Our new approximation algorithms provide guarantees that are nearly best-possible and work for the general case of a large number of noisy outcomes per test or per hypothesis where the performance degrades smoothly with this number. Our results adapt and generalize methods used for submodular ranking and stochastic set cover. We evaluate the performance of our algorithms on two natural applications with noise: toxic chemical identification and active learning of linear classifiers. Despite our theoretical logarithmic approximation guarantees, our methods give solutions with cost very close to the information theoretic minimum, demonstrating the effectiveness of our methods.**Key words:**Optimal Decision Tree, Approximation Algorithms, Submodular Ranking, Stochastic Set Cover -
Effective Online Order Acceptance Policies for Omni-Channel Fulfillment.

*Su Jia, Jeremy Karp, R. Ravi and Sridhar Tayur.

*Manufacturing and Service Operations Management (M&SOM)*## Abstract

**Problem Definition**: Omni-channel retailing has led to the use of traditional stores as fulfillment centers for online orders. Omni-channel fulfillment problems have two components: (1) accepting a certain number of on-line orders prior to seeing store demands, and (2) satisfying (or filling) some of these accepted on-line demands as efficiently as possible with any leftover inventory after store demands have been met. Hence, there is a fundamental trade-off between store cancellations of accepted online orders and potentially increased profits due to more acceptances of online orders. We study this joint problem of online order acceptance and fulfillment (including cancellations) to minimize total costs, including shipping charges and cancellation penalties in single-period and limited multi-period settings.

**Academic/Practical Relevance**: Despite the growing importance of omni-channel fulfillment via online orders, our work provides the first study incorporating cancellation penalties along with fulfillment costs.

**Methodology**: We build a two-stage stochastic model. In the first stage, the retailer sets a policy specifying which online orders it will accept. The second stage represents the process of fulfilling online orders once the uncertain quantities of in-store purchases are revealed. We analyze two classes of threshold policies that accept online orders as long as the inventories are above a global threshold, or a local threshold per region.**Results**: Total costs are unimodal as a function of the global threshold, and unimodal as a function of a single local threshold holding all other local thresholds at constant values, motivating a gradient search algorithm. Reformulating as an appropriate linear program with network flow structure, we estimate the derivative (using infinitesimal perturbation analysis) of the total cost as a function of the thresholds. We validate the performance of the threshold policies empirically using data from a high-end North American retailer. Our two-store experiments demonstrate that Local Thresholds perform better than Global Thresholds in a wide variety of settings. Conversely, in a narrow region with negatively correlated online demand between locations and very low shipping costs, Global Threshold outperforms Local Thresholds. A hybrid policy only marginally improves on the better of the two. In multiple periods, we study one- and two-location models and provide insights into effective solution methods for the general case.

**Managerial Implications**: Our methods give an effective way to manage fulfillment costs for online orders, demonstrating a significant reduction compared to policies that treat each store separately, reflecting the significant advantage of incorporating shipping in computing thresholds.**Key words:**Omni-channel retailing, Infinitesimal Perturbation Analsysis (IPA), Fulfillment Policy -
Competitive Analysis for Online Scheduling in Software-Defined Optical WAN.

Su Jia, Xin Jin, Golnaz Ghasemiesfeh, Jiaxin Ding and Jie Gao.

IEEE International Conference on Computer Communications 2017 (INFOCOM'17)## Abstract

Modern planetary-scale online services have massive data to transfer over the*wide area network*(WAN). Due to the tremendous cost of building WANs and the stringent timing requirement of distributed applications, it is critical for network operators to make efficient use of network resources to optimize data transfers. By leveraging*software-defined networking*(SDN) and reconfigurable optical devices, recent solutions design centralized systems to jointly control the network layer and the optical layer.

While these solutions show it is promising to significantly reduce data transfer times by centralized cross-layer control, they do not have any theoretical guarantees on the proposed algorithms. This paper presents approximation algorithms and theoretical analysis for the online transfer scheduling problem over optical WANs. The goal of the scheduling problem is to minimize the makespan (the time to finish all transfers) or the total sum of completion times. We design and analyze various greedy, online scheduling algorithms that are 3-competitive for makespan, 2-competitive for minimum sum completion time for jobs of unit size, and 3-competitive for jobs of arbitrary transfer size and each node having degree constraint d, where \alpha = 1 when d = 1 and \alpha = 1.86 when d>=2. We also evaluate the performance of these algorithms and compare the performance with prior heuristics.**Key words:**online algorithm, competitive analysis, scheduling, Wide Area Networks (WAN), Software-defined Networking (SDN) -
Network Optimization on Partitioned Pairs of Points.

*Esther Arkin, Aritra Banik, Paz Carmi, Gui Citovsky, Su Jia, Matthew Katz, Tyler Mayer and Joseph S. B. Mitchell

The 28th International Symposium on Algorithms and Computation (ISAAC'17) -
Exact and Approximation Algorithms for Time-Window TSP and Prize Collecting Problem.

Su Jia, Jie Gao, Joseph S. B. Mitchell and Lu Zhao.

International Workshop on the Algorithmic Foundations of Robotics 2016 (WAFR'16)