Yong Chen, Ph.D.

I was trained as a mathematical statistician in statistical inference and the foundations of statistics, working with Professor Kung-Yee Liang and Professor Charles Rohde. I have worked both independently and collaboratively with colleagues to advance rigorous statistical theory and to develop statistical and computational methods motivated by small data (e.g., summary data in systematic reviews) and big data (e.g., healthcare data, self-reported vaccine/drug safety data, self-reported weight loss data). I am also interested in evidence-based medicine because I believe it is an area that requires deep thinking about the fundamental question "what is evidence in the data?", which is critical to improving medical decision-making for patients and stakeholders and is sometimes blurred by the use of 'standard' statistical analyses.

In general, I am interested in advancing general theory that is broadly applicable across disciplines, as well as in developing statistical models tailored to specific applications by working with domain experts.


Research interests

  1. Methods for large healthcare data (e.g., electronic health record data).
  2. Precision medicine using heterogeneous massive data.
  3. Comparative effectiveness research. Patient-centered outcomes research.
  4. Statistical inference.
  5. Philosophy of statistics. Statistical evidence.

Active Research Grants (selected)

  1. 1R01AI130460: Dynamic learning for post-vaccine event prediction using temporal information in VAERS.

    Role: Principal Investigator (with Cui Tao at University of Texas). Period: 04/01/2017 - 03/31/2022

  2. R21-LM012197: Statistical Methods and Software for Multivariate Meta-analysis

    Role: Subcontract Principal Investigator (with Haitao Chu at University of Minnesota). Period: 10/01/2015 - 07/31/2017

  3. Penn CTER Pilot Study Award: Integrative Analysis in Large Distributed Research Networks

    Role: Principal Investigator. Period: 07/2016 - 06/2017.


Selected Publications

*: first-authored by a student advised/co-advised by Dr. Chen

  • Statistical inference
    • Chen, Y, Huang, J, Ning, Y, Liang, KY and Lindsay, B. A Conditional Composite Likelihood Ratio Test with Boundary Constraints, Biometrika (minor revision).
    • Hong, C*, Chen, Y, Ning, Y, Wang, S, Wu, H and Carroll, R.J. PLEMT: A Novel Pseudolikelihood Based EM Test for Homogeneity in Generalized Exponential Tilt Mixture Models. Journal of the American Statistical Association (in press). (This paper won a 2015 JSM Biometrics Section Byar Award.)
    • Hong, C*, Ning, Y, Wei, P, Cao, Y and Chen, Y. A semiparametric model for vQTL mapping, Biometrics (in press).
    • Chen, Y, Ning, J, Ning, Y, Liang, KY and Bandeen-Roche, K. On the Pseudolikelihood Inference for Semiparametric Models with Boundary Problems, Biometrika (in press).
    • Ning, J, Chen, Y, Cai, C, Huang, X and Wang, MC. On the Dependence Structure of Bivariate Recurrent Event Processes: Inference and Estimation, Biometrika 2015 March, 102 (2), 345–358.
    • Ning, Y and Chen, Y. Test for homogeneity in semiparametric exponential tilt mixture models, Scandinavian Journal of Statistics 2015 June, 41 (2), 504–517.
    • Chen, Y, Ning, J and Cai, C. Regression analysis of longitudinal data with irregular and informative observation times, Biostatistics, March 2015, 16(4): 727-739.
    • Chen, Y, Ning, Y, Hong, C and Wang, S. Semiparametric tests for identifying differentially methylated loci with case-control designs using Illumina arrays, Genetic Epidemiology 2014 Jan 38 (1), 42-50.
    • Chen, Y and Liang, KY. On the asymptotic behaviour of the pseudolikelihood ratio test statistic with boundary problems, Biometrika 2010 June, 97 (3), 603–620.
  • Methods and Applications on Large Healthcare Data
    • Duan, R*, Cao, M, Wu, Y, Huang, J, Denny, J, Xu, H and Chen, Y. An Empirical Study for Impacts of Measurement Errors on EHR based Association Studies, AMIA annual symposium proceedings 2016 (in press). (This paper won first prize in the "Best of Student Papers in Knowledge Discovery and Data Mining (KDDM)" Awards.)
    • Cai, Y*, Du, J, Huang, J, Tao, C and Chen, Y. Signal Detection Method for Temporal Variation of Adverse Effect with VAERS database. International Conference on Intelligent Biology and Medicine 2016 (accepted).
    • Du J, Cai Y, Chen, Y, He Y, Tao C. Analysis of Individual Differences in Vaccine Pharmacovigilance using VAERS Data and MedDRA System Organ Classes: A Use Case Study with Trivalent Influenza Vaccine, Vaccine 2016.
    • Du J, Cai Y, Chen, Y, Tao C. Trivalent influenza vaccine adverse symptoms analysis based on MedDRA terminology using VAERS data in 2011, Journal of Biomedical Semantics 2016.
    • Tao C, Du J, Cai Y, Chen, Y. Trivalent Influenza Vaccine Adverse Event Analysis Based On MedDRA System Organ Classes Using VAERS Data, Studies in health technology and informatics 2015. 216:1076.
    • Cao M, Chen, Y, Zhu M, Zhang J. Automated Evaluation of Medical Software Usage: Algorithm and Statistical Analyses. Studies in health technology and informatics 2015. 216:965.
  • Comparative Effectiveness Research
    • Ning J, Chen, Y and Piao, J. Maximum likelihood estimation and EM algorithm of Copas selection model for publication bias correction. Biostatistics 2017 (in press). *: equally contributed first-author.
    • Li, X*, Chen, Y and Li, R. A frailty model for recurrent events during alternating restraint and non-restraint time periods. Statistics in Medicine 20 February 2017.
    • Liu, Y*, DeSantis, S and Chen, Y. Bayesian network meta-analysis of clinical trials with correlated outcomes subject to publication bias: application to a systematic review of alcohol dependence. Journal of the Royal Statistical Society: Series C (minor revision)
    • Liu, Y*, Chen, Y and Scheet, P. A Meta-Analytic Framework for Detection of Genetic Interactions, Genetic Epidemiology 2016 (in press).
    • Chen, Y, Liu, Y, Chu, H, Lee, M and Schmid, C. A simple and robust method for multivariate meta-analysis of diagnostic test accuracy. Statistics in Medicine 2016.
    • Chahoud, J, Semaan, A, Chen, Y, Cao, M, Rieber, A, Rady, P and Tyring, S. The Association between Beta-genus Human Papillomavirus and Cutaneous Squamous Cell Carcinoma in Immunocompetent Individuals: a Meta-analysis, JAMA Dermatology 2015 Dec.
    • Chen, Y, Cai, Y, Hong, C, and Jackson, D. Inference for correlated effect sizes using multiple univariate meta-analyses, Stat Med. 2015 Nov.
    • Chen, Y, Hong C, Ning Y, Su X. Meta-analysis of studies with bivariate binary outcomes: a marginal beta-binomial model approach. Stat Med. 2015 Aug 24. PubMed PMID: 26303591.
    • Liu, Y*, Chen, Y and Chu H, A unification of models for meta-analysis of diagnostic accuracy studies without a gold standard, Biometrics 2015 Jun;71(2):538-47.
    • Chen, Y, Liu Y, Ning J, Cormier J, Chu H. A hybrid model for combining case-control and cohort studies in systematic reviews of diagnostic tests. J R Stat Soc Ser C Appl Stat. 2015 Apr 1;64(3):469-489.
    • Chen, Y, Hong C, Riley RD. An alternative pseudolikelihood method for multivariate random-effects meta-analysis. Stat Med. 2015 Feb 10;34(3):361-80.
    • Chen, Y, Liu Y, Ning J, Nie L, Zhu H, Chu H. A composite likelihood method for bivariate meta-analysis in diagnostic systematic reviews. Stat Methods Med Res. 2014 Dec 14. PubMed Central PMCID: PMC4466215.
    • Chen, Y, Luo, S, Chu, H, Su, X and Nie, L. An Empirical Bayes Method for Multivariate Meta-analysis with Application in Clinical Trials. Communications in Statistics - Theory and Methods. 2014 43(16), 3536–3551.
    • Ma, X, Chen, Y, Cole, S and Chu, H. A hybrid Bayesian hierarchical model combining cohort and case-control studies for meta-analysis of diagnostic tests: accounting for partial verification bias. Statistical Methods in Medical Research. 2014.
    • Luo S, Chen, Y, Su X, Chu H. mmeta: An R Package for Multivariate Meta-Analysis. J Stat Softw. 2014 Jan 1;56(11):11. PubMed PMID: 24904241; PubMed Central PMCID: PMC4043353.
    • Chen, Y, Luo, S, Chu, H and Wei, P. Bayesian inference on risk differences: an application to multivariate meta-analysis of adverse events in clinical trials. Statistics in Biopharmaceutical Research. 2013 5 (2): 142-155.
    • Nie, L, Soon, G, Qi, K, Chen, Y, and Chu, H. A note on partial covariate-adjustment and design considerations in noninferiority trials when patient-level data are not available. Journal of Biopharmaceutical Statistics. 2013 23 (5), 1042–1053.
    • Chu, H, Nie, L, Chen, Y, Huang, Y and Sun, W. Bivariate random effects models for meta-analysis of comparative studies with binary outcomes: methods for the absolute risk difference and relative risk. Statistical Methods in Medical Research. 2012. 21 (6): 621–633.
    • Chen, Y, Chu H, Luo S, Nie L, Chen S. Bayesian analysis on meta-analysis of case-control studies accounting for within-study correlation. Stat Methods Med Res. 2011 Dec 4. [Epub ahead of print] PubMed PMID: 22143403; PubMed Central PMCID: PMC3683108.

Statistical software

    • R package xmeta: a comprehensive collection of functions for multivariate meta-analyses of continuous or binary outcomes. This package also includes functions and visualization tools for detection and correction of publication bias in multivariate meta-analysis. https://cran.rstudio.com/web/packages/xmeta/index.html
    • R package mmeta: a free, cross-platform and open-source program for pooling summary measures on correlated binary outcomes from multiple studies. https://cran.r-project.org/web/packages/mmeta/index.html
    • R package robustETM: testing homogeneity for the generalized exponential tilt model. This package includes a collection of functions for (1) implementing methods for testing homogeneity for the generalized exponential tilt model; and (2) implementing existing methods under comparison. https://cran.r-project.org/web/packages/robustETM/index.html References: Chen et al. (2013), Genetic Epidemiology; Hong et al. (2017), JASA.
    • R code for YETI, from "A meta-analytic framework for detection of genetic interactions" (2016), Genetic Epidemiology. http://www.yulunliu.com/software.html

Current students/postdocs

    • Rui Duan, Ph.D. candidate at UPenn
    • Jing Huang, Postdoctoral fellow at UPenn
    • Yulun Liu, Postdoctoral fellow at UPenn
    • Le Wang (jointly with Dr. Jinbo Chen), Ph.D. candidate at UPenn
    • Ming Cao (jointly with Dr. Fujimoto Kayo), Ph.D. candidate at UTSPH

Former students

    • Chuan Hong, Ph.D. at University of Texas School of Public Health (UTSPH) (defended in March 2016; now Postdoctoral fellow at Harvard)
    • Yulun Liu, Ph.D. in Biostatistics at UTSPH (jointly with Dr. Paul Scheet) (defended in April 2016; now Postdoctoral fellow at University of Pennsylvania)
    • Yi Cai, Ph.D. in Biostatistics at UTSPH (jointly with Dr. Cui Tao) (defended in April 2016; now data scientist at Pieces Technologies Inc.)
    • Xiaoqi Li (co-advisor; primary advisor: Dr. Ruosha Li), Ph.D. in Biostatistics at UTSPH (defended in Nov 2015; now Senior Research Scientist at Eli Lilly and Company)