Assistant Professor, USC Computer Science
USC Information Sciences Institute (joint appt.)
Director, USC INK Research Lab
Information Director, SIGKDD
Forbes' Asia 30 Under 30

xiangren [at] usc.edu
Machine Learning, Natural Language Processing


INK@GitHub  |  Google Scholar

I'm an Assistant Professor of Computer Science at USC, with an affiliated appointment at the USC Information Sciences Institute (ISI). At USC, I'm the director of the Intelligence and Knowledge Discovery (INK) Research Lab, and a member of the USC Machine Learning Center, NLP Community@USC, and the ISI Center on Knowledge Graphs. In 2018, I served as a part-time Data Science Advisor at Snap Inc. Prior to USC, I was a visiting researcher at Stanford University and received my PhD in Computer Science from UIUC.

I work on new algorithms and datasets for natural language processing and machine learning with limited labeled data. My group at USC focuses on developing label-efficient, prior-informed models that extract machine-actionable knowledge from natural language data and perform neural-symbolic knowledge reasoning for question answering. I'm particularly excited about problems in the space of modeling language data with weak supervision and prior knowledge. This includes neural-symbolic learning, learning from high-level supervision, learning with noisy data, and zero/few-shot learning.

A summary of my previous work on label-efficient information extraction can be found in the book "Mining Structures of Factual Knowledge from Text: An Effort-Light Approach". This work has also been covered in over 10 tutorials at major conferences and has received awards including the ACM SIGKDD Dissertation Award, WWW Best Poster runner-up, the David J. Kuck Outstanding Thesis Award, and a Google PhD fellowship. My research is funded by NSF, DARPA, and IARPA, and has received faculty awards from industry partners including Google, Amazon, JP Morgan, Adobe, and Snapchat. I was named to Forbes' Asia 30 Under 30 in 2019.


Research  

- Research funding: As part of the USC/ISI team, we received DARPA awards to work on enabling commonsense capabilities in machines and learning with less labeled data. We also received NSF support for machine reading of massive scientific literature to understand the science of innovation.

- AlpacaTag is a web-based crowd annotation framework that supports auto-suggestion of annotations, online/active learning of the annotator model, model consolidation, and real-time API access and model deployment. The code is published on GitHub and documented in the Wiki. Try out the demo!

- Research monograph: Mining Structures of Factual Knowledge from Text: An Effort-Light Approach by Morgan & Claypool Publishers (ACM SIGKDD Doctoral Dissertation Award).

- Conference tutorials: Tutorial slides on scalable construction and reasoning of massive knowledge bases at NAACL 2018. Full-day tutorial at The Web Conference on knowledge graph construction and querying.

- Blog posts: Information Extraction with Indirection Supervision and Heterogeneous Supervision, Dynamic Network Embedding.

- Learning with weak supervision: In many information extraction tasks, direct supervision in the form of manually annotated text sequences is expensive to obtain, but different kinds of weak supervision (e.g., KB facts, hand-crafted rules, crowd-sourced labels, user feedback) are much easier to collect at a large scale. Our WWW 2017 tutorial summarizes recent advances in denoising distant supervision, multi-task extraction, and leveraging QA data as indirect supervision.

- To self-learn from a few examples of given relations (and a large corpus), REPEL jointly optimizes an embedding-based discriminator and a pattern-based generator.

- Both human annotators and external knowledge bases can provide weak supervision for information extraction tasks. Such heterogeneous forms of weak supervision trade off label quality against the amount of labeled data one can obtain. How can we leverage these heterogeneous forms of supervision in a principled way?

- Indirect supervision may result in noisily and partially labeled data. This is especially challenging when dealing with a complex label space (e.g., a label hierarchy). We propose hierarchical partial-label embedding to overcome these issues.


News

Nov, 2019 - Will be giving an invited talk at CMU LTI Colloquium in Feb, 2020.
Sep, 2019 - Excited to receive a data science research award from Adobe Research to work on neural symbolic learning for recommendation.
Aug, 2019 - INK lab members have 10 papers accepted at EMNLP 2019. Congratulations!
June, 2019 - We're excited to receive a gift award from Snapchat to work on modular neural networks for interpretable NLP!
June, 2019 - We received a DARPA GAILA grant to work on building AI that mimics children's language learning.
May, 2019 - Serving as area chair for EMNLP 2019 and ACL 2019, and as senior PC member for AAAI 2020.
Mar, 2019 - Excited to receive a Google Faculty Award for supporting our research on explainable recommendation.
Mar, 2019 - Our research on interpretable knowledge reasoning is funded by a JP Morgan AI Research Award.
Feb, 2019 - As part of the USC/ISI team, we received DARPA awards to work on Machine Commonsense and Learning with Less Data.
Jan, 2019 - Our research on neural-symbolic deep learning for NLP is funded by an Amazon Research Award.
Dec, 2018 - Organizing the ICLR 2019 LLD Workshop on learning from limited labeled data.
Dec, 2018 - Organizing the RepL4NLP Workshop at ACL 2019 on Representation Learning for NLP. We're soliciting submissions.
Nov, 2018 - Organizing the DeepLo Workshop at EMNLP 2019 on deep learning for low-resource NLP.
Oct, 2018 - Excited to release our new OpenIE system, ReMine. A key distinguishing feature is its ability to learn from the entire corpus to measure the cohesiveness of extractions. (Project | Github)
Sep, 2018 - Serving as a PC member for ICML'19 and WWW'19.
Sep 2, 2018 - Thanks to the National Science Foundation for supporting our collaborative research on Modeling the Invention, Dissemination, and Translation of Scientific Concepts.
Aug 30, 2018 - Starting a new role as Information Director of ACM SIGKDD.
Aug 21, 2018 - Thrilled to receive the 2018 ACM SIGKDD Doctoral Dissertation Award.