Assistant Professor, USC Computer Science
USC Information Sciences Institute (joint appt.)
Director, USC INK Research Lab
Information Director, SIGKDD
Forbes 30 Under 30 Asia

xiangren [at]
Machine Learning, Natural Language Processing

INK@GitHub  |  Google Scholar

I'm an Assistant Professor of Computer Science at USC, with an affiliated appointment at the USC Information Sciences Institute (ISI). At USC, I direct the Intelligence and Knowledge Discovery (INK) Research Lab and am a member of the USC Machine Learning Center, the NLP Community@USC, and the ISI Center on Knowledge Graphs. In 2018, I served as a part-time Data Science Advisor at Snap Inc. Prior to USC, I was a visiting researcher at Stanford University and received my PhD in Computer Science from UIUC.

I work on machine learning and natural language processing for machine reading, focusing on developing label-efficient, prior-informed models that extract machine-actionable knowledge (e.g., compositional, graph-structured representations) from natural-language text data, as well as performing neural reasoning over symbolic knowledge. I'm particularly excited about problems in the space of modeling sequential/graph-structured data with weak supervision and prior knowledge. This includes neural-symbolic learning, learning with noisy data, zero/few-shot learning, and transfer learning.

My research has led to a book, "Mining Structures of Factual Knowledge from Text: An Effort-Light Approach," and over 50 publications in top conferences and journals; it has been covered in over 10 conference tutorials (NAACL, KDD, WWW) and recognized with faculty research awards from Google, Amazon, JP Morgan, and Snap, as well as the ACM SIGKDD Doctoral Dissertation Award, a WWW Best Poster runner-up, the David J. Kuck Outstanding Thesis Award, a Google PhD Fellowship, and a Yelp Dataset Challenge Award. I'm on the Forbes 30 Under 30 Asia list.


- Research funding: As part of the USC/ISI team, we received DARPA awards to work on enabling commonsense capabilities in machines and learning with less labeled data. We also received NSF support for machine reading of massive scientific literature to understand the science of innovation.

- AlpacaTag is a web-based crowd annotation framework that supports auto-suggestion of annotations, online/active learning of annotator models, model consolidation, and real-time API access & model deployment. The code is published on GitHub and documented in the Wiki. Try out the demo!

- Research monograph: Mining Structures of Factual Knowledge from Text: An Effort-Light Approach by Morgan & Claypool Publishers (ACM SIGKDD Doctoral Dissertation Award).

- Conference tutorials: Tutorial slides on scalable construction and reasoning of massive knowledge bases at NAACL 2018; a full-day tutorial at The Web Conference on knowledge graph construction and querying.

- Blog posts: Information Extraction with Indirect Supervision and Heterogeneous Supervision; Dynamic Network Embedding.

- Learning with weak supervision: In many information extraction tasks, direct supervision in the form of manually annotated text sequences is expensive to obtain, but various kinds of weak supervision (e.g., KB facts, hand-crafted rules, crowd-sourced labels, user feedback) are much easier to collect at scale. Our WWW 2017 tutorial summarizes recent advances in denoising distant supervision, multi-task extraction, and leveraging QA data as indirect supervision.

- To self-learn from a few examples of given relations (and a large corpus), REPEL jointly optimizes an embedding-based discriminator and a pattern-based generator.

- Both human annotators and external knowledge bases can provide weak supervision for information extraction tasks. Such heterogeneous forms of weak supervision trade off label quality against the amount of labeled data one can obtain. How can we leverage these heterogeneous sources of supervision in a principled way?

- Indirect supervision may result in noisily and partially labeled data. This is especially challenging when dealing with a complex label space (e.g., a label hierarchy). We propose hierarchical partial-label embedding to overcome these issues.

News  |  CV

Aug, 2019 - INK lab members have 10 papers accepted at EMNLP 2019. Congratulations!
June, 2019 - We're excited to receive a gift award from Snapchat to work on modular neural networks for interpretable NLP!
June, 2019 - We received a DARPA GAILA grant to work on building AI that mimics children's language learning.
May, 2019 - Serving as area chair for EMNLP 2019 and ACL 2019, and as senior PC for AAAI 2020.
Mar, 2019 - Excited to receive a Google Faculty Award for supporting our research on explainable recommendation.
Mar, 2019 - Our research on interpretable knowledge reasoning is funded by a JP Morgan AI Research Award.
Feb, 2019 - As part of the USC/ISI team, we received DARPA awards to work on Machine Commonsense and Learning with Less Data.
Jan, 2019 - Our research on neural-symbolic deep learning for NLP is funded by an Amazon Research Award.
Dec, 2018 - Organizing the ICLR 2019 LLD Workshop on learning from limited labeled data.
Dec, 2018 - Organizing the RepL4NLP Workshop at ACL 2019 on representation learning for NLP. We're soliciting submissions.
Nov, 2018 - Organizing the DeepLo Workshop at EMNLP 2019 on deep learning for low-resource NLP.
Oct, 2018 - Excited to release our new OpenIE system, ReMine. A key distinguishing feature is the ability to learn from the entire corpus to measure the cohesiveness of extractions. (Project | GitHub)
Sep, 2018 - Serving on the PCs of ICML'19 and WWW'19.
Sep 2, 2018 - Thanks to the National Science Foundation for supporting our collaborative research on Modeling the Invention, Dissemination, and Translation of Scientific Concepts.
Aug 30, 2018 - Starting a new role as Information Director of ACM SIGKDD.
Aug 21, 2018 - Thrilled to receive the 2018 ACM SIGKDD Doctoral Dissertation Award.