In today's computerized and information-based society, people are inundated with vast amounts of text data, ranging from news articles, social media posts, and scientific publications to textual information from various vertical domains (e.g., corporate reports, advertisements, legal acts, medical reports). Turning such massive, unstructured text data into structured, actionable knowledge, and enabling effective and user-friendly access to that knowledge, is a grand challenge for the research community. This course will introduce and discuss many of the sub-problems and methods of information extraction, including the use of textual patterns, language and formatting features, generative and conditional models, and rule-learning and deep learning techniques. We will discuss segmentation of text streams, classification of segments into fields, association of fields into records, and clustering and de-duplication of records.
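To make the four stages named above concrete (segmentation, classification into fields, association into records, and de-duplication), here is a toy sketch in Python. It is only an illustration under simplifying assumptions, not the methods taught in the course: the regular expressions, the field labels, the rule that a new record starts at each name, and all function names (`segment`, `classify`, `associate`, `deduplicate`) are hypothetical choices made for this example.

```python
import re

# Hypothetical hand-written patterns; real systems would typically learn these.
FIELD_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "phone": re.compile(r"^\+?[\d\s()-]{7,}$"),
}


def segment(stream):
    """Segmentation: split a text stream on commas, semicolons, and newlines."""
    return [s.strip() for s in re.split(r"[;,\n]", stream) if s.strip()]


def classify(seg):
    """Classification: label a segment as a field; anything unmatched is a name."""
    for label, pattern in FIELD_PATTERNS.items():
        if pattern.match(seg):
            return label
    return "name"


def associate(segments):
    """Association: group fields into records, starting a new record at each name."""
    records = []
    for seg in segments:
        label = classify(seg)
        if label == "name" or not records:
            records.append({})
        records[-1][label] = seg
    return records


def deduplicate(records):
    """De-duplication: merge records that share a case-insensitive email."""
    merged = {}
    for rec in records:
        key = rec.get("email", rec.get("name", "")).lower()
        target = merged.setdefault(key, {})
        for field, value in rec.items():
            target.setdefault(field, value)  # keep the first value seen
    return list(merged.values())


if __name__ == "__main__":
    stream = (
        "Ada Lovelace, ada@example.org, +44 20 7946 0958\n"
        "A. Lovelace, ADA@example.org\n"
        "Alan Turing, alan@example.org"
    )
    for record in deduplicate(associate(segment(stream))):
        print(record)
```

Each stage here is a fixed rule so that the flow of data stays visible; swapping any one of them for a learned model leaves the surrounding stages unchanged.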

Course Description

An overview of course requirements, assignments, the project, etc.

Course Schedule

Course schedule and syllabus.

Course Project

Course project requirements and policies.


News and announcements.