Prof. Dagan's Lab

Head - Natural Language Processing Lab

Tel: 972-3-531-7620
Email: dagan@cs.biu.ac.il

Applied Semantic Processing

Research at Prof. Ido Dagan’s Natural Language Processing (NLP) Lab in the Department of Computer Science is dedicated to applied semantic processing. 

For over a decade, Dagan’s work has focused on empirical and learning methods for language processing, with a particular emphasis on unsupervised semantic learning.

In the last several years, he and his team introduced textual entailment as a generic framework for applied semantic inference over texts.

This framework aims to power the core semantic inferences in many NLP applications. With other colleagues, they organized the seven rounds of the PASCAL Recognizing Textual Entailment (RTE) Challenges (2005-2011), which became the primary forum for empirical evaluation of semantic inference systems. These challenges were adopted as a track at the Text Analysis Conference (TAC), the primary evaluation forum for semantic text processing organized by the U.S. National Institute of Standards and Technology (NIST).

At Bar-Ilan, Dagan’s NLP group develops computational models of textual entailment, including automatic knowledge acquisition, semantic inference, and information access applications, as detailed below. Dagan received the IBM Faculty Award in 2007 and, in recognition of his contributions to the field, the Wolf Foundation Krill Prize for Excellence in Scientific Research in 2008. He was elected President of the Association for Computational Linguistics (ACL) for 2010, and served on its Executive Committee during 2008-2011.

Semantic Knowledge Acquisition

Semantic inference requires vast amounts of knowledge, encoding inferences that are based on both linguistic and world knowledge. Accordingly, a large proportion of research at the NLP lab is dedicated to developing automated knowledge acquisition methods. Such knowledge is mostly encoded in the form of entailment rules, which specify inferential relations between words, phrases and complex language expressions. These rules are extracted from available language resources (WordNet and FrameNet), from Web-based knowledge resources designed for human consumption (Wikipedia), or directly from raw text through unsupervised learning methods. Learning methods include improved variants of distributional similarity, co-occurrence statistics and pattern-based extraction, as well as global optimization algorithms that learn complex entailment graphs. The latter line of research yielded a paper, authored by Jonathan Berant, Ido Dagan and Jacob Goldberger, which received the Best Student Long Paper award at the ACL-2011 conference.
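To make the notion of entailment rules and entailment graphs concrete, here is a minimal sketch in Python. It is an illustration only, not the lab's code: the rule templates, the `RULES` set, and the function names `build_graph` and `entails` are all hypothetical. It treats each rule as a directed edge and checks whether one expression entails another by chaining rules.

```python
from collections import deque

# Hypothetical toy rules: (lhs, rhs) is read as "lhs entails rhs".
RULES = {
    ("X acquire Y", "X buy Y"),
    ("X purchase Y", "X buy Y"),
    ("X buy Y", "X own Y"),
}

def build_graph(rules):
    """Turn a set of (lhs, rhs) rules into an adjacency map."""
    graph = {}
    for lhs, rhs in rules:
        graph.setdefault(lhs, set()).add(rhs)
    return graph

def entails(graph, source, target):
    """Return True if target can be reached from source by chaining rules."""
    if source == target:
        return True
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

graph = build_graph(RULES)
print(entails(graph, "X acquire Y", "X own Y"))
print(entails(graph, "X own Y", "X acquire Y"))
```

The edges are directed, so entailment in one direction does not imply the reverse. Learning which edges to include in the first place is the hard part that the acquisition methods above address; this sketch assumes the rules are already given.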

Semantic Inference Engine

Dagan and his colleagues are developing a semantic inference engine for recognizing textual entailment. The engine is based on a formalism which “proves” the targeted hypothesis from a given text through a sequence of parse-tree generation operations. These operations include the application of entailment rules obtained from various resources (cf. Semantic Knowledge Acquisition above), operations driven by the discourse structure, and on-the-fly heuristic operations that bridge inevitable knowledge gaps along the proof. The inference engine searches for an optimal integration of the various operations, while applying machine learning in order to obtain the most reliable proofs.
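The idea of searching for a low-cost sequence of operations that derives the hypothesis from the text can be sketched as follows. This is a deliberately simplified illustration, not the engine itself: it rewrites flat word sequences rather than parse trees, and the rules, their costs, and the names `successors` and `cheapest_proof` are hypothetical. It uses uniform-cost search, where a lower total cost stands in for a more reliable proof.

```python
import heapq

# Hypothetical rules: (lhs word, rhs word, cost). Lower cost = more reliable.
RULES = [
    ("purchased", "bought", 0.1),
    ("bought", "owns", 0.5),
    ("automobile", "car", 0.2),
]

def successors(tokens):
    """Yield every state reachable by applying one rule to one token."""
    for i, tok in enumerate(tokens):
        for lhs, rhs, cost in RULES:
            if tok == lhs:
                yield tokens[:i] + (rhs,) + tokens[i + 1:], cost, f"{lhs} -> {rhs}"

def cheapest_proof(text, hypothesis, max_cost=5.0):
    """Uniform-cost search for the cheapest rewrite of text into hypothesis.

    Returns (total_cost, list_of_steps), or None if no proof is found.
    """
    start = tuple(text.split())
    goal = tuple(hypothesis.split())
    frontier = [(0.0, start, [])]
    best = {start: 0.0}
    while frontier:
        cost, state, steps = heapq.heappop(frontier)
        if state == goal:
            return cost, steps
        if cost > best.get(state, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for nxt, step_cost, label in successors(state):
            new_cost = cost + step_cost
            if new_cost <= max_cost and new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, steps + [label]))
    return None

print(cheapest_proof("John purchased the automobile", "John bought the car"))
```

In this toy setting the costs are fixed by hand. In a system like the one described, the relative reliability of each operation type would instead be estimated with machine learning.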

Information Access Applications

The thrust of the textual entailment paradigm is to provide generic semantic engines that will boost existing NLP applications and enable new promising ones. Along this line, the NLP Lab applies entailment technology, in a mostly unsupervised manner, to applications such as Information Extraction and Name-Based Text Categorization.

Dagan and his group are now undertaking a more ambitious project: developing a paradigm for text exploration based on entailment graphs, which they hope will make a significant contribution to information consumption capabilities. This line of research is being conducted in collaboration with industry and government partners (IBM, NICE, Israel Ministry of Defense). As an additional line of research, they are investigating how entailment technology can provide a generic infrastructure to boost natural language interpretation in human-machine interactions.

Last updated on 1/6/14