**Speaker**: Guy Van den Broeck, UCLA, USA

**Title**: Open-World Probabilistic Databases

**Date**: 21st September

**Abstract**: Large-scale probabilistic knowledge bases are becoming increasingly important in academia and industry alike. They are constantly extended with new data, powered by modern information extraction tools that associate probabilities with database tuples. In this paper, we revisit the semantics underlying such systems. In particular, the closed-world assumption of probabilistic databases, that facts not in the database have probability zero, clearly conflicts with their everyday use.

To fix this discrepancy, we propose an open-world probabilistic databases semantics, which relaxes the probability of open facts to intervals. While still assuming a finite domain, this semantics can provide meaningful answers when some probabilities are not precisely known. For this open-world setting, we propose an efficient evaluation algorithm for unions of conjunctive queries. Our open-world algorithm incurs no overhead compared to closed-world reasoning and runs in time linear in the database size for tractable queries. All other queries are #P-hard, implying a data complexity dichotomy between linear time and #P. For queries involving negation, however, open-world reasoning can become NP-, or even NP^PP-complete. Finally, we discuss limitations and additional knowledge-representation layers that can further strengthen open-world reasoning about big uncertain data.

**Speaker:**Jonathan Lawry, University of Bristol, UK

**Title:**Blurred Boundaries, Borderline Cases and the Utility of Vagueness

**Date**: 22nd September

**Abstract**: Vagueness is ubiquitous in natural language, but it is unclear exactly what role it plays in communication. In this talk I will consider possible ways in which vagueness can be useful, not only in natural language, but also as an integral part of formal knowledge representation in artificial intelligence. Vagueness is a multifaceted phenomenon and different aspects are likely to be useful in different ways. I will specifically consider the utility of blurred boundaries where there is uncertainty about the exact location of the boundary between a category and its complement, and of borderline cases which neither belong absolutely to a category nor to its complement. I will describe an integrated approach to modelling both blurred boundaries and borderline cases based on a combination of probability and three-valued logic. I will briefly outline the properties of this model and then use it to explore how vagueness can be useful in the areas of natural language generation, consensus building in multi-agent systems, distributed decision making, and signal aggregation in multi-sender channels.

**EurAI (formerly ECCAI) Talk**

**Speaker**: Eyke Hüllermeier, Universität Paderborn, Germany

**Title**: Large-scale machine learning: Sparse probability estimation for extreme classification

**Date**: 23rd September

**Abstract**: Machine learning is nowadays applied to massive data sets of considerable size, including potentially unbounded streams of data. Under such conditions, the scalability of learning algorithms is of major concern, calling for an effective data management and the use of appropriate data structures for time- and space-efficient implementations. Starting with a brief introduction to large-scale machine learning and the discussion of some general issues in this field, the talk will focus on the problem of extreme multi-label classification, i.e., multi-label classification with extremely large label spaces. In this context, the choice of appropriate loss functions is specifically important, because loss functions commonly used in standard classification are no longer meaningful. Here, a motivation is given for the so-called F-measure, and a learning algorithm tailored for maximizing performance in terms of this measure is proposed. A key feature of this algorithm is the use of sparse probability estimates of labels given instances, that is, probability estimates restricted to the most probable labels. Thanks to this approach, the algorithm is applicable to problems with extremely many labels.