Thursday, April 10, 2014

JWS preprint: Linked knowledge sources for topic classification of microposts: a semantic graph-based approach


A new preprint is available on the JWS preprint server.

Andrea Varga, Amparo E. Cano, Matthew Rowe, Fabio Ciravegna and Yulan He, Linked knowledge sources for topic classification of microposts: a semantic graph-based approach, Web Semantics: Science, Services and Agents on the World Wide Web, to appear, 2014.


Short text messages, a.k.a. microposts (e.g., tweets), have proven to be an effective channel for revealing information about trends and events, ranging from those related to disaster (e.g., Hurricane Sandy) to those related to violence (e.g., the Egyptian revolution). Being informed about such events as they occur could be extremely important to authorities and emergency professionals, allowing them to respond immediately.

In this work we study the problem of topic classification (TC) of microposts, which aims to automatically classify short messages based on the subject(s) discussed in them. The accurate TC of microposts, however, is a challenging task, since the limited number of tokens in a post often implies a lack of sufficient contextual information. In order to provide contextual information to microposts, we present and evaluate several graph structures surrounding concepts present in linked knowledge sources (KSs). Traditional TC techniques enrich the content of microposts with features extracted only from the microposts' content. In contrast, our approach relies on the generation of different weighted semantic meta-graphs extracted from linked KSs. We introduce a new semantic graph, called the category meta-graph. This novel meta-graph provides a more fine-grained categorisation of concepts, yielding a set of novel semantic features. Our findings show that such category meta-graph features effectively improve the performance of a topic classifier of microposts.

Furthermore, our goal is also to understand which semantic features contribute to the performance of a topic classifier. For this reason, we propose an approach for automatic estimation of the accuracy loss of a topic classifier on new, unseen microposts. We introduce and evaluate novel topic similarity measures, which capture the similarity between the KS documents and microposts at a conceptual level, considering the enriched representation of these documents.

Extensive evaluation in the context of Emergency Response (ER) and Violence Detection (VD) revealed that our approach outperforms previous approaches that use a single KS without linked data, or Twitter data only, by up to 31.4% in terms of F1 measure. Our main findings indicate that the new category graph contains useful information for TC and achieves results comparable to previously used semantic graphs. Furthermore, our results also indicate that the accuracy of a topic classifier can be accurately predicted using the enhanced text representation, outperforming previous approaches that consider content-based similarity measures.

Saturday, March 29, 2014

W3C CSV on the Web Working Group publishes working drafts



The W3C Data Activity's CSV on the Web Working Group has published two first public working drafts. One provides a basic data model for tabular data and metadata, and the other describes use cases and the requirements derived from them.

Model for Tabular Data and Metadata on the Web

Tabular data is routinely transferred on the web as "CSV", but the definition of "CSV" in practice is very loose. This document outlines a basic data model or infoset for tabular data and metadata about that tabular data. It also contains some non-normative information about a best practice syntax for tabular data, for mapping into that data model, to contribute to the standardisation of CSV syntax by IETF. Various methods of locating metadata are also provided.
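
To make the idea of a data model for tabular data concrete, here is a minimal Python sketch. It is not the Working Group's model: the `Table` class, its fields, the `parse_csv` helper and the sample data are all hypothetical, and treating the first row as a header is an assumption that real "CSV" files often break.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical minimal table model: named columns, rows of cells, free-form metadata."""
    columns: list
    rows: list
    metadata: dict = field(default_factory=dict)

    def cell(self, row_index, column_name):
        """Return the cell at a zero-based row index and a named column."""
        return self.rows[row_index][self.columns.index(column_name)]


def parse_csv(text, metadata=None):
    """Parse CSV text, treating the first row as the header (an assumption)."""
    records = list(csv.reader(io.StringIO(text)))
    if not records:
        return Table(columns=[], rows=[], metadata=dict(metadata or {}))
    header, body = records[0], records[1:]
    return Table(columns=header, rows=body, metadata=dict(metadata or {}))


# Illustrative data only; all cell values stay strings, as in the raw file.
sample = "name,score\nalice,3\nbob,5\n"
table = parse_csv(sample, metadata={"title": "Example scores"})
print(table.cell(1, "name"))
```

Keeping the metadata separate from the cells mirrors the split the draft describes between tabular data and metadata about that data; how such metadata is located or typed is left out of this sketch.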

CSV on the Web: Use Cases and Requirements

A large percentage of the data published on the Web is tabular data, commonly published as comma separated values (CSV) files. The CSV on the Web Working Group aim to specify technologies that provide greater interoperability for data dependent applications on the Web when working with tabular datasets comprising single or multiple files using CSV, or similar, format. This document lists the use cases compiled by the Working Group that are considered representative of how tabular data is commonly used within data dependent applications. The use cases observe existing common practice undertaken when working with tabular data, often illustrating shortcomings or limitations of existing formats or technologies. This document also provides a set of requirements derived from these use cases that have been used to guide the specification design.

Friday, March 28, 2014

JWS preprint: API-Centric Linked Data Integration: the Open PHACTS Discovery Platform Case Study


Paul Thomas Groth, Antonis Loizou, Alasdair J. G. Gray, Carole Goble, Lee Harland and Steve Pettifer, API-Centric Linked Data Integration: the Open PHACTS Discovery Platform Case Study, Web Semantics: Science, Services and Agents on the World Wide Web, to appear, 2014.

Data integration is a key challenge faced in pharmacology, where there are numerous heterogeneous databases spanning multiple domains (e.g., chemistry and biology). To address this challenge, the Open PHACTS consortium has developed the Open PHACTS Discovery Platform, which leverages Linked Data to provide integrated access to pharmacology databases. Between its launch in April 2013 and March 2014, the platform was accessed over 13.5 million times, and multiple applications integrate with it. In this work, we discuss how Application Programming Interfaces can extend the classical Linked Data Application Architecture to facilitate data integration. Additionally, we show how the Open PHACTS Discovery Platform implements this extended architecture.

Thursday, March 13, 2014

The Web's 25th anniversary


Greeting from Web inventor Tim Berners-Lee on the Web's 25th anniversary

Tim Berners-Lee invites everyone to celebrate the 25th anniversary of the Web and to join the activities organized by the World Wide Web Consortium and World Wide Web Foundation in 2014 and beyond to address some of the threats to the future of the Web. As Berners-Lee says, "Together we have built an amazing Web. But we still have a lot to do so that the Web remains truly for everyone." For more information about Web25 activities, visit webat25.org.

Saturday, February 22, 2014

JWS preprint: ArguBlogging: an Application for the Argument Web


New preprint on the JWS preprint server:

Floris Bex, Mark Snaith, John Lawrence and Chris Reed, ArguBlogging: an Application for the Argument Web, Web Semantics: Science, Services and Agents on the World Wide Web, to appear, 2014.

In this paper, we present a software tool for 'ArguBlogging', which allows users to construct debates and discussions across blogs, linking existing and new online resources to form distributed, structured conversations. Arguments and counterarguments can be posed by giving opinions on one's own blog and replying to other bloggers' posts. The resulting argument structure is connected to the Argument Web, in which argumentative structures are made semantically explicit and machine-processable. We discuss the ArguBlogging tool and the underlying infrastructure and ontology of the Argument Web.

Thursday, February 20, 2014

JWS preprint: Streaming the Web: Reasoning over Dynamic Data


A new preprint is available on the JWS preprint server:

Alessandro Margara, Jacopo Urbani, Frank van Harmelen and Henri Bal, Streaming the Web: Reasoning over Dynamic Data, Web Semantics: Science, Services and Agents on the World Wide Web, to appear, 2014.

In the last few years a new research area, called stream reasoning, has emerged to bridge the gap between reasoning and stream processing. While current reasoning approaches are designed to work mainly on static data, the Web is, on the other hand, extremely dynamic: information is frequently changed and updated, and new data is continuously generated from a huge number of sources, often at a high rate. In other words, fresh information is constantly made available in the form of streams of new data and updates. Despite some promising investigations in the area, stream reasoning is still in its infancy, both from the perspective of model and theory development, and from the perspective of system and tool design and implementation. The aim of this paper is threefold: (i) we identify the requirements coming from different application scenarios, and we isolate the problems they pose; (ii) we survey existing approaches and proposals in the area of stream reasoning, highlighting their strengths and limitations; (iii) we draw a research agenda to guide the future research and development of stream reasoning. In doing so, we also analyze related research fields to extract algorithms, models, techniques, and solutions that could be useful in the area of stream reasoning.
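
As a rough illustration of the stream-processing side of this problem, the Python sketch below keeps only recent facts in a sliding time window and answers a simple pattern query over them. It is not taken from the paper: the `WindowedTripleStore` class, its methods and the sensor data are hypothetical, timestamps are assumed to arrive in non-decreasing order, and no actual reasoning (inference) is performed.

```python
from collections import deque


class WindowedTripleStore:
    """Keep only triples whose timestamp falls within the last `width` time units."""

    def __init__(self, width):
        self.width = width
        self._items = deque()  # (timestamp, (subject, predicate, obj)), oldest first

    def add(self, timestamp, triple):
        """Append a triple (timestamps assumed non-decreasing) and evict expired ones."""
        self._items.append((timestamp, triple))
        while self._items and self._items[0][0] <= timestamp - self.width:
            self._items.popleft()

    def match(self, predicate):
        """Return (subject, object) pairs for the given predicate in the current window."""
        return [(s, o) for _, (s, p, o) in self._items if p == predicate]


# Illustrative data only.
store = WindowedTripleStore(width=10)
store.add(1, ("sensor1", "reports", "smoke"))
store.add(5, ("sensor2", "reports", "heat"))
store.add(12, ("sensor3", "reports", "smoke"))
print(store.match("reports"))
```

A stream reasoner would additionally have to derive and retract inferred facts as the window slides, which is where the difficulty described in the abstract lies.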

Wednesday, January 1, 2014

Data Activity ⊃ Semantic Web ∪ eGovernment



In December the W3C created the Data Activity as the new home for the Semantic Web and eGovernment activities with Phil Archer as its lead. The Data Activity has eight ongoing groups, which include two new ones: CSV on the Web and Data on the Web Best Practices.

Much has changed in the 15 years since the term Semantic Web was introduced, and researchers and developers have produced a large and mature collection of concepts, languages and technologies. The new, wider focus on exploiting this work in support of open data and services is welcome. You can keep track of what the W3C Data Activity is doing on the new Data Activity Blog.

Tuesday, December 31, 2013

Carole Goble awarded a CBE for services to science


Congratulations to Professor Carole Goble who has been awarded a CBE (Commander of the Order of the British Empire) for services to science in the 2014 New Year's Honours List.

Professor Goble helped found the Journal of Web Semantics in 2003, and from 2003 to 2008 she served as an Editor in Chief and managed its editorial office.

She is currently a full professor in the School of Computer Science at the University of Manchester, UK, where she has co-led the Information Management Group since 1997.