LODUM Blog

News from the LODUM headquarters and all things Linked Open Data.


Submissions due: June 18, 2012

Workshop on GIScience in the Big Data Age 2012

http://stko.geog.ucsb.edu/gibda2012/

Held in conjunction with the Seventh International Conference on Geographic Information Science (GIScience 2012).

Columbus, Ohio, USA. September 18th, 2012

Workshop Description and Scope

The rapidly growing information universe, with new data created faster than we can store it, calls for improved methods to retrieve, filter, integrate, and share data. The vision of a data-intensive science hopes that the open availability of data with a higher spatial, temporal, and thematic resolution will enable us to better address complex scientific and social questions. On the downside, however, understanding, sharing, and reusing these data becomes more challenging. Big Data is big not only because it involves a huge amount of data, but also because of the high dimensionality and interlinkage of these data sets. The on-the-fly integration of heterogeneous data from various sources has been named one of the frontiers of Digital Earth research, Bioinformatics, the Digital Humanities, and other emerging research visions. From a more technical perspective, a knowledge infrastructure is required to handle Big Data; currently, the most promising approach is the Linked Data cloud. While the Web has changed with the advent of the Social Web from mostly authoritative content towards increasing amounts of user-generated content, it is essentially still about linked documents. These documents provide structure and context for the described data and ease their interpretation. In contrast, the upcoming Data Web is about linking data, not documents. Such data sets are not bound to a specific document but can easily be combined and used outside of their original context. With millions of new facts encoded as RDF triples added every month, the Linked Data cloud allows users to answer complex queries spanning multiple sources. Because the data are decoupled from their original creation context, semantic interoperability, identity resolution, and ontologies are central methodologies to ensure consistency and meaningful results. Space and time are fundamental ordering relations to structure such data and provide an implicit context for their interpretation. Prominent geo-related Linked Data hubs include Geonames.org as well as the LinkedGeoData project, which provides an RDF serialization of OpenStreetMap. Furthermore, many other Linked Data sources contain location references, e.g., observation data provided by sensors.

This full-day workshop is a follow-up to the successful first workshop on Linked Spatiotemporal Data at GIScience 2010. While that first workshop centered on Linked Data and geo-ontologies, the GIBDA 2012 workshop takes a broader perspective by highlighting data-intensive science as the research vision and Linked Data as a promising knowledge infrastructure. We hope that the workshop will help better define the data, knowledge representations, infrastructure, reasoning methodologies, and tools needed to link and query massive data based on their spatial and temporal characteristics.

List of Relevant Topics

Topics of interest for the workshop include (but are not limited to):

  • Mining Big Data
  • Learning geo-ontologies out of massive data
  • Abduction-based frameworks and systems
  • Mining Location-based Social Networks
  • Studying the geo-indicativeness of massive, semi-structured data
  • Analogy-based search in Big Data
  • Semantic heterogeneity and ontology alignment
  • Semantics-enabled geo-statistics
  • Retrieving and browsing of Linked Spatiotemporal Data
  • Learning Linked Spatiotemporal Data from existing sources
  • Spatiotemporal indexing of Linked Data
  • Harvesting Linked Data from heterogeneous sources
  • Spatial extensions to query languages (e.g., GeoSPARQL)
  • Visualizing and browsing through Linked Spatiotemporal Data
  • Big Data and Volunteered Geographic Information (VGI)
  • Spatiotemporal aspects of data quality, trust, and provenance
  • Tag and vocabulary recommendations for annotating VGI
  • Maintenance of outgoing links
  • Application of Linked Spatiotemporal Data
  • Linked Data and Sensor Web Enablement (SWE)
  • Linked Data and mobile applications
  • Linked Data gazetteers and Points of Interest
  • Linked Data in the domain of cultural heritage research
  • Integration and interoperation of Linked Spatiotemporal Data
  • Ontologies and vocabularies to support interoperability
  • Geo-Ontology Design Patterns
  • Identity assumptions and resolution for data fusion and integration
  • The role of space and time to structure Linked Data
  • Versioning of spatiotemporal data
  • Semantic annotation and Microformats
  • Adding contextual information to Linked Data

Workshop Format and Structure

The full-day workshop will focus on intensive discussions setting a roadmap towards publishing, structuring, retrieving, and consuming Linked Spatiotemporal Data, and towards understanding how GIScience can contribute to the vision of a data-intensive science. The workshop will accept three kinds of contributions: full research papers presenting new work in the indicated areas, statements of interest, and data challenge papers. While the research papers will be selected based on reviews adhering to classical scientific quality criteria, the statements of interest should raise questions, present visions, and point to open gaps. Statements of interest will nevertheless also be reviewed to ensure the quality and clarity of the presented ideas. We also welcome demonstrations of existing tools, applications, and geo-ontologies. Details on the data challenge are given below. The presentation time per speaker will be restricted to 5 minutes for statements of interest and 10 minutes for full papers. Based on the presented work, all workshop participants will decide on two to three research topics to be discussed in breakout groups. In a final session, the breakout groups will present their findings on research topics and challenges and try to integrate them across the discussed topics.

Submissions and Proceedings

All presented papers will be made available through the workshop web page, the electronic conference proceedings of GIScience 2012, and CEUR-WS. Full research papers should be approximately 7 to 10 pages long, while statements of interest and data challenge papers should be 5 to 6 pages long. Selected papers may be considered for fast-track submission to the Semantic Web journal by IOS Press.

Data Challenge

The website spatial.linkedscience.org contains a growing collection of metadata for proceedings of conferences on topics related to geographic information science. So far, it contains most of the metadata for the GIScience, COSIT, ACM GIS, and AGILE conference series. Within the GIBDA data challenge, we are looking for:

  • innovative analyses of the data
  • interactive visualizations
  • approaches for cleaning the data up
  • pattern and topic mining
  • enrichment and interlinking with other datasets (e.g., from the Linked Data cloud)
  • insights into GIScience as a research field
  • adding social roles and aspects

The raw data can be queried via SPARQL at spatial.linkedscience.org/sparql. Data challenge entries should be submitted through EasyChair as a brief description of the entry, along with a link to the demo, analysis, or dataset. The program committee will evaluate the entries based on innovativeness and potential impact. The winner will be awarded a $250 prize sponsored by 52North and will present at the workshop.
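For anyone exploring the data before entering the challenge, here is a minimal sketch of sending a query to the endpoint over the standard SPARQL protocol (the query passed as a `query` GET parameter, JSON results requested via the `Accept` header). It assumes the endpoint is reachable at http://spatial.linkedscience.org/sparql and supports SPARQL 1.1 aggregates; the query only counts instances per class, so it does not depend on the vocabulary actually used in the dataset. The function and constant names are illustrative.

```javascript
// Assumption: plain HTTP access and standard SPARQL protocol GET requests.
const SPARQL_ENDPOINT = "http://spatial.linkedscience.org/sparql";

// Vocabulary-independent exploration query: which classes occur most often?
const CLASS_COUNT_QUERY = `
SELECT ?type (COUNT(?s) AS ?n)
WHERE { ?s a ?type . }
GROUP BY ?type
ORDER BY DESC(?n)
LIMIT 10`;

// Build the GET URL for a SPARQL query against an endpoint.
function sparqlGetUrl(endpoint, query) {
  const url = new URL(endpoint);
  url.searchParams.set("query", query); // percent-encodes the query text
  return url.toString();
}

// Fetch the results as SPARQL JSON (needs a runtime with fetch, e.g. Node 18+).
async function runQuery(endpoint, query) {
  const response = await fetch(sparqlGetUrl(endpoint, query), {
    headers: { Accept: "application/sparql-results+json" },
  });
  if (!response.ok) {
    throw new Error(`SPARQL endpoint returned HTTP ${response.status}`);
  }
  return response.json();
}

// Usage (not run automatically):
// runQuery(SPARQL_ENDPOINT, CLASS_COUNT_QUERY)
//   .then((r) => console.log(r.results.bindings));
```

Listing the most frequent classes first is a quick way to see which vocabularies a dataset uses before writing more specific queries.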

Keynote

Gilberto Camara, Director of Brazil’s National Institute for Space Research (INPE), will kick off the workshop with a keynote on Big Geo-Data and related topics.

Important Dates

  • Submission due: June 18, 2012
  • Acceptance notification: July 6, 2012
  • Camera-ready copies: July 16, 2012

Organizers

  • Krzysztof Janowicz, University of California, Santa Barbara, USA
  • Carsten Keßler, University of Münster, Germany
  • Tomi Kauppinen, University of Münster, Germany
  • Dave Kolas, BBN Technologies, USA

Programme Committee

  • Benjamin Adams, University of California, Santa Barbara, USA
  • Boyan Brodaric, Geological Survey of Canada, Canada
  • Oscar Corcho, Universidad Politecnica de Madrid, Spain
  • Isabel Cruz, University of Illinois, USA
  • Mike Goodchild, University of California, Santa Barbara, USA
  • Willem Robert van Hage, Vrije Universiteit Amsterdam, NL
  • Pascal Hitzler, Wright State University, USA
  • Werner Kuhn, University of Münster, Germany
  • Jens Lehmann, University of Leipzig, Germany
  • Matthew Perry, Oracle, USA
  • Simon Scheider, University of Münster, Germany
  • Christoph Schlieder, University of Bamberg, Germany
  • Claus Stadler, University of Leipzig, Germany
  • Kristin Stock, University of Nottingham, UK

Please feel free to contact the organizers with further questions at jano @ geog . ucsb . edu.


We will present a poster and a demo related to Linked Science and the LODUM project at the 9th Extended Semantic Web Conference (ESWC2012).


The University Library here in Münster has released its entire collection of bibliographic data as Open Data under a Creative Commons Zero license. Here’s a translation of the press release:

The University Library Münster (ULB) has released its catalog data as Open Data. The approximately 3.4 million records cover most of the scientific literature in the library system of the University of Münster, including links to the electronic full texts available there. The data are published under a Creative Commons Zero license. This opens up new opportunities to query and link the catalog data and to build new applications, search tools, and analysis tools on top of these data without any restrictions.

With the release of its catalog data, the ULB Münster strongly supports open access to scientific information. Through the LODUM initiative (Linked Open Data University of Münster), the University of Münster is the first university in Germany to explore and use this technology. “For us, the publication of the catalog data is therefore only the first step,” said Dr. Beatrice Troeger, director of the ULB. The library plans to expand its offerings in this area and to improve them continually.

The ULB has been active in the Open Access initiative for many years and is now starting to adopt the concept of Linked Data. But whereas “open access” is about the published scientific information itself, here the library is concerned with sharing information about that information. “The so-called metadata have always been the core business of libraries,” said Jörg Lorenz, Head of the Department for Digital Services. Without such data, it is simply impossible to find one’s way through the ever-increasing flood of information to the relevant scientific publications.

We have now converted these data to RDF and provide them through our store.


Heads up: If you’re not into Web development, this post is going to bore the hell out of you.

Loading data dynamically via AJAX is quite straightforward these days, but everyone who has tried to augment their website with Linked Data via JavaScript has probably run into the same-origin policy. While a script can be loaded from any server, this policy only allows it to send requests to the same host, on the same port, through the same protocol as the page that includes it. So you’re out of luck even if you want to load data from your triple store hosted on the same machine, but on port 8080, for example.

While the policy is there for very good (security) reasons, it can be quite an annoyance when you are trying to develop applications that pull in data from various sources in the Linked Data cloud. JSON with padding, better known as JSONP, was introduced as a creative workaround for this problem. It wraps the JSON results of a GET request in a call to a JavaScript function (the callback function), so the response becomes a script of its own, and scripts can be loaded from anywhere (see above). The only problem with this approach is that the server needs to support JSONP, since it is the server that has to do the wrapping. In other words, if the SPARQL endpoint you are loading data from does not support JSONP, you are still out of luck and have to come up with a workaround such as a local proxy script.
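To make the padding trick concrete, here is a minimal sketch of both sides, runnable outside the browser. The names `toJsonp` and `handleResults` are hypothetical, and `new Function` only simulates what the browser does when it executes a script loaded through a `<script>` element.

```javascript
// Server side: instead of plain JSON, send a script that calls the
// requested callback with the JSON data as its argument.
function toJsonp(callbackName, data) {
  return callbackName + "(" + JSON.stringify(data) + ");";
}

// Client side (simulated): the page defines the callback first and then
// loads the response as a script, which invokes the callback.
let received = null;
function handleResults(data) {
  received = data;
}

const response = toJsonp("handleResults", { head: { vars: ["a"] } });
// new Function stands in for the browser executing the loaded script.
new Function("handleResults", response)(handleResults);
```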

And this is pretty much what our proxy service at http://jsonp.lodum.de does. Fire any GET requests with the following three parameters at it, and it will return the SPARQL results wrapped as JSONP:

  • endpoint: The URL of the SPARQL endpoint you want to query.
  • query: Your SPARQL query.
  • callback: The name for the callback function you are expecting.
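Without jQuery, the request URL has to be assembled by hand, and both the endpoint URL and the SPARQL query must be URL-encoded. A minimal sketch, assuming the proxy decodes standard form-encoded query strings; `buildProxyUrl` and the callback name `handleResults` are hypothetical:

```javascript
// Build a request URL for the JSONP proxy from its three parameters.
// URLSearchParams percent-encodes the endpoint URL and the SPARQL text.
function buildProxyUrl(endpoint, query, callback) {
  const params = new URLSearchParams({ endpoint, query, callback });
  return "http://jsonp.lodum.de/?" + params.toString();
}

const proxyUrl = buildProxyUrl(
  "http://data.uni-muenster.de/sparql",
  "SELECT * WHERE { ?a ?b ?c . } LIMIT 10",
  "handleResults" // hypothetical callback defined globally by your page
);
```

In a browser, the resulting URL would be used as the `src` of a `<script>` element, after defining a global `handleResults` function.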

If you are using jQuery, you don’t even need to define the callback function yourself; jQuery takes care of that part for you. Here’s a quick example:

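// Send the SPARQL query to the JSONP proxy; jQuery appends the
// callback parameter and generates the callback function itself.
var proxyUrl = "http://jsonp.lodum.de/";
$.ajax({
   url: proxyUrl,
   dataType: "jsonp",
   data: {
      // jQuery URL-encodes both values when building the request
      endpoint: "http://data.uni-muenster.de/sparql",
      query: "SELECT * WHERE { ?a ?b ?c . } LIMIT 10"
   },
   success: function(data) {
      // data holds the SPARQL results in JSON notation
   }
});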

Once the request returns, the success handler receives the SPARQL results in JSON notation. Needless to say, this also works with other endpoints.
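Assuming the proxy passes the endpoint’s JSON through unchanged, the results follow the SPARQL 1.1 JSON results format: variable names in `head.vars` and one binding object per row in `results.bindings`, where each value carries a `type` and a `value`. A small helper (hypothetical, not part of the proxy) that flattens this into plain row objects might look like this; the sample data are made up for illustration:

```javascript
// Convert a SPARQL JSON result ({head: {vars}, results: {bindings}}) into
// an array of plain objects mapping each variable to its value.
// Variables left unbound in a row (e.g. by OPTIONAL) become null.
function bindingsToRows(sparqlJson) {
  const vars = sparqlJson.head.vars;
  return sparqlJson.results.bindings.map((binding) => {
    const row = {};
    for (const v of vars) {
      row[v] = binding[v] ? binding[v].value : null;
    }
    return row;
  });
}

// Illustrative input in the shape a SELECT query returns
const sample = {
  head: { vars: ["a", "b", "c"] },
  results: {
    bindings: [
      {
        a: { type: "uri", value: "http://example.org/s" },
        b: { type: "uri", value: "http://www.w3.org/2000/01/rdf-schema#label" },
        c: { type: "literal", value: "Example", "xml:lang": "en" },
      },
    ],
  },
};

const rows = bindingsToRows(sample);
```

Inside the jQuery `success` handler, `bindingsToRows(data)` would then return one object per result row.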

If you don’t want to rely on our server being up, the script is also available for download, so that you can set up your own proxy.
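For those who would rather run their own, here is a rough sketch of what such a proxy could look like in Node.js. It is not the downloadable LODUM script; it assumes Node 18 or later (for the global `fetch`), and the names `handleProxyRequest` and `startProxy` are hypothetical. It takes the same three parameters, forwards the query to the endpoint via the standard SPARQL protocol, and wraps the JSON results in the callback:

```javascript
const http = require("http");

// Accept only plain or dotted JavaScript identifiers as callback names,
// so the proxy cannot be abused to inject arbitrary script code.
const CALLBACK_PATTERN = /^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$/;

// Handle one proxy request: forward the query to the endpoint and wrap
// the JSON results in the requested callback. fetchFn is injectable for
// testing and defaults to the global fetch.
async function handleProxyRequest(requestUrl, fetchFn = fetch) {
  const params = new URL(requestUrl, "http://localhost").searchParams;
  const endpoint = params.get("endpoint");
  const query = params.get("query");
  const callback = params.get("callback");
  if (!endpoint || !query || !callback || !CALLBACK_PATTERN.test(callback)) {
    return { status: 400, type: "text/plain", body: "missing or invalid parameter" };
  }
  const target = new URL(endpoint); // throws on a malformed endpoint URL
  target.searchParams.set("query", query);
  const response = await fetchFn(target.toString(), {
    headers: { Accept: "application/sparql-results+json" },
  });
  const results = await response.json();
  return {
    status: 200,
    type: "application/javascript",
    body: callback + "(" + JSON.stringify(results) + ");",
  };
}

// Minimal HTTP server around the handler (call startProxy(8080) to run it).
function startProxy(port) {
  const server = http.createServer(async (req, res) => {
    try {
      const { status, type, body } = await handleProxyRequest(req.url);
      res.writeHead(status, { "Content-Type": type });
      res.end(body);
    } catch (err) {
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end("could not query the SPARQL endpoint");
    }
  });
  server.listen(port);
  return server;
}
```

Restricting callback names to JavaScript identifiers is a deliberate choice: since the proxy’s output is executed as a script, an unchecked callback parameter would allow arbitrary code to be injected. A publicly deployed proxy would probably also want to limit which endpoints it forwards requests to.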



We added a full-text index (OWLIM/Lucene) containing all literal values to the LODUM store. The index is named “lodumLiteralIndex” and can be queried by combining SPARQL with the Lucene query syntax (SPARQL example).

On top of that, we built a small GUI with jQuery that makes the search capabilities accessible to the wider public. The index also contains blank nodes (e.g., author lists); we are still working on a smart way to integrate those into the GUI.
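Since the search terms typed into such a GUI end up inside a SPARQL query, they need to be escaped before being embedded in a string literal. The sketch below assumes the prefix and predicate convention of OWLIM’s Lucene integration (the index name used as a predicate in the `luc:` namespace, with a Lucene query as the object); the exact form should be checked against the store’s own SPARQL example, and the helper names are hypothetical:

```javascript
// Escape a user-supplied search term for use inside a double-quoted
// SPARQL string literal (backslashes, quotes, and line breaks).
function escapeSparqlString(text) {
  return text
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Assumption: OWLIM-style Lucene integration, where the index name is used
// as a predicate and the object is passed on as a Lucene query.
function buildFullTextQuery(searchTerm, limit = 20) {
  return [
    "PREFIX luc: <http://www.ontotext.com/owlim/lucene#>",
    "SELECT ?subject ?property ?literal WHERE {",
    `  ?literal luc:lodumLiteralIndex "${escapeSparqlString(searchTerm)}" .`,
    "  ?subject ?property ?literal .",
    `} LIMIT ${limit}`,
  ].join("\n");
}

const fullTextQuery = buildFullTextQuery('münster "linked data"');
```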

The RDFpad got an update as well and now works with both Etherpad versions (Etherpad and Etherpad Lite). In addition, we built a small on-the-fly SPARQL tool (SPARQLfly) to query small, remote RDF documents (e.g., written in Etherpad or stored as static files) via the FROM clause. The RDFpad automatically provides a link to SPARQLfly, which then reads the document and prepares the query form with all of its namespaces.
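The idea behind SPARQLfly can be illustrated by the kind of query it prepares: the namespaces declared in the document become `PREFIX` lines, and the document itself is pulled in through a `FROM` clause. This is a hypothetical sketch, not SPARQLfly’s code; the URL and pattern are placeholders, and whether a `FROM` URL is actually dereferenced depends on the query engine (some engines load remote documents this way, while many triple stores only use `FROM` to select stored graphs):

```javascript
// Hypothetical helper: build a SPARQL query that loads remote RDF documents
// via FROM clauses and declares the given namespace prefixes up front.
function buildFromQuery(prefixes, documentUrls, wherePattern) {
  const prefixLines = Object.entries(prefixes).map(
    ([prefix, iri]) => `PREFIX ${prefix}: <${iri}>`
  );
  const fromLines = documentUrls.map((url) => `FROM <${url}>`);
  return [
    ...prefixLines,
    "SELECT *",
    ...fromLines,
    `WHERE { ${wherePattern} }`,
  ].join("\n");
}

const padQuery = buildFromQuery(
  { foaf: "http://xmlns.com/foaf/0.1/" },
  ["http://example.org/pads/people.ttl"], // hypothetical document URL
  "?person foaf:name ?name ."
);
```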


A nice overview from the Europeana folks of how Linked Open Data works and why it’s useful.



It just occurred to me that we have not even mentioned the new version of spatial.linkedscience.org here yet. Completely new UI, and tons of new data.




One of the main goals of LODUM is to open up the university’s data silos, integrate the data, and make it easy to build applications on top of the data collection. This productivity map for Google Earth is an example of such an application. It renders the university buildings in 3D – the building height indicates the number of publications written by researchers working in the respective building.

The absolute number of papers is normalized by the number of researchers working in the given building to give a more balanced impression. The buildings are split into two parts: the lower part indicates the number of journal papers, whereas the upper part represents all other publications.
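The normalization and the two-part split can be sketched as follows. The scale factor is a hypothetical placeholder, since the post does not say how publication counts were mapped to building heights, and the function name is illustrative:

```javascript
// Hypothetical scale: meters of building height per publication per
// researcher. The actual value used for the map is not stated.
const METERS_PER_PAPER_PER_RESEARCHER = 10;

// Compute the two stacked segments for one building: journal papers
// (lower part) and all other publications (upper part), each normalized
// by the number of researchers working in the building.
function buildingSegments(
  { journalPapers, otherPapers, researchers },
  scale = METERS_PER_PAPER_PER_RESEARCHER
) {
  if (researchers <= 0) {
    return { lower: 0, upper: 0, total: 0 };
  }
  const lower = (journalPapers / researchers) * scale;
  const upper = (otherPapers / researchers) * scale;
  return { lower, upper, total: lower + upper };
}

const segments = buildingSegments({ journalPapers: 30, otherPapers: 20, researchers: 10 });
```

In the KML, the lower part would then span from the ground to `lower` meters and the upper part from `lower` to `total` meters.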

Clicking either of these two parts opens a pop-up with the actual numbers. The distribution of publications between the different institutions in a building is visualized as a pie chart (generated by the Google Chart Tools). The pop-ups also include links to the SPARQL queries to pull the data for the given building out of our store, so that interested developers can learn how we built this map.

The KML file is also available for download.


I attended the Semantic Web in Bibliotheken (i.e., … in libraries) meeting in Hamburg this week and presented our work on behalf of the team. Besides the presentation, the organizers (HBZ and ZBW) invited me to participate in the panel discussion at the end of the conference, and I would like to stress again what I said there: I think the libraries are really on top of their game when it comes to Linked Open Data. Having looked at a number of other fields in the context of LODUM, I must say that very few disciplines embrace the idea of LOD in the way the libraries do.

I was very impressed by the quality of the meeting and the presentations, and I made some useful new contacts (and finally met some people I had only known by email). Feedback on LODUM was also very positive, both in conversations over coffee and on Twitter.

It was quite a Twitter-friendly crowd in general; check the #SWIB11 hashtag on Twitter for a really impressive number of tweets from the meeting. I’d like to mention one last tweet.

I mention this one because Chris Gutteridge from Southampton came up to me and said:

I love what you’re doing, but I hate your HTML pages.

And that’s for a good reason. I think we really can (and need to) do a lot better here, and that is indeed something I would like to work on in the coming weeks. I think we’ll give Chris’ Graphite framework a shot for this task. 

For the sake of completeness, my slides from SWIB11:

LODUM @ SWIB11 from Carsten Keßler

– Carsten