I spent the day listening in on the RDF Validation Workshop, which kind of spilled over into the Cambridge Semantic Web Meetup. Here are my general musings and notes from the day. They may be completely wrong for many reasons, including possible misunderstandings of what the speaker said and information that is now out of date.
Google Says They’re Triplifying the Web
Part of Google's Knowledge Graph work is helping users provide the markup behind rich snippets in Google search results. They are building a validator that checks this markup against the RDF representation of the microdata. Most of their constraints are property paths, and they use SPARQL for the rest. They are mostly concerned with suitability for their own purposes, namely rich snippets and the Knowledge Graph. They are prototyping with RDFLib, but will be moving to their own parser, which is used by the Structured Data Testing Tool. Here is an example path constraint, with the resulting SPARQL queries that are generated from it:
schema:reservationFor/schema:flightNumber
SELECT ?context WHERE {?context schema:flightNumber ?constraint.}
ASK WHERE {?context schema:flightNumber ?constraint.}
Currently, they validate only what is required; they don't check anything that is optional.
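To make the path-to-query step concrete, here is a minimal sketch of how a required path constraint could be turned into a SELECT and an ASK query. This is my own illustration, not Google's generator: the function name `path_to_queries` and the `?value` variable are made up, and I am assuming the whole path is passed through as a SPARQL 1.1 property path, whereas the queries in my notes above only show the final step.

```python
SCHEMA_PREFIX = "PREFIX schema: <http://schema.org/>\n"


def path_to_queries(path):
    """Build SELECT and ASK query strings for one required property path.

    Hypothetical helper: the path is used verbatim as a SPARQL 1.1
    property path, which is an assumption about how such a tool might work.
    """
    steps = [step.strip() for step in path.split("/") if step.strip()]
    if not steps:
        raise ValueError("empty property path")
    pattern = "?context " + "/".join(steps) + " ?value ."
    return {
        # SELECT lists every node that satisfies the path.
        "select": SCHEMA_PREFIX + "SELECT ?context WHERE { " + pattern + " }",
        # ASK only answers whether at least one such node exists.
        "ask": SCHEMA_PREFIX + "ASK WHERE { " + pattern + " }",
    }


queries = path_to_queries("schema:reservationFor/schema:flightNumber")
print(queries["select"])
print(queries["ask"])
```

Since only required paths are checked, a failed ASK would be the signal that the markup is missing something it needs.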
Semantic Web Meetup
The general idea seems to be that the RDF community needs to provide a means to say the following things about RDF graphs:
- The graph must at least contain X.
- The graph must contain at most Y.
- The graph can never contain Z.
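The three kinds of statement above all reduce to minimum and maximum counts, which can be sketched in a few lines. This is only an illustration under my own assumptions: the graph is a plain set of (subject, predicate, object) tuples, and the `Cardinality` class, the `check` function, and all the `ex:` names are hypothetical.

```python
from typing import NamedTuple, Optional


class Cardinality(NamedTuple):
    predicate: str
    min_count: int = 0               # "must at least contain"
    max_count: Optional[int] = None  # "at most"; 0 means "never"; None is unbounded


def check(graph, subject, constraints):
    """Return a list of violation messages for one subject."""
    report = []
    for c in constraints:
        # Distinct objects are counted as distinct values.
        values = {o for s, p, o in graph if s == subject and p == c.predicate}
        n = len(values)
        if n < c.min_count:
            report.append(f"{c.predicate}: expected at least {c.min_count}, found {n}")
        if c.max_count is not None and n > c.max_count:
            report.append(f"{c.predicate}: expected at most {c.max_count}, found {n}")
    return report


graph = {
    ("ex:res1", "ex:flightNumber", "UA110"),
    ("ex:res1", "ex:seat", "12A"),
    ("ex:res1", "ex:seat", "12B"),
    ("ex:res1", "ex:legacyCode", "X"),
}
constraints = [
    Cardinality("ex:flightNumber", min_count=1),  # must at least contain
    Cardinality("ex:seat", max_count=1),          # must contain at most
    Cardinality("ex:legacyCode", max_count=0),    # can never contain
]
report = check(graph, "ex:res1", constraints)
for line in report:
    print(line)
```

Here the flight number passes, while the second seat and the legacy code are each reported as violations.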
The proposed approach seems to be to provisionally close any given RDF graph before validation in order to produce the report. That closure can include some fixed set of other graphs (such as the vocabularies used), but ultimately the graph has to be validated as given, under the Unique Name Assumption and the Closed World Assumption.
Eric Prud’hommeaux presented an interesting framework based on YACC-style grammars, in which you provide “shapes” that objects are validated against. This is similar to the Resource Shape vocabulary from OSLC (Open Services for Lifecycle Collaboration), but with additional capabilities around disjunction and non-declarative validation processes.
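To get a feel for what provisional closure plus a shape with disjunction might look like, here is a rough sketch. It is not the grammar-based framework from the talk, nor the OSLC vocabulary: the dict-based shape format, the `validate_closed` function, and every `ex:` name are assumptions of mine, and the graph is again just a set of (subject, predicate, object) tuples.

```python
def validate_closed(graph, fixed_graphs, node, shape):
    """Validate one node against a shape under a closed-world reading."""
    # Provisional closure: the data plus the fixed graphs is all there is.
    closed = set(graph).union(*fixed_graphs)
    present = {p for s, p, o in closed if s == node}

    errors = []
    # Each entry in "one_of" is a disjunction: any one predicate satisfies it.
    for alternatives in shape["one_of"]:
        if not present & set(alternatives):
            errors.append("missing one of: " + ", ".join(sorted(alternatives)))

    # Closed world: anything not mentioned by the shape is a violation.
    allowed = set().union(*shape["one_of"]) | set(shape.get("optional", ()))
    for predicate in sorted(present - allowed):
        errors.append("unexpected predicate: " + predicate)
    return errors


data = {
    ("ex:issue1", "ex:title", "Crash on load"),
    ("ex:issue1", "ex:color", "red"),
}
vocabulary = [{("ex:title", "rdf:type", "rdf:Property")}]
shape = {
    "one_of": [{"ex:title"}, {"ex:assignee", "ex:team"}],
    "optional": {"ex:dueDate"},
}
errors = validate_closed(data, vocabulary, "ex:issue1", shape)
for error in errors:
    print(error)
```

Under an open-world reading, the missing assignee would simply be unknown; closing the graph first is what lets the validator report it as an error.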