This document describes WShEx (Wikibase Shape Expressions), a language inspired by ShEx to describe and validate Wikibase entities.

Introduction

WShEx is based on Shape Expressions ([[SHEX]]) and uses a similar syntax. The main difference is that while ShEx describes and validates RDF, WShEx describes and validates Wikibase entities.

Wikibase is the software toolkit on which Wikidata is implemented. Although the Wikibase data model offers an RDF serialization, it contains several built-in features, such as labels, descriptions, aliases, qualifiers, references and ranks, which are not directly supported by RDF.

WShEx supports those Wikibase features as part of the language, which makes it more concise and easier to use for describing Wikibase entities.

[[[#WShExRole]]] presents the role of WShEx as a language that describes Wikibase entities directly. This differs from Entity Schemas, which are written in ShEx and describe the RDF serialization of Wikibase entities. The two languages are complementary, and it is possible to convert one to the other.

Figure: WShEx role describing Wikibase entities

Namespaces

This document uses the following aliases for namespaces:

Alias  Namespace
wd     http://www.wikidata.org/entity/
wdt    http://www.wikidata.org/prop/direct/
p      http://www.wikidata.org/prop/
ps     http://www.wikidata.org/prop/statement/
pq     http://www.wikidata.org/prop/qualifier/
xsd    http://www.w3.org/2001/XMLSchema#
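
As an illustration of how these aliases are used, the following Python sketch expands a prefixed name such as wd:Q5 into a full IRI. The `NAMESPACES` dictionary copies the aliases listed above. The `expand` helper is a hypothetical name introduced here and is not part of WShEx or any Wikibase library.

```python
# Aliases copied from the namespace list in this section.
NAMESPACES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "p": "http://www.wikidata.org/prop/",
    "ps": "http://www.wikidata.org/prop/statement/",
    "pq": "http://www.wikidata.org/prop/qualifier/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name like 'wd:Q5' into a full IRI (hypothetical helper)."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in NAMESPACES:
        raise ValueError(f"unknown or missing prefix in {prefixed_name!r}")
    return NAMESPACES[prefix] + local


print(expand("wd:Q5"))
print(expand("wdt:P31"))
```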

Example

The following example declares a <Researcher> shape with several triple constraints.

Both the WShEx schema (first column) and the equivalent ShEx schema (called an Entity Schema by the Wikibase community) are included.

WShEx primer

This section contains a primer about the WShEx language using several examples.

All examples are given in the WShEx compact syntax, together with the equivalent Entity Schema in the ShEx compact syntax.

Triple constraints

A triple constraint declares a constraint on a Wikibase statement. It has a predicate, a value expression and a cardinality.
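
The three parts of a triple constraint can be pictured with a small Python sketch. It is an illustration only, not the WShEx semantics: the `TripleConstraint` class, the `satisfies` function and the dictionary layout for statements are hypothetical. It assumes the ShEx conventions that the default cardinality is exactly one and that every value of the property must satisfy the value expression.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TripleConstraint:
    """Hypothetical in-memory form of a triple constraint."""
    predicate: str                     # property identifier, e.g. "P31"
    value_expr: Callable[[str], bool]  # test applied to each statement value
    min_count: int = 1                 # assumed default: exactly one, as in ShEx
    max_count: Optional[int] = 1       # None means unbounded


def satisfies(statements: Dict[str, List[str]], tc: TripleConstraint) -> bool:
    """Return True if the values for tc.predicate meet the value expression and cardinality."""
    values = statements.get(tc.predicate, [])
    if not all(tc.value_expr(v) for v in values):
        return False
    if len(values) < tc.min_count:
        return False
    return tc.max_count is None or len(values) <= tc.max_count


# "instance of" (P31) must have exactly one value, and it must be Q5.
is_human = TripleConstraint("P31", lambda v: v == "Q5")
print(satisfies({"P31": ["Q5"]}, is_human))
```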

Node constraints

Node constraints are value expressions that constrain the possible values of a node.

They can be:

Labels, descriptions and aliases

Each Wikibase entity has labels, descriptions and aliases, each associated with a language.

WShEx adds the keywords Label, Description and Alias to describe these strings.

It is possible to add string constraints to the labels, descriptions and aliases associated with a given language tag.
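
A rough Python sketch of the idea follows. It checks the term stored under a language tag against optional string constraints. The `term_matches` function and the sample data are hypothetical. Treating a missing term as a failure is an assumption made here, not something stated by WShEx.

```python
import re
from typing import Dict, Optional


def term_matches(terms: Dict[str, str], lang: str,
                 pattern: Optional[str] = None,
                 max_length: Optional[int] = None) -> bool:
    """Check the label or description stored under `lang` against optional string constraints."""
    text = terms.get(lang)
    if text is None:
        return False  # assumption: a missing term fails the constraint
    if max_length is not None and len(text) > max_length:
        return False
    return pattern is None or re.search(pattern, text) is not None


# Hypothetical labels keyed by language tag, not taken from Wikidata.
labels = {"en": "Example Person", "es": "Persona de ejemplo"}

print(term_matches(labels, "en", pattern=r"^Example"))
print(term_matches(labels, "fr"))
```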

Value sets

Value sets are delimited by [ and ]. For example, [ :Q5 ] declares that the value must be exactly :Q5.

It is possible to have several elements in the value set.

Stems of value sets

A value set can also contain stems for IRIs. As in ShEx, an IRI stem matches any IRI that starts with the stem.
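
The matching rule for value sets and IRI stems can be sketched in Python as follows. The `in_value_set` function is a hypothetical helper that follows the ShEx reading of stems as prefix matching on the IRI string. It does not reproduce the WShEx syntax for writing stems.

```python
from typing import Iterable

WD = "http://www.wikidata.org/entity/"


def in_value_set(value: str,
                 values: Iterable[str] = (),
                 stems: Iterable[str] = ()) -> bool:
    """Return True if `value` equals a listed value or starts with one of the stems."""
    return value in set(values) or any(value.startswith(stem) for stem in stems)


# Exact match against a single listed value.
print(in_value_set(WD + "Q5", values=[WD + "Q5"]))
# Stem match: any IRI in the wd namespace is accepted.
print(in_value_set(WD + "Q42", stems=[WD]))
```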

Built-in datatypes

Support for built-in datatypes as part of value sets is still a work in progress. One open issue is the compact syntax for their values and facets.

WShEx will support the following built-in datatypes from Wikidata:

  • CommonsMedia
  • GlobeCoordinate
  • Item
  • Property
  • String
  • MonolingualText
  • ExternalIdentifier
  • Quantity
  • Time
  • URL
  • MathematicalExpression
  • GeographicShape
  • MusicalNotation
  • TabularData
  • Lexeme
  • Form
  • Sense

Facets

It is possible to constrain the values of datatypes using facets.

String facets

WShEx supports the same string facets as ShEx: Length, MinLength, MaxLength, and a regular expression pattern delimited by slash characters (/).
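
The effect of these facets can be approximated with a short Python sketch. The `check_string_facets` function and its parameter names are hypothetical. Python's `re` module uses a different regular expression dialect from the one used by ShEx, so the pattern check is only an approximation.

```python
import re
from typing import Optional


def check_string_facets(value: str,
                        length: Optional[int] = None,
                        min_length: Optional[int] = None,
                        max_length: Optional[int] = None,
                        pattern: Optional[str] = None) -> bool:
    """Return True if `value` satisfies every facet that is not None."""
    n = len(value)  # Python counts Unicode code points here
    if length is not None and n != length:
        return False
    if min_length is not None and n < min_length:
        return False
    if max_length is not None and n > max_length:
        return False
    # Unanchored search: the pattern may match anywhere unless it uses ^ and $.
    if pattern is not None and re.search(pattern, value) is None:
        return False
    return True


print(check_string_facets("Q42", pattern=r"^Q[0-9]+$"))
print(check_string_facets("abcd", max_length=3))
```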

Numeric facets

WShEx supports facets on numeric values: MinInclusive, MinExclusive, MaxInclusive and MaxExclusive. These facets can be applied to numeric values such as quantities.

Although it is not currently supported, WShEx could in the future also define numeric facets to compare the values of other datatypes, such as time values.
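
The four numeric facets amount to bound checks, as in the following Python sketch. The `check_numeric_facets` function and its parameter names are hypothetical, and the sketch ignores units and precision, which a real quantity value may carry.

```python
from typing import Optional, Union

Number = Union[int, float]


def check_numeric_facets(value: Number,
                         min_inclusive: Optional[Number] = None,
                         min_exclusive: Optional[Number] = None,
                         max_inclusive: Optional[Number] = None,
                         max_exclusive: Optional[Number] = None) -> bool:
    """Return True if `value` lies within every bound that is not None."""
    if min_inclusive is not None and value < min_inclusive:
        return False
    if min_exclusive is not None and value <= min_exclusive:
        return False
    if max_inclusive is not None and value > max_inclusive:
        return False
    if max_exclusive is not None and value >= max_exclusive:
        return False
    return True


# The inclusive bound accepts the boundary value; the exclusive bound rejects it.
print(check_numeric_facets(5, min_inclusive=5))
print(check_numeric_facets(5, min_exclusive=5))
```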

Qualifiers

One aspect that is different between Wikibase and RDF is the use of qualifiers. A qualifier annotates a statement with further statements about it.

In WShEx, qualifier specifications are enclosed in {| and |}.
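
To make the data model concrete, the Python sketch below represents a statement with qualifiers and checks them against a simple specification. The dictionary layout, the placeholder value `Q_UNIVERSITY` and the `qualifiers_match` function are hypothetical. Ignoring qualifiers that the specification does not mention is an assumption made here, not a rule taken from WShEx.

```python
from typing import Callable, Dict, List

# Hypothetical statement: a main value plus qualifiers keyed by property.
statement = {
    "property": "P69",        # educated at
    "value": "Q_UNIVERSITY",  # placeholder, not a real item identifier
    "qualifiers": {
        "P580": ["1990"],     # start time
        "P582": ["1994"],     # end time
    },
}


def qualifiers_match(stmt: dict, spec: Dict[str, Callable[[str], bool]]) -> bool:
    """Each qualifier property in `spec` must be present, and all its values must pass the check."""
    qualifiers: Dict[str, List[str]] = stmt.get("qualifiers", {})
    for prop, check in spec.items():
        values = qualifiers.get(prop, [])
        if not values or not all(check(v) for v in values):
            return False
    return True  # assumption: qualifiers not named in `spec` are ignored


print(qualifiers_match(statement, {"P580": str.isdigit, "P582": str.isdigit}))
```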

References

References enable authors to associate reference information with statements. References are similar to qualifiers, but the Wikibase data model allows references to be grouped, so each statement can have a list of references.

In WShEx, references are added after the keyword References, and each property specification is enclosed in {| and |}.
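
The difference from qualifiers is that a statement carries a list of references, each of which groups its own properties. The Python sketch below shows that shape. The data layout, the placeholder `Q_SOURCE` and the `references_with` function are hypothetical. The sketch only selects the references that contain a set of properties and makes no claim about how WShEx decides whether a statement conforms.

```python
from typing import Dict, Iterable, List

Reference = Dict[str, List[str]]  # one reference: property -> values

# Hypothetical statement with two separate references.
statement = {
    "property": "P31",
    "value": "Q5",
    "references": [
        {"P248": ["Q_SOURCE"], "P813": ["2020-01-01"]},  # stated in, retrieved
        {"P854": ["http://example.org/"]},               # reference URL
    ],
}


def references_with(stmt: dict, required: Iterable[str]) -> List[Reference]:
    """Return the references of `stmt` that contain every property in `required`."""
    needed = set(required)
    return [ref for ref in stmt.get("references", []) if needed.issubset(ref.keys())]


print(len(references_with(statement, ["P248", "P813"])))
```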

Other features

This section describes features that may be included in WShEx but have not yet been implemented.

Ranks

WShEx may add the keywords NormalRank, PreferredRank and DeprecatedRank to declare the ranks of statements.

Semantic specification

The semantic specification is still a work in progress. A first draft, based on a subset of WShEx, was published in the paper "WShEx: A language to describe and validate Wikibase entities", presented at the Wikidata Workshop 2022.

WShEx grammar

An ANTLR grammar for the compact syntax of the current version of WShEx is available at: WShExDoc.g4

WShEx implementations