Multilingual Linked Open Data Patterns

Include Language in URIs

Description

This pattern inserts a language identifier into the URI, so that datasets in different languages can be told apart from the URI alone.

Context

It can be applied when datasets are clearly separated by language. In that case, the datasets can be generated, and even maintained, by different servers, each publishing its own language-dependent dataset separately.

Example

The Armenian version of the country Armenia could be:

http://hy.example.org#Հայաստան

where hy represents the Armenian language (Hayeren). All the triples in Armenian could be hosted in hy.example.org while the triples in other languages, for example Spanish, could reside in es.example.org.
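As a minimal sketch of how such identifiers might be generated, the Python snippet below builds the IRI from a language tag and a local name, and also derives a percent-encoded ASCII form for systems that do not accept non-ASCII characters. The function names language_iri and to_uri and the default domain example.org are assumptions made for illustration; they are not part of the pattern itself.

```python
from urllib.parse import quote, unquote


def language_iri(lang: str, local_name: str, base_domain: str = "example.org") -> str:
    """Build a hash IRI whose host name starts with the language tag (hypothetical helper)."""
    return f"http://{lang}.{base_domain}#{local_name}"


def to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8, keeping ':', '/' and '#' intact."""
    return quote(iri, safe=":/#")


iri = language_iri("hy", "Հայաստան")
uri = to_uri(iri)

print(iri)           # the human-readable IRI
print(uri)           # ASCII-only form of the same identifier
print(unquote(uri))  # decoding restores the original IRI
```

Both forms identify the same resource; which one is stored or displayed is a publishing decision outside the scope of this pattern.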

Discussion

In a multilingual setting, being able to recognize the language of a resource easily can simplify development. A more practical benefit is that datasets obtained from different sources can be kept separate by language.

However, note that using a language tag in the URI may contradict the Cool URIs principle (cool URIs don't change), because it encodes extra information in the URI that may be subject to change in the future.

Adding languages to the URI can become unwieldy if we consider sub-languages, dialects and regions. More than 7000 languages are already registered, and language tags can be very specialized.

For example, hy-Latn-IT-arevela represents Eastern Armenian written in Latin script as used in Italy. Including such detailed information in the URI may not be reasonable.
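To make the structure of such a tag concrete, the following sketch splits the tag hy-Latn-IT-arevela (Latn being the four-letter script subtag for Latin) into its subtags, and shows one possible way of keeping only the primary language for use in a URI. This is a simplified reading of the language-tag syntax that assumes a well-formed tag of the form language-script-region-variant; it ignores extended language subtags, extensions, private-use subtags and validation against the subtag registry. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Union

Subtags = Dict[str, Union[Optional[str], List[str]]]


def split_language_tag(tag: str) -> Subtags:
    """Split a simple language tag into language, script, region and variants."""
    parts = tag.split("-")
    script: Optional[str] = None
    region: Optional[str] = None
    variants: List[str] = []
    for part in parts[1:]:
        nothing_after_script_yet = region is None and not variants
        if script is None and nothing_after_script_yet and len(part) == 4 and part.isalpha():
            script = part.title()    # four letters: script, e.g. Latn
        elif region is None and not variants and (
            (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit())
        ):
            region = part.upper()    # two letters or three digits: region, e.g. IT
        else:
            variants.append(part.lower())  # anything else is treated as a variant
    return {
        "language": parts[0].lower(),
        "script": script,
        "region": region,
        "variants": variants,
    }


def uri_language_segment(tag: str) -> str:
    """Keep only the primary language subtag for use in a URI (one possible policy)."""
    return str(split_language_tag(tag)["language"])


print(split_language_tag("hy-Latn-IT-arevela"))
print(uri_language_segment("hy-Latn-IT-arevela"))
```

Reducing the tag to its primary language keeps URIs short, at the cost of no longer distinguishing scripts, regions or variants in the identifier.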

Another design decision is where to put the language tag inside the URI. Alternative URI schemes are possible depending on where the tag is placed, for example:

http://example.org/hy#Հայաստան
http://example.org/Հայաստան/hy

The second scheme is less convenient because it mixes the Unicode characters of the local name with the ASCII characters of the language tag.

See also

This pattern can be combined with the language content negotiation pattern. It is possible to have a language-agnostic URI for a concept, without a language tag, and to use HTTP language content negotiation to redirect to the preferred dataset.
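The selection step behind such a redirect could look roughly like the sketch below, which reads an HTTP Accept-Language header and picks the language-specific dataset to redirect to. It is a simplified illustration, not a full implementation of HTTP content negotiation: wildcards and malformed headers are handled only crudely, and the set of available languages, the fallback language and the host naming scheme are assumptions.

```python
from typing import List, Sequence


def parse_accept_language(header: str) -> List[str]:
    """Return the language ranges of an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        language_range, _, params = item.strip().partition(";")
        if not language_range:
            continue
        quality = 1.0  # a range without an explicit q-value has quality 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat an unreadable q-value as "not acceptable"
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, language_range.strip().lower()))
    return [language_range for _, _, language_range in sorted(weighted)]


def choose_dataset(
    accept_language: str,
    available: Sequence[str] = ("hy", "es"),  # assumed language-specific datasets
    default: str = "hy",                      # assumed fallback language
) -> str:
    """Pick the base URL of the dataset that best matches the client's languages."""
    for language_range in parse_accept_language(accept_language):
        primary = language_range.split("-")[0]
        if primary in available:
            return f"http://{primary}.example.org"
    return f"http://{default}.example.org"


print(choose_dataset("es-ES,es;q=0.9,hy;q=0.8"))
print(choose_dataset("fr"))
```

A server answering requests for the language-agnostic URI would use the returned base URL as the target of its redirect response.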

This pattern is also related to Patterned URIs, in the sense that URIs are defined to follow a naming pattern (including the language).

It is also related to Hierarchical URIs.