Multilingual Linked Open Data Patterns

Full IRIs

Description

This pattern consists of using unrestricted IRIs, which may contain Unicode characters outside the ASCII repertoire.

Context

In a multilingual setting, human-readability is not universal: it depends heavily on people’s culture and background. URIs restricted to ASCII characters are difficult to read and type for people accustomed to non-Latin scripts.

This pattern addresses that problem by allowing applications to use Unicode characters in resource identifiers.

Example

A full IRI written in Armenian could be:

http://օրինակ.օրգ#Հայաստան

Discussion

IRIs with Unicode characters are more natural for people whose primary language is not written in a Latin-based script. Since machines should be able to identify resources in either form (an ASCII-only URI or a full IRI) and the required technologies have already been developed, a further step is to make resource identifiers human friendly.

Although it is said that the end user should not be exposed to URIs and that they should act as internal identifiers, in practice, they are handled by application developers and sometimes even by end users.

Human-friendly IRIs can facilitate the adoption of linked data technologies by more people in the long term. However, IRIs are exposed to visual spoofing attacks, because glyphs with the same appearance may correspond to different characters.
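As an illustration of the spoofing risk, the following Python sketch flags labels that mix letters from more than one script. It is only a rough heuristic: it assumes that the first word of a character's Unicode name ("LATIN", "CYRILLIC", "ARMENIAN") is a usable stand-in for its script, and the function names scripts_in and looks_mixed_script are hypothetical. It does not implement the checks recommended in [UTR 36].

```python
import unicodedata


def scripts_in(text: str) -> set[str]:
    """Return rough script names for the letters found in text.

    Assumption: the first word of the Unicode character name is used
    as an approximation of the script. This is not the real Unicode
    Script property, which the standard library does not expose.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed_script(label: str) -> bool:
    """True when the label combines letters from several scripts."""
    return len(scripts_in(label)) > 1
```

A label such as "p\u0430ypal", where the second letter is CYRILLIC SMALL LETTER A, is reported as mixed, while a label written entirely in Armenian or entirely in Latin is not. Single-script spoofs, where every letter comes from one look-alike script, would pass this check undetected.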

Another important issue is the limited support for IRIs in current software libraries. Although support is improving, it remains a challenge, and most tools offer only partial support.

See also

The Unicode Consortium has published security considerations for IRIs that should be taken into account [UTR 36]. A softer variant of this pattern is Internationalized local names.

According to [Kontokostas 12, section 6.1], IRI dereferencing should be handled carefully, as HTTP (RFC 2616) can transfer only URIs.
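One way to handle dereferencing is to map the IRI to an ASCII-only URI before issuing the request, in the spirit of the mapping defined in RFC 3987. The Python sketch below is a minimal illustration under several assumptions: the function name iri_to_uri is hypothetical, the host is converted with Python's built-in "idna" codec (which implements IDNA 2003), the remaining components are percent-encoded as UTF-8, and user information and IPv6 literals are not handled.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when percent-encoding. "%" is kept so that
# sequences that are already escaped are not encoded a second time.
_SAFE = "/:@!$&'()*+,;=-._~%"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (simplified sketch)."""
    parts = urlsplit(iri)

    # Host: convert each label to its ASCII ("xn--") form.
    # Assumption: no user information and no IPv6 literal in the authority.
    host = parts.hostname or ""
    if host:
        host = host.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    # Path, query and fragment: percent-encode non-ASCII characters as UTF-8.
    return urlunsplit((
        parts.scheme,
        host,
        quote(parts.path, safe=_SAFE),
        quote(parts.query, safe=_SAFE + "?"),
        quote(parts.fragment, safe=_SAFE + "?"),
    ))
```

Applied to the Armenian example above, the host becomes a sequence of "xn--" labels and the fragment becomes a run of percent-encoded UTF-8 bytes. The fragment is resolved by the client and is not sent in the HTTP request, so only the host and path conversion affect what the server receives.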