Handling Multilingual sites - A Tango with Django
Most internet users today will have some idea of what a multilingual site is. One of the most accessible examples is Wikipedia. For users the experience should be seamless: they should be reading the content they have come to the site for, in their preferred language, and with the least amount of distraction possible.
But what goes on behind the scenes? What is different about a multilingual site that needs special consideration?
There is no hard line between a multilingual and a single-language website, but a multilingual site brings many extra considerations and optional features. Because of this I will only cover some of the key points.
The type of website I will focus on is dynamic and requires a web framework to talk to a database. Unlike a static site, where the page you see is exactly as it is stored on the server, a dynamic site builds its pages as you browse. This means that site administrators can add and change content (like this article) after the site has been built. The content sits in a database, independent of the HTML/CSS of the website, and the web framework does the magic of bringing the two together.
The web framework Django comes with built-in support for translating static strings of text and the interface itself. It can also handle separate locales, which is useful for distinguishing, for example, American from Australian English, should you need to.
What Django doesn't come with is a way to translate and manage database content in different languages. However, the large and active Django community has made several applications that extend its functionality.
The most common solution duplicates the fields that require translation in your database tables, once for each language. Good examples include django-modeltranslation and django-linguo. Here the number of tables in your database does not change but the table structure does. Alternatively, some solutions, such as django-model-i18n, produce separate tables for the separate languages: the original table structure is preserved, but Django needs to handle the related tables. Both these solutions require the database structure to be changed whenever a language is added or removed.
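To make the field-duplication idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Django or any of the packages named above. The table name, the column names and the helper functions are all assumptions made for illustration; the point is only that supporting another language means altering the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one column per language for each translatable field.
conn.execute(
    "CREATE TABLE article (id INTEGER PRIMARY KEY, title_en TEXT, title_it TEXT)"
)
conn.execute(
    "INSERT INTO article (title_en, title_it) VALUES (?, ?)", ("Trees", "Alberi")
)


def column_names(connection, table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]


def localized_title(connection, article_id, language):
    """Read the title from the column that matches the requested language."""
    column = f"title_{language}"
    if column not in column_names(connection, "article"):
        raise LookupError(f"no column for language {language!r}")
    row = connection.execute(
        f"SELECT {column} FROM article WHERE id = ?", (article_id,)
    ).fetchone()
    return row[0]


# Supporting French means changing the structure of the table itself.
conn.execute("ALTER TABLE article ADD COLUMN title_fr TEXT")
```

After the ALTER TABLE statement the existing row simply has an empty French title, which is why this approach ties every change in the language list to a schema change.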
The preference of the team at Coracle is to leave the database structure alone as much as possible, because we don't want unrelated changes in site functionality to require restructuring of the multilingual configuration. That leaves two main options: either create translations that are external to the main tables, as is the case with django-datatrans, or let the original tables hold the various translations as separate entries. I tend to choose the latter. A good overview of the various options can be found on the Django wiki.
If the same table is used for the various language versions, two new identification fields are required for each article. One identifies the language and the other associates the entry with its other language versions, a structure I call the “article-language” group. Notice that a group of entries in the same “article-language” group doesn't require a default language in order to exist, so all entries have equal standing. This is not to say the site can't have a default language, but it is not prescribed by the database. In the same way, articles could exist in the database in languages unsupported by the site.
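The “article-language” group can be sketched in a few lines of plain Python. This is not Coracle's implementation: the field names group_id and language, the sample entries and the pick_version helper are assumptions chosen to illustrate the two identification fields described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    group_id: int   # shared by every language version of one article
    language: str   # language code of this particular entry
    title: str


# Hypothetical entries; all of them live in the same table.
ARTICLES = [
    Article(group_id=1, language="en", title="Trees"),
    Article(group_id=1, language="it", title="Alberi"),
    Article(group_id=2, language="it", title="Fiori"),  # no English version
]


def versions(articles, group_id):
    """Map language code to entry for one article-language group."""
    return {a.language: a for a in articles if a.group_id == group_id}


def pick_version(articles, group_id, preferred, site_default=None):
    """Return the preferred version, else the site default, else None."""
    available = versions(articles, group_id)
    return available.get(preferred) or available.get(site_default)
```

Note that the default language only appears as an argument to pick_version. Nothing in the stored entries marks one version as primary, so a group such as the second one can exist with a single Italian entry.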
One motivation for building our own multilingual Django solution is the site administration. Site administration is done through a web interface that gives administrators, staff and translators access to the site’s database. Django has a good administration site that can be bent to the developer’s will. Much of the work in making a well-functioning multilingual site goes into building a well-functioning multilingual administration site. For anything beyond a simple multilingual site, you can expect the administration to have multiple users, each requiring different permissions to edit, add and delete articles (database entries) depending on their language and role. Each user may also require a default language in their user profile.
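As a rough illustration of permissions that depend on both role and language, here is a small sketch in plain Python. It does not use Django's own permission system, and every name in it is hypothetical: the three roles, the actions each role allows, and the rule that administrators may act on any language are assumptions, not a description of how the Coracle site works.

```python
from dataclasses import dataclass

# Hypothetical roles and the actions each one allows.
ROLE_ACTIONS = {
    "administrator": {"add", "edit", "delete"},
    "staff": {"add", "edit"},
    "translator": {"edit"},
}


@dataclass(frozen=True)
class UserProfile:
    name: str
    role: str
    languages: frozenset   # languages this user may work in
    default_language: str  # language the admin site opens in for this user


def is_allowed(user, action, article_language):
    """Check the role first, then the language of the article."""
    if action not in ROLE_ACTIONS.get(user.role, set()):
        return False
    # Assumption: administrators are not restricted by language.
    return user.role == "administrator" or article_language in user.languages
```

A translator created as UserProfile("Anna", "translator", frozenset({"it"}), "it") could then edit Italian entries but could neither delete them nor touch the English ones.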
Keeping Google happy with your multilingual content
Let’s have a quick look at the URLs. Navigating dynamically generated pages in this sort of framework can produce URLs that look peculiar to search engines. For example, suppose mysite.org/en/trees is the English version of the page at mysite.org/it/alberi. A user browsing in Italian could feasibly enter mysite.org/en/alberi to view the English version, in which case the same English content is reachable at two URLs. However you indicate the different language versions in the URL, you’ll need to help Google index your content without treating it as duplicate and consequently penalising your site. Two link attributes help with this:
- rel="alternate" indicates that the page is available in other languages. It must be accompanied by the hreflang attribute, which identifies the language of the linked version.
- rel="canonical" indicates which URL is the preferred one when several URLs serve identical content. The canonical page is the default version.
This means that if you use two different URLs to access the same page content you should add a link to the canonical page from all non-canonical versions, using rel="canonical".
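The two attributes end up as link elements in the head of each page. The following sketch builds them as plain strings; the function names are made up for this example, and the https:// scheme in front of the mysite.org addresses is an assumption.

```python
from html import escape


def alternate_links(versions):
    """Build one rel="alternate" link per language version of a page."""
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        for lang, url in sorted(versions.items())
    ]


def canonical_link(url):
    """Build the link that points a duplicate URL at its canonical page."""
    return f'<link rel="canonical" href="{escape(url)}">'


# Language versions of the example page (URLs are illustrative).
TREE_PAGES = {
    "en": "https://mysite.org/en/trees",
    "it": "https://mysite.org/it/alberi",
}

head_lines = alternate_links(TREE_PAGES)
# A page served at the non-canonical address /en/alberi would also include:
head_lines.append(canonical_link(TREE_PAGES["en"]))
```

In a Django project the same strings would more naturally come from a template, but the markup that reaches the browser and the search engine is the same.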
To finish on a lighter note, I’ll leave you with the little task of finding the original languages and meanings of the words 'Tango' and 'Django' that appear in the title. Submit your answer using the enquiry form below, and on Feb 28, 2013 we’ll draw a random winner and send them a £20 iTunes voucher!