Kitsune

2. Storing localized content in Search

Date: 2020-10-27

Status

Pending

Context

Kitsune supports many locales and has content that we want to be searchable in each of them.

Elasticsearch has support for many language-specific analyzers: https://www.elastic.co/guide/en/elasticsearch/reference/7.9/analysis-lang-analyzer.html

Search v1 used per-document analyzers, that is to say, within the same index:

doc_1: { "content": "Hello world" }
doc_2: { "content": "Hallo Welt" }

doc_1.content could be analyzed using an English analyzer, and doc_2.content could be analyzed using a German analyzer.

Elasticsearch removed this feature well before version 7. Every field of the same name within an index must now be analyzed the same way, so the current Search implementation needs a different approach.

We can either place each locale in its own index and set up locale-specific analyzers for the same field name across indices, or keep all locales within the same index and define a unique field name for each field which needs to be analyzed under a specific locale.
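As a rough illustration of the first option (one index per locale), the sketch below builds a mapping for each locale's index in which the same field name, content, gets a different analyzer. The index base name sumo_documents, the locale table, and the helper per_locale_index_mappings are hypothetical and not taken from Kitsune; "english" and "german" are built-in Elasticsearch language analyzers.

```python
# Hypothetical locale-to-analyzer table, for illustration only.
LOCALE_ANALYZERS = {"en-US": "english", "de": "german"}


def per_locale_index_mappings(base_name, locale_analyzers):
    """Return {index_name: mapping}, one index per locale.

    Every index uses the same field name ("content") but a
    locale-specific analyzer. Index names are lowercased because
    Elasticsearch requires lowercase index names.
    """
    return {
        f"{base_name}_{locale.lower()}": {
            "properties": {"content": {"type": "text", "analyzer": analyzer}}
        }
        for locale, analyzer in locale_analyzers.items()
    }


# "sumo_documents" is an assumed base name, not Kitsune's actual index name.
indices = per_locale_index_mappings("sumo_documents", LOCALE_ANALYZERS)
```

With this layout the number of indices grows with the number of locales, which is the management cost weighed in the decision.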

Decision

Heavily influenced by: https://www.elastic.co/blog/multilingual-search-using-language-identification-in-elasticsearch

We will store all documents within the same index and use an Object field for fields which need to use locale-specific analyzers.

We will call this field SumoLocaleAwareTextField. It will have a key for each locale, with the appropriate analyzer defined on that key, such that:

doc_1: { "content": { "en-US": "Hello world" }}
doc_2: { "content": { "de": "Hallo Welt" }}

doc_1.content.en-US is analyzed using an English analyzer, and doc_2.content.de is analyzed using a German analyzer.
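A minimal sketch of what such a mapping might look like, written as plain Python dictionaries rather than as Kitsune's actual SumoLocaleAwareTextField implementation. The locale table and the helper locale_aware_text_mapping are assumptions made for illustration; "english" and "german" are built-in Elasticsearch language analyzers.

```python
# Hypothetical locale-to-analyzer table, for illustration only.
LOCALE_ANALYZERS = {"en-US": "english", "de": "german"}


def locale_aware_text_mapping(locale_analyzers):
    """Build an object-field mapping with one text sub-field per locale."""
    return {
        "type": "object",
        "properties": {
            locale: {"type": "text", "analyzer": analyzer}
            for locale, analyzer in locale_analyzers.items()
        },
    }


# Mapping for a single index in which "content" is locale-aware.
mapping = {"properties": {"content": locale_aware_text_mapping(LOCALE_ANALYZERS)}}

# Documents shaped as in the examples above.
doc_1 = {"content": {"en-US": "Hello world"}}
doc_2 = {"content": {"de": "Hallo Welt"}}
```

Because each locale key is its own sub-field (content.en-US, content.de), each can carry a different analyzer even though all documents live in one index.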

Consequences

We won’t need to manage many indices of wildly different sizes, as we would if we used one index per locale.

Documents in a specific locale can be found by searching on the field.locale-name field, for instance content.en-US.

Searching across all locales can be performed with a wildcard in the field name (like content.*).
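The two search patterns above can be sketched as Elasticsearch request bodies built in Python. The helper search_body is hypothetical and only illustrates the field naming. It uses a multi_match query, which accepts wildcards in its fields list; the actual queries Kitsune issues may differ.

```python
def search_body(text, field="content", locale=None):
    """Build a query body for one locale, or for all locales if none is given."""
    # A specific locale targets e.g. "content.en-US"; otherwise the
    # wildcard "content.*" matches every locale sub-field.
    target = f"{field}.{locale}" if locale else f"{field}.*"
    return {"query": {"multi_match": {"query": text, "fields": [target]}}}


english_only = search_body("hello", locale="en-US")
all_locales = search_body("hello")
```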


© Copyright 2011 - 2021 Mozilla Foundation. Revision 88701d66.

Built with Sphinx using a theme provided by Read the Docs.
Read the Docs v: master
Versions
master
latest
Downloads
On Read the Docs
Project Home
Builds