Auto-Complete Suggestions with OpenSearch
Introduction
Many modern applications offer suggestions as you type, predicting what you are likely to enter. This improves the user experience: users are more likely to find what they are looking for, and they save time that would otherwise be spent searching manually. OpenSearch is a good foundation for auto-complete suggestions because it is a powerful open-source search and analytics engine that handles many types of data and queries.
Understanding Index Mappings in OpenSearch
Index mappings in OpenSearch define the structure and behaviour of the data in an index. When creating an index for auto-complete, it is important to choose appropriate data types for the fields that will be used for searching and suggesting.
Keyword mapping: suppose you map title as a text field with a keyword sub-field, like this:
PUT /bookstore
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
When you query the title.keyword sub-field, you have to supply the whole value, because a keyword field is not analysed. For example, index this document:
POST /bookstore/_doc
{
  "title": "The Lord of the Rings: The Fellowship of the Ring"
}
When you then run a search like this:
GET /bookstore/_search
{
  "query": {
    "match": {
      "title.keyword": "The Lord"
    }
  }
}
it will not match any documents. You have to search with the whole value, “The Lord of the Rings: The Fellowship of the Ring”.
A text mapping, on the other hand, is analysed, so you can search using individual tokens from the field value. This is a full-text search over the whole value:
GET /bookstore/_search
{
  "query": {
    "match": {
      "title": "The Lord"
    }
  }
}
This returns the matching document.
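To make the difference concrete, here is a toy Python model of the two behaviours. It is only a sketch: analyze is a hypothetical helper that lowercases and splits on non-alphanumeric characters, a rough approximation of what the default analyzer does, and keyword_match and text_match are illustrative names, not OpenSearch APIs.

```python
import re


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def keyword_match(stored, query):
    # A keyword field keeps the whole value as one term, compared verbatim.
    return stored == query


def text_match(stored, query):
    # A match query on a text field analyzes the query too; with the default
    # OR operator, one shared token is enough for a hit.
    return bool(set(analyze(stored)) & set(analyze(query)))


title = "The Lord of the Rings: The Fellowship of the Ring"

print(keyword_match(title, "The Lord"))  # False: not the whole value
print(keyword_match(title, title))       # True: exact value
print(text_match(title, "The Lord"))     # True: shares "the" and "lord"
```

The real analyzer handles many more cases (Unicode, punctuation rules, and so on), but the model is enough to show why “The Lord” finds the document through title and not through title.keyword.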
By default, keyword fields are both indexed (because index is enabled) and stored on disk (because doc_values is enabled). To save disk space, you can skip indexing a field by setting index to false. If you need full-text search on a field, map it as text instead.
For more details, see keyword vs. text.
Implementing Auto-Complete Suggestions with Wildcard Queries
Wildcard queries are a good choice for auto-complete suggestions when using keyword mappings, as they allow prefix-based searching.
There is also the match_phrase_prefix query, which can perform auto-complete as well, but in my experience the quality of its results was not up to the mark, and it had issues when the search term contained spaces.
Similarly, if you use a wildcard query on a text field (or any other analysed mapping), it has a problem with spaces: the pattern is compared against individual tokens, so nothing after a space will match.
So the best option is to use wildcard with keyword mappings if you need auto-complete on non-analysed data.
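The space problem can be reproduced with a small Python sketch. It assumes a simplified tokenizer (lowercase, split on non-alphanumeric characters) in place of the real analyzer, and uses fnmatchcase from the standard library as a stand-in for wildcard matching. The function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def analyze(text):
    """Rough stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def wildcard_on_keyword(stored, pattern):
    # keyword field: the pattern is tested against the whole stored value.
    # (fnmatchcase also understands [..] classes; they are not used here.)
    return fnmatchcase(stored, pattern)


def wildcard_on_text(stored, pattern):
    # text field: the pattern is tested against each token separately,
    # and no token ever contains a space.
    return any(fnmatchcase(tok, pattern) for tok in analyze(stored))


title = "The Lord of the Rings: The Fellowship of the Ring"

print(wildcard_on_keyword(title, "The Lord of*"))  # True
print(wildcard_on_text(title, "the lord of*"))     # False: pattern spans tokens
print(wildcard_on_text(title, "lor*"))             # True: matches token "lord"
```

A single-word pattern still works on the analysed field, which is why the problem only shows up once the user types a space.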
Let’s assume that we have some documents in the index, such as:
POST /bookstore/_doc
{
  "title": "The Lord of the Rings: The Fellowship of the Ring"
}
POST /bookstore/_doc
{
  "title": "The Lord of the Rings: The Two Towers"
}
POST /bookstore/_doc
{
  "title": "The Lord of the Rings: The Return of the King"
}
Now we can perform the auto-complete search on OpenSearch with the following query:
GET /bookstore/_search
{
  "query": {
    "wildcard": {
      "title.keyword": {
        "value": "The Lord of the Rings*"
      }
    }
  }
}
We query title.keyword so that the search runs against the non-analysed field and is not affected by tokenization. The query returns all documents whose title starts with “The Lord of the Rings”, which here means every book in the “The Lord of the Rings” series.
It’s important to note that wildcard queries on a keyword field are case-sensitive by default, because the case_insensitive option defaults to false. You can use this option to control the behaviour. The following query sets it to false explicitly, which is the same as leaving it out:
GET /bookstore/_search
{
  "query": {
    "wildcard": {
      "title.keyword": {
        "value": "The Lord of the Rings*",
        "case_insensitive": false
      }
    }
  }
}
In this case, the query will only match documents where the title starts with “The Lord of the Rings” in the same case as the search term. If you don’t need that, set case_insensitive to true, and a lower-case search term such as “the lord of the rings*” will match as well.
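In an application, the search term usually comes from whatever the user has typed so far. The Python sketch below builds the request body for that case. The helper names, the default of 10 suggestions, and the backslash escaping of * and ? in user input are assumptions rather than part of the setup above (the escaping follows Lucene's wildcard syntax). Sending the body to the cluster is left to whichever client you use.

```python
import json


def escape_wildcards(text):
    """Backslash-escape characters that are special in a wildcard pattern."""
    out = []
    for ch in text:
        if ch in ("\\", "*", "?"):
            out.append("\\")
        out.append(ch)
    return "".join(out)


def build_autocomplete_query(prefix, field="title.keyword",
                             case_insensitive=True, size=10):
    # Escape the user's input, then append a single trailing * so the
    # query behaves as a prefix search on the non-analysed field.
    return {
        "size": size,
        "query": {
            "wildcard": {
                field: {
                    "value": escape_wildcards(prefix) + "*",
                    "case_insensitive": case_insensitive,
                }
            }
        },
    }


body = build_autocomplete_query("the lord of")
print(json.dumps(body, indent=2))
```

Escaping matters because a user who types a literal * or ? would otherwise change the meaning of the pattern. Capping size keeps the suggestion list short.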
This is the simplest way to implement auto-complete suggestions. I’ve implemented the same logic in our product: on a dataset of more than 1 million documents, it returns results in under 100 ms, so it has worked very well for us.
Resources on this topic are very limited, and I had to do a lot of research to work this out, which is why I wanted to share it with the community.