Tutorial: How to Create a Fuzzy Search-as-you-type Feature with Elasticsearch and Django
Recently, I had to figure out how to implement a fuzzy search-as-you-type feature for one of our Django web APIs. I couldn’t find any comprehensive tutorial on how to build this specific feature, so I decided to combine multiple sources and document the path I ended up taking.
In this tutorial, we will use the elasticsearch-dsl library to add fuzzy search-as-you-type functionality to a Django web app. Elasticsearch-dsl is a high-level library built on top of elasticsearch-py, a low-level client for interacting with Elasticsearch.
Randall Tateishi, Django wizard at Fresh, helped me with the high-level approach to implementing this feature.
Prerequisites
Before starting this tutorial, you should already be familiar with Docker, Django, and Django Rest Framework. There are many ways you can set this all up, but this was the path I ended up taking.
I’d recommend digging through the official Elasticsearch documentation and working through the tutorials there before attempting to use elasticsearch-dsl.
Step 1: Install Elasticsearch and elasticsearch-dsl
Add the following to requirements.txt:
requirements.txt
elasticsearch
elasticsearch-dsl
You may need to run docker-compose build to install the packages.
Step 2: Add Elasticsearch container to your docker setup
Your docker-compose.yml file should look something like this. When you run docker-compose up, it should automatically pull the official Elasticsearch image and spin up an Elasticsearch server.
docker-compose.yml
services:
  db:
    image: postgres
    environment:
      - POSTGRES_USER=fresh_artichoke
      - POSTGRES_PASSWORD=fresh_artichoke
      - POSTGRES_DB=fresh_artichoke
  web:
    build: .
    environment:
      - ENVIRONMENT=local
    env_file:
      - .env
    volumes:
      - .:/app
    ports:
      - 8000:8000
    depends_on:
      - db
      - elastic
    links:
      - db
  elastic:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.1.1
    ports:
      - 9200:9200
      - 9300:9300
    expose:
      - "9200"
      - "9300"
Step 3: Verify the Elasticsearch server is working
To do this, you can use curl, Postman, or any other HTTP client of your choice. Send a GET request to http://127.0.0.1:9200/ and make sure your response looks something like this:
{
  "name": "mO2x_2W",
  "cluster_name": "docker-cluster",
  "cluster_uuid": "KPapsLdrQSiwRJvQjJaFcg",
  "version": {
    "number": "6.1.1",
    "build_hash": "bd92e7f",
    "build_date": "2017-12-17T20:23:25.338Z",
    "build_snapshot": false,
    "lucene_version": "7.1.0",
    "minimum_wire_compatibility_version": "5.6.0",
    "minimum_index_compatibility_version": "5.0.0"
  },
  "tagline": "You Know, for Search"
}
If you see this, it means your Elasticsearch instance is up and running.
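If you prefer to check from Python, the low-level client installed in Step 1 can do the same thing. A minimal sketch, assuming port 9200 is published to your host as in the compose file above:

from elasticsearch import Elasticsearch

# Connect to the Elasticsearch container published on localhost:9200.
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

print(es.ping())   # True if the cluster is reachable
print(es.info())   # returns the same JSON payload shown above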
Step 4: Define a DocType for your model
For the purposes of this tutorial, assume you already have a model named Skill. Here, we will define a DocType for your Skill model. DocType is an elasticsearch-dsl abstraction for defining your Elasticsearch mappings. (A mapping is a way to define how your data should be indexed and how the search should behave.)
First, we create an analyzer that tells Elasticsearch how the name field should be analyzed when it is indexed and searched. In this case, the edge_ngram tokenizer gives us the fuzziness factor: terms are broken into incremental prefixes, so we still get back relevant results even when there is a typo. For more details on how that all works, check out the Elasticsearch docs.
The using = 'art' entry in Meta specifies the Elasticsearch connection alias we are using, which we haven't defined yet.
skills/doc_type.py
from elasticsearch_dsl import DocType, Text, Integer, Completion, analyzer, tokenizer

my_analyzer = analyzer(
    'my_analyzer',
    tokenizer=tokenizer('trigram', 'edge_ngram', min_gram=1, max_gram=20),
    filter=['lowercase']
)


class SkillDoc(DocType):
    name = Text(
        analyzer=my_analyzer
    )
    id = Integer()

    class Meta:
        index = 'skill'
        using = 'art'
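To get a feel for what the analyzer actually produces, you can run some text through Elasticsearch's _analyze API once the index has been created (that happens in Step 6). A rough sketch, assuming the art connection we set up in Step 5:

from elasticsearch_dsl import connections

# Ask Elasticsearch how 'my_analyzer' tokenizes the text "angular".
es = connections.get_connection('art')
result = es.indices.analyze(index='skill', body={
    'analyzer': 'my_analyzer',
    'text': 'angular',
})

# Expect incremental edge n-grams: a, an, ang, angu, angul, angula, angular
print([token['token'] for token in result['tokens']])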
In our model, we add an indexing instance method that adds the object instance to the Elasticsearch index via the DocType we just created. I borrowed the idea from this article.
skills/models.py
from django.db import models
from elasticsearch_dsl import Index

from .doc_type import SkillDoc


class Skill(models.Model):
    name = models.CharField(max_length=30)

    def __str__(self):
        return self.name

    class Meta:
        ordering = ('name',)

    def indexing(self):
        doc = SkillDoc(
            meta={'id': self.id},
            name=self.name,
            id=self.id
        )
        doc.save()
        return doc.to_dict(include_meta=True)
Step 5: Set up signal to update index whenever object is saved
We create a signals.py file where we define a post_save hook to update the index whenever an instance is saved.
skills/signals.py
from django.db.models.signals import post_save
from django.dispatch import receiver

from .models import Skill
from .doc_type import SkillDoc


@receiver(post_save, sender=Skill)
def my_handler(sender, instance, **kwargs):
    instance.indexing()
In the app's ready method, we import the signals and then create the connection to Elasticsearch. We give our connection an alias of art, which we can reference from other parts of our app. We also wrap our connection code in a try block in case the connection fails.
skills/apps.py
from django.apps import AppConfig
from elasticsearch_dsl import connections
from django.conf import settings


class SkillsConfig(AppConfig):
    name = 'skills'

    def ready(self):
        import skills.signals
        try:
            connections.create_connection(
                'art',
                hosts=[{'host': settings.ES_HOST, 'port': settings.ES_PORT}])
        except Exception as e:
            print(e)
Don’t forget this line in __init__.py, or else the signals won’t be properly loaded.
skills/__init__.py
default_app_config = 'skills.apps.SkillsConfig'
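With the signal and connection wired up, saving a Skill should push a document into Elasticsearch. Here is a quick sanity check you can run from the Django shell (a sketch; note that the properly mapped index is only created by the management command in the next step, so run that first if you want the custom analyzer applied):

# python manage.py shell
from skills.models import Skill
from skills.doc_type import SkillDoc

skill = Skill.objects.create(name='Angular')        # post_save fires -> indexing()
print(SkillDoc.get(id=skill.id, using='art').name)  # 'Angular'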
Step 6: Write a management command to index data
The next step is to write a management command that will create an Elasticsearch index and then do a bulk indexing of your data into that index.
skills/management/commands/index_skills.py
import time
import os

from django.conf import settings
from django.core.management.base import BaseCommand, CommandError
from elasticsearch_dsl import Search, Index, connections
from elasticsearch.helpers import bulk
from elasticsearch import Elasticsearch

from skills.models import Skill
from skills.doc_type import SkillDoc


class Command(BaseCommand):
    help = 'Indexes Skills in Elastic Search'

    def handle(self, *args, **options):
        es = Elasticsearch(
            [{'host': settings.ES_HOST, 'port': settings.ES_PORT}],
            index="skill"
        )
        skill_index = Index('skill', using='art')
        skill_index.doc_type(SkillDoc)
        if skill_index.exists():
            skill_index.delete()
            print('Deleted skill index.')
        SkillDoc.init()
        result = bulk(
            client=es,
            actions=(skill.indexing() for skill in Skill.objects.all().iterator())
        )
        print('Indexed skills.')
        print(result)
Make sure you set the correct environment variables for Elasticsearch.
.env
ELASTIC_SEARCH_HOST=elastic
ELASTIC_SEARCH_PORT=9200
settings.py
import os
ES_HOST = os.environ.get('ELASTIC_SEARCH_HOST')
ES_PORT = os.environ.get('ELASTIC_SEARCH_PORT')
Next, you will want to open a shell inside your Docker container. Run docker ps to see a list of your running containers, find your container's name, and then run docker exec -it name_of_your_container bash. Once inside, run python manage.py index_skills to execute the management command.
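After the command finishes, you can confirm from the Django shell that the index exists and that the mapping was created. A small sketch, again relying on the art connection from Step 5:

from elasticsearch_dsl import Index

skill_index = Index('skill', using='art')
print(skill_index.exists())       # True once index_skills has run
print(skill_index.get_mapping())  # the 'name' field should reference my_analyzer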
Step 7: Verify the search endpoint is working
Now you can make a POST request to http://127.0.0.1:9200/skill/_search with a body of:
{
  "query": {
    "match": {
      "name": {
        "query": "anglar",
        "max_expansions": 3
      }
    }
  }
}
As you can see, we purposely included a typo in the query parameter, and the search will still return the best results it can find. You can also test this by adding one letter at a time to your query parameter (a, an, ang, and so on) to see the results narrow as you “type.”

According to the docs, max_expansions is the maximum number of terms that the query will expand to.
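The same query can be expressed through elasticsearch-dsl, which is essentially what the Django view in the next step will do. A rough equivalent, passing the field options as a dict:

from skills.doc_type import SkillDoc

# Match query against 'name' with the same typo and max_expansions setting.
s = SkillDoc.search().query('match', name={'query': 'anglar', 'max_expansions': 3})
for hit in s.execute():
    print(hit.id, hit.name)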
Step 8: Create a Django endpoint to return Elasticsearch results
Create a view to make a request to Elasticsearch based on the query param that was passed through. (This code assumes you already have a serializer set up for your model. If not, first follow the documentation for Django Rest Framework.)
skills/views.py
import json
import os

from rest_framework.response import Response
from rest_framework.views import APIView
from elasticsearch_dsl import connections
import django_filters.rest_framework

from .models import Skill
from .serializers import SkillSerializer
from .doc_type import SkillDoc


class SkillSearchView(APIView):
    def get(self, request):
        query = request.query_params.get('q')
        if not query:
            return Response([])
        try:
            s = SkillDoc.search()
            s = s.query('match', name=query)
            response = s.execute()
            response_dict = response.to_dict()
            hits = response_dict['hits']['hits']
            ids = [hit['_source']['id'] for hit in hits]
            queryset = Skill.objects.filter(id__in=ids)
            skill_list = list(queryset)
            skill_list.sort(key=lambda skill: ids.index(skill.id))
            serializer = SkillSerializer(skill_list, many=True)
        except Exception:
            # Fall back to a plain ORM lookup if Elasticsearch is unavailable.
            skills = Skill.objects.filter(name__icontains=query)
            serializer = SkillSerializer(skills, many=True)
        return Response(serializer.data)
The view sends a search request to the Elasticsearch index, which returns a list of documents sorted by best match. First it extracts the ids of the matching objects, then it makes an “in” query for all the Skills whose ids appear in that list.

However, the Django ORM returns those results in its own order, so we reorder them with a sort based on the original ordering of the ids.

We also wrap the Elasticsearch query in a try block; if it fails, we fall back to a standard Django ORM icontains query. If no query parameter is supplied at all, the view simply returns an empty list.
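To make the reordering step concrete, here is a toy illustration with hypothetical ids:

from skills.models import Skill

# Hypothetical ranking returned by Elasticsearch (best match first).
ids = [7, 2, 5]

# The ORM's "in" query ignores that ranking (it uses the model's default ordering),
# so we sort the results back into Elasticsearch's order.
queryset = Skill.objects.filter(id__in=ids)                       # e.g. comes back as [2, 5, 7]
skills = sorted(queryset, key=lambda skill: ids.index(skill.id))  # back to [7, 2, 5]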
Next, register the route for that view.
urls.py
...
from skills import views as skills_api
...
urlpatterns = [
    ...
    url(r'^api/skills_search/', skills_api.SkillSearchView.as_view()),
    ...
]
Step 9: Test your Django API endpoint
Make a GET request to http://127.0.0.1:8000/api/skills_search/?q=angular. This should return a list of objects sorted by most relevant match.
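You can also exercise the endpoint from Python, for example with Django REST Framework's test client. A sketch (the exact results will depend on what you have indexed):

# e.g. inside python manage.py shell
from rest_framework.test import APIClient

client = APIClient()
response = client.get('/api/skills_search/', {'q': 'anglar'})
print(response.json())   # e.g. [{'id': 3, 'name': 'Angular'}, ...]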
I hope this tutorial helped you get fuzzy search-as-you-type functionality working for your Django web API! Are there any tips you’d add?