From Deep Learning for Search by Tommaso Teofili

If you’ve ever designed, implemented, or configured a search engine, you’ve faced the problem of making it adapt to your data; deep learning helps provide solutions that are derived from the data itself rather than from fixed rules or algorithms.


Save 37% on Deep Learning for Search. Just enter code fccteofili into the discount code box at checkout at manning.com.


What deep learning can do for search

With the help of deep learning (or DL) algorithms, the search engine can:

  • provide more relevant results to its end users, improving their satisfaction with the results’ quality

  • search through binary content like images the same way we search over text; think of this as being able to search for an image with the phrase “picture of a leopard hunting an impala” (even if you’re not Google)

  • serve content to users speaking in different languages, allowing more users to access the data in the search system

  • become more sensitive to the data it serves, which means fewer queries that return no results

The quality of search results is crucial for end users. One thing a search engine should do well is determine which of the possibly matching search results are most useful for a specific user’s information need. Well-ranked search results allow users to find the important results more easily and quickly, which is why we put a lot of emphasis on the topic of relevant results. In real life this can make a huge difference; in fact, according to an article published in Forbes magazine, “[b]y providing better search results, Netflix estimates that it is avoiding canceled subscriptions that would reduce its revenue by $1B annually.” (www.forbes.com/sites/louiscolumbus/2017/07/09/mckinseys-state-of-machine-learning-and-ai-2017/#27fe02c375b6)

Deep neural networks can help by automatically tweaking the end user query under the hood, based on past user queries or on the search engine contents. People are used to working with web search engines to retrieve images. If you search for “pictures of angry lions” on Google, for instance, you’ll get strongly relevant images. Before the advent of deep learning, such images had to be decorated with metadata (data about data) describing their contents before being put into the search engine, and that metadata was usually typed by a human. Deep neural networks can abstract a representation of an image that captures what’s in it, removing the need for a human to enter an image description into the search engine.

For scenarios like web search (searching all websites on the internet), users are global, and it’s best if they can search in their own native languages. Additionally, the search engine could use user profiles to return results in each user’s language, even if they search in English; this is a common scenario for tech queries, because lots of content is produced in English. An interesting application of deep neural networks is called neural machine translation—a set of techniques that use deep neural networks to translate a piece of text from a source language into another target language.

Another exciting prospect is the possibility of using deep neural networks to let the search engine generate information instead of “just” retrieving search results (Deep Learning Relevance: Creating Relevant Information—as opposed to retrieving it—see arxiv.org/pdf/1606.07660.pdf). You could even aggregate all the above ideas and build a search engine serving both text and images seamlessly to users globally which, instead of returning search results, returns one single piece of text or image.

The above applications are examples of what we call neural search, and, as you can imagine, they have the potential to revolutionize the way we work and use search engines. To take advantage of the potential of deep neural networks, people interested in computer science, particularly in the fields of natural language processing, computer vision, and information retrieval, need to know how such artificial neural networks work in practice.

The goal of this article is to show you how you might use deep learning in the context of search engines by teaching some deep learning algorithms applied to search problems. Even if you’re not going to build the next Google search, you should see how DL techniques within small and medium-sized search engines could provide a better experience to your users.

I run my neural search examples on top of open source software written in Java with the help of Apache Lucene (lucene.apache.org), an information retrieval library, and Deeplearning4j (deeplearning4j.org), a deep learning library. At the time of writing, Deeplearning4j is a widely used framework for deep learning in the enterprise community, and it’s part of the Eclipse Foundation. It has also seen good adoption because of its integrations with popular big data frameworks like Apache Spark. Other deep learning frameworks exist, such as TensorFlow (from Google), which is popular among the Python and research communities. New tools are invented almost daily, and I decided to focus on a relatively easy-to-use DL framework that can be easily integrated with Lucene, which is one of the most widely adopted search libraries for the JVM.

Now let’s have a look at the problems search engines try to solve and, correspondingly, the most common techniques used. This allows you to learn the basics of how text is analyzed, ingested, and retrieved within a search engine, to familiarize yourself with how queries hit search results, and to pick up some fundamentals on solving the problem of returning the relevant results first. We’ll also uncover the weaknesses inherent to common search techniques. This sets up the basis for the discussion on what deep learning can be used for in the context of search. We’ll look at which tasks deep learning can help to solve and define the practical implications of its applications in the search field, to understand what we can and can’t expect from neural search in real-life scenarios.
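To ground the basics of indexing and retrieval before we move on, here’s a minimal sketch of how a document gets ingested and searched with Lucene. This is my own illustrative example, not a listing from the book: it assumes a recent Lucene release, and the in-memory ByteBuffersDirectory, field name, and sample text are there only to keep the demo self-contained.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class LuceneBasics {

  public static void main(String[] args) throws Exception {
    Directory directory = new ByteBuffersDirectory(); // in-memory index, fine for a demo
    StandardAnalyzer analyzer = new StandardAnalyzer();

    // analyze and index a single document
    try (IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
      Document doc = new Document();
      doc.add(new TextField("body", "latest breakthroughs in artificial intelligence", Field.Store.YES));
      writer.addDocument(doc);
    }

    // parse a query with the same analysis chain and search the index
    try (DirectoryReader reader = DirectoryReader.open(directory)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      Query query = new QueryParser("body", analyzer).parse("artificial intelligence");
      TopDocs hits = searcher.search(query, 10);
      System.out.println("matching documents: " + hits.totalHits);
    }
  }
}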

Deep learning to the rescue

Let’s start learning about deep learning, which can help us create smarter search engines.

In the past, a key difficulty in computer vision (a field of computer science that deals with processing and understanding visual data like pictures or videos) was that it was hardly possible to come up with an image representation containing information about the enclosed objects and visual structures. Deep learning helped overcome this difficulty with the creation of a special type of deep neural network that could learn image representations incrementally, one abstraction at a time, as shown in the figure below.


Figure 1 Learning image abstractions incrementally


Deep learning is a subfield of machine learning which focuses on learning deep representations of text, images, or data by learning successive abstractions of increasingly meaningful representations. It does this by using deep neural networks (see a deep neural network with three hidden layers in the picture below).


Figure 2 A deep feed forward neural network with three hidden layers


At each step (or layer of the network), such deep neural networks are able to capture increasingly complex structures in the data. It’s no accident that computer vision is one of the fields that fostered the development and research of representation-learning algorithms for images.

Researchers discovered that deep networks are well suited to highly compositional data (see: cbmm.mit.edu/publications/when-and-why-are-deep-networks-better-shallow-ones): data formed by smaller parts of similar constituents. Images and text are excellent examples of compositional data, as one can divide them into smaller units incrementally (e.g. text → paragraphs → sentences → words).

Though there are many different ways a neural network can be architected, neural networks are commonly composed of:

  • a set of neurons

  • a set of connections between all or some neurons

  • a weight (a real number) for each directed connection between two neurons

  • one or more functions that map how each neuron receives and propagates signals towards its outgoing connections

  • optionally, a set of layers that group sets of neurons having similar connectivity in the neural network

In the above figure we can identify twenty neurons organized in a set of five layers. Each neuron within each layer is connected to all the neurons in the nearby layers (both the previous and the following layer), except for the first (blue) and last (green) layers. Conventionally, information flows through the network from left to right: the first layer, which receives the inputs, is called the input layer, as opposed to the last layer, called the output layer, which outputs the results of the neural network. The (red) layers in between are called hidden layers.

Imagine that we could apply the same approach to text to learn representations of documents that capture increasingly higher abstractions within a document. Deep learning-based techniques exist for such tasks, and over time these algorithms are becoming smarter. We can use them to extract word, sentence, paragraph, and document representations that capture surprisingly interesting semantics. In fact, when using a neural network algorithm to learn word representations within a certain set of text documents, you’ll see that closely related words lie near each other in the vector space. Think about creating a point on a two-dimensional plot for each word contained in a piece of text and seeing how similar or closely related words lie close to one another; this is achieved by using a neural network algorithm for learning word representations called word2vec.
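To make this concrete, here’s a minimal sketch of training a word2vec model with Deeplearning4j. The file name corpus.txt and the probe word are placeholders of mine, not from the book; the hyperparameter values are common defaults, not recommendations.

import java.util.Collection;

import org.deeplearning4j.models.word2vec.Word2Vec;
import org.deeplearning4j.text.sentenceiterator.BasicLineIterator;
import org.deeplearning4j.text.sentenceiterator.SentenceIterator;
import org.deeplearning4j.text.tokenization.tokenizerfactory.DefaultTokenizerFactory;
import org.deeplearning4j.text.tokenization.tokenizerfactory.TokenizerFactory;

public class Word2VecExample {

  public static void main(String[] args) throws Exception {
    SentenceIterator sentences = new BasicLineIterator("corpus.txt"); // one sentence per line
    TokenizerFactory tokenizer = new DefaultTokenizerFactory();

    Word2Vec vec = new Word2Vec.Builder()
        .minWordFrequency(5)   // ignore very rare words
        .layerSize(100)        // dimensionality of the learned word vectors
        .windowSize(5)         // how much surrounding context each word sees
        .iterate(sentences)
        .tokenizerFactory(tokenizer)
        .build();
    vec.fit(); // train the (shallow) neural network on the corpus

    // closely related words should show up among the nearest neighbors
    Collection<String> nearest = vec.wordsNearest("information", 5);
    System.out.println(nearest);
  }
}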


Figure 3 Word vectors derived from papers on word2vec


Notice that the words Information and Retrieval lie close together; similarly, word2vec and Skip-gram, two terms that relate to (shallow) neural network algorithms used to extract word vectors, are near each other too.

One of the key ideas of neural search is to use such representations to improve the effectiveness of search engines. It’d be nice to have a retrieval model that relies on word and document vectors (also called embeddings) with the above capabilities, so that it could calculate and use document and word similarities efficiently by looking at the “nearest neighbors.”
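A common way to measure how close two embeddings are is cosine similarity. Here’s a minimal plain-Java sketch of that measure, which we can apply to any pair of word or document vectors:

// Cosine similarity between two embedding vectors: values near 1.0 mean
// the vectors point in the same direction (semantically close words),
// values near 0 mean unrelated, and -1.0 means opposite directions.
static double cosineSimilarity(double[] a, double[] b) {
  double dot = 0, normA = 0, normB = 0;
  for (int i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}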


Figure 4 A neural search application: using word representations generated by a deep neural network to provide more relevant results


In this picture we can see that we use a deep neural network to create representations of the words contained in the indexed documents, and that we put those representations back into the search engine so we can use them to adjust the search results. Context plays an important role here, given the complexity of expressing and understanding information needs via text queries: deep representations of text are often built by using the context in which a certain word, sentence, or document appears in order to infer an appropriate representation.

Let’s look at the above example to briefly explain how deep learning algorithms can help deliver more relevant results. Taking the two queries latest breakthroughs in artificial intelligence and latest breakthroughs in AI, let’s assume we’re using the vector space model. In such models the similarity between queries and documents can vary a lot based on the text analysis chain. This problem doesn’t affect vector representations of text generated with recent neural network based algorithms: although artificial intelligence and AI could end up far apart in a vector space model, they’ll likely be placed close together when plotted using word representations generated by neural nets. With this simple change, we add a relevance boost to the search engine via more semantically grounded representations of words.
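One simple way this idea could be wired into Lucene is query expansion: add each query term’s nearest word2vec neighbors as optional clauses. The sketch below is my own illustration, assuming a trained Word2Vec model like the one in the earlier listing; the field name body is a placeholder.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.deeplearning4j.models.word2vec.Word2Vec;

// Expand a user query with word2vec neighbors, so a query term can also
// match documents that only use semantically close words.
static BooleanQuery expand(String userQuery, Word2Vec vec) {
  BooleanQuery.Builder builder = new BooleanQuery.Builder();
  for (String token : userQuery.toLowerCase().split("\\s+")) {
    builder.add(new TermQuery(new Term("body", token)), BooleanClause.Occur.SHOULD);
    for (String neighbor : vec.wordsNearest(token, 2)) {
      builder.add(new TermQuery(new Term("body", neighbor)), BooleanClause.Occur.SHOULD);
    }
  }
  return builder.build();
}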

SIDEBAR: Deep learning vs deep neural networks

An important distinction has to be made: deep learning is mostly about learning deep representations of words, text, documents, and images. Deep neural networks have a wider usage: they are used, for example, in language modelling, machine translation, and so on. Besides learning representations, there are a number of information retrieval tasks that deep neural networks can help to solve.

Let’s look at some practical implications of making search engines and neural networks work together.

Index, please meet neuron

An artificial neural network can learn to predict outputs based on a training set (supervised learning) or perform unsupervised learning (no information about correct outputs is given) in order to extract patterns and/or learn representations. A search engine’s typical workflow involves indexing and searching content; notably, these tasks can happen in parallel. Although this may sound like a technicality at this point, the way we integrate a search engine with a machine learning algorithm and its output model is important, because it impacts the effectiveness and performance of the neural search design. We can have a super accurate system, but if it’s slow no one will use it!

Neural network training

In order to use its powerful learning capabilities, we need to train a neural net. Training a network like the one shown in the previous section via supervised learning means providing some inputs to the network’s input layer, comparing the network’s (predicted) outputs with the known (target) outputs, and letting the network learn from the discrepancies between them. Neural networks can represent many interesting mathematical functions; it’s one of the reasons why they can be highly accurate. Such mathematical functions are governed by the connections’ weights and the neurons’ activation functions. A neural network learning algorithm takes the discrepancies between desired and actual outputs and adjusts each layer’s weights in order to reduce the output error in the future. If you feed enough data to the network, it can reach a tiny error rate and perform extremely well. Activation functions have an impact on the neural network’s ability to make predictions and on how quickly it learns.
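As a minimal sketch of what this looks like in Deeplearning4j, here’s how a small feed-forward network with three hidden layers, similar in shape to the one in figure 2, could be configured and trained. The layer sizes and learning rate are illustrative assumptions of mine, and trainingData stands for a DataSetIterator over your labelled examples (not shown here).

import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.learning.config.Sgd;
import org.nd4j.linalg.lossfunctions.LossFunctions;

// Configure and train a small feed-forward network with three hidden layers.
static MultiLayerNetwork train(DataSetIterator trainingData) {
  MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
      .updater(new Sgd(0.1)) // plain stochastic gradient descent, learning rate 0.1
      .list()
      .layer(0, new DenseLayer.Builder().nIn(4).nOut(5).activation(Activation.TANH).build())
      .layer(1, new DenseLayer.Builder().nIn(5).nOut(5).activation(Activation.TANH).build())
      .layer(2, new DenseLayer.Builder().nIn(5).nOut(5).activation(Activation.TANH).build())
      .layer(3, new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
          .nIn(5).nOut(2).activation(Activation.IDENTITY).build())
      .build();
  MultiLayerNetwork network = new MultiLayerNetwork(conf);
  network.init();
  network.fit(trainingData); // learn from predicted-vs-target discrepancies
  return network;
}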

The most famous neural network learning algorithm is called backpropagation. Given desired and actual outputs, the algorithm backpropagates each neuron’s error contribution and adjusts the weights of each neuron’s connections accordingly, one layer at a time, from output to input. This is a high-level description of how the backpropagation algorithm works.
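At the heart of that adjustment is a simple update rule: each weight is nudged against the gradient of the error it contributed to. Here’s a toy sketch of just that step, in plain Java, leaving aside how the gradients themselves are computed:

// Toy illustration of the weight update behind backpropagation.
// gradients[i] holds dError/dWeights[i], computed layer by layer
// from the output layer back toward the input layer.
static void updateWeights(double[] weights, double[] gradients, double learningRate) {
  for (int i = 0; i < weights.length; i++) {
    weights[i] -= learningRate * gradients[i]; // move against the error gradient
  }
}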

The weights of each layer, together with its activation functions, define a machine learning model for the neural network. Now that we have a basic understanding of how neural nets learn, we need to decide how to plug them into the search engine. Although design decisions can vary depending on the purpose neural networks are used for, training usually needs:

  • a non-trivial amount of time

  • a lot of data

For these reasons we can identify a few high-level solutions. Search engines can receive data to be indexed continuously: new content is added, and existing content is updated or even deleted. Although it’s relatively easy and quick to support this within the search engine, machine learning algorithms sometimes create models that can’t be adapted quickly as data changes. Therefore, it may be necessary to perform training again from scratch for the model to stay consistent with the new data in the search engine.

A good choice in such scenarios is to look for online learning algorithms that can cope with changing data without requiring training from scratch. When this isn’t possible, we can mitigate the inconsistency between indexed data and the neural network model (see the sketch after this list) if we:

  • “unplug” the model from the prediction phase

  • “discount” the outputs of the neural network prediction by a certain rate, depending on how much data we expect to be inconsistent
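A hypothetical sketch of that second option: blend the neural model’s score with a baseline score (say, from the classic retrieval model), trusting the model less as the index drifts away from the snapshot it was trained on. The function and parameter names here are mine, for illustration only.

// Hypothetical sketch: discount the neural score based on staleness.
// staleness in [0, 1] is an estimate of the fraction of indexed data
// the model has never seen; 0 means the model is fully up to date.
static double discountedScore(double neuralScore, double baselineScore, double staleness) {
  double trust = 1.0 - staleness;
  return trust * neuralScore + (1.0 - trust) * baselineScore;
}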

Model Storage

A search engine’s central data structures are inverted indexes; they contain posting lists, term dictionaries, information about term positions, and other such data. A neural network model can be composed of matrices with hundreds of thousands of rows and columns. If we don’t store such a model within the index or somewhere on disk, we’ll need to retrain it upon each restart of the system; given the high cost of training, this is usually discouraged.

In some cases, like the previous example of using a neural network for learning to rank, it’s hard to find a correlation between any entity stored in the inverted index and the learned model. On the other hand, when learning representations of words, for example, the neural network model is usually composed of multiple sets of weights, and one of these sets outlines a matrix where each row can be mapped to a word. It makes sense to relate word vectors to terms in the inverted index. That means we may decide to “split” the machine learning model and store each word embedding together with its term within the index.

Table 1 Inverted index table with word embeddings

Such “tricks” allow for efficient storage and retrieval of portions of the model on demand and don’t require additional infrastructure to maintain it.
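A minimal sketch of this idea with Lucene: serialize each word’s embedding to bytes and store it as a doc values field next to the term it belongs to. The field names and the one-document-per-word layout are my own illustrative assumptions, not the book’s scheme.

import java.nio.ByteBuffer;

import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.util.BytesRef;

// Store a word together with its embedding so portions of the model
// can be retrieved on demand instead of kept fully in memory.
static Document wordWithEmbedding(String word, double[] vector) {
  ByteBuffer buffer = ByteBuffer.allocate(vector.length * Double.BYTES);
  for (double v : vector) {
    buffer.putDouble(v); // serialize the embedding to raw bytes
  }
  Document doc = new Document();
  doc.add(new StringField("word", word, Field.Store.YES));
  doc.add(new BinaryDocValuesField("embedding", new BytesRef(buffer.array())));
  return doc;
}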

The promises of neural search

Neural search is about integrating deep learning and deep neural networks into search at different stages. Deep learning’s capability of capturing deep semantics within the generated representations allows us to obtain relevance models and ranking functions that adapt well to the underlying data. We’ll be able to learn image representations that give us surprising results in image search. Simple similarity measures like cosine distance can be applied to learned representations to capture semantically similar words, sentences, paragraphs, etc.; this has a number of applications, such as in the text analysis phase or in recommending similar documents. At the same time, deep neural networks can do more than “just” learn representations; they can learn to generate or translate text, or learn how to optimize search engine performance.

All the above sounds awesome, but beware: you can’t throw neural networks at your search engine and expect it to be automatically perfect. Every decision has to be taken in context, and neural networks have some limitations, including the cost of training, of upgrading models, and more. But applying neural search to your search engine is a great way to make it better for your users. It also makes for a fascinating journey for search engineers, who get to explore the beauty of neural networks.


If you want to learn more about the book, check it out on liveBook here and see this slide deck.