In parts 1 and 2 of this series, we found out that performing a text search on listings within a specific area is problematic, because you cannot use a text index and a geospatial index in the same query. I've found five different ways to work around this problem and will discuss them here.


Solution 1: just don't

First, consider whether you really need this feature at all. When I was doing research online, I found out that our use case is pretty unusual.

It is very rare that an application needs to let the user perform a text search and a geospatial search at the same time. It's usually either/or.

Either you're looking for something specific close to your location, like a hotel, in which case a few filters (e.g. 2 bedrooms + swimming pool) are enough to narrow down the search; or you're performing an actual fuzzy text search and location is largely irrelevant. Think about it:

  • Airbnb: location + filters, no text search
  • Amazon: text search + filters, no location
  • Zillow: location + filters, no text search
  • Netflix: text search + filters, no location
  • Gym locator: location + filters, no text search
  • Udemy: text search + filters, no location

There are exceptions here, of course, but the feature is not as common as you might think. Usually, a geospatial search does not have to be combined with a full text search.

So the first solution you should consider is whether you could get away with a location search plus filters, similar to how Airbnb does it. Chances are you don't even need to combine text and geospatial search, and a few clever filter options will do the trick.
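As a rough sketch of what "location + filters" could look like, the helper below builds a $geoNear stage whose query contains only plain field filters. The helper name, the `category` field and the shape of the `filters` object are assumptions made for illustration, not part of the actual project.

```javascript
// Hypothetical helper: builds a $geoNear pipeline that filters on plain
// fields only, so no text index is involved at all.
function buildFilteredGeoPipeline({ lng, lat, maxMeters, filters = {} }) {
  const query = {};
  // `category` is an assumed field name, used here as an exact-match filter
  if (filters.category) query.category = filters.category;
  // Only add the price condition when a maximum was actually supplied
  if (filters.priceMax != null) query.price = { $lte: filters.priceMax };

  return [
    {
      $geoNear: {
        // GeoJSON points list longitude first, then latitude
        near: { type: 'Point', coordinates: [lng, lat] },
        distanceField: 'dist.calculated',
        // Meters, because `near` is a GeoJSON point
        maxDistance: maxMeters,
        query,
        spherical: true
      }
    }
  ];
}
```

The returned array could then be passed to Listing.aggregate(), for example buildFilteredGeoPipeline({ lng: 4.9, lat: 52.37, maxMeters: 5000, filters: { priceMax: 20 } }).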

Solution 2: $geoNear + regex

Another solution is to keep using the indexed (fast) $geoNear operator, but instead of using $text, you filter the result set with a regular expression (the regex variable) in the query, like so:


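Listing.aggregate([
    {
        $geoNear: {
            near: {
                type: 'Point',
                coordinates: coordinates // [longitude, latitude]
            },
            distanceField: 'dist.calculated',
            // maxDistance is in meters when `near` is a GeoJSON point
            maxDistance: parseFloat(req.query.distance),
            // Extra conditions applied to the documents $geoNear finds
            query: {
                $or: [
                    { title: regex },
                    { description: regex }
                ],
                price: { $lt: pricemax }
            },
            spherical: true
        }
    }
])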
  

This works pretty well, but the downside is that you cannot use the magic of the $text search operator, such as stemming (baking --> bake) and the removal of irrelevant words (e.g. 'the').

You can only search for what a user literally types in, which also means it takes extra effort to figure out how to deal with multiple words. This could turn out to be a problem, because so-called long-tail searches are becoming more and more common.
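One possible way to handle multiple words is to build one regular expression per word and require all of them to match. The sketch below assumes that "every word must appear in the title or the description" is the behaviour you want; the function names are made up for this example.

```javascript
// Escape characters that have a special meaning inside a regular expression,
// so user input such as "c++" is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Turn the raw input into one case-insensitive RegExp per word.
function wordsToRegexes(input) {
  return input
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => new RegExp(escapeRegex(word), 'i'));
}

// Build a filter in which every word has to match the title or the description.
// An empty input produces an empty filter, i.e. no text restriction.
function buildRegexQuery(input) {
  const clauses = wordsToRegexes(input).map((regex) => ({
    $or: [{ title: regex }, { description: regex }]
  }));
  return clauses.length ? { $and: clauses } : {};
}
```

The object returned by buildRegexQuery("coffee cake") could be used as the query of the $geoNear stage. It still matches literal substrings only, so stemming and stop-word removal remain out of reach.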

Solution 3: $text + $geoWithin

Another solution is to put the search magic of $text first, and then filter the results using the $geoWithin operator. This works because, unlike $geoNear, the $geoWithin operator does not require a geospatial index.


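Listing.find({
    // The text index does the heavy lifting here
    $text: {
        $search: "bake coffee cake"
    },
    geometry: {
        $geoWithin: {
            // $centerSphere takes its radius in radians, not meters or kilometers
            $centerSphere: [coordinates, distance]
        }
    }
})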

The upside here is that text search is going to work very well, because it uses MongoDB's full-text search.

The downside is that the geospatial part of the query does not use an index, which makes it relatively slow and inefficient: it has to go through each and every document that matches the text search to see which of those are within range of the specified coordinates.

So to decide whether this is the right option for you, you have to work out which part of the query narrows down the results more: the text search or the geospatial filter. If the text search usually leaves only a handful of candidates, checking their locations without an index costs very little.
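One detail of this query that is easy to get wrong is the radius: $centerSphere expects radians, which you get by dividing the distance by the radius of the earth (about 6378.1 km). A small sketch of how the filter could be assembled from a radius in kilometers follows; the function names are assumptions made for this example.

```javascript
// Equatorial radius of the earth in kilometers
const EARTH_RADIUS_KM = 6378.1;

// Convert a distance on the earth's surface to radians.
function kmToRadians(km) {
  return km / EARTH_RADIUS_KM;
}

// Build the filter for a $text search restricted to a circle around a point.
// `geometry` is the GeoJSON field name used in the query above.
function buildTextGeoFilter(searchTerm, [lng, lat], radiusKm) {
  return {
    $text: { $search: searchTerm },
    geometry: {
      $geoWithin: {
        // Center as [longitude, latitude], radius in radians
        $centerSphere: [[lng, lat], kmToRadians(radiusKm)]
      }
    }
  };
}
```

The result could be passed straight to Listing.find(), e.g. buildTextGeoFilter("bake coffee cake", [4.9, 52.37], 10) for a 10 km radius.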

Solution 4: server-side search library

The fourth possible solution is very similar to solution 2. You keep using the indexed (fast) $geoNear operator, but instead of using $text or a regular expression in the query, you use an external search library to perform a text search through the results, right on the server, before returning them.

Because I work in Node.js on the backend, I am looking at JavaScript search libraries, such as Fuse.js. This lightweight fuzzy search library makes searching an array of objects incredibly easy:


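const options = {
  // Fields of each listing that Fuse should search through
  keys: ['title', 'description'],
  id: 'id'
}
const fuse = new Fuse(listings, options)

// Results come back ordered by relevance
fuse.search("bake coffee cake")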

The upside of this solution is that you can keep using the very efficient $geoNear operator, and also implement a 'smarter' search functionality than a mere regular expression. This search library in particular will actually return results according to relevance, which is something you can't do with regex.

The downside is that you are performing the text search in your application server instead of in the database itself, which is not the fastest or most efficient solution.
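Put together, the server-side step could look roughly like the sketch below. The Fuse constructor is passed in as an argument so the snippet has no hard dependency; in a real project you would import it from the fuse.js package. The function name is made up, and the unwrapping step assumes that, depending on the Fuse.js version, results are either the items themselves or objects with an `item` property.

```javascript
// Hypothetical helper: runs a fuzzy text search over listings that the
// $geoNear stage has already narrowed down to the user's area.
function searchNearbyListings(Fuse, nearbyListings, term) {
  // Without a search term, keep the listings as $geoNear returned them
  if (!term || !term.trim()) return nearbyListings;

  const fuse = new Fuse(nearbyListings, { keys: ['title', 'description'] });

  // Assumption: some Fuse.js versions wrap each hit as { item, ... },
  // others return the listing directly. Unwrap when needed.
  return fuse
    .search(term)
    .map((hit) => (hit && hit.item !== undefined ? hit.item : hit));
}
```

In a route handler this would sit between the Listing.aggregate() call and the response, so only the matching listings are sent to the client.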

Solution 5: search library in frontend

Finally, there is also the possibility of implementing the search functionality in the frontend of the application. Just like with solution 4, you keep using the indexed (fast) $geoNear operator and use an external search library to perform the text search, instead of using $text or a regular expression in the query.

The downside of doing this in the frontend is that more data has to be processed and sent, seeing that all the listings that meet the geospatial query have to be shipped to the frontend before they can be searched. This will increase the initial load time.

The upside of this solution, though, is that after the initial listings (those within range of the user) are loaded, many searches can be performed without having to reload any data. This could turn out to be more efficient and yield a better UX in the end.
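The "load once, search many times" idea can be sketched as follows. To keep the example self-contained it uses a deliberately naive word match as a stand-in for a real search library such as Fuse.js; the function name is an assumption, and the listings are assumed to have title and description fields as in the earlier examples.

```javascript
// Hypothetical client-side store: receives the listings that were fetched
// once for the user's area and answers every later search from memory.
function createLocalSearch(listings) {
  // Precompute one lowercase text blob per listing, done once per load
  const index = listings.map((listing) => ({
    listing,
    haystack: `${listing.title} ${listing.description}`.toLowerCase()
  }));

  // Naive stand-in for a search library: every word must occur somewhere.
  // An empty term matches everything.
  return function search(term) {
    const words = term.toLowerCase().split(/\s+/).filter(Boolean);
    return index
      .filter((entry) => words.every((word) => entry.haystack.includes(word)))
      .map((entry) => entry.listing);
  };
}
```

After the initial request, something like const search = createLocalSearch(listings) would be created once, and each keystroke in the search box could call search(term) without touching the server.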


Conclusion:

Personally, I ultimately decided to implement solution 5. This does mean that we initially load all the listings that are within range of the user, but that's fine, because in our use case it is not much of a problem if users initially see a lot of things that they weren't necessarily looking for. In fact, it can lead to serendipity, which is a beautiful thing that is actually likely to help us grow the platform.

Your use case may be entirely different, though, which is why only you can decide which solution is best for your project. In any case, I hope that my learning out loud about this has been helpful to you. Good luck!