Daniele Tassone
Backend software engineer / Tech Lead

Legal Clause library with MongoDB Atlas Search

31/05/2023
10 min

Introduction

At Genie, we handle a large number of legal contracts, each composed of various clauses. As part of our commitment to open-sourcing the law, we recently made our Clause Library available to the public. In this blog post, we talk about how MongoDB Atlas Search helped us deliver a great user experience, specifically when it comes to finding the most suitable clause for a legal contract. We'll discuss the features of MongoDB Atlas Search that helped us achieve a five-star user experience and, last but not least, a great developer experience!

What is a legal clause?

But what is a legal clause, and what is a clause library?

A legal clause is a part or provision within a legal document that outlines the rights, responsibilities, and limitations of the parties concerned. For a software engineer it is just "text" with "metadata/fields".

So, for instance, an NDA legal contract might contain these clauses:

  1. keep the Confidential Information secret and confidential;
  2. not use or exploit the Confidential Information in any way except for the Purpose;

What is GenieAI Clause Library?

From an engineering perspective, the GenieAI Clause Library looks like an "NPM" registry, except that instead of software libraries it hosts "legal clauses".

A Clause Library is a collection of clauses - namely, regulations or conditions that govern the actions of the agreement's participants.

So what is a clause library for?
In the legal sector, a long-standing challenge has been the inconsistency in the drafting of clauses, such as those relating to "Confidentiality Obligations". With the GenieAI Clause Library, we want to provide the community with a comprehensive list of clauses that address specific issues.

With a vast database of clauses, it becomes crucial to ensure easy accessibility and discoverability for users. This requires:

1. Designing a flexible and adaptable database schema that accommodates various metadata associated with each clause, such as sector, contract types, and more.
2. Implementing a robust search engine system to facilitate efficient clause retrieval.

Let's see what solution we implemented.

Data structure using Agile approach

When it comes to database modeling, an approach that has worked well for us is to start by analysing the workload of the project rather than the relationships between entities. At first this may sound counterintuitive, because as a software engineer you want a clear understanding of how your data is related. However, modeling a MongoDB database starting from the relationships can lead to a schema that works poorly in terms of both simplicity and performance.

So the approach we used was to first analyse the workload: how many read and write operations should we have?

Workload analysis:

1. On the main page a Clause must be searchable based on many criteria - this means we have a lot of reads on the Clause.
2. On some other sub-pages, a Clause needs to be grouped by one of its metadata fields - this might be a slow read operation due to a $group stage.
3. We can insert and update a clause, but we don't expect many update operations, so we don't need to worry about them.

Based on point 1, it was clear that we needed one collection of "Clauses", with each clause exposing a lot of metadata to our search engine. But we weren't sure how to address point 2, because the $group stage could be a bottleneck and our UI required a lot of group operations. Considering we could have ended up with 100+ groups for each page view, $group did not seem ideal for the job.
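To make point 1 concrete, here is a minimal sketch of what a single clause document could look like. Only `contractTypes` is a field that appears later in this post; every other field name and value is an assumption made for illustration, not our actual schema.

```javascript
// Hypothetical clause document: free text plus searchable metadata.
// Only `contractTypes` is named in this post; the other fields are illustrative.
const exampleClause = {
  title: "Confidentiality Obligations",
  text: "keep the Confidential Information secret and confidential;",
  contractTypes: ["NDA", "one-way NDA"], // assumed to be an array: a clause can fit several contract types
  sectors: ["Technology"],
  issue: "Confidentiality",
};

// Returns the names of the array-valued metadata fields,
// i.e. the fields a user could filter or group on in this sketch.
function metadataFields(clause) {
  return Object.keys(clause).filter((key) => Array.isArray(clause[key]));
}

console.log(metadataFields(exampleClause));
```

Keeping all metadata on the clause itself is what lets a single search index cover every filter on the main page.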

Here is a detail of our page, which relies on many $group stages:

At the same time, we were not sure if it was worth optimising the project at that stage.

In the end, we decided to create a second collection called "Issues", a materialised view containing "Clauses already grouped" (*).
This design pattern is powerful, easy to implement and easy to maintain, so it didn't feel like premature optimisation.

Yes, this means data is duplicated across 2 collections, but thanks to our Event-Driven backend we can easily keep the data in sync.
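One way to build such a materialised view is MongoDB's `$merge` stage, which writes the output of an aggregation into another collection. The sketch below is an assumption about how this could be done, not our production code: the function name, the collection name `issues` and the field name `issue` are hypothetical.

```javascript
// Builds an aggregation pipeline that pre-groups clauses by one metadata field
// and writes the groups into a second collection (the materialised view).
// `groupField` and `targetCollection` are hypothetical names.
function buildIssuesViewPipeline(groupField, targetCollection) {
  return [
    // One output document per distinct value of the metadata field.
    {
      $group: {
        _id: `$${groupField}`,
        clauseIds: { $push: "$_id" },
        clauseCount: { $sum: 1 },
      },
    },
    // Upsert each group into the target collection, keyed by _id.
    {
      $merge: {
        into: targetCollection,
        on: "_id",
        whenMatched: "replace",
        whenNotMatched: "insert",
      },
    },
  ];
}

const issuesPipeline = buildIssuesViewPipeline("issue", "issues");
console.log(JSON.stringify(issuesPipeline, null, 2));
```

With the Node.js driver, running something like `await db.collection("clauses").aggregate(issuesPipeline).toArray()` would refresh the view. In an event-driven backend, this could be triggered whenever a clause is inserted or updated.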

(*) MongoDB design patterns are well described as part of the MongoDB Certification. If you want to read my experience with it, read my blog post here.

With our Clauses and Issues collections in place, we started implementing our search engine!

The search engine: MongoDB Atlas Search

MongoDB Atlas Search is a powerful SaaS solution that enhances the capabilities of MongoDB by providing full-text search functionality. Built on top of Lucene, Atlas Search offers a range of features that greatly improve the developer experience and enable efficient searching within MongoDB databases.

Here are the key features we used:

1. Autocomplete: Atlas Search enables the implementation of autocomplete functionality, making it easier for users to find relevant legal clauses.

2. Full-text search: With Atlas Search, the Clause Library benefits from robust full-text search capabilities.

3. Facets: Facets provide a powerful way to refine search results by allowing users to filter and narrow down their searches based on various attributes or metadata associated with the clauses.

By leveraging these features of MongoDB Atlas Search, GenieAI has significantly improved the search functionality within the Clause Library, enabling users to easily discover and access relevant legal clauses for their specific requirements.

Let's see how they work in practice!

Atlas Search - autocomplete

The first problem we faced was how to avoid metadata duplication when adding a new clause.

Let's say that a legal clause has metadata (e.g. "type of contract") and each user can provide new values while inserting their clauses. How do we keep the clause library consistent? How do we avoid storing a typo like "AND" when the user meant "NDA"?

To address the challenge, we considered two solutions:

Solution 1 involved downloading all the metadata to the frontend and providing an autocomplete-like user interface for users to select the appropriate metadata values. However, the large amount of metadata available could have hurt the frontend's performance.

Solution 2, on the other hand, implemented the autocomplete feature at the backend or database level. With MongoDB Atlas Search, this was straightforward to build. The frontend only needs to send an input string to the backend, which then performs the autocomplete search and returns the relevant results. If the user types "AND", the backend suggests "NDA?" in return.

This approach ensured data consistency and avoided storing typos or inconsistent metadata values in the Clause Library.

How to use autocomplete with Atlas Search?

  1. First, we declare an index on the field, mark it as autocomplete and submit it to Atlas Search. You can do this through the Atlas UI or the Atlas API (we use the Atlas API):

    "contractTypes": [
      {
        "type": "autocomplete",
        "analyzer": "lucene.standard",
        "tokenization": "edgeGram",
        "foldDiacritics": true
      }
    ]
    
  2. Now we can query the data. Suppose we search for "NDA":

{
    $search: {
      index: 'clause-solutions_default',
      autocomplete: {
        query: 'NDA', // input received 
        path: 'contractTypes',
        tokenOrder: 'any'
      }
    }
}

  3. MongoDB Atlas returns a set of results like this:

Great! It works! 

But as you can see, the result doesn't look great because the result set contains duplicate values. We could use a $group stage to eliminate the duplicates, but that would add unnecessary computation on MongoDB. Atlas Search provides an alternative: the `facet` collector together with `$searchMeta`, which solves this problem.
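For reference, the autocomplete query above could be wrapped in a small backend helper. This is a sketch under assumptions: the function name, the `limit` parameter and the `$project` stage are ours for illustration, while the index name and the `autocomplete` options are the ones shown above.

```javascript
// Builds the aggregation pipeline for an autocomplete lookup on `contractTypes`.
// Returns null for empty input so the caller can skip the database round trip.
function buildAutocompletePipeline(input, limit = 10) {
  const query = String(input ?? "").trim();
  if (query === "") return null;

  return [
    {
      $search: {
        index: "clause-solutions_default",
        autocomplete: { query, path: "contractTypes", tokenOrder: "any" },
      },
    },
    // Cap the number of suggestions sent back to the frontend.
    { $limit: limit },
    // Keep only the field the UI needs.
    { $project: { _id: 0, contractTypes: 1 } },
  ];
}

console.log(JSON.stringify(buildAutocompletePipeline("  NDA "), null, 2));
```

Each matching clause comes back as its own document, so clauses that share a contract type produce repeated values. That is the duplication described above.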

Atlas Search - facets

Our ideal MongoDB response looks like this:
`[ { label: "NDA" }, { label: "one-way NDA" } ]`. With `facets` we can eliminate the duplicate data. Let's see how it works.


Hold on! What is a Facet? Definition:

The `facet` collector groups results by values or ranges in the specified faceted fields and returns the count for each of these groups.

In other words, MongoDB bundles the results for you. Will it work as expected?

Let's rewrite our query using `facets`:


{
    $searchMeta: {
        index: "clause-solutions_default",
        facet: {
            operator: {
                autocomplete: {
                    query: "NDA", // query input
                    path: "contractTypes",
                    tokenOrder: "any"
                }
            },
            facets: {
                contractTypes: {
                    type: "string",
                    path: "contractTypes",
                    numBuckets: 1000
                }
            }
        }
    }
}
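The `$searchMeta` stage returns a single metadata document rather than the matching clauses. As a rough sketch, turning its facet buckets into the label list we were aiming for could look like this. The helper name and the sample values are made up for illustration; the `facet.<name>.buckets` shape with `_id` and `count` is how Atlas Search reports string facets.

```javascript
// Converts a $searchMeta facet result into the label list the UI expects.
// `meta` is the single document returned by the $searchMeta stage.
function facetBucketsToLabels(meta, facetName) {
  const buckets = meta?.facet?.[facetName]?.buckets ?? [];
  // Each bucket holds one distinct value, so no further deduplication is needed.
  return buckets.map((bucket) => ({ label: bucket._id }));
}

// Made-up sample in the shape Atlas Search uses for string facets.
const sampleMeta = {
  count: { lowerBound: 5 },
  facet: {
    contractTypes: {
      buckets: [
        { _id: "NDA", count: 3 },
        { _id: "one-way NDA", count: 2 },
      ],
    },
  },
};

console.log(facetBucketsToLabels(sampleMeta, "contractTypes"));
```

Keep in mind that string facets need the path indexed with a facet-capable field type as well, which the index snippet in this post does not show.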

Now let's run the query again. This is our updated result:

Hurray! We implemented our autocomplete feature using MongoDB Atlas Search and here's what it looks like in our web app:

Our clause library home page - with Facets

A user looking for a legal clause can search for it using different filters, depending on their needs.
The Clause Library page shows filters using a facet so that all options are grouped (by Sections, Issues, Clause Types, Sectors, etc.).

Once the user selects an option, the backend calls MongoDB Atlas Search, combining multiple search criteria, and the interface is updated accordingly.
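Combining criteria like this maps naturally onto the Atlas Search `compound` operator. The following is a sketch, not our actual backend code: the function name, the `text` field path and the filter field names are hypothetical, and reusing the `clause-solutions_default` index here is an assumption.

```javascript
// Builds a $search stage that combines a free-text query with metadata filters.
// `filters` maps a field name to the value the user selected, e.g. { sectors: "Technology" }.
// Returns null when there is nothing to search for.
function buildClauseSearchStage(searchText, filters = {}) {
  const compound = {};

  // Free-text part: contributes to the relevance score.
  const text = String(searchText ?? "").trim();
  if (text !== "") {
    compound.must = [{ text: { query: text, path: "text" } }];
  }

  // Selected filters: must match, but do not affect the score.
  const filterClauses = Object.entries(filters).map(([path, query]) => ({
    phrase: { path, query },
  }));
  if (filterClauses.length > 0) {
    compound.filter = filterClauses;
  }

  if (Object.keys(compound).length === 0) return null;
  return { $search: { index: "clause-solutions_default", compound } };
}

const searchStage = buildClauseSearchStage("confidential", {
  contractTypes: "NDA",
  sectors: "Technology",
});
console.log(JSON.stringify(searchStage, null, 2));
```

Putting the selected options under `filter` rather than `must` keeps them from influencing the ranking, so the ordering of results depends only on the text the user typed.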

Atlas Search - final thoughts

GenieAI Clause Library has provided great value to our users and we are happy with our technical decision to implement this feature using MongoDB Atlas Search.

This is a list of the benefits we've seen with MongoDB Atlas Search:

Impressive search results
- It's super fast and efficient
- You can do all kinds of searches like finding keywords and phrases, ignoring case and even fuzzy matches

Easy to use
- Works seamlessly with MongoDB, the popular NoSQL database, so you can search multiple fields, nested documents, arrays, and even geospatial data
- The best part is that it's easy to use. You don't have to worry about managing infrastructure or updates because it's fully managed by MongoDB Atlas

Built on the MongoDB Aggregation Framework
- Atlas Search works well with the MongoDB aggregation framework.
- This means you can run advanced analytics and filter your search results, giving you even more insights from your data

If you want to see it in action just click here: link.

Thanks for reading.
Daniele Tassone
Backend software engineer / Tech Lead @ GenieAI
