“I work for the tool instead of the tool working for me”
The main drawback of supervised learning is that it often relies on large data sets that humans must clean and label before the models can be trained. Very often, this means customers have to do preparatory work before they can actually use the tool. In e-discovery, it is not rare to see entire legal teams spend days labeling data so that a model can perform one specific task. Worse, new tasks may surface during the dispute resolution process, and each one requires repeating the same tedious process. This has caused some frustration, and one customer summed it up very well: “I work for the tool instead of the tool working for me”. Moreover, each person labeling the data introduces their own biases into the outcome, and because this work usually falls to less experienced people, their judgment forms the basis of the final result.
Once you translate the above into cost, you quickly understand why this technology has mostly been used by law firms with deep pockets. It also means there is a tremendous opportunity to come up with a different approach.
“In many document reviews, having the different patterns and concepts automatically identified by the algorithms is highly valuable”
Self-supervised learning is well suited to exploring unknown data. Human education works the same way: we are generally not taught to solve one particular task for our whole life by following a predefined algorithm. Instead, we learn how to use our knowledge and adapt the skills we have to unknown situations. The same holds for textual information: humans learn to read generic text and then to extract facts from it. Faced with specific documentation, such as legal documents, we can work out by ourselves how it is organised and where to look for particular information. Why should the machine behave differently? Let’s learn directly from the texts that are available and contain the knowledge we are interested in.
In many document reviews, having the different patterns and concepts automatically identified by the algorithms is highly valuable. It means you can start searching your data right away through contextual search. The algorithms capture the meaning of the concept developed in your search request, together with its context, and look for all documents in the data set that develop the same idea, even if they use different wording. For instance, you can put a counterparty's conclusions into perspective by copy-pasting them into your contextual search and seeing all the related documents.
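To make the idea concrete, here is a minimal sketch in Python of searching by context rather than by exact keywords. It is not EisphorIA's implementation: it swaps in a deliberately simple technique, representing each word by the documents it occurs in, so that words used in similar contexts get similar vectors without any human labeling. The sample documents, function names and query are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical sample corpus, for illustration only.
DOCUMENTS = [
    "The supplier may terminate the contract after a material breach.",
    "Either party can cancel the contract following a material breach.",
    "The quarterly invoice was paid by bank transfer.",
    "Lunch menu for the office party on Friday.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def learn_word_vectors(documents):
    """Learn one vector per word from raw, unlabeled text.

    Each word is described by how often it occurs in each document,
    so words that appear in the same documents get similar vectors.
    """
    vectors = defaultdict(Counter)
    for doc_id, text in enumerate(documents):
        for word in tokenize(text):
            vectors[word][doc_id] += 1
    return vectors


def embed(text, word_vectors):
    """Embed a text as the sum of the vectors of its known words."""
    total = Counter()
    for word in tokenize(text):
        if word in word_vectors:  # words never seen in the corpus are skipped
            total.update(word_vectors[word])
    return total


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def contextual_search(query, documents, top_k=2):
    """Return the top_k (score, document index) pairs, best match first."""
    word_vectors = learn_word_vectors(documents)
    query_vec = embed(query, word_vectors)
    scored = [
        (cosine(query_vec, embed(text, word_vectors)), idx)
        for idx, text in enumerate(documents)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for score, idx in contextual_search("right to terminate", DOCUMENTS):
        print(f"{score:.2f}  {DOCUMENTS[idx]}")
```

In this toy corpus, the second document never uses the word “terminate”, yet it ranks above the unrelated documents because “cancel” appears alongside the same words (contract, material, breach) as “terminate”. Real systems typically rely on much richer learned representations; this sketch only illustrates that useful associations can be picked up from the raw text itself.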
At EisphorIA, we compared the two methodologies (self-supervised learning and supervised learning) through multiple simulations and analyzed the outcomes. We expected the supervised models to perform better on their dedicated tasks, so we were surprised to observe the reverse in terms of relevance, both for clustering the information and for searching the data set. The self-supervised learning model provided much more robust and stable results.
This is exciting because it means you don’t need to invest as much as you would with supervised learning. For a larger number of law firms, it represents a great opportunity not only to catch up but to make a big jump in technological capabilities and improve their competitiveness.
At EisphorIA, we are betting on self-supervised learning, and we believe it can completely change the lawyer's experience, so that technology truly assists them in searching for critical information.
Thanks to this modern methodology, the right architecture and an ultra-modern UX/design, we can guarantee the following:
Curious to discover our solution in more detail? Please contact us.