Behind the scenes of Semeris’ AI-driven document library products

Published 13 September 2021.

CLO documents have changed and evolved considerably over the past twenty years; have your process and toolkit evolved alongside them? Semeris Docs for CLOs brings order-of-magnitude efficiencies to an often tricky and time-consuming element of the CLO investment process.

Our mission at Semeris is to help our CLO-focused customers make sense of complex legal documents and help them make better decisions faster.

To do this, we are constantly looking for the most appropriate NLP/AI techniques. These techniques form the various pieces of our tech-led solution. We aim to hide the complexity from our customers and let them focus on their primary role.

Our value creation stands on four pillars:

  • AI/NLP data extraction
  • Ongoing R&D and continuous improvement
  • Tooling to enhance and optimize human curation
  • Growing document corpus

Let’s discuss each to see how they help our efforts, and in turn, let customers achieve more with Semeris solutions.

Automated AI/NLP data extraction

Our software and machine learning models convert PDF documents and pull out their text, data, and structure. We combine machine learning, rule-based, and human-assisted techniques to extract hundreds of data points from each legal contract uploaded to Semeris Docs.

Even in our rule-based solutions, we are using neural language models to detect patterns that our rules might have missed, and we are using that insight to improve our rules further.
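To make the rule-based side more concrete, here is a minimal sketch of what a single extraction rule could look like. It is not Semeris’ actual rule set. It assumes one common drafting convention in legal contracts: a quoted, capitalized term followed by "means" or "shall mean". The names DEFINED_TERM_RULE and extract_defined_terms, and the sample text, are invented for illustration.

```python
import re

# Hypothetical rule (an assumption, not Semeris' rule): a quoted, capitalized
# term followed by "means" or "shall mean", then the definition text up to
# the next full stop or semicolon. Straight and curly quotes are both accepted.
DEFINED_TERM_RULE = re.compile(
    r'[“"](?P<term>[A-Z][^”"]{1,80})[”"]\s+(?:means|shall mean)\s+'
    r'(?P<definition>[^.;]+)'
)


def extract_defined_terms(text: str) -> dict[str, str]:
    """Return {term: definition} for every match of the rule in `text`."""
    return {
        match.group("term").strip(): match.group("definition").strip()
        for match in DEFINED_TERM_RULE.finditer(text)
    }


# Invented sample text, written only to exercise the rule.
sample = (
    '"Collateral Manager" means the entity appointed under the management agreement. '
    '“Payment Date” shall mean the 15th day of each quarter; '
    'The Issuer shall notify the Trustee.'
)

terms = extract_defined_terms(sample)
for term, definition in terms.items():
    print(f"{term}: {definition}")
```

A rule like this is easy to read and audit, but it misses any definition phrased differently, which is the kind of gap a learned model can help surface.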

While processing more than half a million pages of CLO agreements, we detected 20k+ defined terms. We then apply machine learning techniques to discover the meaning of these terms. Using the results, we can suggest synonyms used in other documents, which helps when the term you know isn’t in the document in front of you. This saves you time and helps build your team’s knowledge of the ever-growing lexicon of CLO defined terms.
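The idea of suggesting synonyms can be sketched as follows. This is a toy illustration only: it swaps in plain bag-of-words cosine similarity between definitions as a deliberately simple stand-in for learned models, and it does not describe how Semeris computes its suggestions. The function names and the three sample definitions are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphabetic tokens in `text`."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_synonyms(term: str, definitions: dict[str, str], top_n: int = 1) -> list[str]:
    """Rank the other terms by how similar their definitions are to `term`'s."""
    query = bag_of_words(definitions[term])
    scored = [
        (cosine(query, bag_of_words(definition)), other)
        for other, definition in definitions.items()
        if other != term
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:top_n]]


# Invented definitions, written only for this example.
definitions = {
    "Collateral Manager": "the entity appointed to manage the portfolio of collateral obligations",
    "Investment Manager": "the entity appointed to manage the portfolio of investments",
    "Payment Date": "the day in each quarter on which interest is paid to noteholders",
}

suggestions = suggest_synonyms("Collateral Manager", definitions)
print(suggestions)
```

With these sample definitions, the two manager terms share most of their wording, so "Investment Manager" ranks first for "Collateral Manager". Token overlap fails when two definitions use different words for the same concept, which is where learned representations earn their keep.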

Ongoing AI R&D

Our data science team uses general natural language processing techniques such as named entity recognition, semantic word vectors, topic modeling, similarity searches, neural language models, and more.

Each new application or product capability is matched to the most efficient, best-fitting technique in our toolkit. Cooperation with University College London and the Budapest University of Technology and Economics helps our team find new ideas and new angles while also assisting students with their studies.

Proprietary tooling for fast human curation

For Semeris, an AI firm that uses machine learning, efficient tooling for human annotation and verification is critical to our mission. Finance applications require close to 100% accuracy in their data. Our specialists use our in-house document curation and data verification solution in their deal curation work.

At Semeris, we keep improving these solutions to speed up data gathering, which leads us to the final point.

Growing document corpus for machine learning

The more data we have, the better positioned we are to build high-quality question answering and other data extraction models. Our document libraries are reaching 1 million pages of legal documents in the CLO sector, comprising more than half a billion words and 20k+ analyzed legal terms.

More data leads to better AI/ML results in the long run, and continuous improvement is a big part of our guiding principles at Semeris.

Want to learn more? Contact us to schedule your demo today and up your game in CLO deal document review.