Can Computers Understand Legal Texts and What Could This Mean for the Banking Industry?
Where we used to hear about FinTech, now all we hear about is RegTech, the management of regulatory processes within the financial industry through technology. With the evolution of RegTech, banks’ compliance teams are seeing real developments in the technology that helps them to not only monitor regulations but also report and stay compliant.
As a firm in the RegTech space, Governor Software has been working with the FCA on a project to digitize the FCA Handbook, providing firms with their own live, actionable version of the Handbook that they can use in-house to map, track action and report regulatory compliance. The FCA Handbook, however, is just one of the RegTech projects that Governor Software is currently involved in. Working with Selja Seppälä to help computers understand legal text is another.
Can Computers Understand Legal Texts and What Could This Mean?
Selja Seppälä, PhD, is a Marie Skłodowska-Curie Career-FIT Fellow at University College Cork. Her research project, launched in February 2019, investigates natural language processing (NLP) methods with the objective of developing a computer-assisted definition authoring and formalisation system for legal experts, particularly within the banking industry. In other words, she is working out how RegTech systems can automatically read and understand the definitions of legal terms in regulatory texts, to help banks report their data properly so that the regulators can see that there are no fraudulent activities going on. Governor Software is pleased to be working with Selja on this project along with Enterprise Ireland and the European Marie Curie Actions.
Selja recently gave a public talk about the project at the 2019 Pint of Science festival. Below is a transcript of that talk.
How to Teach RoboCop the Law
Dr Selja Seppälä (Marie Curie Career-FIT Fellow, UCC)
How do you teach a computer the meaning of legal terms needed to understand the law? RegTech, as in "regulatory technology", develops systems to extract meaning from tens of thousands of pages of financial regulation. Such systems are in high demand by banks to automate and reduce the costs of checking their compliance with financial regulations. We show how RegTech systems can automatically read and understand definitions of legal terms in regulatory texts. This new knowledge will improve the way systems interpret legal rules and identify the people, processes, and products that break them.
Hello everyone! I’m Selja Seppälä and I’m a Researcher at UCC. I’m very excited to be here to talk about my research and, especially, about my idea for a RoboCop sequel called…“RoboCop 4.0: RoboLawyer”, which obviously should be a movie starring me. Why? Because I’m a Marie Curie Fellow, so I qualify to be the film’s science person, and my background is in computational linguistics, terminology, and applied ontology. Meaning: terminology is the study of specialized vocabularies and dictionaries; applied ontology is the use of more formal versions of domain-specific vocabularies in computers. I’ll discuss later.
So, to clarify, in my research, I use natural language processing methods, also known as NLP methods, to process written language and help computers understand legal texts. In simple terms, I find ways to automatically analyse the content of natural language definitions so that it can be represented in a computer-understandable format.
Why is this so important to my RoboCop movie? To help paint this picture, let me set the backdrop to my movie. The film is set in the aftermath of the 2008 financial crisis, after the global economy collapsed, changing the lives of millions. The question on everyone’s lips is “How can the Governments put this right?” And to make this a RoboCop movie – an idea is hatched where the Governments decide the answer is RoboCop and they meet with RoboCop to see if he really can help.
The dialogue begins:
“RoboCop, you’re great. We love what you’re doing. Shooting everyone until all the crime goes away. It’s super effective. But, as you know, we have a new problem with the financial crisis, and we were thinking maybe you could try stopping these financial crises from happening again?” …. “Now RoboCop, here is what we are doing. We know that this financial crisis was caused by obscure financial transactions in the front office of investment banks, so, we have started writing a lot more laws to try and stop all this funny business in investment banks from ever happening again. While we did this, we found out that international finance law is really complicated, and the armies of lawyers (who, believe it or not, still work with pen and paper) are really expensive, so we came to a conclusion, we need your help, we need your supercomputing powers to process the tens of thousands of pages of legal documents for us. Can you help us? Would you help us?”
Obviously RoboCop says “Yeah, sure” – Why? Because his third Prime Directive is to “Uphold the Law”, which is like one of his main deals. He then says, “But can I solve this problem by shooting people, like I usually do?” Mmm, awkward pause. The Governments have to say “Please no! But we have a plan, we can get someone to teach you to understand all these financial laws, and you can use your robot brain to make sure the banks are following them.”
“But who? Who can do that?” asks RoboCop.
And out steps…. Me, Selja Seppälä, your local computational linguistics, terminology, and applied ontology researcher. And I say “Wassup RoboCop! Any questions?” RoboCop: “Several… First of all, what’s a bank? Are they good guys or bad guys?” So, I respond “Well, it’s not that easy. What you need to know is that a bank is ‘a financial institution that accepts deposits and channels the money into lending activities’ (WordNet 3.1).”
“I’m not sure I understand that…” says RoboCop.
I look at him and ask “But how can you not understand what a bank is? Do you understand the words I’m saying right now? Aren’t you part man?” And RoboCop curtly lets me know “Yes of course, but I’m also part machine. And I am all cop.”
And this is the scene in RoboCop 4.0 where I realise that I’m going to have to make these definitions understandable to both his human part AND his machine part.
But language is messy and filled with ambiguities. For example, the word “bank” can be a noun that means at least two different things. It can denote a riverbank, on the left, and a financial bank, on the right (that’s the European Central Bank in Frankfurt). To add to the confusion, “bank” can also be a verb… Actually, several verbs!
All these linguistic complexities have to be explained and disambiguated to the computer, which is going to be tricky because languages are messy. Translating that kind of messiness to a machine is going to be very difficult indeed. Fortunately for RoboCop, I have a PhD in terminology and computational linguistics. I work in natural language processing applied to legal informatics and the emerging field of regulatory technology, where the most advanced systems use artificial intelligence techniques to extract meaning from hundreds of thousands of pages of legal and other documents.
In my current research project, I am developing new software to complement existing systems. My software is called RegDef, for regulatory definitions. Its aim is to help machines read and understand the meaning of terms that are defined in legal texts. This means I can show you how to develop a system that will allow RoboCop to automatically parse definitions and understand them. For example, here is the definition of a bank: “a financial institution that accepts deposits and channels the money into lending activities” (WordNet 3.1). It’s not the only definition, but it’s the one we’re using. I need to explain to RoboCop how he is to read it.
So How Do I Do That?
First, I break the definition into its canonical parts. The first part is called the genus. It tells us the type of thing a bank is, which in this case is a financial institution. The distinguishing characteristics of the financial institution that make it a bank are called the differentiae.
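For readers who like to see it in code, here is a minimal sketch of that first cut. It is not how RegDef works. It simply assumes the definition has the shape "genus + relative pronoun + differentiae" and splits at the first "that", "which" or "who". The function name `split_definition` is made up for this illustration.

```python
import re


def split_definition(definition: str) -> tuple[str, str]:
    """Split a definition into (genus, differentiae).

    Assumption for this sketch: the genus is everything before the first
    relative pronoun, and the differentiae are everything after it.
    """
    text = definition.strip().rstrip(".")
    # Drop a leading article so the genus is just the noun phrase.
    text = re.sub(r"^(a|an|the)\s+", "", text, flags=re.IGNORECASE)
    parts = re.split(r"\s+(?:that|which|who)\s+", text, maxsplit=1)
    genus = parts[0]
    differentiae = parts[1] if len(parts) > 1 else ""
    return genus, differentiae


bank_definition = (
    "a financial institution that accepts deposits "
    "and channels the money into lending activities"
)
genus, differentiae = split_definition(bank_definition)
print(genus)         # financial institution
print(differentiae)  # accepts deposits and channels the money into lending activities
```

Real legal definitions are rarely this tidy, which is why the actual analysis needs proper NLP methods rather than a single regular expression.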
But RoboCop says “I’m still not getting it; can you break it down further?”
So, we are now moving further and further away from human-understandable, but we need to move even further to make it machine-understandable. These parts of definitions need to be analysed further into smaller meaningful parts. This is where I use NLP methods to automatically analyse the English text and identify these smaller parts. This means identifying:
- terms (in red), usually nouns composed of one or more words,
- and relations (in orange), usually verbs, with or without a preposition.
Leaving us with a series of terms and relations.
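The segmentation into terms and relations can be sketched roughly as follows. This is a toy: instead of a real part-of-speech tagger it uses tiny hand-written word lists (`RELATION_WORDS`, `PREPOSITIONS` and `SKIP_WORDS` are all assumptions made up for this one definition), and it treats a verb and a preposition separated by an object as two relations.

```python
# Hypothetical word lists standing in for a real part-of-speech tagger.
RELATION_WORDS = {"accepts", "channels"}
PREPOSITIONS = {"into", "to", "from", "of"}
SKIP_WORDS = {"a", "an", "the", "and", "that"}


def segment(text: str) -> list[tuple[str, str]]:
    """Return (kind, phrase) pairs, where kind is 'term' or 'relation'."""
    segments: list[tuple[str, str]] = []
    current_kind = None
    current_words: list[str] = []

    def flush() -> None:
        # Close the phrase being built, if any, and start afresh.
        nonlocal current_kind, current_words
        if current_words:
            segments.append((current_kind, " ".join(current_words)))
        current_kind, current_words = None, []

    for word in text.lower().split():
        if word in SKIP_WORDS:
            flush()  # articles and conjunctions act as phrase boundaries
            continue
        is_relation = word in RELATION_WORDS or word in PREPOSITIONS
        kind = "relation" if is_relation else "term"
        if kind != current_kind:
            flush()
            current_kind = kind
        current_words.append(word)
    flush()
    return segments


parts = segment(
    "a financial institution that accepts deposits "
    "and channels the money into lending activities"
)
for kind, phrase in parts:
    print(f"{kind:9} {phrase}")
```

Consecutive nouns are glued together, so "financial institution" and "lending activities" each come out as a single multi-word term.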
We now need to link these parts to more descriptive, machine-understandable representations, kind of labels, that tell RoboCop how these smaller parts relate to each other and to other labels. See the relations in lowercase and what are called “classes” in uppercase?
In this example, the genus is mapped to the label <BANK is_a FINANCIAL INSTITUTION>. The label adds semantic information that is implicit in the genus: see the is_a relation. The last part is mapped to two distinct labels, which shows how this level of analysis is more detailed than the previous one.
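One simple way to hold such labels in a program is as subject, relation, object triples. The sketch below is only an illustration: the is_a label comes from the example above, but the two labels for the differentiae (`accepts` and `channels_into`) are guesses made for this sketch, since the real ones were shown on the slide.

```python
from typing import NamedTuple


class Label(NamedTuple):
    """A descriptive label: CLASS relation CLASS."""

    subject: str   # class, written in uppercase
    relation: str  # relation, written in lowercase
    obj: str       # class, written in uppercase

    def __str__(self) -> str:
        return f"<{self.subject} {self.relation} {self.obj}>"


bank_labels = [
    Label("BANK", "is_a", "FINANCIAL INSTITUTION"),    # from the genus
    Label("BANK", "accepts", "DEPOSIT"),               # hypothetical label
    Label("BANK", "channels_into", "LENDING ACTIVITY"),  # hypothetical label
]

for label in bank_labels:
    print(label)
```

Because each label has the same three-part shape, a program can compare, count and combine them without having to read any English.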
The elements in these descriptive labels (that is, the classes and the relations) can come from an ontology. Ontologies are computer artefacts built by knowledge engineers or ontologists who compile these semantic units into a network of classes and relations. Ontologies are used for formally modelling domains of knowledge to allow computers to engage in reasoning tasks.
In our case, the labels have to be compatible with RoboCop’s own knowledge representation format or his ontology. (Let’s say the one we see here.)
Once these definitions have been cut up into pieces and labeled with an ontology, they can be represented, for example, as a mini graph. Then the mini ontology graph can be linked to the larger ontology graph that is already in RoboCop’s system.
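Here is a rough sketch of what that linking could look like, using a plain dictionary as the graph. Everything in it is an assumption for illustration: RoboCop's existing ontology is invented (FINANCIAL INSTITUTION is_a ORGANIZATION is_a AGENT), and so are the helper names `build_graph`, `link` and `ancestors`.

```python
from collections import defaultdict


def build_graph(triples):
    """Turn (subject, relation, object) triples into {node: {(relation, object)}}."""
    graph = defaultdict(set)
    for subject, relation, obj in triples:
        graph[subject].add((relation, obj))
    return graph


def link(ontology, mini_graph):
    """Merge the mini graph into the larger ontology graph, in place."""
    for node, edges in mini_graph.items():
        ontology[node] |= edges
    return ontology


def ancestors(graph, node):
    """Follow is_a edges upward and list every class the node belongs to."""
    found = []
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for relation, target in sorted(graph.get(current, ())):
            if relation == "is_a" and target not in found:
                found.append(target)
                frontier.append(target)
    return found


# Invented stand-in for the ontology already in RoboCop's system.
robocop_ontology = build_graph([
    ("FINANCIAL INSTITUTION", "is_a", "ORGANIZATION"),
    ("ORGANIZATION", "is_a", "AGENT"),
])

# Mini graph for the new definition (labels are illustrative).
bank_graph = build_graph([
    ("BANK", "is_a", "FINANCIAL INSTITUTION"),
    ("BANK", "accepts", "DEPOSIT"),
])

link(robocop_ontology, bank_graph)
print(ancestors(robocop_ontology, "BANK"))
# ['FINANCIAL INSTITUTION', 'ORGANIZATION', 'AGENT']
```

The payoff is in the last line: nobody told the system that a bank is an organization, but once the mini graph is linked in, it can work that out by following the is_a edges.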
With my program, RoboCop can learn new definitions and link them to his ontology to access a big web of networked terms and definitions. And now the definition is machine understandable. So, this is how I will help RoboCop become a lawyer, by adding legal knowledge from definitions to his knowledge of the world.
RoboCop says “So that’s what a bank is. Thank you Dr. Seppälä.”
“No problem RoboCop, any more questions?”
And of course, there are many more – RoboCop asks: “What are interest, premium, provision, property, services, security, performance, contract, loss, non-performance of contract, without prejudice, investment, sum, payment, credit union? Oh yeah, and what is money?”
You’d think I would have to be here teaching RoboCop for a while, but no! I have RegDef, allowing me to run the program on all the definitions at once and get their meaning really fast! (Note: the research project covers the NLP part of the understanding process and excludes the ontology labelling part, so this statement only applies to the fictional world of RoboCop 4.0.)
So, it’s actually a pretty short movie, which allows me and RoboCop to save the day. RoboCop is a little disappointed, however, as nothing explodes and no-one gets shot. Definitely a big departure for the series, but still better than RoboCop 3. In fact, in the final scene of RoboCop 4.0, RoboCop has graduated in computational law, changed careers and become RoboLawyer. The job involves a whole lot less shooting but a whole lot more money!
The movie’s disclaimer:
- No lawyers or their jobs were harmed in the making of this presentation.
- These AI improvements are meant to unburden them from tedious automatable tasks.
- Lawyers will get to focus on the interesting legal issues instead.
And notes of thanks:
- Professor Tom Butler and Richard Pike, CEO, Governor Software, my mentors
- Leona O’Brien, my advisor in financial law
- Cathal O’Donovan, who helped with this storyline
And, of course, none of this would be possible without the generous support of the European Marie Curie Actions, Enterprise Ireland, and Governor Software, the RegTech company that backs my research project.
About: Selja Seppälä, PhD; Marie Skłodowska-Curie Career-FIT Research Fellow; University College Cork
Selja is a Marie Skłodowska-Curie Career-FIT Fellow at University College Cork (UCC), Ireland, investigating natural language processing (NLP) methods to develop a computer-assisted definition authoring and formalisation system for legal experts (RegDef). She holds a PhD in Multilingual Information Processing from the University of Geneva, Switzerland. She has previously conducted postdoctoral research on NLP for regulatory technologies in the financial industry at the Governance, Risk and Compliance Technology Centre (GRCTC) at UCC (2017-2019), and on applied ontology and biomedical informatics in the Department of Health Outcomes and Policy at the University of Florida (2016-2017) and the Departments of Philosophy and Biomedical Informatics at the State University of New York at Buffalo (UB) (2012-2016), where her research was funded from 2012 to 2015 by the Swiss National Science Foundation (SNSF).
Selja’s interdisciplinary research aims at laying the groundwork for developing computer-assisted natural language definition writing tools leveraging ontological data. Such tools are intended to be used by terminologists and ontologists, as well as domain experts such as scientists and lawyers. Her research focuses on the automation of definition production, editing, and checking in dictionaries, legal documents, technical manuals, and ontologies. This work includes research on ontology mapping and versioning, mapping of lexico-semantic resources to ontologies, as well as the use of corpus linguistics and NLP techniques and methods to process data. Her primary domains of expertise are NLP, terminology, and applied ontology.
Pint of Science is a non-profit organisation that brings some of the most brilliant scientists to your local pubs and cafes to discuss their latest research and findings with you. You don't need any prior knowledge, and this is your chance to meet the people responsible for the future of science (and have a drink with them). Our festival runs over a few days in May every year, but we occasionally run events during other months.
In 2012 Dr Michael Motskin and Dr Praveen Paul were two research scientists at Imperial College London in the UK. They started and organised an event called ‘Meet the Researchers’. It brought people affected by Parkinson’s, Alzheimer’s, motor neurone disease and multiple sclerosis into their labs to show them the kind of research they do. It was inspirational for both visitors and researchers. They thought if people want to come into labs to meet scientists, why not bring the scientists out to the people? And so Pint of Science was born. In May 2013 they held the first Pint of Science festival in just three UK cities. It quickly took off around the world and is now in nearly 300 cities.