Coreference resolution is the task of identifying all the expressions in a text that refer to the same entity, such as pronouns, nouns, or noun phrases, and linking them to the entity they refer to. This post, inspired by a real-world problem, describes a few challenges and explores a few approaches, along with code snippets.
First, a simple example. Given the text “John Smith lives in Singapore. He has a 2-year-old golden retriever”, the coreference resolution would identify that “He” refers to “John Smith”.
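To see what that linking enables downstream, here is a toy helper (my own illustration, not part of any coreference library) that rewrites a text by substituting each pronoun with the representative mention of its cluster:

```python
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}

def resolve_pronouns(text, clusters):
    """Naively replace each pronoun mention with the first (representative)
    mention of its cluster. Toy code: real libraries work on token spans,
    not raw string replacement."""
    for cluster in clusters:
        representative = cluster[0]
        for mention in cluster[1:]:
            if mention.lower() in PRONOUNS:
                text = text.replace(mention, representative, 1)
    return text

text = "John Smith lives in Singapore. He has a 2-year-old golden retriever."
clusters = [["John Smith", "He"]]
print(resolve_pronouns(text, clusters))
# → John Smith lives in Singapore. John Smith has a 2-year-old golden retriever.
```

The string-based replacement would misfire on substring matches; it is only meant to show the shape of a resolver's output (clusters of mentions) and one common use of it.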
The Challenges
Real World Challenges
Below are a few challenges observed in real-world publications.
Real World Challenge #1
(RWC1 was extracted from [#6])
On November 3, 1992, Clinton was elected the 42nd president of the United States, and the following year Hillary Clinton became the first lady. In 2013, he won the Presidential Medal of Freedom.
Challenge (for the model) : “he” refers to “Clinton” or “Hillary Clinton”?
Real World Challenge #2
(RWC2 was extracted from this news article published by The Straits Times)
While it is normal for defendants charged with felonies to be handcuffed – as former Trump Organisation chief financial officer Allen Weisselberg was in 2021 – one of Trump’s lawyers, Mr Joseph Tacopina, has said he does not expect that to occur.
Challenge (for the model) : “he” refers to “Joseph Tacopina”, “Trump” or “Allen Weisselberg”?
Real World Challenge #3
(RWC3 was extracted from this news article published by The Straits Times)
Mr Li, who was then the Shanghai chief, said PM Lee had gone to great lengths to discuss “people’s well-being and how to deliver tangible benefits to the people”. “I was impressed by our conversation. You also talked about cultural diversity and inclusiveness,” he told PM Lee.
Challenge (for the model) : “he” refers to “PM Lee” or “Mr Li”? “I” refers to “Mr Li” or “PM Lee”?
Winograd Schema Challenge
The Winograd Schema Challenge (WSC) is a test that assesses a system’s capability to perform common-sense reasoning, and it also serves as an alternative to the Turing Test. A Winograd schema is a pair of sentences that differ in only one or two words but contain a highly ambiguous pronoun; the pronoun resolves differently in the two sentences, and resolving it correctly requires commonsense knowledge. The examples were designed to be easy for humans but challenging for machines, which need a deeper understanding of the text’s context and the situation it describes.
A few Winograd schemas (from [#8]) :
WSC02 : “The trophy does not fit into the brown suitcase because it is too large.”
Question asked : What is too large? the trophy or the suitcase?
WSC06 : “The delivery truck zoomed by the school bus because it was going so slow.”
Question asked : What was going so slow? the delivery truck or the school bus?
WSC10 : “John couldn’t see the stage with Billy in front of him because he is so tall.”
Question asked : Who is so tall? John or Billy?
A Few Approaches
Over the years, various approaches have been proposed to improve the performance of coreference resolution. These include:
- Rule-based approaches:
- Use a set of hand-crafted rules to identify and cluster mentions in a text.
- Simple and interpretable, but may not generalize well to new datasets.
- Mention-ranking approaches:
- Use machine learning techniques to rank candidate antecedents for each mention based on their features. Features may include syntactic, semantic, and discourse-level information. The highest-ranking antecedent is then selected as the final antecedent.
- Effective, but can be computationally expensive.
- Entity-based approaches:
- Build up representations of entities incrementally and assign each new mention to an existing entity cluster (or start a new one), so that decisions can use entity-level information rather than only pairwise scores.
- Hybrid approaches:
- Combine multiple techniques, such as mention-ranking scores with entity-level features, to improve the performance of coreference resolution.
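To make the rule-based idea (and its brittleness) concrete, here is a deliberately naive sketch of my own, not any published system: it resolves each pronoun to the nearest preceding capitalized token.

```python
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}

def naive_rule_based_antecedent(tokens):
    """Map each pronoun's token index to the index of the nearest
    preceding capitalized token (a crude stand-in for real rules)."""
    links = {}
    for i, tok in enumerate(tokens):
        if tok.lower() in PRONOUNS:
            for j in range(i - 1, -1, -1):
                if tokens[j][0].isupper():
                    links[i] = j
                    break
    return links

tokens = "John Smith lives in Singapore . He has a golden retriever".split()
print(naive_rule_based_antecedent(tokens))
# → {6: 4}: "He" gets linked to "Singapore", not "John" — exactly the
#   kind of failure that hand-crafted rules must patch case by case.
```

Real rule-based systems (e.g. the deterministic sieves in [#3]) layer many such rules with agreement checks, but the generalization problem remains.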
Mention-ranking approach
Mention-ranking approaches to coreference resolution work by assigning scores to pairs of mentions based on their likelihood of coreference. Consider the following text as an example: “Nirmalya invited his old friends Suraj and Sandeep for dinner. They had not seen each other in years, so the catch-up was long overdue.” There are several mentions: “Nirmalya”, “his”, “Suraj”, “Sandeep”, and “they”. A model based on the mention-ranking approach will first compute pairwise probabilities such as
P("Nirmalya", "his") = 0.5 # OK
P("Suraj", "they") = 0.05
P("Sandeep", "they") = 0.03
P("Nirmalya", "they") = 0.02
P("Suraj and Sandeep", "they") = 0.4 # OK
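Given such pairwise probabilities, the model links each anaphor to its highest-scoring candidate antecedent. A minimal sketch of that selection step, with the scores above hard-coded:

```python
def best_antecedent(anaphor, scores):
    """Among all (antecedent, anaphor) pairs, return the candidate
    antecedent with the highest score for the given anaphor."""
    candidates = {ante: p for (ante, ana), p in scores.items() if ana == anaphor}
    return max(candidates, key=candidates.get)

# Pairwise probabilities from the example above, hard-coded.
scores = {
    ("Nirmalya", "his"): 0.5,
    ("Suraj", "they"): 0.05,
    ("Sandeep", "they"): 0.03,
    ("Nirmalya", "they"): 0.02,
    ("Suraj and Sandeep", "they"): 0.4,
}
print(best_antecedent("they", scores))  # → Suraj and Sandeep
print(best_antecedent("his", scores))   # → Nirmalya
```

Real systems score spans with learned features (or, more recently, span embeddings as in SpanBERT [#5]) rather than looking up a table, but the selection logic is the same.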
Evaluation
At the time of writing (March 2023), I am aware of 3 main Python libraries used for coreference resolution: allennlp, fastcoref and neuralcoref. These 3 libraries have a few conflicting dependencies.
Library | Version | Dependencies | License |
---|---|---|---|
allennlp | 2.10.1 | spacy >= 2.1.0 | Apache 2.0 |
fastcoref | 2.1.1 | spacy == 3.0.6 | MIT |
neuralcoref | 4.0.0 | spacy >= 2.1.0, < 3.0.0 | MIT |
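Because of these conflicts, it helps to check which versions an environment actually resolved to before importing anything. A small helper using only the standard library:

```python
from importlib import metadata  # standard library, Python 3.8+

def installed_version(pkg):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

for pkg in ("spacy", "allennlp", "fastcoref", "neuralcoref"):
    print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
```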
Ease Of Installation (based on my experience)
- allennlp, ⭐⭐⭐
- fastcoref, ⭐⭐⭐⭐⭐
- neuralcoref, ⭐⭐
Evaluation Methodology
For each library evaluated, I used the texts RWC1, RWC2, RWC3, WSC02, WSC06, and WSC10. I then calculated a score (maximum 6.0) for each, based on correct (1 point), partial (0.5 points) and incorrect (0 points) identification of the expected clusters.
Please note:
- This is not intended to be an academic / scientific / research-grade evaluation.
- The intent is to get a quick idea of the performance of the libraries against a few real world examples, challenges encountered (if any) and licenses.
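For transparency, the scoring scheme boils down to a few lines (my own convenience code), where each example's outcome is recorded as "correct", "partial" or "incorrect":

```python
POINTS = {"correct": 1.0, "partial": 0.5, "incorrect": 0.0}

def evaluation_score(outcomes):
    """Sum points over the six test texts; the maximum score is 6.0."""
    return sum(POINTS[o] for o in outcomes)

# e.g. four correct results, one partial and one incorrect:
print(evaluation_score(
    ["correct", "correct", "correct", "partial", "correct", "incorrect"]
))  # → 4.5
```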
Texts Used For Evaluation
```python
texts_for_testing = [
    "On November 3, 1992, Clinton was elected the 42nd president of the United States, and the following year Hillary Clinton became the first lady. In 2013, he won the Presidential Medal of Freedom.",
    "While it is normal for defendants charged with felonies to be handcuffed – as former Trump Organisation chief financial officer Allen Weisselberg was in 2021 – one of Trump’s lawyers, Mr Joseph Tacopina, has said he does not expect that to occur.",
    """Mr Li, who was then the Shanghai chief, said PM Lee had gone to great lengths to discuss "people’s well-being and how to deliver tangible benefits to the people". "I was impressed by our conversation. You also talked about cultural diversity and inclusiveness," he told PM Lee.""",
    "The trophy does not fit into the brown suitcase because it is too large.",
    "The delivery truck zoomed by the school bus because it was going so slow.",
    "John couldn’t see the stage with Billy in front of him because he is so tall.",
]
```
allennlp
Installing allennlp was not smooth for me. I have created a Google Colab notebook to make it easier for you to try it out.
```python
from allennlp.predictors.predictor import Predictor

model_url = "https://storage.googleapis.com/allennlp-public-models/coref-spanbert-large-2020.02.27.tar.gz"
predictor = Predictor.from_path(model_url)

for text in texts_for_testing:
    print("    TEXT :", text)
    prediction = predictor.predict(document=text)
    tokens = prediction["document"]
    l_clusters = []
    for cluster in prediction["clusters"]:
        a_cluster = []
        for c in cluster:
            # each c is a [start, end] token span, end index inclusive
            s = " ".join(tokens[c[0]:c[1] + 1])
            a_cluster.append(s)
        l_clusters.append(a_cluster)
    print("CLUSTERS :", l_clusters)
```
Results For allennlp
Example | Clusters | Correct? |
---|---|---|
RWC1 | [['Clinton', 'he']] | Correct |
RWC2 | [['Trump Organisation', 'Trump ’s'], ['one of Trump ’s lawyers , Mr Joseph Tacopina ,', 'he'], ['be', 'that']] | Correct |
RWC3 | [['Mr Li , who was then the Shanghai chief', 'I', 'he'], ['PM Lee', 'You', 'PM Lee']] | Correct |
WSC02 | [['The trophy', 'it']] | Correct |
WSC06 | [['The delivery truck', 'it']] | Correct |
WSC10 | [['John', 'him'], ['Billy', 'he']] | Correct |
Evaluation Score For allennlp : 6.0 / 6.0
fastcoref
```python
from fastcoref import FCoref

model = FCoref(device='cuda:0')
preds = model.predict(texts=texts_for_testing)

for i, text in enumerate(texts_for_testing):
    print("    TEXT :", text)
    print("CLUSTERS :", preds[i].get_clusters())
```
Results For fastcoref
Example | Clusters | Correct? |
---|---|---|
RWC1 | [['Clinton', 'he']] | Correct |
RWC2 | [['one of Trump’s lawyers, Mr Joseph Tacopina', 'he'], ['handcuffed', 'that']] | Correct |
RWC3 | [['Mr Li, who was then the Shanghai chief,', 'I', 'he'], ['PM Lee', 'You', 'PM Lee']] | Correct |
WSC02 | [['the brown suitcase', 'it']] | Incorrect |
WSC06 | [['The delivery truck', 'it']] | Correct |
WSC10 | [['John', 'him', 'he']] | Incorrect |
Evaluation Score For fastcoref : 4.0 / 6.0
neuralcoref
Installing neuralcoref was not smooth for me, so I have added a separate subsection for help with troubleshooting.
```python
import spacy
import neuralcoref

nlp = spacy.load("en_core_web_sm")
neuralcoref.add_to_pipe(nlp)  # registers the coref component on the spacy pipeline

for i, text in enumerate(texts_for_testing):
    doc_x = nlp(text)
    print("    TEXT :", text)
    print("CLUSTERS :", doc_x._.coref_clusters)
```
Results For neuralcoref
Example | Clusters | Correct? |
---|---|---|
RWC1 | [Clinton: [Clinton, Hillary Clinton, he]] | Incorrect |
RWC2 | [Trump: [Trump, Trump], Mr Joseph Tacopina: [Mr Joseph Tacopina, he]] | Correct |
RWC3 | [PM Lee: [PM Lee, he, PM Lee]] | Partial |
WSC02 | [The trophy: [The trophy, it]] | Correct |
WSC06 | [] | Incorrect |
WSC10 | [John: [John, him, he]] | Incorrect |
Evaluation Score For neuralcoref : 2.5 / 6.0
Installation Troubleshooting For neuralcoref
- If you encounter an error similar to the one below, follow the instructions in this answer on StackOverflow.
Installing collected packages: neuralcoref
error: subprocess-exited-with-error
× Running setup.py install for neuralcoref did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
Running setup.py install for neuralcoref ... error: legacy-install-failure
× Encountered error while trying to install package.
╰─> neuralcoref
- If you encounter a warning similar to the one below, you could try the suggestion articulated in this Medium post.
RuntimeWarning: spacy.tokens.span.Span size changed, may indicate binary incompatibility. Expected X from C header, got Y from PyObject
- I have created a Google Colab notebook. I adapted it from this answer on StackOverflow.
TL;DR
- Coreference resolution is hard.
- Python libraries such as allennlp, fastcoref and neuralcoref make the task simpler.
- Balancing speed, ease of installation and performance on real world examples, I suggest using fastcoref. Moreover, it comes with an MIT license, thus making it very suitable for use in commercial applications.
References
- Modeling Local Coherence: An Entity-Based Approach (2008)
- Entity-Centric Coreference Resolution with Model Stacking (2015)
- Deterministic Coreference Resolution Based on Entity-Centric, Precision-Ranked Rules (2013)
- Deep Reinforcement Learning for Mention-Ranking Coreference Models (2016)
- SpanBERT: Improving Pre-training by Representing and Predicting Spans (2019)
- Improving Coreference Resolution by Leveraging Entity-Centric Features with Graph Neural Networks and Second-order Inference (2020)
- A Brief Survey on Recent Advances in Coreference Resolution (2021)
- NYU’s Collection of Winograd Schemas
- Here are two great sites for understanding open source software licenses : FOSSA, and choosealicense.com