The Story That Explains NER
A human expert could do it — slowly, expensively, and with occasional mistakes. Now imagine doing it for 10 million articles, in 50 languages, in real time.
That is exactly what Named Entity Recognition (NER) does — automatically, at superhuman speed. It reads raw text and labels every important "thing" with a category. It is the highlighter that never tires.
Named Entity Recognition is an NLP (Natural Language Processing) task that identifies and classifies named entities (mentions of real-world objects) within unstructured text. A "named entity" is a word or phrase that refers to a specific, identifiable thing: a person, an organisation, a location, a product, and so on. Most label sets also cover numeric and temporal expressions such as dates and monetary amounts.
Many familiar NLP tasks classify an entire sentence (sentiment analysis) or generate new text (summarisation). NER is a token-level classification task: it labels every individual word (or sub-word) in a sentence. This adds a difficulty that sentence-level tasks do not have: the model must understand not just what a word means, but where an entity starts, where it ends, and what type of thing it refers to.
Why NER Matters — Real Applications
NER is not an academic exercise. It is a core component of some of the most important information systems in the world — powering everything from search engines to financial compliance systems.
The Standard Entity Types
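Different NER systems use different label sets. Two widely used ones are CoNLL-2003, which has four types (PER, ORG, LOC, MISC), and the larger OntoNotes 5 scheme used by spaCy's English models. Here are the most common categories from the OntoNotes-style scheme: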
| Label | Entity Type | Example | Domain |
|---|---|---|---|
| PERSON | People, real or fictional | Elon Musk, Sherlock Holmes | Universal |
| ORG | Companies, agencies, institutions | OpenAI, NASA, WHO | Universal |
| GPE | Geo-political entities (countries, cities) | India, New York, the EU | Universal |
| LOC | Non-GPE locations (mountains, rivers) | the Amazon, Mount Everest | Universal |
| DATE | Absolute or relative dates and periods | January 2024, last Tuesday, Q3 | Universal |
| MONEY | Monetary values including currency | $4.2 billion, £50,000 | Finance |
| PRODUCT | Objects, vehicles, foods | iPhone 15, Tesla Model S | E-commerce |
| EVENT | Named hurricanes, battles, elections | World War II, the Olympics | News |
| LAW | Named documents made into laws | GDPR, the US Constitution | Legal |
| PERCENT | Percentage including "%" | 85%, twenty percent | Finance |
"Apple announced record profits" — is Apple a fruit or a company? "Jordan signed the agreement" — is Jordan a person or a country? "May confirmed the decision" — is May a month, a name, or a modal verb? This context-dependency is why simple rule-based approaches fail and why modern NER uses deep contextual models like Transformers.
How NER Works — The BIO Tagging Scheme
How do we tell a model where an entity starts versus where it continues? The answer is a simple but powerful labelling scheme called BIO tagging: Beginning, Inside, Outside.
Some systems use BIOES: Beginning, Inside, Outside, End, Single. The E tag marks the last token of a multi-word entity, and S marks a single-token entity. This gives the model an explicit signal for where a span ends, which can improve boundary detection on longer entity spans.
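To make the two schemes concrete, here is a minimal sketch that turns token-level entity spans into BIO tags and then converts those into BIOES. The helper names (spans_to_bio, bio_to_bioes) and the example sentence are illustrative assumptions, not part of any library.

```python
from typing import List, Tuple

# (start_token, end_token_exclusive, label); hypothetical format for this sketch
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Tag the first token of each span B-, the rest I-, everything else O."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_bioes(tags: List[str]) -> List[str]:
    """Rewrite BIO tags so span ends (E-) and single-token spans (S-) are explicit."""
    out = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{label}"
        if prefix == "B":
            out.append(f"B-{label}" if continues else f"S-{label}")
        else:  # prefix == "I"
            out.append(f"I-{label}" if continues else f"E-{label}")
    return out


tokens = ["Tim", "Cook", "visited", "Paris", "on", "Tuesday"]
spans = [(0, 2, "PERSON"), (3, 4, "GPE"), (5, 6, "DATE")]

bio = spans_to_bio(len(tokens), spans)
bioes = bio_to_bioes(bio)

for token, b, e in zip(tokens, bio, bioes):
    print(f"{token:8} {b:10} {e}")
```

Note how "Tim Cook" keeps its B- tag and gains an E- tag on the last token, while the one-token entities "Paris" and "Tuesday" become S- tags.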
Three Generations of NER Approaches
NER has evolved dramatically over the past 30 years. Understanding all three generations gives you the context to choose the right tool for any situation.
Quick Start — NER with spaCy
spaCy is a widely used, production-oriented NLP library for Python. Its NER pipeline is pre-trained and takes only a few lines of code to run.
# Step 1: Install spaCy and download a model
# Run in terminal: pip install spacy
# Run in terminal: python -m spacy download en_core_web_sm
import spacy
# Load a pre-trained English model
nlp = spacy.load("en_core_web_sm")
# Process a sentence
text = "Apple CEO Tim Cook announced a $110 billion share buyback in Cupertino on Tuesday."
doc = nlp(text)
# Print all detected entities
for ent in doc.ents:
    print(f"{ent.text:25} | {ent.label_:10} | {spacy.explain(ent.label_)}")
spacy.explain("GPE") returns a human-readable description of any label.
Never guess what a cryptic tag means — always explain it. Use this liberally
when exploring a new model's output or debugging unexpected labels.
Visualising NER Output — displaCy
spaCy ships with a built-in visualiser called displaCy that renders NER annotations beautifully inline in Jupyter Notebooks or as standalone HTML.
import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_sm")
text = """Elon Musk founded SpaceX in 2002 in Hawthorne, California.
The company secured a $1.6 billion NASA contract in 2008."""
doc = nlp(text)
# Render inline in a Jupyter Notebook
displacy.render(doc, style="ent", jupyter=True)
# Or save as a standalone HTML file
# (jupyter=False makes render() return the markup as a string, even inside a notebook)
html = displacy.render(doc, style="ent", page=True, jupyter=False)
with open("ner_output.html", "w", encoding="utf-8") as f:
    f.write(html)
Elon Musk PERSON founded SpaceX ORG in 2002 DATE in Hawthorne, California GPE. The company secured a $1.6 billion MONEY NASA ORG contract in 2008 DATE.
This is how displaCy renders entity spans directly inside your browser or notebook.
Choosing Your Model — sm vs md vs lg vs trf
spaCy ships four English model tiers. Choosing between them is a trade-off between accuracy, speed, and size. The figures below are approximate and vary between spaCy releases, so check the model card for the version you install:
| Model | Architecture | NER F1 | Speed | Package size | Best For |
|---|---|---|---|---|---|
| en_core_web_sm | CNN + HashEmbed | ~0.85 | Very fast | ~12 MB | Prototyping, edge devices |
| en_core_web_md | CNN + GloVe vectors | ~0.85 | Fast | ~43 MB | General production, better OOV |
| en_core_web_lg | CNN + large GloVe | ~0.86 | Moderate | ~741 MB | When embedding coverage matters |
| en_core_web_trf | RoBERTa Transformer | ~0.90 | Slow on CPU (GPU recommended) | ~435 MB | Maximum accuracy in production |
Start with en_core_web_sm for prototyping. Switch to en_core_web_trf only when accuracy on ambiguous or domain-specific text is critical, and only if you have a GPU. For most batch-processing pipelines, en_core_web_md hits the ideal accuracy/speed sweet spot.
NER with HuggingFace Transformers
For the highest accuracy — particularly on domain-specific text — the HuggingFace
transformers library gives you access to hundreds of fine-tuned NER models
from the Model Hub. Here is how to use it with a single pipeline call:
from transformers import pipeline
# Load a BERT-based NER model from HuggingFace Hub
ner = pipeline(
    "ner",
    model="dslim/bert-base-NER",
    aggregation_strategy="simple",  # merges B/I tokens into spans
)
text = "Sundar Pichai, CEO of Google, visited London last week for a summit on AI safety."
results = ner(text)
for entity in results:
    print(f"Entity : {entity['word']}")
    print(f"Type   : {entity['entity_group']}")
    print(f"Score  : {entity['score']:.4f}")
    print(f"Span   : chars {entity['start']}–{entity['end']}")
    print("---")
Without aggregation_strategy="simple", HuggingFace returns one row
per sub-word token, so "New York" comes back as separate rows tagged
B-LOC and I-LOC (this model uses the CoNLL-2003 labels PER, ORG, LOC and MISC),
and a rare name can be split into several word pieces. Setting it to "simple" or "first"
collapses these into a single entity span automatically. Always set this.
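To show roughly what span aggregation does, here is a minimal pure-Python sketch that merges token-level rows into entity spans. It is not the library's implementation: the function name merge_token_rows, the sub-word splits, and the scores are all made up for illustration, and the rows only mimic the shape of un-aggregated pipeline output.

```python
from typing import Dict, List


def merge_token_rows(text: str, rows: List[Dict]) -> List[Dict]:
    """Group adjacent B-/I- rows of the same type and average their scores."""
    groups: List[Dict] = []
    for row in rows:
        prefix, _, label = row["entity"].partition("-")
        last = groups[-1] if groups else None
        extends_last = (
            prefix == "I"
            and last is not None
            and last["entity_group"] == label
            and row["start"] - last["end"] <= 1  # same word or next word
        )
        if extends_last:
            last["end"] = row["end"]
            last["scores"].append(row["score"])
        else:
            groups.append({
                "entity_group": label,
                "start": row["start"],
                "end": row["end"],
                "scores": [row["score"]],
            })
    return [
        {
            "entity_group": g["entity_group"],
            "word": text[g["start"]:g["end"]],
            "score": sum(g["scores"]) / len(g["scores"]),
            "start": g["start"],
            "end": g["end"],
        }
        for g in groups
    ]


text = "Sundar Pichai visited New York."
# Hypothetical token-level rows (sub-word pieces and scores are invented)
rows = [
    {"entity": "B-PER", "word": "Sun", "score": 0.99, "start": 0, "end": 3},
    {"entity": "I-PER", "word": "##dar", "score": 0.97, "start": 3, "end": 6},
    {"entity": "I-PER", "word": "Pi", "score": 0.98, "start": 7, "end": 9},
    {"entity": "I-PER", "word": "##chai", "score": 0.98, "start": 9, "end": 13},
    {"entity": "B-LOC", "word": "New", "score": 0.90, "start": 22, "end": 25},
    {"entity": "I-LOC", "word": "York", "score": 0.80, "start": 26, "end": 30},
]

merged = merge_token_rows(text, rows)
for m in merged:
    print(f"{m['word']:15} {m['entity_group']:5} {m['score']:.2f}")
```

Six token rows collapse into two spans, which is the shape you want to hand to downstream code.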
Training a Custom NER Model — The Full Pipeline
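Pre-trained models are mostly trained on general-domain text such as news and web content. In medical, legal, scientific, or niche domains, you usually need to train or fine-tune a custom NER model on domain-specific labelled data. Here is a minimal end-to-end example; two sentences are far too few to learn from, so treat the data as a placeholder for a real annotated corpus.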
import spacy
from spacy.tokens import DocBin
from spacy.training import Example
# ── Step 1: Define training data ───────────────────────────
# Format: (text, {"entities": [(start, end, label), ...]})
# Offsets are character positions; end is exclusive, so text[start:end] is the entity.
TRAIN_DATA = [
    ("Give Bella 0.5ml of Metacam twice daily.",
     {"entities": [(5, 10, "ANIMAL_NAME"), (11, 16, "DOSAGE"), (20, 27, "DRUG")]}),
    ("Max the Labrador received 10mg of Carprofen.",
     {"entities": [(0, 3, "ANIMAL_NAME"), (8, 16, "SPECIES"), (26, 30, "DOSAGE"), (34, 43, "DRUG")]}),
]
# ── Step 2: Create a blank English model ───────────────────
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
# ── Step 3: Add custom labels ──────────────────────────────
for _, annotations in TRAIN_DATA:
    for _, _, label in annotations["entities"]:
        ner.add_label(label)
# ── Step 4: Build DocBin for efficient training ────────────
db = DocBin()
for text, annotations in TRAIN_DATA:
    doc = nlp.make_doc(text)
    example = Example.from_dict(doc, annotations)
    db.add(example.reference)
db.to_disk("./train.spacy")
# ── Step 5: Train (basic loop; use the spacy train CLI for real projects)
# initialize() sets up the weights and returns the optimizer
optimizer = nlp.initialize()
for itn in range(30):
    losses = {}
    examples = []
    for text, annotations in TRAIN_DATA:
        doc = nlp.make_doc(text)
        example = Example.from_dict(doc, annotations)
        examples.append(example)
    nlp.update(examples, drop=0.3, sgd=optimizer, losses=losses)
    if itn % 10 == 0:
        print(f"Iteration {itn:3d} | Loss: {losses['ner']:.4f}")
# ── Step 6: Save and test the model ───────────────────────
nlp.to_disk("./vet_ner_model")
print("Model saved.")
# Test on a new sentence
nlp2 = spacy.load("./vet_ner_model")
doc = nlp2("Administer 0.2ml of Metacam to Rocky.")
for ent in doc.ents:
    print(f"{ent.text} → {ent.label_}")
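Hand-written character offsets are easy to get wrong by one, and a misaligned span can be silently dropped or mislearned during training. The sketch below is a small sanity check you could run before training. The function check_offsets and its heuristics (no surrounding whitespace, no cutting through a word) are assumptions for illustration; it does not replicate spaCy's own token alignment.

```python
from typing import List, Tuple


def check_offsets(text: str, entities: List[Tuple[int, int, str]]) -> List[str]:
    """Return a human-readable problem for each suspicious (start, end, label)."""
    problems = []
    for start, end, label in entities:
        if not (0 <= start < end <= len(text)):
            problems.append(f"{label}: ({start}, {end}) is outside the text")
            continue
        span = text[start:end]
        if span != span.strip():
            problems.append(f"{label}: {span!r} has leading/trailing whitespace")
            continue
        # Heuristic: a letter or digit directly next to the span suggests
        # the offsets cut through the middle of a word.
        before_ok = start == 0 or not text[start - 1].isalnum()
        after_ok = end == len(text) or not text[end].isalnum()
        if not (before_ok and after_ok):
            problems.append(f"{label}: {span!r} cuts through a word")
    return problems


text = "Max the Labrador received 10mg of Carprofen."

good = [(0, 3, "ANIMAL_NAME"), (26, 30, "DOSAGE"), (34, 43, "DRUG")]
bad = [(26, 31, "DOSAGE"), (35, 44, "DRUG")]  # deliberately off by one

print(check_offsets(text, good))
for problem in check_offsets(text, bad):
    print(problem)
```

Printing text[start:end] for every annotation is the quickest manual version of the same check.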
Evaluating NER — Metrics That Matter
NER is evaluated at the entity span level, not the word level. An entity is only counted as correct if both its boundary and its type are predicted correctly. A partial match counts as a full miss.
from seqeval.metrics import classification_report, f1_score
# True labels (BIO format, one list per sentence)
y_true = [["B-PERSON", "I-PERSON", "O", "B-ORG", "O"]]
y_pred = [["B-PERSON", "I-PERSON", "O", "B-GPE", "O"]]
# Note: model confused ORG → GPE for the 4th token
print(classification_report(y_true, y_pred))
print(f"Overall F1: {f1_score(y_true, y_pred):.4f}")
Never use scikit-learn's classification_report directly on BIO labels — it
evaluates at the token level and gives misleadingly high scores.
seqeval correctly handles span-level evaluation. Install with
pip install seqeval. Always use it.
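To see why token-level and span-level scores diverge, here is a minimal hand-rolled version of strict span matching for a single sentence. It is a sketch of the idea only, not seqeval's implementation; extract_spans and span_prf are hypothetical helper names.

```python
from typing import List, Set, Tuple


def extract_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end_exclusive, label) spans from one BIO-tagged sentence."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and tag[2:] == label
        if inside:
            continue
        if label is not None:
            spans.add((start, i, label))
        start, label = None, None
        if tag.startswith(("B-", "I-")):  # a stray I- tag also opens a span
            start, label = i, tag[2:]
    return spans


def span_prf(y_true: List[str], y_pred: List[str]) -> Tuple[float, float, float]:
    """Strict precision, recall, F1: boundary and type must both match."""
    true_spans, pred_spans = extract_spans(y_true), extract_spans(y_pred)
    tp = len(true_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(true_spans) if true_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = ["B-PERSON", "I-PERSON", "O", "B-ORG", "I-ORG", "O"]
y_pred = ["B-PERSON", "I-PERSON", "O", "B-ORG", "O", "O"]  # ORG span cut short

token_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = span_prf(y_true, y_pred)

print(f"Token accuracy: {token_accuracy:.3f}")
print(f"Span precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Five of six tokens are right, yet only one of two entities is: the truncated ORG span counts as both a false positive and a false negative.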
NER in a Real Pipeline — Building an Entity Extractor
In production, NER is rarely used alone. It feeds downstream tasks like entity linking, relation extraction, or knowledge graph population. Here is a compact entity extraction pipeline that you can adapt for production use:
import spacy
import pandas as pd
from collections import Counter
from typing import List, Dict
# ── Load model ─────────────────────────────────────────────
nlp = spacy.load("en_core_web_sm")
# ── Sample news corpus ─────────────────────────────────────
corpus = [
    "Amazon acquired MGM for $8.45 billion in 2021, strengthening its Prime Video library.",
    "Satya Nadella, CEO of Microsoft, met with EU regulators in Brussels on Monday.",
    "The Federal Reserve raised interest rates by 0.25% to combat inflation in the United States.",
    "Tesla reported $25.1 billion in revenue for Q4 2023, beating Wall Street estimates.",
]
# ── Extract entities from all documents ────────────────────
def extract_entities(texts: List[str]) -> List[Dict]:
    records = []
    for doc in nlp.pipe(texts):  # nlp.pipe() is much faster than looping
        for ent in doc.ents:
            records.append({
                "text": doc.text[:60] + "...",
                "entity": ent.text,
                "label": ent.label_,
                "start": ent.start_char,
                "end": ent.end_char,
                "description": spacy.explain(ent.label_)
            })
    return records
records = extract_entities(corpus)
df = pd.DataFrame(records)
# ── Summarise results ──────────────────────────────────────
print("=== Entity Type Distribution ===")
print(df["label"].value_counts().to_string())
print("\n=== Top Mentioned Organisations ===")
orgs = df[df["label"] == "ORG"]["entity"]
print(Counter(orgs).most_common(5))
nlp.pipe(texts) processes documents in batches instead of one at a time,
which lets spaCy run the model on several texts at once. On large corpora
this is usually much faster than calling nlp(text) in a loop; the exact gain
depends on the model, document length, and hardware. If you only need entities,
also skip the components you don't use, for example with
disable=["tagger", "parser", "attribute_ruler", "lemmatizer"].
Common NER Pitfalls and How to Fix Them
| Pitfall | Symptom | Fix |
|---|---|---|
| Wrong entity type | "Apple" → PERSON instead of ORG | Use a larger model (trf); add context in training data |
| Span boundary errors | "New York" found but "New York City" missed | Fine-tune on domain text; add more multi-word examples |
| Missed rare entities | New company names not detected | Augment with rule-based patterns using EntityRuler |
| Domain mismatch | Medical/legal terms unrecognised | Fine-tune on domain-specific labelled data |
| Overlapping entities | "New York Times" tagged as both ORG and LOC | Prioritise by label; add entity to training to assert type |
| Slow batch processing | Processing 100k docs takes hours | Use nlp.pipe(); disable unused pipeline components |
import spacy
from spacy.pipeline import EntityRuler
# Fix: Add rule-based patterns for known entities the model misses
nlp = spacy.load("en_core_web_sm")
# EntityRuler runs BEFORE the statistical NER model
ruler = nlp.add_pipe("entity_ruler", before="ner")
patterns = [
    {"label": "ORG", "pattern": "DeepMind"},
    {"label": "ORG", "pattern": "Anthropic"},
    {"label": "PRODUCT", "pattern": [{"LOWER": "gpt"}, {"IS_DIGIT": True}]},
    {"label": "PRODUCT", "pattern": "Claude"},
]
ruler.add_patterns(patterns)
doc = nlp("Anthropic released Claude 3, while DeepMind launched Gemini Ultra.")
for ent in doc.ents:
    print(f"{ent.text} → {ent.label_}")
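For the overlapping-entities pitfall in the table above, one common policy is to keep the longest candidate span and drop anything that overlaps it. The sketch below shows that policy in plain Python on character offsets. The function resolve_overlaps and the candidate spans are illustrative assumptions; spaCy provides a similar longest-span-first helper in spacy.util.filter_spans for Span objects.

```python
from typing import List, Tuple

# (start_char, end_char_exclusive, label); hypothetical format for this sketch
CharSpan = Tuple[int, int, str]


def resolve_overlaps(spans: List[CharSpan]) -> List[CharSpan]:
    """Keep the longest spans first; drop any span that overlaps a kept one."""
    # Longest first, ties broken by earliest start
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[CharSpan] = []
    for start, end, label in ordered:
        overlaps = any(start < k_end and end > k_start for k_start, k_end, _ in kept)
        if not overlaps:
            kept.append((start, end, label))
    return sorted(kept)


text = "She reads the New York Times in Boston."
candidates = [
    (14, 22, "LOC"),  # "New York"
    (14, 28, "ORG"),  # "New York Times"
    (32, 38, "GPE"),  # "Boston"
]

resolved = resolve_overlaps(candidates)
for start, end, label in resolved:
    print(f"{text[start:end]} → {label}")
```

If your domain needs a different priority (for example, preferring a specific label over span length), change the sort key rather than the overlap check.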
NER Model Comparison — spaCy vs HuggingFace vs Stanford
| Property | spaCy (sm/md/lg) | HuggingFace Transformers | Stanford CoreNLP |
|---|---|---|---|
| Architecture | CNN / HashEmbed | BERT / RoBERTa | CRF (statistical) |
| NER F1 (approx.; benchmarks differ) | ~0.85–0.86 (OntoNotes) | ~0.91–0.94 (CoNLL-2003) | ~0.87 (CoNLL-2003) |
| Speed (CPU) | Very fast (~50k words/s) | Slow (~2–5k words/s) | Moderate |
| Custom training | Yes — easy CLI | Yes — Trainer API | Complex Java setup |
| Multilingual | 60+ languages | mBERT, XLM-RoBERTa | Limited |
| Best for | Production pipelines, speed | Max accuracy, research | Legacy Java systems |
Golden Rules of NER
Use nlp.pipe() for batch processing. Calling nlp(text) in a loop processes one document at a time and skips spaCy's internal batching optimisations. For 10,000+ documents, the difference can be an hour vs. minutes.
Use spacy.explain(label) when inspecting unfamiliar labels. Never guess what NORP, FAC, or WORK_OF_ART means; the description is one call away.
For multilingual text, fine-tune a multilingual model (e.g. xlm-roberta-base) rather than training per-language models. A single cross-lingual model fine-tuned on your target languages often beats separate models in low-resource scenarios.