The Story That Makes Chunking Click
In seconds, a newspaper editor with a highlighter has produced a summary skeleton: a list of highlighted phrases that carry the story's meaning. The editor hasn't diagrammed every grammatical relationship, only grouped words into meaningful bundles.
That is precisely what chunking does in NLP. It is the art of identifying and extracting flat, non-overlapping phrases — noun groups, verb groups, prepositional groups — from raw text, without the full complexity of a constituency or dependency parse. Fast. Practical. The editor's highlighter for machines.
Chunking sits between two worlds: it is more powerful than simple tokenization and POS tagging, but lighter and faster than full syntactic parsing. This balance makes it one of the most practically useful tools in the NLP practitioner's toolkit.
Chunking (also called shallow parsing or partial parsing) is the task of identifying and grouping sequences of tokens into syntactically correlated chunks — most commonly Noun Phrases (NP), Verb Phrases (VP), and Prepositional Phrases (PP). Unlike full parsing, chunks are flat (non-recursive, non-overlapping) and are defined by surface patterns over Part-of-Speech tags, making them extremely fast to compute.
Where Chunking Lives in the NLP Pipeline
Chunking is a mid-level step. It depends on the outputs of earlier stages and feeds into downstream tasks like named entity recognition, relation extraction, and information retrieval.
Raw text: unprocessed string, no structure yet.
Chunked output: [NP The fast red car] [VP overtook] [NP a slow blue truck] [PP on the motorway]
The Three Core Chunk Types
While chunking can target any phrase type, three dominate practical NLP applications. Each has a characteristic POS tag pattern that a chunker can learn or be programmed to recognize.
Noun Phrase (NP)
Pattern: DT? JJ* NN+
Example: "a brilliant young scientist", "the new government policy", "three ancient stone temples"
Verb Phrase (VP)
Pattern: MD? VB* VBD|VBZ|VBG|VBN
Example: "has been running", "will have completed", "was quickly overtaken"
Prepositional Phrase (PP)
Pattern: IN NP
Example: "on the table", "in the morning", "across the busy motorway"
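The NP pattern above can be tried without any NLP library by writing the tag sequence as a string and running an ordinary regular expression over it. This is a minimal sketch, not how any particular chunker is implemented; the hand-tagged sentence and the helper name `np_chunks` are made up for illustration.

```python
import re

# Hypothetical hand-tagged input: (word, Penn Treebank tag) pairs.
tagged = [
    ("the", "DT"), ("new", "JJ"), ("government", "NN"), ("policy", "NN"),
    ("will", "MD"), ("have", "VB"), ("changed", "VBN"),
    ("in", "IN"), ("the", "DT"), ("morning", "NN"),
]

# DT? JJ* NN+ expressed over a string of bracketed tags such as "<DT><JJ><NN>".
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ[RS]?>)*(?:<NN[PS]*>)+")

def np_chunks(tagged_tokens):
    """Return the word sequences whose tags match DT? JJ* NN+."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_tokens)
    # Map the character offset where each tag starts back to its token index.
    offsets, pos = {}, 0
    for i, (_, tag) in enumerate(tagged_tokens):
        offsets[pos] = i
        pos += len(tag) + 2  # two extra characters for "<" and ">"
    offsets[pos] = len(tagged_tokens)
    chunks = []
    for match in NP_PATTERN.finditer(tag_string):
        start, end = offsets[match.start()], offsets[match.end()]
        chunks.append(" ".join(word for word, _ in tagged_tokens[start:end]))
    return chunks

print(np_chunks(tagged))
```

Because the match is greedy and works left to right, the resulting chunks are flat and never overlap, which is the property the definition of chunking relies on.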
Penn Treebank POS Tags You Must Know for Chunking
| Tag | Part of Speech | Example Words | Role in Chunking |
|---|---|---|---|
| DT | Determiner | the, a, an, this, every | NP opener — signals start of noun phrase |
| JJ | Adjective | quick, brilliant, heavy | NP filler — modifies the head noun |
| JJR / JJS | Comparative / Superlative Adj | faster, brightest | NP filler variant |
| NN / NNS | Noun (singular / plural) | car, scientists | NP head — the core of the chunk |
| NNP / NNPS | Proper Noun (singular / plural) | Alice, United Nations | NP head for named entities |
| VB / VBD / VBG | Verb base / past / gerund | run, ran, running | VP head or filler |
| VBN / VBP / VBZ | Verb past-part / non-3rd / 3rd | seen, see, sees | VP head or filler |
| MD | Modal | will, would, can, must | VP opener |
| RB / RBR / RBS | Adverb / Comparative / Superlative | quickly, faster | VP or ADVP filler |
| IN | Preposition / Subordinating conjunction | on, in, over, because | PP opener |
| CD | Cardinal Number | three, 42, 1.5 | NP filler (quantifier) |
| PRP / PRP$ | Personal / Possessive Pronoun | he, she, their | Standalone NP |
IOB Encoding — How Chunkers Represent Phrases
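IOB tagging labels every token as B (Beginning of a chunk), I (Inside a chunk), or O (Outside any chunk), with the chunk type appended, as in B-NP or I-VP. It is the standard encoding for chunked text. It turns the grouping problem into a simple sequence labelling problem, which statistical models like CRFs and neural networks excel at.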
IOB Example — Sentence with NP and VP Chunks
| Token | POS Tag | IOB Tag | Chunk |
|---|---|---|---|
| The | DT | B-NP | NP: "The fast red car" |
| fast | JJ | I-NP | |
| red | JJ | I-NP | |
| car | NN | I-NP | |
| has | VBZ | B-VP | VP: "has overtaken" |
| overtaken | VBN | I-VP | |
| a | DT | B-NP | NP: "a slow blue truck" |
| slow | JJ | I-NP | |
| blue | JJ | I-NP | |
| truck | NN | I-NP | |
| on | IN | O | Outside any chunk |
| the | DT | B-NP | NP: "the motorway" |
| motorway | NN | I-NP | |
| . | . | O | Outside any chunk |
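To make the encoding concrete, the table can be decoded back into phrases with a few lines of plain Python. The function name `iob_to_chunks` is illustrative, and the sketch assumes IOB2-style tags like the ones shown in the table.

```python
def iob_to_chunks(tokens, iob_tags):
    """Group tokens into (label, phrase) chunks from IOB2 tags."""
    chunks, current_label, current_words = [], None, []
    for token, tag in zip(tokens, iob_tags):
        # A chunk starts at B-X, or at a stray I-X whose type differs from the open chunk.
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_label
        )
        if tag == "O" or starts_new:
            if current_words:
                chunks.append((current_label, " ".join(current_words)))
            current_label, current_words = None, []
        if tag != "O":
            if not current_words:
                current_label = tag[2:]
            current_words.append(token)
    if current_words:  # close a chunk that runs to the end of the sentence
        chunks.append((current_label, " ".join(current_words)))
    return chunks

tokens = "The fast red car has overtaken a slow blue truck on the motorway .".split()
tags = ["B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "I-VP",
        "B-NP", "I-NP", "I-NP", "I-NP", "O", "B-NP", "I-NP", "O"]

for label, phrase in iob_to_chunks(tokens, tags):
    print(f"{label}: {phrase}")
```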
Some systems use BIOES encoding: Begin, Inside, Outside, End, Singleton (a one-token chunk). For example, a single-word NP like "Alice" gets S-NP instead of a lone B-NP. BIOES marks chunk ends explicitly, which gives neural models a richer signal and can improve F1 slightly (gains of around 0.5–1 point are sometimes reported). NLTK's chunk utilities use IOB2, the IOB variant in which every chunk starts with B- and I- never appears without a preceding tag of the same chunk.
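The relationship between the two schemes is mechanical, as the following sketch suggests. It assumes well-formed IOB2 input; `iob2_to_bioes` is a hypothetical helper name, not a function from NLTK or any other library.

```python
def iob2_to_bioes(tags):
    """Convert one IOB2 tag sequence to BIOES."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The chunk continues only if the next tag is I- with the same label.
        chunk_continues = next_tag == f"I-{label}"
        if prefix == "B":
            bioes.append(f"B-{label}" if chunk_continues else f"S-{label}")
        else:  # prefix == "I"
            bioes.append(f"I-{label}" if chunk_continues else f"E-{label}")
    return bioes

iob2 = ["B-NP", "B-VP", "I-VP", "B-NP", "I-NP", "I-NP", "O"]
print(iob2_to_bioes(iob2))
```

The single-token NP at the start becomes S-NP, and the last token of every longer chunk becomes E-, which is the extra boundary signal BIOES provides.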
Rule-Based Chunking with NLTK RegexpParser
The simplest chunker uses handcrafted regular expressions over POS tags.
NLTK's RegexpParser lets you define grammar rules that look exactly like the
phrase structure rules a linguist would write — readable, transparent, and easy to debug.
{…} — Define what TO include in the chunk (chunk rule)
}<TAG>{ — Define what to EXCLUDE / chink from an existing chunk
<DT>? — Optional tag (zero or one)
<JJ>* — Zero or more adjectives
<JJ>+ — One or more adjectives
<NN.*> — Any tag beginning with NN (NN, NNS, NNP, NNPS)
<VB.*> — Any verb tag (VB, VBD, VBG, VBN, VBP, VBZ)
Step 1 — Basic NP Chunker
import nltk
from nltk import RegexpParser, pos_tag, word_tokenize
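nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)
# Newer NLTK releases (3.9+) look for these resource names instead
nltk.download("punkt_tab", quiet=True)
nltk.download("averaged_perceptron_tagger_eng", quiet=True)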
# ── Define a Noun Phrase grammar ──────────────────────────────
# DT? JJ* NN+ → optional determiner, zero-or-more adjectives, one-or-more nouns
np_grammar = r"""
NP: {<DT>?<JJ.*>*<NN.*>+}
"""
np_parser = RegexpParser(np_grammar)
# ── Tokenize and POS-tag a sentence ──────────────────────────
sentence = "The brilliant young researcher published a groundbreaking paper on AI safety."
tokens = word_tokenize(sentence)
pos_tags = pos_tag(tokens)
print("POS Tags:")
print(pos_tags)
print()
# ── Parse into chunks ─────────────────────────────────────────
tree = np_parser.parse(pos_tags)
print("Chunk Tree:")
tree.pretty_print()
# ── Extract only the NP chunks ────────────────────────────────
print("\nExtracted Noun Phrases:")
for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
    phrase = " ".join(word for word, tag in subtree.leaves())
    print(f" NP → '{phrase}'")
Step 2 — Multi-Rule Grammar: NP + VP + PP
# ── Full shallow parse grammar ────────────────────────────────
full_grammar = r"""
    NP: {<DT>?<CD>?<JJ.*>*<NN.*>+}   # Noun Phrase, including number + noun e.g. "three scientists"
        {<PRP>}                       # Pronoun as standalone NP
    VP: {<MD>?<RB>?<VB.*>+}           # Verb Phrase: optional modal + optional adverb + verb(s)
    PP: {<IN|TO><NP>}                 # Prepositional Phrase: preposition (IN, or TO for "to") + NP
"""
full_parser = RegexpParser(full_grammar)
sentences = [
"She will quickly send three important reports to the committee.",
"The old professor has been teaching quantum mechanics at MIT for thirty years.",
]
for sent in sentences:
    tokens = word_tokenize(sent)
    pos_tags = pos_tag(tokens)
    tree = full_parser.parse(pos_tags)
    print(f"\nSentence: {sent}")
    print("-" * 60)
    for subtree in tree.subtrees(filter=lambda t: t.label() in ("NP", "VP", "PP")):
        phrase = " ".join(w for w, _ in subtree.leaves())
        print(f" {subtree.label():4} → '{phrase}'")
Step 3 — Chinking: Removing Words From Chunks
# Chinking = punching holes in chunks to EXCLUDE certain tags
# Use }TAG{ syntax to define what to remove from inside a chunk
chink_grammar = r"""
NP: {<.*>+} # Chunk everything
}<VBD|VBZ|IN>{ # Then chink (remove) verbs and prepositions
"""
chink_parser = RegexpParser(chink_grammar)
sentence = "The quick fox jumps over the lazy dog near the old river."
tokens = word_tokenize(sentence)
pos_tags = pos_tag(tokens)
tree = chink_parser.parse(pos_tags)
print("After chinking (verbs and prepositions removed from chunks):")
for subtree in tree.subtrees(filter=lambda t: t.label() == "NP"):
    phrase = " ".join(w for w, _ in subtree.leaves())
    print(f" NP → '{phrase}'")
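The chinking idea can also be seen in isolation, without NLTK: treat the whole sentence as one chunk, then split it wherever an excluded tag occurs. The sketch below uses a hand-tagged version of the same sentence (the tags are assumed, not tagger output) and, as an extra assumption, also chinks the final period so it does not cling to the last chunk.

```python
from itertools import groupby

def chink(tagged_tokens, chink_tags):
    """Split a tagged sentence into chunks wherever a chink tag occurs."""
    chunks = []
    # groupby yields alternating runs of "is a chink" / "is not a chink" tokens.
    for is_chink, group in groupby(tagged_tokens, key=lambda wt: wt[1] in chink_tags):
        if not is_chink:
            chunks.append(" ".join(word for word, _ in group))
    return chunks

# Hypothetical hand-assigned tags for the example sentence.
tagged_sentence = [
    ("The", "DT"), ("quick", "JJ"), ("fox", "NN"), ("jumps", "VBZ"),
    ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
    ("near", "IN"), ("the", "DT"), ("old", "JJ"), ("river", "NN"), (".", "."),
]

print(chink(tagged_sentence, {"VBD", "VBZ", "IN", "."}))
```

Chinking is often shorter to write than a positive chunk rule when it is easier to say what does not belong in a phrase than what does.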
Statistical NP Chunking with spaCy
spaCy's noun_chunks property yields base noun phrases derived from the dependency
parse produced by a trained statistical pipeline, not from surface POS patterns.
Because the parser uses contextual information, the chunks tend to respect linguistic
boundaries better than regex rules on real-world text, for example around possessives
("the company's CEO"). The chunks are still flat: coordinated NPs ("cats and dogs") and
NPs with embedded clauses come back as base chunks, not as nested structures.
For most production NLP work, prefer spaCy over NLTK regex chunking.
Use NLTK regex chunking only when you need a transparent, auditable rule set or are working
without a trained model.
import spacy
nlp = spacy.load("en_core_web_sm")
text = """
The World Health Organization announced new guidelines on Monday.
A team of brilliant researchers from Oxford University has developed
a promising vaccine candidate for the tropical disease.
Global markets reacted positively to the unexpected news.
"""
doc = nlp(text.strip())
print(f"{'Noun Chunk':40} {'Root':15} {'Root Dep':12} {'Root Head'}")
print("-" * 80)
for chunk in doc.noun_chunks:
    print(f"{chunk.text:40} {chunk.root.text:15} {chunk.root.dep_:12} {chunk.root.head.text}")
Filtering and Enriching Noun Chunks
from collections import Counter
# ── Filter by chunk role in the sentence ─────────────────────
doc = nlp("The curious cat chased the tiny scared mouse across the dusty old floor.")
subjects = [chunk.text for chunk in doc.noun_chunks
            if chunk.root.dep_ in ("nsubj", "nsubjpass")]
objects = [chunk.text for chunk in doc.noun_chunks
           if chunk.root.dep_ in ("dobj", "obj", "pobj")]
print("Subject NPs:", subjects)
print("Object NPs: ", objects)
# ── Count most frequent noun chunks across a corpus ───────────
corpus = [
"The data science team finished the quarterly report.",
"A data science expert reviewed the quarterly report.",
"The engineering team deployed the new model.",
"The new model exceeded expectations across the entire team.",
]
chunk_counts = Counter()
for text in corpus:
    for chunk in nlp(text).noun_chunks:
        chunk_counts[chunk.text.lower()] += 1
print("\nMost frequent noun chunks in corpus:")
for phrase, count in chunk_counts.most_common(6):
    print(f" {phrase:30} × {count}")
Visualizing Chunk Structure
Understanding chunk structure visually is critical for debugging your chunker and communicating results. Below are three visualization approaches ranging from terminal ASCII to rich SVG diagrams.
Chunked sentence:
[NP She] [VP will send] [NP three urgent reports] to [NP the committee]
Token-by-token IOB labels:
She/PRP → B-NP
will/MD → B-VP
send/VB → I-VP
three/CD → B-NP
urgent/JJ → I-NP
reports/NNS → I-NP
to/TO → O
the/DT → B-NP
committee/NN → I-NP
Rendering with NLTK draw() and spaCy displacy
import nltk
from nltk import RegexpParser, pos_tag, word_tokenize
import spacy
from spacy import displacy
# ── NLTK: Draw chunk tree (opens GUI window) ──────────────────
grammar = r"NP: {<DT>?<JJ.*>*<NN.*>+}"
parser = RegexpParser(grammar)
tokens = word_tokenize("The brilliant scientist found a new planet.")
tags = pos_tag(tokens)
tree = parser.parse(tags)
# In a desktop Python session (not Jupyter):
# tree.draw()
# As a pretty-printed text tree:
tree.pretty_print()
# ── spaCy: Visualize with displacy (works in Jupyter) ─────────
nlp = spacy.load("en_core_web_sm")
doc = nlp("The brilliant scientist found a new distant planet.")
# displacy in "dep" style renders the full dependency tree as SVG;
# it does not mark noun chunks (the custom visualizer further down does that)
svg = displacy.render(doc, style="dep", jupyter=False)
# Save SVG for embedding in a web page
with open("chunk_tree.svg", "w", encoding="utf-8") as f:
f.write(svg)
# ── Custom HTML span visualizer ───────────────────────────────
from html import escape

def highlight_chunks(doc):
    """Return HTML with noun chunks highlighted in colored spans."""
    # Map each chunk's first token index to the chunk span
    chunk_starts = {chunk.start: chunk for chunk in doc.noun_chunks}
    html_parts = []
    i = 0
    while i < len(doc):
        chunk = chunk_starts.get(i)
        if chunk is not None:
            html_parts.append(
                f'<mark style="background:#6366f120;border:1px solid #6366f1;'
                f'border-radius:4px;padding:1px 4px;">{escape(chunk.text)}</mark>'
                f'{doc[chunk.end - 1].whitespace_}'
            )
            i = chunk.end
        else:
            html_parts.append(escape(doc[i].text_with_ws))
            i += 1
    # Tokens already carry their trailing whitespace, so join without a separator
    return "".join(html_parts)

html_output = highlight_chunks(doc)
print(html_output)
Statistical Chunking with a CRF Model
Rule-based chunkers break on irregular text. A Conditional Random Field (CRF)
learns chunking from annotated data, handling context and exceptions automatically.
The sklearn-crfsuite library makes training a CRF chunker straightforward.
A CRF looks at the surrounding context — the previous and next tags, the previous IOB label, even the word itself — and makes a globally optimal decision across the entire sequence. It learns, from thousands of examples, that "flying" before "planes" usually signals an NP, not a VP. Context wins.
import sklearn_crfsuite
from sklearn_crfsuite import metrics as crf_metrics
import nltk
from nltk.corpus import conll2000
nltk.download("conll2000", quiet=True)
# ── Feature extraction for CRF chunker ───────────────────────
def word_features(sent, i):
    """Features for token i; sent holds (word, pos) or (word, pos, iob) tuples."""
    word, pos = sent[i][:2]  # ignore the IOB label if present
    features = {
        "word.lower": word.lower(),
        "word[-3:]": word[-3:],
        "word[-2:]": word[-2:],
        "word.isupper": word.isupper(),
        "word.istitle": word.istitle(),
        "pos": pos,
        "pos[:2]": pos[:2],
    }
    if i > 0:
        pw, pp = sent[i - 1][:2]
        features.update({"-1:word.lower": pw.lower(), "-1:pos": pp})
    else:
        features["BOS"] = True  # Beginning of sentence
    if i < len(sent) - 1:
        nw, np_ = sent[i + 1][:2]
        features.update({"+1:word.lower": nw.lower(), "+1:pos": np_})
    else:
        features["EOS"] = True  # End of sentence
    return features

def sent_to_features(sent):
    return [word_features(sent, i) for i in range(len(sent))]

def sent_to_labels(sent):
    return [iob for _, _, iob in sent]
# ── Load CoNLL-2000 chunking corpus ───────────────────────────
train_sents = conll2000.iob_sents("train.txt")[:7000] # 7k training sentences
test_sents = conll2000.iob_sents("test.txt")[:1000] # 1k test sentences
X_train = [sent_to_features(s) for s in train_sents]
y_train = [sent_to_labels(s) for s in train_sents]
X_test = [sent_to_features(s) for s in test_sents]
y_test = [sent_to_labels(s) for s in test_sents]
# ── Train the CRF model ───────────────────────────────────────
crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs",
    c1=0.1,  # L1 regularization
    c2=0.1,  # L2 regularization
    max_iterations=100,
    all_possible_transitions=True,
)
crf.fit(X_train, y_train)
# ── Evaluate ──────────────────────────────────────────────────
y_pred = crf.predict(X_test)
labels = list(crf.classes_)
labels.remove("O")  # Exclude "Outside" from report
# Note: this report scores individual IOB tags (token level), not whole chunks
print(crf_metrics.flat_classification_report(y_test, y_pred, labels=labels, digits=4))
The CoNLL-2000 shared task is the standard benchmark for chunking, measured by chunk-level F1 on its Wall Street Journal test split. A basic CRF achieves roughly 93% F1, and fine-tuned BERT-style neural chunkers are reported at around 97% F1. The CRF above should land in the same range for NP chunks, which is sufficient for many production information extraction pipelines; exact figures depend on the features and on how much of the data you train and test on.
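The "globally optimal decision" a linear-chain CRF makes at prediction time is found with Viterbi decoding. The sketch below illustrates that step only: the emission and transition scores are invented for the example, whereas a trained CRF would compute them from its learned feature weights.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions:   list of {label: score} dicts, one per token
    transitions: {(prev_label, label): score}; missing pairs score 0.0
    """
    # best[i][y] = (score of the best path ending in label y at token i, backpointer)
    best = [{y: (emissions[0][y], None) for y in labels}]
    for i in range(1, len(emissions)):
        column = {}
        for y in labels:
            prev, score = max(
                ((p, best[i - 1][p][0] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda item: item[1],
            )
            column[y] = (score + emissions[i][y], prev)
        best.append(column)
    # Follow backpointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y][0])]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]

labels = ["B-NP", "I-NP", "O"]
# Made-up scores for the tokens "the", "flying", "planes".
emissions = [
    {"B-NP": 2.0, "I-NP": 0.0, "O": 0.5},
    {"B-NP": 0.2, "I-NP": 1.0, "O": 1.2},  # locally, "flying" slightly prefers O
    {"B-NP": 0.8, "I-NP": 1.5, "O": 0.1},
]
transitions = {
    ("B-NP", "I-NP"): 1.0,
    ("I-NP", "I-NP"): 1.0,
    ("O", "I-NP"): -10.0,  # I-NP right after O is invalid in IOB2
}

greedy = [max(e, key=e.get) for e in emissions]
print("Greedy per-token:", greedy)
print("Viterbi:         ", viterbi(emissions, transitions, labels))
```

Picking the best label for each token independently yields the invalid sequence B-NP, O, I-NP, while decoding over the whole sequence keeps "flying" inside the noun phrase.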
Phrase Structure — The Linguistic Foundation
Chunking is a practical approximation of full phrase structure. To truly understand it, you need to know what phrase structure theory says — and where chunking takes a deliberate shortcut.
In phrase structure theory, a phrase is built from up to four parts:
• A head (the core word that names the phrase type: N, V, P)
• A specifier (a phrase in the outer left position: the determiner in an NP)
• Complements (phrases that the head requires: the object of a verb)
• Adjuncts (optional modifiers that can be freely added or removed)
Chunking captures the specifier + adjuncts + head part (the flat left side of the phrase) but ignores complements and recursive embedding. That is its design trade-off: speed and simplicity over completeness.
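As a rough illustration of that decomposition, a flat NP chunk can be split into specifier, pre-head modifiers, and head from its POS tags alone. The sketch assumes the head is the last token of the chunk, which holds for typical English base NPs but is not a general rule; `decompose_np` is a hypothetical helper.

```python
SPECIFIER_TAGS = {"DT", "PRP$"}  # determiners and possessive pronouns

def decompose_np(tagged_chunk):
    """Split a flat NP chunk into specifier, pre-head modifiers, and head.

    Assumes the head is the final token of the chunk.
    """
    *pre_head, (head, _) = tagged_chunk
    specifier = [word for word, tag in pre_head if tag in SPECIFIER_TAGS]
    modifiers = [word for word, tag in pre_head if tag not in SPECIFIER_TAGS]
    return {"spec": specifier, "modifiers": modifiers, "head": head}

np_chunk = [("the", "DT"), ("brilliant", "JJ"), ("young", "JJ"), ("professor", "NN")]
print(decompose_np(np_chunk))
```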
Phrase Types — Structure and Examples
Noun Phrase (NP)
"the brilliant young professor"
Head: professor | Spec: the | Adj: brilliant, young
Chunked flat: DT JJ JJ NN → one NP chunk
Verb Phrase (VP)
"will quickly announce"
Head: announce | Aux: will | Mod: quickly
Chunked flat: MD RB VB → one VP chunk
Prepositional Phrase (PP)
"across the busy motorway"
Head: across | Complement: the busy motorway
Chunked: IN + NP → one PP chunk
Adjective Phrase (ADJP)
"very tall", "extremely important"
Head: tall/important | Mod: very/extremely
Chunked flat: RB JJ → one ADJP chunk
Adverb Phrase (ADVP)
"very quickly", "quite slowly"
Head: quickly/slowly | Mod: very/quite
Chunked flat: RB RB → one ADVP chunk
Subordinate Clause (SBAR)
"that she discovered", "when he arrived"
Head: discovered/arrived | Comp: that/when
Not typically chunked (contains full clause)
The Critical Difference — Chunking vs. Full Phrase Structure
Full constituency parse of "The old man saw the woman with the telescope":
| Node | Content | Depth |
|---|---|---|
| S | the whole sentence | 0 |
| NP | "The old man" | 1 |
| VP | "saw the woman with the telescope" | 1 |
| NP | "the woman with the telescope" | 2 |
| PP | "with the telescope" | 3 |
| NP | "the telescope" | 4 |
Flat chunks for "The old man saw the woman with the telescope":
| Chunk | Content | Level |
|---|---|---|
| NP | "The old man" | flat |
| VP | "saw" | flat |
| NP | "the woman" | flat |
| O | "with" | outside |
| NP | "the telescope" | flat |
No nesting: the PP "with the telescope" is split.
Chunking's flatness means it cannot capture nested structure. The PP "with the telescope" in "the woman with the telescope" is split — "with" goes outside, "the telescope" becomes its own NP. If your task requires knowing that "the telescope" modifies "the woman" (not the verb), you need full constituency or dependency parsing. For most extraction tasks, flat chunks are sufficient and much faster.
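The two tables can be reproduced from a single nested structure, which makes the information loss visible. In this sketch the tree is written by hand as nested tuples (a made-up representation, not an NLTK or spaCy type), and only NP and VP are treated as chunk labels so that "with" falls outside, matching the table above.

```python
# (label, child, child, ...) where a child is either a word or another tuple.
TREE = ("S",
        ("NP", "The", "old", "man"),
        ("VP", "saw",
         ("NP", "the", "woman",
          ("PP", "with",
           ("NP", "the", "telescope")))))

def leaves(node):
    """Yield the words under a node, left to right."""
    for child in node[1:]:
        if isinstance(child, tuple):
            yield from leaves(child)
        else:
            yield child

def constituents(node, depth=0):
    """Yield (label, text, depth) for every phrase node: the full parse view."""
    yield node[0], " ".join(leaves(node)), depth
    for child in node[1:]:
        if isinstance(child, tuple):
            yield from constituents(child, depth + 1)

def flatten(node, chunk_labels=("NP", "VP")):
    """Yield flat (label, text) chunks: runs of words sitting directly under a phrase."""
    label = node[0] if node[0] in chunk_labels else "O"
    run = []
    for child in node[1:]:
        if isinstance(child, tuple):
            if run:  # a nested phrase interrupts the current run of words
                yield label, " ".join(run)
                run = []
            yield from flatten(child, chunk_labels)
        else:
            run.append(child)
    if run:
        yield label, " ".join(run)

for label, text, depth in constituents(TREE):
    print(f"{'  ' * depth}{label}: {text}")
print(list(flatten(TREE)))
```

The flat output no longer records that "the telescope" sits inside the phrase headed by "woman"; that attachment exists only in the nested view.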
Real-World Applications of Chunking
Production-Ready Chunking Pipeline
Below is a complete, self-contained chunking pipeline that handles raw text, extracts NP and VP chunks, labels each chunk with its grammatical role, and outputs structured JSON that can feed a downstream information extraction or search indexing system.
import spacy
import json
from dataclasses import dataclass, asdict
from typing import List, Dict, Optional
nlp = spacy.load("en_core_web_sm")
# ── Data classes for structured output ───────────────────────
@dataclass
class Chunk:
    text: str
    label: str  # "NP" | "VP"
    start_char: int
    end_char: int
    root_word: str
    root_dep: str
    role: Optional[str] = None  # "subject" | "object" | "pobj" | "predicate"

@dataclass
class ParsedSentence:
    text: str
    chunks: List[Chunk]
# ── Core chunking function ────────────────────────────────────
def chunk_sentence(sent) -> ParsedSentence:
    """Extract and classify all chunks from a spaCy sentence span."""
    chunks = []
    # Noun Phrases — from spaCy's dependency-based noun_chunks.
    # Iterating the span directly keeps character offsets relative to the
    # original doc, consistent with token.idx used for the VP below.
    for noun_chunk in sent.noun_chunks:
        role = None
        if noun_chunk.root.dep_ in ("nsubj", "nsubjpass"):
            role = "subject"
        elif noun_chunk.root.dep_ in ("dobj", "obj"):
            role = "object"
        elif noun_chunk.root.dep_ == "pobj":
            role = "pobj"
        chunks.append(Chunk(
            text=noun_chunk.text, label="NP",
            start_char=noun_chunk.start_char, end_char=noun_chunk.end_char,
            root_word=noun_chunk.root.text, root_dep=noun_chunk.root.dep_,
            role=role,
        ))
    # Verb Phrases — collect auxiliary + main verb spans
    for token in sent:
        if token.dep_ == "ROOT" and token.pos_ == "VERB":
            vp_tokens = [t for t in token.children
                         if t.dep_ in ("aux", "auxpass", "neg", "advmod")
                         and t.i < token.i]
            vp_tokens.append(token)
            vp_tokens.sort(key=lambda t: t.i)
            vp_text = " ".join(t.text for t in vp_tokens)
            chunks.append(Chunk(
                text=vp_text, label="VP",
                start_char=vp_tokens[0].idx,
                end_char=vp_tokens[-1].idx + len(vp_tokens[-1].text),
                root_word=token.text, root_dep="ROOT",
                role="predicate",
            ))
    # Sort chunks by their position in the document
    chunks.sort(key=lambda c: c.start_char)
    return ParsedSentence(text=sent.text, chunks=chunks)
# ── Run the pipeline ──────────────────────────────────────────
text = """
The European Space Agency successfully launched a new climate monitoring satellite.
Scientists will analyze the collected data over the next five years.
The mission could dramatically improve our understanding of global warming.
"""
doc = nlp(text.strip())
results = [chunk_sentence(sent) for sent in doc.sents]
for parsed in results:
    print(f"\nSentence: {parsed.text}")
    print(f" {'Label':4} {'Role':12} Text")
    print(f" {'-'*50}")
    for chunk in parsed.chunks:
        print(f" {chunk.label:4} {(chunk.role or ''):12} '{chunk.text}'")
# Export to JSON
output = [asdict(r) for r in results]
print("\nJSON snippet:")
print(json.dumps(output[0], indent=2)[:500] + "\n...")
Evaluating a Chunker — Metrics and Benchmarks
Chunker quality is measured at the chunk span level — not per token, but per complete phrase. A predicted chunk is correct only if its boundary (start + end position) AND its label (NP, VP, PP) exactly match the gold annotation.
from seqeval.metrics import classification_report, f1_score
from seqeval.scheme import IOB2
# ── Example: Compare gold vs predicted IOB sequences ─────────
gold_labels = [
    ["B-NP", "I-NP", "I-NP", "B-VP", "B-NP", "I-NP", "O", "B-NP", "I-NP"],
    ["B-NP", "B-VP", "I-VP", "B-NP", "I-NP", "I-NP"],
]
pred_labels = [
    ["B-NP", "I-NP", "I-NP", "B-VP", "B-NP", "I-NP", "O", "B-NP", "O"],  # last token missed
    ["B-NP", "B-VP", "I-VP", "B-NP", "I-NP", "I-NP"],  # perfect
]
print(
    classification_report(gold_labels, pred_labels, mode="strict", scheme=IOB2)
)
f1 = f1_score(gold_labels, pred_labels, mode="strict", scheme=IOB2)
print(f"Overall F1: {f1:.4f}")
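To see what exact-match span scoring means in practice, here is a stdlib-only sketch that scores the same two sentences by hand and compares the result with plain token accuracy. It is an illustration of the idea, not seqeval's implementation, and the function names are made up.

```python
def spans(tags):
    """Return the set of (label, start, end) chunks in one IOB2 sequence."""
    found, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final chunk
        inside = tag.startswith("I-") and tag[2:] == label
        if start is not None and not inside:
            found.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return found

def span_f1(gold_sents, pred_sents):
    """Micro-averaged F1 over chunks; a chunk counts only on an exact span + label match."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = spans(gold), spans(pred)
        tp, n_gold, n_pred = tp + len(g & p), n_gold + len(g), n_pred + len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def token_accuracy(gold_sents, pred_sents):
    """Fraction of individual tags that match."""
    pairs = [(g, p) for gs, ps in zip(gold_sents, pred_sents) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)

gold = [
    ["B-NP", "I-NP", "I-NP", "B-VP", "B-NP", "I-NP", "O", "B-NP", "I-NP"],
    ["B-NP", "B-VP", "I-VP", "B-NP", "I-NP", "I-NP"],
]
pred = [
    ["B-NP", "I-NP", "I-NP", "B-VP", "B-NP", "I-NP", "O", "B-NP", "O"],
    ["B-NP", "B-VP", "I-VP", "B-NP", "I-NP", "I-NP"],
]

print(f"Token accuracy: {token_accuracy(gold, pred):.4f}")
print(f"Span-level F1:  {span_f1(gold, pred):.4f}")
```

One wrong tag out of fifteen costs a whole chunk out of seven, so the span-level score drops further than token accuracy does.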
| System | Approach | CoNLL-2000 F1 | Speed |
|---|---|---|---|
| NLTK RegexpParser | Rule-based regex | ~82–86% | Instant |
| CRF (sklearn-crfsuite) | Statistical, hand features | ~93–94% | Fast |
| spaCy noun_chunks | Dep-parse based (stat) | ~92–93% (NP only) | Fast |
| BiLSTM-CRF | Neural sequence labelling | ~95–96% | Medium (GPU) |
| BERT fine-tuned | Transformer sequence labelling | ~97%+ | Slow without GPU |
Common Pitfalls and Golden Rules
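• Silent coverage gaps: a regex grammar leaves every token it cannot match as
O (outside) with no warning. Always test on
diverse, real-world sentences and inspect uncovered spans. Add rules or switch to a statistical
chunker when coverage gaps appear.
• Coordination: a pattern such as DT? JJ* NN+ stops at "and" and "or". Extend the rule with
CC (coordinating conjunction) or use spaCy's
dependency-based noun_chunks, which draws chunk boundaries from the dependency parse.
• Token-level accuracy is misleading: a frequent label (such as
O) is easy to predict. Always use
span-level F1 via the seqeval library or CoNLL evaluation scripts, which require
exact span + label matches.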
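Inspecting uncovered spans can be automated with a small helper that collects every run of tokens left as O. This is a sketch with a hypothetical function name and a hand-written example of what an NP-only grammar might output; punctuation-only runs are skipped by assumption.

```python
PUNCTUATION = frozenset({".", ",", ";", ":", "!", "?"})

def uncovered_spans(tokens, iob_tags, ignore=PUNCTUATION):
    """Return maximal runs of tokens tagged O, skipping punctuation-only runs."""
    runs, current = [], []
    for token, tag in zip(tokens, iob_tags):
        if tag == "O":
            current.append(token)
        else:
            if current:
                runs.append(current)
            current = []
    if current:
        runs.append(current)
    return [" ".join(run) for run in runs if not all(t in ignore for t in run)]

# Hypothetical output of a grammar that only knows DT? JJ* NN+.
tokens = ["She", "will", "send", "three", "reports", "to", "the", "committee", "."]
tags = ["O", "O", "O", "O", "B-NP", "O", "B-NP", "I-NP", "O"]

print(uncovered_spans(tokens, tags))
```

Counting these runs across a sample of real sentences gives a quick list of the constructions the grammar is missing, here pronouns, verb groups, and numerals.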
Quick Reference — Chunking Cheat Sheet
| Task | Tool | Key API |
|---|---|---|
| Rule-based NP chunking | NLTK | RegexpParser(grammar).parse(pos_tags) |
| Extract NLTK chunk spans | NLTK | tree.subtrees(filter=lambda t: t.label()=="NP") |
| Statistical NP chunks | spaCy | doc.noun_chunks |
| Chunk text (string) | spaCy | chunk.text |
| Chunk root word | spaCy | chunk.root.text |
| Chunk grammatical role | spaCy | chunk.root.dep_ |
| Governor of the chunk root | spaCy | chunk.root.head.text |
| Filter chunks by role | spaCy | if chunk.root.dep_ == "nsubj" |
| Train CRF chunker | sklearn-crfsuite | CRF().fit(X_train, y_train) |
| Evaluate chunk F1 | seqeval | f1_score(gold, pred, mode="strict", scheme=IOB2) |
| IOB classification report | seqeval | classification_report(gold, pred) |
| Visualize the dependency tree | spaCy displacy | displacy.render(doc, style="dep") |
Chunking is shallow parsing: it reads POS-tagged text and groups tokens into flat,
non-overlapping, non-recursive phrase chunks — primarily NPs, VPs, and PPs — using either
hand-crafted regex rules (NLTK) or statistical models (CRF, spaCy, BERT).
It sits in the sweet spot between POS tagging (too fine-grained) and full syntactic parsing
(too expensive): fast enough for large corpora, rich enough for extraction, and interpretable
enough to debug. Master chunking, and you hold the fastest path from raw text to structured
meaning — the newspaper editor's highlighter, running at machine speed.