The Signal Behind Every Recommendation
Which friend gave you more information about how much she loved the film?
The first friend gave you explicit feedback — a direct, intentional, verbal declaration of preference. The second friend gave you implicit feedback — a rich stream of behavioural signals, none of them labelled, all of them honest. The second friend's signals are harder to read, but they are far more plentiful, and in many ways more truthful — because behaviour is difficult to fake and requires no effort to produce.
This is the central tension of modern recommender systems. Explicit feedback is clean but rare. Implicit feedback is noisy but abundant. Mastering both — knowing when to trust each, how to process each, and how to combine them — is the craft at the heart of recommendation engineering.
Netflix reports that fewer than 1% of users rate content explicitly. Amazon receives explicit reviews on fewer than 2% of purchases. Spotify users almost never thumb-rate individual songs. If recommendation systems relied solely on explicit feedback, they would have data on almost no one. The industry pivoted to implicit feedback around 2008 — and accuracy improved dramatically. Understanding the difference between these two signal types is not academic; it determines the entire architecture of a modern recommender system.
The choice between explicit and implicit feedback shapes every downstream design decision — from data collection infrastructure to model architecture and evaluation metrics.
Explicit Feedback — Ratings
In 2017, Netflix replaced its 5-star rating system with a simple thumbs up / thumbs down.
Why? Because the data showed that most users never rated anything on the 5-star scale; it demanded too much effort. The thumbs system generated 200% more ratings per user almost immediately. But even that wasn't the real revolution: Netflix had realised that what you watch tells it far more than what you rate. By 2017, ratings had become a secondary signal and behaviour the primary one.
Ratings are the canonical form of explicit feedback. Users assign a numerical or categorical score to an item — expressing how much they liked or disliked it. Ratings are clean, unambiguous, and directly interpretable. They are also frustratingly rare.
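On most platforms, ★1 and ★5 ratings are massively over-represented because users only bother to rate when they feel strongly (the J-shaped rating distribution). A second, separate bias is that users apply the scale differently: some rate generously, others harshly. Mean-centring each user's ratings before training removes this per-user offset; on its own it does not fix the over-representation of extreme ratings.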
```python
import pandas as pd

# ── Raw ratings dataset ───────────────────────────────────────
ratings = pd.DataFrame({
    'user'  : ['Ali','Ali','Ali','Ben','Ben','Ben','Cara','Cara'],
    'item'  : ['A','B','C','A','B','D','B','C'],
    'rating': [5, 4, 1, 3, 4, 5, 2, 5],
})

# ── Problem: raw ratings confound user generosity with preference
# Ali gives 5,4,1: a tough rater. Cara gives 2,5: a wide spread.
# A raw "4" from Ali ≠ a "4" from a generous rater.

# ── Solution: mean-centre per user ───────────────────────────
user_means = ratings.groupby('user')['rating'].transform('mean')
ratings['rating_norm'] = ratings['rating'] - user_means

print(ratings.to_string(index=False))
print("\nUser mean ratings:")
print(ratings.groupby('user')['rating'].mean().round(2))
```
Ali's raw "4" and Ben's raw "4" look identical, but Ali averages 3.33 (a tough rater) while Ben averages 4.00 (a generous rater). Ali's "4" is above her norm; Ben's "4" is exactly average. After mean-centring, Ali's 4 becomes +0.67 and Ben's 4 becomes 0.00, so they now correctly signal different sentiment. Mean-centre explicit ratings before training a collaborative filtering model, unless the model already learns a per-user bias term.
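Mean-centring removes each user's offset but not their spread: the comment on Cara in the code above notes her wide 2-to-5 range. A common extension, not something this section prescribes, is per-user z-score normalisation, which also divides by each user's standard deviation. A minimal sketch on the same toy ratings, assuming population standard deviation and a fallback of 1 for users whose ratings never vary:

```python
import pandas as pd

# Same toy ratings as the mean-centring example
ratings = pd.DataFrame({
    'user'  : ['Ali', 'Ali', 'Ali', 'Ben', 'Ben', 'Ben', 'Cara', 'Cara'],
    'item'  : ['A', 'B', 'C', 'A', 'B', 'D', 'B', 'C'],
    'rating': [5, 4, 1, 3, 4, 5, 2, 5],
})

by_user = ratings.groupby('user')['rating']
user_mean = by_user.transform('mean')
# Population std (ddof=0). A user who gives every item the same score has
# std 0, so fall back to 1 and keep only the mean-centred value.
user_std = by_user.transform(lambda s: s.std(ddof=0)).replace(0.0, 1.0)

ratings['rating_z'] = (ratings['rating'] - user_mean) / user_std
print(ratings.round(2).to_string(index=False))
```

Ben's 4 stays at 0, Ali's 4 shrinks from +0.67 to about +0.39 because her ratings spread widely, and Cara's two ratings become exactly −1 and +1. A user with a single rating gets 0 under either scheme, which carries no preference information.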
Explicit Feedback — Likes & Dislikes
Binary feedback — thumbs up / thumbs down, heart / no heart, upvote / downvote — is the simplest form of explicit preference signal. What it loses in granularity, it more than compensates for in volume and participation rate.
| Platform | Signal Type | Positive | Negative | How It's Used |
|---|---|---|---|---|
| Netflix | Thumbs binary | 👍 Thumbs Up | 👎 Thumbs Down | Directly shifts genre weights in taste profile; 👎 triggers suppression of similar content for 6+ months |
| YouTube | Like / Dislike | 👍 Like | 👎 Dislike | Dislike removed from public count in 2021 but remains a private signal in ranking algorithm |
| Spotify | Heart + hide | ❤️ Like / Save | 🚫 Hide song | Heart adds to Liked Songs; Hide immediately removes from radio/playlist and suppresses artist for 30 days |
| Reddit | Up/Down vote | ⬆️ Upvote | ⬇️ Downvote | Net score determines feed ranking; personalised feed weights subreddit affinity from upvote history |
| TikTok | Heart | ❤️ Heart | — (no explicit negative) | Heart is explicit signal; TikTok infers negative preference from rapid scroll, "Not interested" press |
Positive and negative signals are not mirror images. A like is a weak positive signal: it costs nothing, and users click it casually. A dislike is a strong negative signal: users only bother when they feel strongly enough to complain. This asymmetry means negative signals should carry greater weight per instance in your model. Spotify's "Hide" suppresses an artist for 30 days because Spotify treats it as a high-confidence negative, far more informative than the absence of a like.
```python
import pandas as pd

# ── Binary feedback: asymmetric weighting ────────────────────
interactions = pd.DataFrame({
    'user'  : ['Ali','Ali','Ali','Ben','Ben','Cara','Cara'],
    'item'  : ['A','B','C','A','C','B','D'],
    'signal': ['like','dislike','like','like','dislike','like','like'],
})

# Asymmetric weights: a dislike carries 3× the signal strength of a like
weight_map = {'like': 1.0, 'dislike': -3.0}
interactions['weight'] = interactions['signal'].map(weight_map)

# ── Build weighted user preference score per item ─────────────
user_item_score = (
    interactions
    .groupby(['user', 'item'])['weight']
    .sum()
    .reset_index()
    .rename(columns={'weight': 'pref_score'})
)
print(user_item_score.to_string(index=False))

# ── Suppress disliked items from candidate pool ───────────────
suppress = interactions.loc[interactions['signal'] == 'dislike', ['user', 'item']]
print("\nSuppressed (user, item) pairs:")
print(suppress.to_string(index=False))
```
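The code above suppresses disliked pairs permanently, whereas the platform table describes time-limited suppression (30 days for Spotify's Hide, 6+ months for a Netflix thumbs-down). A minimal sketch of a time-boxed filter, assuming a hypothetical dislike log with timestamps, a hypothetical candidate pool, and a fixed 30-day window, implemented as a pandas anti-join:

```python
import pandas as pd

# Hypothetical dislike log with timestamps
dislikes = pd.DataFrame({
    'user': ['Ali', 'Ben'],
    'item': ['B', 'C'],
    'ts'  : pd.to_datetime(['2024-06-01', '2024-04-01']),
})
# Hypothetical candidate pool produced by an upstream retrieval step
candidates = pd.DataFrame({
    'user': ['Ali', 'Ali', 'Ben', 'Ben'],
    'item': ['B', 'D', 'C', 'D'],
})

now = pd.Timestamp('2024-06-15')
SUPPRESS_WINDOW = pd.Timedelta(days=30)   # assumed window length

# A dislike only suppresses while it is younger than the window
active = dislikes.loc[now - dislikes['ts'] <= SUPPRESS_WINDOW, ['user', 'item']]

# Anti-join: keep candidates that have no active dislike
merged = candidates.merge(active, on=['user', 'item'], how='left', indicator=True)
filtered = merged.loc[merged['_merge'] == 'left_only', ['user', 'item']]
print(filtered.to_string(index=False))
```

Ali's two-week-old dislike of B still removes it, while Ben's 75-day-old dislike of C has expired, so C is eligible again.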
Implicit Feedback — Clicks
The click happened. The signal was recorded. But the interpretation of that click requires context — position on page, how long they stayed, whether they scrolled, whether they added to cart, whether they bounced immediately back.
Clicks are the most abundant implicit signal in the digital world, and the most treacherous. They are simultaneously the most valuable data you have and the easiest to misread. Learning to weight, de-noise, and contextualise click data is one of the most important skills in industrial recommendation engineering.
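One common way to contextualise a click is to look at what the user did next. A minimal sketch, assuming a hypothetical click log that records dwell time on the item page and an add-to-cart flag; the 30-second threshold is an illustrative assumption to tune, not a standard:

```python
import pandas as pd

# Hypothetical click log: dwell_s = seconds on the item page before leaving
clicks = pd.DataFrame({
    'user'         : ['Ali', 'Ali', 'Ben', 'Cara'],
    'item'         : ['A', 'B', 'A', 'C'],
    'dwell_s'      : [4, 95, 42, 2],
    'added_to_cart': [False, True, False, False],
})
SATISFIED_DWELL_S = 30   # assumed threshold for a "satisfied" click

def click_label(row):
    """Turn a raw click into a graded label using post-click behaviour."""
    if row['added_to_cart']:
        return 'strong_positive'     # follow-up action beyond the click
    if row['dwell_s'] >= SATISFIED_DWELL_S:
        return 'satisfied'           # stayed long enough to engage
    return 'bounce'                  # bounced straight back: weak or negative

clicks['label'] = clicks.apply(click_label, axis=1)
print(clicks.to_string(index=False))
```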
Position Bias — The Elephant in the Room
Users are far more likely to click items that appear higher on a list — not because those items are better, but because they are seen first. This is called position bias, and it is the most dangerous confound in click data. If you train a model on raw click data, you will systematically reinforce items that were already ranked highly — regardless of their true quality.
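Click-through rate decays steeply with position, so training on raw clicks teaches the model "position 1 is good" rather than "this specific item is relevant." Inverse Propensity Scoring (IPS) corrects for this by weighting each click by the inverse of the probability that its position was examined, which down-weights clicks from the top positions.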
```python
import pandas as pd

# ── Click data with position information ──────────────────────
clicks = pd.DataFrame({
    'user'    : ['Ali','Ali','Ben','Ben','Cara','Cara'],
    'item'    : ['A','B','A','C','B','D'],
    'position': [1, 3, 1, 5, 2, 4],
    'clicked' : [1, 1, 1, 1, 1, 1],
})

# ── Inverse Propensity Scoring (IPS) ──────────────────────────
# Propensity = how likely each position is to be examined. These are
# illustrative values read off a CTR-by-position curve; only their
# ratios affect the ranking below.
propensity = {1: 0.20, 2: 0.13, 3: 0.09, 4: 0.07, 5: 0.05}
clicks['propensity'] = clicks['position'].map(propensity)

# IPS weight = 1 / propensity: clicks further down the list count for
# more, easy clicks at the top are down-weighted
clicks['ips_weight'] = 1.0 / clicks['propensity']

print(clicks[['user','item','position','propensity','ips_weight']]
      .to_string(index=False))
print("\nIPS-weighted click value by item:")
print(clicks.groupby('item')['ips_weight']
      .sum().sort_values(ascending=False).round(2))
```
Item C was clicked at position 5, where the estimated propensity is only 0.05, a quarter of position 1's 0.20. A user who scrolled past four items to click it sends a much stronger signal of genuine interest than one clicking item A at the top of the list. IPS re-weights accordingly: item C's click gets weight 1/0.05 = 20, while each click on item A gets only 1/0.20 = 5. Raw counts put A (tied with B) at the top with two clicks and C (tied with D) at the bottom with one; after IPS, C ranks first (20.0) and A last (10.0).
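Because the weight is 1/propensity, a handful of clicks at rarely examined positions can dominate training and make estimates high-variance. A common mitigation, not covered in this section, is clipped (capped) IPS. A minimal sketch, where the cap of 10 is an assumption:

```python
import numpy as np

def clipped_ips_weights(propensity, max_weight=10.0):
    """Inverse-propensity weights, capped at max_weight to limit variance."""
    p = np.asarray(propensity, dtype=float)
    return np.minimum(1.0 / p, max_weight)

# Same illustrative propensities for positions 1 to 5
weights = clipped_ips_weights([0.20, 0.13, 0.09, 0.07, 0.05], max_weight=10.0)
print(weights.round(2))
```

With this cap, positions 3 to 5 all receive weight 10, so the correction becomes biased for the deepest positions in exchange for lower variance; the cap value sets that trade-off.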
Implicit Feedback — Watch Time & Dwell Time
YouTube's engineering team ran the numbers and discovered something stunning: videos optimised for clicks had terrible completion rates. Users clicked, felt cheated, and left after 10 seconds. Meanwhile, genuinely excellent long-form content had lower click rates — but users who clicked watched it to the end.
YouTube switched its recommendation objective from "maximise clicks" to "maximise watch time", and clickbait lost much of its advantage. Watch time became one of the most important signals in YouTube's recommendation algorithm because it is far harder to game than a click: you cannot fake spending 45 minutes on a video you didn't enjoy.
Watch time (for video), dwell time (for articles), listen duration (for podcasts/music), and reading time are time-based implicit signals that are considerably more informative than binary clicks. Time is a scarce resource — users who give it are signalling genuine engagement.
Completion rate carries far more nuance than a binary click. A 10-second bounce after a click is a negative signal; a full watch plus a replay is one of the strongest positive signals a recommender can receive.
```python
import pandas as pd

# ── Video watch events ────────────────────────────────────────
watches = pd.DataFrame({
    'user'      : ['Ali','Ali','Ben','Ben','Cara','Cara'],
    'video'     : ['V1','V2','V1','V3','V2','V3'],
    'duration_s': [1800, 1800, 1800, 2400, 1800, 2400],  # video length
    'watched_s' : [1750, 120, 900, 2380, 1800, 85],      # seconds watched
    'replayed'  : [True, False, False, True, False, False],
})

# ── Compute completion rate ───────────────────────────────────
watches['completion'] = (watches['watched_s'] / watches['duration_s']).clip(0, 1)

# ── Sentiment label from completion rate ──────────────────────
def watch_sentiment(row):
    c = row['completion']
    if c < 0.10:
        return 'bounce'
    elif c < 0.35:
        return 'browse'
    elif c < 0.80:
        return 'engaged'
    else:
        return 'loved'

watches['sentiment'] = watches.apply(watch_sentiment, axis=1)

# ── Implicit score: completion + replay bonus ─────────────────
watches['impl_score'] = watches['completion'] + watches['replayed'].astype(float) * 0.5

cols = ['user','video','completion','sentiment','impl_score']
print(watches[cols].to_string(index=False))
```
Three watch-time features are worth extracting. (1) Completion rate: a 97% completion is a far stronger signal than a 50% completion. (2) Replay / re-watch: watching something twice or returning to it signals the highest tier of positive sentiment. (3) Drop-off point: where in the video a user stopped tells you whether they abandoned due to a technical issue (very early) or after getting the value they wanted (near the end). A drop at 95% is not a negative signal.
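The drop-off point can be folded into completion rate so that stopping during the closing minutes is not penalised. A minimal sketch, assuming a hypothetical rule that anything past 95% of the runtime counts as a full watch:

```python
def effective_completion(watched_s, duration_s, finished_at=0.95):
    """Completion rate in [0, 1]; stopping past `finished_at` counts as a full watch."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    c = min(max(watched_s / duration_s, 0.0), 1.0)
    return 1.0 if c >= finished_at else c

print(effective_completion(1750, 1800))  # Ali on V1, 97% watched: 1.0
print(effective_completion(900, 1800))   # Ben on V1: 0.5
print(effective_completion(85, 2400))    # Cara on V3, early drop: about 0.035
```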
Implicit Feedback — Purchase History
You were buying for your niece's fifth birthday. Your niece is not five anymore; she's twenty-three. You will never buy princess fairy lights again.
Purchase history is the gold-standard implicit signal for e-commerce recommendation. A transaction represents the maximum commitment a user can make: they spent money, which is far more committed than a click or even a long scroll, so it is a strong positive signal about intent and satisfaction. But purchases are also the most context-dependent signal. Every purchase exists in a moment: was it a gift? A one-time need? An impulse? A recurring necessity? Gift-buying, seasonal purchases, one-time needs, and category exploration all create false taste signals if naively fed into a model; without modelling purchase intent, even the strongest signal can mislead badly.
| Purchase Pattern | Naive Interpretation | Correct Interpretation | Mitigation Strategy |
|---|---|---|---|
| One-time purchase of unusual category | Strong interest in category | Likely a gift or one-off need | Decay signal over time; require repeat purchase before high-weighting |
| Repeat purchase of same item | Strong preference — replenishment | Correct — high-confidence signal | Upweight heavily; trigger subscription or bulk-buy recommendation |
| Purchase immediately after browsing | Deliberate choice — strong signal | Correct — high intent | Full positive weight; expand into adjacent categories |
| Purchase + immediate return | Ambiguous | Negative — item did not meet expectations | Return event should flip purchase weight negative; suppress similar items |
| Seasonal spike purchase | Ongoing interest in category | Time-bound event (Christmas, birthday) | Timestamp features in model; only recommend seasonal content in window |
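The first mitigation in the table, discounting a category until it has been bought more than once, can be sketched as a per-category multiplier. The purchase log and the 0.3 discount below are hypothetical; in practice the discount would be tuned and combined with time decay and return handling.

```python
import numpy as np
import pandas as pd

# Hypothetical purchase log: one row per purchase
purchases = pd.DataFrame({
    'user'    : ['Ali', 'Ali', 'Ali', 'Ali'],
    'category': ['Toys', 'Grocery', 'Grocery', 'Kitchen'],
})
ONE_OFF_WEIGHT = 0.3   # assumed discount for a category bought only once
REPEAT_WEIGHT = 1.0    # full weight once a category has been bought repeatedly

counts = (purchases.groupby(['user', 'category'])
          .size()
          .rename('n_purchases')
          .reset_index())
counts['category_weight'] = np.where(counts['n_purchases'] >= 2,
                                     REPEAT_WEIGHT, ONE_OFF_WEIGHT)
print(counts.to_string(index=False))
```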
```python
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# ── Purchase history with context signals ─────────────────────
now = datetime(2024, 6, 15)
purchases = pd.DataFrame({
    'user'      : ['Ali','Ali','Ali','Ben','Ben'],
    'item'      : ['Blender','Coffee','Toy','Coffee','Headphones'],
    'category'  : ['Kitchen','Grocery','Toys','Grocery','Electronics'],
    'date'      : [now - timedelta(days=d) for d in [400, 30, 360, 20, 5]],
    'returned'  : [False, False, True, False, False],
    'repeat_buy': [False, True, False, True, False],
})

# ── Time decay: recent purchases are more relevant ────────────
purchases['days_ago'] = (now - purchases['date']).dt.days
# exp(-t/90): 90 days is the e-folding time (weight falls to ~37%);
# the equivalent half-life is 90·ln 2 ≈ 62 days
purchases['time_decay'] = np.exp(-purchases['days_ago'] / 90)

# ── Base signal: return = -1, normal purchase = +1 ────────────
purchases['base_signal'] = np.where(purchases['returned'], -1.0, 1.0)

# ── Repeat bonus multiplier ───────────────────────────────────
purchases['repeat_mult'] = np.where(purchases['repeat_buy'], 2.0, 1.0)

# ── Final implicit score ──────────────────────────────────────
purchases['impl_score'] = (purchases['base_signal']
                           * purchases['time_decay']
                           * purchases['repeat_mult'])

cols = ['user','item','days_ago','returned','repeat_buy','impl_score']
print(purchases[cols].round(3).to_string(index=False))
```
Ali's Blender purchase (400 days ago) is nearly irrelevant after time decay (0.012). Her Coffee, recent and repeated, scores 1.433, her strongest positive signal. The returned Toy correctly becomes a negative signal (−0.018), although after almost a year of decay it only weakly suppresses toy recommendations. Ben's repeated Coffee (1.601) and recent Headphones (0.946) are both strong signals with different recency weights. This single feature-engineering step transforms raw transaction logs into a nuanced taste profile.
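The decay in the purchase code is exp(−days/90). Ninety days there is the e-folding time (the weight falls to about 37%), not the half-life; the half-life is 90·ln 2 ≈ 62 days. Since $0.5^{t/h} = e^{-t/\tau}$ whenever $h = \tau \ln 2$, the two forms are interchangeable, as this small check shows:

```python
import numpy as np

def half_life_decay(days_ago, half_life_days):
    """Weight that halves every `half_life_days` days."""
    return 0.5 ** (np.asarray(days_ago, dtype=float) / half_life_days)

def exp_decay(days_ago, tau_days):
    """Weight exp(-t / tau); tau is the e-folding time, not the half-life."""
    return np.exp(-np.asarray(days_ago, dtype=float) / tau_days)

tau = 90.0
half_life = tau * np.log(2)          # about 62.4 days
days = [0, 30, 90, 360]
print(np.allclose(exp_decay(days, tau), half_life_decay(days, half_life)))
print(half_life_decay(90, 90.0))     # a true 90-day half-life gives 0.5 at day 90
```

If a 90-day half-life is what you actually want, use `half_life_decay(days, 90)`; the scores in the table would then be somewhat higher for every purchase.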
Combining All Signals — The Unified Feedback Matrix
In production, no single feedback signal is trusted alone. Every signal has biases, gaps, and misinterpretation risks. The art of feedback engineering is building a unified confidence-weighted interaction score that aggregates all available signals into a single interpretable value.
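A confidence hierarchy over signal types guides how much weight each receives: passive impressions at the bottom, then clicks and partial watches, then full watches, likes and ratings, with purchases, rewatches and explicit dislikes carrying the most weight. A single purchase or explicit dislike outweighs dozens of passive impressions. Production systems accumulate evidence across all tiers before drawing strong conclusions.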
```python
import pandas as pd

# ── Unified implicit score from multiple signal types ─────────
# Signal weights: magnitude = trustworthiness, sign = direction of preference
SIGNAL_WEIGHTS = {
    'impression' : 0.01,
    'click'      : 0.10,
    'watch_50pct': 0.30,
    'watch_full' : 0.60,
    'like'       : 0.70,
    'dislike'    : -1.50,   # negative and high-magnitude
    'rating_4'   : 0.65,
    'rating_5'   : 0.90,
    'rating_1'   : -1.20,
    'purchase'   : 1.00,
    'rewatch'    : 1.20,
}

# ── Example: Ali's interactions with item V1 ─────────────────
ali_v1_signals = ['impression', 'click', 'watch_full', 'rewatch', 'like']
ali_v1_score = sum(SIGNAL_WEIGHTS[s] for s in ali_v1_signals)

# ── Example: Ben's interactions with item V2 ─────────────────
ben_v2_signals = ['impression', 'click', 'watch_50pct', 'dislike']
ben_v2_score = sum(SIGNAL_WEIGHTS[s] for s in ben_v2_signals)

# ── Build interaction table ───────────────────────────────────
data = []
for sig in ali_v1_signals:
    data.append({'user': 'Ali', 'item': 'V1', 'signal': sig, 'weight': SIGNAL_WEIGHTS[sig]})
for sig in ben_v2_signals:
    data.append({'user': 'Ben', 'item': 'V2', 'signal': sig, 'weight': SIGNAL_WEIGHTS[sig]})

df = pd.DataFrame(data)
summary = df.groupby(['user', 'item'])['weight'].sum().reset_index()
summary.columns = ['user', 'item', 'unified_score']

print(df.to_string(index=False))
print("\nUnified Interaction Scores:")
print(summary.round(2).to_string(index=False))
```
Ali's unified score of +2.61 makes V1 a top recommendation candidate for Ali, confirmed by every signal tier. Ben's score of −1.09 marks V2 for suppression in his feed: the single dislike (−1.50) outweighs the combined impression, click and half-watch (+0.41). This is the power of unified signal weighting: a multi-signal portrait of preference that no single signal reveals on its own.
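One way to hand a unified score to a matrix-factorisation model is the preference/confidence split from implicit-feedback ALS (Hu, Koren and Volinsky, 2008): preference p is 1 when the evidence is positive, and confidence c = 1 + α·r grows with the evidence. Applying |score| so that a strong dislike becomes a confident 0 is an adaptation assumed here, as is the α value. A minimal sketch on the two unified scores above:

```python
import pandas as pd

# Unified scores from the example above
summary = pd.DataFrame({
    'user'         : ['Ali', 'Ben'],
    'item'         : ['V1', 'V2'],
    'unified_score': [2.61, -1.09],
})
# User x item matrix; NaN means no observed interaction
matrix = summary.pivot(index='user', columns='item', values='unified_score')

ALPHA = 40.0  # confidence scaling; an assumed value to tune

# p_ui = 1 if the net evidence is positive, else 0 (NaN > 0 is False)
preference = (matrix > 0).astype(float)
# c_ui = 1 + alpha * |score|; unobserved cells keep the baseline confidence 1
confidence = 1.0 + ALPHA * matrix.abs().fillna(0.0)

print(preference)
print(confidence.round(1))
```

Ben's V2 ends up as p = 0 with confidence 44.6, a confident negative, while the cells with no interaction stay at p = 0 with confidence 1. That is the distinction between disliking and merely not clicking drawn in the comparison table.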
Explicit vs Implicit — The Complete Comparison
| Property | Explicit Feedback | Implicit Feedback |
|---|---|---|
| Volume / Density | Sparse — <2% of users provide it | Dense — 100% of users generate it continuously |
| Signal Noise | Low — intentional declaration | High — requires careful interpretation |
| Negative Signal | Clear — 1-star or thumbs-down | Ambiguous — not clicking ≠ disliking |
| User Effort | Requires deliberate action | Zero effort — automatic behavioural trace |
| Truthfulness | Susceptible to social desirability bias | Hard to fake — behaviour is honest |
| Best Use Case | Cold start seeding, preference calibration | Core training signal for large-scale CF |
| Key Preprocessing | Mean-centring per user, J-curve correction | IPS weighting, time decay, deduplication |
| Platforms That Rely On It | Yelp, IMDb, Goodreads, older Netflix | TikTok, YouTube, Amazon, Spotify, modern Netflix |