<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Google HEART + GSM — defining what success looks like</title>
<link rel="stylesheet" href="framework.css">
<style>
/* Page-accent — overrides framework.css fallback */
:root{--page-accent:var(--gold);--page-accent-soft:var(--gold-soft)}
/* HEART letters */
.letters{display:grid;grid-template-columns:repeat(5,1fr);gap:10px;margin:14px 0}
.ltr{background:#fff;border:1px solid var(--line);border-top:5px solid var(--page-accent);border-radius:10px;padding:14px;text-align:center}
.ltr .big{font-family:Georgia,serif;font-size:32px;color:var(--page-accent);font-weight:700;line-height:1}
.ltr .nm{font-weight:700;font-size:13px;margin:4px 0}
.ltr .ds{font-size:12px;color:var(--ink-soft);line-height:1.45}
/* GSM table */
.gsm{background:#fff;border:1px solid var(--line);border-radius:12px;padding:18px 22px;margin:14px 0;box-shadow:var(--shadow);border-left:5px solid var(--page-accent)}
.gsm table{width:100%;border-collapse:collapse;margin-top:10px;font-size:14px}
.gsm th,.gsm td{padding:9px 10px;text-align:left;border-bottom:1px solid var(--line);vertical-align:top}
.gsm thead th{background:var(--ink);color:#f3efe6;font-size:12px;letter-spacing:.05em;text-transform:uppercase}
.gsm tbody td:first-child{font-weight:700;color:var(--page-accent)}
@media(max-width:680px){.letters{grid-template-columns:repeat(3,1fr)}}
</style>
</head>
<body>
<nav class="sitenav">
<details>
<summary>📑 Jump to</summary>
<div class="navmenu">
<div class="navgrp"><h4>Start here</h4>
<a href="index.html"><b>← Home (goal & map)</b></a>
<a href="impact-saas-companies.html">SaaS / B2B field study</a>
<a href="impact-consumer-companies.html">Consumer-tech field study</a>
<a href="methodologies-comparison.html"><b>All methods compared →</b></a>
<a href="experiment-trustworthiness.html">How 40k tests actually work →</a>
<a href="jargon.html">Jargon (glossary)</a>
</div>
<div class="navgrp"><h4>Scoring & Input modeling</h4>
<a href="rice-framework.html">RICE (Intercom)</a>
<a href="north-star-framework.html">North Star (Amplitude / Slack)</a>
</div>
<div class="navgrp"><h4>Goal-laddering / Define first</h4>
<a href="v2mom-framework.html">V2MOM (Salesforce)</a>
<a href="pyramid-of-clarity-framework.html">Pyramid of Clarity (Asana)</a>
<a href="pr-faq-framework.html">PR-FAQ / Working Backwards (Amazon)</a>
<a class="cur" href="heart-framework.html">HEART (Google)</a>
<a href="dibb-framework.html">DIBB (Spotify)</a>
</div>
<div class="navgrp"><h4>Experimentation (SaaS)</h4>
<a href="microsoft-exp-framework.html">Microsoft ExP / CUPED</a>
<a href="linkedin-xlnt-framework.html">LinkedIn T-REX</a>
</div>
<div class="navgrp"><h4>Experimentation (Consumer)</h4>
<a href="netflix-experimentation.html">Netflix · ABlaze</a>
<a href="booking-experimentation.html">Booking.com</a>
<a href="airbnb-erf-framework.html">Airbnb ERF</a>
<a href="uber-xp-framework.html">Uber XP</a>
<a href="doordash-switchback-framework.html">DoorDash switchback</a>
<a href="lyft-experimentation.html">Lyft</a>
<a href="pinterest-ab-framework.html">Pinterest</a>
</div>
<div class="navgrp"><h4>AI labs</h4>
<a href="anthropic-pm-on-ai-exponential.html">Anthropic · PM on AI exponential</a>
<a href="google-customer-zero-2026.html">Google · "Customer zero" 2026</a>
</div>
<div class="navgrp"><h4>Written discipline</h4>
<a href="stripe-shaping-framework.html">Stripe shaping</a>
</div>
</div>
</details>
</nav>
<div class="wrap">
<header class="masthead">
<p class="kicker">Methods · Deep-dive · Define impact before building</p>
<h1>Google HEART + Goals-Signals-Metrics — defining what success looks like <span class="srcyr">2010</span></h1>
<p class="sub">From Google Research (Rodden, Hutchinson, Fu — CHI 2010): five UX outcome categories (<b>HEART</b>) plus a three-step ladder (<b>Goals → Signals → Metrics</b>) that turns intent into measurable numbers.</p>
<p class="sub">Used to set the success bar <em>before</em> an experiment runs — the metric-picking front end that pairs with controlled experimentation on the back end.</p>
<div class="goal"><span>Goal</span><br>Decide features by data-backed expected impact — choose by outcome, not by to-do list or opinion.</div>
</header>
<div class="eli">
<div class="lbl">🎓 8th-grade version</div>
Before you build a feature, write down what <b>winning</b> looks like: not "users will love it," but <em>one number</em> that will move in the right direction if it works. HEART gives you a 5-item menu of things a UX can move: are people <b>H</b>appy, do they <b>E</b>ngage, do new users <b>A</b>dopt, do existing users come back (<b>R</b>etention), can they finish the <b>T</b>ask. For each one you pick, ladder it down: <b>Goal</b> (what the user wants) → <b>Signal</b> (an action that proves it) → <b>Metric</b> (the number you'll count). Pick 2–3 dimensions, not all five. Build, ship, then check whether the metric actually moved.
</div>
<nav class="toc">
<a href="#headline">Honest headline</a>
<a href="#anatomy">HEART + GSM</a>
<a href="#mechanism">How it picks metrics</a>
<a href="#example">Worked example</a>
<a href="#apply">Apply to a sheet</a>
<a href="#limits">What it doesn't do</a>
<a href="methodologies-comparison.html" style="color:var(--gold);font-weight:700">Comparison table →</a>
</nav>
<div class="finding" id="headline">
<h2>The honest headline: a metric-picker, not an impact estimator</h2>
<p>HEART/GSM is a tool for answering <b>"what does success look like for this feature?"</b> before you ship. It picks the <em>metrics</em>. Whether the feature actually <em>moves</em> those metrics is decided later — usually by a <a class="cite" href="microsoft-exp-framework.html">controlled experiment</a>.</p>
<p>So Google's published method is split: <b>HEART/GSM at the front</b> ("here's the metric we expect to move") + <b>experimentation at the back</b> ("here's the lift we actually measured"). Either half alone is incomplete.</p>
</div>
<!-- ANATOMY -->
<h2 class="sec" id="anatomy">The anatomy — HEART is five dimensions, GSM is the ladder</h2>
<p class="secsub">HEART says "here are five things a UX could move." GSM says "for any goal you pick, ladder it down to a measurable metric in three steps." Use HEART as the menu, GSM as the recipe.</p>
<div class="letters">
<div class="ltr"><div class="big">H</div><div class="nm">Happiness</div><div class="ds">Subjective satisfaction (surveys, NPS — <a class="j" href="jargon.html#nps">Net Promoter Score</a>, a 0–10 "would you recommend" question).</div></div>
<div class="ltr"><div class="big">E</div><div class="nm">Engagement</div><div class="ds">Frequency, depth, intensity of use.</div></div>
<div class="ltr"><div class="big">A</div><div class="nm">Adoption</div><div class="ds">New users / new feature uptake.</div></div>
<div class="ltr"><div class="big">R</div><div class="nm">Retention</div><div class="ds">Returning users over time.</div></div>
<div class="ltr"><div class="big">T</div><div class="nm">Task success</div><div class="ds">Completion / time / error rates.</div></div>
</div>
<div class="gsm">
<h3 style="margin:0 0 4px;font-family:Georgia,serif;font-size:17px">Goals → Signals → Metrics — the ladder</h3>
<p style="margin:0;font-size:13.5px;color:var(--ink-soft)">For each HEART dimension you care about, ladder it down through three layers. The discipline is that <b>every metric must trace upward to a stated goal</b> — no orphan metrics.</p>
<table>
<thead><tr><th>Layer</th><th>Question</th><th>Example (Engagement)</th></tr></thead>
<tbody>
<tr><td>Goal</td><td>What is the user trying to do?</td><td>Find & rebook restaurants they liked.</td></tr>
<tr><td>Signal</td><td>What observable behaviour shows progress / failure?</td><td>Return visits to a merchant page; re-booking same restaurant.</td></tr>
<tr><td>Metric</td><td>How do we count that signal?</td><td><code>% of bookings from previously-visited merchants per user / month</code></td></tr>
</tbody>
</table>
</div>
<div class="src">Source: <a class="cite" href="https://research.google/pubs/measuring-the-user-experience-on-a-large-scale-user-centered-metrics-for-web-applications/">Google Research — Rodden, Hutchinson, Fu, "Measuring the User Experience on a Large Scale" (CHI 2010)</a> · <a class="cite" href="https://ixdf.org/literature/article/google-s-heart-framework-for-measuring-ux">IxDF explainer</a>.</div>
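<p>A minimal Python sketch of how the Engagement metric in the table above might be counted. The input shapes (<code>bookings</code> as user/merchant pairs for one month, <code>visited_before</code> as a per-user set of merchants) and the function name <code>rebooking_share</code> are assumptions made for illustration; the paper does not prescribe a schema.</p>
<pre>
```python
from collections import defaultdict


def rebooking_share(bookings, visited_before):
    """Per-user % of a month's bookings made at previously visited merchants.

    bookings: iterable of (user_id, merchant_id) pairs for one calendar month.
    visited_before: dict of user_id to the set of merchant_ids that user had
        visited before the month started (hypothetical input shape).
    """
    total = defaultdict(int)
    repeat = defaultdict(int)
    for user_id, merchant_id in bookings:
        total[user_id] += 1
        if merchant_id in visited_before.get(user_id, set()):
            repeat[user_id] += 1
    # Only users with at least one booking appear, so there is no division by zero.
    return {user: 100.0 * repeat[user] / total[user] for user in total}


# Toy month: "u1" books three times, two of them at merchants seen before.
bookings = [("u1", "m1"), ("u1", "m2"), ("u1", "m9"), ("u2", "m3")]
visited_before = {"u1": {"m1", "m2"}, "u2": set()}
shares = rebooking_share(bookings, visited_before)
```
</pre>
<p>Writing the formula out forces the choices the one-line metric leaves open: what counts as "previously visited", and whether users with no bookings in the month are excluded (here they are).</p>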
<!-- MECHANISM -->
<h2 class="sec" id="mechanism">How HEART/GSM actually picks the metrics</h2>
<p class="secsub">Four steps. Step 1 matters most: deciding which HEART dimensions to <em>ignore</em> is more important than deciding which to include.</p>
<div class="step"><div class="num">1</div><div><h3>Choose 2–3 HEART dimensions, not all five</h3><p>Most features don't move all five. A new search UI is about Task success + Engagement, not Happiness. Naming the dimensions you're <em>not</em> measuring is the discipline that keeps the metric list short.</p></div></div>
<div class="step"><div class="num">2</div><div><h3>For each dimension, write a goal in user language</h3><p>"User can rebook in &lt; 3 clicks", not "the booking funnel is faster." The goal must be observable from the user's perspective.</p></div></div>
<div class="step"><div class="num">3</div><div><h3>Identify signals — observable behaviours</h3><p>What action would a user take that <em>proves</em> the goal is met? A signal is more concrete than a goal but not yet a number.</p></div></div>
<div class="step"><div class="num">4</div><div><h3>Pick the metric — how the signal is counted</h3><p>The metric is the formula. Often there are several candidates per signal — pick the one cheapest to compute and least gameable.</p></div></div>
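<p>The four steps above can be sketched as a small record plus a check. This is an illustrative structure, not part of the published framework: the names <code>Ladder</code> and <code>plan_problems</code> are hypothetical, the cap of three dimensions follows the "2–3, not all five" advice, and the "7-day return rate" row is an invented example of an orphan metric.</p>
<pre>
```python
from dataclasses import dataclass

HEART = ("Happiness", "Engagement", "Adoption", "Retention", "Task success")


@dataclass(frozen=True)
class Ladder:
    dimension: str  # step 1: one of the five HEART dimensions
    goal: str       # step 2: the goal, in user language
    signal: str     # step 3: the observable behaviour
    metric: str     # step 4: the formula that counts the signal


def plan_problems(ladders):
    """Return human-readable problems; an empty list means the plan passes."""
    problems = []
    dimensions = {ladder.dimension for ladder in ladders}
    for name in sorted(dimensions - set(HEART)):
        problems.append(f"unknown dimension: {name}")
    if len(dimensions) > 3:
        problems.append("more than 3 dimensions: drop some")
    for ladder in ladders:
        # A metric with no goal or signal behind it cannot be traced upward.
        if not (ladder.goal.strip() and ladder.signal.strip()):
            problems.append(f"orphan metric: {ladder.metric}")
    return problems


plan = [
    Ladder(
        "Engagement",
        "Find and rebook restaurants they liked",
        "Re-booking the same restaurant",
        "% of bookings from previously-visited merchants per user / month",
    ),
    Ladder("Retention", "", "", "7-day return rate"),  # no goal or signal
]
problems = plan_problems(plan)
```
</pre>
<p>Here <code>problems</code> flags only the second row, because its metric does not trace back to a stated goal.</p>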
<!-- EXAMPLE -->
<h2 class="sec" id="example">Worked example — HEART/GSM applied to Gmail's "Snooze" feature</h2>
<p class="secsub">Illustrative application of the published Rodden/Hutchinson/Fu framework to a real Gmail feature, to show how the ladder runs end-to-end. The dimensions and ladder are Google's; the specific goals/signals/metrics here are an educational reconstruction.</p>
<div class="gsm">
<table>
<thead><tr><th>Dimension</th><th>Goal</th><th>Signal</th><th>Metric</th></tr></thead>
<tbody>
<tr><td>Adoption</td><td>Users discover Snooze and try it.</td><td>First-time use of the Snooze action on a thread.</td><td><code>% of weekly active users who use Snooze at least once / month</code></td></tr>
<tr><td>Engagement</td><td>Snooze becomes a repeat habit, not a one-off.</td><td>Re-use of Snooze across multiple sessions.</td><td><code>median snoozes per active snoozer / week</code></td></tr>
<tr><td>Task success</td><td>Snoozed messages re-surface and get acted on.</td><td>Re-surfaced thread is opened, replied to, or archived.</td><td><code>% of snoozed threads acted on within 24h of re-surfacing</code></td></tr>
<tr><td>Happiness (guardrail)</td><td>Snooze doesn't cause messages to be lost or missed.</td><td>Complaints, "where did my email go?" support contacts.</td><td><code>support contacts mentioning Snooze / 1k snoozers</code></td></tr>
</tbody>
</table>
</div>
<p style="font-size:13px;color:var(--ink-soft)">Reading: four metrics, all traceable upward to a goal. The Happiness row is treated as a <em>guardrail</em> — it can't drive shipping the feature, but it can stop you from shipping if it goes the wrong way. (The word "guardrail" comes from the experimentation literature, not the 2010 HEART paper — adding one is common practice when HEART is used alongside <a class="cite" href="microsoft-exp-framework.html">controlled experiments</a>.) Notice how the ladder forces you off "engagement is up" hand-waving onto a specific number you can argue with.</p>
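<p>To make two of the rows above concrete, here is a hedged Python sketch of the Task success metric and the Happiness guardrail. The event shapes and the function names <code>acted_within_24h_pct</code> and <code>contacts_per_1k</code> are assumptions for illustration, as is treating "within 24h" as inclusive of exactly 24 hours; none of this describes Gmail's actual instrumentation.</p>
<pre>
```python
from datetime import datetime, timedelta


def acted_within_24h_pct(threads):
    """% of snoozed threads acted on within 24h of re-surfacing.

    threads: list of (resurfaced_at, first_action_at) pairs, where
        first_action_at is None if the thread was never opened, replied to,
        or archived (hypothetical input shape).
    """
    if not threads:
        return 0.0
    window = timedelta(hours=24)
    hits = sum(
        1
        for resurfaced_at, acted_at in threads
        if acted_at is not None
        and window >= acted_at - resurfaced_at >= timedelta(0)
    )
    return 100.0 * hits / len(threads)


def contacts_per_1k(support_contacts, snoozers):
    """Support contacts mentioning Snooze per 1,000 snoozers."""
    if snoozers == 0:
        return 0.0
    return 1000.0 * support_contacts / snoozers


# Toy data: four snoozed threads that all re-surface at the same moment.
t0 = datetime(2024, 1, 1, 9, 0)
threads = [
    (t0, t0 + timedelta(hours=2)),   # acted on in time
    (t0, t0 + timedelta(hours=30)),  # acted on too late
    (t0, None),                      # never acted on
    (t0, t0 + timedelta(hours=23)),  # acted on in time
]
task_success = acted_within_24h_pct(threads)
guardrail = contacts_per_1k(3, 2000)
```
</pre>
<p>With this toy data, two of the four threads count as acted on, and three contacts across 2,000 snoozers works out to 1.5 per 1k.</p>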
<!-- APPLY TO A SHEET -->
<h2 class="sec" id="apply">Apply to a feature sheet</h2>
<p class="secsub">HEART/GSM doesn't rank features — it tells you <strong>what metric each feature owes a result on</strong>. If you adopt it, every shipped feature in your backlog gains a HEART row: which dimension(s) it moves, the goal/signal/metric trio, baseline, target, actual, and the decision you make once the data lands.</p>
<div class="note" style="background:var(--teal-soft);border-left-color:var(--teal)"><b>Try it Monday morning (30 minutes).</b> Pick one shipped feature on your team. Open a doc. Write its goal in one sentence in <em>user</em> language ("a user wants to <em>X</em>"). Pick 2 HEART letters that fit. For each letter, name one signal (an action that proves the goal) and one metric (the number that counts the signal). That's it — you've HEART-laddered a feature. Do this for three features in a row and you'll spot which ones in your backlog have no measurable goal hiding behind them.</div>
<p style="font-size:13.5px;color:var(--ink-soft);margin:0 0 14px">From Google's CHI paper: pick 2–3 HEART dimensions per feature (not all five) — the paper notes that metrics should be chosen "based on the outcomes required" rather than applied wholesale. <span style="color:var(--ink-soft)">Common addition from the experimentation tradition: include at least one <em>guardrail</em> from a dimension you're <em>not</em> trying to lift (Happiness is the usual choice).</span></p>
<div class="extable" style="overflow-x:auto;margin:14px 0">
<table class="ex" style="border-collapse:collapse;width:100%;font-size:13.5px;background:#fff;border:1px solid var(--line);border-radius:10px;overflow:hidden">
<thead><tr><th>Column to add</th><th>What it captures</th><th>How you fill it</th></tr></thead>
<tbody>
<tr><td>Feature</td><td>Shipped or in-flight item</td><td>Backlog title</td></tr>
<tr><td>HEART dimension(s)</td><td>Which 2–3 dimensions this feature claims to move</td><td>H / E / A / R / T — pick 2–3, name the ones you're <em>not</em> moving</td></tr>
<tr><td>Goal</td><td>What the user is trying to do, in user language</td><td>One sentence — observable from the user's perspective</td></tr>
<tr><td>Signal</td><td>Observable behaviour that shows progress</td><td>The action that proves the goal is being met</td></tr>
<tr><td>Metric</td><td>The formula that counts the signal</td><td>Cheapest-to-compute, hardest-to-game candidate</td></tr>
<tr><td>Baseline</td><td>Where the metric is today (before the feature)</td><td>From analytics — must exist before the build</td></tr>
<tr><td>Target</td><td>Where you want the metric to land</td><td>An ambitious-but-defensible number</td></tr>
<tr><td>Guardrail</td><td>Dimension that must <em>not</em> regress (usually Happiness)</td><td>Support contacts, NPS, complaint rate</td></tr>
<tr><td>Actual</td><td>Result after ramp</td><td>Post-launch readout</td></tr>
<tr><td>Decision</td><td>Keep / Iterate / Roll back / Kill</td><td>Per the rule below</td></tr>
</tbody>
</table>
</div>
<h3 style="font-family:Georgia,serif;font-size:18px;margin:24px 0 8px">Worked example — backlog snapshot for an email-client team</h3>
<p style="font-size:13.5px;color:var(--ink-soft);margin:0 0 12px">Eight features at different post-ship stages, shaped after the Gmail-Snooze ladder above. Illustrative numbers — the point is the verdict logic.</p>
<div class="extable">
<table class="ex">
<thead><tr><th>Feature</th><th>HEART</th><th>Primary metric</th><th>Baseline</th><th>Target</th><th>Actual</th><th>Guardrail (Happiness)</th><th>Decision</th></tr></thead>
<tbody>
<tr class="top"><td>Snooze (canonical)</td><td>A + E</td><td>% <a class="j" href="jargon.html#wau-dau-mau">WAU</a> snoozing / month</td><td>0%</td><td>15%</td><td>12%</td><td>Support: −0.2 / 1k</td><td class="score">Keep</td></tr>
<tr class="top"><td>Smart Compose</td><td>E + T</td><td>% suggestions accepted</td><td>0%</td><td>25%</td><td>30%</td><td>NPS flat</td><td class="score">Keep</td></tr>
<tr class="top"><td>Schedule send</td><td>A</td><td>% WAU using / month</td><td>0%</td><td>8%</td><td>11%</td><td>OK</td><td class="score">Keep</td></tr>
<tr class="top"><td>Confidential mode</td><td>A</td><td>% threads w/ confidential on</td><td>0%</td><td>5%</td><td>1.8%</td><td>OK — niche feature</td><td class="score">Keep · niche</td></tr>
<tr><td>AI-summarize-thread</td><td>E + T</td><td>% of threads with a summary clicked</td><td>0%</td><td>20%</td><td>9%</td><td>Support: +1.4 / 1k (mild)</td><td class="score" style="color:var(--gold)">Iterate</td></tr>
<tr><td>Voice dictation</td><td>A + E</td><td>% weekly active users using</td><td>0%</td><td>10%</td><td>4%</td><td>OK</td><td class="score" style="color:var(--gold)">Iterate</td></tr>
<tr><td>Inline calendar suggestions</td><td>T</td><td>% of emails with an event created</td><td>1.2%</td><td>4%</td><td>0.6%</td><td>OK</td><td class="score" style="color:var(--accent)">Kill</td></tr>
<tr><td>Reading-pane redesign</td><td>T + H</td><td>time-to-first-action</td><td>5.2s</td><td>4.5s</td><td>4.8s</td><td>NPS −3 pts</td><td class="score" style="color:var(--accent)">Roll back</td></tr>
</tbody>
</table>
</div>
<div class="note" style="background:var(--accent-soft);border-left-color:var(--accent)"><b>The most important reading skill on this page.</b> Notice that the <em>decision rule</em> lives in the relationship between three columns — <b>primary metric</b> (did it move?), <b>guardrail</b> (did anything regress?), and <b>target</b> (did we say what success would be <em>before</em> shipping?). A feature without all three filled is unshippable not because it's bad, but because you can't tell whether it worked. HEART/GSM's entire point is forcing you to write those three columns down <em>before</em> you build.</div>
<div class="note"><b>Decision rule.</b> <b>Keep</b> when the primary metric reaches (or comes close to) target <em>and</em> the guardrail holds. "Niche" is a legal verdict: Confidential mode is kept, despite landing well short of target, because it serves a small segment well without harming others. <b>Iterate</b> when the metric moved less than expected but guardrails hold; change the variant, re-run. <b>Roll back</b> when the guardrail breaches (the reading-pane row: faster but unhappier). <b>Kill</b> when the metric didn't move, or moved the wrong way (the inline-calendar row), and there's no obvious next iteration. HEART/GSM defines <em>what</em> winning looks like; <a class="cite" href="microsoft-exp-framework.html">controlled experiments</a> tell you whether you won.</div>
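<p>The decision rule can be sketched as a small Python function. Several parts are assumptions rather than anything the rule states: the function name <code>decide</code>, the 0.8 cut-off standing in for "comes close to target", and the two judgment flags <code>serves_niche</code> and <code>has_next_iteration</code>, which a team would set by hand.</p>
<pre>
```python
def decide(baseline, target, actual, guardrail_holds,
           close_enough=0.8, serves_niche=False, has_next_iteration=True):
    """Map one backlog row to a verdict (illustrative thresholds only)."""
    if target == baseline:
        raise ValueError("target must differ from baseline")
    if not guardrail_holds:
        return "Roll back"
    # Share of the planned move that was achieved. The sign also works for
    # lower-is-better metrics, where target - baseline is negative.
    progress = (actual - baseline) / (target - baseline)
    if progress >= close_enough:
        return "Keep"
    if serves_niche and progress > 0:
        return "Keep (niche)"
    if progress > 0 or has_next_iteration:
        return "Iterate"
    return "Kill"


# Five rows from the backlog table: baseline, target, actual, guardrail.
verdicts = {
    "Snooze": decide(0, 15, 12, True),
    "Confidential mode": decide(0, 5, 1.8, True, serves_niche=True),
    "AI-summarize-thread": decide(0, 20, 9, True),
    "Inline calendar suggestions": decide(
        1.2, 4, 0.6, True, has_next_iteration=False
    ),
    "Reading-pane redesign": decide(5.2, 4.5, 4.8, False),
}
```
</pre>
<p>Checking the guardrail first mirrors the reading-pane row: a breach overrides any progress on the primary metric. The cut-off and flags are where human judgment enters, so treat the function as a way to make that judgment explicit, not as a replacement for it.</p>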
<!-- LIMITS -->
<h2 class="sec" id="limits">What HEART/GSM does not do</h2>
<div class="warn">
<b>It does not estimate the lift.</b> HEART picks <em>which</em> metric to watch; it does <em>not</em> say "we expect +5pp." That number comes from prior data, a <a class="cite" href="rice-framework.html">RICE Impact score</a>, a <a class="cite" href="pr-faq-framework.html">PR-FAQ Internal FAQ</a>, or a controlled <a class="cite" href="microsoft-exp-framework.html">experiment</a>.<br><br>
<b>It will produce too many metrics if you pick all five HEART dimensions.</b> The whole point of the framework is forcing trade-offs — discipline yourself to 2–3.
</div>
<div class="note"><b>Why HEART is on the list.</b> It is the cheapest, most teachable way to stop teams from saying "we'll just look at engagement" without naming a number. Five letters, one ladder — a 30-minute exercise per feature.</div>
<footer>
Companion to <a href="impact-consumer-companies.html#define">← Consumer case studies · Define impact first</a> · <a href="methodologies-comparison.html">All methods compared</a> · feeds: <a href="microsoft-exp-framework.html">Microsoft ExP</a> (HEART picks the OEC — <a class="j" href="jargon.html#oec">Overall Evaluation Criterion</a>, the single number an experiment is judged on)<br>
<b>Grounded in</b> Rodden, Hutchinson & Fu (CHI 2010) <em>Measuring the User Experience on a Large Scale: User-Centered Metrics for Web Applications</em> — the original Google paper. <b>Verbatim from paper:</b> the title, authors, venue (CHI 2010), and the framing that metrics should be chosen "based on the outcomes required" rather than applying all five dimensions. <b>Paraphrased structurally:</b> the per-dimension definitions (summarised from the paper's well-known table) and the Goals → Signals → Metrics ladder (the paper introduces this process; the three-row table form here is our restatement). <b>Added by us, not in the paper:</b> the word "guardrail" (from the experimentation tradition — see Microsoft ExP); the Gmail Snooze ladder (educational reconstruction); the email-client backlog table (illustrative).
</footer>
</div>
</body>
</html>