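Web Performance Monitoring and Observability

Modern web applications keep growing in complexity, and debugging during development is far from enough: production environments differ widely in user devices, networks, and regions. The core goal of a performance monitoring and observability system is to continuously measure, detect, diagnose, and fix problems in real users' real environments.

The Three Pillars of Observability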

┌───────────────────────────────────────────────────┐
│                  Observability                    │
│                                                   │
│   ┌───────────┐  ┌───────────┐  ┌───────────┐     │
│   │  Metrics  │  │   Logs    │  │  Traces   │     │
│   │           │  │           │  │           │     │
│   │ LCP/CLS/  │  │ JS errors │  │ User flow │     │
│   │ INP/TTFB  │  │ API errors│  │ Requests  │     │
│   │ Resources │  │ Resource  │  │ Call graph│     │
│   │           │  │ failures  │  │           │     │
│   └───────────┘  └───────────┘  └───────────┘     │
│                                                   │
│   "You can't optimize what you can't measure."    │
│            (often attributed to Peter Drucker)    │
└───────────────────────────────────────────────────┘

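This article covers eight areas in turn: the metric system, data collection, error monitoring, user behavior, reporting strategy, Sentry in practice, a self-built monitoring architecture, and frequently asked interview questions. Together they form a complete picture of front-end monitoring and observability.

1. Performance Metrics

Core Web Vitals

Core Web Vitals are the set of core metrics Google defined to measure real-user experience. They feed into search ranking (SEO) as a page-experience signal and cover three dimensions of experience:

Core Web Vitals (the current set since 2024)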

┌──────────────────┬──────────────────┬──────────────────┐
│     Loading      │  Interactivity   │ Visual stability │
│                  │                  │                  │
│       LCP        │       INP        │       CLS        │
│  Largest         │  Interaction     │  Cumulative      │
│  Contentful      │  to Next         │  Layout          │
│  Paint           │  Paint           │  Shift           │
└──────────────────┴──────────────────┴──────────────────┘

History:
  2020     FID introduced as the interactivity metric
  2024-03  INP officially replaces FID

LCP（Largest Contentful Paint）

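LCP measures when the largest content element visible in the viewport finishes rendering, which approximates the moment the user perceives the page as loaded.

Candidate element types:

<img> elements
<image> elements inside an <svg>
<video> elements (the poster image)
Block-level elements with a background image loaded via url()
Block-level elements containing text nodes

LCP timeline:

navigationStart
  │
  ├── TTFB ──┤
  │          FCP
  │           │
  │           ├── resource load (image/font/CSS) ──┤
  │           │                                    LCP fires
  │           │                                     │
  ├───────────┼─────────────────────────────────────┤
  0s         1s                                   2.5s (Good)

LCP thresholds: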

  Good              Needs Improvement              Poor
  ≤ 2.5s         ─────────────────────          > 4.0s
├──────────────┼──────────────────────────┼──────────────┤
0s            2.5s                       4.0s

LCP breakdown:

LCP = TTFB + Resource Load Delay + Resource Load Time + Element Render Delay

┌────────┬──────────────────┬──────────────────┬──────────────────┐
│  TTFB  │ Resource Load    │ Resource Load    │ Element Render   │
│        │ Delay            │ Time             │ Delay            │
│        │ (TTFB until the  │ (time spent      │ (download done   │
│        │ request starts)  │ downloading)     │ until rendered)  │
└────────┴──────────────────┴──────────────────┴──────────────────┘
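
The breakdown can be checked with simple arithmetic. The sketch below is illustrative only: `breakdownLcp` and its input field names are hypothetical, and all inputs are assumed to be milliseconds measured from the start of navigation.

```javascript
// Hypothetical helper: split an LCP time into the four sub-parts.
// Every input is a timestamp in ms relative to navigation start (an assumption).
function breakdownLcp({ ttfb, resourceRequestStart, resourceResponseEnd, lcp }) {
  return {
    ttfb,
    resourceLoadDelay: resourceRequestStart - ttfb, // first byte until the LCP resource is requested
    resourceLoadTime: resourceResponseEnd - resourceRequestStart, // download duration
    elementRenderDelay: lcp - resourceResponseEnd, // download finished until the element is painted
  };
}

const parts = breakdownLcp({
  ttfb: 400,
  resourceRequestStart: 900,
  resourceResponseEnd: 1700,
  lcp: 2100,
});

// The four parts always add back up to the LCP value.
const total =
  parts.ttfb + parts.resourceLoadDelay + parts.resourceLoadTime + parts.elementRenderDelay;
console.log(parts, total);
```

Whichever part dominates suggests where to look first: a large load delay points at late resource discovery, a large render delay at render-blocking work.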

INP（Interaction to Next Paint）

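INP has replaced FID as the core metric for page responsiveness. FID measured only the input delay of the first interaction. INP considers every interaction over the page's lifetime and reports the slowest one; on pages with many interactions it ignores one outlier per 50 interactions, which is roughly the 98th percentile (P98).

The essential difference between INP and FID:

FID: measures only the Input Delay of the first interaction
     user clicks → event handling starts
     ├── Input Delay ──┤
     measured once

INP: measures the full latency of every interaction
     user clicks → event handling → next frame painted
     ├── Input Delay ──┼── Processing ──┼── Presentation Delay ──┤
     │                                                          │
     interaction latency = Input Delay + Processing Time + Presentation Delay

INP reports the slowest interaction (about P98 when there are many interactions)

INP thresholds: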

  Good              Needs Improvement              Poor
  ≤ 200ms        ─────────────────────          > 500ms
├──────────────┼──────────────────────────┼──────────────┤
0ms          200ms                      500ms

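Interaction types covered by INP:

Mouse clicks (click)
Taps on a touchscreen (tap)
Key presses (keyboard)
Hover and scroll are not included

CLS（Cumulative Layout Shift）

CLS measures the unexpected layout shifts that occur over the page's lifetime and reflects visual stability. The reported value is the score of the worst session window, not a plain sum of every shift.

Layout shift score:

Layout Shift Score = Impact Fraction × Distance Fraction

Impact Fraction: the combined area the unstable element occupies before and after the shift, as a share of the viewport
Distance Fraction: the distance the element moved, as a share of the viewport

Example:
  An element occupying 50% of the viewport moves down by 25% of the viewport height
  Impact Fraction = 0.5 + 0.25 = 0.75 (union of the old and new positions)
  Score = 0.75 × 0.25 = 0.1875

The CLS session window algorithm:

  ┌── Session Window 1 ─┐      ┌── Session Window 2 ─┐
  │ shift shift shift   │      │ shift  shift        │
  ├─────────────────────┤      ├─────────────────────┤
  │ gap < 1s, span ≤ 5s │      │ gap < 1s, span ≤ 5s │
  └─────────────────────┘      └─────────────────────┘
         ↓                            ↓
   window score 0.15           window score 0.08
         ↓
  CLS = max(all window scores) = 0.15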
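
As a worked example of the session-window rule, the sketch below computes CLS from a plain list of shifts. `computeCls` and the `{ time, value }` shape are hypothetical; the input is assumed to be sorted by time and already filtered to shifts without recent user input.

```javascript
// Hypothetical helper: CLS as the maximum session-window score.
// A shift joins the current window if it is < 1s after the previous shift
// and < 5s after the first shift in that window; otherwise a new window starts.
function computeCls(shifts) {
  let best = 0;
  let windowScore = 0;
  let windowStart = 0;
  let lastTime = 0;

  for (const { time, value } of shifts) {
    const sameWindow =
      windowScore > 0 && time - lastTime < 1000 && time - windowStart < 5000;
    if (sameWindow) {
      windowScore += value;
    } else {
      windowScore = value;
      windowStart = time;
    }
    lastTime = time;
    best = Math.max(best, windowScore);
  }
  return best;
}

const cls = computeCls([
  { time: 100, value: 0.05 },
  { time: 600, value: 0.06 },
  { time: 1200, value: 0.04 }, // window 1 ends here with about 0.15
  { time: 5000, value: 0.03 }, // gap of 3.8s starts window 2
  { time: 5500, value: 0.05 }, // window 2 totals about 0.08
]);
console.log(cls.toFixed(2)); // "0.15"
```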

CLS thresholds:

  Good              Needs Improvement              Poor
  ≤ 0.1          ─────────────────────          > 0.25
├──────────────┼──────────────────────────┼──────────────┤
0             0.1                        0.25

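Common causes of CLS:

Images and videos without explicit dimensions
Dynamically injected content (ads, pop-ups)
Web font loading that causes FOIT/FOUT
DOM manipulation that moves existing elements

Other Key Metrics

Metrics at a glance:

Timeline:
  ├── TTFB ──┼── FCP ──┼────── LCP ──────┤
  │          │         │                 │
  │  Server  │  First  │     Largest     │
  │ response │ content │  content paint  │
  └──────────┴─────────┴─────────────────┘

  ├─── FID/INP ───┤   (interaction responsiveness)
  ├─── TBT ───────┤   (main-thread blocking time)
  ├─── CLS ───────────────────────────┤  (visual stability)

Metric	Full name	Meaning	Good	Needs Improvement	Poor
TTFB	Time to First Byte	Time from issuing the request until the first response byte arrives	≤ 800ms	800ms ~ 1800ms	> 1800ms
FCP	First Contentful Paint	Time until any text/image/SVG/Canvas content is first rendered	≤ 1.8s	1.8s ~ 3.0s	> 3.0s
LCP	Largest Contentful Paint	Time until the largest visible content finishes rendering	≤ 2.5s	2.5s ~ 4.0s	> 4.0s
INP	Interaction to Next Paint	Latency from an interaction to the next painted frame (about P98)	≤ 200ms	200ms ~ 500ms	> 500ms
CLS	Cumulative Layout Shift	Cumulative layout shift score	≤ 0.1	0.1 ~ 0.25	> 0.25
FID	First Input Delay	Input delay of the first interaction (replaced by INP)	≤ 100ms	100ms ~ 300ms	> 300ms
TBT	Total Blocking Time	Sum of main-thread blocking time from all long tasks between FCP and TTI	≤ 200ms	200ms ~ 600ms	> 600ms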
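
The thresholds in the table translate directly into a rating function. This is a minimal sketch; `THRESHOLDS` and `rateMetric` are hypothetical names, and the numbers are copied from the table above (milliseconds for time-based metrics, a unitless score for CLS).

```javascript
// [good upper bound, poor lower bound] per metric, taken from the table above.
const THRESHOLDS = {
  TTFB: [800, 1800],
  FCP: [1800, 3000],
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
  FID: [100, 300],
  TBT: [200, 600],
};

// Hypothetical helper: map a metric value to good / needs-improvement / poor.
function rateMetric(name, value) {
  const limits = THRESHOLDS[name];
  if (!limits) throw new Error(`Unknown metric: ${name}`);
  const [good, poor] = limits;
  if (value <= good) return 'good';
  return value <= poor ? 'needs-improvement' : 'poor';
}

console.log(rateMetric('LCP', 2300)); // "good"
console.log(rateMetric('INP', 350)); // "needs-improvement"
console.log(rateMetric('CLS', 0.3)); // "poor"
```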

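Relationships Between Metrics

How the metrics relate:

                    ┌──────────┐
                    │   TTFB   │
                    │  server  │
                    └────┬─────┘
                         │ affects
                         ▼
                    ┌──────────┐
                    │   FCP    │
                    │  first   │
                    │ content  │
                    └────┬─────┘
                         │ affects
                         ▼
                    ┌──────────┐
                    │   LCP    │◄──── resource loading + render blocking
                    │ largest  │
                    │ content  │
                    └──────────┘

  ┌──────────┐     ┌──────────┐
  │   TBT    │────►│   INP    │
  │ blocking │     │ response │
  └──────────┘     └──────────┘
 main thread busy   janky interactions

  ┌──────────┐
  │   CLS    │◄──── independent dimension, not tied to load speed
  │ layout   │
  └──────────┘

Key takeaways:
  TTFB ─affects→ FCP ─affects→ LCP   (loading chain)
  TBT  ─correlates→ INP              (interaction responsiveness)
  CLS  ─independent─                 (visual stability)

2. Performance Data Collection

Performance API Basics

Browsers expose a comprehensive Performance API for collecting performance data. Its core pieces are Navigation Timing, Resource Timing, and PerformanceObserver.

Navigation Timing Level 2 timeline: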

startTime
  │
  ├─ redirectStart ─── redirectEnd
  │                        │
  │                   fetchStart
  │                        │
  │                   domainLookupStart ─── domainLookupEnd
  │                                             │
  │                                        connectStart
  │                                             │
  │                                   secureConnectionStart (HTTPS)
  │                                             │
  │                                        connectEnd
  │                                             │
  │                                      requestStart
  │                                             │
  │                                      responseStart ──── TTFB
  │                                             │
  │                                      responseEnd
  │                                             │
  │                                    domInteractive
  │                                             │
  │                              domContentLoadedEventStart
  │                                             │
  │                              domContentLoadedEventEnd
  │                                             │
  │                                       domComplete
  │                                             │
  │                                   loadEventStart
  │                                             │
  │                                   loadEventEnd
  │
  └────────────────────────────────────────────────

const [entry] = performance.getEntriesByType('navigation');

const timing = {
  redirect: entry.redirectEnd - entry.redirectStart,
  dns: entry.domainLookupEnd - entry.domainLookupStart,
  tcp: entry.connectEnd - entry.connectStart,
  ssl: entry.secureConnectionStart > 0
    ? entry.connectEnd - entry.secureConnectionStart
    : 0,
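  // Wait between sending the request and the first response byte. The TTFB Web Vital
  // is measured from the start of navigation instead (entry.responseStart - entry.startTime).
  ttfb: entry.responseStart - entry.requestStart,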
  download: entry.responseEnd - entry.responseStart,
  domParse: entry.domInteractive - entry.responseEnd,
  domContentLoaded: entry.domContentLoadedEventEnd - entry.domContentLoadedEventStart,
  domComplete: entry.domComplete - entry.domInteractive,
  loadEvent: entry.loadEventEnd - entry.loadEventStart,
  total: entry.loadEventEnd - entry.startTime,
};

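PerformanceObserver

PerformanceObserver is the core API for modern performance data collection. It uses the observer pattern to deliver performance entries asynchronously, which avoids the overhead of polling.

// The callback runs whenever new entries of the observed type are recorded.
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(entry.name, entry.startTime, entry.duration);
  }
});

observer.observe({ type: 'largest-contentful-paint', buffered: true });

buffered: true replays entries that were recorded before the observer was registered. This matters when the monitoring SDK is initialized late in the page load.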

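Collecting Core Web Vitals

Collecting LCP

let lcpValue = 0;
let lcpReported = false;

// Keep the latest candidate: the browser emits a new entry each time a larger element renders.
function handleLcpEntries(entries) {
  const lastEntry = entries[entries.length - 1];
  if (lastEntry) lcpValue = lastEntry.startTime;
}

const lcpObserver = new PerformanceObserver((list) => {
  handleLcpEntries(list.getEntries());
});

lcpObserver.observe({ type: 'largest-contentful-paint', buffered: true });

document.addEventListener('visibilitychange', () => {
  if (document.visibilityState === 'hidden' && !lcpReported) {
    lcpReported = true;
    // Process entries that were queued but not yet delivered to the callback.
    handleLcpEntries(lcpObserver.takeRecords());
    lcpObserver.disconnect();
    report({ name: 'LCP', value: lcpValue });
  }
});

LCP is reported when the page becomes hidden because the browser may keep emitting larger candidates until then. It stops emitting them once the user interacts (click, tap, key press, scroll) or the page moves to the background, so the last candidate seen at that point is the final value.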

Collecting FCP

const fcpObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.name === 'first-contentful-paint') {
      report({ name: 'FCP', value: entry.startTime });
      fcpObserver.disconnect();
    }
  }
});

fcpObserver.observe({ type: 'paint', buffered: true });

Collecting CLS

let clsValue = 0;
let sessionValue = 0;
let sessionEntries = [];

const clsObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (!entry.hadRecentInput) {
      const firstEntry = sessionEntries[0];
      const lastEntry = sessionEntries[sessionEntries.length - 1];

      if (
        sessionValue &&
        entry.startTime - lastEntry.startTime < 1000 &&
        entry.startTime - firstEntry.startTime < 5000
      ) {
        sessionValue += entry.value;
        sessionEntries.push(entry);
      } else {
        sessionValue = entry.value;
        sessionEntries = [entry];
      }

      if (sessionValue > clsValue) {
        clsValue = sessionValue;
      }
    }
  }
});

clsObserver.observe({ type: 'layout-shift', buffered: true });

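Collecting CLS requires implementing the session window algorithm:

Adjacent shifts within one session window are less than 1 second apart
A window spans no more than 5 seconds in total
The highest-scoring window is the CLS value
hadRecentInput excludes layout shifts caused by the user's own actions

Collecting INP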

let inpValue = 0;
const interactions = [];

const inpObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.interactionId) {
      const existing = interactions.find(
        (i) => i.interactionId === entry.interactionId
      );
      if (existing) {
        existing.duration = Math.max(existing.duration, entry.duration);
      } else {
        interactions.push({
          interactionId: entry.interactionId,
          duration: entry.duration,
        });
      }
    }
  }

  interactions.sort((a, b) => b.duration - a.duration);
  const p98Index = Math.floor(interactions.length / 50);
  inpValue = interactions[p98Index]?.duration || 0;
});

inpObserver.observe({ type: 'event', buffered: true, durationThreshold: 16 });

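Key points for collecting INP:

Use a PerformanceObserver of type event
One interactionId can produce several events (for example keydown + keyup); take the largest duration
Report the P98 (98th percentile) as the INP value, which is the worst interaction when there are fewer than 50

The web-vitals Library

The web-vitals library, maintained by Google, encapsulates the logic above and is the recommended choice in production: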

import { onLCP, onINP, onCLS, onFCP, onTTFB } from 'web-vitals';

function sendToAnalytics(metric) {
  const body = JSON.stringify({
    name: metric.name,
    value: metric.value,
    rating: metric.rating,
    delta: metric.delta,
    id: metric.id,
    navigationType: metric.navigationType,
  });

  if (navigator.sendBeacon) {
    navigator.sendBeacon('/analytics', body);
  } else {
    fetch('/analytics', { body, method: 'POST', keepalive: true });
  }
}

onLCP(sendToAnalytics);
onINP(sendToAnalytics);
onCLS(sendToAnalytics);
onFCP(sendToAnalytics);
onTTFB(sendToAnalytics);

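Advantages of the web-vitals library:

Implements the full collection rules for each metric (session windows, P98, and so on)
Handles the visibilitychange and pagehide reporting moments automatically
Provides a rating field (good / needs-improvement / poor)
Supports an attribution build that explains each metric (which element triggered LCP, which interaction caused INP)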

import { onLCP } from 'web-vitals/attribution';

onLCP((metric) => {
  console.log(metric.attribution.element);
  console.log(metric.attribution.url);
  console.log(metric.attribution.timeToFirstByte);
  console.log(metric.attribution.resourceLoadDelay);
  console.log(metric.attribution.resourceLoadDuration);
  console.log(metric.attribution.elementRenderDelay);
});

Resource Timing API

The Resource Timing API provides detailed timing for every resource the page loads (JS, CSS, images, fonts, and so on).

const resourceObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const resource = {
      name: entry.name,
      type: entry.initiatorType,
      duration: entry.duration,
      transferSize: entry.transferSize,
      decodedBodySize: entry.decodedBodySize,
      dns: entry.domainLookupEnd - entry.domainLookupStart,
      tcp: entry.connectEnd - entry.connectStart,
      ttfb: entry.responseStart - entry.requestStart,
      download: entry.responseEnd - entry.responseStart,
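      // transferSize is also 0 for cross-origin resources served without
      // Timing-Allow-Origin, so treat this as a hint, not proof of a cache hit.
      isCache: entry.transferSize === 0,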
    };

    if (resource.duration > 3000) {
      reportSlowResource(resource);
    }
  }
});

resourceObserver.observe({ type: 'resource', buffered: true });

Commonly used Resource Timing fields:

┌──────────────────────┬────────────────────────────────────────────┐
│ Field                │ Meaning                                    │
├──────────────────────┼────────────────────────────────────────────┤
│ name                 │ Resource URL                               │
│ initiatorType        │ What initiated it (script/link/img/fetch)  │
│ startTime            │ Time the fetch started                     │
│ duration             │ Total time (responseEnd - startTime)       │
│ transferSize         │ Bytes sent incl. headers (0 = from cache)  │
│ encodedBodySize      │ Body size as transferred (compressed)      │
│ decodedBodySize      │ Body size after decompression              │
│ responseStart        │ Time the first byte arrived                │
│ responseEnd          │ Time the response finished                 │
└──────────────────────┴────────────────────────────────────────────┘
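
The size fields can be combined into a rough classification. The sketch below is a heuristic under stated assumptions, not a rule from the specification: `classifyResource` is a hypothetical helper, and it treats an entry whose sizes are all zero as a cross-origin resource whose details were withheld (no Timing-Allow-Origin header).

```javascript
// Hypothetical helper operating on the size fields of a resource timing entry.
function classifyResource({ transferSize, encodedBodySize, decodedBodySize }) {
  let delivery;
  if (transferSize === 0 && decodedBodySize === 0) {
    delivery = 'opaque'; // sizes hidden, most likely cross-origin without Timing-Allow-Origin
  } else if (transferSize === 0) {
    delivery = 'cache'; // a body exists but nothing went over the network
  } else {
    delivery = 'network';
  }

  // How much the body expanded after decompression (null when unknown).
  const compressionRatio = encodedBodySize > 0 ? decodedBodySize / encodedBodySize : null;
  return { delivery, compressionRatio };
}

console.log(classifyResource({ transferSize: 0, encodedBodySize: 1000, decodedBodySize: 4000 }));
console.log(classifyResource({ transferSize: 1300, encodedBodySize: 1000, decodedBodySize: 4000 }));
console.log(classifyResource({ transferSize: 0, encodedBodySize: 0, decodedBodySize: 0 }));
```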

Long Tasks API

A long task is any task that runs on the main thread for more than 50ms. Long tasks block user interaction and are a major cause of jank.

const longTaskObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    report({
      type: 'long-task',
      duration: entry.duration,
      startTime: entry.startTime,
      attribution: entry.attribution.map((attr) => ({
        name: attr.name,
        containerType: attr.containerType,
        containerSrc: attr.containerSrc,
      })),
    });
  }
});

longTaskObserver.observe({ type: 'longtask', buffered: true });

How long tasks relate to TBT:

Long Task: a single task that runs for more than 50ms

Main-thread timeline:
├── Task A (30ms) ──┼── Task B (120ms) ──┼── Task C (200ms) ──┼── Task D (40ms) ──┤
                     │ over 50ms: 70ms   │ over 50ms: 150ms   │
                     │ ← blocking time → │ ← blocking time →  │

TBT = Σ(the portion of each Long Task beyond 50ms)
    = 70ms + 150ms
    = 220ms
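
The same calculation as a minimal sketch. `computeTbt` is a hypothetical helper that takes task durations in milliseconds; a real TBT measurement would also restrict the tasks to the window between FCP and TTI, which this example assumes has already been done.

```javascript
// Sum the portion of each task that exceeds the 50ms long-task threshold.
function computeTbt(taskDurations, threshold = 50) {
  return taskDurations.reduce(
    (sum, duration) => sum + Math.max(0, duration - threshold),
    0
  );
}

// Tasks A-D from the timeline above: only B and C are long tasks.
console.log(computeTbt([30, 120, 200, 40])); // 220
```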

3. Error Monitoring

Error monitoring is the foundation of front-end observability. A complete error monitoring system has to cover every possible source of errors.

Front-end error categories:

┌─────────────────────────────────────────────────────┐
│                  Front-end errors                   │
│                                                     │
│  ┌───────────────┐  ┌───────────────┐               │
│  │ JS runtime    │  │ Resource load │               │
│  │               │  │               │               │
│  │ · TypeError   │  │ · img         │               │
│  │ · ReferenceErr│  │ · script      │               │
│  │ · SyntaxError │  │ · link        │               │
│  │ · RangeError  │  │ · font        │               │
│  └───────────────┘  └───────────────┘               │
│                                                     │
│  ┌───────────────┐  ┌───────────────┐               │
│  │ Promise       │  │ API requests  │               │
│  │               │  │               │               │
│  │ · unhandled   │  │ · HTTP status │               │
│  │   rejection   │  │ · timeout     │               │
│  │               │  │ · offline     │               │
│  └───────────────┘  └───────────────┘               │
│                                                     │
│  ┌───────────────┐  ┌───────────────┐               │
│  │ Framework     │  │ Cross-origin  │               │
│  │               │  │               │               │
│  │ · React       │  │ · Script error│               │
│  │   ErrorBoundr │  │   (no details)│               │
│  │ · Vue         │  │               │               │
│  │   errorHandler│  │               │               │
│  └───────────────┘  └───────────────┘               │
└─────────────────────────────────────────────────────┘

JS Runtime Errors

window.onerror

window.onerror = function (message, source, lineno, colno, error) {
  report({
    type: 'js-error',
    message: message,
    source: source,
    lineno: lineno,
    colno: colno,
    stack: error?.stack,
    timestamp: Date.now(),
  });

  return false;
};

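Characteristics of window.onerror:

Catches synchronous runtime errors and errors thrown in asynchronous callbacks (setTimeout and the like)
Cannot catch unhandled Promise rejections
Cannot catch resource loading errors
Returning true suppresses the browser's default handling (the console error); returning false does not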

window.addEventListener('error')

window.addEventListener('error', (event) => {
  if (event.target && (event.target.src || event.target.href)) {
    report({
      type: 'resource-error',
      tagName: event.target.tagName,
      url: event.target.src || event.target.href,
      timestamp: Date.now(),
    });
  } else {
    report({
      type: 'js-error',
      message: event.message,
      source: event.filename,
      lineno: event.lineno,
      colno: event.colno,
      stack: event.error?.stack,
      timestamp: Date.now(),
    });
  }
}, true);

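The third argument, true, registers the listener for the capture phase. Resource loading errors do not bubble, so they can only be intercepted during capture. This is what makes addEventListener('error') more capable than window.onerror.

Event propagation and error capture:

                    window (capture-phase listener ✅)
                       │
                    document
                       │
                     body
                       │
                    ┌──div──┐
                    │       │
                  <img>   <script>
              (load failed) (load failed)

Resource loading errors do not bubble!
  window.onerror ❌ cannot catch them
  window.addEventListener('error', handler, true) ✅ intercepts them in the capture phase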

Unhandled Promise Rejections

window.addEventListener('unhandledrejection', (event) => {
  let message = '';
  let stack = '';

  if (event.reason instanceof Error) {
    message = event.reason.message;
    stack = event.reason.stack;
  } else if (typeof event.reason === 'string') {
    message = event.reason;
  } else {
    message = JSON.stringify(event.reason);
  }

  report({
    type: 'promise-error',
    message: message,
    stack: stack,
    timestamp: Date.now(),
  });
});

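unhandledrejection fires when a Promise is rejected and no rejection handler is attached (neither .catch() nor the second argument of .then()). Common cases:

// 1. A .then() chain with no .catch(): a network failure rejects unhandled
fetch('/api/data').then((res) => res.json());

// 2. An async function whose returned promise is neither awaited nor caught
async function loadData() {
  const res = await fetch('/api/data');
  return res.json();
}
loadData();

// 3. A promise rejected directly with no handler attached
new Promise((resolve, reject) => {
  reject(new Error('something went wrong'));
});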
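
A rejection reason can be any value, including one that JSON.stringify cannot serialize (a circular object, a BigInt) or that serializes to undefined. The sketch below shows one defensive way to normalize it before reporting; `normalizeReason` is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper: turn any rejection reason into { message, stack }.
function normalizeReason(reason) {
  if (reason instanceof Error) {
    return { message: reason.message, stack: reason.stack || '' };
  }
  if (typeof reason === 'string') {
    return { message: reason, stack: '' };
  }
  try {
    // JSON.stringify returns undefined for undefined, functions, and symbols.
    const json = JSON.stringify(reason);
    return { message: json === undefined ? String(reason) : json, stack: '' };
  } catch {
    // Circular structures and BigInt values make JSON.stringify throw.
    return { message: Object.prototype.toString.call(reason), stack: '' };
  }
}

const circular = {};
circular.self = circular;

console.log(normalizeReason(new Error('boom')).message); // "boom"
console.log(normalizeReason({ code: 500 }).message); // {"code":500}
console.log(normalizeReason(circular).message); // "[object Object]"
```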

Resource Loading Errors

function observeResourceErrors() {
  const targetTags = ['IMG', 'SCRIPT', 'LINK', 'VIDEO', 'AUDIO'];

  window.addEventListener('error', (event) => {
    const target = event.target;
    if (!target || !targetTags.includes(target.tagName)) return;

    report({
      type: 'resource-error',
      tagName: target.tagName,
      url: target.src || target.href,
      outerHTML: target.outerHTML.slice(0, 200),
      xpath: getXPath(target),
      timestamp: Date.now(),
    });
  }, true);
}

function getXPath(element) {
  const parts = [];
  while (element && element.nodeType === Node.ELEMENT_NODE) {
    let index = 1; // XPath positions are 1-based
    let sibling = element.previousSibling;
    while (sibling) {
      if (sibling.nodeType === Node.ELEMENT_NODE && sibling.tagName === element.tagName) {
        index++;
      }
      sibling = sibling.previousSibling;
    }
    parts.unshift(`${element.tagName.toLowerCase()}[${index}]`);
    element = element.parentNode;
  }
  return '/' + parts.join('/');
}

Framework Error Capture

React ErrorBoundary

class ErrorBoundary extends React.Component {
  constructor(props) {
    super(props);
    this.state = { hasError: false, error: null };
  }

  static getDerivedStateFromError(error) {
    return { hasError: true, error };
  }

  componentDidCatch(error, errorInfo) {
    report({
      type: 'react-error',
      message: error.message,
      stack: error.stack,
      componentStack: errorInfo.componentStack,
      timestamp: Date.now(),
    });
  }

  render() {
    if (this.state.hasError) {
      return this.props.fallback || <div>Something went wrong</div>;
    }
    return this.props.children;
  }
}

function App() {
  return (
    <ErrorBoundary fallback={<ErrorPage />}>
      <Header />
      <ErrorBoundary fallback={<ContentFallback />}>
        <MainContent />
      </ErrorBoundary>
      <Footer />
    </ErrorBoundary>
  );
}

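Limitations of ErrorBoundary:

Cannot catch errors in event handlers
Cannot catch errors in asynchronous code (setTimeout, requestAnimationFrame)
Cannot catch errors during server-side rendering (SSR)
Cannot catch errors thrown by the ErrorBoundary itself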

Vue errorHandler

const app = createApp(App);

app.config.errorHandler = (err, instance, info) => {
  report({
    type: 'vue-error',
    message: err.message,
    stack: err.stack,
    componentName: instance?.$options?.name || instance?.$options?.__name,
    lifecycleHook: info,
    timestamp: Date.now(),
  });
};

app.config.warnHandler = (msg, instance, trace) => {
  report({
    type: 'vue-warning',
    message: msg,
    trace: trace,
    timestamp: Date.now(),
  });
};

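What Vue's errorHandler covers:

Errors in component render functions and watcher callbacks
Errors in lifecycle hooks
Errors in custom event handlers
Synchronous and asynchronous errors in setup()

Cross-Origin Script Errors

When a page loads a script from a different origin, window.onerror receives only "Script error." and no details about the actual error.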

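The cross-origin script error problem:

Page:   https://app.example.com
Script: https://cdn.example.com/bundle.js

The onerror callback receives:
  message: "Script error."
  source:  ""
  lineno:  0
  colno:   0
  error:   null

Reason: for security, the browser hides error details from cross-origin scripts

Solution: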

<script src="https://cdn.example.com/bundle.js" crossorigin="anonymous"></script>

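The CDN server must also send a CORS response header:

Access-Control-Allow-Origin: https://app.example.com

The two values of the crossorigin attribute:

anonymous (also what an empty or invalid value means; omitting the attribute entirely disables CORS and keeps errors opaque):
  The request carries no credentials such as cookies
  The server must return Access-Control-Allow-Origin

use-credentials:
  The request carries credentials such as cookies
  The server must return Access-Control-Allow-Origin (which cannot be *)
  + Access-Control-Allow-Credentials: true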
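Source Map Restoration

Production code is minified and obfuscated, so the line and column numbers in an error stack do not map directly to the source. Source Maps are the key to restoring the stack.

Source Map restoration flow:

1. Generate the Source Map at build time
   bundle.js  →  bundle.js.map

2. Upload the Source Map to the monitoring platform
   ⚠️ Do not deploy .map files to production!

3. Report the raw stack when an error occurs
   TypeError: Cannot read property 'x' of undefined
     at e.render (bundle.js:1:23456)
     at t.update (bundle.js:1:78901)

4. The monitoring platform restores it with the Source Map
   TypeError: Cannot read property 'x' of undefined
     at UserProfile.render (src/components/UserProfile.tsx:42:18)
     at App.update (src/App.tsx:15:6)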

const { SourceMapConsumer } = require('source-map');
const fs = require('fs');

async function resolveSourceMap(stackFrame) {
  const { file, line, column } = stackFrame;
  const mapFile = file + '.map';
  const rawSourceMap = JSON.parse(fs.readFileSync(mapFile, 'utf-8'));

  const consumer = await new SourceMapConsumer(rawSourceMap);
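  // The source-map library uses 1-based lines and 0-based columns, while browser
  // stack traces report 1-based columns, so shift the column by one.
  const originalPosition = consumer.originalPositionFor({
    line: line,
    column: column - 1,
  });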

  consumer.destroy();

  return {
    source: originalPosition.source,
    line: originalPosition.line,
    column: originalPosition.column,
    name: originalPosition.name,
  };
}
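
resolveSourceMap expects a frame of the form { file, line, column }, which first has to be extracted from the reported stack string. The sketch below is one way to do that; `parseStackFrames` and both regular expressions are hypothetical and only cover the common V8 ("at fn (file:line:col)") and Firefox/Safari ("fn@file:line:col") layouts.

```javascript
// "    at e.render (bundle.js:1:23456)" or "    at bundle.js:1:23456"
const V8_FRAME = /^\s*at (?:(.+?) \()?(.+?):(\d+):(\d+)\)?$/;
// "render@https://cdn.example.com/bundle.js:1:23456"
const GECKO_FRAME = /^(.*)@(.+?):(\d+):(\d+)$/;

// Hypothetical helper: extract { functionName, file, line, column } per frame.
function parseStackFrames(stack) {
  const frames = [];
  for (const rawLine of String(stack).split('\n')) {
    const match = V8_FRAME.exec(rawLine) || GECKO_FRAME.exec(rawLine.trim());
    if (!match) continue; // skips the "TypeError: ..." message line
    frames.push({
      functionName: match[1] || '<anonymous>',
      file: match[2],
      line: Number(match[3]),
      column: Number(match[4]),
    });
  }
  return frames;
}

const frames = parseStackFrames(
  [
    "TypeError: Cannot read property 'x' of undefined",
    '    at e.render (bundle.js:1:23456)',
    '    at t.update (bundle.js:1:78901)',
  ].join('\n')
);
console.log(frames);
```

Each resulting frame can then be passed to a resolver such as resolveSourceMap.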

Configuring Source Maps in Webpack:

const webpack = require('webpack');

module.exports = {
  devtool: false,
  plugins: [
    new webpack.SourceMapDevToolPlugin({
      filename: '[file].map',
      append: false,
    }),
  ],
};

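append: false means no //# sourceMappingURL comment is added at the end of the JS file, which prevents the browser from downloading the Source Map automatically.

4. User Behavior Monitoring

PV/UV Statistics

PV (Page View): number of page views; each page load counts once
UV (Unique Visitor): number of distinct visitors; repeat visits by the same user within one day count once

The challenge of counting PV in an SPA:

Traditional MPA:
  page1.html  →  page2.html  →  page3.html
       PV+1          PV+1          PV+1
  (a full page load every time)

SPA:
  index.html  →  #/page1  →  #/page2  →  #/page3
       PV+1       PV???        PV???       PV???
  (only the first load is a full page load; everything after is client-side routing)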

class PVTracker {
  constructor(reportFn) {
    this.reportFn = reportFn;
    this.init();
  }

  init() {
    this.reportPV();

    const originalPushState = history.pushState;
    const originalReplaceState = history.replaceState;

    history.pushState = (...args) => {
      originalPushState.apply(history, args);
      this.reportPV();
    };

    history.replaceState = (...args) => {
      originalReplaceState.apply(history, args);
      this.reportPV();
    };

    window.addEventListener('popstate', () => {
      this.reportPV();
    });

    window.addEventListener('hashchange', () => {
      this.reportPV();
    });
  }

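  reportPV() {
    // popstate and hashchange can both fire for one navigation, and replaceState
    // often keeps the same URL, so skip consecutive reports for an unchanged URL.
    if (location.href === this.lastUrl) return;
    this.lastUrl = location.href;

    this.reportFn({
      type: 'pv',
      url: location.href,
      referrer: document.referrer,
      title: document.title,
      timestamp: Date.now(),
    });
  }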
}

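UV counting approaches:

Comparison of UV identification strategies:

┌──────────────┬───────────────────────────┬──────────┬────────────────────────────┐
│ Approach     │ How it works              │ Accuracy │ Limitation                 │
├──────────────┼───────────────────────────┼──────────┼────────────────────────────┤
│ Cookie       │ Set a UUID on first visit │ Medium   │ Users can clear it         │
│ LocalStorage │ Store a unique identifier │ Medium   │ Lost in private browsing   │
│ Login ID     │ User ID after login       │ High     │ Misses logged-out users    │
│ Fingerprint  │ canvas/WebGL fingerprint  │ Higher   │ Privacy compliance risk    │
└──────────────┴───────────────────────────┴──────────┴────────────────────────────┘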
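
Whichever identifier is chosen, the server-side aggregation is the same: count each identifier at most once per day. A minimal sketch, assuming visit records of the hypothetical shape { visitorId, timestamp } and using UTC day boundaries (a real system would likely use the site's own time zone):

```javascript
// Hypothetical helper: daily UV = number of distinct visitor IDs seen per UTC day.
function countDailyUv(visits) {
  const seenByDay = new Map();
  for (const { visitorId, timestamp } of visits) {
    const day = new Date(timestamp).toISOString().slice(0, 10); // "YYYY-MM-DD"
    if (!seenByDay.has(day)) seenByDay.set(day, new Set());
    seenByDay.get(day).add(visitorId);
  }
  return Object.fromEntries(
    [...seenByDay].map(([day, ids]) => [day, ids.size])
  );
}

const uv = countDailyUv([
  { visitorId: 'a', timestamp: Date.UTC(2024, 0, 1, 8) },
  { visitorId: 'a', timestamp: Date.UTC(2024, 0, 1, 20) }, // same visitor, same day
  { visitorId: 'b', timestamp: Date.UTC(2024, 0, 1, 9) },
  { visitorId: 'a', timestamp: Date.UTC(2024, 0, 2, 7) },
]);
console.log(uv); // { '2024-01-01': 2, '2024-01-02': 1 }
```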

Breadcrumbs

Breadcrumbs record the sequence of user actions leading up to an error, which helps reproduce and locate the problem.

Example breadcrumbs at the time of an error:

[10:30:01] Navigation  → /dashboard
[10:30:02] Click       → button.refresh-btn
[10:30:02] XHR         → GET /api/data (200 OK)
[10:30:05] Click       → a.detail-link
[10:30:05] Navigation  → /detail/123
[10:30:06] XHR         → GET /api/detail/123 (500 Error)
[10:30:06] Console     → Error: Failed to fetch detail
[10:30:06] ❌ Error    → TypeError: Cannot read property 'name' of undefined

class BreadcrumbTracker {
  constructor(maxLength = 20) {
    this.breadcrumbs = [];
    this.maxLength = maxLength;
  }

  push(breadcrumb) {
    this.breadcrumbs.push({
      ...breadcrumb,
      timestamp: Date.now(),
    });
    if (this.breadcrumbs.length > this.maxLength) {
      this.breadcrumbs.shift();
    }
  }

  getBreadcrumbs() {
    return [...this.breadcrumbs];
  }

  initClickTracker() {
    document.addEventListener('click', (event) => {
      const target = event.target;
      this.push({
        category: 'click',
        data: {
          tagName: target.tagName,
          className: target.className,
          id: target.id,
          innerText: target.innerText?.slice(0, 100),
          xpath: this.getSimpleXPath(target),
        },
      });
    }, true);
  }

  initRouteTracker() {
    const originalPushState = history.pushState;
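    history.pushState = (...args) => {
      // document.referrer never changes during SPA navigation, so capture the
      // current URL before the new state is pushed.
      const from = location.href;
      originalPushState.apply(history, args);
      this.push({
        category: 'navigation',
        data: { from, to: location.href },
      });
    };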

    window.addEventListener('popstate', () => {
      this.push({
        category: 'navigation',
        data: { to: location.href },
      });
    });
  }

  initConsoleTracker() {
    const levels = ['log', 'warn', 'error', 'info'];
    levels.forEach((level) => {
      const original = console[level];
      console[level] = (...args) => {
        this.push({
          category: 'console',
          data: { level, message: args.map(String).join(' ').slice(0, 200) },
        });
        original.apply(console, args);
      };
    });
  }

  initXHRTracker() {
    const self = this;
    const originalOpen = XMLHttpRequest.prototype.open;
    const originalSend = XMLHttpRequest.prototype.send;

    XMLHttpRequest.prototype.open = function (method, url) {
      this._monitor = { method, url };
      originalOpen.apply(this, arguments);
    };

    XMLHttpRequest.prototype.send = function () {
      const startTime = Date.now();
      this.addEventListener('loadend', function () {
        self.push({
          category: 'xhr',
          data: {
            method: this._monitor.method,
            url: this._monitor.url,
            status: this.status,
            duration: Date.now() - startTime,
          },
        });
      });
      originalSend.apply(this, arguments);
    };
  }

  getSimpleXPath(element) {
    const parts = [];
    while (element && element !== document.body) {
      const tag = element.tagName.toLowerCase();
      const id = element.id ? `#${element.id}` : '';
      parts.unshift(tag + id);
      element = element.parentElement;
    }
    return parts.join(' > ');
  }
}

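Click Heatmaps

How a heatmap works:

1. Collect: record the coordinates of every click (relative to the viewport or the page)
2. Report: send the coordinates in batches
3. Aggregate: the back end groups click coordinates by page URL
4. Render: the front end draws the heat distribution on a Canvas

      ┌────────────────────────────┐
      │  Header          [Login]   │ ← dense clicks (red)
      │                            │
      │  ┌──────┐  ┌──────┐        │
      │  │ Card │  │ Card │        │ ← moderate clicks (yellow)
      │  │  A   │  │  B   │        │
      │  └──────┘  └──────┘        │
      │                            │
      │  ┌────────────────┐        │
      │  │   Content      │        │ ← few clicks (blue)
      │  │                │        │
      │  └────────────────┘        │
      │                            │
      │  Footer                    │ ← almost no clicks (transparent)
      └────────────────────────────┘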

function trackClicks() {
  document.addEventListener('click', (event) => {
    const scrollX = window.scrollX || document.documentElement.scrollLeft;
    const scrollY = window.scrollY || document.documentElement.scrollTop;

    report({
      type: 'heatmap',
      x: event.clientX + scrollX,
      y: event.clientY + scrollY,
      viewportWidth: window.innerWidth,
      viewportHeight: window.innerHeight,
      pageWidth: document.documentElement.scrollWidth,
      pageHeight: document.documentElement.scrollHeight,
      url: location.href,
      timestamp: Date.now(),
    });
  });
}
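
For the aggregation step, one common approach is to bucket page coordinates into a fixed grid and count clicks per cell; the renderer then maps counts to colors. The sketch below assumes that approach; `aggregateClicks` and the 20px default cell size are illustrative choices, not something the collection code above prescribes.

```javascript
// Hypothetical helper: count clicks per grid cell, keyed as "column,row".
function aggregateClicks(clicks, cellSize = 20) {
  const cells = new Map();
  for (const { x, y } of clicks) {
    const key = `${Math.floor(x / cellSize)},${Math.floor(y / cellSize)}`;
    cells.set(key, (cells.get(key) || 0) + 1);
  }
  return cells;
}

const cells = aggregateClicks([
  { x: 5, y: 5 },
  { x: 19, y: 19 }, // same 20px cell as the first click
  { x: 20, y: 5 }, // next cell to the right
]);
console.log([...cells]); // [ [ '0,0', 2 ], [ '1,0', 1 ] ]
```

Because pages render at different widths, a production pipeline would usually normalize coordinates (for example by the reported pageWidth) before bucketing.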

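Session Replay (rrweb)

rrweb (record and replay the web) is an open-source session replay library. It achieves lightweight recording by serializing DOM changes instead of capturing video.

How rrweb works:

Recording:
  ┌──────────────────────────────────────────────────┐
  │ 1. Serialize the full DOM tree → full snapshot   │
  │ 2. Watch DOM changes with MutationObserver       │
  │ 3. Listen for mouse/keyboard/scroll events       │
  │ 4. Serialize changes as incremental snapshots    │
  └──────────────────────────────────────────────────┘
        ↓
  [Full Snapshot] → [Mutation] → [Mutation] → [Input] → [Scroll] → ...
        ↓
  Stored as JSON, far smaller than video

Replay:
  ┌──────────────────────────────────────────────────┐
  │ 1. Rebuild the DOM in an iframe from snapshot    │
  │ 2. Apply incremental snapshots by timestamp      │
  │ 3. Reproduce mouse moves, clicks, and input      │
  │ 4. Supports seek, pause, and speed control       │
  └──────────────────────────────────────────────────┘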

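import { record } from 'rrweb';

// Keep the two most recent checkpoints. Each inner array starts with a full
// snapshot, so dropping a whole array never leaves a replay without its base DOM
// (dropping single events from the front would discard the full snapshot).
let eventsMatrix = [[]];
let stopFn = null;

function startRecording() {
  eventsMatrix = [[]];
  stopFn = record({
    emit(event, isCheckout) {
      // isCheckout is true when rrweb has just taken a fresh full snapshot.
      if (isCheckout) {
        eventsMatrix.push([]);
        if (eventsMatrix.length > 2) eventsMatrix.shift();
      }
      eventsMatrix[eventsMatrix.length - 1].push(event);
    },
    checkoutEveryNth: 500, // take a new full snapshot every 500 events
    maskAllInputs: true,
    blockSelector: '.sensitive-data',
    sampling: {
      mousemove: true,
      mouseInteraction: true,
      scroll: 150,
      media: 800,
      input: 'last',
    },
  });
}

function onError(error) {
  // Flatten the retained checkpoints into one replayable event list.
  const eventsSnapshot = eventsMatrix.flat();
  report({
    type: 'replay',
    events: eventsSnapshot,
    error: {
      message: error.message,
      stack: error.stack,
    },
  });
}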

import { Replayer } from 'rrweb';

const replayer = new Replayer(events, {
  root: document.getElementById('player'),
  speed: 1,
});

replayer.play();

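Key rrweb options:

maskAllInputs: masks the content of every input to protect user privacy
blockSelector: completely blocks elements that match the selector
sampling: controls how often each event type is sampled, trading detail against data volume
Usually only the most recent events are kept and reported when an error occurs, which keeps data volume down; the retained window must still begin with a full snapshot

5. Reporting Strategy

Reporting Methods

Four main ways to report:

1. XMLHttpRequest / Fetch
   ┌───────────────────────────────────────────────────┐
   │ Regular HTTP request with the most features       │
   │ Custom headers; POST supports large payloads      │
   │ May be cancelled when the page unloads            │
   └───────────────────────────────────────────────────┘

2. navigator.sendBeacon
   ┌───────────────────────────────────────────────────┐
   │ API designed for sending analytics data           │
   │ Delivered reliably even during page unload        │
   │ Asynchronous; does not delay navigation           │
   │ POST only; payload is limited (about 64KB)        │
   └───────────────────────────────────────────────────┘

3. Image pixel (1x1 GIF)
   ┌───────────────────────────────────────────────────┐
   │ Sends a GET request via new Image().src           │
   │ No CORS needed (img requests cross origins)       │
   │ No need to wait for the response                  │
   │ Payload limited by URL length (about 2KB)         │
   └───────────────────────────────────────────────────┘

4. Fetch with keepalive
   ┌───────────────────────────────────────────────────┐
   │ fetch + keepalive: true                           │
   │ Reliable delivery similar to sendBeacon           │
   │ Supports custom headers and methods               │
   │ In-flight keepalive bodies are capped at 64KB     │
   └───────────────────────────────────────────────────┘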

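function reportByBeacon(url, data) {
  const blob = new Blob([JSON.stringify(data)], { type: 'application/json' });
  // sendBeacon returns false when the browser refuses to queue the request
  // (for example, because the payload exceeds the size quota), so fall back
  // in that case as well as when the API is missing.
  if (navigator.sendBeacon && navigator.sendBeacon(url, blob)) {
    return;
  }
  fetch(url, {
    method: 'POST',
    body: blob,
    keepalive: true,
  }).catch(() => {
    // Reporting failures must never surface as errors on the page.
  });
}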

function reportByImage(url, data) {
  const params = new URLSearchParams();
  Object.entries(data).forEach(([key, value]) => {
    params.append(key, typeof value === 'object' ? JSON.stringify(value) : value);
  });
  const img = new Image();
  img.src = `${url}?${params.toString()}`;
}

function reportByXHR(url, data) {
  const xhr = new XMLHttpRequest();
  xhr.open('POST', url);
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.send(JSON.stringify(data));
}

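Comparison of Reporting Methods

Feature	XMLHttpRequest/Fetch	sendBeacon	Image pixel	Fetch keepalive
Request methods	GET/POST/PUT...	POST	GET	GET/POST/PUT...
Payload limit	No practical limit	64 KB	~2 KB (URL length)	64 KB
Reliable on page unload	❌ May be cancelled	✅ Reliable	⚠️ Not guaranteed	✅ Reliable
Cross-origin	Requires CORS	Requires CORS	✅ No restriction	Requires CORS
Custom headers	✅	❌	❌	✅
Response readable	✅	❌	❌	✅
Blocks the page	Only in synchronous mode	❌ No	❌ No	❌ No
Compatibility	All browsers	Not supported in IE	All browsers	Recent browsers
Recommended for	Large real-time payloads	Reporting on page unload	Simple pings / cross-origin	General use in modern browsers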
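
The trade-offs in the comparison above can be condensed into a small selection helper. This is only a sketch: `chooseTransport`, its option names, the returned labels, and the 60 KB / 2 KB budgets are illustrative assumptions, not part of any browser API or library.

```javascript
// Hypothetical helper that maps the comparison table onto a decision.
// Budgets are assumed safety margins below the limits quoted in the table.
const KEEPALIVE_BUDGET = 60 * 1024; // sendBeacon / fetch keepalive (~64 KB)
const PIXEL_BUDGET = 2 * 1024; // image pixel (URL length)

function chooseTransport({ bytes, unloading = false, needsHeaders = false, corsAvailable = true }) {
  if (!corsAvailable) {
    // Without CORS on the collector only the image pixel is usable,
    // and only for payloads that fit into a URL.
    return bytes <= PIXEL_BUDGET ? 'image' : 'unsupported';
  }
  if (unloading) {
    // Only sendBeacon and fetch keepalive are expected to survive unload,
    // and both are subject to the ~64 KB limit.
    if (bytes > KEEPALIVE_BUDGET) return 'split';
    return needsHeaders ? 'fetch-keepalive' : 'beacon';
  }
  // While the page is alive, a normal request has no practical size limit.
  return 'fetch';
}
```

A caller would typically compute `bytes` from the serialized payload and treat `'split'` as a signal to send several smaller batches.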

Reporting Optimizations

Sampling

class ReportStrategy {
  constructor(options = {}) {
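    // Use ?? rather than ||, so that an explicit rate of 0 (report nothing)
    // is not silently turned into 1.
    this.sampleRate = options.sampleRate ?? 1;
    this.errorSampleRate = options.errorSampleRate ?? 1;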
  }

  shouldSample(type) {
    if (type === 'error') {
      return Math.random() < this.errorSampleRate;
    }
    return Math.random() < this.sampleRate;
  }
}

const strategy = new ReportStrategy({
  sampleRate: 0.1,
  errorSampleRate: 1,
});

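Suggested sampling rates:

Errors: report 100% (no sampling, or only light down-sampling when volume is extreme)
Performance data: sample 10% to 30%
User behavior: sample 5% to 20%
PV/UV: report 100%
Hash the user ID to sample consistently, so that each user's data is either complete or absent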
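
Consistent sampling by user ID can be sketched as follows. The FNV-1a hash and the names `hashString` and `isSampled` are choices made for this example; any stable hash that spreads IDs evenly would serve the same purpose.

```javascript
// FNV-1a 32-bit hash: small and deterministic, which is enough for bucketing.
function hashString(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// The same user always lands in the same bucket, so the decision is stable
// across page views and sessions instead of being re-rolled per event.
function isSampled(userId, sampleRate) {
  const bucket = hashString(String(userId)) % 10000;
  return bucket < sampleRate * 10000;
}
```

Because the bucket depends only on the ID, raising the sample rate later keeps every previously sampled user in the sample.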

Batch Reporting

class BatchReporter {
  constructor(options = {}) {
    this.url = options.url;
    this.batchSize = options.batchSize || 10;
    this.interval = options.interval || 5000;
    this.queue = [];
    this.timer = null;
    this.init();
  }

  init() {
    this.timer = setInterval(() => this.flush(), this.interval);

    document.addEventListener('visibilitychange', () => {
      if (document.visibilityState === 'hidden') {
        this.flush();
      }
    });

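    // pagehide fires more reliably than beforeunload (notably on mobile) and
    // does not keep the page out of the back/forward cache.
    window.addEventListener('pagehide', () => {
      this.flush();
    });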
  }

  push(data) {
    this.queue.push({
      ...data,
      timestamp: Date.now(),
    });

    if (this.queue.length >= this.batchSize) {
      this.flush();
    }
  }

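  flush() {
    if (this.queue.length === 0) return;

    // Take everything currently queued and reset the queue in one step.
    const batch = this.queue.splice(0);
    const blob = new Blob([JSON.stringify(batch)], { type: 'application/json' });

    // sendBeacon returns false if the request could not be queued,
    // so fall back to fetch in that case too.
    const queued = navigator.sendBeacon ? navigator.sendBeacon(this.url, blob) : false;
    if (!queued) {
      fetch(this.url, {
        method: 'POST',
        body: blob,
        keepalive: true,
      }).catch(() => {
        // Swallow network errors: reporting must not break the page.
      });
    }
  }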

  destroy() {
    if (this.timer) {
      clearInterval(this.timer);
    }
    this.flush();
  }
}
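
Because sendBeacon and keepalive requests are limited to roughly 64 KB, a large queue may need to be split before it is flushed. The sketch below shows one way to do that; `splitBatch` and the 60 KB default are assumptions made for illustration and are not part of the BatchReporter above.

```javascript
const encoder = new TextEncoder();

// Split queued reports into chunks whose serialized JSON array stays under
// maxBytes. An item that is larger than maxBytes on its own still gets a
// chunk to itself; the caller has to decide how to handle that case.
function splitBatch(items, maxBytes = 60 * 1024) {
  const chunks = [];
  let current = [];
  let currentBytes = 2; // the surrounding "[" and "]"
  for (const item of items) {
    // +1 accounts for the separating comma.
    const size = encoder.encode(JSON.stringify(item)).length + 1;
    if (current.length > 0 && currentBytes + size > maxBytes) {
      chunks.push(current);
      current = [];
      currentBytes = 2;
    }
    current.push(item);
    currentBytes += size;
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Each returned chunk can then be sent as its own request, for example one sendBeacon call per chunk.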

Offline Caching

class OfflineReporter {
  constructor(options = {}) {
    this.url = options.url;
    this.dbName = 'monitor-offline';
    this.storeName = 'reports';
    // A constructor cannot be async, so keep the promise and await it wherever
    // the database is needed. This avoids using the DB before it has opened.
    this.dbReady = this.openDB();
    window.addEventListener('online', () => {
      this.retryOfflineData().catch(() => {});
    });
  }

  openDB() {
    return new Promise((resolve, reject) => {
      const request = indexedDB.open(this.dbName, 1);
      request.onupgradeneeded = (event) => {
        const db = event.target.result;
        if (!db.objectStoreNames.contains(this.storeName)) {
          db.createObjectStore(this.storeName, {
            keyPath: 'id',
            autoIncrement: true,
          });
        }
      };
      request.onsuccess = (event) => resolve(event.target.result);
      request.onerror = (event) => reject(event.target.error);
    });
  }

  async send(data) {
    const response = await fetch(this.url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(data),
    });
    // fetch only rejects on network failure; treat HTTP errors as failures too.
    if (!response.ok) {
      throw new Error(`Report failed with status ${response.status}`);
    }
  }

  async report(data) {
    if (!navigator.onLine) {
      await this.saveToLocal(data);
      return;
    }
    try {
      await this.send(data);
    } catch {
      await this.saveToLocal(data);
    }
  }

  async saveToLocal(data) {
    const db = await this.dbReady;
    return new Promise((resolve, reject) => {
      const tx = db.transaction(this.storeName, 'readwrite');
      tx.objectStore(this.storeName).add({ data, timestamp: Date.now() });
      tx.oncomplete = () => resolve();
      tx.onerror = () => reject(tx.error);
    });
  }

  async retryOfflineData() {
    const db = await this.dbReady;
    // Read all cached records first; a read-only transaction is enough here.
    const records = await new Promise((resolve, reject) => {
      const request = db
        .transaction(this.storeName, 'readonly')
        .objectStore(this.storeName)
        .getAll();
      request.onsuccess = () => resolve(request.result);
      request.onerror = () => reject(request.error);
    });

    for (const record of records) {
      try {
        await this.send(record.data);
        // The read transaction has already finished by now, so delete each
        // successfully sent record in a fresh transaction.
        db.transaction(this.storeName, 'readwrite')
          .objectStore(this.storeName)
          .delete(record.id);
      } catch {
        // Still failing: stop and try again on the next 'online' event.
        break;
      }
    }
  }
}

Throttling and Deduplication

class ThrottledReporter {
  constructor(options = {}) {
    this.minInterval = options.minInterval || 1000;
    this.lastReportTime = {};
    this.dedupeMap = new Map();
    this.dedupeTimeout = options.dedupeTimeout || 5000;
  }

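  shouldReport(data) {
    const key = this.getDedupeKey(data);
    const now = Date.now();

    // Deduplication: drop a report identical to one seen within dedupeTimeout.
    const lastSeen = this.dedupeMap.get(key);
    if (lastSeen !== undefined && now - lastSeen < this.dedupeTimeout) {
      return false;
    }

    // Throttling: allow at most one report per type every minInterval ms.
    const lastTime = this.lastReportTime[data.type];
    if (lastTime !== undefined && now - lastTime < this.minInterval) {
      return false;
    }

    this.lastReportTime[data.type] = now;
    this.dedupeMap.set(key, now);

    // Bound memory use: once the map exceeds 100 keys, evict the 50 oldest.
    if (this.dedupeMap.size > 100) {
      const oldest = [...this.dedupeMap.entries()]
        .sort((a, b) => a[1] - b[1])
        .slice(0, 50);
      oldest.forEach(([k]) => this.dedupeMap.delete(k));
    }

    return true;
  }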

  getDedupeKey(data) {
    if (data.type === 'js-error' || data.type === 'promise-error') {
      return `${data.type}:${data.message}:${data.source || ''}`;
    }
    return `${data.type}:${JSON.stringify(data)}`;
  }
}

6. Sentry in Practice

Core Sentry Features

Sentry feature overview:

┌─────────────────────────────────────────────────────┐
│                     Sentry                          │
│                                                     │
│  ┌───────────────┐  ┌───────────────┐               │
│  │  Error        │  │  Performance  │               │
│  │  Tracking     │  │  Monitoring   │               │
│  │               │  │               │               │
│  │ · Aggregation │  │ · Transaction │               │
│  │ · Stack traces│  │ · Span        │               │
│  │ · Assignment  │  │ · Web Vitals  │               │
│  │ · Alerts      │  │ · Slow queries│               │
│  └───────────────┘  └───────────────┘               │
│                                                     │
│  ┌───────────────┐  ┌───────────────┐               │
│  │  Session      │  │  Release      │               │
│  │  Replay       │  │  Health       │               │
│  │               │  │               │               │
│  │ · Replay      │  │ · Versions    │               │
│  │ · DOM snapshot│  │ · Deploys     │               │
│  │ · Masking     │  │ · Crash Free  │               │
│  │ · Context     │  │   Rate        │               │
│  └───────────────┘  └───────────────┘               │
│                                                     │
│  ┌───────────────┐  ┌───────────────┐               │
│  │  Profiling    │  │  Crons        │               │
│  │  Code profiles│  │  Cron monitors│               │
│  └───────────────┘  └───────────────┘               │
└─────────────────────────────────────────────────────┘

Sentry SDK Integration

React Integration

import * as Sentry from '@sentry/react';

Sentry.init({
  dsn: 'https://examplePublicKey@o0.ingest.sentry.io/0',
  release: 'my-app@1.2.3',
  environment: 'production',
  integrations: [
    Sentry.browserTracingIntegration(),
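    Sentry.replayIntegration({
      // Both options default to true. Setting them to false records text and
      // media unmasked, so do this only when no sensitive data can appear.
      maskAllText: false,
      blockAllMedia: false,
    }),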
  ],
  tracesSampleRate: 0.2,
  replaysSessionSampleRate: 0.1,
  replaysOnErrorSampleRate: 1.0,
  beforeSend(event) {
    if (event.exception) {
      const values = event.exception.values || [];
      const isChunkLoadError = values.some(
        (v) => v.type === 'ChunkLoadError'
      );
      if (isChunkLoadError) {
        return null;
      }
    }
    return event;
  },
});

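import * as Sentry from '@sentry/react';
import { BrowserRouter, Routes, Route } from 'react-router-dom';
// ErrorPage, Home, and Dashboard are the application's own components.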

const SentryRoutes = Sentry.withSentryReactRouterV6Routing(Routes);

function App() {
  return (
    <Sentry.ErrorBoundary fallback={<ErrorPage />} showDialog>
      <BrowserRouter>
        <SentryRoutes>
          <Route path="/" element={<Home />} />
          <Route path="/dashboard" element={<Dashboard />} />
        </SentryRoutes>
      </BrowserRouter>
    </Sentry.ErrorBoundary>
  );
}

Vue Integration

import * as Sentry from '@sentry/vue';
import { createApp } from 'vue';
import { createRouter } from 'vue-router';

const app = createApp(App);
const router = createRouter({ ... });

Sentry.init({
  app,
  dsn: 'https://examplePublicKey@o0.ingest.sentry.io/0',
  release: 'my-vue-app@1.0.0',
  environment: 'production',
  integrations: [
    Sentry.browserTracingIntegration({ router }),
    Sentry.replayIntegration(),
  ],
  tracesSampleRate: 0.2,
  tracePropagationTargets: ['localhost', /^https:\/\/api\.example\.com/],
  replaysSessionSampleRate: 0.1,
  replaysOnErrorSampleRate: 1.0,
});

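Node.js Integration

const Sentry = require('@sentry/node');

// Initialize Sentry before requiring other modules so they can be instrumented.
Sentry.init({
  dsn: 'https://examplePublicKey@o0.ingest.sentry.io/0',
  release: 'my-server@1.0.0',
  environment: process.env.NODE_ENV,
  tracesSampleRate: 0.3,
  // Profiling additionally requires the integration from @sentry/profiling-node.
  profilesSampleRate: 0.1,
});

const express = require('express');
const app = express();

app.get('/api/data', async (req, res) => {
  // startSpan returns whatever the callback returns (here a promise), so
  // await it. `db` stands for the application's own database client.
  const rows = await Sentry.startSpan({ name: 'db.query' }, () =>
    db.query('SELECT * FROM users')
  );
  res.json(rows);
});

// Register the Sentry error handler after all routes and before any other
// error-handling middleware.
Sentry.setupExpressErrorHandler(app);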

DSN, Release, and Source Maps

DSN (Data Source Name) structure:

https://<public_key>@<host>/<project_id>

Example: https://abc123def456@o123456.ingest.sentry.io/789

  ├── public_key: abc123def456          (client key, safe to expose)
  ├── host: o123456.ingest.sentry.io    (Sentry ingestion host)
  └── project_id: 789                   (project identifier)
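
The DSN structure above can be taken apart with the standard URL parser. This is a minimal sketch for illustration; `parseDsn` is a hypothetical helper and not the Sentry SDK's own parser.

```javascript
// Split a DSN of the form https://<public_key>@<host>/<project_id>
// into its parts using the built-in URL class.
function parseDsn(dsn) {
  const url = new URL(dsn);
  // The project ID is the last non-empty path segment.
  const projectId = url.pathname.split('/').filter(Boolean).pop();
  if (!url.username || !projectId) {
    throw new Error('Invalid DSN: public key and project ID are required');
  }
  return {
    protocol: url.protocol.replace(':', ''),
    publicKey: url.username,
    host: url.host,
    projectId,
  };
}
```

Applied to the example DSN, this yields the public key `abc123def456`, the host `o123456.ingest.sentry.io`, and the project ID `789`.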

Source Map upload configuration (using @sentry/webpack-plugin):

const { sentryWebpackPlugin } = require('@sentry/webpack-plugin');

module.exports = {
  devtool: 'source-map',
  plugins: [
    sentryWebpackPlugin({
      org: 'my-org',
      project: 'my-project',
      authToken: process.env.SENTRY_AUTH_TOKEN,
      release: {
        name: 'my-app@1.2.3',
      },
      sourcemaps: {
        assets: './dist/**',
        filesToDeleteAfterUpload: './dist/**/*.map',
      },
    }),
  ],
};

Source Map upload flow:

Build phase:
  ┌────────────────────────────────────────────────┐
  │ 1. Webpack emits .js and .js.map files         │
  │ 2. The Sentry plugin creates a release         │
  │ 3. The .map files are uploaded to Sentry       │
  │ 4. Local .map files are deleted (security)     │
  │ 5. Only .js files are deployed to the CDN      │
  └────────────────────────────────────────────────┘

Error symbolication phase:
  ┌────────────────────────────────────────────────┐
  │ 1. The SDK reports the minified stack trace    │
  │ 2. Sentry finds the Source Map by release      │
  │ 3. The stack is mapped back to source code     │
  │ 4. Issue details show the source context       │
  └────────────────────────────────────────────────┘

Commonly used Sentry APIs for manual reporting:

Sentry.captureException(new Error('Something failed'));

Sentry.captureMessage('User performed unusual action', 'warning');

Sentry.setUser({ id: '12345', email: 'user@example.com', username: 'john' });

Sentry.setTag('feature', 'checkout');
Sentry.setContext('order', { id: 'ORD-123', amount: 99.99 });

Sentry.addBreadcrumb({
  category: 'auth',
  message: 'User logged in',
  level: 'info',
});

7. Designing a Self-Hosted Monitoring System

Overall Architecture

Architecture of a self-hosted front-end monitoring system:

┌──────────────────────────────────────────────────────────────────┐
│                        Client (SDK layer)                        │
│                                                                  │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Perf     │ │ Errors   │ │ Behavior │ │ Resources│ │ Custom   │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│      └────────────┴────────────┼────────────┴────────────┘       │
│                                │                                 │
│        ┌───────────────────────▼────────────────────────┐        │
│        │             Data processing layer              │        │
│        │ Sample → aggregate → format → compress → queue │        │
│        └───────────────────────┬────────────────────────┘        │
│                                │                                 │
│        ┌───────────────────────▼────────────────────────┐        │
│        │            Reporting strategy layer            │        │
│        │     sendBeacon / Image / fetch + keepalive     │        │
│        │  Batching / offline cache / throttle + dedupe  │        │
│        └───────────────────────┬────────────────────────┘        │
└────────────────────────────────┼─────────────────────────────────┘
                                 │
                         ════════╪════════  Network
                                 │
┌────────────────────────────────▼─────────────────────────────────┐
│                        Ingestion gateway                         │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  Nginx / API Gateway                                       │  │
│  │  · Receive data              · Authentication              │  │
│  │  · Rate limiting             · Data routing                │  │
│  │  · CORS handling             · Write to message queue      │  │
│  └────────────────────────────────────────────────────────────┘  │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                      Message queue (Kafka)                       │
│                                                                  │
│    ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐    │
│    │ Topic:    │  │ Topic:    │  │ Topic:    │  │ Topic:    │    │
│    │ error     │  │ perf      │  │ behavior  │  │ resource  │    │
│    └─────┬─────┘  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘    │
└──────────┼──────────────┼──────────────┼──────────────┼──────────┘
           │              │              │              │
           ▼              ▼              ▼              ▼
┌──────────────────────────────────────────────────────────────────┐
│                Data cleaning and processing layer                │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  Flink / Spark Streaming / Node.js Worker                  │  │
│  │                                                            │  │
│  │  · Validation and cleaning   · Source Map symbolication    │  │
│  │  · Fingerprint grouping      · IP → region lookup          │  │
│  │  · UA → device/browser       · Real-time alert rules       │  │
│  └────────────────────────────────────────────────────────────┘  │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
            ┌────────────────────┼───────────────────────┐
            ▼                    ▼                       ▼
┌───────────────────────┐ ┌──────────────────┐ ┌───────────────────┐
│ Storage               │ │ Alerting         │ │ Real-time compute │
│                       │ │                  │ │                   │
│ · ClickHouse (detail) │ │ · Email          │ │ · Error-rate spike│
│ · Elasticsearch (logs)│ │ · DingTalk/Feishu│ │ · P95 regression  │
│ · Redis (real-time)   │ │ · Webhook        │ │ · Traffic anomaly │
│ · S3/OSS (archive)    │ │ · PagerDuty      │ │                   │
└───────────┬───────────┘ └──────────────────┘ └───────────────────┘
            │
            ▼
┌──────────────────────────────────────────────────────────────────┐
│                       Visualization layer                        │
│                                                                  │
│    ┌───────────┐  ┌───────────┐  ┌───────────┐  ┌───────────┐    │
│    │ Errors    │  │ Perf      │  │ Behavior  │  │ Custom    │    │
│    │           │  │           │  │           │  │           │    │
│    │ · Trends  │  │ · LCP/INP │  │ · PV/UV   │  │ · Grafana │    │
│    │ · Top N   │  │ · CLS     │  │ · Funnels │  │ · In-house│    │
│    │ · Details │  │ · Assets  │  │ · Paths   │  │           │    │
│    └───────────┘  └───────────┘  └───────────┘  └───────────┘    │
└──────────────────────────────────────────────────────────────────┘

Core SDK Design

class MonitorSDK {
  constructor(options) {
    this.dsn = options.dsn;
    this.appId = options.appId;
    this.release = options.release;
    // ?? keeps an explicit sampleRate of 0; || would turn it into 1.
    this.sampleRate = options.sampleRate ?? 1;
    this.plugins = [];
    this.breadcrumbs = new BreadcrumbTracker();
    this.reporter = new BatchReporter({ url: this.dsn });
  }

  use(plugin) {
    this.plugins.push(plugin);
    plugin.install(this);
    return this;
  }

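  report(data) {
    // Sample performance and behavior data only. Error reports (types ending
    // in "error") always go through, in line with the recommendation to
    // report errors at 100%.
    const isError = /error$/.test(data.type || '');
    if (!isError && Math.random() >= this.sampleRate) return;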

    this.reporter.push({
      ...data,
      appId: this.appId,
      release: this.release,
      url: location.href,
      userAgent: navigator.userAgent,
      timestamp: Date.now(),
      breadcrumbs: this.breadcrumbs.getBreadcrumbs(),
    });
  }

  init() {
    this.use(new ErrorPlugin());
    this.use(new PerformancePlugin());
    this.use(new BehaviorPlugin());
    this.use(new ResourcePlugin());
    return this;
  }
}

class ErrorPlugin {
  install(sdk) {
    window.addEventListener('error', (event) => {
      if (event.target && (event.target.src || event.target.href)) {
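        sdk.report({
          type: 'resource-error',
          tagName: event.target.tagName,
          // Named resourceUrl because MonitorSDK.report() sets `url` to the
          // page URL, which would otherwise overwrite the failing resource.
          resourceUrl: event.target.src || event.target.href,
        });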
      } else {
        sdk.report({
          type: 'js-error',
          message: event.message,
          stack: event.error?.stack,
          filename: event.filename,
          lineno: event.lineno,
          colno: event.colno,
        });
      }
    }, true);

    window.addEventListener('unhandledrejection', (event) => {
      sdk.report({
        type: 'promise-error',
        message: event.reason?.message || String(event.reason),
        stack: event.reason?.stack,
      });
    });
  }
}

class PerformancePlugin {
  install(sdk) {
    this.observeLCP(sdk);
    this.observeFCP(sdk);
    this.observeCLS(sdk);
    this.observeINP(sdk);
    this.observeNavigation(sdk);
  }

  observeLCP(sdk) {
    let lcpValue = 0;
    const observer = new PerformanceObserver((list) => {
      const entries = list.getEntries();
      lcpValue = entries[entries.length - 1].startTime;
    });
    observer.observe({ type: 'largest-contentful-paint', buffered: true });

    document.addEventListener('visibilitychange', () => {
      if (document.visibilityState === 'hidden' && lcpValue > 0) {
        sdk.report({ type: 'performance', name: 'LCP', value: lcpValue });
      }
    });
  }

  observeFCP(sdk) {
    const observer = new PerformanceObserver((list) => {
      for (const entry of list.getEntries()) {
        if (entry.name === 'first-contentful-paint') {
          sdk.report({ type: 'performance', name: 'FCP', value: entry.startTime });
        }
      }
    });
    observer.observe({ type: 'paint', buffered: true });
  }

  observeCLS(sdk) {
    let clsValue = 0;
    let sessionValue = 0;
    let sessionEntries = [];

    const observer = new PerformanceObserver((list) => {
      for (const entry of list.getEntries()) {
        if (!entry.hadRecentInput) {
          const first = sessionEntries[0];
          const last = sessionEntries[sessionEntries.length - 1];
          if (
            sessionValue &&
            entry.startTime - last.startTime < 1000 &&
            entry.startTime - first.startTime < 5000
          ) {
            sessionValue += entry.value;
            sessionEntries.push(entry);
          } else {
            sessionValue = entry.value;
            sessionEntries = [entry];
          }
          if (sessionValue > clsValue) clsValue = sessionValue;
        }
      }
    });
    observer.observe({ type: 'layout-shift', buffered: true });

    document.addEventListener('visibilitychange', () => {
      if (document.visibilityState === 'hidden') {
        sdk.report({ type: 'performance', name: 'CLS', value: clsValue });
      }
    });
  }

  observeINP(sdk) {
    const interactions = [];
    const observer = new PerformanceObserver((list) => {
      for (const entry of list.getEntries()) {
        if (entry.interactionId) {
          const existing = interactions.find(
            (i) => i.id === entry.interactionId
          );
          if (existing) {
            existing.duration = Math.max(existing.duration, entry.duration);
          } else {
            interactions.push({ id: entry.interactionId, duration: entry.duration });
          }
        }
      }
    });
    observer.observe({ type: 'event', buffered: true, durationThreshold: 16 });

    document.addEventListener('visibilitychange', () => {
      if (document.visibilityState === 'hidden' && interactions.length > 0) {
        interactions.sort((a, b) => b.duration - a.duration);
        const idx = Math.floor(interactions.length / 50);
        sdk.report({
          type: 'performance',
          name: 'INP',
          value: interactions[idx]?.duration || 0,
        });
      }
    });
  }

  observeNavigation(sdk) {
    const observer = new PerformanceObserver((list) => {
      for (const entry of list.getEntries()) {
        sdk.report({
          type: 'performance',
          name: 'navigation',
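          value: {
            // TTFB is measured from the start of navigation, so it includes
            // redirects, DNS, and connection setup, not only the request
            // itself (startTime is 0 for navigation entries).
            ttfb: entry.responseStart - entry.startTime,
            domReady: entry.domContentLoadedEventEnd - entry.startTime,
            loadComplete: entry.loadEventEnd - entry.startTime,
          },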
        });
      }
    });
    observer.observe({ type: 'navigation', buffered: true });
  }
}

const monitor = new MonitorSDK({
  dsn: 'https://monitor.example.com/api/report',
  appId: 'my-app',
  release: '1.2.3',
  sampleRate: 0.3,
}).init();

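Error Grouping and Fingerprints

Error grouping strategy:

Identical errors create only one Issue, so that a flood of duplicates does not bury the problems that matter.

Fingerprint generation rules:
  1. Take the error type (TypeError / ReferenceError / ...)
  2. Take the error message with its dynamic parts removed
  3. Take the function name + file name + line number of the top N stack frames

Example:
  Error A: TypeError: Cannot read property 'name' of undefined
           at UserProfile.render (UserProfile.tsx:42:18)

  Error B: TypeError: Cannot read property 'name' of undefined
           at UserProfile.render (UserProfile.tsx:42:18)

  Same fingerprint → grouped into the same Issue, count +1

Normalizing dynamic content:
  "Failed to fetch /api/user/12345" → "Failed to fetch /api/user/{id}"
  "Timeout after 3000ms"           → "Timeout after {N}ms"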

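// Assumes an md5(string) helper is in scope, for example from a hashing
// package; any stable hash function would work here.
function generateFingerprint(error) {
  const parts = [];
  parts.push(error.type || 'Error');

  // Normalize the most specific patterns first. If digits were replaced
  // first, the UUID pattern could no longer match.
  const normalizedMessage = (error.message || '')
    .replace(/https?:\/\/[^\s]+/g, '{url}')
    .replace(/\/[a-f0-9-]{36}/gi, '/{uuid}')
    .replace(/\d+/g, '{N}');
  parts.push(normalizedMessage);

  if (error.stack) {
    // Use the top three frames. The pattern targets V8-style frames
    // ("at fn (file:line:col)"); other formats fall back to the raw line.
    const frames = error.stack
      .split('\n')
      .filter((line) => line.includes('at '))
      .slice(0, 3)
      .map((line) => {
        const match = line.match(/at\s+(.+?)\s+\((.+?):(\d+):\d+\)/);
        return match ? `${match[1]}@${match[2]}:${match[3]}` : line.trim();
      });
    parts.push(frames.join('|'));
  }

  return md5(parts.join('::'));
}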

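8. Common Interview Questions

1. How would you design a front-end error monitoring SDK?

Approach:

Start from the layered architecture: collection → data processing → reporting. The collection layer covers JS errors (window.onerror plus unhandledrejection), resource errors (addEventListener('error', fn, true), in the capture phase), and framework errors (ErrorBoundary / errorHandler). The data processing layer normalizes error information, formats stack traces, and attaches environment data (UA, URL, breadcrumbs). The reporting layer implements batching, sampling, a sendBeacon fallback, and offline caching. Emphasize a plugin architecture: each capability is a separate plugin that can be registered on demand.

Follow-up: how do you keep errors in the SDK itself from affecting the application?

Wrap all of the SDK's public-facing logic in try-catch. Send SDK-internal errors through a separate channel, or drop them silently, rather than through the business reporting channel. Cap the SDK's memory use (queue length limit, breadcrumb limit). Make sure a failed SDK load cannot break business code (asynchronous loading plus global fault tolerance).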
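
The try-catch isolation described above can be sketched as a small wrapper. The name `guard` and the optional `onInternalError` callback are assumptions made for this example; a real SDK would apply such a wrapper to every entry point it exposes or registers as an event listener.

```javascript
// Wrap a function so that exceptions thrown inside SDK code never
// propagate into the host application.
function guard(fn, onInternalError = () => {}) {
  return function guarded(...args) {
    try {
      return fn.apply(this, args);
    } catch (err) {
      try {
        // Route the internal error to a separate channel, if one is given.
        onInternalError(err);
      } catch {
        // The handler itself must not throw either.
      }
      return undefined;
    }
  };
}
```

Note that this only covers synchronous exceptions; a promise returned by the wrapped function would need its own `.catch()`.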

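2. What is the difference between sendBeacon and Fetch keepalive, and when should each be used?

Approach:

Both can deliver data reliably while the page is unloading. sendBeacon is simpler: it supports POST only, cannot set custom headers, and limits the payload to about 64 KB. fetch with keepalive: true is more flexible and supports any method and custom headers, but the bodies of all in-flight keepalive requests together may not exceed 64 KB. In practice, use sendBeacon for simple fire-and-forget beacons, and fetch keepalive when the request needs a token or custom headers. Their delivery reliability is essentially the same.

3. How is CLS calculated, and why does it use a session window?

Approach:

CLS measures how severe a page's layout shifts are. Each layout shift has a score = Impact Fraction × Distance Fraction. Simply summing every shift is unfair to long-lived pages (in an SPA where the user stays for a long time, CLS would only keep growing). The session window algorithm addresses this: shifts less than 1 s apart belong to the same window, and a window lasts at most 5 s. CLS is the total score of the highest-scoring window. The hadRecentInput flag excludes shifts caused by deliberate user actions, such as clicking a button to expand a menu.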
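
The session window rule can be written as a pure function over a list of layout-shift entries. This is a sketch for illustration: `computeCLS` is a hypothetical name, and the input is assumed to be an array of objects with the `startTime`, `value`, and `hadRecentInput` fields of layout-shift entries, sorted by time.

```javascript
// Compute CLS as the largest session window: shifts less than 1 s apart
// share a window, and a window spans at most 5 s.
function computeCLS(entries) {
  let cls = 0;
  let windowValue = 0;
  let windowStart = 0;
  let lastTime = 0;
  let hasWindow = false;

  for (const entry of entries) {
    if (entry.hadRecentInput) continue; // shifts caused by user input do not count

    const continuesWindow =
      hasWindow &&
      entry.startTime - lastTime < 1000 &&
      entry.startTime - windowStart < 5000;

    if (continuesWindow) {
      windowValue += entry.value;
    } else {
      windowValue = entry.value;
      windowStart = entry.startTime;
      hasWindow = true;
    }
    lastTime = entry.startTime;
    cls = Math.max(cls, windowValue);
  }
  return cls;
}
```

For example, shifts of 0.0625 and 0.125 at 100 ms and 600 ms form one window (0.1875). A shift of 0.25 at 3000 ms starts a new window, and a further 0.0625 at 3800 ms extends it to 0.3125, which becomes the reported CLS.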

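4. How do you solve the "Script error" problem for cross-origin scripts?

Approach:

For security reasons, the browser exposes errors from cross-origin scripts to window.onerror only as "Script error.", with no details. The fix is to add the crossorigin="anonymous" attribute to the <script> tag and to have the CDN send an Access-Control-Allow-Origin response header. The browser then loads the script in CORS mode and is allowed to expose the error details. Note that if crossorigin is set but the server does not return valid CORS headers, the script fails to load.

5. What is the difference between INP and FID, and why did INP replace FID?

Approach:

FID measures only the input delay of the first interaction: the time from the user's first click or key press until the browser starts running the event callback. It has two blind spots. First, it measures only the first interaction, so a page that becomes slower later goes undetected. Second, it measures only input delay and excludes event processing time and rendering time. INP addresses both: it covers every interaction, not just the first, and the latency of each interaction = Input Delay + Processing Time + Presentation Delay. The reported value is the slowest interaction, ignoring one outlier for every 50 interactions, which is roughly the 98th percentile on interaction-heavy pages. INP therefore reflects the interaction experience over the whole page lifetime more faithfully.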
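
The selection rule for the final INP value can be sketched as follows. `estimateINP` is a hypothetical helper that takes one latency (in milliseconds) per interaction; it mirrors the "ignore one outlier per 50 interactions" rule and is not the implementation used by browsers or the web-vitals library.

```javascript
// Pick the INP candidate from per-interaction latencies: the slowest
// interaction, skipping one outlier for every 50 interactions recorded.
function estimateINP(durations) {
  if (durations.length === 0) return undefined; // no interactions, no INP
  const sorted = [...durations].sort((a, b) => b - a); // slowest first
  const skip = Math.min(sorted.length - 1, Math.floor(sorted.length / 50));
  return sorted[skip];
}
```

With fewer than 50 interactions the worst one is reported as is; with 50 or more, a single extreme outlier no longer determines the value.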

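6. How should the sampling rate for performance data be set?

Approach:

Different kinds of data call for different sampling strategies. Errors are usually collected at 100%, because every error may affect the user experience and errors occur relatively rarely. Performance data (Web Vitals) is best sampled at 10% to 30%, because the volume is large and only the distribution (P50/P75/P95) is needed. User behavior data (PV, click paths) can be sampled at 5% to 20%. The key is consistent sampling: within one session of one user, either all data is collected or none is, so that the data has no gaps. This can be implemented with hash(userId) % 100 < sampleRate * 100.

7. What is the flow for uploading Source Maps to Sentry, and why should .map files not be deployed to production?

Approach:

At build time, @sentry/webpack-plugin or sentry-cli uploads the .map files to the Sentry server and associates them with the Release version. After the upload, the local .map files are deleted and only the .js files are deployed to the CDN. When an error occurs in production, Sentry uses the Release to find the matching Source Map and restore the stack trace. The .map files should not be deployed because a Source Map contains the complete source code: anyone could read it through the browser DevTools, which effectively publishes the source and creates security and intellectual property risks.

8. What are the pros and cons of a self-hosted monitoring system compared with Sentry?

Approach:

Sentry's strengths: it works out of the box, is feature-complete (error tracking + performance monitoring + Session Replay + Profiling), has an active community and thorough documentation, supports Source Map symbolication, and offers SDKs for almost every mainstream framework. Sentry's weaknesses: the SaaS version raises data privacy and compliance questions, the self-hosted version is costly to operate, customization is limited, and fees are significant at high data volumes.

Strengths of building your own: full customization (metrics, dashboards, alert rules), fully private data, deep integration with internal systems, and controllable long-term cost. Weaknesses: very high development cost (SDK + gateway + cleaning + storage + visualization, end to end) and the need for a dedicated team to maintain it.

Recommendation: small and mid-sized teams should use Sentry directly; large companies or teams with special compliance requirements can consider building their own. A middle path is also possible: collect with the Sentry SDK and build the storage and visualization back end in-house.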

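9. Further Reading

web.dev - Web Vitals: Google's official Web Vitals documentation
web-vitals library: Google's official library for collecting performance metrics
Sentry documentation: the complete Sentry docs
rrweb project: open-source session recording and replay for the web
W3C Performance Timeline: the specification for the performance APIs
W3C Navigation Timing Level 2: the navigation timing specification
W3C Long Tasks API: the specification for long task detection
W3C Event Timing API: the event timing specification (the basis of INP)
ClickHouse documentation: a columnar database well suited to monitoring workloads
Chrome DevTools Performance panel: the performance debugging tool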
