The "death" of a Node.js command-line program

The story

Here is an unremarkable Node.js script that does something for you at some point in the future.

setTimeout(() => {
  // Do something in the future
}, 1000000)

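As the story grows more complicated, the script grows with it and turns into a Node.js command-line program.

At startup it performs some initialization, and after a while it does its work. If the Node.js process is terminated abruptly (ctrl + c), it also needs to do some cleanup, or record its intermediate state, and so on.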

// to-be.js

// Do some initialization

setTimeout(() => {
  // Do something in the future
}, 1000000)

process.on('SIGINT', () => {
  // Do something cleanup if this process is terminated
})

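Then you realize that the script you just started is wrong, or you simply don't want it to run anymore, so you want to stop it. You frantically press ctrl + c in the terminal, but nothing happens.

Pressing ctrl + c in a terminal sends a SIGINT signal to the process running in the foreground, asking it to terminate. Node.js does not exit here because of how it treats SIGINT and SIGTERM: by default, receiving either signal makes the process exit, but once you install a custom listener for one of these two signals, that default behavior is disabled and Node.js no longer exits.

See Signal events in the Node.js documentation for details.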

A brute-force workaround

So you come up with a brute-force workaround:

// not-to-be.js

setTimeout(() => {
  // Do something in the future
}, 1000000)

process.on('SIGINT', () => {
  // Do something cleanup if this process is terminated
  process.exit()
})

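Okay, this does solve a lot of problems. But what if you have many callbacks, each living in a different context and doing a different piece of cleanup? A process.exit in any one of them kills the process on the spot, and the remaining cleanup callbacks never run. From another angle, overusing process.exit makes your Node.js command-line program harder to debug: sometimes you completely forget a process.exit you wrote earlier, and then you are left scratching your head when the program suddenly terminates. If you are developing a library, you should avoid it even more strictly and terminate in a more explicit way instead, such as throwing an exception.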
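To make the last point concrete, here is a minimal sketch of "throw in the library, decide the exit code in one place". The names ConfigError, parsePort, and main are hypothetical and exist only for this illustration.

```javascript
// Hypothetical library code: it reports failure by throwing, never by exiting.
class ConfigError extends Error {
  constructor(message) {
    super(message)
    this.name = 'ConfigError'
  }
}

function parsePort(value) {
  const port = Number(value)
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new ConfigError(`Invalid port: ${value}`)
  }
  return port
}

// Hypothetical CLI entry point: the only place that turns errors into an exit code.
function main(argv) {
  try {
    const port = parsePort(argv[0])
    console.log(`Would listen on port ${port}`)
    return 0
  } catch (error) {
    console.error(`${error.name}: ${error.message}`)
    return 1
  }
}

// A real CLI would pass process.argv.slice(2) here.
// Setting process.exitCode (instead of calling process.exit) lets Node.js
// exit on its own once all pending work has finished.
process.exitCode = main(['8080'])
```

Because nothing calls process.exit, every other cleanup callback still gets its chance to run, and the caller of the library stays in control of how the program ends.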

An "elegant" solution?

So a more "elegant" solution is:

const timer = setTimeout(() => {
  // Do something in the future
}, 1000000)

process.on('SIGINT', () => {
  // Do something cleanup if this process is terminated
  clearTimeout(timer)
})

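After receiving SIGINT, the handler manually clears the timer that was started earlier. Node.js then sees that no asynchronous work is running or pending, and it exits normally. This makes more sense.

But it brings a new difficulty. As your CLI application becomes more complex, you start more asynchronous work (more timers, TCP servers, child processes, and so on). You may forget which things have not been stopped, so the "elegant" solution above is only as reliable as your discipline in adding a matching cleanup callback for every single task.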
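One way to lower the risk of forgetting something is to register the cleanup at the exact place where each resource is created. The sketch below is only an illustration of that idea; createCleanupRegistry and startApp are hypothetical names, not part of Node.js or of any library mentioned here.

```javascript
const net = require('node:net')

// Hypothetical helper: every resource registers its own cleanup when it is created.
function createCleanupRegistry() {
  const disposers = []
  return {
    track(dispose) {
      disposers.push(dispose)
    },
    get size() {
      return disposers.length
    },
    closeAll() {
      // Release resources in reverse creation order
      while (disposers.length > 0) {
        const dispose = disposers.pop()
        dispose()
      }
    }
  }
}

// Hypothetical application start-up: creation and cleanup sit next to each other.
function startApp(registry) {
  const timer = setTimeout(() => {
    // Do something in the future
  }, 1000000)
  registry.track(() => clearTimeout(timer))

  const server = net.createServer().listen(0)
  registry.track(() => server.close())
}

const registry = createCleanupRegistry()
startApp(registry)
process.once('SIGINT', () => registry.closeAll())

// Demo only: release everything right away so this sketch exits when run as-is.
// Remove this line to keep the program alive until ctrl + c.
registry.closeAll()
```

This does not remove the underlying problem (an untracked resource still keeps the process alive), but it turns "remember to clean up somewhere else" into "clean up on the next line".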

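Why is node running

Here is a tool that helps you debug this: why-is-node-running, or its alternative why-is-node-still-running.

You require this library at the very start of your Node.js command-line program, and behind the scenes it uses the async_hooks API to listen to all asynchronous events. Here is the library's demo: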

const log = require('why-is-node-running') // should be your first require
const net = require('net')

function createServer () {
  const server = net.createServer()
  setInterval(function () {}, 1000)
  server.listen(0)
}

createServer()
createServer()

setTimeout(function () {
  log() // logs out active handles that are keeping node running
}, 100)

There are 5 handle(s) keeping the process running

# Timeout
/home/maf/dev/node_modules/why-is-node-running/example.js:6  - setInterval(function () {}, 1000)
/home/maf/dev/node_modules/why-is-node-running/example.js:10 - createServer()

# TCPSERVERWRAP
/home/maf/dev/node_modules/why-is-node-running/example.js:7  - server.listen(0)
/home/maf/dev/node_modules/why-is-node-running/example.js:10 - createServer()

# Timeout
/home/maf/dev/node_modules/why-is-node-running/example.js:6  - setInterval(function () {}, 1000)
/home/maf/dev/node_modules/why-is-node-running/example.js:11 - createServer()

# TCPSERVERWRAP
/home/maf/dev/node_modules/why-is-node-running/example.js:7  - server.listen(0)
/home/maf/dev/node_modules/why-is-node-running/example.js:11 - createServer()

# Timeout
/home/maf/dev/node_modules/why-is-node-running/example.js:13 - setTimeout(function () {
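To give a feel for what "using async_hooks behind the scenes" can look like, here is a heavily simplified sketch. It is not the actual implementation of why-is-node-running; trackActiveResources is a hypothetical helper that only records the type and creation stack of every asynchronous resource that has been created and not yet destroyed.

```javascript
const asyncHooks = require('node:async_hooks')

// Hypothetical helper: remember where each live async resource was created.
function trackActiveResources() {
  const active = new Map()
  const hook = asyncHooks.createHook({
    init(asyncId, type) {
      // Do not call console.log in here: logging is itself asynchronous
      // and would re-enter this hook.
      active.set(asyncId, { type, stack: new Error().stack })
    },
    destroy(asyncId) {
      active.delete(asyncId)
    }
  })
  hook.enable()
  return { active, disable: () => hook.disable() }
}

const tracker = trackActiveResources()
const timer = setTimeout(() => {}, 1000)

// Collect the resource types seen so far; the timer shows up as 'Timeout'.
const types = [...tracker.active.values()].map((entry) => entry.type)

clearTimeout(timer)
tracker.disable()
console.log(types)
```

The stack captured in init is what allows a tool of this kind to point at the line that created each handle, as in the output above.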

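@breadc/death

We can keep the brute-force workaround but add a centralized event bus, use it as the callback for SIGINT and the other termination signals, and build more sophisticated cleanup on top of it. So I hand-rolled the @breadc/death library.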

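// Excerpt (simplified) from https://github.com/yjl9903/Breadc/blob/main/packages/death/src/death.ts

import { EventEmitter } from 'node:events';

// Assumed shape of the callback type; its real definition is not part of this excerpt
type OnDeathCallback = (signal: NodeJS.Signals) => unknown | Promise<unknown>;

const emitter = new EventEmitter();

const handlers = {
  SIGINT: makeHandler('SIGINT')
};

function makeHandler(signal: NodeJS.Signals) {
  return async () => {
    // listeners() returns a copy, so reversing it does not touch the emitter
    const listeners = emitter.listeners(signal);

    // Run the listeners in reverse registration order
    for (const listener of listeners.reverse()) {
      await listener(signal);
    }

    // Remove our handler to restore the Node.js default behaviour
    // and avoid an infinite loop, then re-send the signal.
    // The full source reads the signal to send from a `context` object
    // that is elided here; this excerpt simply re-sends `signal`.
    process.removeListener('SIGINT', handlers.SIGINT);
    process.kill(process.pid, signal);
  };
}

export function onDeath(callback: OnDeathCallback): () => void {
  process.on('SIGINT', handlers.SIGINT);
  emitter.addListener('SIGINT', callback);
  return () => {
    emitter.removeListener('SIGINT', callback);
  };
}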

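As you can see, our own emitter replaces the event bus built into the Node.js process object. When you call onDeath to register a callback, what gets registered through process.on('SIGINT', ...) is our own handler (registering the same function repeatedly would keep multiple copies; the handling for that is omitted here), and the callbacks themselves are managed by emitter.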
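The duplicate-registration handling that the excerpt leaves out could look roughly like the sketch below. This is an assumption about one possible approach, not the code used by @breadc/death: an installed flag makes sure the process-level handler is attached once, and it is detached again when the last callback unsubscribes. Callbacks are invoked synchronously here to keep the sketch short.

```javascript
const { EventEmitter } = require('node:events')

const emitter = new EventEmitter()
let installed = false // hypothetical guard, not taken from the library

function handleSigint() {
  // Run the registered callbacks in reverse registration order
  for (const listener of emitter.listeners('SIGINT').reverse()) {
    listener('SIGINT')
  }
  // Restore the default behaviour and re-send the signal
  process.removeListener('SIGINT', handleSigint)
  installed = false
  process.kill(process.pid, 'SIGINT')
}

function onDeath(callback) {
  if (!installed) {
    // Attach the process-level handler only once
    process.on('SIGINT', handleSigint)
    installed = true
  }
  emitter.addListener('SIGINT', callback)

  return () => {
    emitter.removeListener('SIGINT', callback)
    // Detach the process-level handler when nobody is listening anymore
    if (installed && emitter.listenerCount('SIGINT') === 0) {
      process.removeListener('SIGINT', handleSigint)
      installed = false
    }
  }
}

const offA = onDeath(() => console.log('cleanup A'))
const offB = onDeath(() => console.log('cleanup B'))

// Count how many times our handler is attached to process
const attached = process.listeners('SIGINT').filter((l) => l === handleSigint).length
console.log(attached) // 1

offA()
offB()
```

Detaching the handler once the last callback is gone also gives back the default ctrl + c behavior when nothing needs cleaning up.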

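When SIGINT fires, we take a copy of the array of callbacks and run them in reverse order. The reasoning is that such cleanup callbacks are usually either order-independent or need to release resources in the reverse of the order in which they were acquired, so reverse order is the safer choice.

Finally, we remove our SIGINT handler to restore the Node.js default exit behavior, and re-send the termination signal we received.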

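The Death of a Node.js Process

The mirror image of "why is your Node.js process still running?" is another question: "why did your Node.js process suddenly drop dead?"

The following causes make a Node.js process terminate abnormally: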

Operation	Example
Manual process exit	`process.exit(1)`
Uncaught exception	`throw new Error()`
Unhandled Promise rejection	`Promise.reject()`
Ignored error event	`EventEmitter#emit('error')`
Unhandled signal	`$ kill <PROCESS_ID>`

The table is taken from The Death of a Node.js Process.
That blog post also contains tips on error handling in Node.js and is worth reading further.
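The "ignored error event" row is probably the least obvious one, so here is a small sketch of it. When an EventEmitter emits 'error' and nobody is listening, the error is thrown; if nothing catches it, it becomes an uncaught exception. The try/catch below is only there so the sketch can show the throw without killing itself.

```javascript
const { EventEmitter } = require('node:events')

const emitter = new EventEmitter()

// No 'error' listener yet: emit() throws the error it was given
let thrown = null
try {
  emitter.emit('error', new Error('boom'))
} catch (error) {
  thrown = error
}
console.log(thrown.message) // boom

// With a listener attached, the same emit is delivered instead of thrown
emitter.on('error', (error) => {
  console.error(`handled: ${error.message}`)
})
const hadListeners = emitter.emit('error', new Error('boom again'))
console.log(hadListeners) // true
```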

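For uncaught exceptions, you can add try/catch manually around code that might fail, or wrap the top-level entry point of the program in try/catch, or listen for the uncaughtException event: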

process.on('uncaughtException', error => {
  console.error(error)
})

For unhandled Promise rejections, you can listen for the unhandledRejection event:

process.on('unhandledRejection', error => {
  console.error(error)
})

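So registering callbacks for the two events uncaughtException and unhandledRejection, and for the three termination signals SIGINT, SIGTERM, and SIGQUIT, helps you handle more robustly what a Node.js command-line program should do in its final moments.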
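Putting the pieces together, a minimal sketch of such a "last words" setup might look like this. installDeathHandlers is a hypothetical helper, the cleanup callback is assumed to be synchronous, and the proc parameter exists only so the function can be tried against a stand-in object. The exit code 128 + signal number follows the convention Node.js uses for its own default signal handling.

```javascript
const os = require('node:os')

// Hypothetical helper: run one cleanup function for every way the process can die.
function installDeathHandlers(cleanup, proc = process) {
  let dying = false

  const die = (reason, exitCode) => {
    if (dying) return // ignore repeated signals or errors raised during cleanup
    dying = true
    try {
      cleanup(reason)
    } finally {
      // Central exit point: the only place in the program that calls exit
      proc.exit(exitCode)
    }
  }

  for (const signal of ['SIGINT', 'SIGTERM', 'SIGQUIT']) {
    // 128 + signal number; falls back to 128 if the platform lacks the signal
    const exitCode = 128 + (os.constants.signals[signal] ?? 0)
    proc.on(signal, () => die(signal, exitCode))
  }

  proc.on('uncaughtException', (error) => {
    console.error(error)
    die('uncaughtException', 1)
  })

  proc.on('unhandledRejection', (reason) => {
    console.error(reason)
    die('unhandledRejection', 1)
  })
}

installDeathHandlers((reason) => {
  console.error(`Cleaning up after ${reason}`)
})
```

Exiting after uncaughtException is deliberate: once an exception has escaped, the program is in an unknown state, so the handler is used for last-minute cleanup rather than for carrying on.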