Advanced Webpack

Kira · 2023/3/26 · about 26 minutes

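For now, I can bundle with webpack and make targeted optimizations to its configuration file, but that is far from enough. I am not yet able to build a complete automated scaffolding tool on top of the webpack API, and getting there depends on studying webpack in more depth. The road is long and hard; I will learn as much as I can.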

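Webpack's workflow

Below is the diagram from the Webpack homepage:

The diagram shows at a glance what webpack does: it bundles all kinds of assets and ultimately produces static files, such as JS, that browsers can run. To users, the process in between is a black box, but exploring how that black box works helps us use it better.

In short, Webpack's bundling process has the following stages:

Read the entry file: Webpack reads the application code starting from the entry defined in the configuration file.
Resolve module dependencies: Webpack analyzes the entry file and the modules it depends on, and builds the dependency relationships between modules.
Load modules and transform code: Webpack uses the appropriate loader for each module type to load and transform the code.
Generate chunks: Webpack bundles all modules into one or more chunks according to the output options in the configuration file.
Emit the bundled code: Webpack writes the generated chunks to the configured directory for the application to use.

Below we analyze this process in more depth, from the perspective of how Webpack actually runs.

Whether we bundle with Webpack CLI plus a configuration file or call the Webpack API directly, we always pass in a configuration object. Webpack first calls validateSchema to validate that object, then calls getNormalizedWebpackOptions and applyWebpackOptionsBaseDefaults to normalize it and apply base defaults. The resulting options are used to create the compiler object. Here is the implementation in the Webpack source: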

/**
 * @param {WebpackOptions} rawOptions options object
 * @returns {Compiler} a compiler
 */
const createCompiler = (rawOptions) => {
  const options = getNormalizedWebpackOptions(rawOptions);
  applyWebpackOptionsBaseDefaults(options);
  const compiler = new Compiler(options.context, options);
  new NodeEnvironmentPlugin({
    infrastructureLogging: options.infrastructureLogging,
  }).apply(compiler);
  if (Array.isArray(options.plugins)) {
    for (const plugin of options.plugins) {
      if (typeof plugin === "function") {
        plugin.call(compiler, compiler);
      } else {
        plugin.apply(compiler);
      }
    }
  }
  applyWebpackOptionsDefaults(options);
  compiler.hooks.environment.call();
  compiler.hooks.afterEnvironment.call();
  new WebpackOptionsApply().process(options, compiler);
  compiler.hooks.initialize.call();
  return compiler;
};

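This function does the following, in order:

Normalizes the incoming options with getNormalizedWebpackOptions.
Applies the base defaults to the options with applyWebpackOptionsBaseDefaults.
Creates a Compiler instance, passing in the normalized options.
Applies NodeEnvironmentPlugin, which wires the compiler up to the Node.js environment (infrastructure logging and the file systems used for reading, writing, and watching).
If the options contain a plugins array, iterates over it and applies each plugin to the compiler. A plugin can be a function or an object with an apply method.
Applies the remaining defaults with applyWebpackOptionsDefaults.
Calls the compiler's environment and afterEnvironment hooks, which signal that the environment is being prepared and that preparation has finished.
Instantiates WebpackOptionsApply and calls its process method, which turns the options into built-in plugins applied to the compiler.
Calls the compiler's initialize hook, signalling that the compiler is initialized.
Returns the compiler instance.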
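
To make the ordering concrete, here is a minimal sketch of the same sequence using toy classes. ToyHook, ToyCompiler and createToyCompiler are hypothetical names invented for this illustration, not webpack's real classes, and the option handling is heavily simplified. The point it shows: user plugins are applied before the lifecycle hooks fire, so a plugin can tap environment or initialize and still be called.

```javascript
// Toy model of the createCompiler sequence; these are NOT webpack's real classes.
class ToyHook {
  constructor() {
    this.taps = [];
  }
  tap(name, fn) {
    this.taps.push({ name, fn });
  }
  call(...args) {
    for (const { fn } of this.taps) fn(...args);
  }
}

class ToyCompiler {
  constructor(context, options) {
    this.context = context;
    this.options = options;
    this.hooks = {
      environment: new ToyHook(),
      afterEnvironment: new ToyHook(),
      initialize: new ToyHook(),
    };
  }
}

function createToyCompiler(rawOptions) {
  // 1. Normalize and apply base defaults (simplified stand-in)
  const options = { context: process.cwd(), plugins: [], ...rawOptions };
  const compiler = new ToyCompiler(options.context, options);
  // 2. Apply user plugins: either a function or an object with apply()
  for (const plugin of options.plugins) {
    if (typeof plugin === "function") plugin.call(compiler, compiler);
    else plugin.apply(compiler);
  }
  // 3. Apply the remaining defaults after the plugins had a chance to run
  if (options.mode === undefined) options.mode = "production";
  // 4. Fire the lifecycle hooks in order
  compiler.hooks.environment.call();
  compiler.hooks.afterEnvironment.call();
  compiler.hooks.initialize.call();
  return compiler;
}

const order = [];
const toyCompiler = createToyCompiler({
  plugins: [
    {
      apply(c) {
        c.hooks.initialize.tap("ObjectPlugin", () => order.push("object:initialize"));
      },
    },
    function (c) {
      c.hooks.environment.tap("FnPlugin", () => order.push("fn:environment"));
    },
  ],
});
console.log(order, toyCompiler.options.mode);
```

Although the object plugin is registered first, its callback runs last, because the hooks fire in lifecycle order rather than registration order.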

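Once the compiler has been created, the build itself is carried out by compiler.compile (which compiler.run calls internally). The source is as follows: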

/**
 * @param {Callback<Compilation>} callback signals when the compilation finishes
 * @returns {void}
 */
compile(callback) {
  // Create a new compilation params object via this.newCompilationParams()
  const params = this.newCompilationParams();
  // Run the beforeCompile hook
  this.hooks.beforeCompile.callAsync(params, err => {
    // On error, hand it to the callback and stop; otherwise continue
    if (err) return callback(err);
    // Run the compile hook
    this.hooks.compile.call(params);
    // Create a new Compilation object that represents this build
    const compilation = this.newCompilation(params);
    const logger = compilation.getLogger("webpack.Compiler");
    logger.time("make hook");
    // Run the make hook
    this.hooks.make.callAsync(compilation, err => {
      logger.timeEnd("make hook");
      if (err) return callback(err);
      logger.time("finish make hook");
      // Run the finishMake hook
      this.hooks.finishMake.callAsync(compilation, err => {
        logger.timeEnd("finish make hook");
        if (err) return callback(err);
        // Continue the rest of the compilation in process.nextTick
        process.nextTick(() => {
          logger.time("finish compilation");
          // Call compilation.finish to finish the module-building part
          compilation.finish(err => {
            logger.timeEnd("finish compilation");
            if (err) return callback(err);

            logger.time("seal compilation");
            // Call compilation.seal, which triggers the seal hook
            compilation.seal(err => {
              logger.timeEnd("seal compilation");
              if (err) return callback(err);

              logger.time("afterCompile hook");
              // Run the afterCompile hook
              this.hooks.afterCompile.callAsync(compilation, err => {
                logger.timeEnd("afterCompile hook");
                if (err) return callback(err);
                return callback(null, compilation);
              });
            });
          });
        });
      });
    });
  });
}

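The deeply nested callbacks above look a bit scary, but put simply they are just a series of steps that run one after another. Once compiler.compile is called, we enter the build phase.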
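
The serial nature of those callbacks is easier to see when the same order is written with async/await. The sketch below is only an illustration: compileSketch and asyncStep are hypothetical names, and each step merely records its name instead of doing real work. Webpack itself uses callback-style hooks, not this code.

```javascript
// Flattened view of the nested callbacks in compile(); toy stand-ins only.
async function compileSketch() {
  const trace = [];
  // Stand-in for an async hook or method: record the name, then yield to the event loop
  const asyncStep = (name) =>
    new Promise((resolve) => {
      trace.push(name);
      setTimeout(resolve, 0);
    });

  await asyncStep("beforeCompile");
  trace.push("compile"); // compile is a synchronous hook
  await asyncStep("make"); // modules are built here
  await asyncStep("finishMake");
  await asyncStep("finish");
  await asyncStep("seal"); // chunks and assets are produced here
  await asyncStep("afterCompile");
  return trace;
}

compileSketch().then((trace) => console.log(trace.join(" -> ")));
```

Any step that rejects would skip everything after it, which mirrors the repeated `if (err) return callback(err)` in the original.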

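Build phase

Here is how the official documentation describes what Webpack does in this phase:

When webpack processes your application, it starts from a list of modules defined on the command line or in its configuration file. Starting from these entry points, webpack recursively builds a dependency graph that includes every module your application needs, then bundles all of those modules into a small number of bundles (often just one) to be loaded by the browser.

After compiler.compile is called in the previous phase, the make hook fires. Because EntryPlugin has tapped make, compilation.addEntry gets called at that point. The source of EntryPlugin is as follows: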

class EntryPlugin {
  /**
   * An entry plugin which will handle
   * creation of the EntryDependency
   *
   * @param {string} context context path
   * @param {string} entry entry path
   * @param {EntryOptions | string=} options entry options (passing a string is deprecated)
   */
  constructor(context, entry, options) {
    this.context = context;
    this.entry = entry;
    this.options = options || "";
  }

  /**
   * Apply the plugin
   * @param {Compiler} compiler the compiler instance
   * @returns {void}
   */
  apply(compiler) {
    compiler.hooks.compilation.tap(
      "EntryPlugin",
      (compilation, { normalModuleFactory }) => {
        compilation.dependencyFactories.set(
          EntryDependency,
          normalModuleFactory
        );
      }
    );

    const { entry, options, context } = this;
    const dep = EntryPlugin.createDependency(entry, options);

    compiler.hooks.make.tapAsync("EntryPlugin", (compilation, callback) => {
      compilation.addEntry(context, dep, options, (err) => {
        callback(err);
      });
    });
  }

  /**
   * @param {string} entry entry request
   * @param {EntryOptions | string} options entry options (passing string is deprecated)
   * @returns {EntryDependency} the dependency
   */
  static createDependency(entry, options) {
    const dep = new EntryDependency(entry);
    // TODO webpack 6 remove string option
    dep.loc = { name: typeof options === "object" ? options.name : options };
    return dep;
  }
}

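Specifically, in the webpack 5 source, calling addEntry does roughly the following (the code is scattered across several places, so it is not reproduced here):

It delegates to this._addEntryItem, which records the entry dependency in the compilation.entries map under the entry's name.
It triggers the addEntry hook.
It calls this.addModuleTree, which looks up the module factory registered for the dependency type in dependencyFactories (for EntryDependency this is the normalModuleFactory that EntryPlugin registered above).
addModuleTree then calls handleModuleCreation to create and build the entry module and, recursively, its dependencies.
When that finishes, it triggers the succeedEntry hook (or failedEntry on error) and invokes the callback.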

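With that, Webpack has handled the entry module and added it to the compilation. From there the build continues as follows:

Call handleModuleCreation to build the appropriate module subclass for the file type.
Call loader-runner to transpile the module content, turning each resource type into standard JavaScript text that Webpack understands (I plan to fill in the details in a later chapter on loaders).
Call acorn to parse the JavaScript code into an AST.
Traverse the AST, triggering various hooks, and collect the module's dependencies (import and require statements) into a dependency array.
When traversal is done, handleParseResult processes the module's dependency array, as follows: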
```
const handleParseResult = (result) => {
  this.dependencies.sort(
    concatComparators(
      compareSelect((a) => a.loc, compareLocations),
      keepOriginalOrder(this.dependencies)
    )
  );
  this._initBuildHash(compilation);
  this._lastSuccessfulBuildMeta = this.buildMeta;
  return handleBuildDone();
};
```
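This function does the following:
- Sorts the dependency array so that dependencies are handled in the right order. They are sorted first by their location in the source (loc) and then by their original order. Two comparators are used, combined with concatComparators.
- Calls _initBuildHash to initialize the build hash. The build hash lets webpack quickly tell, during incremental compilation, which modules have changed, so that only those are rebuilt.
- Assigns the current buildMeta to _lastSuccessfulBuildMeta to keep the metadata of the last successful build. buildMeta holds what the parser learned about the module (for example, information about its exports type).
- Finally calls handleBuildDone for the finishing work of this module's build. As far as I can tell from the source, it triggers a hook, records a snapshot of the module's build dependencies, and then invokes the build callback.
For every dependency newly added to the module, call handleModuleCreation and go back to the first step.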
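
The "sort by location, then fall back to original order" idea can be sketched in a few lines. The helpers below (concat, byLocation, keepOrder) are hypothetical re-implementations written for this example; they only imitate the behaviour described above and are not webpack's concatComparators, compareLocations or keepOriginalOrder.

```javascript
// Try each comparator in turn; the first non-zero result decides the order.
const concat = (...comparators) => (a, b) => {
  for (const compare of comparators) {
    const result = compare(a, b);
    if (result !== 0) return result;
  }
  return 0;
};

// Compare two dependencies by where they appear in the source file.
const byLocation = (a, b) =>
  a.loc.line - b.loc.line || a.loc.column - b.loc.column;

// Remember each item's index up front so ties keep their original order.
const keepOrder = (list) => {
  const index = new Map(list.map((item, i) => [item, i]));
  return (a, b) => index.get(a) - index.get(b);
};

const dependencies = [
  { request: "./c", loc: { line: 3, column: 0 } },
  { request: "./a", loc: { line: 1, column: 10 } },
  { request: "./b-first", loc: { line: 2, column: 0 } },
  { request: "./b-second", loc: { line: 2, column: 0 } },
];

dependencies.sort(concat(byLocation, keepOrder(dependencies)));
console.log(dependencies.map((d) => d.request));
```

The two entries on line 2 have identical locations, so the second comparator keeps them in the order in which they were collected.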

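Once this process has finished, the build phase is over.

Generation phase

The previous phase produced modules and dependencies. The generation phase assembles them into chunks and emits the final output. Let's go through this process in detail.

When the previous phase ends, the make step is complete. From the compile source we looked at earlier, we know that compiler.compile then calls compilation.seal, which triggers the seal hook.

As for what seal actually does: I looked at the source, it is enormously long, and I gave up after a glance or two. So I will borrow someone else's answer here: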

::: notice

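Let me recommend a booklet here: Webpack5 核心原理与应用实践 (Webpack 5 Core Principles and Applied Practice). The answer below comes from it. Its content is recent and focuses on Webpack 5, covering basic usage, optimization techniques, and internals. The author works at ByteDance (ByteDance has been doing well these past two years and has open-sourced quite a lot, especially engineering tooling; it seems on track to overtake Alibaba as the top front-end shop in China...)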

:::

Create the ChunkGraph object for this build.

const chunkGraph = new ChunkGraph(
  this.moduleGraph,
  this.outputOptions.hashFunction
);

Iterate over the entry collection compilation.entries:

for (const [name, { dependencies, includeDependencies, options }] of this
  .entries) {
  // ...
}

Call addChunk to create the corresponding Chunk object (the EntryPoint Chunk) for each entry:
```
const chunk = this.addChunk(name);
```

Iterate over the Dependency collection of that entry, find the corresponding Module objects, and connect them to the Chunk:

for (const dep of [...this.globalEntry.dependencies, ...dependencies]) {
  entrypoint.addOrigin(null, { name }, /** @type {any} */ (dep).request);

  const module = this.moduleGraph.getModule(dep);
  if (module) {
    chunkGraph.connectChunkAndEntryModule(chunk, module, entrypoint);
    entryModules.add(module);
    const modulesList = chunkGraphInit.get(entrypoint);
    if (modulesList === undefined) {
      chunkGraphInit.set(entrypoint, [module]);
    } else {
      modulesList.push(module);
    }
  }
}

At this point we have a number of Chunks. Next, buildChunkGraph is called to turn these Chunks into a graph structure for later processing:

buildChunkGraph(this, chunkGraphInit);

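After that, hooks such as optimizeModules and optimizeChunks are triggered, and plugins (such as SplitChunksPlugin) further trim and optimize the Chunk structure.
Once the last optimize hook, optimizeChunkModules, has finished, compilation.codeGeneration is called to generate the Chunk code.
After every Module has gone through codeGeneration and its module-level code has been generated, createChunkAssets is called to create the asset files for each Chunk.
compilation.emitAsset is called to "commit" the asset files. Note that this only records the asset information; nothing has been written to disk yet.
When all of the above has completed successfully, the callback is invoked and control returns to the compiler.
Finally, the compiler object's emitAssets method is called to write out the asset files (that is, the final output).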
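
The first few steps (one chunk per entry, then connecting reachable modules to it) can be modelled with a toy module graph. This is a minimal sketch under stated assumptions: moduleGraph, entries and buildChunks are hypothetical names and data, and real webpack also handles async chunks, runtime chunks and much more.

```javascript
// Toy module graph: each module maps to the modules it imports.
const moduleGraph = new Map([
  ["./index.js", ["./utils.js", "./shared.js"]],
  ["./admin.js", ["./shared.js"]],
  ["./utils.js", []],
  ["./shared.js", []],
]);

// Two entries, as they might appear in a config's `entry` field.
const entries = { index: "./index.js", admin: "./admin.js" };

function buildChunks(entries, moduleGraph) {
  const chunks = [];
  for (const [name, entryModule] of Object.entries(entries)) {
    // One chunk per entry, like addChunk(name)
    const chunk = { name, modules: new Set() };
    // Walk everything reachable from the entry and attach it to the chunk
    const stack = [entryModule];
    while (stack.length > 0) {
      const current = stack.pop();
      if (chunk.modules.has(current)) continue;
      chunk.modules.add(current);
      stack.push(...(moduleGraph.get(current) ?? []));
    }
    chunks.push(chunk);
  }
  return chunks;
}

const chunks = buildChunks(entries, moduleGraph);
for (const chunk of chunks) {
  console.log(chunk.name, [...chunk.modules].sort());
}
```

Note that ./shared.js ends up in both chunks. Duplication like this is what the optimize hooks and plugins such as SplitChunksPlugin are there to clean up.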

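Summary

From the point of view of how Webpack works, there are three phases: initialization, build, and generation. Initialization sets up the configuration and the Compiler object. The build starts from entry, happens during make, and produces Modules and Dependencies. The core of generation is seal, which turns the output of the previous step into Chunks and finally emits the bundle.

Reading the source is not the goal, and neither is the bundler demo later on. The goal is to deepen our understanding of Webpack through practice. We are not going to rebuild Webpack as it is, but if we want to understand how it works more deeply, we have to read the source.

Note

Here I would like to share the Marxist theory of practice, as encouragement for us all:

Practice is the source of knowledge, the fundamental driving force of its development, and the sole criterion for testing whether knowledge is correct. Practice and knowledge are dialectically unified: practice determines knowledge, and knowledge in turn has a powerful effect on practice. Correct, scientific knowledge advances practice, while mistaken knowledge hinders it. Knowledge must keep progressing as practice develops.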

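A first look at the Webpack API

A simple bundling example

Let's start with a simple example to get a feel for the webpack API.

Create an empty project, install webpack as a dependency, then create a build folder and write the logic that calls the webpack API:

build/index.js: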

const webpack = require("webpack");
const path = require("path");
const { resolve } = path;

function build() {
  return webpack({
    mode: "production",
    entry: {
      index: "./index.js",
    },
    output: {
      path: resolve(__dirname, "..", "dist"), // dist folder in the project root
      filename: "[name].js",
    },
  });
}
build().run();

Then create an index.js in the project root:

const a = "Hello Webpack";
console.log(a);

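Running node build/index.js from the project root performs the build, and the output can then be found in the dist folder.

In fact, the object we pass to the webpack function is the same as what we would put in webpack.config.js when using webpack-cli. We can guess that webpack-cli is essentially a wrapper around calls to the webpack API. If we want to bundle without webpack-cli, as umi.js and CRA do, we have to write our own build scaffolding on top of the webpack API.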

Stats/Compilation

What should we do if we want to know how long the current compilation took?

We can modify the code above:

const webpack = require("webpack");
const path = require("path");
const { resolve } = path;

function build() {
  return webpack({
    mode: "production",
    entry: {
      index: "./index.js",
    },
    output: {
      path: resolve(__dirname, "..", "dist"), // dist folder in the project root
      filename: "[name].js",
    },
  });
}
build().run((err, stats) => {
  // stats docs: https://webpack.docschina.org/configuration/stats/#root
  if (err || stats.hasErrors()) {
    console.error(err || stats.toString());
    return;
  }
  const buildTime = stats.endTime - stats.startTime;
  console.log(`Build time: ${buildTime}ms`);
});

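After changing the file and running the build, the console prints the compilation time. As the code shows, this information comes from the second argument of the callback passed to run. That argument is the Stats object.

We can print stats.toJson() to see what it contains: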

{
  hash: '5100c1e96bc575a7a08f',
  version: '5.76.3',
  time: 112,
  builtAt: 1679798156226,
  publicPath: 'auto',
  outputPath: '/Users/wangdanhui/Desktop/webpack-learning/dist',
  assetsByChunkName: { index: [ 'index.js' ] },
  assets: [
    {
      type: 'asset',
      name: 'index.js',
      size: 27,
      emitted: false,
      comparedForEmit: true,
      cached: false,
      info: [Object],
      chunkNames: [Array],
      chunkIdHints: [],
      auxiliaryChunkNames: [],
      auxiliaryChunkIdHints: [],
      filteredRelated: undefined,
      related: {},
      chunks: [Array],
      auxiliaryChunks: [],
      isOverSizeLimit: false
    }
  ],
  filteredAssets: undefined,
  chunks: [
    {
      rendered: true,
      initial: true,
      entry: true,
      recorded: false,
      reason: undefined,
      size: 48,
      sizes: [Object: null prototype],
      names: [Array],
      idHints: [],
      runtime: [Array],
      files: [Array],
      auxiliaryFiles: [],
      hash: '0af066dba06772d42a25',
      childrenByOrder: [Object: null prototype] {},
      id: 826,
      siblings: [],
      parents: [],
      children: [],
      modules: [Array],
      filteredModules: undefined,
      origins: [Array]
    }
  ],
  modules: [
    {
      type: 'module',
      moduleType: 'javascript/auto',
      layer: null,
      size: 48,
      sizes: [Object],
      built: true,
      codeGenerated: true,
      buildTimeExecuted: false,
      cached: false,
      identifier: '/Users/wangdanhui/Desktop/webpack-learning/index.js',
      name: './index.js',
      nameForCondition: '/Users/wangdanhui/Desktop/webpack-learning/index.js',
      index: 0,
      preOrderIndex: 0,
      index2: 0,
      postOrderIndex: 0,
      cacheable: true,
      optional: false,
      orphan: false,
      dependent: undefined,
      issuer: null,
      issuerName: null,
      issuerPath: null,
      failed: false,
      errors: 0,
      warnings: 0,
      id: 10,
      issuerId: null,
      chunks: [Array],
      assets: [],
      reasons: [Array],
      filteredReasons: undefined,
      usedExports: [],
      providedExports: null,
      optimizationBailout: [Array],
      depth: 0
    }
  ],
  filteredModules: undefined,
  entrypoints: [Object: null prototype] {
    index: {
      name: 'index',
      chunks: [Array],
      assets: [Array],
      filteredAssets: 0,
      assetsSize: 27,
      auxiliaryAssets: [],
      filteredAuxiliaryAssets: 0,
      auxiliaryAssetsSize: 0,
      children: [Object: null prototype] {},
      childAssets: [Object: null prototype] {},
      isOverSizeLimit: false
    }
  },
  namedChunkGroups: [Object: null prototype] {
    index: {
      name: 'index',
      chunks: [Array],
      assets: [Array],
      filteredAssets: 0,
      assetsSize: 27,
      auxiliaryAssets: [],
      filteredAuxiliaryAssets: 0,
      auxiliaryAssetsSize: 0,
      children: [Object: null prototype] {},
      childAssets: [Object: null prototype] {},
      isOverSizeLimit: false
    }
  },
  errors: [],
  errorsCount: 0,
  warnings: [],
  warningsCount: 0,
  children: []
}

This information includes, but is not limited to:

Errors and warnings (if any)
Timing information
Module and chunk information
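
As a small illustration of how this JSON might be consumed, the sketch below builds a one-screen summary from a trimmed copy of the output above. summarizeStats is a hypothetical helper written for this example, not part of webpack; in a real build script its argument would be the result of stats.toJson().

```javascript
// Trimmed copy of the fields printed above.
const statsJson = {
  hash: "5100c1e96bc575a7a08f",
  version: "5.76.3",
  time: 112,
  assets: [{ name: "index.js", size: 27 }],
  errorsCount: 0,
  warningsCount: 0,
};

// Hypothetical helper: turn the JSON into a short human-readable report.
function summarizeStats(json) {
  const lines = [`webpack ${json.version}: built ${json.hash} in ${json.time}ms`];
  for (const asset of json.assets) {
    lines.push(`  ${asset.name}  ${asset.size} bytes`);
  }
  lines.push(`  ${json.errorsCount} error(s), ${json.warningsCount} warning(s)`);
  return lines.join("\n");
}

console.log(summarizeStats(statsJson));
```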

Here is the source of the Stats class:

/*
    MIT License http://www.opensource.org/licenses/mit-license.php
    Author Tobias Koppers @sokra
*/

"use strict";

/** @typedef {import("../declarations/WebpackOptions").StatsOptions} StatsOptions */
/** @typedef {import("./Compilation")} Compilation */
/** @typedef {import("./stats/DefaultStatsFactoryPlugin").StatsCompilation} StatsCompilation */

class Stats {
  /**
   * @param {Compilation} compilation webpack compilation
   */
  constructor(compilation) {
    this.compilation = compilation;
  }

  get hash() {
    return this.compilation.hash;
  }

  get startTime() {
    return this.compilation.startTime;
  }

  get endTime() {
    return this.compilation.endTime;
  }

  /**
   * @returns {boolean} true if the compilation had a warning
   */
  hasWarnings() {
    return (
      this.compilation.warnings.length > 0 ||
      this.compilation.children.some((child) => child.getStats().hasWarnings())
    );
  }

  /**
   * @returns {boolean} true if the compilation encountered an error
   */
  hasErrors() {
    return (
      this.compilation.errors.length > 0 ||
      this.compilation.children.some((child) => child.getStats().hasErrors())
    );
  }

  /**
   * @param {(string|StatsOptions)=} options stats options
   * @returns {StatsCompilation} json output
   */
  toJson(options) {
    options = this.compilation.createStatsOptions(options, {
      forToString: false,
    });

    const statsFactory = this.compilation.createStatsFactory(options);

    return statsFactory.create("compilation", this.compilation, {
      compilation: this.compilation,
    });
  }

  toString(options) {
    options = this.compilation.createStatsOptions(options, {
      forToString: true,
    });

    const statsFactory = this.compilation.createStatsFactory(options);
    const statsPrinter = this.compilation.createStatsPrinter(options);

    const data = statsFactory.create("compilation", this.compilation, {
      compilation: this.compilation,
    });
    const result = statsPrinter.print("compilation", data);
    return result === undefined ? "" : result;
  }
}

module.exports = Stats;

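As we can see, this object provides a number of convenient accessors and methods for things like the compilation's start and end times and its hash. The code above reads startTime and endTime from stats in exactly this way (the compilation time is also available directly as stats.toJson().time).

So when is this object created? It is created when webpack runs onCompiled internally. Here is the source of onCompiled: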

const onCompiled = (err, compilation) => {
  if (err) return finalCallback(err);
  if (this.hooks.shouldEmit.call(compilation) === false) {
    compilation.startTime = startTime;
    compilation.endTime = Date.now();
    const stats = new Stats(compilation);
    this.hooks.done.callAsync(stats, (err) => {
      if (err) return finalCallback(err);
      return finalCallback(null, stats);
    });
    return;
  }
  process.nextTick(() => {
    logger = compilation.getLogger("webpack.Compiler");
    logger.time("emitAssets");
    this.emitAssets(compilation, (err) => {
      logger.timeEnd("emitAssets");
      if (err) return finalCallback(err);

      if (compilation.hooks.needAdditionalPass.call()) {
        compilation.needAdditionalPass = true;

        compilation.startTime = startTime;
        compilation.endTime = Date.now();
        logger.time("done hook");
        const stats = new Stats(compilation);
        this.hooks.done.callAsync(stats, (err) => {
          logger.timeEnd("done hook");
          if (err) return finalCallback(err);

          this.hooks.additionalPass.callAsync((err) => {
            if (err) return finalCallback(err);
            this.compile(onCompiled);
          });
        });
        return;
      }
      logger.time("emitRecords");
      this.emitRecords((err) => {
        logger.timeEnd("emitRecords");
        if (err) return finalCallback(err);

        compilation.startTime = startTime;
        compilation.endTime = Date.now();
        logger.time("done hook");
        const stats = new Stats(compilation);
        this.hooks.done.callAsync(stats, (err) => {
          logger.timeEnd("done hook");
          if (err) return finalCallback(err);
          this.cache.storeBuildDependencies(
            compilation.buildDependencies,
            (err) => {
              if (err) return finalCallback(err);
              return finalCallback(null, stats);
            }
          );
        });
      });
    });
  });
};

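This function is called after compilation finishes. More precisely, compiler.run passes onCompiled as the callback to this.compile. Once compilation is done, onCompiled emits the assets, creates the Stats object, and triggers the done hook with it. We won't go any deeper for now; what matters is knowing where and how stats is initialized.

Note

For the other values available on the Stats object in the run callback, see https://webpack.docschina.org/api/stats.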

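How Webpack implements modules

Handling CommonJS

First, a word about CommonJS: it is a module system for JavaScript. It defines a specification that lets JavaScript run on the server side with well-managed module dependencies. Its core idea is to load modules with the require() function and to export from a module through the module.exports and exports objects.

However, browsers do not support this module system, so a tool such as webpack is needed to process it.

Continuing with the earlier example, we create a folder named utils with an index.js file containing the following:

/utils/index.js: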

module.exports = function (a, b) {
  return a + b;
};

Then write the following in the entry file index.js:

const add = require("./utils/index");

console.log(add(1, 2));

Adjust the webpack configuration in the build function so that the output is easier to read:

function build() {
  return webpack({
    mode: "none",
    entry: {
      index: "./index.js",
    },
    output: {
      path: resolve(__dirname, "..", "dist"), // dist folder in the project root
      filename: "[name].js",
      /** pathinfo
       *  Tells webpack whether to include comments with information about the contained modules in the bundle
       *  Defaults to true with mode='development' and false with mode='production'
       * */
      pathinfo: "verbose",
    },
  });
}

After running the build, the output looks like this:

/******/ (() => {
  // webpackBootstrap
  /******/ var __webpack_modules__ = [
    ,
    /* 0 */ /* 1 */
    /*!************************!*\
  !*** ./utils/index.js ***!
  \************************/
    /*! unknown exports (runtime-defined) */
    /*! runtime requirements: module */
    /*! CommonJS bailout: module.exports is used directly at 1:0-14 */
    /***/ (module) => {
      module.exports = function (a, b) {
        return a + b;
      };

      /***/
    },
    /******/
  ];
  /************************************************************************/
  /******/ // The module cache
  /******/ var __webpack_module_cache__ = {};
  /******/
  /******/ // The require function
  /******/ function __webpack_require__(moduleId) {
    /******/ // Check if module is in cache
    /******/ var cachedModule = __webpack_module_cache__[moduleId];
    /******/ if (cachedModule !== undefined) {
      /******/ return cachedModule.exports;
      /******/
    }
    /******/ // Create a new module (and put it into the cache)
    /******/ var module = (__webpack_module_cache__[moduleId] = {
      /******/ // no module.id needed
      /******/ // no module.loaded needed
      /******/ exports: {},
      /******/
    });
    /******/
    /******/ // Execute the module function
    /******/ __webpack_modules__[moduleId](
      module,
      module.exports,
      __webpack_require__
    );
    /******/
    /******/ // Return the exports of the module
    /******/ return module.exports;
    /******/
  }
  /******/
  /************************************************************************/
  var __webpack_exports__ = {};
  // This entry need to be wrapped in an IIFE because it need to be isolated against other modules in the chunk.
  (() => {
    /*!******************!*\
  !*** ./index.js ***!
  \******************/
    /*! unknown exports (runtime-defined) */
    /*! runtime requirements: __webpack_require__ */
    /*
     * @Author: KiraZz1 1634149028@qq.com
     * @Date: 2023-03-26 09:13:19
     * @LastEditors: KiraZz1 1634149028@qq.com
     * @LastEditTime: 2023-03-26 11:35:49
     * @FilePath: /webpack-learning/index.js
     * @Description: 这是默认设置,请设置`customMade`, 打开koroFileHeader查看配置 进行设置: https://github.com/OBKoro1/koro1FileHeader/wiki/%E9%85%8D%E7%BD%AE
     */

    const add = __webpack_require__(/*! ./utils/index */ 1);

    // test
    console.log(add(1, 2));
  })();

  /******/
})();

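From the code above we can see that webpack gives CommonJS syntax special treatment. __webpack_modules__ is the array that stores all modules, __webpack_module_cache__ is the object that caches modules, and __webpack_require__ is the function that loads a module. Let's look at each in turn:

The code of __webpack_modules__: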
```
var __webpack_modules__ = [
  ,
  (module) => {
    module.exports = function (a, b) {
      return a + b;
    };
  },
];
```
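This array is produced by webpack parsing the entry file into an AST and traversing the dependencies depth-first. Each module is wrapped in a wrapper function, and the module id is its index in the array. You will notice that index 0 is empty: the module with moduleId 0 is the entry module, which is inlined into the IIFE at the end of the bundle rather than stored in the array.

__webpack_require__ is responsible for loading modules. Its argument is the moduleId mentioned above. The code is as follows: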

function __webpack_require__(moduleId) {
  var cachedModule = __webpack_module_cache__[moduleId];
  if (cachedModule !== undefined) {
    return cachedModule.exports;
  }
  // Create a new module (and put it into the cache)
  var module = (__webpack_module_cache__[moduleId] = {
    // no module.id needed
    // no module.loaded needed
    exports: {},
  });
  // Execute the module function
  __webpack_modules__[moduleId](module, module.exports, __webpack_require__);

  // Return the exports of the module
  return module.exports;
}

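When this function runs and __webpack_module_cache__ already holds the module for that moduleId, it returns the cached exports directly. Otherwise this is the first load: it creates the module object and puts it into the cache, runs the wrapper function stored under that moduleId in __webpack_modules__, and returns module.exports.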

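By walking through the process above, we now understand how webpack handles CommonJS code. One part has not been covered in detail, though: how exactly the __webpack_modules__ array is produced, that is, how modules are collected.

Module collection for CommonJS

As mentioned earlier, webpack parses the entry file into an AST and traverses the dependencies depth-first to produce the __webpack_modules__ array. First, we need to understand what an AST is:

The AST parsing process

An AST (Abstract Syntax Tree) is a tree-shaped data structure that represents the abstract syntactic structure of source code. In computer science, the AST is an important concept in compilers: it is the intermediate representation a compiler uses on the way from source code to executable code.
An AST is usually generated by the compiler while it parses the source code. It can represent the various constructs in the source, such as variables, functions, conditionals, and loops. The structure it represents is abstract: it leaves out incidental details of the concrete syntax and describes the syntactic structure of the program itself.
Generating an AST takes two steps: lexical analysis and syntax analysis. In lexical analysis, the source code is split into individual tokens. In syntax analysis, the token stream is turned into the abstract syntax tree. Later stages of a compiler then transform the AST and generate code from it.
ASTs are used in all kinds of programming tools, such as code editors, static analysis tools, and code transformation tools. In an editor, the AST can power syntax highlighting and autocompletion; in static analysis tools, it is used for code checks and performance optimization; in code transformation tools, it is used for refactoring and minification.

Front-end engineering naturally relies on ASTs as well, for example:

ts to js
sass/less to css
es6 to es5 and jsx to js (babel)
lint, prettier

In fact, all of these code transformations follow roughly the same abstract process: turn the code into an AST (parse), operate on the AST (transform), and generate code from the new AST (generate).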

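Note

The site https://astexplorer.net/ lets us inspect the AST structures produced by tools such as babel, tsc, and eslint.

Let's start with the parse step, which consists of two stages: lexical analysis and syntax analysis.

Lexical analysis

During lexical analysis, the source code is split into individual tokens. Each token is one lexical unit of the code, such as an identifier, keyword, operator, number, or string. The lexer splits the source into tokens according to a set of rules.

For example, here is a simple piece of JavaScript:

var x = 1;

This code contains five tokens: var, x, =, 1 and ;. Here var is a keyword, = is an operator, x is an identifier, 1 is a number, and ; is a punctuator. Another example is a simple piece of HTML: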

<!DOCTYPE html>
<html>
  <head>
    <title>My page</title>
  </head>
  <body>
    <h1>Hello, world!</h1>
    <p>This is my page.</p>
  </body>
</html>

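This code contains many different tokens, such as <!DOCTYPE html>, html, head, title, h1, p, and so on. These tokens are then used to build the AST that represents the syntactic structure of the code.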
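
To show what a lexer does with the JavaScript example, here is a minimal sketch of a tokenizer that handles only the tiny subset needed for `var x = 1;`. TOKEN_SPEC and tokenize are hypothetical names for this illustration; real lexers such as the one inside acorn are far more complete.

```javascript
// Token patterns, tried in order. This covers only a tiny subset of JavaScript.
const TOKEN_SPEC = [
  ["whitespace", /^\s+/],
  ["keyword", /^(?:var|let|const)\b/],
  ["identifier", /^[A-Za-z_$][\w$]*/],
  ["number", /^\d+/],
  ["operator", /^=/],
  ["punctuator", /^;/],
];

function tokenize(source) {
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    // Find the first pattern that matches at the current position
    const found = TOKEN_SPEC
      .map(([type, pattern]) => [type, pattern.exec(rest)])
      .find(([, match]) => match !== null);
    if (!found) throw new SyntaxError(`Unexpected character: ${rest[0]}`);
    const [type, match] = found;
    // Whitespace separates tokens but is not a token itself
    if (type !== "whitespace") tokens.push({ type, value: match[0] });
    rest = rest.slice(match[0].length);
  }
  return tokens;
}

console.log(tokenize("var x = 1;"));
```

Counting the trailing semicolon, the statement yields five tokens: a keyword, an identifier, an operator, a number and a punctuator.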

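Syntax analysis

Once lexical analysis is done, the resulting token stream can be turned into an AST according to the grammar rules. The grammar defines the valid syntactic structures of the language, such as if statements, for loops, and function definitions. The AST reflects how these structures nest, and how statements and expressions relate to each other.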

For the var x = 1; example above, the resulting AST object (in the ESTree shape that acorn produces) looks like this:

{
  "type": "Program",
  "start": 0,
  "end": 10,
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 0,
      "end": 10,
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 4,
          "end": 9,
          "id": {
            "type": "Identifier",
            "start": 4,
            "end": 5,
            "name": "x"
          },
          "init": {
            "type": "Literal",
            "start": 8,
            "end": 9,
            "value": 1,
            "raw": "1"
          }
        }
      ],
      "kind": "var"
    }
  ],
  "sourceType": "module"
}
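
The parse, transform, generate pipeline can be tried out on a hand-written AST of this shape, without any library. This is a minimal sketch: traverse and generate below are hypothetical helpers that support only the five node types in this example, and the position fields are left out for brevity.

```javascript
// Hand-written AST for `var x = 1;` (positions omitted).
const ast = {
  type: "Program",
  body: [
    {
      type: "VariableDeclaration",
      kind: "var",
      declarations: [
        {
          type: "VariableDeclarator",
          id: { type: "Identifier", name: "x" },
          init: { type: "Literal", value: 1, raw: "1" },
        },
      ],
    },
  ],
};

// Visit every node depth-first and hand it to the visitor.
function traverse(node, visitor) {
  if (node === null || typeof node !== "object") return;
  if (Array.isArray(node)) {
    node.forEach((child) => traverse(child, visitor));
    return;
  }
  if (typeof node.type === "string") visitor(node);
  for (const key of Object.keys(node)) traverse(node[key], visitor);
}

// Turn the supported node types back into source text.
function generate(node) {
  switch (node.type) {
    case "Program":
      return node.body.map(generate).join("\n");
    case "VariableDeclaration":
      return `${node.kind} ${node.declarations.map(generate).join(", ")};`;
    case "VariableDeclarator":
      return node.init
        ? `${generate(node.id)} = ${generate(node.init)}`
        : generate(node.id);
    case "Identifier":
      return node.name;
    case "Literal":
      return node.raw;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// transform: rewrite every `var` declaration to `let`
traverse(ast, (node) => {
  if (node.type === "VariableDeclaration" && node.kind === "var") {
    node.kind = "let";
  }
});

console.log(generate(ast)); // let x = 1;
```

Tools like babel follow the same three steps, only with a complete parser, a plugin system for the transform step, and a full code generator.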

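Analyzing dependencies

With the AST it is easy to find every require call and thus determine a module's dependencies. Since JS looks up and executes required modules depth-first, we build a depth-first tree of all modules based on their dependencies.

For example:

// index.js
const a = require("./a.js");
const b = require("./b.js");

// b.js
const c = require("./c.js");

So in this case the structure of __webpack_modules__ should be:

0 ➡️ index.js
1 ➡️ a.js
2 ➡️ b.js
3 ➡️ c.js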
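
The numbering follows directly from a pre-order depth-first walk. The sketch below reproduces it on in-memory dependency lists instead of real files; assignIds and the two graphs are hypothetical, and the second graph is added only to show a case where depth-first order differs from breadth-first order.

```javascript
// Pre-order depth-first walk: a module gets the next id the moment it is first reached.
function assignIds(entry, graph) {
  const order = [];
  const visit = (file) => {
    order.push(file);
    for (const dependency of graph[file]) visit(dependency);
  };
  visit(entry);
  return order;
}

// The example from the text: index requires a and b, b requires c.
const docExample = {
  "index.js": ["a.js", "b.js"],
  "a.js": [],
  "b.js": ["c.js"],
  "c.js": [],
};

// Same project, but a.js now also requires d.js.
const deeperExample = {
  "index.js": ["a.js", "b.js"],
  "a.js": ["d.js"],
  "b.js": ["c.js"],
  "c.js": [],
  "d.js": [],
};

assignIds("index.js", docExample).forEach((file, id) => console.log(id, file));
console.log(assignIds("index.js", deeperExample).join(", "));
```

In the second graph d.js receives id 2, ahead of b.js, because the walk finishes everything under a.js before it returns to index.js.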

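Practice: a mini JS bundler

As the saying goes, real knowledge comes from practice: once we have simulated the process above ourselves, we can say we understand it. Below, following the project at https://github.com/shfshanyue/mini-code/blob/master/code/bundle, we implement a mini bundler.

First create a folder, initialize it, and install the babel dependencies:

pnpm i @babel/core @babel/generator --save-dev

Then start developing the bundler in index.js in the project root.

First we need to parse code into an AST, which relies on babel. The code of index.js is as follows: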

const babel = require("@babel/core");

const fs = require("fs");

function packer(entryPath, outputPath) {
  const content = fs.readFileSync(entryPath, "utf-8");
  const ast = babel.parse(content, {
    sourceType: "module",
  });
  console.log(ast);
}

packer("./test/index.js");

Then create a test folder with an index.js inside, containing the following:

const a = 1;

console.log(a);

Run index.js in the project root, and the console prints the AST for the code in test/index.js:

{
  type: 'File',
  start: 0,
  end: 27,
  loc: SourceLocation {
    start: Position { line: 1, column: 0, index: 0 },
    end: Position { line: 3, column: 14, index: 27 },
    filename: undefined,
    identifierName: undefined
  },
  errors: [],
  program: Node {
    type: 'Program',
    start: 0,
    end: 27,
    loc: SourceLocation {
      start: [Position],
      end: [Position],
      filename: undefined,
      identifierName: undefined
    },
    sourceType: 'module',
    interpreter: null,
    body: [ [Node], [Node] ],
    directives: []
  },
  comments: []
}

That is still far from enough, though: we need to handle calls to require. We can use the traverse method provided by @babel/core to walk the AST and find the require calls:

index.js:

const babel = require("@babel/core");
const fs = require("fs");

function packer(entryPath, outputPath) {
  const content = fs.readFileSync(entryPath, "utf-8");
  const ast = babel.parse(content, {
    sourceType: "module",
  });
  babel.traverse(ast, {
    enter: ({ node }) => {
      if (node.type === "CallExpression" && node.callee.name === "require") {
        console.log(node);
      }
    },
  });
}
packer("./test/index.js");

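Of course, we need to make some changes to the files under test:

test/index.js:

const sum = require("./sum.js"); // keep the extension: this packer does not resolve extensions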
console.log(sum(1, 2));

test/sum.js：

const sum = (...args) => args?.reduce((pre, cur) => pre + cur, 0);
module.exports = sum;

Then run index.js, and the node for the require call is printed:

{
  type: 'CallExpression',
  start: 13,
  end: 29,
  loc: SourceLocation {
    start: Position { line: 2, column: 12, index: 13 },
    end: Position { line: 2, column: 28, index: 29 },
    filename: undefined,
    identifierName: undefined
  },
  callee: Node {
    type: 'Identifier',
    start: 13,
    end: 20,
    loc: SourceLocation {
      start: [Position],
      end: [Position],
      filename: undefined,
      identifierName: 'require'
    },
    name: 'require'
  },
  arguments: [
    Node {
      type: 'StringLiteral',
      start: 21,
      end: 28,
      loc: [SourceLocation],
      extra: [Object],
      value: './sum.js'
    }
  ]
}

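From the data above we can also see that node.arguments[0] gives us the path of the dependency. With that path in hand, we can repeat the same steps: parse the file into an AST and look for require calls. Because this is a recursive process, we can extract the logic into a function. Besides extracting the logic, the code below adds a few things, including tracking the moduleId and building the dependency array:

index.js: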

const babel = require("@babel/core");

const generate = require("@babel/generator").default;

const fs = require("fs");

const path = require("path");

let moduleId = 0;

function buildModule(filename) {
  // If the entry is a relative path, turn it into an absolute file path based on __dirname
  filename = path.resolve(__dirname, filename);
  const code = fs.readFileSync(filename, "utf8");
  // Parse the source into an AST with babel
  const ast = babel.parse(code, {
    sourceType: "module",
  });

  const deps = [];
  const currentModuleId = moduleId;

  babel.traverse(ast, {
    enter: ({ node }) => {
      // Use the AST to locate every require call and find all dependencies
      if (node.type === "CallExpression" && node.callee.name === "require") {
        const argument = node.arguments[0];
        if (argument.type === "StringLiteral") {
          moduleId++; // a new module was found, so increase moduleId by 1
          const nextFilename = path.join(
            // Build the file path of the next module
            path.dirname(filename),
            argument.value
          );
          // If the moduleId of lodash is 3, then
          // require('lodash') -> require(3)
          argument.value = moduleId;
          deps.push(buildModule(nextFilename));
        }
      }
    },
  });

  return {
    filename,
    deps,
    code: generate(ast).code,
    id: currentModuleId,
  };
}

function packer(entryPath, outputPath) {
  console.log(buildModule(entryPath));
}

packer("./test/index.js");

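The value returned by buildModule above is a tree, which needs to be flattened to make later processing easier:

// Convert the module dependencies from a tree into an array for faster lookup by index
//
// {
//   id: 0,
//   filename: A,
//   deps: [
//     { id: 1, filename: B, deps: [] },
//     { id: 2, filename: C, deps: [] },
//   ]
// }
// ====> this function converts the structure above into the one below
// [
//   { id: 0, filename: A }
//   { id: 1, filename: B }
//   { id: 2, filename: C }
// ]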
function moduleTreeToQueue(moduleTree) {
  const { deps, ...module } = moduleTree;
  const moduleQueue = deps.reduce(
    (acc, m) => {
      return acc.concat(moduleTreeToQueue(m));
    },
    [module]
  );
  return moduleQueue;
}

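Note

Flattening is a very common requirement, but this is the first time I have seen it implemented so elegantly. It is well worth learning from.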
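
Here is a quick way to try the function on its own. The function body is repeated so the snippet runs standalone; sampleTree is hypothetical data with one nested level, shaped like the return value of buildModule (minus the code field).

```javascript
// Same logic as above: put the module first, then append each flattened subtree.
function moduleTreeToQueue(moduleTree) {
  const { deps, ...module } = moduleTree;
  return deps.reduce((acc, m) => acc.concat(moduleTreeToQueue(m)), [module]);
}

// Hypothetical tree: index requires a and b, a requires d.
const sampleTree = {
  id: 0,
  filename: "index.js",
  deps: [
    { id: 1, filename: "a.js", deps: [{ id: 2, filename: "d.js", deps: [] }] },
    { id: 3, filename: "b.js", deps: [] },
  ],
};

const queue = moduleTreeToQueue(sampleTree);
console.log(queue.map((m) => `${m.id}:${m.filename}`));
```

Because the ids were handed out in depth-first order and the flattening is a pre-order walk as well, each module lands at the array index equal to its id. That is exactly what lets the bundle call modules[moduleId] directly.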

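With this step done, our dependency array is complete. All that remains is to process the array and insert it into the template code. The complete code of the bundler is as follows:

index.js: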

const babel = require("@babel/core");
const generate = require("@babel/generator").default;
const fs = require("fs");
const path = require("path");

let moduleId = 0;

function moduleTreeToQueue(moduleTree) {
  const { deps, ...module } = moduleTree;

  const moduleQueue = deps.reduce(
    (acc, m) => {
      return acc.concat(moduleTreeToQueue(m));
    },
    [module]
  );

  return moduleQueue;
}

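// Build a fake CommonJS wrapper for the browser
// Inject exports, require and module as if they were globals. The parameter order matches CommonJS but differs from webpack's, which hardly matters here
// In webpack, this code would first be processed by webpack loaders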
function createModuleWrapper(code) {
  return `
    (function(exports, require, module) {
      ${code}
    })`;
}

function buildModule(filename) {
  // If the entry is a relative path, turn it into an absolute file path based on __dirname
  filename = path.resolve(__dirname, filename);
  const code = fs.readFileSync(filename, "utf8");
  // Parse the source into an AST with babel
  const ast = babel.parse(code, {
    sourceType: "module",
  });

  const deps = [];
  const currentModuleId = moduleId;

  babel.traverse(ast, {
    enter: ({ node }) => {
      // Use the AST to locate every require call and find all dependencies
      if (node.type === "CallExpression" && node.callee.name === "require") {
        const argument = node.arguments[0];
        if (argument.type === "StringLiteral") {
          moduleId++; // a new module was found, so increase moduleId by 1
          const nextFilename = path.join(
            // Build the file path of the next module
            path.dirname(filename),
            argument.value
          );
          // If the moduleId of lodash is 3, then
          // require('lodash') -> require(3)
          argument.value = moduleId;
          deps.push(buildModule(nextFilename));
        }
      }
    },
  });

  return {
    filename,
    deps,
    code: generate(ast).code,
    id: currentModuleId,
  };
}

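function packer(entryPath, outputPath) {
  // Like __webpack_modules__ in webpack, store every module the project depends on in an array
  const moduleTree = buildModule(entryPath);
  const modules = moduleTreeToQueue(moduleTree);
  const result = `
    // Put everything in a block scope to avoid polluting the global scope
    // For convenience this uses {} rather than an IIFE
    //
    // The code below covers the three key steps of bundling:
    // 1. Build modules
    // 2. Build webpackRequire, which loads modules and mimics require in CommonJS
    // 3. Run the entry module
    {
      // 1. Build modules
      const modules = [
        ${modules.map((m) => createModuleWrapper(m.code))}
      ]
      // Module cache: every module is loaded and executed only once
      const cacheModules = {}
      // 2. Load a module, mimicking the require function in the source
      // After bundling, modules are loaded by id and module.exports is cached
      function webpackRequire (moduleId) {
        const cachedModule = cacheModules[moduleId]
        if (cachedModule) {
          return cachedModule.exports
        }
        const targetModule = { exports: {} }
        modules[moduleId](targetModule.exports, webpackRequire, targetModule)
        cacheModules[moduleId] = targetModule
        return targetModule.exports
      }
      // 3. Run the entry module
      webpackRequire(0)
    }
    `;
  // Resolve the output path like the entry path, and make sure the folder exists
  outputPath = path.resolve(__dirname, outputPath);
  fs.mkdirSync(path.dirname(outputPath), { recursive: true });
  fs.writeFileSync(outputPath, result);
}

packer("./test/index.js", "./dist/output.js");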
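
One limitation of the bundler above is worth spelling out: every require call gets a fresh moduleId, so a module required from two places is bundled twice, and a circular dependency would recurse forever. The sketch below shows one possible fix, keyed on the filename. It is an assumption-laden illustration rather than part of the referenced project: it uses a hypothetical in-memory file table and a regular expression instead of babel so that it runs without dependencies, and collectModules is a made-up name.

```javascript
// Hypothetical in-memory project: a.js and b.js both require shared.js,
// and shared.js requires a.js back, forming a cycle.
const files = {
  "/index.js": 'const a = require("./a.js"); const b = require("./b.js");',
  "/a.js": 'require("./shared.js");',
  "/b.js": 'require("./shared.js");',
  "/shared.js": 'require("./a.js");',
};

function collectModules(entry) {
  const ids = new Map(); // filename -> moduleId, so each file is registered once
  const modules = [];

  function visit(filename) {
    // Already seen (shared dependency or cycle): reuse the existing id
    if (ids.has(filename)) return ids.get(filename);
    const id = modules.length;
    ids.set(filename, id);
    const record = { id, filename, code: "" };
    modules.push(record); // register before descending, which breaks cycles
    // Rewrite require("./x.js") to require(<id>), visiting each dependency
    record.code = files[filename].replace(
      /require\("\.\/([^"]+)"\)/g,
      (_, relative) => `require(${visit("/" + relative)})`
    );
    return id;
  }

  visit(entry);
  return modules;
}

const collected = collectModules("/index.js");
for (const m of collected) console.log(m.id, m.filename, "=>", m.code);
```

Registering a module before visiting its dependencies is the same trick __webpack_require__ uses when it puts the module into the cache before executing it.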

CommonJS and ESM

Work in progress...