Exploring the Internal Implementation of `@faker-js/faker`: How Dummy Data Is Generated

Introduction

@faker-js/faker (hereafter referred to as faker) is a library for generating dummy data. It is a useful tool for testing and creating mock data.

This article gives a brief overview of how faker works internally, to deepen our understanding of its implementation.

TL;DR

import { faker } from '@faker-js/faker';

const firstName = faker.person.firstName();

The code above calls the firstName() method of the Person module.
The following diagram illustrates what happens when it is executed.
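As a rough, text-only companion to that diagram, the call chain can be modeled in a few lines of TypeScript. This is a simplified sketch, not faker's source: MiniFaker, Randomizer, and Definitions are hypothetical names, and this firstName() ignores the sex parameter that the real method accepts.

```typescript
// Hypothetical, simplified model of faker.person.firstName().
// Not faker's source; it only mirrors the call chain described in this article.

// Randomizer: produces floats in [0, 1).
interface Randomizer {
  next(): number;
}

// Definitions: locale data that topic-specific modules read from.
interface Definitions {
  person: {
    first_name: { generic?: string[]; female?: string[]; male?: string[] };
  };
}

class MiniFaker {
  constructor(
    readonly definitions: Definitions,
    private readonly randomizer: Randomizer,
  ) {}

  // Basic datatype module: an integer in [0, max], built on the Randomizer.
  readonly number = {
    int: (max: number): number =>
      Math.floor(this.randomizer.next() * (max + 1)),
  };

  // Helpers: a random element of an array, built on number.int().
  readonly helpers = {
    arrayElement: <T,>(array: readonly T[]): T => {
      if (array.length === 0) {
        throw new Error('Cannot get value from empty dataset.');
      }
      return array[this.number.int(array.length - 1)];
    },
  };

  // Topic-specific module: reads definitions, then delegates to helpers.
  readonly person = {
    firstName: (): string => {
      const { generic, female, male } = this.definitions.person.first_name;
      return this.helpers.arrayElement(generic ?? female ?? male ?? []);
    },
  };
}

// Usage with hypothetical data and Math.random as the Randomizer.
const mini = new MiniFaker(
  { person: { first_name: { generic: ['あゆみ', 'きみ'] } } },
  { next: () => Math.random() },
);
console.log(mini.person.firstName()); // 'あゆみ' or 'きみ'
```

Each layer only talks to the one below it: person reads definitions and calls helpers, helpers calls number, and number is the only part that touches the Randomizer.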

Faker Instance

A Faker instance is built around the following concepts.
There are others worth mentioning, but this article focuses on these:

  • Modules (basic datatype modules and topic-specific modules)
  • Definitions
  • Helpers
  • Randomizer

Modules (basic datatype modules and topic-specific modules)

There are two kinds of modules: basic datatype modules (for handling dates, strings, numbers, etc.) and topic-specific modules (for topics such as Animal, Food, and Person).

// basic datatypes
const randomNumber = faker.number.int(); // 2900970162509863

// topic specific
const randomBear = faker.animal.bear(); // 'Asian black bear'

Definitions

Definitions are the data from which dummy data is actually generated.
Which definitions are used depends on the locale passed to the Faker instance.

For example, to generate Japanese data, import fakerJA and use it in your application as follows.

import { fakerJA } from '@faker-js/faker';

fakerJA.person.firstName(); // returns a Japanese first name

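Internally, faker defines it as a Faker instance configured with the ja locale (this file's faker export is what is exposed as fakerJA):

src/locale/ja.ts
import { Faker } from '../faker';
import base from '../locales/base';
import en from '../locales/en';
import ja from '../locales/ja';

export const faker = new Faker({
  locale: [ja, en, base], // use ja first, falling back to en, then base
});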

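ja is the set of definitions for the Japanese locale. Because locale is given as an array, a definition missing from ja is looked up in en, and then in base. ja contains data like the following: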

src/locales/ja/index.ts (simplified excerpt)
const ja: LocaleDefinition = {
  company: {
    category: ['ガス', '保険', '印刷', ...],
    // omitted
  },
  person: {
    first_name: {
      generic: ['あゆみ', 'きみ', ...],
      female: ['千代子', ...],
      male: ['正一', ...],
    },
    // omitted
  },
  // omitted
};

Faker generates dummy data based on these definitions.
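The fallback order [ja, en, base] can be pictured as a lookup that walks the chain and returns the value from the first locale that defines a given entry. The sketch below assumes that rule and resolves at the level of individual entries such as person.first_name; resolveDefinition() and the tiny locale objects are hypothetical illustrations, not faker's data or code.

```typescript
// Sketch of a locale fallback chain such as [ja, en, base].
// Assumption: each entry (e.g. person.first_name) comes from the first
// locale in the chain that defines it. All data below is hypothetical.

type DefinitionCategory = Record<string, unknown>;
type LocaleData = Record<string, DefinitionCategory | undefined>;

function resolveDefinition(
  chain: readonly LocaleData[],
  category: string,
  entry: string,
): unknown {
  for (const locale of chain) {
    const value = locale[category]?.[entry];
    if (value !== undefined) {
      return value; // the first locale that defines the entry wins
    }
  }
  return undefined; // no locale in the chain defines it
}

const ja: LocaleData = {
  person: { first_name: { generic: ['あゆみ', 'きみ'] } },
};
const en: LocaleData = {
  person: { first_name: { generic: ['Mary'] }, job_title: ['Engineer'] },
};
const base: LocaleData = {
  shared: { example: ['locale-independent value'] },
};

const chain = [ja, en, base];
console.log(resolveDefinition(chain, 'person', 'first_name')); // ja's names
console.log(resolveDefinition(chain, 'person', 'job_title')); // en's, since ja lacks it
console.log(resolveDefinition(chain, 'shared', 'example')); // from base
```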

Helpers

Helpers are helper methods for generating dummy data.
Strictly speaking, helpers is itself a module, but it differs slightly from the modules described above, so I deliberately treat it separately here.

The most useful method is arrayElement(), which returns a random element from an array.

const randomAnimal = faker.helpers.arrayElement(['cat', 'dog', 'mouse']); // 'dog'

Other useful methods include weightedArrayElement(), which picks an element at random with probability proportional to the weight assigned to it, and fromRegExp(), which returns a random string matching a regular expression (with some restrictions).
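Weighted selection is commonly implemented by scanning cumulative weights. The sketch below shows that technique in isolation; weightedPick() and its next parameter (standing in for the Randomizer) are illustrative names, and faker's weightedArrayElement() may differ in its validation and details.

```typescript
// Weighted random selection by scanning cumulative weights.
// weightedPick() and `next` are illustrative names, not faker's API;
// `next` must return a float in [0, 1), like a Randomizer.

interface Weighted<T> {
  weight: number;
  value: T;
}

function weightedPick<T>(items: readonly Weighted<T>[], next: () => number): T {
  if (items.length === 0) {
    throw new Error('Cannot pick from an empty array.');
  }
  if (items.some((item) => !(item.weight > 0))) {
    throw new Error('Every weight must be a positive number.');
  }

  const total = items.reduce((sum, item) => sum + item.weight, 0);
  const target = next() * total; // a point in [0, total)

  // Each item owns a slice of [0, total) as wide as its weight.
  let cumulative = 0;
  for (const item of items) {
    cumulative += item.weight;
    if (target < cumulative) {
      return item.value;
    }
  }
  // Guard against floating-point rounding at the very top of the range.
  return items[items.length - 1].value;
}

const weather = [
  { weight: 5, value: 'sunny' },
  { weight: 4, value: 'rainy' },
  { weight: 1, value: 'snowy' },
];
const counts: Record<string, number> = { sunny: 0, rainy: 0, snowy: 0 };
for (let i = 0; i < 10000; i++) {
  counts[weightedPick(weather, Math.random)]++;
}
console.log(counts); // roughly 5000 / 4000 / 1000
```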

For more information, please refer to the Helpers documentation.

Randomizer

The Randomizer is the component that generates random numbers, and Faker lets you customize it.
By default, Faker uses the Mersenne Twister pseudo-random number generator. Its implementation appears to be a TypeScript port of the reference C implementation of the MT19937 algorithm.

For more information, please refer to the implementation.
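For a feel of what MT19937 involves, here is a compact TypeScript sketch of the reference algorithm: a 624-word state, a "twist" step that regenerates the whole state, and a tempering step applied to each output. It is not faker's code; the class name and the next() method returning a float in [0, 1) are assumptions made for illustration.

```typescript
// MT19937 (32-bit Mersenne Twister), following the reference C algorithm.
// An illustrative sketch; faker ships its own port, not reproduced here.

const N = 624;
const M = 397;
const MATRIX_A = 0x9908b0df; // constant vector a
const UPPER_MASK = 0x80000000; // most significant bit
const LOWER_MASK = 0x7fffffff; // least significant 31 bits

class MersenneTwister {
  private readonly mt = new Uint32Array(N); // state vector
  private index = N; // index >= N forces a twist on the next draw

  constructor(seed = 5489) {
    this.seed(seed);
  }

  // init_genrand from the reference implementation.
  seed(seed: number): void {
    this.mt[0] = seed >>> 0;
    for (let i = 1; i < N; i++) {
      const prev = this.mt[i - 1] ^ (this.mt[i - 1] >>> 30);
      // Math.imul keeps the product within 32 bits, like C's uint32_t.
      this.mt[i] = (Math.imul(1812433253, prev) + i) >>> 0;
    }
    this.index = N;
  }

  // Regenerates all N words of state in place.
  private twist(): void {
    for (let i = 0; i < N; i++) {
      const y = (this.mt[i] & UPPER_MASK) | (this.mt[(i + 1) % N] & LOWER_MASK);
      let value = this.mt[(i + M) % N] ^ (y >>> 1);
      if (y & 1) {
        value ^= MATRIX_A;
      }
      this.mt[i] = value >>> 0;
    }
    this.index = 0;
  }

  // genrand_int32: the next 32-bit unsigned integer, after tempering.
  nextUint32(): number {
    if (this.index >= N) {
      this.twist();
    }
    let y = this.mt[this.index++];
    y ^= y >>> 11;
    y ^= (y << 7) & 0x9d2c5680;
    y ^= (y << 15) & 0xefc60000;
    y ^= y >>> 18;
    return y >>> 0;
  }

  // A float in [0, 1), the kind of value a Randomizer-style next() returns.
  next(): number {
    return this.nextUint32() / 4294967296;
  }
}

const twister = new MersenneTwister(5489);
console.log(twister.nextUint32()); // 3499211612
```

With the reference default seed 5489, the first 32-bit output is 3499211612, a well-known check value for MT19937. Seeding with the same value always replays the same sequence, which is what makes seeded dummy data reproducible.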

The Randomizer is used wherever random numbers are needed, for example in the int() method of the Number module:

faker.number.int(); // 2900970162509863
faker.number.int(100); // 52
faker.number.int({ min: 1000000 }); // 2900970162509863
faker.number.int({ max: 100 }); // 42
faker.number.int({ min: 10, max: 100 }); // 57
faker.number.int({ min: 10, max: 100, multipleOf: 10 }); // 50
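One plausible way to get from a float in [0, 1) to integers like those above is to scale the float to the size of the range and floor it. The sketch below is an assumption about the general approach, not faker's source: randomInt() is a hypothetical name, next stands in for the Randomizer, and the min, max, and multipleOf options simply mirror the examples.

```typescript
// Maps a float in [0, 1) to an integer in [min, max] that is a multiple of
// `multipleOf`. Hypothetical helper; option names mirror the examples above.

interface IntOptions {
  min?: number;
  max?: number;
  multipleOf?: number;
}

function randomInt(next: () => number, options: IntOptions = {}): number {
  const { min = 0, max = Number.MAX_SAFE_INTEGER, multipleOf = 1 } = options;
  if (!Number.isInteger(multipleOf) || multipleOf <= 0) {
    throw new Error('multipleOf must be a positive integer.');
  }

  // Work in units of multipleOf so every result is a multiple of it.
  const low = Math.ceil(min / multipleOf);
  const high = Math.floor(max / multipleOf);
  if (high < low) {
    throw new Error(`No multiple of ${multipleOf} in [${min}, ${max}].`);
  }

  // Each of the (high - low + 1) units gets an equal-width slice of [0, 1).
  const unit = Math.floor(next() * (high - low + 1)) + low;
  return unit * multipleOf;
}

console.log(randomInt(() => 0.45, { min: 10, max: 100, multipleOf: 10 })); // 50
console.log(randomInt(Math.random, { max: 100 })); // some integer in [0, 100]
```

For { min: 10, max: 100, multipleOf: 10 }, the sketch works in units of 10 (1 through 10), so a draw of 0.45 lands in unit 5 and the result is 50.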

The arrayElement() method from the Helpers section also relies on int(): it draws a random index between 0 and array.length - 1 (inclusive) for the array passed as its argument.

src/modules/helpers/index.ts
  /**
   * Returns random element from the given array.
   *
   * @template T The type of the elements to pick from.
   *
   * @param array The array to pick the value from.
   *
   * @throws If the given array is empty.
   *
   * @example
   * faker.helpers.arrayElement(['cat', 'dog', 'mouse']) // 'dog'
   *
   * @since 6.3.0
   */
  arrayElement<const T>(array: ReadonlyArray<T>): T {
    if (array.length === 0) {
      throw new FakerError('Cannot get value from empty dataset.');
    }

    const index =
      array.length > 1 ? this.faker.number.int({ max: array.length - 1 }) : 0;

    return array[index];
  }
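Because arrayElement() only needs an index in [0, array.length - 1], its behavior is easy to reproduce outside faker. The sketch below pairs the same logic with a small seeded generator to show that identical seeds give identical picks. mulberry32 is a swapped-in stand-in chosen for brevity (faker uses Mersenne Twister by default), and both function names here belong to this sketch, not to faker.

```typescript
// arrayElement()-style picking on top of a seeded generator.
// mulberry32 is a small stand-in PRNG; faker's default is Mersenne Twister.

function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

function arrayElement<T>(array: readonly T[], next: () => number): T {
  if (array.length === 0) {
    throw new Error('Cannot get value from empty dataset.');
  }
  // Same shortcut as faker: a single-element array needs no random draw.
  // Otherwise floor(next() * length) yields an index in [0, length - 1].
  const index = array.length > 1 ? Math.floor(next() * array.length) : 0;
  return array[index];
}

const animals = ['cat', 'dog', 'mouse'];
const genA = mulberry32(42);
const genB = mulberry32(42);
const runA = Array.from({ length: 5 }, () => arrayElement(animals, genA));
const runB = Array.from({ length: 5 }, () => arrayElement(animals, genB));
console.log(runA.join() === runB.join()); // true: same seed, same picks
```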

Person Module

This module generates personal information such as names and job titles.

For more information, please refer to the Person module documentation.

Let's look at the firstName() method, which returns a random first name.

With the concepts covered so far in mind, here is its implementation:

src/modules/person/index.ts
  firstName(sex?: SexType): string {
    return this.faker.helpers.arrayElement(
      selectDefinition(
        this.faker,
        sex,
        this.faker.definitions.person.first_name
      )
    );
  }

Next, let's look at the implementation of selectDefinition(), a helper function defined in the same file.

src/modules/person/index.ts
/**
 * Select a definition based on given sex.
 *
 * @param faker Faker instance.
 * @param sex Sex.
 * @param personEntry Definitions.
 *
 * @returns Definition based on given sex.
 */
function selectDefinition<T>(
  faker: Faker,
  sex: SexType | undefined,
  personEntry: PersonEntryDefinition<T>
): T[] {
  const { generic, female, male } = personEntry;
  switch (sex) {
    case Sex.Female: {
      return female ?? generic;
    }

    case Sex.Male: {
      return male ?? generic;
    }

    default: {
      return (
        generic ??
        faker.helpers.arrayElement([female, male]) ??
        // The last statement should never happen at run time. At this point in time,
        // the entry will satisfy at least (generic || (female && male)).
        // TS is not able to infer the type correctly.
        []
      );
    }
  }
}

As its JSDoc comment says, selectDefinition() returns the definition array that matches the given sex. If sex is Female or Male, it returns the female or male array, falling back to generic when that array is not defined. If sex is undefined, it returns the generic array, or, when there is none, randomly picks either the female or the male array.
As we saw earlier in the Definitions section, the definitions look like this:

src/locales/ja/index.ts (simplified excerpt)
const ja: LocaleDefinition = {
  company: {
    category: ['ガス', '保険', '印刷', ...],
    // omitted
  },
  person: {
    first_name: {
      generic: ['あゆみ', 'きみ', ...],
      female: ['千代子', ...],
      male: ['正一', ...],
    },
    // omitted
  },
  // omitted
};
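To experiment with these fallback rules outside faker, selectDefinition() can be re-created in a standalone form. The Sex enum and PersonEntry type below are simplified stand-ins for faker's own types, and the pick parameter is a hypothetical replacement for faker.helpers.arrayElement, so that a test can control which array is chosen.

```typescript
// Standalone re-creation of selectDefinition()'s fallback rules.
// Sex, PersonEntry, and `pick` are simplified stand-ins, not faker's types.

enum Sex {
  Female = 'female',
  Male = 'male',
}

interface PersonEntry<T> {
  generic?: T[];
  female?: T[];
  male?: T[];
}

function selectDefinition<T>(
  pick: <U>(options: U[]) => U,
  sex: Sex | undefined,
  entry: PersonEntry<T>,
): T[] {
  const { generic, female, male } = entry;
  switch (sex) {
    case Sex.Female:
      // Fall back to generic when no female-specific list exists.
      return female ?? generic ?? [];
    case Sex.Male:
      return male ?? generic ?? [];
    default:
      // No sex given: prefer generic, otherwise pick female or male at random.
      // The final [] only satisfies this sketch's looser, all-optional type.
      return generic ?? pick([female, male]) ?? [];
  }
}

function pickFirst<U>(options: U[]): U {
  return options[0];
}

const firstNames: PersonEntry<string> = { female: ['千代子'], male: ['正一'] };
console.log(selectDefinition(pickFirst, undefined, firstNames)); // [ '千代子' ]
```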

The arrayElement() method, as we saw earlier in the Helpers section, returns a random element from an array.

To summarize, the firstName() method:

  • selects the first-name array from the definitions via selectDefinition()
  • returns a random element of that array via helpers.arrayElement(), which in turn uses number.int() and therefore the Randomizer

This is how a random first name is produced.

Conclusion

In this article, we briefly looked at the internal implementation of faker.
The way topic-specific methods return varied, realistic results can feel a little magical, but in reality faker simply uses random numbers to pick entries from predefined data. It's an interesting design.

Some methods go further. The bio() method of the Person module, for example, defines output patterns and fills them in by combining other methods and definitions from other modules, which produces a wide variety of outputs. If you are interested, please take a look at its implementation.
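Pattern-based output like this can be pictured as a template whose {{token}} placeholders are each replaced by calling another generator. faker exposes a mechanism in this style through faker.helpers.fake(); the sketch below is not its implementation, and fillPattern(), makePicker(), and the token names are hypothetical.

```typescript
// Toy pattern-based generation: each {{token}} in a pattern is replaced by
// the output of the generator registered for that token.
// fillPattern(), makePicker(), and the token names are hypothetical.

type TokenGenerator = () => string;

function fillPattern(
  pattern: string,
  generators: Record<string, TokenGenerator>,
): string {
  return pattern.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match: string, token: string) => {
    const generate = generators[token];
    if (generate === undefined) {
      throw new Error(`Unknown token: ${token}`);
    }
    return generate();
  });
}

// Builds a "pick a random element" function from any [0, 1) source.
function makePicker(next: () => number) {
  return function pick<T>(items: readonly T[]): T {
    if (items.length === 0) {
      throw new Error('Cannot pick from an empty array.');
    }
    return items[Math.floor(next() * items.length)];
  };
}

const pick = makePicker(Math.random);
const generators: Record<string, TokenGenerator> = {
  'person.role': () => pick(['engineer', 'teacher', 'designer']),
  'person.hobby': () => pick(['hiking', 'chess', 'baking']),
};
const patterns = ['{{person.role}}', '{{person.role}}, {{person.hobby}} fan'];

console.log(fillPattern(pick(patterns), generators)); // e.g. 'teacher, chess fan'
```

Randomness enters twice: once when choosing the pattern and once per token, which is why a handful of short word lists can yield many distinct outputs.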

Resources