Replication - Forest Documentation

Replication datasources are only available for Node.js.

The replication strategy maintains a copy of target API data in an internal cache controlled by your back-end, rather than querying the API in real-time.

Figure: a minimal replica datasource, with the cache controlled by the agent.

Overview

Key advantages

No query translation: You do not need to translate Forest queries into target API calls
Performant: Eliminates synchronous network calls to the target API
Feature-complete: Charts, filtering, and search work out of the box
Flexible: Implement custom logic for fetching target API data
Robust: Recover bad states by reconstructing the replica from scratch

Minimal implementation

Node.js

const { createReplicaDataSource } = require('@forestadmin/datasource-replica');
const axios = require('axios');

const myCustomDataSource = createReplicaDataSource({
  pullDumpHandler: async () => {
    const url = 'https://jsonplaceholder.typicode.com';
    const collections = ['posts', 'comments', 'albums', 'photos', 'users'];
    const entries = [];

    for (const collection of collections) {
      const response = await axios.get(`${url}/${collection}`);
      entries.push(...response.data.map(record => ({ collection, record })));
    }

    return { more: false, entries };
  },
});

agent.addDataSource(myCustomDataSource);

This basic implementation fetches all records at startup but doesn’t update them afterward.

Known limitations & solutions

Limitation	Solution
Full data dump required at each startup	Implement persistent cache
Empty collections and foreign keys not auto-detected	Provide explicit schema definition
Data never updates after initial import	Implement update handlers
Read-only data	Implement write handlers
Nested fields and arrays in API responses	Use record flattener utility

Persistent cache

The Forest Node.js back-end uses a SQL database as its underlying cache mechanism. By default, an in-memory SQLite database is used.

Limitations of in-memory cache

The default in-memory approach presents two main challenges:

Extended startup time: The back-end must re-fetch all data from the target API on each restart
High memory consumption: All data remains in memory, which becomes problematic for large datasets

When to use persistent cache

Depending on which API you are targeting, it may be absolutely fine to use an in-memory cache for smaller datasets. However, larger systems like CRMs or databases containing millions of records benefit significantly from persistent storage.

Cache initialization

Forest will automatically detect when the schema of the tables in the caching database does not match the schema of the target API. When mismatches occur, tables and indexes are dropped, recreated, and repopulated from the target API.

Configuration options

cacheInto: Accepts a connection string or configuration object for the SQL connector
cacheNamespace: Prefixes table names, useful for sharing databases or running multiple replicas

Important: No locking mechanism currently exists for concurrent writes when multiple back-end instances share the same cache configuration.

SQLite file example

Node.js

const myCustomDataSource = createReplicaDataSource({
  cacheInto: 'sqlite:/tmp/my-cache.db',
  pullDumpHandler: async () => {
    return { more: false, entries: [] };
  },
});

PostgreSQL example

Node.js

const myCustomDataSource = createReplicaDataSource({
  cacheInto: {
    uri: 'postgres://xxxx:[email protected]/neondb',
    sslMode: 'verify',
  },
  cacheNamespace: 'my-custom-data-source',
  pullDumpHandler: async () => {
    return { more: false, entries: [] };
  },
});

Updating the replica

In real-world scenarios, the Forest back-end must display up-to-date data, so the replica has to be refreshed after the initial import.

Three update methods

Use these approaches independently or combine them:

Scheduled rebuilds - Refetch all records periodically
Change polling - Fetch only modified records when Forest events occur
Change pushing - Receive target API events via webhooks or similar mechanisms

Figure: the target API feeds a replica cache held by the Forest back-end via three update methods (scheduled rebuild, change polling, and change pushing), and Forest queries the replica.

Scheduled rebuilds

Scheduled rebuilds represent the simplest approach for updating replica data by fetching all records from a target API at regular intervals. This method works with any API but is less efficient for large datasets since it requires fetching all records regardless of changes.

Configuration options

pullDumpOnRestart: When set to true, data is fetched on each back-end startup. This is always enabled for the default in-memory cache.
pullDumpOnSchedule: Accepts cron-like schedule patterns for periodic updates. For example, ['0 0 0 * * *', '0 30 18 * * *'] triggers daily at midnight and at 6:30 PM.
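
As a rough sketch, the two options could be combined as follows. The cache path and the empty handler are placeholders, and the object is meant to be passed to createReplicaDataSource.

```javascript
// Options for a replica that rebuilds on startup and twice a day.
// The cache path and the empty handler are placeholders for illustration.
const replicaOptions = {
  cacheInto: 'sqlite:/tmp/my-cache.db',

  // Rebuild the replica every time the back-end starts.
  pullDumpOnRestart: true,

  // Also rebuild daily at midnight and at 6:30 PM (six fields, seconds first).
  pullDumpOnSchedule: ['0 0 0 * * *', '0 30 18 * * *'],

  pullDumpHandler: async () => {
    // Fetch every record from the target API here.
    return { more: false, entries: [] };
  },
};

// Typically used as: createReplicaDataSource(replicaOptions)
```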

Schedule syntax

The system uses the croner NPM package for schedule parsing with this format:

┌─ second (0-59)
│ ┌─ minute (0-59)
│ │ ┌─ hour (0-23)
│ │ │ ┌─ day of month (1-31)
│ │ │ │ ┌─ month (1-12)
│ │ │ │ │ ┌─ day of week (0-6)
* * * * * *

Common examples:

* * * * * * - Every second
0 * * * * * - Every minute
0 0 9 * * 1 - Mondays at 9am
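
To make the field order concrete, the helper below labels each position of a six-field pattern. It is an illustration only: labelScheduleFields is a hypothetical name, it is not croner's parser, and it does not validate the values.

```javascript
// Illustration only: labels the six fields of a schedule pattern.
// This is not croner's parser and it does not validate field values.
const FIELD_NAMES = ['second', 'minute', 'hour', 'dayOfMonth', 'month', 'dayOfWeek'];

function labelScheduleFields(pattern) {
  const parts = pattern.trim().split(/\s+/);
  if (parts.length !== FIELD_NAMES.length) {
    throw new Error(`Expected 6 fields, got ${parts.length}: "${pattern}"`);
  }
  // Pair each field name with the value found at the same position.
  return Object.fromEntries(FIELD_NAMES.map((name, i) => [name, parts[i]]));
}

// "Mondays at 9am": second 0, minute 0, hour 9, any day of month, any month, day of week 1.
console.log(labelScheduleFields('0 0 9 * * 1'));
```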

Handler implementation

The pullDumpHandler returns entries for import and supports pagination.

The request object provides previousDumpState (for change detection), cache access, and reasons (startup or schedule triggers).

The response object specifies the entries to import, pagination via the more flag, and state persistence through the nextDumpState and nextDeltaState fields.

Key advantage: Old data remains available to users until new data processing completes, preventing service disruption.
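
A paginated handler might look like the sketch below. It assumes that, while more is true, the handler is called again and receives the previously returned nextDumpState as previousDumpState. fetchPostsPage and the in-memory data are hypothetical stand-ins for a real paginated API.

```javascript
// Hypothetical stand-in for a paginated target API (not part of Forest).
const FAKE_POSTS = [
  { id: 1, title: 'First' },
  { id: 2, title: 'Second' },
  { id: 3, title: 'Third' },
];
const PAGE_SIZE = 2;

async function fetchPostsPage(page) {
  const start = page * PAGE_SIZE;
  return FAKE_POSTS.slice(start, start + PAGE_SIZE);
}

// Imports one page per call. Assumption: while `more` is true, the handler is
// called again and receives the returned nextDumpState as previousDumpState.
async function pullDumpHandler(request) {
  const page = request.previousDumpState?.page ?? 0;
  const records = await fetchPostsPage(page);
  const isLastPage = records.length < PAGE_SIZE;

  return {
    more: !isLastPage,
    entries: records.map(record => ({ collection: 'posts', record })),
    // Reset the state on the last page so the next full dump starts at page 0.
    nextDumpState: isLastPage ? null : { page: page + 1 },
  };
}
```

Importing page by page keeps each response small, which matters when the target API holds more records than comfortably fit in a single response.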

Change polling

Change polling is a strategy for updating replica data sources by fetching only records that have changed, rather than pulling all data from the target API on each update.

When to poll for changes

Four triggering events are available:

pullDeltaOnRestart: Handler executes when the back-end restarts
pullDeltaOnSchedule: Handler runs on a cron-like schedule (same syntax as pullDumpOnSchedule)
pullDeltaOnBeforeAccess: Handler executes before each datasource access; GUI blocks until completion
pullDeltaOnAfterWrite: Handler executes after each write operation; GUI blocks until completion

Optional delay feature: pullDeltaOnBeforeAccessDelay (milliseconds) groups multiple requests sent during the delay period, reducing calls to your target API. Set to 0 to disable.
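
The triggers can be combined. The sketch below assumes the trigger options are booleans and that pullDeltaOnSchedule takes an array of patterns like pullDumpOnSchedule; all values are illustrative and the handler is a placeholder.

```javascript
// Trigger options for change polling; all values are illustrative.
// Assumption: the pullDeltaOn* triggers are enabled with `true`.
const pollingOptions = {
  // Catch up on changes missed while the back-end was stopped.
  pullDeltaOnRestart: true,

  // Poll every five minutes (same six-field syntax as pullDumpOnSchedule).
  pullDeltaOnSchedule: ['0 */5 * * * *'],

  // Poll before reads, grouping requests that arrive within 50 ms.
  pullDeltaOnBeforeAccess: true,
  pullDeltaOnBeforeAccessDelay: 50,

  // Poll after each write so the replica reflects the change.
  pullDeltaOnAfterWrite: true,

  // Placeholder handler reporting that nothing changed.
  pullDeltaHandler: async () => ({
    more: false,
    nextDeltaState: null,
    newOrUpdatedEntries: [],
    deletedEntries: [],
  }),
};

// Typically used as: createReplicaDataSource(pollingOptions)
```

Because pullDeltaOnBeforeAccess and pullDeltaOnAfterWrite block the GUI until the handler completes, they are best reserved for target APIs that answer quickly.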

Handler implementation

Implement a pullDeltaHandler function that receives a request object containing:

previousDeltaState: Persisted state from previous calls
affectedCollections: Collections being accessed or written to
cache: Interface for reading cached data
reasons: Array explaining why the handler was invoked

The handler should return a response object with:

more: Boolean indicating if additional changes exist (triggers immediate re-call)
nextDeltaState: State persisted for subsequent handler invocations
newOrUpdatedEntries: Records created or modified since last call
deletedEntries: Records removed since last call
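
A minimal sketch of such a handler follows, assuming the target API exposes an ordered change log that can be read from a given sequence number. CHANGE_LOG and fetchChangesSince are hypothetical stand-ins, and the delta state is simply the last sequence number seen.

```javascript
// Hypothetical change log exposed by the target API (not part of Forest).
const CHANGE_LOG = [
  { seq: 1, type: 'upsert', collection: 'posts', record: { id: 1, title: 'Hello' } },
  { seq: 2, type: 'upsert', collection: 'posts', record: { id: 2, title: 'World' } },
  { seq: 3, type: 'delete', collection: 'posts', record: { id: 1 } },
];
const BATCH_SIZE = 2;

async function fetchChangesSince(seq) {
  return CHANGE_LOG.filter(change => change.seq > seq).slice(0, BATCH_SIZE);
}

async function pullDeltaHandler(request) {
  // No state yet means nothing was polled before: start from the beginning.
  const lastSeq = request.previousDeltaState ?? 0;
  const changes = await fetchChangesSince(lastSeq);
  const toEntry = ({ collection, record }) => ({ collection, record });

  return {
    // A full batch suggests more changes are waiting, so ask to be re-called.
    more: changes.length === BATCH_SIZE,
    // Remember the last sequence number seen; keep the old one if nothing changed.
    nextDeltaState: changes.length ? changes[changes.length - 1].seq : lastSeq,
    newOrUpdatedEntries: changes.filter(c => c.type === 'upsert').map(toEntry),
    deletedEntries: changes.filter(c => c.type === 'delete').map(toEntry),
  };
}
```

Deleted entries only need to carry enough of the record to identify it, which is why the delete entry above contains just the id.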

Push & webhooks

The push strategy keeps replicas up-to-date when APIs expose change-following capabilities through webhooks, WebSockets, long polling, or similar mechanisms.

Handler programming

Unlike the pull strategy, developers are responsible for setting up subscriptions to the target API. The back-end calls your handler during startup to establish these subscriptions, and you send changes to the back-end for replica updates.

Request object structure

The request provides:

getPreviousDeltaState(): Fetches delta state asynchronously, useful when mixing push and pull strategies
cache: Interface for reading from the cache

onChanges payload structure

The payload includes:

nextDeltaState (optional): Updated delta state for recovery on back-end restart
newOrUpdatedEntries: Array of created/updated records with collection and record data
deletedEntries: Array of deleted records (full record not required)

Example: CouchDB change feed

Using the nano library to subscribe to CouchDB’s changes stream:

Node.js

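const { createReplicaDataSource } = require('@forestadmin/datasource-replica');
// nano must be initialized with the server URL; this address is an assumption.
const nano = require('nano')('http://localhost:5984');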

const myCustomDataSource = createReplicaDataSource({
  pushDeltaHandler: async (request, onChanges) => {
    const stream = nano.db.changesAsStream('books', {
      include_docs: true,
      since: await request.getPreviousDeltaState(),
    });

    stream.on('data', change => {
      onChanges({
        nextDeltaState: change.seq,
        newOrUpdatedEntries: !change.deleted
          ? [{ collection: 'books', record: { _id: change.id, ...change.doc } }]
          : [],
        deletedEntries: change.deleted
          ? [{ collection: 'books', record: { _id: change.id } }]
          : [],
      });
    });
  },
});

Example: webhook implementation

Using Express to receive webhooks on a separate port:

Node.js

const { createReplicaDataSource } = require('@forestadmin/datasource-replica');
const express = require('express');

const myCustomDataSource = createReplicaDataSource({
  pushDeltaHandler: async (request, onChanges) => {
    const app = express();
    app.use(express.json());

    app.post('/webhooks/on-book-:type(created|change|deleted)', (req, res) => {
      onChanges({
        newOrUpdatedEntries:
          req.params.type === 'created' || req.params.type === 'change'
            ? [{ collection: 'book', record: req.body }]
            : [],
        deletedEntries:
          req.params.type === 'deleted'
            ? [{ collection: 'book', record: { id: req.body.id } }]
            : [],
      });

      res.status(204).send();
    });

    app.listen(3000);
  },
});

Schema & references

Schema auto-discovery

When no explicit schema is provided, the back-end attempts to auto-discover structure from imported data. However, this approach has limitations:

Empty collections cannot be imported
Performance overhead from sampling data
Primary keys must be named id
Composite primary keys unsupported
Foreign keys aren’t automatically detected

Providing a schema

Supply a schema via the createReplicaDataSource function to avoid auto-discovery limitations. The schema can be static or dynamically generated through Promises or async functions.

Schema syntax

Collection definition includes:

name: Collection identifier
fields: Object containing field definitions, supporting nested objects and arrays

Field definition properties (type required):

Type options: Boolean, Integer, Number, String, Date, Dateonly, Timeonly, Binary, Enum, Json, Point, Uuid
defaultValue: Initial value for new records
enumValues: Possible values for Enum types
isPrimaryKey: Marks primary key fields
isReadOnly: Read-only designation
unique: Uniqueness constraint
validation: Array of validation rules
reference: Defines foreign key relationships with target collection details
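
For illustration, a schema for two of the collections from the minimal example could look like the sketch below. The field properties come from the list above; the keys inside reference (relationName, targetCollection, targetField) and the name of the schema option are assumptions to check against the package typings.

```javascript
// Illustrative schema for two collections.
// Assumption: the keys inside `reference` (relationName, targetCollection,
// targetField) are a guess at the expected shape; check the package typings.
const schema = [
  {
    name: 'users',
    fields: {
      id: { type: 'Integer', isPrimaryKey: true },
      email: { type: 'String', unique: true },
      role: { type: 'Enum', enumValues: ['admin', 'member'], defaultValue: 'member' },
    },
  },
  {
    name: 'posts',
    fields: {
      id: { type: 'Integer', isPrimaryKey: true },
      title: { type: 'String' },
      createdAt: { type: 'Date', isReadOnly: true },
      // Foreign key pointing at users.id.
      userId: {
        type: 'Integer',
        reference: { relationName: 'author', targetCollection: 'users', targetField: 'id' },
      },
    },
  },
];

// Assumed usage: createReplicaDataSource({ schema, pullDumpHandler })
```

Declaring the schema up front lets empty collections be imported and makes the relationship between posts and users explicit.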

Handling complex data

Flatten mode addresses limitations with nested structures and arrays. Options include auto or manual modes, similar to Mongoose driver configuration. When enabled, flatten mode:

Automatically transforms nested records
Creates virtual collections for arrays
Uses @@@ as field separator in flattened output
Generates synthetic IDs and foreign keys for relationships

Important: Original records in handlers remain unflattened; transformation occurs during cache import only.
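
The naming convention can be illustrated with a small stand-alone function. This is a sketch of how nested fields map to @@@-separated names, not the library's implementation: flattenRecord is a hypothetical helper, and arrays are left untouched rather than moved to virtual collections.

```javascript
// Illustration of the flattened naming convention only; this is not the
// library's implementation, and arrays are left untouched here.
const SEPARATOR = '@@@';

function flattenRecord(record, prefix = '') {
  const flat = {};
  for (const [key, value] of Object.entries(record)) {
    const path = prefix ? `${prefix}${SEPARATOR}${key}` : key;
    const isPlainObject =
      value !== null &&
      typeof value === 'object' &&
      !Array.isArray(value) &&
      !(value instanceof Date);

    // Recurse into nested objects; copy every other value as-is.
    if (isPlainObject) Object.assign(flat, flattenRecord(value, path));
    else flat[path] = value;
  }
  return flat;
}

const user = { id: 1, name: 'Ada', address: { city: 'London', geo: { lat: 51.5 } } };
console.log(flattenRecord(user));
// Resulting keys: id, name, address@@@city, address@@@geo@@@lat
```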

Write handlers

Implementation requirements

Three optional handlers can be implemented: createRecordHandler, updateRecordHandler, and deleteRecordHandler. Omit any handler for operations not needed. The createRecordHandler function uniquely supports return values, which proves useful when the target API auto-generates record IDs.

Code example

Node.js

const axios = require('axios');
const { createReplicaDataSource } = require('@forestadmin/datasource-replica');

const url = 'https://jsonplaceholder.typicode.com';

const myCustomDataSource = createReplicaDataSource({
  // Record synchronization implementation...

  createRecordHandler: async (collectionName, record) => {
    const response = await axios.post(`${url}/${collectionName}`, record);
    return response.data;
  },

  updateRecordHandler: async (collectionName, record) => {
    await axios.put(`${url}/${collectionName}/${record.id}`, record);
  },

  deleteRecordHandler: async (collectionName, record) => {
    await axios.delete(`${url}/${collectionName}/${record.id}`);
  },
});

Key takeaways

All three write handlers remain optional
Create handlers can return newly generated IDs from the API
Update and delete handlers perform remote operations without returning values
The handlers abstract the communication layer between Forest and external APIs

Want to share your custom datasource with the community? Check out the Forest experimental repository to contribute.