MetadataKeys Synchronization Service Overview

Overview & Problem Statement¶

The Metadata Keys Module is a dedicated standalone component designed to manage and retrieve metadata keys across the platform. This module replaces the legacy GET /datasets/metadataKeys endpoint.

Problem Addressed¶

The previous implementation in the Datasets service lacked a permission-based filtering layer. Because it attempted to return all global keys without ownership validation, it caused:

Performance: Significant latency when processing large datasets.
Stability: Crashes occurred when retrieval limits were missing or improperly configured.
Risks: Users could see metadata keys they did not have permissions to access.

Module Architecture¶

This module consists of a dedicated Controller and Service layer that implements a robust permission-aware logic.

MetadataKeysController¶

Provides the API interface for searching keys. Allowed filters can be found in src/metadata-keys/metadatakeys.service.ts and exmaple can be find in src/metadata-keys/types/metadatakeys-filter-content.ts

Endpoint: GET /metadatakeys (replaces /datasets/metadataKeys)
Method: findAll
Endpoint Access: Endpoint can be Accessed by any users

MetadataKeysService¶

This handles the business logic and talks to the database. It is divided into user-facing search logic and internal data synchronization.

Permission Layer (Applies to findAll only):¶

When a user searches for keys, the service uses accessibleBy to automatically append access filters based on CASL permissions:

Admins: Can search and get all metadata keys in the system.
Authenticated Users: Can only get keys where they are part of the ownerGroup or accessGroups.
Unauthenticated Users: Can only get keys that are marked as isPublished.

Service Methods:¶

findAll: The only public-facing method. It applies the permission layer and then uses a database aggregation pipeline to find and return the specific keys requested by the user. Every search is limited to 100 results by default, if limit is not provided.
insertManyFromSource: An internal method that takes an original document (like a Dataset), extracts fields from scientificMetadata, metadata, and customMetadata, and creates new records in the Metadata Keys collection.
deleteMany: Removes metadata key entries associated with a source document when that document is deleted from the system.
replaceManyFromSource: Triggered when a source document (e.g., a Dataset or Proposal) is updated. It calls deleteMany and insertManyFromSource sequentially.

Usage Example¶

To list all metadata keys associated with a dataset, the user must provide the sourceType and sourceId. If the fields array is provided, only those specific fields will be returned:

{
  "where": {
    "sourceType": "dataset",
    "sourceId": "datasetId"
  },
  "fields": ["humanreadableName", "key"],
  "limits": {
    "limit": 10,
    "skip": 0,
    "sort": {
      "createdAt": "asc | desc"
    }
  }
}

To retrieve a specific metadata key, use the following filter:

{
  "where": {
    "sourceType": "dataset",
    "sourceId": "datasetId",
    "key": "metadata_key_name"
  },
  "fields": ["key"],
  "limits": {
    "limit": 10,
    "skip": 0,
    "sort": {
      "createdAt": "asc | desc"
    }
  }
}