
Configuration Development

Because scicat-ingestor communicates with various frameworks over the network, it has many configuration options to maintain and handle.

Therefore, we group them into a nested dataclass and keep that dataclass as the single source of truth.
It is then exported as a JSON file to resources/config.sample.yml, which lists every available option with its
default value.

See the Architecture Decision Records for the reasoning behind this choice.

The argument parser is also built automatically by scicat_configuration.build_arg_parser.

Update Configuration

If you need to change anything in the configuration,
you have to update the dataclasses and then sync them with the JSON config file.

There should be one entry configuration that contains all configurations for the scicat ingestor program.
The online and offline ingestors use different configuration classes:

Online ingestor: OnlineIngestorConfig
Offline ingestor: OfflineIngestorConfig

Whenever any configuration is updated, export it to JSON with the following command, which ships with the package.

# Calls the `synchronize_config_file` function in the `scicat_configuration` module.
scicat_synchronize_config

There is also a unit test that checks whether they are in sync, so don't worry about forgetting this step.
CI will scream about it...!

Using Configuration

Any other module that needs configuration should always use the configuration dataclass object instead of a plain dictionary.

Tip

Sometimes it is easier to add a read-only property to a configuration dataclass that is computed from other configuration fields each time it is read.

For example:

from dataclasses import dataclass, field


@dataclass(kw_only=True)
class SciCatOptions:
    host: str = "https://scicat.host"
    token: str = "JWT_TOKEN"
    additional_headers: dict = field(default_factory=dict)
    ...

    @property
    def headers(self) -> dict:
        # Rebuilt on every access, so it always reflects the current token.
        return {
            **self.additional_headers,
            **{"Authorization": f"Bearer {self.token}"},
        }

    ...

The communication module can then simply access scicat_options.headers instead of building the headers itself.

Argument Parser

Because the argument parser is built automatically, scicat_configuration.py contains a few manually maintained registries for customizing arguments.

Helper Text Registry

scicat_configuration._HELP_TEXT

Why though? Unfortunately, the docstrings of dataclass fields cannot be read at runtime.
Therefore we made this registry to add useful help text to certain arguments.

_HELP_TEXT is a mapping proxy that maps each argument's long name to its custom help text.
The keys are the argument's long name, including its group, without the leading --.

For example, to add help text to dry-run, add
"ingestion.dry-run": "Dry run mode. No data will be sent to SciCat." to the registry.
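A minimal sketch of how such a registry can be consulted while adding arguments. `EXAMPLE_HELP_TEXT` is a hypothetical stand-in for `_HELP_TEXT`; wrapping the dict in `types.MappingProxyType` gives a read-only view, so other modules cannot modify the registry at runtime.

```python
import argparse
from types import MappingProxyType

# Hypothetical stand-in for _HELP_TEXT: "<group>.<long-name>" -> help text.
EXAMPLE_HELP_TEXT = MappingProxyType(
    {"ingestion.dry-run": "Dry run mode. No data will be sent to SciCat."}
)

parser = argparse.ArgumentParser()
long_name = "ingestion.dry-run"
dry_run_action = parser.add_argument(
    f"--{long_name}",
    action="store_true",
    # Arguments without a registry entry simply get no help text.
    help=EXAMPLE_HELP_TEXT.get(long_name),
)

try:
    EXAMPLE_HELP_TEXT["ingestion.dry-run"] = "changed"  # type: ignore[index]
except TypeError:
    print("The registry is read-only.")
```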

Short Name Registry

scicat_configuration._SHORTENED_ARG_NAMES

Why though? Most arguments are passed to the ingestor from the config file.
However, the `config-file` option itself can't be passed from the config file (obviously).
To make starting the ingestor more convenient, we wanted to give it a short name.

_SHORTENED_ARG_NAMES is a mapping proxy that maps each argument's long name to its short name.
The keys are the argument's long name, including its group, without the leading --.

For example, to use d for the dry-run option, add
"ingestion.dry-run": "d" to the registry.
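Similarly, a sketch of how a short-name registry could be merged into the option strings. `EXAMPLE_SHORTENED_ARG_NAMES` is a hypothetical stand-in for `_SHORTENED_ARG_NAMES`. argparse derives the attribute name from the first long option, so `-d` and `--ingestion.dry-run` set the same value.

```python
import argparse
from types import MappingProxyType

# Hypothetical stand-in for _SHORTENED_ARG_NAMES: "<group>.<long-name>" -> short name.
EXAMPLE_SHORTENED_ARG_NAMES = MappingProxyType({"ingestion.dry-run": "d"})

parser = argparse.ArgumentParser()
long_name = "ingestion.dry-run"
option_strings = [f"--{long_name}"]
short_name = EXAMPLE_SHORTENED_ARG_NAMES.get(long_name)
if short_name is not None:
    option_strings.insert(0, f"-{short_name}")  # ["-d", "--ingestion.dry-run"]
parser.add_argument(*option_strings, action="store_true")

print(vars(parser.parse_args(["-d"])))  # {'ingestion.dry_run': True}
```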