Online Ingestor
The online-ingestor is an asynchronous daemon program.
Instead of being triggered synchronously when a file is written, it actively pulls notifications from a message broker and processes them as they arrive.
Note
scicat-ingestor does not come with a daemon installation helper.
You will have to set it up yourself.
Warning
It is only tested on Ubuntu (>=22.04), as we have no plans to support other operating systems.
Why is it not done by file-writer instead?
The file-writer is critical for collecting data, so it must be very robust, and that is easier when it has a single concern. Because file writing and file ingestion are decoupled, if either the file-writer or scicat-ingestor fails, the other program can continue doing its job. They are also maintained by different teams, so an asynchronous interface is easier to maintain than a monolithic program.

Whenever it pulls a notification about a new dataset, the online-ingestor spawns a background process in which the offline-ingestor ingests the file.
online-ingestor spawns only a limited number of offline-ingestor processes.
Whenever the number of offline-ingestor processes reaches the limit, it stops and waits until some of the background processes are done.
The limit is configured by max_offline_ingestors and offline_ingestor_wait_time.
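The limiting behaviour described above can be sketched roughly as follows. This is a minimal illustration and not the actual scicat-ingestor implementation: the function names wait_for_free_slot and spawn_limited are hypothetical, the child command is a stand-in for the offline-ingestor, and it assumes offline_ingestor_wait_time is the number of seconds to sleep between checks.

```python
import subprocess
import sys
import time


def wait_for_free_slot(running, max_offline_ingestors, offline_ingestor_wait_time):
    """Block until fewer than ``max_offline_ingestors`` processes are alive.

    Assumption: ``offline_ingestor_wait_time`` is a sleep interval in seconds.
    """
    while True:
        # Popen.poll() returns None while the child process is still running.
        running[:] = [proc for proc in running if proc.poll() is None]
        if len(running) < max_offline_ingestors:
            return
        time.sleep(offline_ingestor_wait_time)


def spawn_limited(commands, max_offline_ingestors=2, offline_ingestor_wait_time=0.1):
    """Run every command in the background, never exceeding the limit.

    Returns the highest number of child processes tracked at once.
    """
    running = []
    peak = 0
    for command in commands:
        wait_for_free_slot(running, max_offline_ingestors, offline_ingestor_wait_time)
        running.append(subprocess.Popen(command))
        peak = max(peak, len(running))
    # Let the remaining background processes finish before returning.
    for proc in running:
        proc.wait()
    return peak


if __name__ == "__main__":
    # Hypothetical stand-in for an offline-ingestor call: a child that sleeps.
    fake_job = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    print("peak concurrent processes:", spawn_limited([fake_job] * 5))
```

A shorter wait time makes the loop react sooner to a finished process, at the cost of polling more often.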
Note
The scicat-ingestor is developed specifically for ESS, so it only supports the kafka broker and expects the specific flatbuffer schema type (wfdn) used by our filewriter.
Generalization to other delivery and messaging systems will be considered on a per-request basis.
If you are interested in using the ingestor with other frameworks, please contact us on our issue board or reach out to the maintainers directly.
How to Run
As online-ingestor is the main purpose of this project, it is installed as a script entry point named scicat_ingestor.
You can also run it as a module or as a script.
For example:
<path_to_the_selected_python_executable> \
<full_path_to_the_ingestor_executable_folder>/scicat_online_ingestor.py \
-c <full_path_to_the_configuration_file>
In the case of the ESS test environment, the command looks like this:
/root/micromamba/envs/scicat-ingestor/bin/python \
/ess/services/scicat-ingestor/software/src/scicat_online_ingestor.py \
-c /ess/services/scicat-ingestor/config/scicat_ingestor_config.json
Configuration
See the configuration page for more details.
The online ingestor uses only the following sections of the configuration:
- ingestion
- kafka
- logging
The rest is simply passed from the file to the offline ingestor.
See ADR-000#configuration for why online-ingestor and offline-ingestor share the same configuration file.
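As a rough illustration of the shared configuration file, the sketch below separates the three sections the online ingestor reads from everything else. It is not the project's loader: split_config is a hypothetical helper, the "scicat" section and all values in the example are made up, and the real offline ingestor reads the same file itself rather than receiving a filtered copy.

```python
import json

# Section names taken from the list above.
ONLINE_SECTIONS = ("ingestion", "kafka", "logging")


def split_config(config):
    """Return (sections used by the online ingestor, all remaining sections)."""
    online = {name: config[name] for name in ONLINE_SECTIONS if name in config}
    rest = {name: value for name, value in config.items() if name not in ONLINE_SECTIONS}
    return online, rest


if __name__ == "__main__":
    # Hypothetical configuration content, for illustration only.
    text = '{"ingestion": {}, "kafka": {}, "logging": {}, "scicat": {}}'
    online, rest = split_config(json.loads(text))
    print("online ingestor reads:", sorted(online))
    print("left for the offline ingestor:", sorted(rest))
```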