Offline Ingestor
offline-ingestor is a simple command-line interface.
Each time it is called, offline-ingestor ingests one data file according to a specific schema file (.imsc.yml).
Tip
If you are looking for a Python interface that you can use to download or upload datasets occasionally, you should use scitacean.
But... why?
offline-ingestor is meant for raw datasets that are continuously produced by DAQ programs. It has a very limited, low-level interface for communicating with SciCat because it expects only certain types of files with a known structure (i.e. NeXus files written by the ess-file-writer). Therefore, it cannot handle arbitrary file types. In contrast, scitacean provides versatile high-level interfaces for processing files and for communicating with SciCat through various authentication methods.
In addition, scitacean can validate a dataset before it is uploaded or after it is downloaded, and it reports informative error messages.
Note
Currently, the offline ingestor can only ingest files in NeXus (HDF5) format.
Support for other formats can be added if necessary. Please contact us on our GitHub page or through your ESS/DMSC contact person.
For more information about schema files, including their variables and schema sections, please consult the schemas documentation.
How to Run
In the production environment, the offline ingestor is executed by the online ingestor as a background process.
The online ingestor executes the following command:
<path to the scicat offline ingestor executable> \
-c <full_path_to_the_configuration_file> \
--nexus-file <full_path_to_the_nexus_data_file>
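For illustration only, the command above could be launched from Python as a detached background process with subprocess.Popen. This is a minimal sketch, not the online ingestor's actual code: build_offline_ingestor_command and launch_in_background are hypothetical helpers, and the paths are the example paths used on this page.

```python
import subprocess
from pathlib import Path


def build_offline_ingestor_command(
    executable: str, config_file: str, nexus_file: str
) -> list[str]:
    """Return the argument list for one offline-ingestor run (hypothetical helper)."""
    # The command expects full paths, so relative paths are made absolute here.
    return [
        executable,
        "-c",
        str(Path(config_file).resolve()),
        "--nexus-file",
        str(Path(nexus_file).resolve()),
    ]


def launch_in_background(command: list[str]) -> subprocess.Popen:
    """Start the command without waiting for it to finish."""
    # start_new_session=True runs the child in its own session (POSIX only),
    # so it is not tied to the caller's terminal or process group.
    return subprocess.Popen(command, start_new_session=True)


if __name__ == "__main__":
    cmd = build_offline_ingestor_command(
        "scicat_background_ingestor",
        "/ess/services/scicat-ingestor/config/scicat_ingestor_config.yml",
        "/ess/data/coda/2025/123321/raw/123321_000123456.hdf",
    )
    # Print the command instead of running it; call launch_in_background(cmd) to start it.
    print(" ".join(cmd))
```

Passing an argument list rather than a single shell string avoids shell quoting problems with unusual file names.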
The offline ingestor command can also be run manually in a terminal when needed, for example when automatic ingestion has failed, or for troubleshooting.
For example, if you want to ingest a file at /ess/data/coda/2025/123321/raw/123321_000123456.hdf,
the command could be:
conda activate scicat-ingestor
scicat_background_ingestor \
-c /ess/services/scicat-ingestor/config/scicat_ingestor_config.yml \
--nexus-file /ess/data/coda/2025/123321/raw/123321_000123456.hdf
Alternatively, to run the ingestor script directly with the environment's Python interpreter:
/root/micromamba/envs/scicat-ingestor/bin/python /ess/services/scicat-ingestor/software/src/scicat_offline_ingestor.py \
-c /ess/services/scicat-ingestor/config/scicat_ingestor_config.yml \
--nexus-file /ess/data/coda/2025/123321/raw/123321_000123456.hdf
Flow
Ingestor Flow Diagram
flowchart TB
conf@{ shape: doc, label: "Configuration File" } --> readconfig
nexus@{ shape: doc, label: "Data File (Nexus)" } --> selectschema
schemas@{ shape: docs, label: "Schema Definition Files" } --> loadschema
readconfig[Read the Configuration] --> loadschema[Load Schema Files] --> selectschema[Select Schema that matches the data file]
selectschema --> schema@{ shape: doc, label: "Selected Schema Definition" }
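The "Select Schema that matches the data file" step could look roughly like the sketch below. It assumes, hypothetically, that each loaded schema has an order number and a simple path-prefix rule, and that the first matching schema in ascending order wins. The real matching rules are described in the schemas documentation and may differ; SchemaDefinition, matches and select_schema are illustrative names.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class SchemaDefinition:
    """Hypothetical in-memory form of one loaded .imsc.yml file."""

    name: str
    order: int  # lower numbers are tried first (assumption)
    path_prefix: str  # simplified stand-in for a real selection rule
    file_suffix: str = ".hdf"


def matches(schema: SchemaDefinition, data_file: str) -> bool:
    """Return True if the data file satisfies the schema's simplified rule."""
    path = PurePosixPath(data_file)
    return str(path).startswith(schema.path_prefix) and path.suffix == schema.file_suffix


def select_schema(schemas: list[SchemaDefinition], data_file: str) -> SchemaDefinition:
    """Return the first matching schema, trying them in ascending order."""
    for schema in sorted(schemas, key=lambda s: s.order):
        if matches(schema, data_file):
            return schema
    raise LookupError(f"No schema matches {data_file}")


schemas = [
    SchemaDefinition(name="generic-coda", order=100, path_prefix="/ess/data/coda/"),
    SchemaDefinition(
        name="proposal-123321", order=10, path_prefix="/ess/data/coda/2025/123321/"
    ),
]
selected = select_schema(schemas, "/ess/data/coda/2025/123321/raw/123321_000123456.hdf")
print(selected.name)  # proposal-123321
```

Sorting by an explicit order lets a specific schema take precedence over a generic fallback that would also match.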
Ingestor Flow Description
- read the configuration
- load the schema files
- select the schema file that matches the data file
- retrieve all the required values and assign them to internally defined variables according to the variables section of the schema file
- prepare a local representation of the dataset, assigning field values according to the schema section of the schema file
- prepare a local representation of the file list according to the provided configuration
- send a POST request to the target SciCat instance to create the dataset
- send a POST request to the target SciCat instance to create the origdatablock containing the list of files
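The last steps amount to filling templates with variable values and issuing two HTTP POST requests. The sketch below rests on illustrative assumptions only: schema field values are templates with <variable> placeholders, the SciCat API is reachable under /api/v3/datasets and /api/v3/origdatablocks with a bearer token, and the field names and values shown are made up. render and build_post are hypothetical helpers; check the schemas documentation and your SciCat version for the real formats. The requests are built but not sent.

```python
import json
import re
import urllib.request

# Matches placeholders such as <proposal_id> (assumed template syntax).
PLACEHOLDER = re.compile(r"<([A-Za-z_][A-Za-z0-9_]*)>")


def render(template: str, variables: dict[str, str]) -> str:
    """Replace every <name> placeholder with the value of that variable."""
    return PLACEHOLDER.sub(lambda m: variables[m.group(1)], template)


def build_post(base_url: str, route: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request to SciCat."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v3/{route}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


# Values retrieved from the data file per the variables section (made-up examples).
variables = {"proposal_id": "123321", "run_number": "000123456"}

# Dataset fields from the schema section, filled with the variables (hypothetical fields).
dataset_fields = {"datasetName": "<proposal_id>_<run_number>", "proposalId": "<proposal_id>"}
dataset = {key: render(value, variables) for key, value in dataset_fields.items()}
dataset_request = build_post("https://scicat.example.org", "datasets", "TOKEN", dataset)

# Once the dataset exists, its PID links the origdatablock to it (placeholder PID).
created_pid = "20.500.12269/example-pid"
datablock = {
    "datasetId": created_pid,
    "size": 1024,
    "dataFileList": [{"path": "123321_000123456.hdf", "size": 1024}],
}
datablock_request = build_post("https://scicat.example.org", "origdatablocks", "TOKEN", datablock)

# urllib.request.urlopen(dataset_request) would actually send the first request.
```

The dataset must be created before the origdatablock, because the datablock refers to the dataset's PID.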