Welcome to Scicat Ingestor
SciCat Ingestor is a versatile application whose primary purpose is to automate the ingestion of new datasets into SciCat.
SciCat Ingestor aims to accomplish FAIR data by making files, together with their associated metadata, visible via SciCat.
The project is composed of two main components:
- online ingestor: connects to a Kafka cluster, listens to selected topics for a specific message, and triggers the data ingestion by running the offline ingestor as a background process. At the moment, this is specific to the ESS IT infrastructure, but the plan is to generalize it as soon as other facilities express interest in adopting it.
  For details, see the online ingestor page.
- offline ingestor: can be run by the online ingestor or by an operator. It collects all the necessary metadata and creates a dataset entry in SciCat.
  For details, see the offline ingestor page.
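The hand-off between the two components can be sketched in a few lines of Python. This is only an illustration of "run the offline ingestor as a background process", not the project's actual code: `build_offline_command` and `launch_in_background` are hypothetical names, and the child command is a stand-in that just echoes its arguments. Replace it with the real offline ingestor command from your installation.

```python
import subprocess
import sys


def build_offline_command(data_file: str, job_id: str) -> list[str]:
    """Return the command line for one offline ingestion run.

    Hypothetical stand-in: a tiny Python child process that only echoes its
    arguments. Replace it with the real offline ingestor command.
    """
    child_code = "import sys; print('ingesting', sys.argv[1], sys.argv[2])"
    return [sys.executable, "-c", child_code, data_file, job_id]


def launch_in_background(command: list[str]) -> subprocess.Popen:
    """Start the command and return immediately, without waiting for it."""
    return subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )


if __name__ == "__main__":
    process = launch_in_background(
        build_offline_command("/data/run_0001.hdf", "job-0001")
    )
    # An online ingestor would go back to listening here; this demo just waits.
    output, _ = process.communicate()
    print(output.strip())
```

Because `Popen` returns as soon as the child has started, the listening loop is not blocked while a file is being ingested.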
Key Features
- Continuously and asynchronously retrieve information about files from Kafka.
- Retrieve metadata from files.
- Ingest files along with the retrieved metadata into SciCat.
Infrastructure around Scicat Ingestor
scicat-ingestor is written for a specific infrastructure setup, shown below:
---
title: Infrastructure around Scicat Ingestor
---
graph LR
filewriter@{ shape: processes, label: "File Writers" } -.write file.-> storage[(Storage)]
filewriter --report (wrdn)--> kafkabroker[Kafka Broker]
ingestor[Scicat Ingestor] -.subscribe (wrdn).-> kafkabroker
storage -.read file.-> ingestor
ingestor --report--> log[Graylog]
Framework | Required (O: required, X: optional) | Description |
---|---|---|
SciCat | O | SciCat service that scicat-ingestor ingests files into. |
Kafka | O | Kafka broker from which scicat-ingestor receives write done messages. All messages are assumed to be serialized as flatbuffers using the flatbuffer schemas for filewriter. scicat-ingestor uses the Python wrapper of those schemas to deserialize messages. Currently only the wrdn schema is used. |
File Writer | O and X | Any process that can write files and produce write done messages can be used. |
Graylog | X | Optional, since scicat-ingestor has a built-in stdout logging option. |
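Since only the wrdn schema is used, a consumer can discard every other message before deserializing anything. The sketch below relies on the FlatBuffers binary layout, where an optional 4-byte file identifier sits right after the 4-byte root offset. That write done messages carry `wrdn` as this identifier is an assumption here, and `schema_id` and `is_write_done` are hypothetical helper names, not part of scicat-ingestor.

```python
WRDN_SCHEMA_ID = b"wrdn"  # assumed file identifier of write done messages


def schema_id(payload: bytes) -> bytes:
    """Return the 4-byte flatbuffer file identifier, or b"" if too short."""
    # Bytes 0-3 hold the root table offset; bytes 4-7 hold the file identifier.
    return payload[4:8] if len(payload) >= 8 else b""


def is_write_done(payload: bytes) -> bool:
    """Tell whether a raw Kafka message value looks like a wrdn message."""
    return schema_id(payload) == WRDN_SCHEMA_ID


if __name__ == "__main__":
    # Fabricated byte strings for illustration only, not real messages.
    fake_wrdn = b"\x10\x00\x00\x00" + b"wrdn" + b"\x00" * 8
    fake_other = b"\x10\x00\x00\x00" + b"xxxx" + b"\x00" * 8
    print(is_write_done(fake_wrdn), is_write_done(fake_other))
```

Messages that pass this check would then be handed to the Python wrapper of the schemas for full deserialization.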
File Ingesting Sequence
Here is a simple overview of how the ingestion is done.
---
title: File Ingesting Sequence
---
sequenceDiagram
create participant File Writer
create actor File
File Writer --> File: File Written
loop Ingest Files
Ingestor -->> Kafka Broker: Subscribe (listening to writing done - wrdn)
Kafka Broker ->> Ingestor: Writing Done Message (wrdn)
Note over Ingestor: Parse writing done message
Ingestor ->> File: Check file
opt
Ingestor ->> File: Parse Metadata
end
Note over Ingestor: Wrap files and metadata as Scicat Dataset
critical
Ingestor ->> Scicat: Ingest File
end
end
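The loop in the diagram above can be sketched as plain Python. This is a simplified illustration, not the ingestor's implementation: `WriteDoneMessage` is a hypothetical, already-deserialized stand-in for a wrdn message, the dataset is a plain dictionary rather than a real SciCat dataset, and `ingest` and `parse_metadata` are callables you would supply yourself.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, Optional


@dataclass
class WriteDoneMessage:
    """Hypothetical, already-deserialized view of a wrdn message."""

    job_id: str
    file_name: str


def ingest_files(
    messages: Iterable[WriteDoneMessage],
    ingest: Callable[[dict], None],
    parse_metadata: Optional[Callable[[Path], dict]] = None,
) -> list[str]:
    """Process write done messages and return the job ids that were ingested."""
    ingested = []
    for message in messages:
        path = Path(message.file_name)
        if not path.is_file():  # "Check file"
            continue
        # "Parse Metadata" is optional, as in the diagram.
        metadata = parse_metadata(path) if parse_metadata else {}
        # "Wrap files and metadata": a plain dict stands in for the dataset.
        dataset = {
            "job_id": message.job_id,
            "files": [str(path)],
            "metadata": metadata,
        }
        ingest(dataset)  # "Ingest File": errors propagate to the caller
        ingested.append(message.job_id)
    return ingested


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        written = Path(folder) / "run_0001.hdf"
        written.touch()
        messages = [
            WriteDoneMessage("job-0001", str(written)),
            WriteDoneMessage("job-0002", str(Path(folder) / "missing.hdf")),
        ]
        print(ingest_files(messages, ingest=print))
```

Passing `ingest` and `parse_metadata` as arguments keeps the loop testable without a running SciCat or Kafka instance.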
Here is the typical file writing sequence, including when the files are created and opened.
sequenceDiagram
loop File Writing
File Writer -->> Kafka Broker: Subscribe (run start)
Kafka Broker ->> File Writer: Run Start
create actor File
File Writer ->> File: Create File and Close
File Writer --> File: Open File as Append Mode
loop File Writing
File Writer -->> Kafka Broker: Subscribe Relevant Topics for the run
Kafka Broker ->> File Writer: Detector Data/Log/etc ...
File Writer ->> File: Write Data in the File.
end
Kafka Broker ->> File Writer: Run Stop
File Writer --> File: Close File.
Note over File Writer: Compose wrdn message including id and file path
File Writer ->> Kafka Broker: Report (writing done - wrdn)
end
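For readers who want to emulate a file writer in tests, the writing sequence can be approximated with the standard library. This is a minimal sketch under stated assumptions: `write_run` is a hypothetical helper, the data chunks are passed in directly instead of arriving from Kafka, and the returned dictionary only stands in for a real wrdn message, which would be a serialized flatbuffer.

```python
import tempfile
from pathlib import Path


def write_run(file_path: Path, job_id: str, chunks: list[bytes]) -> dict:
    """Write one run to file_path and return a stand-in write done report."""
    file_path.touch()  # "Create File and Close"
    with file_path.open("ab") as handle:  # "Open File as Append Mode"
        for chunk in chunks:  # data received between run start and run stop
            handle.write(chunk)
    # Leaving the with-block closes the file ("Close File"); report only then.
    return {"job_id": job_id, "file_name": str(file_path)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        report = write_run(Path(folder) / "run_0001.hdf", "job-0001", [b"abc", b"def"])
        print(report["job_id"], Path(report["file_name"]).stat().st_size)
```

The report is composed only after the file is closed, so a consumer that reacts to it never reads a file that is still being written.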
Used At
scicat-ingestor is mainly maintained by ESS DMSC, but it can be used in any system that has the same infrastructure setup.
European Spallation Source
Quick Start
We do not release scicat-ingestor to any package index.
You can download it directly from our GitHub page.
git clone https://github.com/SciCatProject/scicat-ingestor.git
cd scicat-ingestor
git fetch origin
git checkout v25.01.0 # Latest Version
pip install -e . # It will allow you to use entry-points of the scripts,
# defined in ``pyproject.toml``, under ``[project.scripts]`` section.
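After the editable install, you may want to confirm which entry points were registered. The sketch below uses `importlib.metadata` from the standard library. `console_scripts` is a hypothetical helper, and the distribution name `scicat-ingestor` is an assumption: check the `[project]` section of `pyproject.toml` for the actual name.

```python
from importlib.metadata import PackageNotFoundError, distribution


def console_scripts(distribution_name: str) -> list[str]:
    """Return the console-script names a distribution installs, sorted."""
    try:
        dist = distribution(distribution_name)
    except PackageNotFoundError:
        return []  # not installed in the current environment
    # Entries under [project.scripts] are installed in the "console_scripts" group.
    return sorted(
        entry.name for entry in dist.entry_points if entry.group == "console_scripts"
    )


if __name__ == "__main__":
    # "scicat-ingestor" is assumed to be the installed distribution name.
    print(console_scripts("scicat-ingestor"))
```

An empty list means either that the package is not installed in the active environment or that the distribution name differs from the one assumed here.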
Contribution
Anyone is welcome to contribute to our project.
Please check our developer guide.