Overview
As mentioned in ADR-001, scicat-ingestor has two main components: online-ingestor and offline-ingestor.
online-ingestor is a daemon that starts offline-ingestor in real time, as files are written.
In other words, online-ingestor is a higher-level program than offline-ingestor.
This page has several diagrams that show how the ingestion flow, or sequence, is carried out.
The diagrams also show the relationship between the online and offline ingestors.
See the online-ingestor and offline-ingestor pages for more details.
Infrastructure around Scicat Ingestor
scicat-ingestor is written for a specific infrastructure setup, shown below:
```mermaid
---
title: Infrastructure around Scicat Ingestor
---
graph LR
filewriter@{ shape: processes, label: "File Writers" } -.write file.-> storage[(Storage)]
filewriter --report (wrdn)--> kafkabroker[Kafka Broker]
ingestor[Scicat Ingestor] -.subscribe (wrdn).-> kafkabroker
storage -.read file.-> ingestor
ingestor --report--> log[Graylog]
```
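The ingestor only acts on writing-done (wrdn) reports from the broker. Assuming, as in the ESS streaming data types, that each Kafka message is a FlatBuffers buffer whose 4-byte file identifier sits at bytes 4 to 8, a consumer can pick out wrdn messages before fully decoding them. This is a minimal sketch; the function names are hypothetical and not scicat-ingestor's API, and decoding the message body itself would be done with a FlatBuffers-generated reader.

```python
from __future__ import annotations

from collections.abc import Iterable

# Schema identifier of "writing done" reports, as labelled in the diagram.
WRDN_SCHEMA_ID = b"wrdn"


def schema_id(message: bytes) -> bytes:
    """Return the FlatBuffers file identifier stored at bytes 4..8 of a buffer."""
    if len(message) < 8:
        raise ValueError("buffer too short to hold a FlatBuffers file identifier")
    return message[4:8]


def is_writing_done(message: bytes) -> bool:
    """True if the buffer carries the wrdn schema identifier."""
    return len(message) >= 8 and schema_id(message) == WRDN_SCHEMA_ID


def writing_done_messages(messages: Iterable[bytes]) -> list[bytes]:
    """Keep only wrdn messages; other schemas on the same topic are ignored."""
    return [m for m in messages if is_writing_done(m)]
```

Checking only the identifier is cheap, so messages of other schemas sharing the topic can be skipped without deserialising them.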
Ingestor Flow Chart - Bird's-Eye View
Ingestor Flow Chart - Detail
```mermaid
---
title: Ingestor Flow Chart - Detail
---
flowchart LR
subgraph online [Online Ingestor]
direction TB
connect-to-kafka[Connect to Kafka Cluster] --> subscription[Subscribe to Instrument Topics]
subscription --> wait[Wait for Next Messages]
wait --> done{{Done Writing Message?}}
done --> |No| wait
done --> |Yes| max-process{{Maximum Offline Ingestor Running?}}
max-process --> |Yes| wait-running@{ shape: delay , label: "Wait for previous ingestors"}
wait-running --> max-process
max-process --> |No| start@{shape: circle, label: "Start Offline Ingestor"}
start --> wait
end
subgraph offline [Offline Ingestor]
direction TB
start-offline@{shape: circle, label: "Start Offline Ingestor"}
start-offline --> load-schema[Load Schema]
load-schema --> select[Select Schema]
select --> open[Open Nexus File, Event Data]
open --> variable[Define Variables]
variable --> populate[Populate Local Dataset]
populate --> create[Create Dataset on Scicat]
create --> create-origdataset[Create OrigDataset on Scicat]
create-origdataset --> stop@{shape: dbl-circ, label: "Finish Offline Ingestor"}
end
online --> offline
style start fill:green,stroke-width:4px,opacity:0.5;
style start-offline fill:green,stroke-width:4px,opacity:0.5;
```
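The "Maximum Offline Ingestor Running?" branch amounts to a cap on concurrently running child processes. Below is a minimal sketch using the standard-library subprocess module, assuming each offline ingestor runs as a separate OS process. The class name, `max_running`, and the command line are hypothetical; a stand-in Python command is used where a real deployment would launch the offline ingestor.

```python
from __future__ import annotations

import subprocess
import sys
import time
from collections.abc import Sequence


class OfflineIngestorPool:
    """Hypothetical helper: cap the number of concurrently running offline ingestors."""

    def __init__(self, max_running: int) -> None:
        if max_running < 1:
            raise ValueError("max_running must be at least 1")
        self.max_running = max_running
        self._running: list[subprocess.Popen] = []

    def running(self) -> int:
        """Count children still alive; poll() returns None while a process runs."""
        self._running = [p for p in self._running if p.poll() is None]
        return len(self._running)

    def start(self, command: Sequence[str], wait_interval: float = 0.1) -> subprocess.Popen:
        """Start one offline ingestor, first waiting while the cap is reached."""
        # "Maximum Offline Ingestor Running?" -> Yes: wait for previous ingestors.
        while self.running() >= self.max_running:
            time.sleep(wait_interval)
        # -> No: start a new offline ingestor; the caller then goes back to waiting.
        proc = subprocess.Popen(list(command))
        self._running.append(proc)
        return proc


if __name__ == "__main__":
    # Stand-in command, not the real offline-ingestor invocation.
    pool = OfflineIngestorPool(max_running=2)
    for _ in range(3):  # the third start blocks until one child exits
        pool.start([sys.executable, "-c", "import time; time.sleep(0.5)"])
    print("running:", pool.running())
```

Blocking inside `start()` mirrors the "Wait for previous ingestors" loop in the chart; a consequence is that no new Kafka messages are consumed while the cap is reached.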
Ingestor Sequence Chart
```mermaid
---
title: File Ingesting Sequence
---
sequenceDiagram
create participant File Writer
create actor File
File Writer --> File: File Written
loop Ingest Files
Ingestor -->> Kafka Broker: Subscribe<br/>(listening to writing done - wrdn)
Kafka Broker ->> Ingestor: Writing Done Message (wrdn)
Note over Ingestor: Parse writing done message
Ingestor ->> File: Check file
opt
Ingestor ->> File: Parse Metadata
end
Note over Ingestor: Wrap files and metadata as<br/>Scicat Dataset
critical
Ingestor ->> Scicat: Ingest File
end
end
```
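The "Check file" and "Wrap files and metadata as Scicat Dataset" steps pair a dataset record with an OrigDatablock that lists the written file. Below is a minimal sketch, assuming the metadata has already been read from the NeXus file (for example with h5py) into a plain dict. The function name is hypothetical, and the field names are modelled on SciCat's raw Dataset and OrigDatablock schemas; they are assumptions to verify against your SciCat instance's API.

```python
from __future__ import annotations

from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def wrap_as_scicat_dataset(
    data_file: Path, metadata: dict[str, Any], owner_group: str
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Build (dataset, origdatablock) payloads for one written file.

    Hypothetical helper; field names are assumptions based on SciCat's models.
    """
    # "Check file": fail early if the file writer's output is missing.
    if not data_file.is_file():
        raise FileNotFoundError(data_file)
    stat = data_file.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat()

    dataset = {
        "type": "raw",
        "sourceFolder": str(data_file.parent),
        "creationTime": modified,
        "ownerGroup": owner_group,
        # Parsed file metadata travels as the dataset's scientific metadata.
        "scientificMetadata": metadata,
    }
    orig_datablock = {
        "size": stat.st_size,
        "ownerGroup": owner_group,
        "dataFileList": [
            {"path": data_file.name, "size": stat.st_size, "time": modified}
        ],
    }
    return dataset, orig_datablock
```

The detail flow chart creates the dataset on SciCat before the OrigDatablock, so presumably the identifier SciCat assigns to the new dataset is attached to the OrigDatablock (for example as a `datasetId` field) before it is sent.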