Overview

As mentioned in ADR-001, scicat-ingestor has two main components: online-ingestor and offline-ingestor.

online-ingestor is a daemon program that runs offline-ingestor in real time. In other words, online-ingestor is a higher-level program than offline-ingestor.

This page has several diagrams that show the ingestion flow and sequence. The diagrams also show the relationship between the online and offline ingestors.

See the online-ingestor page and the offline-ingestor page for more details.

Infrastructure around Scicat Ingestor

scicat-ingestor is written for a specific infrastructure setup, shown below:

---
title: Infrastructure around Scicat Ingestor
---
graph LR
    filewriter@{ shape: processes, label: "File Writers" } -.write file.-> storage[(Storage)]
    filewriter --report (wrdn)--> kafkabroker[Kafka Broker]
    ingestor[Scicat Ingestor] -.subscribe (wrdn).-> kafkabroker
    storage -.read file.-> ingestor
    ingestor --report--> log[Graylog]

Ingestor Flow Chart - Bird's-Eye View

[Image: bird's-eye view of the ingestor flow]

Ingestor Flow Chart - Detail

---
title: Ingestor Flow Chart - Detail
---
flowchart LR

    subgraph online [Online Ingestor]
        direction TB
        connect-to-kafka[Connect to Kafka Cluster] --> subscription[Subscribe to Instrument Topics]
        subscription --> wait[Wait for Next Messages]
        wait --> done{{Done Writing Message?}}
        done --> |No| wait
        done --> |Yes| max-process{{Maximum Offline Ingestor Running?}}
        max-process --> |Yes| wait-running@{ shape: delay , label: "Wait for previous ingestors"}
        wait-running --> max-process
        max-process --> |No| start@{shape: circle, label: "Start Offline Ingestor"}
        start --> wait
    end

    subgraph offline [Offline Ingestor]
        direction TB
        start-offline@{shape: circle, label: "Start Offline Ingestor"}
        start-offline --> load-schema[Load Schema]
        load-schema --> select[Select Schema]
        select --> open[Open Nexus File, Event Data]
        open --> variable[Define Variables]
        variable --> populate[Populate Local Dataset]
        populate --> create[Create Dataset on Scicat]
        create --> create-origdataset[Create OrigDataset on Scicat]
        create-origdataset --> stop@{shape: dbl-circ, label: "Finish Offline Ingestor"}

    end

    online --> offline

    style start fill:green,stroke-width:4px,opacity:0.5;
    style start-offline fill:green,stroke-width:4px,opacity:0.5;
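
The online half of the flow chart above can be sketched in Python as a small dispatch loop. This is only an illustration of the throttling logic, not the actual scicat-ingestor implementation: the function name `dispatch_offline_ingestors`, the message keys `writing_done` and `file_path`, and the `start_offline` callback are all hypothetical, and Kafka is replaced by a plain iterable of already-decoded messages.

```python
from __future__ import annotations

import time
from collections.abc import Callable, Iterable
from typing import Protocol


class Process(Protocol):
    """Anything with a subprocess.Popen-style poll() method."""

    def poll(self) -> int | None: ...


def dispatch_offline_ingestors(
    messages: Iterable[dict],
    start_offline: Callable[[str], Process],
    max_running: int = 2,
    wait_seconds: float = 0.1,
) -> int:
    """Start one offline ingestor per 'writing done' message.

    All names here are hypothetical. Returns the number of
    offline ingestors that were started.
    """
    running: list[Process] = []
    started = 0
    for message in messages:  # "Wait for Next Messages"
        if not message.get("writing_done"):  # "Done Writing Message?" -> No
            continue
        # Too many offline ingestors running: wait for previous ones.
        while True:
            # Keep only processes that have not exited yet.
            running = [p for p in running if p.poll() is None]
            if len(running) < max_running:
                break
            time.sleep(wait_seconds)
        # "Start Offline Ingestor", then go back to waiting for messages.
        running.append(start_offline(message["file_path"]))
        started += 1
    return started
```

In a real deployment, `start_offline` could wrap `subprocess.Popen` with whatever command launches the offline ingestor, since `Popen.poll()` returns `None` while the child process is still running.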

Ingestor Sequence Chart

---
title: File Ingesting Sequence
---

sequenceDiagram
  create participant File Writer
  create actor File
  File Writer --> File: File Written
  loop Ingest Files
    Ingestor -->> Kafka Broker: Subscribe (listening to writing done - wrdn)
    Kafka Broker ->> Ingestor: Writing Done Message (wrdn)
    Note over Ingestor: Parse writing done message
    Ingestor ->> File: Check file
    opt
      Ingestor ->> File: Parse Metadata
    end
    Note over Ingestor: Wrap files and metadata as Scicat Dataset
    critical
      Ingestor ->> Scicat: Ingest File
    end
  end
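
The per-file part of the sequence above (check the file, optionally parse metadata, wrap everything as a dataset, ingest) could look roughly like the following Python sketch. It is a sketch under stated assumptions rather than the project's code: `LocalDataset`, `ingest_file`, and the `ingest` and `parse_metadata` callbacks are hypothetical names, and the call to SciCat is left to the caller instead of guessing at a client API.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class LocalDataset:
    """Hypothetical stand-in for the dataset that is sent to SciCat."""

    files: list[Path]
    metadata: dict[str, str] = field(default_factory=dict)


def ingest_file(
    file_path: Path,
    ingest: Callable[[LocalDataset], None],
    parse_metadata: Callable[[Path], dict[str, str]] | None = None,
) -> LocalDataset:
    """Run the per-file steps of the sequence for one reported file."""
    # "Check file": stop early if the reported file does not exist.
    if not file_path.is_file():
        raise FileNotFoundError(file_path)
    # "opt ... Parse Metadata": only done when a parser is supplied.
    metadata = parse_metadata(file_path) if parse_metadata else {}
    # "Wrap files and metadata as Scicat Dataset".
    dataset = LocalDataset(files=[file_path], metadata=metadata)
    # "critical ... Ingest File": any error raised here propagates.
    ingest(dataset)
    return dataset
```

Passing `ingest` in as a callback keeps the sketch independent of any particular SciCat client and makes the step easy to exercise in a test with a plain list's `append` method.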