scitacean.Dataset
- class scitacean.Dataset(type, access_groups=None, classification=None, comment=None, contact_email=None, creation_location=None, creation_time='now', data_format=None, data_quality_metrics=None, description=None, end_time=None, input_datasets=None, instrument_group=None, instrument_id=None, investigator=None, is_published=None, job_log_data=None, job_parameters=None, keywords=None, license=None, lifecycle=None, name=None, orcid_of_owner=None, owner=None, owner_email=None, owner_group=None, principal_investigator=None, proposal_id=None, relationships=None, run_number=None, sample_id=None, shared_with=None, source_folder=None, source_folder_host=None, start_time=None, techniques=None, used_software=None, validation_status=None, meta=None, checksum_algorithm='blake2b')
Metadata and linked data files for a measurement, simulation, or analysis.
Constructors
__init__(type[, access_groups, ...])
from_download_models(dataset_model, ...[, ...]): Construct a new dataset from SciCat download models.
Methods
add_attachment([thumbnail, owner_group, ...]): Create a new attachment and add it to the dataset.
add_files(*files[, datablock]): Add files to the dataset.
add_local_files(*paths[, datablock]): Add files on the local file system to the dataset.
add_orig_datablock(*, checksum_algorithm): Append a new orig datablock to the list of orig datablocks.
as_new(): Return a new dataset with lifecycle-related fields erased.
derive(*[, keep]): Return a new dataset that is derived from self.
fields([dataset_type, read_only]): Iterate over dataset fields.
items(): Dict-like items (name and value pairs of fields) method.
keys(): Dict-like keys (names of fields) method.
make_attachment_upload_models(): Build models for all registered attachments.
make_datablock_upload_models(): Build models for all contained (orig) datablocks.
make_upload_model(): Construct a SciCat upload model from self.
replace(*[, _read_only, _orig_datablocks]): Return a new dataset with replaced fields.
replace_files(*files): Return a new dataset with replaced files.
validate(): Validate the fields of the dataset.
values(): Dict-like values (values of fields) method.
Attributes
access_groups: List of groups which have access to this item.
api_version: Version of the API used in creation of the dataset.
attachments: List of attachments for this dataset.
classification: ACIA information about AUthenticity, COnfidentiality, INtegrity and AVailability requirements of dataset.
comment: Comment the user has about a given dataset.
contact_email: Email of the contact person for this dataset.
created_at: …26:57.313Z)
created_by: Indicate the user who created this record.
creation_location: Unique location identifier where data was taken, usually in the form /Site-name/facility-name/instrumentOrBeamline-name.
creation_time: …//www.rfc-editor.org/rfc/rfc3339#section-5). Local times without timezone/offset info are automatically transformed to UTC using the timezone of the API server.
data_format: Defines the format of the data files in this dataset, e.g. Nexus Version x.y.
data_quality_metrics: Data Quality Metrics is a number given by the user to rate the dataset.
description: Free text explanation of contents of dataset.
end_time: …//www.rfc-editor.org/rfc/rfc3339#section-5). Local times without timezone/offset info are automatically transformed to UTC using the timezone of the API server.
files: Files linked with the dataset.
input_datasets: Array of input dataset identifiers used in producing the derived dataset.
instrument_group: Group of the instrument which this item was acquired on.
instrument_id: ID of the instrument where the data was created.
investigator: First name and last name of the person or people pursuing the data analysis.
is_published: Flag is true when data are made publicly available.
job_log_data: The output job logfile.
job_parameters: The creation process of the derived data will usually depend on input job parameters.
keywords: Array of tags associated with the meaning or contents of this dataset.
license: Name of the license under which the data can be used.
lifecycle: Describes the current status of the dataset during its lifetime with respect to the storage handling systems.
meta: Dict of scientific metadata.
name: A name for the dataset, given by the creator to carry some semantic meaning.
number_of_files: Number of files in directly accessible storage in the dataset.
number_of_files_archived: Total number of archived files in the dataset.
orcid_of_owner: ORCID of the owner or custodian.
owner: Owner or custodian of the dataset, usually first name + last name.
owner_email: Email of the owner or custodian of the dataset.
owner_group: Name of the group owning this item.
packed_size: Total size of all datablock package files created for this dataset.
pid: Persistent identifier of the dataset.
principal_investigator: First name and last name of principal investigator(s).
proposal_id: The ID of the proposal to which the dataset belongs.
relationships: Stores the relationships with other datasets.
run_number: Run number assigned by the system to the data acquisition for the current dataset.
sample_id: ID of the sample used when collecting the data.
shared_with: List of users that the dataset has been shared with.
size: Total size of files in directly accessible storage in the dataset.
source_folder: Absolute file path on file server containing the files of this dataset, e.g. /some/path/to/sourcefolder.
source_folder_host: …//]fileserver1.example.com
start_time: …//www.rfc-editor.org/rfc/rfc3339#section-5). Local times without timezone/offset info are automatically transformed to UTC using the timezone of the API server.
techniques: Stores the metadata information for techniques.
type: Characterizes the type of dataset, either 'raw' or 'derived'.
updated_at: …26:57.313Z)
updated_by: Indicate the user who updated this record last.
used_software: A list of links to software repositories which uniquely identifies the pieces of software, including versions, used for yielding the derived data.
validation_status: Defines a level of trust, e.g. a measure of how much data was verified or used by other persons.
- __init__(type, access_groups=None, classification=None, comment=None, contact_email=None, creation_location=None, creation_time='now', data_format=None, data_quality_metrics=None, description=None, end_time=None, input_datasets=None, instrument_group=None, instrument_id=None, investigator=None, is_published=None, job_log_data=None, job_parameters=None, keywords=None, license=None, lifecycle=None, name=None, orcid_of_owner=None, owner=None, owner_email=None, owner_group=None, principal_investigator=None, proposal_id=None, relationships=None, run_number=None, sample_id=None, shared_with=None, source_folder=None, source_folder_host=None, start_time=None, techniques=None, used_software=None, validation_status=None, meta=None, checksum_algorithm='blake2b')
- add_attachment(thumbnail=None, *, caption, owner_group=None, access_groups=None, instrument_group=None, proposal_id=None, sample_id=None)
Create a new attachment and add it to the dataset.
- Parameters:
thumbnail (Union[str, PathLike[str], Thumbnail, None], default: None) – If a scitacean.thumbnail.Thumbnail object, it is added to the attachment. If a string or path, a thumbnail is loaded from that path.
caption (str) – Caption of the attachment.
owner_group (Optional[str], default: None) – Owner group of the attachment. Defaults to self.owner_group.
access_groups (Optional[list[str]], default: None) – Access groups of the attachment. Defaults to self.access_groups.
instrument_group (Optional[str], default: None) – Instrument group of the attachment. Defaults to self.instrument_group.
proposal_id (Optional[str], default: None) – Proposal ID of the attachment. Defaults to self.proposal_id.
sample_id (Optional[str], default: None) – Sample ID of the attachment. Defaults to self.sample_id.
- Return type:
- add_files(*files, datablock=None)
Add files to the dataset.
- Parameters:
files (File) – File object to add.
datablock (Union[int, str, PID, None], default: None) – Advanced feature, do not set unless you know what this is! Select the orig datablock to store the file in.
None: Use the last datablock in the list if possible or add a new one if needed.
If an int, use the datablock with that index.
If a str or PID, use the datablock with that id; if there is none with matching id, raise KeyError.
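The selection rules for the datablock argument can be sketched in plain Python. This is only an illustration of the three cases listed above, not Scitacean's implementation: Block and select_block are hypothetical names, and "if possible" is modelled simply as "the list is not empty".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Block:
    """Hypothetical stand-in for an orig datablock (not a Scitacean class)."""

    id: Optional[str] = None
    files: List[str] = field(default_factory=list)


def select_block(blocks: List[Block], datablock: Union[int, str, None] = None) -> Block:
    """Pick the block that a new file would be stored in."""
    if datablock is None:
        # Assumption: reuse the last block, or create one if the list is empty.
        if not blocks:
            blocks.append(Block())
        return blocks[-1]
    if isinstance(datablock, int):
        # An int is treated as a list index.
        return blocks[datablock]
    # Anything else is treated as an id; an unknown id raises KeyError.
    for block in blocks:
        if block.id == datablock:
            return block
    raise KeyError(datablock)
```

With blocks = [Block(id="a"), Block(id="b")], the call select_block(blocks) returns the block with id "b", select_block(blocks, 0) returns the one with id "a", and select_block(blocks, "missing") raises KeyError.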
- Return type:
- add_local_files(*paths, datablock=None)
Add files on the local file system to the dataset.
The files are set up to be uploaded to the dataset’s source folder without preserving the local directory structure. That is, given
    dataset.source_folder = "remote/source"
    dataset.add_local_files("/path/to/file1", "other_path/file2")
and uploading this dataset to SciCat, the files will be uploaded to:
    remote/source/file1
    remote/source/file2
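The path mapping described above can be reproduced with the standard library alone. The following is a minimal sketch under the assumption that only the final path component of each local file is kept; remote_paths is a hypothetical helper, not part of Scitacean.

```python
from pathlib import PurePosixPath


def remote_paths(source_folder: str, *local_paths: str) -> list:
    """Map local paths to upload targets, keeping only each file name."""
    base = PurePosixPath(source_folder)
    return [str(base / PurePosixPath(path).name) for path in local_paths]


print(remote_paths("remote/source", "/path/to/file1", "other_path/file2"))
# prints ['remote/source/file1', 'remote/source/file2']
```

Under this model, two local files with the same name in different directories would map to the same remote path.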
- Parameters:
datablock (Union[int, str, PID, None], default: None) – Advanced feature, do not set unless you know what this is! Select the orig datablock to store the file in.
None: Use the last datablock in the list if possible or add a new one if needed.
If an int, use the datablock with that index.
If a str or PID, use the datablock with that id; if there is none with matching id, raise KeyError.
- Return type:
- add_orig_datablock(*, checksum_algorithm)
Append a new orig datablock to the list of orig datablocks.
- Parameters:
checksum_algorithm (str | None) – Use this algorithm to compute checksums of files associated with this datablock.
- Returns:
OrigDatablock – The newly added datablock.
- as_new()
Return a new dataset with lifecycle-related fields erased.
The returned dataset has the same fields as self, but fields that indicate when the dataset was created or by whom are set to None. These are, for example, created_at, history, and lifecycle.
- Returns:
Dataset – A new dataset without lifecycle-related fields.
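The effect of erasing lifecycle-related fields can be illustrated on a plain dict. This is a sketch only: as_new_fields is a hypothetical helper, and LIFECYCLE_FIELDS contains just the three names given as examples above, which may not be the complete set that Scitacean erases.

```python
# Assumption: only the example names from the text; the real set may be larger.
LIFECYCLE_FIELDS = ("created_at", "history", "lifecycle")


def as_new_fields(fields: dict) -> dict:
    """Return a copy of `fields` with lifecycle-related entries set to None."""
    return {
        name: None if name in LIFECYCLE_FIELDS else value
        for name, value in fields.items()
    }
```

The input dict is left unchanged, which mirrors the description of as_new returning a new dataset.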
- property attachments: list[Attachment] | None
List of attachments for this dataset.
This property can be in two distinct ‘falsy’ states:
dset.attachments is None: It is unknown whether there are attachments. This happens when datasets are downloaded without downloading the attachments.
dset.attachments == []: It is known that there are no attachments. This happens either when downloading datasets or when initializing datasets locally without assigning attachments.
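Because both states are falsy, a plain truthiness check such as "if not dset.attachments" cannot tell them apart. A minimal sketch of an explicit check, using a hypothetical helper attachment_state that takes the property value directly:

```python
from typing import Optional


def attachment_state(attachments: Optional[list]) -> str:
    """Classify the value of an attachments-like property."""
    if attachments is None:
        return "unknown"  # attachments were not downloaded
    if not attachments:
        return "none"  # known to be empty
    return "present"
```

Testing "is None" first is the important part; the order of the remaining checks does not matter.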
- derive(*, keep=('contact_email', 'investigator', 'orcid_of_owner', 'owner', 'owner_email', 'techniques'))
Return a new dataset that is derived from self.
The returned dataset has most fields set to None. But a number of fields can be carried over from self. By default, this assumes that the owner of the derived dataset is the same as the owner of the original. This can be customized with the keep argument.
- Parameters:
keep (Iterable[str], default: ('contact_email', 'investigator', 'orcid_of_owner', 'owner', 'owner_email', 'techniques')) – Fields to copy over to the derived dataset.
- Returns:
Dataset – A new derived dataset.
- Raises:
ValueError – If self has no PID. The derived dataset requires a PID in order to link back to self.
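The role of keep can be sketched with dicts standing in for datasets. The helper derive_fields is hypothetical, and storing the parent's PID in input_datasets is an assumption about how the link back to self is recorded; the text above only states that a PID is required.

```python
DEFAULT_KEEP = (
    "contact_email",
    "investigator",
    "orcid_of_owner",
    "owner",
    "owner_email",
    "techniques",
)


def derive_fields(parent: dict, keep=DEFAULT_KEEP) -> dict:
    """Build the fields of a derived dataset from a parent's fields."""
    if parent.get("pid") is None:
        raise ValueError("Cannot derive from a dataset without a PID")
    # Only the fields named in `keep` are carried over.
    derived = {name: parent.get(name) for name in keep}
    derived["type"] = "derived"
    # Assumption: the link to the parent is stored as an input dataset.
    derived["input_datasets"] = [parent["pid"]]
    return derived
```

Passing keep=("owner",) would carry over only the owner, and fields such as name or description are never copied unless they are listed in keep.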
- classmethod fields(dataset_type=None, read_only=None)
Iterate over dataset fields.
This is similar to dataclasses.fields().
- Parameters:
dataset_type (Union[DatasetType, Literal['raw', 'derived'], None], default: None) – If set, return only the fields for this dataset type. If unset, do not filter fields.
read_only (Optional[bool], default: None) – If true or false, return only fields which are read-only or allow write-access, respectively. If unset, do not filter fields.
- Returns:
Generator[Field, None, None] – Iterable over the fields of datasets.
- classmethod from_download_models(dataset_model, orig_datablock_models, attachment_models=None)
Construct a new dataset from SciCat download models.
- Parameters:
dataset_model (DownloadDataset) – Model of the dataset.
orig_datablock_models (list[DownloadOrigDatablock]) – List of all associated original datablock models for the dataset.
attachment_models (Optional[Iterable[DownloadAttachment]], default: None) – List of all associated attachment models for the dataset. Use None if the attachments were not downloaded. Use an empty list if the attachments were downloaded, but there aren’t any.
- Returns:
Dataset – A new Dataset instance.
- items()
Dict-like items (name and value pairs of fields) method.
- Returns:
Iterable[tuple[str, Any]] – Generator of (Name, Value) pairs of all fields corresponding to self.type and other fields that are not None.
Added in version 23.10.0.
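The filter rule in the return description (every field of the dataset's own type, plus any other field that is not None) can be sketched as a generator over a plain dict. Both visible_items and the fields_for_type argument are hypothetical and only model the rule; they are not Scitacean API.

```python
from typing import Any, Iterator, Set, Tuple


def visible_items(values: dict, fields_for_type: Set[str]) -> Iterator[Tuple[str, Any]]:
    """Yield (name, value) pairs following the rule described for items()."""
    for name, value in values.items():
        # Fields of the dataset's type are always included, even if None.
        # Fields of other types are included only when they hold a value.
        if name in fields_for_type or value is not None:
            yield name, value
```

The same predicate applied to names alone or values alone would model keys() and values().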
- keys()
Dict-like keys (names of fields) method.
- Returns:
Iterable[str] – Generator of names of all fields corresponding to self.type and other fields that are not None.
Added in version 23.10.0.
- make_attachment_upload_models()
Build models for all registered attachments.
- Raises:
ValueError – If self.attachments is None, i.e., the attachments are uninitialized.
- Returns:
list[UploadAttachment] – List of attachment models.
- make_datablock_upload_models()
Build models for all contained (orig) datablocks.
- Returns:
DatablockUploadModels – Structure with datablock and orig datablock models.
- property number_of_files: int
Number of files in directly accessible storage in the dataset.
This includes files on both the local and remote filesystems.
Corresponds to OrigDatablocks.
- property number_of_files_archived: int
Total number of archived files in the dataset.
Corresponds to Datablocks.
- replace(*, _read_only=None, _orig_datablocks=None, **replacements)
Return a new dataset with replaced fields.
Parameters starting with an underscore are for internal use. Using them may result in a broken dataset.
- replace_files(*files)
Return a new dataset with replaced files.
For each argument, if the input dataset has a file with the same remote path, that file is replaced. Otherwise, a new file is added. Other existing files are kept in the returned dataset.
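The replace-or-add rule can be sketched by keying files on their remote path. FileEntry and replace_by_remote_path are hypothetical stand-ins for illustration; keeping a replaced file at its original position is a property of this sketch, not a documented guarantee of replace_files.

```python
from typing import List, NamedTuple


class FileEntry(NamedTuple):
    """Hypothetical stand-in for a file linked with a dataset."""

    remote_path: str
    size: int


def replace_by_remote_path(existing: List[FileEntry], *files: FileEntry) -> List[FileEntry]:
    """Return a new list in which files with matching remote paths are replaced."""
    by_path = {entry.remote_path: entry for entry in existing}
    for entry in files:
        # Same remote path: overwrite. New remote path: append.
        by_path[entry.remote_path] = entry
    return list(by_path.values())
```

The input list is not modified, in line with replace_files returning a new dataset.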
- property size: int
Total size of files in directly accessible storage in the dataset.
This includes files on both the local and remote filesystems.
Corresponds to OrigDatablocks.