# Downloading datasets

All communication with SciCat is handled by a client object.
Normally, you would construct one like this:

```python
from scitacean import Client
from scitacean.transfer.sftp import SFTPFileTransfer
client = Client.from_token(url="https://scicat.ess.eu/api/v3",
                           token=...,
                           file_transfer=SFTPFileTransfer(
                               host="login.esss.dk"
                           ))
```

In this example, we use ESS's SciCat.
To use a different instance, you need to find out the URL of its API.
Note that this is *not* the URL that you open in a browser; the API URL typically ends in a suffix like `/api/v3`.

Here, we authenticate using a token.
You can find your token in the web interface by logging in and opening the settings.
Alternatively, we could use username and password via [Client.from_credentials](../generated/classes/scitacean.Client.rst#scitacean.Client.from_credentials).

<div class="alert alert-warning">
    <b>WARNING:</b>

Do **not** hard code secrets like tokens or passwords in notebooks or scripts!
There is a high risk of exposing them when code is under version control or uploaded to SciCat.

Scitacean currently requires secrets to be passed as function arguments.
So you will have to find your own solution for now.

</div>
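
One common way to keep a token out of the source code is to read it from an environment variable.
The following is a minimal sketch, not a Scitacean feature; the variable name `SCICAT_TOKEN` and the helper `read_token` are assumptions made for this illustration.

```python
import os


def read_token(env_var: str = "SCICAT_TOKEN") -> str:
    """Return the token stored in the environment variable ``env_var``.

    ``SCICAT_TOKEN`` is a hypothetical name chosen for this example.
    """
    token = os.environ.get(env_var)
    if not token:
        # Fail early with a clear message instead of passing an empty token on.
        raise RuntimeError(
            f"Environment variable {env_var!r} is not set; "
            "export it in your shell before running this script."
        )
    return token


# The result could then be passed on as the ``token`` argument, e.g.
# client = Client.from_token(url=..., token=read_token(), file_transfer=...)
```

The token still ends up as a function argument, as Scitacean requires, but it never appears in the notebook or script itself.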

While the client itself is responsible for talking to SciCat, a `file_transfer` object is required to download data files.
Here, we use `SFTPFileTransfer`, which downloads and uploads files via SFTP.

The file transfer needs to authenticate separately from the SciCat connection.
By default, it requires an SSH agent to be running and set up for the selected `host`.
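
Before constructing the file transfer, it can help to check that an SSH agent is reachable.
The sketch below assumes a Unix-like system and relies on the OpenSSH convention that a running agent advertises its socket through the `SSH_AUTH_SOCK` environment variable.
The helper name `ssh_agent_available` is an assumption of this example, and the check does not verify that a key for the selected `host` has been added to the agent.

```python
import os
from typing import Mapping, Optional


def ssh_agent_available(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if SSH_AUTH_SOCK is set and points to an existing path.

    This is a hypothetical helper for illustration, not part of Scitacean.
    """
    env = os.environ if environ is None else environ
    sock = env.get("SSH_AUTH_SOCK", "")
    # An unset or empty variable means no agent was advertised to this process.
    return bool(sock) and os.path.exists(sock)


if not ssh_agent_available():
    print("No SSH agent found; start one and add your key before downloading.")
```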

For the purposes of this guide, we don't want to connect to a real SciCat server in order to avoid the complications associated with that.
So we set up a fake client that only pretends to connect to SciCat and file servers.
Everything else in this guide works in the same way with a real client.
See [Developer Documentation/Testing](../developer/testing.rst) if you are interested in the details.

In [None]:
from scitacean.testing.docs import setup_fake_client
client = setup_fake_client()

## Metadata

We need the ID (`pid`) of a dataset in order to download it.
The fake client provides a dataset with id `20.500.12269/72fe3ff6-105b-4c7f-b9d0-073b67c90ec3`.
We can download it using

In [None]:
dset = client.get_dataset("20.500.12269/72fe3ff6-105b-4c7f-b9d0-073b67c90ec3")

Datasets can easily be inspected in Jupyter notebooks:

In [None]:
dset

All attributes listed above can be accessed directly:

In [None]:
dset.type

In [None]:
dset.name

In [None]:
dset.owner

See [Dataset](../generated/classes/scitacean.Dataset.rst) for a list of available fields.

In addition, datasets can have free-form scientific metadata, which can be accessed using

In [None]:
dset.meta

## Files

The data files associated with this dataset can be accessed using

In [None]:
for f in dset.files:
    print(f"{f.remote_access_path(dset.source_folder) = }")
    print(f"{f.local_path = }")
    print(f"{f.size = } bytes")
    print("----")

Note that the `local_path` for both files is `None`.
This indicates that the files have not been downloaded.
Indeed, `client.get_dataset` downloads only the metadata from SciCat, not the files.
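
Since a `local_path` of `None` marks a file that has not been downloaded, that attribute can be used to find out which files are still missing locally.
The sketch below uses a stand-in class, `RemoteFile`, instead of Scitacean's own file objects so that it runs without a client.
Only the `local_path` and `size` attributes mirror what is shown above; the class, the helper `not_downloaded`, the file name `other.dat`, and the sizes are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Optional


@dataclass
class RemoteFile:
    """Stand-in for a dataset file; not Scitacean's actual class."""

    name: str
    size: int
    local_path: Optional[Path] = None


def not_downloaded(files: Iterable[RemoteFile]) -> List[RemoteFile]:
    """Return the files whose ``local_path`` is still ``None``."""
    return [f for f in files if f.local_path is None]


# Example data: one file already on disk, one only known from metadata.
files = [
    RemoteFile("flux.dat", size=20, local_path=Path("download/flux.dat")),
    RemoteFile("other.dat", size=57),
]
pending = not_downloaded(files)
print([f.name for f in pending])
```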

We can download the first file using

In [None]:
dset_with_local_file = client.download_files(dset, target="download", select="flux.dat")

In [None]:
for f in dset_with_local_file.files:
    print(f"{f.remote_access_path(dset.source_folder) = }")
    print(f"{f.local_path = }")
    print(f"{f.size = } bytes")
    print("----")

This populates the `local_path` of the downloaded file:

In [None]:
file = list(dset_with_local_file.files)[0]

In [None]:
file.local_path

We can use it to read the file:

In [None]:
with file.local_path.open("r") as f:
    print(f.read())
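
After a download, one simple sanity check is to compare the size of the local file with the size recorded in the metadata.
This is a sketch with a hypothetical helper, `size_matches`; it takes a plain path and an expected byte count, which with the objects above would correspond to `file.local_path` and `file.size`.
It only compares byte counts and says nothing about the file's content.

```python
import tempfile
from pathlib import Path


def size_matches(path: Path, expected_size: int) -> bool:
    """Return True if the file at ``path`` has exactly ``expected_size`` bytes."""
    return path.stat().st_size == expected_size


# Demonstrate the check on a temporary file with known content.
with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / "example.dat"
    example.write_bytes(b"0123456789")  # 10 bytes
    print(size_matches(example, 10))
    print(size_matches(example, 11))
```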

If we wanted to download all files, we could pass `select=True` to `client.download_files`, or omit the argument, since `True` is the default.
See [Client.download_files](../generated/classes/scitacean.Client.rst#scitacean.Client.download_files) for more options to select files.

In [None]:
# This cell is hidden.
# It should remove *only* files and directories created by this notebook.
import shutil
shutil.rmtree("download", ignore_errors=True)