scitacean.transfer.sftp.SFTPFileTransfer#
- class scitacean.transfer.sftp.SFTPFileTransfer(*, host, port=22, username=None, password=None, key_filename=None, source_folder=None, connect=None)[source]#
Upload / download files using SFTP.
Configuration & Authentication#
The file transfer connects to the server at the address given as the host constructor argument. This may be a full URL such as "some.fileserver.edu", or an IP address like "127.0.0.1".
The file transfer relies on paramiko.client.SSHClient for authentication, and arguments are passed along to the constructor of SSHClient. See its documentation for details. SFTPFileTransfer can use an SSH agent if one is configured, or use an explicitly provided username and password or a key file. If none of these options work, you can define a custom connect function which creates a paramiko.sftp_client.SFTPClient. See the examples below.
Upload folder#
The file transfer can take an optional source_folder as a constructor argument. If it is given, SFTPFileTransfer uploads all files to it and ignores the source folder set in the dataset. If it is not given, SFTPFileTransfer uses the dataset’s source folder.
The source folder argument to SFTPFileTransfer may be a Python format string. In that case, all format fields are replaced by the corresponding fields of the dataset. All non-ASCII characters and most special ASCII characters are replaced. This should avoid broken paths from essentially random contents in datasets.
Examples
Given

    dset = Dataset(
        type="raw",
        name="my-dataset",
        source_folder="/dataset/source",
    )

This uploads to /dataset/source:

    file_transfer = SFTPFileTransfer(host="fileserver")

This uploads to /transfer/folder:

    file_transfer = SFTPFileTransfer(host="fileserver", source_folder="transfer/folder")

This uploads to /transfer/my-dataset (note that {name} is replaced by dset.name):

    file_transfer = SFTPFileTransfer(host="fileserver", source_folder="transfer/{name}")
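The format-string behavior described above can be pictured with a small, standalone sketch. It is not Scitacean's implementation: the helper names sanitize and expand_source_folder are hypothetical, and both the set of allowed characters and the "_" replacement are assumptions chosen only for illustration.

```python
import re


def sanitize(value: str) -> str:
    """Replace every character outside a conservative ASCII whitelist.

    Assumption: the whitelist and the "_" replacement are illustrative;
    the rules Scitacean actually applies may differ.
    """
    return re.sub(r"[^A-Za-z0-9._-]", "_", value)


def expand_source_folder(template: str, fields: dict[str, object]) -> str:
    """Fill the format fields of ``template`` with sanitized field values.

    Only the substituted values are sanitized, so path separators that are
    part of the template itself are kept.
    """
    safe = {key: sanitize(str(value)) for key, value in fields.items()}
    return template.format(**safe)


print(expand_source_folder("transfer/{name}", {"name": "my-dataset"}))
# transfer/my-dataset
print(expand_source_folder("transfer/{name}", {"name": "süß/data set"}))
# transfer/s___data_set
```

Sanitizing the values before substitution, rather than the finished path, keeps a stray "/" in a dataset field from creating an unintended subfolder.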
A useful approach is to include a unique ID in the source folder, for example, "/some/base/folder/{uid}", to avoid clashes between different datasets. Scitacean will fill in the "{uid}" placeholder with a new UUID4.
The connection and authentication method can be customized using the connect argument. For example, to use a specific username and SSH key file, use the following:

    def connect(host, port):
        from paramiko import SSHClient

        client = SSHClient()
        client.load_system_host_keys()
        client.connect(
            hostname=host,
            port=port,
            username="<username>",
            key_filename="<key-file-name>",
        )
        return client.open_sftp()

    file_transfer = SFTPFileTransfer(host="fileserver", connect=connect)

The paramiko.client.SSHClient can be configured as needed in this function.
Constructors
__init__(*, host[, port, username, ...]) – Construct a new SFTP file transfer.
Methods
connect_for_download(dataset, ...) – Create a connection for downloads, use as a context manager.
connect_for_upload(dataset, ...) – Create a connection for uploads, use as a context manager.
source_folder_for(dataset) – Return the source folder used for the given dataset.
- __init__(*, host, port=22, username=None, password=None, key_filename=None, source_folder=None, connect=None)[source]#
Construct a new SFTP file transfer.
- Parameters:
host (str) – URL or name of the server to connect to.
port (int, default: 22) – Port of the server.
username (str | None, default: None) – Username for the server.
password (str | StrStorage | None, default: None) – Password for the user, or the passphrase for the private key if key_filename is provided.
key_filename (str | None, default: None) – Path to a private key file for authentication.
source_folder (str | RemotePath | None, default: None) – Upload files to this folder if set. Otherwise, upload to the dataset’s source_folder. Ignored when downloading files.
connect (Callable[[str, int | None], SFTPClient] | None, default: None) – If this argument is set, it is called to create a client for the server instead of the built-in method. The function arguments are host and port as determined by the arguments to __init__ shown above.
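Since connect must be a callable taking only host and port, credentials have to reach it some other way. One option is a small factory that closes over them, sketched below. The names make_connect and client_factory are hypothetical; client_factory exists only so the function can be exercised without a server, and paramiko.SSHClient is used by default. The type annotation of connect allows port to be None, so the sketch falls back to 22 in that case, which is an assumption based on the default port of __init__.

```python
def make_connect(username, key_filename, client_factory=None):
    """Build a ``connect(host, port)`` callable that uses fixed credentials.

    ``client_factory`` is a hypothetical hook for testing; when it is None,
    a ``paramiko.SSHClient`` is created.
    """

    def connect(host, port):
        if client_factory is None:
            # Imported lazily so that paramiko is only needed when connecting.
            from paramiko import SSHClient

            client = SSHClient()
        else:
            client = client_factory()
        client.load_system_host_keys()
        client.connect(
            hostname=host,
            # Assumption: fall back to the standard SSH port if none is given.
            port=22 if port is None else port,
            username=username,
            key_filename=key_filename,
        )
        return client.open_sftp()

    return connect
```

The result would then be passed as SFTPFileTransfer(host="fileserver", connect=make_connect("<username>", "<key-file-name>")).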
- connect_for_download(dataset, representative_file_path)[source]#
Create a connection for downloads, use as a context manager.
- Parameters:
dataset (Dataset) – The connection will be used to download files of this dataset.
representative_file_path (RemotePath) – A path on the SFTP host to check whether files for this dataset can be read. The transfer assumes that, if it is possible to read from this path, it is possible to read from the paths of all files to be downloaded.
- Returns:
Iterator[SFTPDownloadConnection] – An open SFTPDownloadConnection object.
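How the check on representative_file_path is performed is not stated here. As a rough illustration only, a readability probe against an object that behaves like paramiko.SFTPClient could look like the following; can_read is a hypothetical helper, not part of Scitacean.

```python
def can_read(sftp, representative_file_path: str) -> bool:
    """Return True if the file can be opened for reading on the server.

    Assumption: ``sftp`` behaves like ``paramiko.SFTPClient``, i.e. ``open``
    returns a context manager and raises ``OSError`` when the file is
    missing or not readable.
    """
    try:
        with sftp.open(representative_file_path, "r"):
            return True
    except OSError:
        return False
```

A single probe like this is cheap, but it only reflects the permissions of that one file, which is why the transfer has to assume the remaining files behave the same way.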
- connect_for_upload(dataset, representative_file_path)[source]#
Create a connection for uploads, use as a context manager.
- Parameters:
dataset (Dataset) – The connection will be used to upload files of this dataset. Used to determine the target folder.
representative_file_path (RemotePath) – This is not used by SFTPFileTransfer. The transfer assumes that all paths are writable when connecting. The actual upload fails if the user lacks sufficient permissions.
- Returns:
Iterator[SFTPUploadConnection] – An open SFTPUploadConnection object.
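Both connect_for_download and connect_for_upload are meant to be used in a with statement while being annotated as returning an Iterator. That combination is typical of a generator function wrapped in contextlib.contextmanager. The sketch below shows the pattern under that assumption; DummyConnection and connect_for_upload_sketch are hypothetical stand-ins, not Scitacean classes.

```python
from collections.abc import Iterator
from contextlib import contextmanager


class DummyConnection:
    """Hypothetical stand-in for an SFTP connection object."""

    def __init__(self) -> None:
        self.is_open = True

    def close(self) -> None:
        self.is_open = False


@contextmanager
def connect_for_upload_sketch() -> Iterator[DummyConnection]:
    connection = DummyConnection()
    try:
        # The caller's ``with`` body runs while the generator is paused here.
        yield connection
    finally:
        # Close the connection even if the ``with`` body raises.
        connection.close()


with connect_for_upload_sketch() as con:
    opened_inside = con.is_open  # the connection is open inside the block
closed_after = not con.is_open  # and closed once the block has exited
```

With this pattern the connection is only valid inside the with block, so all transfers for a dataset should happen before the block exits.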