Complete API reference
dryad2dataverse
Dryad to Dataverse utilities. No modules are loaded by default, so a bare import dryad2dataverse will work but will have no effect.
Modules included:
dryad2dataverse.constants : "Constants" for all modules. URLs, API keys, etc. are all here.
dryad2dataverse.serializer : Download and serialize Dryad JSON to Dataverse JSON.
dryad2dataverse.transfer : Metadata and file transfer utilities.
dryad2dataverse.monitor : Monitoring and database tools for maintaining a pipeline to Dataverse without unnecessary downloading and file duplication.
dryad2dataverse.handlers : Custom log handlers.
dryad2dataverse.exceptions : Custom exceptions.
dryad2dataverse.monitor
Dryad/Dataverse status tracker. Monitor creates a singleton object which writes to a SQLite database. Methods will (generally) take either a dryad2dataverse.serializer.Serializer instance or a dryad2dataverse.transfer.Transfer instance.
The monitor’s primary function is to allow state checking for Dryad studies so that files and studies aren’t downloaded unnecessarily.
Monitor Objects
class Monitor()
The Monitor object is a tracker and database updater, so that Dryad files can be monitored and updated over time. Monitor is a singleton, but is not thread-safe.
__new__
def __new__(cls, dbase=None, *args, **kwargs)
Creates a new singleton instance of Monitor.
Also creates the database if it does not already exist.
Arguments:
dbase : str — Path to sqlite3 database, eg: /path/to/file.sqlite3
__init__
def __init__(dbase=None, *args, **kwargs)
Initializes the Monitor instance if it has not already been instantiated (ie, Monitor is a singleton).
Arguments:
dbase : str — Complete path to the desired location of the tracking database (eg: /tmp/test.db). Defaults to dryad2dataverse.constants.DBASE.
__del__
def __del__()
Commits all database transactions on object deletion and closes database.
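The singleton and close-on-deletion behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not Monitor's actual code: the class name TrackerSingleton and the lastmod table are hypothetical, and the default database is in-memory so the sketch runs anywhere.

```python
import sqlite3


class TrackerSingleton:
    """Hypothetical stand-in for Monitor: one shared instance, one SQLite connection.

    Like Monitor, this is not thread-safe: two threads constructing it at the
    same time could each create an instance.
    """

    _instance = None

    def __new__(cls, dbase=":memory:", *args, **kwargs):
        # Only the first call creates the object and opens (or creates) the
        # database; later calls return the same instance and ignore dbase.
        if cls._instance is None:
            obj = super().__new__(cls)
            obj.conn = sqlite3.connect(dbase)
            obj.conn.execute("CREATE TABLE IF NOT EXISTS lastmod (stamp TEXT)")
            cls._instance = obj
        return cls._instance

    def close(self):
        # Commit pending work and close the connection, once.
        conn = getattr(self, "conn", None)
        if conn is not None:
            conn.commit()
            conn.close()
            self.conn = None

    def __del__(self):
        # Mirrors the documented Monitor.__del__ behaviour. Because the class
        # keeps a reference in _instance, this normally runs only at exit.
        self.close()
```

Calling TrackerSingleton twice with different paths returns the same object, which is why a second dbase argument has no effect once the first instance exists.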
lastmod
@property
def lastmod()
Returns last modification date from monitor.dbase.
status
def status(serial)
Returns a dictionary with keys ‘status’, ‘dvpid’ and ‘notes’, eg:
{'status': 'updated', 'dvpid': 'doi://some/ident'}
status is one of ‘new’, ‘identical’, ‘lastmodsame’ or ‘updated’:
‘new’: a completely new study, with no record in the database.
‘identical’: the metadata from Dryad is identical to the last time the check was run.
‘lastmodsame’: Dryad lastModificationDate == last modification date in the database AND the output JSON is different. This can indicate a Dryad API output change, reindexing or something else. Because lastModificationDate is supposed to indicate a meaningful change, this status exists so that you can decide how to handle the discrepancy.
‘updated’: lastModificationDate has changed.
Note that Dryad constantly changes its API output, so the changes may not actually be meaningful.
dvpid is a Dataverse persistent identifier, or None when status is ‘new’.
notes is the value of the Dryad versionChanges field: one of files_changed or metadata_changed. It is non-null only when status is not ‘new’ or ‘identical’. Note that Dryad has no way to indicate both a file and a metadata change, so this value reflects only the last change in the Dryad state.
Arguments:
serial : dryad2dataverse.serializer.Serializer instance
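The four status values follow a simple decision order. The sketch below expresses that order over two plain dicts; the function name classify_status and the record shape ('lastModificationDate' plus the stored 'json') are assumptions for illustration, not how Monitor stores its data.

```python
def classify_status(previous, current):
    """Hypothetical sketch of the four Monitor.status() outcomes.

    previous: None if the study has never been seen, else a dict with
              'lastModificationDate' and 'json' (the stored Dryad JSON).
    current:  the same two keys for the freshly downloaded Dryad record.
    """
    if previous is None:
        return "new"
    if previous["json"] == current["json"]:
        return "identical"
    if previous["lastModificationDate"] == current["lastModificationDate"]:
        # The JSON changed, but Dryad's own date says nothing meaningful did.
        return "lastmodsame"
    return "updated"
```

Checking for identical JSON before comparing dates means a record whose JSON has not changed at all is always reported as ‘identical’.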
diff_metadata
def diff_metadata(serial)
Analyzes differences in metadata between the current serializer instance and the last updated serializer instance. Returns a list of field changes consisting of:
[{key: (old_value, new_value)}] or None if there are no changes.
For example:
[{'title':
('Cascading effects of algal warming in a freshwater community',
'Cascading effects of algal warming in a freshwater community theatre')}
]
Arguments:
serial : dryad2dataverse.serializer.Serializer instance
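The return shape above can be produced by a plain key-by-key comparison. This is a sketch under the assumption that both metadata records are flat dicts; diff_fields is a hypothetical name, and nested values are compared as whole objects rather than recursively.

```python
def diff_fields(old, new):
    """Hypothetical sketch: compare two flat metadata dicts.

    Returns [{key: (old_value, new_value)}, ...] or None if nothing changed,
    matching the shape documented for Monitor.diff_metadata().
    """
    changes = []
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes.append({key: (before, after)})
    return changes or None
```

A key present on only one side shows up with None as its missing value.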
diff_files
def diff_files(serial)
Returns a dict with additions and deletions since the previous Dryad to Dataverse upload.
Because checksums are not necessarily included in Dryad file metadata, this method uses Dryad file IDs, size, or whatever is available.
If dryad2dataverse.monitor.Monitor.status() indicates a change, it will produce dictionary output with lists of additions, deletions or hash changes (ie, identical except for hash changes), as below:
{'add': [dryadfiletuples], 'delete': [dryadfiletuples],
'hash_change': [dryadfiletuples]}
Arguments:
serial : dryad2dataverse.serializer.Serializer instance
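The add/delete/hash_change split can be illustrated with set arithmetic over file tuples. This sketch assumes the Serializer.files layout, where the last two members are (digest, digestType). diff_file_tuples is a hypothetical helper and does not reproduce the ID/size fallback logic the real method uses.

```python
def diff_file_tuples(old_files, new_files):
    """Hypothetical sketch of Monitor.diff_files()-style output.

    Each file is a tuple whose last two members are (digest, digestType),
    as in Serializer.files. Files identical except for the digest are
    reported under 'hash_change' rather than as a delete plus an add.
    """
    old_set, new_set = set(old_files), set(new_files)
    added = new_set - old_set
    deleted = old_set - new_set
    # Index the removed files by everything except the digest fields.
    old_by_key = {f[:-2]: f for f in deleted}
    hash_change = [f for f in added if f[:-2] in old_by_key]
    changed_keys = {f[:-2] for f in hash_change}
    return {
        "add": sorted((f for f in added if f[:-2] not in changed_keys), key=str),
        "delete": sorted((f for f in deleted if f[:-2] not in changed_keys), key=str),
        "hash_change": sorted(hash_change, key=str),
    }
```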
get_dv_fid
def get_dv_fid(url)
Returns str — the Dataverse file ID from parsing a Dryad file download link. Normally used for determining Dataverse file IDs for deletion in case of Dryad file changes.
Arguments:
url : str — Dryad file URL in the form ‘https://datadryad.org/api/v2/files/385819/download’.
get_dv_fids
def get_dv_fids(filelist)
Returns Dataverse file IDs from a list of Dryad file tuples. Generally, you would use the output from dryad2dataverse.monitor.Monitor.diff_files()[‘delete’] to discover Dataverse file IDs for deletion.
Arguments:
filelist : list — List of Dryad file tuples, eg:
[('https://datadryad.org/api/v2/files/385819/download',
'GCB_ACG_Mortality_2020.zip',
'application/x-zip-compressed', 23787587),
('https://datadryad.org/api/v2/files/385820/download',
'Readme_ACG_Mortality.txt',
'text/plain', 1350)]
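The lookup depends on extracting the numeric Dryad file ID from a download URL of the form shown above. In this sketch, a plain dict stands in for the Monitor database. Both function names and the lookup mapping are hypothetical.

```python
import re

DRYAD_FILE_URL = re.compile(r"/api/v2/files/(\d+)/download/?$")


def dryad_file_id(url):
    """Return the numeric Dryad file ID from a Dryad download URL, or None."""
    match = DRYAD_FILE_URL.search(url)
    return int(match.group(1)) if match else None


def dv_fids_for(filelist, lookup):
    """Hypothetical sketch: map Dryad file tuples to Dataverse file IDs.

    lookup stands in for the Monitor database: {dryad_file_id: dataverse_file_id}.
    Files with no recorded Dataverse ID are skipped.
    """
    fids = []
    for entry in filelist:
        dryad_id = dryad_file_id(entry[0])
        if dryad_id in lookup:
            fids.append(lookup[dryad_id])
    return fids
```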
get_json_dvfids
def get_json_dvfids(serial)
Returns a list of Dataverse file IDs for Dryad JSON files which were uploaded to Dataverse. Normally used to discover the file IDs needed to remove Dryad JSON files which have changed.
Arguments:
serial : dryad2dataverse.serializer.Serializer instance
update
def update(transfer)
Updates the Monitor database with information from a dryad2dataverse.transfer.Transfer instance.
If a Dryad primary metadata record has changes, it will be deleted from the database.
This method should be called after all transfers are completed, including Dryad JSON updates, as the last action for transfer.
Arguments:
transfer : dryad2dataverse.transfer.Transfer instance
set_timestamp
def set_timestamp(curdate=None)
Adds the current time to the database table. The stored value can be queried and used for subsequent checks for updates. To query the last modification time, use the dryad2dataverse.monitor.Monitor.lastmod property.
Arguments:
curdate : str — UTC datetime string in the format suitable for the Dryad API, eg: 2021-01-21T21:42:40Z, as produced by .strftime(‘%Y-%m-%dT%H:%M:%SZ’).
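A curdate string in the expected format can be produced with the standard library. In this sketch, dryad_timestamp is a hypothetical helper. It assumes naive datetimes are already in UTC and converts timezone-aware ones.

```python
from datetime import datetime, timezone

DRYAD_TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def dryad_timestamp(moment=None):
    """Format a datetime as a Dryad-style UTC string, eg: 2021-01-21T21:42:40Z.

    Naive datetimes are assumed to already be in UTC; aware ones are converted.
    """
    if moment is None:
        moment = datetime.now(timezone.utc)
    elif moment.tzinfo is not None:
        moment = moment.astimezone(timezone.utc)
    return moment.strftime(DRYAD_TS_FORMAT)
```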
dryad2dataverse.constants
This module contains the information that configures all the parameters required to transfer data from Dryad to Dataverse.
“Constants” may be a bit strong, but the only constant is the presence of change.
dryad2dataverse.handlers
Custom log handlers for sending log information to recipients.
SSLSMTPHandler Objects
class SSLSMTPHandler(SMTPHandler)
An SSL-enabled SMTP handler for use with logging.handlers.
emit
def emit(record: logging.LogRecord)
Emits a record using an SSL mail server.
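Here is one way such a handler can be built. This sketch subclasses the standard logging.handlers.SMTPHandler and swaps its plain SMTP connection for smtplib.SMTP_SSL. The class name SSLSMTPSketch is hypothetical and does not reproduce the package's implementation. The message construction follows the standard library handler's own emit().

```python
import logging
import logging.handlers
import smtplib
from email.message import EmailMessage
from email.utils import localtime


class SSLSMTPSketch(logging.handlers.SMTPHandler):
    """Hypothetical sketch: an SMTPHandler that connects over implicit TLS.

    The standard SMTPHandler opens a plain SMTP connection (optionally
    upgraded with STARTTLS); this variant uses smtplib.SMTP_SSL instead,
    which suits servers that only listen on an SSL port such as 465.
    """

    def emit(self, record: logging.LogRecord) -> None:
        try:
            port = self.mailport or smtplib.SMTP_SSL_PORT
            msg = EmailMessage()
            msg["From"] = self.fromaddr
            msg["To"] = ",".join(self.toaddrs)
            msg["Subject"] = self.getSubject(record)
            msg["Date"] = localtime()
            msg.set_content(self.format(record))
            with smtplib.SMTP_SSL(self.mailhost, port, timeout=self.timeout) as smtp:
                if self.username:
                    smtp.login(self.username, self.password)
                smtp.send_message(msg)
        except Exception:
            # Same convention as the standard handlers: report, don't raise.
            self.handleError(record)
```

Construction does not open a connection. A mail server is contacted only when a record is emitted.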
dryad2dataverse.transfer
This module handles data downloads and uploads from a Dryad instance to a Dataverse instance.
Transfer Objects
class Transfer()
Transfers metadata and data files from a Dryad installation to a Dataverse installation.
__init__
def __init__(dryad)
Creates a dryad2dataverse.transfer.Transfer instance.
Arguments:
dryad : dryad2dataverse.serializer.Serializer instance
_del__
def _del__()
Expunges files from constants.TMP on deletion
test_api_key
def test_api_key(url=None, apikey=None)
Tests for an expired API key and raises dryad2dataverse.exceptions.DataverseBadApiKeyError if the API key is bad. Ignores other HTTP errors.
Arguments:
url : str — Base URL to Dataverse installation. Defaults to dryad2dataverse.constants.DVURL
apikey : str — Default dryad2dataverse.constants.APIKEY.
dvpid
@property
def dvpid()
Returns Dataverse study persistent ID as str.
auth
@property
def auth()
Returns the Dataverse authentication header dict.
ie: {'X-Dataverse-key': 'APIKEYSTRING'}
fileJson
@property
def fileJson()
Returns a list of file JSONs from call to Dryad API /files/{id}, where the ID is parsed from the Dryad JSON. Dryad file listings are paginated.
files
@property
def files()
Returns a list of lists with:
[Download_location, filename, mimetype, size, description, md5digest]
This is mutable; downloading a file will add md5 info if not available.
oversize
@property
def oversize()
Returns list of files exceeding Dataverse ingest limit dryad2dataverse.constants.MAX_UPLOAD.
doi
@property
def doi()
Returns Dryad DOI.
set_correct_date
def set_correct_date(url=None,
hdl=None,
d_type='distributionDate',
apikey=None)
Sets “correct” publication date for Dataverse.
Note: dryad2dataverse.serializer maps Dryad ‘publicationDate’ to Dataverse ‘distributionDate’ (see serializer.py ~line 675).
Dataverse citation date default is “:publicationDate”. See Dataverse API reference: https://guides.dataverse.org/en/4.20/api/native-api.html#id54.
Arguments:
url : str — Base URL to Dataverse installation. Defaults to dryad2dataverse.constants.DVURL
hdl : str — Persistent identifier for Dataverse study. Defaults to Transfer.dvpid (which can be None if the study has not yet been uploaded).
d_type : str — Date type. One of ‘distributionDate’, ‘productionDate’, ‘dateOfDeposit’. Default ‘distributionDate’.
apikey : str — Default dryad2dataverse.constants.APIKEY.
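Here is a sketch of the kind of request involved. It assumes the Dataverse native API endpoint PUT /api/datasets/:persistentId/citationdate?persistentId=<pid>, with the date field name as the request body, as described in the Dataverse API guide. The helper citation_date_request is hypothetical. It only builds the request and does not send it.

```python
import urllib.parse
import urllib.request

VALID_DATE_TYPES = ("distributionDate", "productionDate", "dateOfDeposit")


def citation_date_request(base_url, pid, apikey, d_type="distributionDate"):
    """Hypothetical sketch: build (but do not send) the Dataverse request that
    sets which date field a dataset citation uses.

    Assumes the endpoint PUT /api/datasets/:persistentId/citationdate
    with the date field name as the request body.
    """
    if d_type not in VALID_DATE_TYPES:
        raise ValueError(f"d_type must be one of {VALID_DATE_TYPES}")
    query = urllib.parse.urlencode({"persistentId": pid})
    url = f"{base_url.rstrip('/')}/api/datasets/:persistentId/citationdate?{query}"
    return urllib.request.Request(
        url,
        data=d_type.encode("utf-8"),
        headers={"X-Dataverse-key": apikey},
        method="PUT",
    )
```

Passing the result to urllib.request.urlopen() would send it. The API key travels in the same X-Dataverse-key header that Transfer.auth returns.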
upload_study
def upload_study(url=None, apikey=None, timeout=45, **kwargs)
Uploads Dryad study metadata to the target Dataverse or updates an existing study.
Supplying a targetDv kwarg creates a new study and supplying a dvpid kwarg updates a currently existing Dataverse study.
Arguments:
url : str — URL of Dataverse instance. Defaults to constants.DVURL.
apikey : str — API key of user. Defaults to constants.APIKEY.
timeout : int — Timeout in seconds on the POST request. Default 45.
KEYWORD ARGUMENTS
Exactly one of these is required. Supplying both or neither raises a NoTargetError.
targetDv : str — Short name of target dataverse. Required if new dataset. Specify as targetDv=value.
dvpid : str — Dataverse persistent ID (for updating metadata). This is not required for new uploads. Specify as dvpid=value.
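The "exactly one of targetDv or dvpid" rule can be sketched as an argument check followed by an endpoint choice. This sketch assumes the Dataverse native API endpoints POST /api/dataverses/<alias>/datasets (create) and PUT /api/datasets/:persistentId/versions/:draft?persistentId=<pid> (update). study_endpoint is hypothetical, and NoTargetError here is a local stand-in for the package's exception.

```python
import urllib.parse


class NoTargetError(Exception):
    """Local stand-in for dryad2dataverse.exceptions.NoTargetError."""


def study_endpoint(base_url, targetDv=None, dvpid=None):
    """Hypothetical sketch: enforce 'exactly one of targetDv or dvpid' and
    choose a Dataverse native API endpoint (method, url)."""
    if (targetDv is None) == (dvpid is None):
        raise NoTargetError("Supply exactly one of targetDv or dvpid")
    base = base_url.rstrip("/")
    if targetDv is not None:
        # New study: create it inside the target dataverse collection.
        return "POST", f"{base}/api/dataverses/{targetDv}/datasets"
    # Existing study: replace the draft version's metadata.
    query = urllib.parse.urlencode({"persistentId": dvpid})
    return "PUT", f"{base}/api/datasets/:persistentId/versions/:draft?{query}"
```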
download_file
def download_file(url, filename, tmp=None, size=None, chk=None, timeout=45)
Downloads a file via requests streaming and saves it to constants.TMP. Returns the md5 sum on success and raises an exception on failure.
Arguments:
url : str — URL of download.
filename : str — Output file name.
timeout : int — Requests timeout.
tmp : str — Temporary directory for downloads. Defaults to dryad2dataverse.constants.TMP.
size : int — Reported file size in bytes. Defaults to dryad2dataverse.constants.MAX_UPLOAD.
chk : str - md5 sum of file (if available and known).
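Size and md5 verification can be done while streaming, without re-reading the file. This sketch takes any iterable of byte chunks (such as what requests' Response.iter_content() yields) and a binary file object. save_stream is a hypothetical helper, and DownloadSizeError and HashError are local stand-ins for the package's exceptions.

```python
import hashlib


class DownloadSizeError(Exception):
    """Local stand-in for dryad2dataverse.exceptions.DownloadSizeError."""


class HashError(Exception):
    """Local stand-in for dryad2dataverse.exceptions.HashError."""


def save_stream(chunks, out, size=None, chk=None):
    """Hypothetical sketch of the checks download_file() describes.

    chunks: iterable of bytes; out: writable binary file object.
    Returns the md5 hex digest; raises on a size or checksum mismatch.
    """
    md5 = hashlib.md5()
    written = 0
    for chunk in chunks:
        out.write(chunk)
        md5.update(chunk)  # hash incrementally instead of re-reading the file
        written += len(chunk)
    if size is not None and written != size:
        raise DownloadSizeError(f"expected {size} bytes, got {written}")
    digest = md5.hexdigest()
    if chk is not None and digest != chk:
        raise HashError(f"md5 mismatch: expected {chk}, got {digest}")
    return digest
```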
download_files
def download_files(files=None)
Bulk downloader for files.
Arguments:
files : list — Items in the list can be tuples or lists with a minimum of:
(dryaddownloadurl, filenamewithoutpath, [md5sum])
The md5 sum should be the last member of the tuple.
Defaults to self.files.
Normally used without arguments to download all the associated files with a Dryad study.
file_lock_check
def file_lock_check(study, dv_url, apikey=None, count=0)
Checks for a study lock.
Returns True if locked. Normally used to check if processing is completed. As tabular processing halts file ingest, there should be no locks on a Dataverse study before performing a data file upload.
Arguments:
study : str — Persistent identifier of study.
dv_url : str — URL to base Dataverse installation.
apikey : str — API key for user. If not present authorization defaults to self.auth.
count : int — Number of times the function has been called. Logs lock messages only on 0.
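file_lock_check performs a single check. Callers typically repeat it until the study is free. This sketch shows such a polling loop with an injectable fetch function so that it runs offline. It assumes the response shape of GET /api/datasets/:persistentId/locks, which is {"status": "OK", "data": [...]}, where an empty list means unlocked. wait_until_unlocked is a hypothetical helper.

```python
import time


def wait_until_unlocked(get_locks, max_tries=10, delay=1.0, sleep=time.sleep):
    """Hypothetical sketch of polling a Dataverse study until it has no locks.

    get_locks: zero-argument callable returning the parsed locks JSON.
    Returns True once unlocked, False if still locked after max_tries polls.
    """
    for attempt in range(max_tries):
        if not get_locks().get("data"):
            return True
        if attempt < max_tries - 1:
            sleep(delay)  # give tabular ingest time to finish
    return False
```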
force_notab_unlock
def force_notab_unlock(study, dv_url, apikey=None)
Checks for a study lock and forcibly unlocks and uningests to prevent tabular file processing. Required if MIME type and filename spoofing is not sufficient.
Forcible unlocks require a superuser API key.
Arguments:
study : str — Persistent identifier of study.
dv_url : str — URL to base Dataverse installation.
apikey : str — API key for user. If not present authorization defaults to self.auth.
upload_file
def upload_file(dryadUrl=None,
filename=None,
mimetype=None,
size=None,
descr=None,
md5=None,
studyId=None,
dest=None,
fprefix=None,
force_unlock=False,
timeout=300)
Uploads file to Dataverse study. Returns a tuple of the dryadFid (or None) and Dataverse JSON from the POST request. Failures produce JSON with different status messages rather than raising an exception.
Arguments:
filename : str — Filename (not including path).
mimetype : str — Mimetype of file.
size : int — Size in bytes.
studyId : str — Persistent Dataverse study identifier. Defaults to Transfer.dvpid.
dest : str — Destination dataverse installation url. Defaults to constants.DVURL.
md5 : str — md5 checksum for file.
fprefix : str — Path to file, not including a trailing slash.
timeout : int - Timeout in seconds for POST request. Default 300.
dryadUrl : str - Dryad download URL if you want to include a Dryad file id.
force_unlock : bool — Attempt forcible unlock instead of waiting for tabular file processing. Defaults to False. The Dataverse /locks endpoint blocks POST and DELETE requests from non-superusers (undocumented as of 31 March 2021). Forcible unlock requires a superuser API key.
upload_files
def upload_files(files=None, pid=None, fprefix=None, force_unlock=False)
Uploads multiple files to study with persistentId pid. Returns a list of the original tuples plus JSON responses.
Arguments:
files : list — List contains tuples with (dryadDownloadURL, filename, mimetype, size).
pid : str — Defaults to self.dvpid, which is generated by calling dryad2dataverse.transfer.Transfer.upload_study().
fprefix : str — File location prefix. Defaults to dryad2dataverse.constants.TMP
force_unlock : bool — Attempt forcible unlock instead of waiting for tabular file processing. Defaults to False. The Dataverse /locks endpoint blocks POST and DELETE requests from non-superusers (undocumented as of 31 March 2021). Forcible unlock requires a superuser API key.
upload_json
def upload_json(studyId=None, dest=None)
Uploads Dryad JSON as a separate file for archival purposes.
Arguments:
studyId : str — Dataverse persistent identifier. Default dryad2dataverse.transfer.Transfer.dvpid, which is only generated by dryad2dataverse.transfer.Transfer.upload_study().
dest : str — Base URL for transfer. Default dryad2dataverse.constants.DVURL.
delete_dv_file
def delete_dv_file(dvfid, dvurl=None, key=None)
Deletes files from Dataverse target given a dataverse file ID. This information is unknowable unless discovered by dryad2dataverse.monitor.Monitor or by other methods.
Returns 1 on success (204 response), or 0 on other response.
Arguments:
dvurl : str — Base URL of dataverse instance. Defaults to dryad2dataverse.constants.DVURL.
dvfid : str — Dataverse file ID number.
delete_dv_files
def delete_dv_files(dvfids=None, dvurl=None, key=None)
Deletes all files in list of Dataverse file ids from a Dataverse installation.
Arguments:
dvfids : list — List of Dataverse file ids. Defaults to dryad2dataverse.transfer.Transfer.fileDelRecord.
dvurl : str — Base URL of Dataverse. Defaults to dryad2dataverse.constants.DVURL.
key : str — API key for Dataverse. Defaults to dryad2dataverse.constants.APIKEY.
dryad2dataverse.serializer
Serializes Dryad study JSON to Dataverse JSON, as well as producing associated file information.
Serializer Objects
class Serializer()
Serializes Dryad JSON to Dataverse JSON.
__init__
def __init__(doi)
Creates Dryad study metadata instance.
Arguments:
doi : str — DOI of Dryad study. Required for downloading. eg: ‘doi:10.5061/dryad.2rbnzs7jp’
fetch_record
def fetch_record(url=None, timeout=45)
Fetches Dryad study record JSON from Dryad V2 API at https://datadryad.org/api/v2/datasets/. Saves to self._dryadJson. Querying Serializer.dryadJson will call this function automatically.
Arguments:
url : str — Dryad instance base URL (eg: ‘https://datadryad.org’).
timeout : int — Timeout in seconds. Default 45.
id
@property
def id()
Returns Dryad unique database ID, not the DOI.
Where the original Dryad JSON is dryadJson, it’s the integer trailing portion of:
self.dryadJson['_links']['stash:version']['href']
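Extracting that trailing integer takes a single string split. In this sketch, dryad_db_id is a hypothetical helper, and the href in the test is a made-up value with the documented shape.

```python
def dryad_db_id(dryad_json):
    """Return the integer at the end of _links['stash:version']['href']."""
    href = dryad_json["_links"]["stash:version"]["href"]
    # Drop any trailing slash, then keep only the last path segment.
    return int(href.rstrip("/").rsplit("/", 1)[-1])
```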
dryadJson
@property
def dryadJson()
Returns Dryad study JSON. Will call Serializer.fetch_record() if no JSON is present.
dryadJson
@dryadJson.setter
def dryadJson(value=None)
Fetches Dryad JSON from the Dryad website if not supplied.
If supplying it, make sure it’s correct or you will run into trouble with processing later.
Arguments:
value : dict — Dryad JSON.
embargo
@property
def embargo()
Check embargo status. Returns boolean True if embargoed.
dvJson
@property
def dvJson()
Returns Dataverse study JSON as dict.
fileJson
@property
def fileJson(timeout=45)
Returns a list of file JSONs from call to Dryad API /files/{id}, where the ID is parsed from the Dryad JSON. Dryad file listings are paginated, so the return consists of a list of dicts, one per page.
Arguments:
timeout : int — Request timeout in seconds.
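Collecting a paginated listing into "one dict per page" can be sketched as following next-page links until none remain. This assumes HAL-style links (page['_links']['next']['href']) on each page and no next link on the last one. collect_pages is hypothetical, and fetch is any callable that resolves a URL (or relative href) to parsed JSON, so the sketch runs without network access.

```python
def collect_pages(fetch, first_url):
    """Hypothetical sketch: follow HAL-style pagination links.

    fetch: callable taking a URL and returning parsed JSON.
    Returns one dict per page, in order.
    """
    pages = []
    url = first_url
    while url:
        page = fetch(url)
        pages.append(page)
        # A missing 'next' link ends the loop.
        url = page.get("_links", {}).get("next", {}).get("href")
    return pages
```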
files
@property
def files()
Returns a list of tuples with:
(Download_location, filename, mimetype, size, description, digest, digestType)
Digest types include, but are not necessarily limited to:
‘adler-32’, ‘crc-32’, ‘md2’, ‘md5’, ‘sha-1’, ‘sha-256’, ‘sha-384’, ‘sha-512’
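Most of these digest types map directly onto Python's hashlib and zlib modules. In this sketch, checksum is a hypothetical helper. It assumes the two checksum types are compared as 8-digit lowercase hex, and it rejects md2, which hashlib does not normally provide.

```python
import hashlib
import zlib

# hashlib names for the Dryad digest types it can compute.
HASHLIB_NAMES = {"md5": "md5", "sha-1": "sha1", "sha-256": "sha256",
                 "sha-384": "sha384", "sha-512": "sha512"}


def checksum(data, digest_type):
    """Hypothetical sketch: compute a hex digest of bytes for a Dryad digestType."""
    if digest_type in HASHLIB_NAMES:
        return hashlib.new(HASHLIB_NAMES[digest_type], data).hexdigest()
    if digest_type == "adler-32":
        return format(zlib.adler32(data), "08x")
    if digest_type == "crc-32":
        return format(zlib.crc32(data), "08x")
    raise ValueError(f"unsupported digest type: {digest_type}")
```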
oversize
@property
def oversize(maxsize=None)
Returns a list of Dryad files whose size value exceeds maxsize. Maximum size defaults to dryad2dataverse.constants.MAX_UPLOAD
Arguments:
maxsize : int — Size in bytes in which to flag as oversize. Defaults to constants.MAX_UPLOAD.
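The oversize check is a filter on the size member (index 3) of each file tuple. In this sketch, oversize_files is a hypothetical helper, and the limit is passed in explicitly rather than read from dryad2dataverse.constants.MAX_UPLOAD.

```python
def oversize_files(files, maxsize):
    """Hypothetical sketch: return file tuples whose size (index 3) exceeds maxsize.

    Uses the Serializer.files tuple layout:
    (download_url, filename, mimetype, size, description, digest, digestType).
    """
    return [f for f in files if f[3] is not None and f[3] > maxsize]
```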
dryad2dataverse.exceptions
Custom exceptions for error handling.
Dryad2DataverseError Objects
class Dryad2DataverseError(Exception)
Base exception class for Dryad2Dataverse errors.
NoTargetError Objects
class NoTargetError(Dryad2DataverseError)
No dataverse target supplied error.
DownloadSizeError Objects
class DownloadSizeError(Dryad2DataverseError)
Raised when download sizes don’t match reported Dryad file size.
HashError Objects
class HashError(Dryad2DataverseError)
Raised on hex digest mismatch.
DatabaseError Objects
class DatabaseError(Dryad2DataverseError)
Tracking database error.
DataverseUploadError Objects
class DataverseUploadError(Dryad2DataverseError)
Raised on a response that is not OK (ie, not requests.status_code == 200).
DataverseDownloadError Objects
class DataverseDownloadError(Dryad2DataverseError)
Raised on a response that is not OK (ie, not requests.status_code == 200).
DataverseBadApiKeyError Objects
class DataverseBadApiKeyError(Dryad2DataverseError)
Raised on a response that is not OK (ie, request.request.json()[‘message’] == ‘Bad api key ‘).
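Because every custom exception derives from Dryad2DataverseError, one except clause can catch all package-specific failures while letting unrelated errors propagate. In this sketch, the two classes are local stand-ins that mirror the hierarchy, and verify and safe_verify are hypothetical helpers.

```python
class Dryad2DataverseError(Exception):
    """Local stand-in for the package's base exception."""


class HashError(Dryad2DataverseError):
    """Local stand-in: raised on a hex digest mismatch."""


def verify(expected, actual):
    if expected != actual:
        raise HashError(f"expected {expected}, got {actual}")
    return True


def safe_verify(expected, actual):
    """Catch every package-specific error through the shared base class."""
    try:
        return verify(expected, actual)
    except Dryad2DataverseError as err:
        return f"{type(err).__name__}: {err}"
```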