API reference

This is a general guide to using the dryad2dataverse package in your own Python software.

Basic usage

Converting JSON

>>> #Convert Dryad JSON to Dataverse JSON and save to a file
>>> import dryad2dataverse.serializer
>>> i_heart_dryad = dryad2dataverse.serializer.Serializer('doi:10.5061/dryad.2rbnzs7jp')
>>> with open('dataverse_json.json', 'w') as f:
...     f.write(f'{i_heart_dryad.dvJson}')
>>> #Or just view it this way in a Python session
>>> i_heart_dryad.dvJson
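
Note that the f-string above writes the Python dict representation rather than strict JSON. If you want a file that other tools can parse, a small variation using the standard json module does the trick:

>>> import json
>>> with open('dataverse_json.json', 'w') as f:
...     json.dump(i_heart_dryad.dvJson, f, indent=2)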

Transferring data

Note: a number of configuration values must be set correctly for this to work, such as your target Dataverse collection. This example continues with the Serializer instance above.

>>> import dryad2dataverse.config
>>> import dryad2dataverse.auth
>>> import dryad2dataverse.transfer
>>> config = dryad2dataverse.config.Config()
>>> # Now go and edit the config file, which is saved at your system's default location
>>> # Alternatively, you can fill out the config object like a dict
>>> config = dryad2dataverse.config.Config() #reload it after editing
>>> config['token'] = dryad2dataverse.auth.Token(**config)
>>> dv = dryad2dataverse.transfer.Transfer(i_heart_dryad, **config)
>>> # Files must first be downloaded; there is no direct transfer
>>> dv.download_files()
>>> # 'dryad' is the short name of the target dataverse
>>> # Yours may be different
>>> # First, create the study metadata
>>> dv.upload_study(targetDv='dryad', **config)
>>> # Then upload the files
>>> dv.upload_files(**config)
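
Dryad studies sometimes contain files larger than your Dataverse installation will accept. The Serializer's oversize property (documented below) lists them, so you can check before uploading; a minimal sketch:

>>> too_big = i_heart_dryad.oversize
>>> for f in too_big:
...     print(f'Too large for Dataverse: {f[1]} ({f[3]} bytes)')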

Change monitoring

Because monitoring the status of something over time requires persistence, the dryad2dataverse.monitor.Monitor object uses an SQLite3 database, which has the enormous advantage of being a single file that is portable between systems. This allows monitoring without laborious database configuration on a host system, and updates can be run on any system that has sufficient storage space to act as an intermediary between Dryad and Dataverse. This is quite a simple database, as the documentation on its structure shows.

If you need to change systems, just copy the database to the new one.
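
Because it is just an SQLite file, you can also poke at it with nothing but Python's standard library. A minimal sketch; the filename here is a placeholder, so substitute whatever path your configuration actually uses:

>>> import sqlite3
>>> con = sqlite3.connect('dryad2dataverse_monitor.db')
>>> con.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
>>> con.close()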

In theory you could run it from a Raspberry Pi Zero that you have in a desk drawer, although that may not be the wisest idea. Maybe use your cell phone.

Monitoring changes requires both the Serializer and Transfer objects from above.

>>> import dryad2dataverse.monitor
>>> # Create the Monitor instance
>>> monitor = dryad2dataverse.monitor.Monitor(**config) #config as above
>>> # Check status of your serializer object
>>> monitor.status(i_heart_dryad)
{'status': 'new', 'dvpid': None}
>>> # Now imagine that i_still_heart_dryad is a Serializer instance
>>> # for a study that was uploaded previously
>>> monitor.status(i_still_heart_dryad)
{'status': 'unchanged', 'dvpid': 'doi:99.99999/FK2/FAKER'}
>>> #Check the difference in files
>>> monitor.diff_files(i_still_heart_dryad)
{}
>>> # After the transfer dv above:
>>> monitor.update(dv)
>>> # And then, to make your life easier, update the last time you checked Dryad
>>> monitor.set_timestamp()
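
Putting the pieces together, a typical update run loops over a list of Dryad DOIs and only transfers what the Monitor says needs transferring. This sketch assumes every DOI that isn't 'unchanged' is brand new to the target collection; updating an existing study takes more care:

>>> dois = ['doi:10.5061/dryad.2rbnzs7jp']  # whatever list of Dryad DOIs you curate
>>> for doi in dois:
...     ser = dryad2dataverse.serializer.Serializer(doi, **config)
...     if monitor.status(ser)['status'] == 'unchanged':
...         continue
...     dv = dryad2dataverse.transfer.Transfer(ser, **config)
...     dv.download_files()
...     dv.upload_study(targetDv='dryad', **config)
...     dv.upload_files(**config)
...     monitor.update(dv)
>>> monitor.set_timestamp()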

That’s great! I’m going to use this for my very important data for which I have no backup.

The dryad2dataverse library is free and open source, released under the MIT license. It’s also not written by anyone with a degree in computer science, so as the MIT license says:

Software is provided "as is", without warranty of any kind

Python package API

dryad2dataverse

Dryad to Dataverse utilities. No modules are loaded by default, so

>>> import dryad2dataverse

will work, but won't make any of the submodules available.
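
To do anything useful, import the submodules you need explicitly:

>>> import dryad2dataverse.config
>>> import dryad2dataverse.serializer
>>> import dryad2dataverse.transfer
>>> import dryad2dataverse.monitor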

Modules included:

  • dryad2dataverse.config : Configuration for all modules. URLs, API keys, etc. are all here. Base configurations are read from a YAML file in ./data

  • dryad2dataverse.serializer : Download and serialize Dryad JSON to Dataverse JSON.

  • dryad2dataverse.transfer : Metadata and file transfer utilities.

  • dryad2dataverse.monitor : Monitoring and database tools for maintaining a pipeline to Dataverse without unnecessary downloading and file duplication.

  • dryad2dataverse.exceptions : Custom exceptions.

dryad2dataverse.config

This module contains the information that configures all the parameters required to transfer data from Dryad to Dataverse.

“Constants” may be a bit strong, but the only constant is the presence of change.
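
Since Config subclasses dict, a loaded configuration can be read and changed like any other dictionary. A short sketch (dv_url is just one example of a configured value):

>>> from dryad2dataverse.config import Config
>>> config = Config()   # reads the YAML config, writing a template first if none exists
>>> config.validate()   # raises ValueError if any required value is empty
>>> config['dv_url']    # plain dict access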

Config

Bases: dict

Holds all the information about dryad2dataverse parameters

Source code in src/dryad2dataverse/config.py
class Config(dict):
    '''
    Holds all the information about dryad2dataverse parameters
    '''
    def __init__(self, cpath: Union[pathlib.Path, str]=None,
                 fname:str=None,
                 force:bool=False):
        '''
        Initialize

        Parameters
        ----------
        force : bool
            Force writing a new config file
        '''
        self.cpath = cpath
        self.fname = fname
        self.force = force
        self.default_locations = {'ios': '~/.config/dryad2dataverse',
                     'linux' : '~/.config/dryad2dataverse',
                     'darwin': '~/Library/Application Support/dryad2dataverse',
                     'win32' : '~/AppData/Roaming/dryad2dataverse',
                     'cygwin' : '~/.config/dryad2dataverse'}

        #Use read() instead of yaml.safe_load.read_text() so that
        #comments are preserved
        with open(importlib.resources.files(
                    'dryad2dataverse.data').joinpath(
                    'dryad2dataverse_config.yml'), mode='r',
                    encoding='utf-8') as w:
            self.template  = w.read()

        if not self.cpath:
            self.cpath = self.default_locations[sys.platform]
        if not self.fname:
            self.fname = 'dryad2dataverse_config.yml'
        self.configfile = pathlib.Path(self.cpath, self.fname).expanduser()

        if self.make_config_template():
            self.load_config()
        else:
            raise FileNotFoundError(f'Can\'t find {self.configfile}')

    @classmethod
    def update_headers(cls,
                       inheader:Union[None, dict]=None,
                       **kwargs)->dict:
        '''
        Update headers with user agent and token information (if present)

        Parameters
        ----------
        inheader : dict
            Existing header if present

        **kwargs
            Keyword arguments, one of which should be 'token' containing
            a dryad2dataverse.auth.Token instance
        '''
        if not kwargs:
            kwargs = {}
        if not inheader:
            inheader = {}
        headers = {'Accept':'application/json',
                   'Content-Type':'application/json'} 
        headers.update({'User-agent' : USERAGENT})
        if kwargs.get('token'):
            headers.update(kwargs['token'].auth_header)
        headers.update(inheader)
        return headers

    def make_config_template(self):
        '''
        Make a default config if one does not exist
        Returns
        -------
        True if created
        False if not
        '''
        if self.configfile.exists() and not self.force:
            return 1
        if not self.configfile.parent.exists():
            self.configfile.parent.mkdir(parents=True)
        with open(self.configfile, 'w', encoding='utf-8') as f:
            f.write(self.template)
        if self.configfile.exists():
            return 1
        return 0

    def load_config(self):
        '''
        Loads the config to a dict
        '''
        try:
            with open(self.configfile, 'r', encoding='utf-8') as f:
                self.update(yaml.safe_load(f))
        except yaml.YAMLError as e:
            LOGGER.exception('Unable to load config file, %s', e)
            sys.exit()

    def overwrite(self):
        '''
        Overwrite the config file with current contents.

        Note that this will remove the comments from the YAML file.
        '''
        with open(self.configfile, 'w', encoding='utf-8') as w:
            yaml.safe_dump(self, w)

    def validate(self):
        '''
        Ensure all keys have values
        '''
        can_be_false = ['force_unlock', 'test_mode']
        badkey = [k for k, v in self.items() if not v]
        for rm in can_be_false:
            badkey.remove(rm)#It can be false
        listkeys = {k:v for k,v in self.items() if isinstance(v, list)}
        for k, v in listkeys.items():
            for sub_v in v:
                if not sub_v:
                    badkey.append(k)
                    break
        if badkey:
            raise ValueError('Null values in configuration. '
                             f'See:\n{"\n".join([str(_) for _ in badkey])}')

__init__(cpath=None, fname=None, force=False)

Initialize

Parameters:
  • force (bool, default: False ) –

    Force writing a new config file

Source code in src/dryad2dataverse/config.py
def __init__(self, cpath: Union[pathlib.Path, str]=None,
             fname:str=None,
             force:bool=False):
    '''
    Initialize

    Parameters
    ----------
    force : bool
        Force writing a new config file
    '''
    self.cpath = cpath
    self.fname = fname
    self.force = force
    self.default_locations = {'ios': '~/.config/dryad2dataverse',
                 'linux' : '~/.config/dryad2dataverse',
                 'darwin': '~/Library/Application Support/dryad2dataverse',
                 'win32' : '~/AppData/Roaming/dryad2dataverse',
                 'cygwin' : '~/.config/dryad2dataverse'}

    #Use read() instead of yaml.safe_load.read_text() so that
    #comments are preserved
    with open(importlib.resources.files(
                'dryad2dataverse.data').joinpath(
                'dryad2dataverse_config.yml'), mode='r',
                encoding='utf-8') as w:
        self.template  = w.read()

    if not self.cpath:
        self.cpath = self.default_locations[sys.platform]
    if not self.fname:
        self.fname = 'dryad2dataverse_config.yml'
    self.configfile = pathlib.Path(self.cpath, self.fname).expanduser()

    if self.make_config_template():
        self.load_config()
    else:
        raise FileNotFoundError(f'Can\'t find {self.configfile}')

load_config()

Loads the config to a dict

Source code in src/dryad2dataverse/config.py
def load_config(self):
    '''
    Loads the config to a dict
    '''
    try:
        with open(self.configfile, 'r', encoding='utf-8') as f:
            self.update(yaml.safe_load(f))
    except yaml.YAMLError as e:
        LOGGER.exception('Unable to load config file, %s', e)
        sys.exit()

make_config_template()

Make a default config if one does not exist

Returns:
  • True if created
  • False if not
Source code in src/dryad2dataverse/config.py
def make_config_template(self):
    '''
    Make a default config if one does not exist
    Returns
    -------
    True if created
    False if not
    '''
    if self.configfile.exists() and not self.force:
        return 1
    if not self.configfile.parent.exists():
        self.configfile.parent.mkdir(parents=True)
    with open(self.configfile, 'w', encoding='utf-8') as f:
        f.write(self.template)
    if self.configfile.exists():
        return 1
    return 0

overwrite()

Overwrite the config file with current contents.

Note that this will remove the comments from the YAML file.
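
For example, after changing a value you can persist it back to disk, at the cost of those comments (the 'target' key is just an illustration):

>>> config['target'] = 'dryad'
>>> config.overwrite()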

Source code in src/dryad2dataverse/config.py
def overwrite(self):
    '''
    Overwrite the config file with current contents.

    Note that this will remove the comments from the YAML file.
    '''
    with open(self.configfile, 'w', encoding='utf-8') as w:
        yaml.safe_dump(self, w)

update_headers(inheader=None, **kwargs) classmethod

Update headers with user agent and token information (if present)

Parameters:
  • inheader (dict, default: None ) –

    Existing header if present

  • **kwargs

    Keyword arguments, one of which should be ‘token’ containing a dryad2dataverse.auth.Token instance
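
An illustrative call, assuming config is a loaded Config whose 'token' entry is a dryad2dataverse.auth.Token; note that building the Authorization header will contact Dryad for a bearer token:

>>> headers = dryad2dataverse.config.Config.update_headers(**config)
>>> sorted(headers)
['Accept', 'Authorization', 'Content-Type', 'User-agent']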

Source code in src/dryad2dataverse/config.py
@classmethod
def update_headers(cls,
                   inheader:Union[None, dict]=None,
                   **kwargs)->dict:
    '''
    Update headers with user agent and token information (if present)

    Parameters
    ----------
    inheader : dict
        Existing header if present

    **kwargs
        Keyword arguments, one of which should be 'token' containing
        a dryad2dataverse.auth.Token instance
    '''
    if not kwargs:
        kwargs = {}
    if not inheader:
        inheader = {}
    headers = {'Accept':'application/json',
               'Content-Type':'application/json'} 
    headers.update({'User-agent' : USERAGENT})
    if kwargs.get('token'):
        headers.update(kwargs['token'].auth_header)
    headers.update(inheader)
    return headers

validate()

Ensure all keys have values

Source code in src/dryad2dataverse/config.py
def validate(self):
    '''
    Ensure all keys have values
    '''
    can_be_false = ['force_unlock', 'test_mode']
    badkey = [k for k, v in self.items() if not v]
    for rm in can_be_false:
        badkey.remove(rm)#It can be false
    listkeys = {k:v for k,v in self.items() if isinstance(v, list)}
    for k, v in listkeys.items():
        for sub_v in v:
            if not sub_v:
                badkey.append(k)
                break
    if badkey:
        raise ValueError('Null values in configuration. '
                         f'See:\n{"\n".join([str(_) for _ in badkey])}')

dryad2dataverse.auth

Handles authentication and bearer tokens using Dryad’s application ID and secret
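
A minimal sketch of using it directly; normally the Token is built from a Config instance as in the basic usage above, and the app_id and secret values here are placeholders for your own Dryad API application credentials:

>>> import dryad2dataverse.auth
>>> token = dryad2dataverse.auth.Token(dry_url='https://datadryad.org',
...                                    app_id='your-app-id',
...                                    secret='your-secret')
>>> token.token        # fetches a bearer token, refreshing it when expired
>>> token.auth_header  # the same token wrapped in a ready-to-use Authorization header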

Token

Self updating bearer token generator

Source code in src/dryad2dataverse/auth.py
class Token:
    '''
    Self updating bearer token generator
    '''
    def __init__(self, **kwargs):
        '''
        Obtain bearer token

        Parameters
        ----------
        **kwargs
            Must include required keyword arguments as below
        dry_url : str
            Dryad base url (eg: https://datadryad.org)
        app_id : str
            Dryad application ID
        secret : str
            Application secret

        Other parameters
        ----------------
        timeout : int
            timeout in seconds

        '''
        self.kwargs = kwargs
        self.path = '/oauth/token'
        self.data = {'client_id': kwargs['app_id'],
                     'client_secret' : kwargs['secret'],
                     'grant_type': 'client_credentials'}
        self.headers = {'User-agent': USERAGENT,
                        'charset' : 'UTF-8'}
        self.timeout = kwargs.get('timeout', 100)
        self.expiry_time = None
        self.__token_info = None

    def get_bearer_token(self):
        '''
        Obtain a brand new bearer token
        '''
        try:
            tokenr = requests.post(f"{self.kwargs['dry_url']}{self.path}",
                                   headers=self.headers,
                                   data=self.data,
                                   timeout=self.timeout)
            tokenr.raise_for_status()
            self.__token_info = tokenr.json()

        except (requests.exceptions.HTTPError,
                requests.exceptions.RequestException) as err:
            LOGGER.exception('HTTP Error:, %s', err)
            raise err

    def check_token_valid(self)->bool:
        '''
        Checks to see if token is still valid
        '''
        expiry_time = (datetime.datetime.fromtimestamp(self.__token_info['created_at']) +
                       datetime.timedelta(seconds=self.__token_info['expires_in']))
        self.expiry_time = expiry_time.strftime('%Y-%m-%dT%H:%M:%SZ')
        if datetime.datetime.now() > expiry_time:
            return False
        return True

    @property
    def token(self)->str:
        '''
        Return only a valid token
        '''
        if not self.__token_info:
            self.get_bearer_token()
        if not self.check_token_valid():
            self.get_bearer_token()
        return self.__token_info['access_token']

    @property
    def auth_header(self)->dict:
        '''
        Return valid authorization header
        '''
        return {'Accept' : 'application/json',
                'Content-Type' : 'application/json',
                'Authorization' : f'Bearer {self.token}'}

auth_header property

Return valid authorization header

token property

Return only a valid token

__init__(**kwargs)

Obtain bearer token

Parameters:
  • **kwargs

    Must include required keyword arguments as below

  • dry_url (str) –

    Dryad base url (eg: https://datadryad.org)

  • app_id (str) –

    Dryad application ID

  • secret (str) –

    Application secret

  • timeout (int) –

    timeout in seconds

Source code in src/dryad2dataverse/auth.py
def __init__(self, **kwargs):
    '''
    Obtain bearer token

    Parameters
    ----------
    **kwargs
        Must include required keyword arguments as below
    dry_url : str
        Dryad base url (eg: https://datadryad.org)
    app_id : str
        Dryad application ID
    secret : str
        Application secret

    Other parameters
    ----------------
    timeout : int
        timeout in seconds

    '''
    self.kwargs = kwargs
    self.path = '/oauth/token'
    self.data = {'client_id': kwargs['app_id'],
                 'client_secret' : kwargs['secret'],
                 'grant_type': 'client_credentials'}
    self.headers = {'User-agent': USERAGENT,
                    'charset' : 'UTF-8'}
    self.timeout = kwargs.get('timeout', 100)
    self.expiry_time = None
    self.__token_info = None

check_token_valid()

Checks to see if token is still valid

Source code in src/dryad2dataverse/auth.py
def check_token_valid(self)->bool:
    '''
    Checks to see if token is still valid
    '''
    expiry_time = (datetime.datetime.fromtimestamp(self.__token_info['created_at']) +
                   datetime.timedelta(seconds=self.__token_info['expires_in']))
    self.expiry_time = expiry_time.strftime('%Y-%m-%dT%H:%M:%SZ')
    if datetime.datetime.now() > expiry_time:
        return False
    return True

get_bearer_token()

Obtain a brand new bearer token

Source code in src/dryad2dataverse/auth.py
def get_bearer_token(self):
    '''
    Obtain a brand new bearer token
    '''
    try:
        tokenr = requests.post(f"{self.kwargs['dry_url']}{self.path}",
                               headers=self.headers,
                               data=self.data,
                               timeout=self.timeout)
        tokenr.raise_for_status()
        self.__token_info = tokenr.json()

    except (requests.exceptions.HTTPError,
            requests.exceptions.RequestException) as err:
        LOGGER.exception('HTTP Error:, %s', err)
        raise err

dryad2dataverse.serializer

Serializes Dryad study JSON to Dataverse JSON, as well as producing associated file information.
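
Besides dvJson, the Serializer exposes the study's file listing as plain Python tuples, which is handy for scripting. A short sketch:

>>> import dryad2dataverse.serializer
>>> ser = dryad2dataverse.serializer.Serializer('doi:10.5061/dryad.2rbnzs7jp')
>>> for url, name, mimetype, size, description, digest_type, digest in ser.files:
...     print(name, mimetype, size)
>>> ser.id       # Dryad's internal database ID for this version
>>> ser.embargo  # True if the study is embargoed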

Serializer

Serializes Dryad JSON to Dataverse JSON

Source code in src/dryad2dataverse/serializer.py
class Serializer():
    '''
    Serializes Dryad JSON to Dataverse JSON
    '''
    CC0='''<p>
    <img src="https://licensebuttons.net/p/zero/1.0/88x31.png" title="Creative Commons CC0 1.0 Universal Public Domain Dedication. " style="display:none" onload="this.style.display='inline'" />
    <a href="http://creativecommons.org/publicdomain/zero/1.0" title="Creative Commons CC0 1.0 Universal Public Domain Dedication. " target="_blank">CC0 1.0</a>
    </p>'''

    def __init__(self, doi:str, **kwargs):
        '''
        Creates Dryad study metadata instance.

        Parameters
        ----------
        doi : str
            DOI of Dryad study. Required for downloading.
            eg: 'doi:10.5061/dryad.2rbnzs7jp'

        kwargs : dict
            Other keyword parameters

        Other parameters
        ----------------
        token : dryad2dataverse.auth.Token
            If present, will use authenticated API

        Notes
        -----
        Unpacking a dryad2dataverse.config.Config instance holding
        global setup should give all of the
        required kwargs. ie, Serializer(doi, **config_instance)

        '''
        self.doi = doi
        self.kwargs = kwargs
        self.kwargs['dry_url'] = kwargs.get('dry_url', 'https://datadryad.org')
        self.kwargs['api_path'] = kwargs.get('api_path', '/api/v2')
        self.kwargs['max_upload'] = kwargs.get('max_upload', 3221225472)
        self.kwargs['dv_contact_name'] = kwargs.get('dv_contact_name')
        self.kwargs['dv_contact_email'] = kwargs.get('dv_contact_email')
        if self.kwargs.get('token'):
            if not isinstance(self.kwargs['token'],dryad2dataverse.auth.Token):
                raise ValueError('Token must be a dryad2dataverse.auth.Token instance')
        #Don't need timeout if have RETRY_STRATEGY
        self.kwargs['timeout'] = kwargs.get('timeout', 100)
        self._dryadJson = None
        self._fileJson = None
        self._dvJson = None
        #Serializer objects will be assigned a Dataverse study PID
        #if dryad2Dataverse.transfer.Transfer() is instantiated
        self.dvpid = None
        self.session = requests.Session()
        self.session.mount('https://',
                           HTTPAdapter(max_retries=config.RETRY_STRATEGY))
        LOGGER.debug('Creating Serializer instance object')

    def fetch_record(self, url=None) :
        '''
        Fetches Dryad study record JSON from Dryad V2 API at
        https://datadryad.org/api/v2/datasets/.
        Saves to self._dryadJson. Querying Serializer.dryadJson
        will call this function automatically.

        Parameters
        ----------
        url : str
            Dryad instance base URL (eg: 'https://datadryad.org').
        '''
        if not url:
            url = self.kwargs['dry_url']
        try:
            headers = config.Config.update_headers(**self.kwargs)
            doiClean = urllib.parse.quote(self.doi, safe='')
            resp = self.session.get(f'{url}{self.kwargs["api_path"]}/datasets/{doiClean}',
                                    headers=headers, timeout=self.kwargs['timeout'])
            resp.raise_for_status()
            self._dryadJson = resp.json()
        except (requests.exceptions.HTTPError,
                requests.exceptions.ConnectionError) as err:
            LOGGER.error('URL error for: %s', url)
            LOGGER.exception(err)
            raise

    @property
    def id(self):
        '''
        Returns Dryad unique *database* ID, not the DOI.

        Where the original Dryad JSON is dryadJson, it's the integer
        trailing portion of:

        `self.dryadJson['_links']['stash:version']['href']`
        '''
        href = self.dryadJson['_links']['stash:version']['href']
        index = href.rfind('/') + 1
        return int(href[index:])

    @property
    def dryadJson(self):
        '''
        Returns Dryad study JSON. Will call Serializer.fetch_record() if
        no JSON is present.
        '''
        if not self._dryadJson:
            self.fetch_record()
        return self._dryadJson

    @dryadJson.setter
    def dryadJson(self, value=None):
        '''
        Fetches Dryad JSON from Dryad website if not supplied.

        If supplying it, make sure it's correct or you will run into trouble
        with processing later.

        Parameters
        ----------
        value : dict
            Dryad JSON.
        '''
        if value:
            self._dryadJson = value
        else:
            self.fetch_record()

    @property
    def embargo(self)->bool:
        '''
        Check embargo status. Returns boolean True if embargoed.

        '''
        if self.dryadJson.get('curationStatus') == 'Embargoed':
            return True
        return False

    @property
    def dvJson(self):
        '''
        Returns Dataverse study JSON as dict.
        '''
        self._assemble_json()
        return self._dvJson

    @property
    def fileJson(self):
        '''
        Returns a list of file JSONs from a call to the Dryad API
        /versions/{id}/files, where the ID is parsed from the Dryad JSON.
        Dryad file listings are paginated, so the return consists of a
        list of dicts, one per page.
        '''
        if not self._fileJson:
            try:
                self._fileJson = []
                headers = config.Config.update_headers(**self.kwargs)
                fileList = self.session.get(f'{self.kwargs["dry_url"]}'
                                            f'{self.kwargs["api_path"]}/versions/{self.id}/files',
                                            headers=headers,
                                            timeout=self.kwargs['timeout'])
                fileList.raise_for_status()
                #total = fileList.json()['total'] #Not needed
                lastPage = fileList.json()['_links']['last']['href']
                pages = int(lastPage[lastPage.rfind('=')+1:])
                self._fileJson.append(fileList.json())
                for i in range(2, pages+1):
                    fileCont = self.session.get(f'{self.kwargs["dry_url"]}{self.kwargs["api_path"]}'
                                                f'/versions/{self.id}/files?page={i}',
                                                headers=headers,
                                                timeout=self.kwargs['timeout'])
                    fileCont.raise_for_status()
                    self._fileJson.append(fileCont.json())
            except Exception as e:
                LOGGER.exception(e)
                raise
        return self._fileJson

    @property
    def files(self)->list:
        '''
        Returns a list of tuples with:

        (Download_location, filename, mimetype, size, description,
         digestType, digest)

        Digest types include, but are not necessarily limited to:

        'adler-32','crc-32','md2','md5','sha-1','sha-256',
        'sha-384','sha-512'
        '''
        out = []
        for page in self.fileJson:
            files = page['_embedded'].get('stash:files')
            if files:
                for f in files:
                    #This broke with this commit:
                    # https://github.com/datadryad/dryad-app/commit/b8a333ba34b14e55cbc1d7ed5aa4451e0f41db66

                    #downLink = f['_links']['stash:file-download']['href']
                    downLink = f['_links']['stash:download']['href']
                    downLink = f'{self.kwargs["dry_url"]}{downLink}'
                    name = f['path']
                    mimeType = f['mimeType']
                    size = f['size']
                    #HOW ABOUT PUTTING THIS IN THE DRYAD API PAGE?
                    descr = f.get('description', '')
                    digestType = f.get('digestType', '')
                    #not all files have a digest
                    digest = f.get('digest', '')
                    #Does it matter? If the primary use case is to
                    #compare why not take all the digest types.
                    #md5 = ''
                    #if digestType == 'md5' and digest:
                    #    md5 = digest
                    #    #nothing in the docs as to algorithms so just picking md5
                    #    #Email from Ryan Scherle 30 Nov 20: supported digest type
                    #    #('adler-32','crc-32','md2','md5','sha-1','sha-256',
                    #    #'sha-384','sha-512')
                    out.append((downLink, name, mimeType, size, descr, digestType,
                                digest))

        return out

    @property
    def oversize(self):
        '''
        Returns a list of Dryad files whose size value
        exceeds maxsize. Maximum size defaults to
        dryad2dataverse.config.MAX_UPLOAD
        '''
        maxsize = self.kwargs['max_upload']
        toobig = []
        for f in self.files:
            if f[3] >= maxsize:
                toobig.append(f)
        return toobig

    #def_typeclass(self, typeName, multiple, typeClass):
    @staticmethod
    def _typeclass(typeName, multiple, typeClass):
        '''
        Creates wrapper around single or multiple Dataverse JSON objects.
        Returns a dict *without* the  Dataverse 'value' key'.

        Parameters
        ----------
        typeName : str
            Dataverse typeName (eg: 'author').

        multiple : boolean
            "Multiple" value in Dataverse JSON.

        typeClass : str
            Dataverse typeClass. Usually one of 'compound', 'primitive,
              'controlledVocabulary').
        '''
        return {'typeName':typeName, 'multiple':multiple,
                'typeClass':typeClass}

    @staticmethod
    def _convert_generic(**kwargs):
        '''
        Generic dataverse json segment creator of form:
            ```
            {dvField:
                {'typeName': dvField,
                  'value': dryField}
            ```
        Suitable for generalized conversions. Only provides fields with
        multiple: False and typeclass:Primitive

        Parameters
        ----------
        kwargs : dict
            Dict from Dataverse JSON segment

        Other parameters
        ----------------
        dvField : str
            Dataverse output field

        dryField : str
            Dryad JSON field to convert

        inJson : dict
            Dryad JSON **segment** to convert

        addJSON : dict (optional)
        	any other JSON required to complete (cf ISNI)

        rType : str
        	'dict' (default) or 'list'.
            Returns 'value' field as dict value or list.

        pNotes : str
            Notes to be prepended to list type values.
            No trailing space required.
        '''

        dvField = kwargs.get('dvField')
        dryField = kwargs.get('dryField')
        inJson = kwargs.get('inJson')
        addJson = kwargs.get('addJson')
        pNotes = kwargs.get('pNotes', '')
        rType = kwargs.get('rType', 'dict')
        if not dvField or not dryField or not inJson:
            try:
                raise ValueError('Incorrect or insufficient fields provided')
            except ValueError as e:
                LOGGER.exception(e)
                raise
        outfield = inJson.get(dryField)
        if outfield:
            outfield = outfield.strip()
        #if not outfield:
        #    raise ValueError(f'Dryad field {dryField} not found')
        # If value missing can still concat empty dict
        if not outfield:
            return {}
        if rType == 'list':
            if pNotes:
                outfield = [f'{pNotes} {outfield}']

        outJson = {dvField:{'typeName':dvField,
                            'multiple': False,
                            'typeClass':'primitive',
                            'value': outfield}}
        #Simple conversion
        if not addJson:
            return outJson

        #Add JSONs together
        addJson.update(outJson)
        return addJson

    @staticmethod
    def _convert_author_names(author):
        '''
        Produces required author json fields.
        This is a special case, requiring concatenation of several fields.

        Parameters
        ----------
        author : dict
        	dryad['author'] JSON segment.
        '''
        first = author.get('firstName')
        last = author.get('lastName')
        if first + last is None:
            return None
        authname = f"{author.get('lastName','')}, {author.get('firstName', '')}"
        return {'authorName':
                {'typeName':'authorName', 'value': authname,
                 'multiple':False, 'typeClass':'primitive'}}

    @staticmethod
    def _convert_keywords(*args):
        '''
        Produces the insane keyword structure Dataverse JSON segment
        from a list of words.

        Parameters
        ----------
        args : list
            List with elements as strings.
        	Generally input is Dryad JSON 'keywords', ie *Dryad['keywords'].
            Don't forget to expand the list using *.
        '''
        outlist = []
        for arg in args:
            outlist.append({'keywordValue': {
                'typeName':'keywordValue',
                'value': arg}})
        return outlist

    @staticmethod
    def _convert_notes(dryJson):
        '''
        Returns formatted notes field with Dryad JSON values that
        don't really fit anywhere into the Dataverse JSON.

        Parameters
        ----------
        dryJson : dict
        	Dryad JSON as dict.
        '''
        notes = ''
        #these fields should be concatenated into notes
        notable = ['versionNumber',
                   'versionStatus',
                   'manuscriptNumber',
                   'curationStatus',
                   'preserveCurationStatus',
                   'invoiceId',
                   'sharingLink',
                   'loosenValidation',
                   'skipDataciteUpdate',
                   'storageSize',
                   'visibility',
                   'skipEmails']
        lookup = {'versionNumber': '<b>Dryad version number:</b>',
                  'versionStatus': '<b>Version status:</b>',
                  'manuscriptNumber': '<b>Manuscript number:</b>',
                  'curationStatus': '<b>Dryad curation status:</b>',
                  'preserveCurationStatus': '<b>Dryad preserve curation status:</b>',
                  'invoiceId': '<b>Invoice ID:</b>',
                  'sharingLink': '<b>Sharing link:</b>',
                  'loosenValidation': '<b>Loosen validation:</b>',
                  'skipDataciteUpdate': '<b>Skip Datacite update:</b>',
                  'storageSize': '<b>Storage size:</b>',
                  'visibility': '<b>Visibility:</b>',
                  'skipEmails': '<b>Skip emails:</b>'}
        for note in notable:
            text = dryJson.get(note)
            if text:
                text = str(text).strip()
                if note in lookup:
                    text = f'{lookup.get(note)} {text}'

                notes += f'<p>{text}</p>\n'
        concat = {'typeName':'notesText',
                  'multiple':False,
                  'typeClass': 'primitive',
                  'value': notes}
        return concat

    @staticmethod
    def _boundingbox(north, south, east, west):
        '''
        Makes a Dataverse bounding box from appropriate coordinates.
        Returns Dataverse JSON segment as dict.

        Parameters
        ----------
        north : float
        south : float
        east : float
        west : float

        Notes
        -----
            Coordinates in decimal degrees.
        '''
        names = ['north', 'south', 'east', 'west']
        points = [str(x) for x in [north, south, east, west]]
        #Because coordinates in DV are strings BFY
        coords = [(x[0]+'Longitude', {x[0]:x[1]}) for x in zip(names, points)]
        #Yes, everything is longitude in Dataverse
        out = []
        for coord in coords:
            out.append(Serializer._convert_generic(inJson=coord[1],
                                                   dvField=coord[0],
                                                   #dryField='north'))
                                                   #dryField=[k for k in coord[1].keys()][0]))
                                                   dryField=list(coord[1].keys())[0]))
        return out

    @staticmethod
    def _convert_geospatial(dryJson):
        '''
        Outputs Dataverse geospatial metadata block.

        Parameters
        ----------
        dryJson : dict
        	Dryad json as dict.
        '''
        if dryJson.get('locations'):
            #out = {}
            coverage = []
            box = []
            otherCov = None
            gbbox = None
            for loc in dryJson.get('locations'):
                if loc.get('place'):
                    #These are impossible to clean. Going to "other" field

                    other = Serializer._convert_generic(inJson=loc,
                                                        dvField='otherGeographicCoverage',
                                                        dryField='place')
                    coverage.append(other)


                if loc.get('point'):
                    #makes size zero bounding box
                    north = loc['point']['latitude']
                    south = north
                    east = loc['point']['longitude']
                    west = east
                    point = Serializer._boundingbox(north, south, east, west)
                    box.append(point)

                if loc.get('box'):
                    north = loc['box']['neLatitude']
                    south = loc['box']['swLatitude']
                    east = loc['box']['neLongitude']
                    west = loc['box']['swLongitude']
                    area = Serializer._boundingbox(north, south, east, west)
                    box.append(area)

            if coverage:
                otherCov = Serializer._typeclass(typeName='geographicCoverage',
                                                 multiple=True, typeClass='compound')
                otherCov['value'] = coverage

            if box:
                gbbox = Serializer._typeclass(typeName='geographicCoverage',
                                              multiple=True, typeClass='compound')
                gbbox['value'] = box

            if otherCov or gbbox:
                gblock = {'geospatial': {'displayName' : 'Geospatial Metadata',
                                         'fields': []}}
                if otherCov:
                    gblock['geospatial']['fields'].append(otherCov)
                if gbbox:
                    gblock['geospatial']['fields'].append(gbbox)
            return gblock
        return {}

    def _assemble_json(self, dryJson=None, dvContact=None,
                       dvEmail=None, defContact=True):
        #pylint: disable = too-many-statements, too-many-locals, too-many-branches
        '''

        Assembles Dataverse json from Dryad JSON components.
        Dataverse JSON is a nightmare, so this function is too.

        Parameters
        ----------
        dryJson : dict
        	Dryad json as dict.

        dvContact : str
        	Default Dataverse contact name.

        dvEmail : str
        	Default Dataverse 4 contact email address.

        defContact : boolean
        	Flag to include default contact information with record.
        '''
        if not dvContact:
            dvContact = self.kwargs['dv_contact_name']
        if not dvEmail:
            dvEmail = self.kwargs['dv_contact_email']
        if not dryJson:
            dryJson = self.dryadJson
        LOGGER.debug(dryJson)
        #Licence block changes ensure that it will only work with
        #Dataverse v5.10+
        #Go back to previous commits to see the earlier "standard"
        self._dvJson = {'datasetVersion':
                        {'license':{'name': 'CC0 1.0',
                                    'uri': 'http://creativecommons.org/publicdomain/zero/1.0' },
                         'termsOfUse': Serializer.CC0,
                         'metadataBlocks':{'citation':
                                           {'displayName': 'Citation Metadata',
                                            'fields': []},
                                           }
                         }
                        }
        #REQUIRED Dataverse fields

        #Dryad is a general purpose database; it is hard/impossible to get
        #Dataverse required subject tags out of their keywords, so:
        defaultSubj = {'typeName' : 'subject',
                       'typeClass':'controlledVocabulary',
                       'multiple': True,
                       'value' : ['Other']}
        self._dvJson['datasetVersion']['metadataBlocks']['citation']['fields'].append(defaultSubj)

        reqdTitle = Serializer._convert_generic(inJson=dryJson,
                                                dryField='title',
                                                dvField='title')['title']

        self._dvJson['datasetVersion']['metadataBlocks']['citation']['fields'].append(reqdTitle)

        #authors
        out = []
        for a in dryJson['authors']:
            reqdAuthor = Serializer._convert_author_names(a)
            if reqdAuthor:
                affiliation = Serializer._convert_generic(inJson=a,
                                                          dvField='authorAffiliation',
                                                          dryField='affiliation')
                addOrc = {'authorIdentifierScheme':
                          {'typeName':'authorIdentifierScheme',
                           'value': 'ORCID',
                           'typeClass': 'controlledVocabulary',
                           'multiple':False}}
                #only ORCID at UBC
                orcid = Serializer._convert_generic(inJson=a,
                                                    dvField='authorIdentifier',
                                                    dryField='orcid',
                                                    addJson=addOrc)
                if affiliation:
                    reqdAuthor.update(affiliation)
                if orcid:
                    reqdAuthor.update(orcid)
                out.append(reqdAuthor)

        authors = Serializer._typeclass(typeName='author',
                                        multiple=True, typeClass='compound')
        authors['value'] = out

        self._dvJson['datasetVersion']['metadataBlocks']['citation']['fields'].append(authors)


        ##rewrite as function:contact
        out = []
        for e in dryJson['authors']:
            reqdContact = Serializer._convert_generic(inJson=e,
                                                      dvField='datasetContactEmail',
                                                      dryField='email')
            if reqdContact:
                author = Serializer._convert_author_names(e)
                author = {'author':author['authorName']['value']}
                #for passing to function
                author = Serializer._convert_generic(inJson=author,
                                                     dvField='datasetContactName',
                                                     dryField='author')
                if author:
                    reqdContact.update(author)
                affiliation = Serializer._convert_generic(inJson=e,
                                                          dvField='datasetContactAffiliation',
                                                          dryField='affiliation')
                if affiliation:
                    reqdContact.update(affiliation)
                out.append(reqdContact)

        if defContact:
            #Adds default contact information the tail of the list
            defEmail = Serializer._convert_generic(inJson={'em':dvEmail},
                                                   dvField='datasetContactEmail',
                                                   dryField='em')
            defName = Serializer._convert_generic(inJson={'name':dvContact},
                                                  dvField='datasetContactName',
                                                  dryField='name')
            defEmail.update(defName)
            out.append(defEmail)

        contacts = Serializer._typeclass(typeName='datasetContact',
                                         multiple=True, typeClass='compound')
        contacts['value'] = out
        self._dvJson['datasetVersion']['metadataBlocks']['citation']['fields'].append(contacts)

        #Description
        description = Serializer._typeclass(typeName='dsDescription',
                                            multiple=True, typeClass='compound')
        desCat = [('abstract', '<b>Abstract</b><br/>'),
                  ('methods', '<b>Methods</b><br />'),
                  ('usageNotes', '<b>Usage notes</b><br />')]
        out = []
        for desc in desCat:
            if dryJson.get(desc[0]):
                descrField = Serializer._convert_generic(inJson=dryJson,
                                                         dvField='dsDescriptionValue',
                                                         dryField=desc[0])
                descrField['dsDescriptionValue']['value'] = (desc[1]
                                                             + descrField['dsDescriptionValue']['value'])

                descDate = Serializer._convert_generic(inJson=dryJson,
                                                       dvField='dsDescriptionDate',
                                                       dryField='lastModificationDate')
                descrField.update(descDate)
                out.append(descrField)

        description['value'] = out
        self._dvJson['datasetVersion']['metadataBlocks']['citation']['fields'].append(description)

        #Granting agencies
        if dryJson.get('funders'):

            out = []
            for fund in dryJson['funders']:
                org = Serializer._convert_generic(inJson=fund,
                                                  dvField='grantNumberAgency',
                                                  dryField='organization')
                if fund.get('awardNumber'):
                    fund = Serializer._convert_generic(inJson=fund,
                                                       dvField='grantNumberValue',
                                                       dryField='awardNumber')
                    org.update(fund)
                out.append(org)
            grants = Serializer._typeclass(typeName='grantNumber',
                                           multiple=True, typeClass='compound')
            grants['value'] = out
            self._dvJson['datasetVersion']['metadataBlocks']['citation']['fields'].append(grants)

        #Keywords
        keywords = Serializer._typeclass(typeName='keyword',
                                         multiple=True, typeClass='compound')
        out = []
        for key in dryJson.get('keywords', []):
            #Apparently keywords are not required
            keydict = {'keyword':key}
            #because takes a dict
            kv = Serializer._convert_generic(inJson=keydict,
                                             dvField='keywordValue',
                                             dryField='keyword')
            vocab = {'dryad':'Dryad'}
            voc = Serializer._convert_generic(inJson=vocab,
                                              dvField='keywordVocabulary',
                                              dryField='dryad')
            kv.update(voc)
            out.append(kv)
        keywords['value'] = out
        self._dvJson['datasetVersion']['metadataBlocks']['citation']['fields'].append(keywords)

        #modification date
        moddate = Serializer._convert_generic(inJson=dryJson,
                                              dvField='dateOfDeposit',
                                              dryField='lastModificationDate')
        self._dvJson['datasetVersion']['metadataBlocks']['citation']['fields'].append(moddate['dateOfDeposit'])
        #This one isn't nested BFY

        #distribution date
        distdate = Serializer._convert_generic(inJson=dryJson,
                                               dvField='distributionDate',
                                               dryField='publicationDate')
        self._dvJson['datasetVersion']['metadataBlocks']['citation']['fields'].append(distdate['distributionDate'])
        #Also not nested

        #publications
        publications = Serializer._typeclass(typeName='publication',
                                             multiple=True,
                                             typeClass='compound')
        #quick and dirty lookup table
        #TODONE see https://github.com/CDL-Dryad/dryad-app/blob/
        #31d17d8dab7ea3bab1256063a1e4d0cb706dd5ec/stash/stash_datacite/
        #app/models/stash_datacite/related_identifier.rb
        #no longer required
        #lookup = {'IsDerivedFrom':'Is derived from',
        #          'Cites':'Cites',
        #          'IsSupplementTo': 'Is supplement to',
        #          'IsSupplementedBy': 'Is supplemented by'}
        out = []
        if dryJson.get('relatedWorks'):
            for r in dryJson.get('relatedWorks'):
                #id = r.get('identifier')
                #TODONE Verify that changing id to _id has not broken anything: 11Feb21
                _id = r.get('identifier')
                #Note:10 Feb 2021 : some records have identifier = ''. BAD DRYAD.
                if not _id:
                    continue
                relationship = r.get('relationship')
                #idType = r.get('identifierType') #not required in _convert_generic
                #citation = {'citation': f"{lookup[relationship]}: {id}"}
                citation = {'citation': relationship.capitalize()}
                pubcite = Serializer._convert_generic(inJson=citation,
                                                      dvField='publicationCitation',
                                                      dryField='citation')
                pubIdType = Serializer._convert_generic(inJson=r,
                                                        dvField='publicationIDType',
                                                        dryField='identifierType')
                #ID type must be lower case
                pubIdType['publicationIDType']['value'] = pubIdType['publicationIDType']['value'].lower()
                pubIdType['publicationIDType']['typeClass'] = 'controlledVocabulary'

                pubUrl = Serializer._convert_generic(inJson=r,
                                                     dvField='publicationURL',
                                                     dryField='identifier')

                #Dryad doesn't just put URLs in their URL field.
                if pubUrl['publicationURL']['value'].lower().startswith('doi:'):
                    fixurl = 'https://doi.org/' + pubUrl['publicationURL']['value'][4:]
                    pubUrl['publicationURL']['value'] = fixurl
                    LOGGER.debug('Rewrote URLs to be %s', fixurl)

                #Dryad doesn't validate URL fields to start with http or https. Assume https
                if not pubUrl['publicationURL']['value'].lower().startswith('htt'):
                    pubUrl['publicationURL']['value'] = ('https://' +
                                                         pubUrl['publicationURL']['value'])
                pubcite.update(pubIdType)
                pubcite.update(pubUrl)
                out.append(pubcite)
        publications['value'] = out
        self._dvJson['datasetVersion']['metadataBlocks']['citation']['fields'].append(publications)
        #notes
        #go into primary notes field, not DDI
        self._dvJson['datasetVersion']['metadataBlocks']['citation']['fields'].append(Serializer._convert_notes(dryJson))

        #Geospatial metadata
        self._dvJson['datasetVersion']['metadataBlocks'].update(Serializer._convert_geospatial(dryJson))

        #DOI --> agency/identifier
        doi = Serializer._convert_generic(inJson=dryJson, dryField='identifier',
                                          dvField='otherIdValue')
        doi.update(Serializer._convert_generic(inJson={'agency':'Dryad'},
                                               dryField='agency',
                                               dvField='otherIdAgency'))
        agency = Serializer._typeclass(typeName='otherId',
                                       multiple=True, typeClass='compound')
        agency['value'] = [doi]
        self._dvJson['datasetVersion']['metadataBlocks']['citation']['fields'].append(agency)

dryadJson property writable

Returns Dryad study JSON. Will call Serializer.fetch_record() if no JSON is present.

dvJson property

Returns Dataverse study JSON as dict.

embargo property

Check embargo status. Returns boolean True if embargoed.

fileJson property

Returns a list of file JSONs from a call to the Dryad API /versions/{id}/files, where the ID is parsed from the Dryad JSON. Dryad file listings are paginated, so the return consists of a list of dicts, one per page.

files property

Returns a list of tuples with:

(Download_location, filename, mimetype, size, description, digestType, digest)

Digest types include, but are not necessarily limited to:

‘adler-32’,’crc-32’,’md2’,’md5’,’sha-1’,’sha-256’, ‘sha-384’,’sha-512’

id property

Returns Dryad unique database ID, not the DOI.

Where the original Dryad JSON is dryadJson, it’s the integer trailing portion of:

self.dryadJson['_links']['stash:version']['href']

oversize property

Returns a list of Dryad files whose size value exceeds maxsize. Maximum size defaults to dryad2dataverse.config.MAX_UPLOAD

__init__(doi, **kwargs)

Creates Dryad study metadata instance.

Parameters:
  • doi (str) –

    DOI of Dryad study. Required for downloading. eg: ‘doi:10.5061/dryad.2rbnzs7jp’

  • kwargs (dict, default: {} ) –

    Other keyword parameters

  • token (Token) –

    If present, will use authenticated API

Notes

Unpacking a dryad2dataverse.config.Config instance holding global setup should give all of the required kwargs. ie, Serializer(doi, **config_instance)

Source code in src/dryad2dataverse/serializer.py
def __init__(self, doi:str, **kwargs):
    '''
    Creates Dryad study metadata instance.

    Parameters
    ----------
    doi : str
        DOI of Dryad study. Required for downloading.
        eg: 'doi:10.5061/dryad.2rbnzs7jp'

    kwargs : dict
        Other keyword parameters

    Other parameters
    ----------------
    token : dryad2dataverse.auth.Token
        If present, will use authenticated API

    Notes
    -----
    Unpacking a dryad2dataverse.config.Config instance holding
    global setup should give all of the
    required kwargs. ie, Serializer(doi, **config_instance)

    '''
    self.doi = doi
    self.kwargs = kwargs
    self.kwargs['dry_url'] = kwargs.get('dry_url', 'https://datadryad.org')
    self.kwargs['api_path'] = kwargs.get('api_path', '/api/v2')
    self.kwargs['max_upload'] = kwargs.get('max_upload', 3221225472)
    self.kwargs['dv_contact_name'] = kwargs.get('dv_contact_name')
    self.kwargs['dv_contact_email'] = kwargs.get('dv_contact_email')
    if self.kwargs.get('token'):
        if not isinstance(self.kwargs['token'],dryad2dataverse.auth.Token):
            raise ValueError('Token must be a dryad2dataverse.auth.Token instance')
    #Don't need timeout if have RETRY_STRATEGY
    self.kwargs['timeout'] = kwargs.get('timeout', 100)
    self._dryadJson = None
    self._fileJson = None
    self._dvJson = None
    #Serializer objects will be assigned a Dataverse study PID
    #if dryad2Dataverse.transfer.Transfer() is instantiated
    self.dvpid = None
    self.session = requests.Session()
    self.session.mount('https://',
                       HTTPAdapter(max_retries=config.RETRY_STRATEGY))
    LOGGER.debug('Creating Serializer instance object')

fetch_record(url=None)

Fetches Dryad study record JSON from Dryad V2 API at https://datadryad.org/api/v2/datasets/. Saves to self._dryadJson. Querying Serializer.dryadJson will call this function automatically.

Parameters:
  • url (str, default: None ) –

    Dryad instance base URL (eg: ‘https://datadryad.org’).

Source code in src/dryad2dataverse/serializer.py
def fetch_record(self, url=None) :
    '''
    Fetches Dryad study record JSON from Dryad V2 API at
    https://datadryad.org/api/v2/datasets/.
    Saves to self._dryadJson. Querying Serializer.dryadJson
    will call this function automatically.

    Parameters
    ----------
    url : str
        Dryad instance base URL (eg: 'https://datadryad.org').
    '''
    if not url:
        url = self.kwargs['dry_url']
    try:
        headers = config.Config.update_headers(**self.kwargs)
        doiClean = urllib.parse.quote(self.doi, safe='')
        resp = self.session.get(f'{url}{self.kwargs["api_path"]}/datasets/{doiClean}',
                                headers=headers, timeout=self.kwargs['timeout'])
        resp.raise_for_status()
        self._dryadJson = resp.json()
    except (requests.exceptions.HTTPError,
            requests.exceptions.ConnectionError) as err:
        LOGGER.error('URL error for: %s', url)
        LOGGER.exception(err)
        raise

dryad2dataverse.transfer

This module handles data downloads and uploads from a Dryad instance to a Dataverse instance.
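
A compressed view of the workflow from the basic usage section, with an API key check up front (a sketch; ser and config are the Serializer and Config instances from above):

>>> import dryad2dataverse.transfer
>>> dv = dryad2dataverse.transfer.Transfer(ser, **config)
>>> dv.test_api_key()   # raises DataverseBadApiKeyError if the key has expired
>>> dv.download_files()
>>> dv.upload_study(targetDv='dryad', **config)
>>> dv.upload_files(**config)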

Transfer

Transfers metadata and data files from a Dryad installation to Dataverse installation.

Source code in src/dryad2dataverse/transfer.py
class Transfer():
    '''
    Transfers metadata and data files from a
    Dryad installation to Dataverse installation.
    '''
    #pylint: disable=too-many-instance-attributes
    def __init__(self, dryad, **kwargs):
        '''
        Creates a dryad2dataverse.transfer.Transfer instance.

        Parameters
        ----------
        dryad : dryad2dataverse.serializer.Serializer

        **kwargs
            Normally this would be an unpacked dryad2dataverse.config.Config instance

        Notes
        -----
        Minimum kwargs for function:
        max_upload : int
            Maximum size in bytes
        tempfile_location : str
            Path to temporary directory
        dv_url : str
            Base URL of dataverse instance
        api_key : str
            API key for Dataverse user
        dv_contact_email : str
            Contact email address for Dataverse record
        dv_contact_name : str
            Contact name
        target : str
            Target collection short name
        '''
        self.kwargs = kwargs
        self.dryad = dryad
        self._fileJson = None
        self._files = [list(f) for f in self.dryad.files]
        #self._files = copy.deepcopy(self.dryad.files)
        self.fileUpRecord = []
        self.fileDelRecord = []
        self.dvStudy = None
        self.jsonFlag = None #Whether or not new json uploaded
        self.session = requests.Session()
        self.session.mount('https://', HTTPAdapter(max_retries=config.RETRY_STRATEGY))
        self.check_kwargs()

    def check_kwargs(self):
        '''
        Verify sufficient information
        '''
        required = ['max_upload',
                    'tempfile_location',
                    'dv_url',
                    'api_key',
                    'dv_contact_email',
                    'dv_contact_name',
                    'target']
        keys = self.kwargs.keys()
        for val in required:
            if val not in keys:
                try:
                    raise exceptions.Dryad2DataverseError(f'Required parameter missing: {val}')
                except exceptions.Dryad2DataverseError as err:
                    LOGGER.exception(err)
                    raise

    def __del__(self):
        '''Expunges files from temporary file on deletion'''
        tmp = pathlib.Path(self.kwargs['tempfile_location']).expanduser().absolute()
        for f in self.files:
            if pathlib.Path(tmp, f[1]).exists():
                os.remove(pathlib.Path(tmp, f[1]))

    def test_api_key(self):
        '''
        Tests for an expired API key and raises
        dryad2dataverse.exceptions.DataverseBadApiKeyError
        if the API key is bad. Ignores other HTTP errors.
        '''
        #API validity check appears to come before a PID validity check
        params = {'persistentId': 'doi:000/000/000'} # PID is irrelevant
        headers = {'X-Dataverse-key': self.kwargs['api_key'], 'User-agent': USERAGENT}
        bad_test = self.session.get(f'{self.kwargs["dv_url"]}/api/datasets/:persistentId',
                                headers=headers,
                                params=params)
        #There's an extra space in the message which Harvard
        #will probably find out about, so . . .
        if bad_test.json().get('message').startswith('Bad api key'):
            try:
                raise exceptions.DataverseBadApiKeyError('Bad API key')
            except exceptions.DataverseBadApiKeyError as err:
                LOGGER.exception(err)
                raise
        try: #other errors
            bad_test.raise_for_status()
        except requests.exceptions.HTTPError:
            pass
        except Exception as e:
            LOGGER.exception(e)
            LOGGER.exception(traceback.format_exc())
            raise

    @property
    def dvpid(self):
        '''
        Returns Dataverse study persistent ID as str.
        '''
        return self.dryad.dvpid

    @property
    def auth(self):
        '''
        Returns Dataverse authentication header dict,
        i.e. `{'X-Dataverse-key': 'APIKEYSTRING'}`
        '''
        return {'X-Dataverse-key' : self.kwargs['api_key']}

    @property
    def fileJson(self):
        '''
        Returns a list of file JSONs from call to Dryad API /files/{id},
        where the ID is parsed from the Dryad JSON. Dryad file listings
        are paginated.
        '''
        return self.dryad.fileJson.copy()

    @property
    def files(self):
        '''
        Returns a list of lists with:

        [Download_location, filename, mimetype, size, description, md5digest]

        This is mutable; downloading a file will add md5 info if not available.
        '''
        return self._files

    @property
    def oversize(self):
        '''
        Returns list of files exceeding the Dataverse ingest limit
        given by the max_upload setting.
        '''
        return self.dryad.oversize

    @property
    def doi(self):
        '''
        Returns Dryad DOI.
        '''
        return self.dryad.doi

    @staticmethod
    def _dryad_file_id(url:str):
        '''
        Returns Dryad fileID from dryad file download URL as integer.

        Parameters
        ----------
        url : str
            Dryad file URL in format
            'https://datadryad.org/api/v2/files/385820/download'.
        '''
        fid = url.strip('/download')
        fid = int(fid[fid.rfind('/')+1:])
        return fid

    @staticmethod
    def _make_dv_head(apikey):
        '''
        Returns Dataverse authentication header as dict.

        Parameters
        ----------
        apikey : str
            Dataverse API key.
        '''
        return {'X-Dataverse-key' : apikey}

    def set_correct_date(self, hdl=None,
                         d_type='distributionDate'):
        '''
        Sets "correct" publication date for Dataverse.

        Parameters
        ----------
        hdl : str
            Persistent identifier for Dataverse study.
            Defaults to Transfer.dvpid (which can be None if the
            study has not yet been uploaded).
        d_type : str
            Date type. One of 'distributionDate', 'productionDate',
            'dateOfDeposit'. Default 'distributionDate'.

        Notes
        -----
        self.kwargs are normally read from dryad2dataverse.config.Config
        instances.

        dryad2dataverse.serializer maps Dryad 'publicationDate'
        to Dataverse 'distributionDate' (see serializer.py ~line 675).

        Dataverse citation date default is ":publicationDate". See
        Dataverse API reference:
        <https://guides.dataverse.org/en/4.20/api/native-api.html#id54>.

        '''
        try:
            if not hdl:
                hdl = self.dvpid
            headers ={'X-Dataverse-key': self.kwargs['api_key'],
                      'User-agent': USERAGENT}
            params = {'persistentId': hdl}
            set_date = self.session.put(f'{self.kwargs["dv_url"]}/api/'
                                        'datasets/:persistentId/citationdate',
                                        headers=headers,
                                        data=d_type,
                                        params=params)
            set_date.raise_for_status()

        except (requests.exceptions.HTTPError,
                requests.exceptions.ConnectionError) as err:
            LOGGER.warning('Unable to set citation date for %s',
                           hdl)
            LOGGER.warning(err)
            LOGGER.warning(set_date.text)

    def upload_study(self, **kwargs):
        '''
        Uploads Dryad study metadata to target Dataverse or updates existing.
        Supplying a `targetDv` kwarg creates a new study and supplying a
        `dvpid` kwarg updates a currently existing Dataverse study.

        **kwargs : dict
            Normally this is one of the two parameters below

        Other parameters
        ----------------
        targetDv : str
            Short name of target dataverse. Required if new dataset.
            Specify as targetDv=value.
        dvpid : str
            Dataverse persistent ID (for updating metadata).
            This is not required for new uploads, specify as dvpid=value

        Notes
        -----
        One of targetDv or dvpid is required.
        '''
        headers = {'X-Dataverse-key': self.kwargs['api_key'], 'User-agent': USERAGENT}
        targetDv = kwargs.get('targetDv')
        dvpid = kwargs.get('dvpid')
        #dryFid = kwargs.get('dryFid') #Why did I put this here?
        if not targetDv and not dvpid:
            try:
                raise exceptions.NoTargetError('You must supply one of targetDv \
                                        (target dataverse) \
                                         or dvpid (Dataverse persistent ID)')
            except exceptions.NoTargetError as err:
                LOGGER.exception(err)
                raise

        if targetDv and dvpid:
            msg = 'Supply only one of targetDv or dvpid'
            LOGGER.exception(msg)
            raise exceptions.Dryad2DataverseError(msg)

        if not dvpid:
            endpoint = f'{self.kwargs["dv_url"]}/api/dataverses/{targetDv}/datasets'
            upload = self.session.post(endpoint,
                                       headers=headers,
                                       json=self.dryad.dvJson)
            LOGGER.debug(upload.text)
        else:
            endpoint = f'{self.kwargs["dv_url"]}/api/datasets/:persistentId/versions/:draft'
            params = {'persistentId':dvpid}
            #Yes, dataverse uses *different* json for edits
            upload = self.session.put(endpoint, params=params,
                                      headers=headers,
                                      json=self.dryad.dvJson['datasetVersion'])
            #self._dvrecord = upload.json()
            LOGGER.debug(upload.text)

        try:
            updata = upload.json()
            self.dvStudy = updata
            if updata.get('status') != 'OK':
                msg = ('Status return is not OK.'
                        f'{upload.status_code}: '
                        f'{upload.reason}. '
                        f'{upload.request.url} '
                        f'{upload.text}')
                try:
                    raise exceptions.DataverseUploadError(msg)
                except exceptions.DataverseUploadError as err:
                    LOGGER.exception(err)
                    LOGGER.exception(traceback.format_exc())
            upload.raise_for_status()
        except Exception as e: # Only accessible via non-requests exception
            LOGGER.exception(e)
            LOGGER.exception(traceback.format_exc())
            raise

        if targetDv:
            self.dryad.dvpid = updata['data'].get('persistentId')
        if dvpid:
            self.dryad.dvpid = updata['data'].get('datasetPersistentId')
        return self.dvpid

    @staticmethod
    def _check_md5(infile, dig_type):
        '''
        Returns the hex digest of a file (formerly just md5sum).

        Parameters
        ----------
        infile : str
            Complete path to target file.
        dig_type : Union[str, None]
            Digest type
        '''
        #From Ryan Scherle
        #When Dryad calculates a digest, it only uses MD5.
        #But if you have precomputed some other type of digest, we should accept it.
        #The list of allowed values is:
        #('adler-32','crc-32','md2','md5','sha-1','sha-256','sha-384','sha-512')
        #hashlib doesn't support adler-32, crc-32, md2

        blocksize = 2**16
        #Well, this is inelegant
        with open(infile, 'rb') as m:
            #fmd5 = hashlib.md5()
            ## var name kept for posterity. Maybe refactor
            if dig_type in ['sha-1', 'sha-256', 'sha-384', 'sha-512', 'md5', 'md2']:
                if dig_type == 'md2':
                    fmd5 = Crypto.Hash.MD2.new()
                else:
                    fmd5 = HASHTABLE[dig_type]()
                fblock = m.read(blocksize)
                while fblock:
                    fmd5.update(fblock)
                    fblock = m.read(blocksize)
                return fmd5.hexdigest()
            if dig_type in ['adler-32', 'crc-32']:
                fblock = m.read(blocksize)
                curvalue = HASHTABLE[dig_type](fblock)
                while fblock:
                    fblock = m.read(blocksize)
                    curvalue = HASHTABLE[dig_type](fblock, curvalue)
                return curvalue
        LOGGER.exception('Unable to determine hash type for %s: %s', infile, dig_type)
        raise exceptions.HashError(f'Unable to determine hash type for {infile}: {dig_type}')


    def download_file(self, url=None, filename=None,
                      size=None, chk=None, **kwargs):
        '''
        Downloads a file via requests streaming and saves to
        the defined temporary file directory.
        Returns the checksum on success and raises an exception on failure.

        Parameters
        ----------
        url : str
            URL of download.
        filename : str
            Output file name.
        size : int
            Reported file size in bytes.
        chk : str
            checksum of file (if available and known).
        kwargs : dict

        Other parameters
        ----------------
        digest_type : str
            checksum type (ie, md5, sha-256, etc)
        '''
        #pylint: disable=too-many-branches
        LOGGER.debug('Start download sequence')
        LOGGER.debug('MAX SIZE = %s', self.kwargs['max_upload'])
        LOGGER.debug('Filename: %s, size=%s', filename, size)
        tmp = pathlib.Path(self.kwargs['tempfile_location']).expanduser().absolute()
        if size:
            if size > self.kwargs['max_upload']:
                #TOO BIG
                LOGGER.warning('%s: File %s exceeds '
                               'Dataverse maximum upload size. Skipping download.',
                               self.doi, filename)
                md5 = 'this_file_is_too_big_to_upload__' #HA HA
                for i in self._files:
                    if url == i[0]:
                        i[-1] = md5
                LOGGER.debug('Stop download sequence with large file skip')
                return md5
        try:
            down = self.session.get(url, stream=True,
                                    headers=config.Config.update_headers(**self.kwargs))
            down.raise_for_status()
            with open(pathlib.Path(tmp,filename), 'wb') as fi:
                for chunk in down.iter_content(chunk_size=8192):
                    fi.write(chunk)

            #verify size
            #https://stackoverflow.com/questions/2104080/how-can-i-check-file-size-in-python'
            if size:
                checkSize = os.stat(pathlib.Path(tmp,filename)).st_size
                if checkSize != size:
                    try:
                        raise exceptions.DownloadSizeError('Download size does not '
                                                           'match reported size')
                    except exceptions.DownloadSizeError as e:
                        LOGGER.exception(e)
                        raise
            #now check the md5
            md5 = None
            if chk and kwargs.get('digest_type') in HASHTABLE:
                md5 = Transfer._check_md5(pathlib.Path(tmp,filename),
                                      kwargs['digest_type'])
                if md5 != chk:
                    try:
                        raise exceptions.HashError(f'Hex digest mismatch: {md5} : {chk}')
                        #is this really what I want to do on a bad checksum?
                    except exceptions.HashError as e:
                        LOGGER.exception(e)
                        raise
            for i in self._files:
                if url == i[0]:
                    i[-1] = md5
            LOGGER.debug('Complete download sequence')
            #This doesn't actually return an md5, just the hash value
            return md5
        except (requests.exceptions.HTTPError,
                requests.exceptions.ConnectionError) as err:
            LOGGER.critical('Unable to download %s', url)
            LOGGER.exception(err)
            raise
        except Exception as err:
            LOGGER.exception(err)
            raise

    def download_files(self, files=None):
        '''
        Bulk downloader for files.

        Parameters
        ----------
        files : list
            Items in list can be tuples or list with a minimum of:
            `(dryaddownloadurl, filenamewithoutpath, [md5sum])`
            The md5 sum should be the last member of the tuple.
            Defaults to self.files.

        Notes
        -----
        Normally used without arguments to download all the associated
        files with a Dryad study.
        '''
        if not files:
            files = self.files
        try:
            for f in files:
                self.download_file(url=f[0],
                                   filename=f[1],
                                   mimetype=f[2],
                                   size=f[3],
                                   descr=f[4],
                                   digest_type=f[5],
                                   chk=f[-1])
        except exceptions.DataverseDownloadError as e:
            LOGGER.exception('Unable to download file with info %s\n%s', f, e)
            raise

    def file_lock_check(self, study, count=0):
        '''
        Checks for a study lock

        Returns True if locked. Normally used to check
        if processing is completed. As tabular processing
        halts file ingest, there should be no locks on a
        Dataverse study before performing a data file upload.

        Parameters
        ----------
        study : str
            Persistent identifier of study.
        count : int
            Number of times the function has been called. Logs
            lock messages only on 0.
        '''
        headers = {'X-Dataverse-key': self.kwargs['api_key'], 'User-agent': USERAGENT}
        params = {'persistentId': study}
        try:
            lock_status = self.session.get(f'{self.kwargs["dv_url"]}'
                                           '/api/datasets/:persistentId/locks',
                                           headers=headers,
                                           params=params)
            lock_status.raise_for_status()
            if lock_status.json().get('data'):
                if count == 0:
                    LOGGER.warning('Study %s has been locked', study)
                    LOGGER.warning('Lock info:\n%s', lock_status.json())
                return True
            return False
        except (requests.exceptions.HTTPError,
                requests.exceptions.ConnectionError) as err:
            LOGGER.error('Unable to detect lock status for %s', study)
            LOGGER.error('ERROR message: %s', lock_status.text)
            LOGGER.exception(err)
            #return True #Should I raise here?
            raise

    def force_notab_unlock(self, study):
        '''
        Checks for a study lock and forcibly unlocks and uningests
        to prevent tabular file processing. Required if mime and filename
        spoofing is not sufficient.

        **Forcible unlocks require a superuser API key.**

        Parameters
        ----------
        study : str
            Persistent identifier of study.
        '''
        headers = {'X-Dataverse-key': self.kwargs['api_key'], 'User-agent': USERAGENT}
        params = {'persistentId': study}
        lock_status = self.session.get(f'{self.kwargs["dv_url"]}/api/datasets/:persistentId/locks',
                                       headers=headers,
                                       params=params)
        try:
            lock_status.raise_for_status()
        except (requests.exceptions.HTTPError,
                requests.exceptions.ConnectionError) as err:
            LOGGER.exception(err)
            raise
        except Exception as err:
            LOGGER.exception(err)
            raise
        if lock_status.json()['data']:
            LOGGER.warning('Study %s has been locked', study)
            LOGGER.warning('Lock info:\n%s', lock_status.json())
            force_unlock = self.session.delete(f'{self.kwargs["dv_url"]}/api/'
                                               'datasets/:persistentId/locks',
                                               params=params, headers=headers)
            force_unlock.raise_for_status()
            LOGGER.warning('Lock removed for %s', study)
            LOGGER.warning('Lock status:\n %s', force_unlock.json())
            #This is what the file ID was for, in case it can
            #be implemented again.
            #According to Harvard, you can't remove the progress bar
            #for uploaded tab files that squeak through unless you
            #let them ingest first then reingest them. Oh well.
            #See:
            #https://groups.google.com/d/msgid/dataverse-community/
            #74caa708-e39b-4259-874d-5b6b74ef9723n%40googlegroups.com
            #Also, you can't uningest it because it hasn't been
            #ingested once it's been unlocked. So the commented
            #code below is useless (for now)
            #uningest = requests.post(f'{dv_url}/api/files/{fid}/uningest',
            #                         headers=headers,
            #                         timeout=300)
            #LOGGER.warning('Ingest halted for file %s for study %s', fid, study)
            #uningest.raise_for_status()

    def upload_file(self, dryadUrl=None, filename=None,
                    mimetype=None, size=None, descr=None,
                    hashtype=None,
                    #md5=None, studyId=None, dest=None,
                    digest=None, studyId=None, dest=None,
                    fprefix=None, force_unlock=False):
        '''
        Uploads file to Dataverse study. Returns a tuple of the
        dryadFid (or None) and Dataverse JSON from the POST request.
        Failures produce JSON with different status messages
        rather than raising an exception, unless it's some
        horrendous failure whereupon you will get an actual
        exception.

        Parameters
        ----------
        dryadUrl : str
            Dryad download URL; also used to derive the Dryad file id.
        filename : str
            Filename (not including path).
        mimetype : str
            Mimetype of file.
        size : int
            Size in bytes.
        studyId : str
            Persistent Dataverse study identifier.
            Defaults to Transfer.dvpid.
        hashtype: str
            original Dryad hash type
        fprefix : str
            Path to file, not including a trailing slash.
        force_unlock : bool
            Attempt forcible unlock instead of waiting for tabular
            file processing.
            Defaults to False.
            The Dataverse `/locks` endpoint blocks POST and DELETE requests
            from non-superusers (undocumented as of 31 March 2021).
            **Forcible unlock requires a superuser API key.**
        '''
        #pylint: disable = consider-using-with, too-many-arguments, too-many-positional-arguments
        #pylint:disable=too-many-locals, too-many-branches, too-many-statements
        #Fix the arguments one day
        if not studyId:
            studyId = self.dvpid
        dest = self.kwargs['dv_url']
        fprefix = pathlib.Path(self.kwargs['tempfile_location']).expanduser().absolute()
        if dryadUrl:
            fid = dryadUrl.strip('/download')
            fid = int(fid[fid.rfind('/')+1:])
        else:
            fid = 0 #dummy fid for non-Dryad use
        params = {'persistentId' : studyId}
        upfile = pathlib.Path(fprefix, filename[:])
        badExt = filename[filename.rfind('.'):].lower()
        #Descriptions are technically possible, although how to add
        #them is buried in Dryad's API documentation
        dv4meta = {'label' : filename[:], 'description' : descr}
        #if mimetype == 'application/zip' or filename.lower().endswith('.zip'):
        if mimetype == 'application/zip' or badExt in self.kwargs.get('notab',[]):
            mimetype = 'application/octet-stream' # stop unzipping automatically
            filename += '.NOPROCESS' # Also screw with their naming convention
            #debug log about file names to see what is up with XSLX
            #see doi:10.5061/dryad.z8w9ghxb6
            LOGGER.debug('File renamed to %s for upload', filename)
        if size >= self.kwargs['max_upload']:
            fail = (fid, {'status' : 'Failure: MAX_UPLOAD size exceeded'})
            self.fileUpRecord.append(fail)
            LOGGER.warning('%s: File %s of '
                           'size %s exceeds '
                           'Dataverse MAX_UPLOAD size. Skipping.', self.doi, filename, size)
            return fail

        fields = {'file': (filename, open(upfile, 'rb'), mimetype)}
        fields.update({'jsonData': f'{dv4meta}'})
        multi = MultipartEncoder(fields=fields)
        ctype = {'Content-type' : multi.content_type}
        tmphead = self.auth.copy()
        tmphead.update(ctype)
        tmphead.update({'User-agent':USERAGENT})
        url = dest + '/api/datasets/:persistentId/add'
        upload = self.session.post(url, params=params,
                                       headers=tmphead,
                                       data=multi)
        try:
            upload.raise_for_status()

        except (requests.exceptions.HTTPError,
                    requests.exceptions.ConnectionError):
            LOGGER.critical('Error %s: %s', upload.status_code, upload.reason)
            return (fid, {'status' : f'Failure: Reason - {upload.status_code}: {upload.reason}'})


        try:
            self.fileUpRecord.append((fid, upload.json()))
            upmd5 = upload.json()['data']['files'][0]['dataFile']['checksum']['value']
            #Dataverse hash type
            _type = upload.json()['data']['files'][0]['dataFile']['checksum']['type']
            if _type.lower() != hashtype.lower():
                comparator = self._check_md5(upfile, _type.lower())
            else:
                comparator = digest
            #if hashtype.lower () != 'md5':
            #    #get an md5 because dataverse uses md5s. Or most of them do anyway.
            #    #One day this will be rewritten properly.
            #    md5 = self._check_md5(filename, 'md5')
            #else:
            #    md5 = digest
            #if md5 and (upmd5 != md5):
            if upmd5 != comparator:
                try:
                    raise exceptions.HashError(f'{_type} mismatch:\nlocal: '
                                               f'{comparator}\nuploaded: {upmd5}')
                except exceptions.HashError as e:
                    LOGGER.exception(e)
                    return (fid, {'status': e})
            #Make damn sure that the study isn't locked because of
            #tab file processing
            ##SPSS files still process despite spoofing MIME and extension
            ##so there's also a forcible unlock check

            #fid = upload.json()['data']['files'][0]['dataFile']['id']
            #fid not required for unlock
            #self.force_notab_unlock(studyId, dest, fid)
            if force_unlock:
                self.force_notab_unlock(studyId)
            else:
                count = 0
                wait = True
                while wait:
                    wait = self.file_lock_check(studyId, count)
                    if wait:
                        time.sleep(15) # Don't hit it too often
                    count += 1


            return (fid, upload.json())

        except requests.exceptions.JSONDecodeError as e:
            LOGGER.warning('JSON error with upload')
            LOGGER.exception(e)
            return (fid, {'status' : f'Failure: Reason {upload.reason}'})

        #It can crash later
        except Exception as f_plus: #pylint: disable=broad-except
            LOGGER.exception(f_plus)
            return (fid, {'status' : f'Failure: Reason: {f_plus}'})

    def upload_files(self, files=None, pid=None, fprefix=None, force_unlock=False):
        '''
        Uploads multiple files to study with persistentId pid.
        Returns a list of the original tuples plus JSON responses.

        Parameters
        ----------
        files : list
            List contains tuples with
            (dryadDownloadURL, filename, mimetype, size).
        pid : str
            Defaults to self.dvpid, which is generated by calling
            dryad2dataverse.transfer.Transfer.upload_study().
        force_unlock : bool
            Attempt forcible unlock instead of waiting for tabular
            file processing.
            Defaults to False.
            The Dataverse `/locks` endpoint blocks POST and DELETE requests
            from non-superusers (undocumented as of 31 March 2021).
            **Forcible unlock requires a superuser API key.**
        '''
        if not files:
            files = self.files
        fprefix = pathlib.Path(self.kwargs['tempfile_location']).expanduser().absolute()
        out = []
        for f in files:
            #out.append(self.upload_file(f[0], f[1], f[2], f[3],
            #                             f[4], f[5], pid, fprefix=fprefix))
            #out.append(self.upload_file(*[x for x in f],
            #last item in files is not necessary
            out.append(self.upload_file(*list(f)[:-1],
                                        studyId=pid, fprefix=fprefix,
                                        force_unlock=force_unlock))
        return out

    def upload_json(self, studyId=None, dest=None):
        '''
        Uploads Dryad json as a separate file for archival purposes.

        Parameters
        ----------
        studyId : str
            Dataverse persistent identifier.
            Default dryad2dataverse.transfer.Transfer.dvpid,
            which is only generated on
            dryad2dataverse.transfer.Transfer.upload_study()
        '''
        if not studyId:
            studyId = self.dvpid
        dest = self.kwargs['dv_url']
        if not self.jsonFlag:
            url = dest + '/api/datasets/:persistentId/add'
            pack = io.StringIO(json.dumps(self.dryad.dryadJson))
            desc = {'description':'Original JSON from Dryad',
                    'categories':['Documentation', 'Code']}
            fname = self.doi[self.doi.rfind('/')+1:].replace('.', '_')
            payload = {'file': (f'{fname}.json', pack, 'text/plain;charset=UTF-8'),
                       'jsonData':f'{desc}'}
            params = {'persistentId':studyId}
            try:
                meta = self.session.post(f'{url}',
                                         params=params,
                                         headers=self.auth,
                                         files=payload)
                #0 because no dryad fid will be zero
                meta.raise_for_status()
                self.fileUpRecord.append((0, meta.json()))
                self.jsonFlag = (0, meta.json())
                LOGGER.debug('Successfully uploaded Dryad JSON to %s', studyId)

            #JSON uploads randomly fail with a Dataverse server.log error of
            #"A system exception occurred during an invocation on EJB . . ."
            #Not reproducible, so errors will only be written to the log.
            #Jesus.
            except (requests.exceptions.HTTPError,
                    requests.exceptions.ConnectionError) as err:
                LOGGER.error('Unable to upload Dryad JSON to %s', studyId)
                LOGGER.exception(err)
                #And further checking as to what is happening
                self.fileUpRecord.append((0, {'status':'Failure: Unable to upload Dryad JSON'}))
                if not isinstance(self.dryad.dryadJson, dict):
                    LOGGER.error('Dryad JSON is not a dictionary')
            except Exception as err:
                LOGGER.error('Unable to upload Dryad JSON')
                LOGGER.exception(err)
                raise

    def delete_dv_file(self, dvfid) -> int:
        #WTAF curl -u $API_TOKEN: -X DELETE
        #https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit-media/file/123

        '''
        Deletes files from Dataverse target given a dataverse file ID.
        This information is unknowable unless discovered by
        dryad2dataverse.monitor.Monitor or by other methods.

        Returns 1 on success (204 response), or 0 on other response.

        Parameters
        ----------
        dvfid : str
            Dataverse file ID number.
        '''
        delme = self.session.delete(f'{self.kwargs["dv_url"]}/'
                                    'dvn/api/data-deposit/v1.1/swordv2/edit-media'
                                    f'/file/{dvfid}',
                                    auth=(self.kwargs['api_key'], ''))
        if delme.status_code == 204:
            self.fileDelRecord.append(dvfid)
            return 1
        return 0

    def delete_dv_files(self, dvfids=None):
        '''
        Deletes all files in list of Dataverse file ids from
        a Dataverse installation.

        Parameters
        ----------
        dvfids : list
            List of Dataverse file ids.
            Defaults to dryad2dataverse.transfer.Transfer.fileDelRecord.
        '''
        #if not dvfids:
        #   dvfids = self.fileDelRecord
        for fid in dvfids:
            self.delete_dv_file(fid)

auth property

Returns Dataverse authentication header dict, i.e. {'X-Dataverse-key': 'APIKEYSTRING'}

doi property

Returns Dryad DOI.

dvpid property

Returns Dataverse study persistent ID as str.

fileJson property

Returns a list of file JSONs from call to Dryad API /files/{id}, where the ID is parsed from the Dryad JSON. Dryad file listings are paginated.

files property

Returns a list of lists with:

[Download_location, filename, mimetype, size, description, md5digest]

This is mutable; downloading a file will add md5 info if not available.

oversize property

Returns list of files exceeding the Dataverse ingest limit given by the max_upload setting.

__init__(dryad, **kwargs)

Creates a dryad2dataverse.transfer.Transfer instance.

Parameters:
  • dryad (Serializer) –
  • **kwargs

    Normally this would be an unpacked dryad2dataverse.config.Config instance

Notes

Minimum kwargs for function:

  • max_upload (int) – Maximum size in bytes
  • tempfile_location (str) – Path to temporary directory
  • dv_url (str) – Base URL of dataverse instance
  • api_key (str) – API key for Dataverse user
  • dv_contact_email (str) – Contact email address for Dataverse record
  • dv_contact_name (str) – Contact name
  • target (str) – Target collection short name

Source code in src/dryad2dataverse/transfer.py
def __init__(self, dryad, **kwargs):
    '''
    Creates a dryad2dataverse.transfer.Transfer instance.

    Parameters
    ----------
    dryad : dryad2dataverse.serializer.Serializer

    **kwargs
        Normally this would be an unpacked dryad2dataverse.config.Config instance

    Notes
    -----
    Minimum kwargs for function:
    max_upload : int
        Maximum size in bytes
    tempfile_location : str
        Path to temporary directory
    dv_url : str
        Base URL of dataverse instance
    api_key : str
        API key for Dataverse user
    dv_contact_email : str
        Contact email address for Dataverse record
    dv_contact_name : str
        Contact name
    target : str
        Target collection short name
    '''
    self.kwargs = kwargs
    self.dryad = dryad
    self._fileJson = None
    self._files = [list(f) for f in self.dryad.files]
    #self._files = copy.deepcopy(self.dryad.files)
    self.fileUpRecord = []
    self.fileDelRecord = []
    self.dvStudy = None
    self.jsonFlag = None #Whether or not new json uploaded
    self.session = requests.Session()
    self.session.mount('https://', HTTPAdapter(max_retries=config.RETRY_STRATEGY))
    self.check_kwargs()
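
For illustration, the minimum kwargs can also be passed as a plain dict instead of a Config instance; every value below is a placeholder, and i_heart_dryad is the Serializer from the basic usage example.

>>> settings = {'max_upload': 3221225472,
...             'tempfile_location': '/tmp/dryad',
...             'dv_url': 'https://dataverse.example.edu',
...             'api_key': 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
...             'dv_contact_email': 'data@example.edu',
...             'dv_contact_name': 'Data Services',
...             'target': 'dryad'}
>>> dv = dryad2dataverse.transfer.Transfer(i_heart_dryad, **settings)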

check_kwargs()

Verify sufficient information

Source code in src/dryad2dataverse/transfer.py
def check_kwargs(self):
    '''
    Verify sufficient information
    '''
    required = ['max_upload',
                'tempfile_location',
                'dv_url',
                'api_key',
                'dv_contact_email',
                'dv_contact_name',
                'target']
    keys = self.kwargs.keys()
    for val in required:
        if val not in keys:
            try:
                raise exceptions.Dryad2DataverseError(f'Required parameter missing: {val}')
            except exceptions.Dryad2DataverseError as err:
                LOGGER.exception(err)
                raise

delete_dv_file(dvfid)

Deletes files from Dataverse target given a dataverse file ID. This information is unknowable unless discovered by dryad2dataverse.monitor.Monitor or by other methods.

Returns 1 on success (204 response), or 0 on other response.

Parameters:
  • dvfid (str) –

    Dataverse file ID number.

Source code in src/dryad2dataverse/transfer.py
def delete_dv_file(self, dvfid) -> int:
    #WTAF curl -u $API_TOKEN: -X DELETE
    #https://$HOSTNAME/dvn/api/data-deposit/v1.1/swordv2/edit-media/file/123

    '''
    Deletes files from Dataverse target given a dataverse file ID.
    This information is unknowable unless discovered by
    dryad2dataverse.monitor.Monitor or by other methods.

    Returns 1 on success (204 response), or 0 on other response.

    Parameters
    ----------
    dvfid : str
        Dataverse file ID number.
    '''
    delme = self.session.delete(f'{self.kwargs["dv_url"]}/'
                                'dvn/api/data-deposit/v1.1/swordv2/edit-media'
                                f'/file/{dvfid}',
                                auth=(self.kwargs['api_key'], ''))
    if delme.status_code == 204:
        self.fileDelRecord.append(dvfid)
        return 1
    return 0
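
A sketch; the file id below is hypothetical and would normally be discovered via dryad2dataverse.monitor.Monitor.

>>> dv.delete_dv_file('12345')   # hypothetical Dataverse file id
1
>>> # Any response other than HTTP 204 returns 0 and the id is not recorded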

delete_dv_files(dvfids=None)

Deletes all files in list of Dataverse file ids from a Dataverse installation.

Parameters:
  • dvfids (list, default: None ) –

    List of Dataverse file ids. Defaults to dryad2dataverse.transfer.Transfer.fileDelRecord.

Source code in src/dryad2dataverse/transfer.py
def delete_dv_files(self, dvfids=None):
    '''
    Deletes all files in list of Dataverse file ids from
    a Dataverse installation.

    Parameters
    ----------
    dvfids : list
        List of Dataverse file ids.
        Defaults to dryad2dataverse.transfer.Transfer.fileDelRecord.
    '''
    #if not dvfids:
    #   dvfids = self.fileDelRecord
    for fid in dvfids:
        self.delete_dv_file(fid)

download_file(url=None, filename=None, size=None, chk=None, **kwargs)

Downloads a file via requests streaming and saves to the defined temporary file directory. Returns the checksum on success and raises an exception on failure.

Parameters:
  • url (str, default: None ) –

    URL of download.

  • filename (str, default: None ) –

    Output file name.

  • size (int, default: None ) –

    Reported file size in bytes.

  • chk (str, default: None ) –

    checksum of file (if available and known).

  • kwargs (dict, default: {} ) –
  • digest_type (str) –

    checksum type (ie, md5, sha-256, etc)

Source code in src/dryad2dataverse/transfer.py
def download_file(self, url=None, filename=None,
                  size=None, chk=None, **kwargs):
    '''
    Downloads a file via requests streaming and saves to
    the defined temporary file directory.
    Returns the checksum on success and raises an exception on failure.

    Parameters
    ----------
    url : str
        URL of download.
    filename : str
        Output file name.
    size : int
        Reported file size in bytes.
    chk : str
        checksum of file (if available and known).
    kwargs : dict

    Other parameters
    ----------------
    digest_type : str
        checksum type (ie, md5, sha-256, etc)
    '''
    #pylint: disable=too-many-branches
    LOGGER.debug('Start download sequence')
    LOGGER.debug('MAX SIZE = %s', self.kwargs['max_upload'])
    LOGGER.debug('Filename: %s, size=%s', filename, size)
    tmp = pathlib.Path(self.kwargs['tempfile_location']).expanduser().absolute()
    if size:
        if size > self.kwargs['max_upload']:
            #TOO BIG
            LOGGER.warning('%s: File %s exceeds '
                           'Dataverse maximum upload size. Skipping download.',
                           self.doi, filename)
            md5 = 'this_file_is_too_big_to_upload__' #HA HA
            for i in self._files:
                if url == i[0]:
                    i[-1] = md5
            LOGGER.debug('Stop download sequence with large file skip')
            return md5
    try:
        down = self.session.get(url, stream=True,
                                headers=config.Config.update_headers(**self.kwargs))
        down.raise_for_status()
        with open(pathlib.Path(tmp,filename), 'wb') as fi:
            for chunk in down.iter_content(chunk_size=8192):
                fi.write(chunk)

        #verify size
        #https://stackoverflow.com/questions/2104080/how-can-i-check-file-size-in-python'
        if size:
            checkSize = os.stat(pathlib.Path(tmp,filename)).st_size
            if checkSize != size:
                try:
                    raise exceptions.DownloadSizeError('Download size does not '
                                                       'match reported size')
                except exceptions.DownloadSizeError as e:
                    LOGGER.exception(e)
                    raise
        #now check the md5
        md5 = None
        if chk and kwargs.get('digest_type') in HASHTABLE:
            md5 = Transfer._check_md5(pathlib.Path(tmp,filename),
                                  kwargs['digest_type'])
            if md5 != chk:
                try:
                    raise exceptions.HashError(f'Hex digest mismatch: {md5} : {chk}')
                    #is this really what I want to do on a bad checksum?
                except exceptions.HashError as e:
                    LOGGER.exception(e)
                    raise
        for i in self._files:
            if url == i[0]:
                i[-1] = md5
        LOGGER.debug('Complete download sequence')
        #This doesn't actually return an md5, just the hash value
        return md5
    except (requests.exceptions.HTTPError,
            requests.exceptions.ConnectionError) as err:
        LOGGER.critical('Unable to download %s', url)
        LOGGER.exception(err)
        raise
    except Exception as err:
        LOGGER.exception(err)
        raise
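
A sketch of a single verified download; the URL, filename, size, and checksum are all illustrative. If the computed digest does not match chk, a dryad2dataverse.exceptions.HashError is raised.

>>> dv.download_file(url='https://datadryad.org/api/v2/files/385820/download',
...                  filename='example.csv',
...                  size=1234,
...                  chk='d41d8cd98f00b204e9800998ecf8427e',
...                  digest_type='md5')
'd41d8cd98f00b204e9800998ecf8427e'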

download_files(files=None)

Bulk downloader for files.

Parameters:
  • files (list, default: None ) –

    Items in list can be tuples or list with a minimum of: (dryaddownloadurl, filenamewithoutpath, [md5sum]) The md5 sum should be the last member of the tuple. Defaults to self.files.

Notes

Normally used without arguments to download all the associated files with a Dryad study.

Source code in src/dryad2dataverse/transfer.py
def download_files(self, files=None):
    '''
    Bulk downloader for files.

    Parameters
    ----------
    files : list
        Items in list can be tuples or list with a minimum of:
        `(dryaddownloadurl, filenamewithoutpath, [md5sum])`
        The md5 sum should be the last member of the tuple.
        Defaults to self.files.

    Notes
    -----
    Normally used without arguments to download all the associated
    files with a Dryad study.
    '''
    if not files:
        files = self.files
    try:
        for f in files:
            self.download_file(url=f[0],
                               filename=f[1],
                               mimetype=f[2],
                               size=f[3],
                               descr=f[4],
                               digest_type=f[5],
                               chk=f[-1])
    except exceptions.DataverseDownloadError as e:
        LOGGER.exception('Unable to download file with info %s\n%s', f, e)
        raise
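
Normally this is called with no arguments; passing an explicit list is sketched below, where each entry has the same shape as a row of the files property.

>>> dv.download_files(files=dv.files)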

file_lock_check(study, count=0)

Checks for a study lock

Returns True if locked. Normally used to check if processing is completed. As tabular processing halts file ingest, there should be no locks on a Dataverse study before performing a data file upload.

Parameters:
  • study (str) –

    Persistent identifier of study.

  • count (int, default: 0 ) –

    Number of times the function has been called. Logs lock messages only on 0.

Source code in src/dryad2dataverse/transfer.py
def file_lock_check(self, study, count=0):
    '''
    Checks for a study lock

    Returns True if locked. Normally used to check
    if processing is completed. As tabular processing
    halts file ingest, there should be no locks on a
    Dataverse study before performing a data file upload.

    Parameters
    ----------
    study : str
        Persistent identifier of study.
    count : int
        Number of times the function has been called. Logs
        lock messages only on 0.
    '''
    headers = {'X-Dataverse-key': self.kwargs['api_key'], 'User-agent': USERAGENT}
    params = {'persistentId': study}
    try:
        lock_status = self.session.get(f'{self.kwargs["dv_url"]}'
                                       '/api/datasets/:persistentId/locks',
                                       headers=headers,
                                       params=params)
        lock_status.raise_for_status()
        if lock_status.json().get('data'):
            if count == 0:
                LOGGER.warning('Study %s has been locked', study)
                LOGGER.warning('Lock info:\n%s', lock_status.json())
            return True
        return False
    except (requests.exceptions.HTTPError,
            requests.exceptions.ConnectionError) as err:
        LOGGER.error('Unable to detect lock status for %s', study)
        LOGGER.error('ERROR message: %s', lock_status.text)
        LOGGER.exception(err)
        #return True #Should I raise here?
        raise
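
A polling sketch; upload_file() performs the same wait internally when force_unlock is False. Transfer.dvpid is only set after upload_study().

>>> import time
>>> count = 0
>>> while dv.file_lock_check(dv.dvpid, count):
...     time.sleep(15)
...     count += 1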

force_notab_unlock(study)

Checks for a study lock and forcibly unlocks and uningests to prevent tabular file processing. Required if mime and filename spoofing is not sufficient.

Forcible unlocks require a superuser API key.

Parameters:
  • study (str) –

    Persistent identifier of study.

Source code in src/dryad2dataverse/transfer.py
def force_notab_unlock(self, study):
    '''
    Checks for a study lock and forcibly unlocks and uningests
    to prevent tabular file processing. Required if mime and filename
    spoofing is not sufficient.

    **Forcible unlocks require a superuser API key.**

    Parameters
    ----------
    study : str
        Persistent identifier of study.
    '''
    headers = {'X-Dataverse-key': self.kwargs['api_key'], 'User-agent': USERAGENT}
    params = {'persistentId': study}
    lock_status = self.session.get(f'{self.kwargs["dv_url"]}/api/datasets/:persistentId/locks',
                                   headers=headers,
                                   params=params)
    try:
        lock_status.raise_for_status()
    except (requests.exceptions.HTTPError,
            requests.exceptions.ConnectionError) as err:
        LOGGER.exception(err)
        raise
    except Exception as err:
        LOGGER.exception(err)
        raise
    if lock_status.json()['data']:
        LOGGER.warning('Study %s has been locked', study)
        LOGGER.warning('Lock info:\n%s', lock_status.json())
        force_unlock = self.session.delete(f'{self.kwargs["dv_url"]}/api/'
                                           'datasets/:persistentId/locks',
                                           params=params, headers=headers)
        force_unlock.raise_for_status()
        LOGGER.warning('Lock removed for %s', study)
        LOGGER.warning('Lock status:\n %s', force_unlock.json())
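
A sketch; this only succeeds if the configured api_key belongs to a superuser.

>>> dv.force_notab_unlock(dv.dvpid)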

set_correct_date(hdl=None, d_type='distributionDate')

Sets “correct” publication date for Dataverse.

Parameters:
  • hdl (str, default: None ) –

    Persistent identifier for Dataverse study. Defaults to Transfer.dvpid (which can be None if the study has not yet been uploaded).

  • d_type (str, default: 'distributionDate' ) –

    Date type. One of ‘distributionDate’, ‘productionDate’, ‘dateOfDeposit’. Default ‘distributionDate’.

Notes

self.kwargs are normally read from dryad2dataverse.config.Config instances.

dryad2dataverse.serializer maps Dryad ‘publicationDate’ to Dataverse ‘distributionDate’ (see serializer.py ~line 675).

Dataverse citation date default is “:publicationDate”. See Dataverse API reference: https://guides.dataverse.org/en/4.20/api/native-api.html#id54.

Source code in src/dryad2dataverse/transfer.py
def set_correct_date(self, hdl=None,
                     d_type='distributionDate'):
    '''
    Sets "correct" publication date for Dataverse.

    Parameters
    ----------
    hdl : str
        Persistent identifier for Dataverse study.
        Defaults to Transfer.dvpid (which can be None if the
        study has not yet been uploaded).
    d_type : str
        Date type. One of 'distributionDate', 'productionDate',
        'dateOfDeposit'. Default 'distributionDate'.

    Notes
    -----
    self.kwargs are normally read from dryad2dataverse.config.Config
    instances.

    dryad2dataverse.serializer maps Dryad 'publicationDate'
    to Dataverse 'distributionDate' (see serializer.py ~line 675).

    Dataverse citation date default is ":publicationDate". See
    Dataverse API reference:
    <https://guides.dataverse.org/en/4.20/api/native-api.html#id54>.

    '''
    try:
        if not hdl:
            hdl = self.dvpid
        headers ={'X-Dataverse-key': self.kwargs['api_key'],
                  'User-agent': USERAGENT}
        params = {'persistentId': hdl}
        set_date = self.session.put(f'{self.kwargs["dv_url"]}/api/'
                                    'datasets/:persistentId/citationdate',
                                    headers=headers,
                                    data=d_type,
                                    params=params)
        set_date.raise_for_status()

    except (requests.exceptions.HTTPError,
            requests.exceptions.ConnectionError) as err:
        LOGGER.warning('Unable to set citation date for %s',
                       hdl)
        LOGGER.warning(err)
        LOGGER.warning(set_date.text)
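
A sketch, assuming upload_study() has already been called so that Transfer.dvpid is set.

>>> dv.set_correct_date()                          # 'distributionDate' on Transfer.dvpid
>>> dv.set_correct_date(d_type='productionDate')   # or another supported date type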

test_api_key()

Tests for an expired API key and raises dryad2dataverse.exceptions.DataverseBadApiKeyError if the API key is bad. Ignores other HTTP errors.

Source code in src/dryad2dataverse/transfer.py
def test_api_key(self):
    '''
    Tests for an expired API key and raises
    dryad2dataverse.exceptions.DataverseBadApiKeyError
    if the API key is bad. Ignores other HTTP errors.
    '''
    #API validity check appears to come before a PID validity check
    params = {'persistentId': 'doi:000/000/000'} # PID is irrelevant
    headers = {'X-Dataverse-key': self.kwargs['api_key'], 'User-agent': USERAGENT}
    bad_test = self.session.get(f'{self.kwargs["dv_url"]}/api/datasets/:persistentId',
                            headers=headers,
                            params=params)
    #There's an extra space in the message which Harvard
    #will probably find out about, so . . .
    if bad_test.json().get('message').startswith('Bad api key'):
        try:
            raise exceptions.DataverseBadApiKeyError('Bad API key')
        except exceptions.DataverseBadApiKeyError as err:
            LOGGER.exception(err)
            raise
    try: #other errors
        bad_test.raise_for_status()
    except requests.exceptions.HTTPError:
        pass
    except Exception as e:
        LOGGER.exception(e)
        LOGGER.exception(traceback.format_exc())
        raise
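
A sketch of checking the key before starting a long transfer.

>>> import dryad2dataverse.exceptions
>>> try:
...     dv.test_api_key()
... except dryad2dataverse.exceptions.DataverseBadApiKeyError:
...     print('Replace the Dataverse API key before transferring')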

upload_file(dryadUrl=None, filename=None, mimetype=None, size=None, descr=None, hashtype=None, digest=None, studyId=None, dest=None, fprefix=None, force_unlock=False)

Uploads file to Dataverse study. Returns a tuple of the dryadFid (or None) and Dataverse JSON from the POST request. Failures produce JSON with different status messages rather than raising an exception, unless it’s some horrendous failure whereupon you will get an actual exception.

Parameters:
  • dryadUrl (str, default: None ) –

    Dryad download URL; also used to derive the Dryad file id.

  • filename (str, default: None ) –

    Filename (not including path).

  • mimetype (str, default: None ) –

    Mimetype of file.

  • size (int, default: None ) –

    Size in bytes.

  • studyId (str, default: None ) –

    Persistent Dataverse study identifier. Defaults to Transfer.dvpid.

  • hashtype

    original Dryad hash type

  • fprefix (str, default: None ) –

    Path to file, not including a trailing slash.

  • force_unlock (bool, default: False ) –

    Attempt forcible unlock instead of waiting for tabular file processing. Defaults to False. The Dataverse /locks endpoint blocks POST and DELETE requests from non-superusers (undocumented as of 31 March 2021). Forcible unlock requires a superuser API key.

Source code in src/dryad2dataverse/transfer.py
def upload_file(self, dryadUrl=None, filename=None,
                mimetype=None, size=None, descr=None,
                hashtype=None,
                #md5=None, studyId=None, dest=None,
                digest=None, studyId=None, dest=None,
                fprefix=None, force_unlock=False):
    '''
    Uploads file to Dataverse study. Returns a tuple of the
    dryadFid (or None) and Dataverse JSON from the POST request.
    Failures produce JSON with different status messages
    rather than raising an exception, unless it's some
    horrendous failure whereupon you will get an actual
    exception.

    Parameters
    ----------
    dryadUrl : str
        Dryad download URL; also used to derive the Dryad file id.
    filename : str
        Filename (not including path).
    mimetype : str
        Mimetype of file.
    size : int
        Size in bytes.
    studyId : str
        Persistent Dataverse study identifier.
        Defaults to Transfer.dvpid.
    hashtype: str
        original Dryad hash type
    fprefix : str
        Path to file, not including a trailing slash.
    force_unlock : bool
        Attempt forcible unlock instead of waiting for tabular
        file processing.
        Defaults to False.
        The Dataverse `/locks` endpoint blocks POST and DELETE requests
        from non-superusers (undocumented as of 31 March 2021).
        **Forcible unlock requires a superuser API key.**
    '''
    #pylint: disable = consider-using-with, too-many-arguments, too-many-positional-arguments
    #pylint:disable=too-many-locals, too-many-branches, too-many-statements
    #Fix the arguments one day
    if not studyId:
        studyId = self.dvpid
    dest = self.kwargs['dv_url']
    fprefix = pathlib.Path(self.kwargs['tempfile_location']).expanduser().absolute()
    if dryadUrl:
        fid = dryadUrl.strip('/download')
        fid = int(fid[fid.rfind('/')+1:])
    else:
        fid = 0 #dummy fid for non-Dryad use
    params = {'persistentId' : studyId}
    upfile = pathlib.Path(fprefix, filename[:])
    badExt = filename[filename.rfind('.'):].lower()
    #Descriptions are technically possible, although how to add
    #them is buried in Dryad's API documentation
    dv4meta = {'label' : filename[:], 'description' : descr}
    #if mimetype == 'application/zip' or filename.lower().endswith('.zip'):
    if mimetype == 'application/zip' or badExt in self.kwargs.get('notab',[]):
        mimetype = 'application/octet-stream' # stop unzipping automatically
        filename += '.NOPROCESS' # Also screw with their naming convention
        #debug log about file names to see what is up with XSLX
        #see doi:10.5061/dryad.z8w9ghxb6
        LOGGER.debug('File renamed to %s for upload', filename)
    if size >= self.kwargs['max_upload']:
        fail = (fid, {'status' : 'Failure: MAX_UPLOAD size exceeded'})
        self.fileUpRecord.append(fail)
        LOGGER.warning('%s: File %s of '
                       'size %s exceeds '
                       'Dataverse MAX_UPLOAD size. Skipping.', self.doi, filename, size)
        return fail

    fields = {'file': (filename, open(upfile, 'rb'), mimetype)}
    fields.update({'jsonData': f'{dv4meta}'})
    multi = MultipartEncoder(fields=fields)
    ctype = {'Content-type' : multi.content_type}
    tmphead = self.auth.copy()
    tmphead.update(ctype)
    tmphead.update({'User-agent':USERAGENT})
    url = dest + '/api/datasets/:persistentId/add'
    upload = self.session.post(url, params=params,
                                   headers=tmphead,
                                   data=multi)
    try:
        upload.raise_for_status()

    except (requests.exceptions.HTTPError,
                requests.exceptions.ConnectionError):
        LOGGER.critical('Error %s: %s', upload.status_code, upload.reason)
        return (fid, {'status' : f'Failure: Reason - {upload.status_code}: {upload.reason}'})


    try:
        self.fileUpRecord.append((fid, upload.json()))
        upmd5 = upload.json()['data']['files'][0]['dataFile']['checksum']['value']
        #Dataverse hash type
        _type = upload.json()['data']['files'][0]['dataFile']['checksum']['type']
        if _type.lower() != hashtype.lower():
            comparator = self._check_md5(upfile, _type.lower())
        else:
            comparator = digest
        #if hashtype.lower () != 'md5':
        #    #get an md5 because dataverse uses md5s. Or most of them do anyway.
        #    #One day this will be rewritten properly.
        #    md5 = self._check_md5(filename, 'md5')
        #else:
        #    md5 = digest
        #if md5 and (upmd5 != md5):
        if upmd5 != comparator:
            try:
                raise exceptions.HashError(f'{_type} mismatch:\nlocal: '
                                           f'{comparator}\nuploaded: {upmd5}')
            except exceptions.HashError as e:
                LOGGER.exception(e)
                return (fid, {'status': e})
        #Make damn sure that the study isn't locked because of
        #tab file processing
        ##SPSS files still process despite spoofing MIME and extension
        ##so there's also a forcible unlock check

        #fid = upload.json()['data']['files'][0]['dataFile']['id']
        #fid not required for unlock
        #self.force_notab_unlock(studyId, dest, fid)
        if force_unlock:
            self.force_notab_unlock(studyId)
        else:
            count = 0
            wait = True
            while wait:
                wait = self.file_lock_check(studyId, count)
                if wait:
                    time.sleep(15) # Don't hit it too often
                count += 1


        return (fid, upload.json())

    except requests.exceptions.JSONDecodeError as e:
        LOGGER.warning('JSON error with upload')
        LOGGER.exception(e)
        return (fid, {'status' : f'Failure: Reason {upload.reason}'})

    #It can crash later
    except Exception as f_plus: #pylint: disable=broad-except
        LOGGER.exception(f_plus)
        return (fid, {'status' : f'Failure: Reason: {f_plus}'})
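
A minimal usage sketch, assuming `transfer` is a configured dryad2dataverse.transfer.Transfer instance and the file has already been downloaded to the configured tempfile_location (the URL, size and digest values below are illustrative only):

>>> fid, response = transfer.upload_file(
...     dryadUrl='https://datadryad.org/api/v2/files/385819/download',
...     filename='GCB_ACG_Mortality_2020.zip',
...     mimetype='application/x-zip-compressed',
...     size=23787587,
...     hashtype='md5',
...     digest='d41d8cd98f00b204e9800998ecf8427e')
>>> response.get('status')
'OK'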

upload_files(files=None, pid=None, fprefix=None, force_unlock=False)

Uploads multiple files to study with persistentId pid. Returns a list of the original tuples plus JSON responses.

Parameters:
  • files (list, default: None ) –

    List contains tuples with (dryadDownloadURL, filename, mimetype, size).

  • pid (str, default: None ) –

    Defaults to self.dvpid, which is generated by calling dryad2dataverse.transfer.Transfer.upload_study().

  • force_unlock (bool, default: False ) –

    Attempt forcible unlock instead of waiting for tabular file processing. Defaults to False. The Dataverse /locks endpoint blocks POST and DELETE requests from non-superusers (undocumented as of 31 March 2021). Forcible unlock requires a superuser API key.
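
Typically this is called with no arguments after the study has been created and the files downloaded, since `files` defaults to the Transfer instance's own file list. A minimal sketch, assuming `transfer` is a configured dryad2dataverse.transfer.Transfer instance:

>>> results = transfer.upload_files()
>>> # Each entry is a (dryad_file_id, response_json) tuple
>>> failed = [r for r in results if str(r[1].get('status', '')).startswith('Failure')]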

Source code in src/dryad2dataverse/transfer.py
def upload_files(self, files=None, pid=None, fprefix=None, force_unlock=False):
    '''
    Uploads multiple files to study with persistentId pid.
    Returns a list of the original tuples plus JSON responses.

    Parameters
    ----------
    files : list
        List contains tuples with
        (dryadDownloadURL, filename, mimetype, size).
    pid : str
        Defaults to self.dvpid, which is generated by calling
        dryad2dataverse.transfer.Transfer.upload_study().
    force_unlock : bool
        Attempt forcible unlock instead of waiting for tabular
        file processing.
        Defaults to False.
        The Dataverse `/locks` endpoint blocks POST and DELETE requests
        from non-superusers (undocumented as of 31 March 2021).
        **Forcible unlock requires a superuser API key.**
    '''
    if not files:
        files = self.files
    fprefix = pathlib.Path(self.kwargs['tempfile_location']).expanduser().absolute()
    out = []
    for f in files:
        #out.append(self.upload_file(f[0], f[1], f[2], f[3],
        #                             f[4], f[5], pid, fprefix=fprefix))
        #out.append(self.upload_file(*[x for x in f],
        #last item in files is not necessary
        out.append(self.upload_file(*list(f)[:-1],
                                    studyId=pid, fprefix=fprefix,
                                    force_unlock=force_unlock))
    return out

upload_json(studyId=None, dest=None)

Uploads Dryad json as a separate file for archival purposes.

Parameters:
  • studyId (str, default: None ) –

    Dataverse persistent identifier. Default dryad2dataverse.transfer.Transfer.dvpid, which is only generated on dryad2dataverse.transfer.Transfer.upload_study()

Source code in src/dryad2dataverse/transfer.py
def upload_json(self, studyId=None, dest=None):
    '''
    Uploads Dryad json as a separate file for archival purposes.

    Parameters
    ----------
    studyId : str
        Dataverse persistent identifier.
        Default dryad2dataverse.transfer.Transfer.dvpid,
        which is only generated on
        dryad2dataverse.transfer.Transfer.upload_study()
    '''
    if not studyId:
        studyId = self.dvpid
    dest = self.kwargs['dv_url']
    if not self.jsonFlag:
        url = dest + '/api/datasets/:persistentId/add'
        pack = io.StringIO(json.dumps(self.dryad.dryadJson))
        desc = {'description':'Original JSON from Dryad',
                'categories':['Documentation', 'Code']}
        fname = self.doi[self.doi.rfind('/')+1:].replace('.', '_')
        payload = {'file': (f'{fname}.json', pack, 'text/plain;charset=UTF-8'),
                   'jsonData':f'{desc}'}
        params = {'persistentId':studyId}
        try:
            meta = self.session.post(f'{url}',
                                     params=params,
                                     headers=self.auth,
                                     files=payload)
            #0 because no dryad fid will be zero
            meta.raise_for_status()
            self.fileUpRecord.append((0, meta.json()))
            self.jsonFlag = (0, meta.json())
            LOGGER.debug('Successfully uploaded Dryad JSON to %s', studyId)

        #JSON uploads randomly fail with a Dataverse server.log error of
        #"A system exception occurred during an invocation on EJB . . ."
        #Not reproducible, so errors will only be written to the log.
        #Jesus.
        except (requests.exceptions.HTTPError,
                requests.exceptions.ConnectionError) as err:
            LOGGER.error('Unable to upload Dryad JSON to %s', studyId)
            LOGGER.exception(err)
            #And further checking as to what is happening
            self.fileUpRecord.append((0, {'status':'Failure: Unable to upload Dryad JSON'}))
            if not isinstance(self.dryad.dryadJson, dict):
                LOGGER.error('Dryad JSON is not a dictionary')
        except Exception as err:
            LOGGER.error('Unable to upload Dryad JSON')
            LOGGER.exception(err)
            raise

upload_study(**kwargs)

Uploads Dryad study metadata to target Dataverse or updates existing. Supplying a targetDv kwarg creates a new study and supplying a dvpid kwarg updates a currently existing Dataverse study.

**kwargs : dict – Normally this is one of the two parameters below.

  • targetDv (str) –

    Short name of target dataverse. Required if new dataset. Specify as targetDV=value.

  • dvpid (str) –

    Dataverse persistent ID (for updating metadata). This is not required for new uploads, specify as dvpid=value

Notes

One of targetDv or dvpid is required.
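
A minimal sketch of both modes, assuming `transfer` is a configured dryad2dataverse.transfer.Transfer instance, 'dryad' is the short name of the target dataverse, and the persistent ID shown is illustrative:

>>> # Create a new study in the target dataverse
>>> dvpid = transfer.upload_study(targetDv='dryad')
>>> # Or update the metadata of an existing study
>>> dvpid = transfer.upload_study(dvpid='doi:10.12345/FK2/EXAMPLE')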

Source code in src/dryad2dataverse/transfer.py
def upload_study(self, **kwargs):
    '''
    Uploads Dryad study metadata to target Dataverse or updates existing.
    Supplying a `targetDv` kwarg creates a new study and supplying a
    `dvpid` kwarg updates a currently existing Dataverse study.

    **kwargs : dict
        Normally this is one of the two parameters below

    Other parameters
    ----------------
    targetDv : str
        Short name of target dataverse. Required if new dataset.
        Specify as targetDV=value.
    dvpid : str
        Dataverse persistent ID (for updating metadata).
        This is not required for new uploads, specify as dvpid=value

    Notes
    -----
    One of targetDv or dvpid is required.
    '''
    headers = {'X-Dataverse-key': self.kwargs['api_key'], 'User-agent': USERAGENT}
    targetDv = kwargs.get('targetDv')
    dvpid = kwargs.get('dvpid')
    #dryFid = kwargs.get('dryFid') #Why did I put this here?
    if not targetDv and not dvpid:
        try:
            raise exceptions.NoTargetError('You must supply one of targetDv \
                                    (target dataverse) \
                                     or dvpid (Dataverse persistent ID)')
        except exceptions.NoTargetError as err:
            LOGGER.exception(err)
            raise

    if targetDv and dvpid:
        msg = 'Supply only one of targetDv or dvpid'
        LOGGER.exception(msg)
        raise exceptions.Dryad2DataverseError(msg)

    if not dvpid:
        endpoint = f'{self.kwargs["dv_url"]}/api/dataverses/{targetDv}/datasets'
        upload = self.session.post(endpoint,
                                   headers=headers,
                                   json=self.dryad.dvJson)
        LOGGER.debug(upload.text)
    else:
        endpoint = f'{self.kwargs["dv_url"]}/api/datasets/:persistentId/versions/:draft'
        params = {'persistentId':dvpid}
        #Yes, dataverse uses *different* json for edits
        upload = self.session.put(endpoint, params=params,
                                  headers=headers,
                                  json=self.dryad.dvJson['datasetVersion'])
        #self._dvrecord = upload.json()
        LOGGER.debug(upload.text)

    try:
        updata = upload.json()
        self.dvStudy = updata
        if updata.get('status') != 'OK':
            msg = ('Status return is not OK.'
                    f'{upload.status_code}: '
                    f'{upload.reason}. '
                    f'{upload.request.url} '
                    f'{upload.text}')
            try:
                raise exceptions.DataverseUploadError(msg)
            except exceptions.DataverseUploadError as err:
                LOGGER.exception(err)
                LOGGER.exception(traceback.format_exc())
        upload.raise_for_status()
    except Exception as e: # Only accessible via non-requests exception
        LOGGER.exception(e)
        LOGGER.exception(traceback.format_exc())
        raise

    if targetDv:
        self.dryad.dvpid = updata['data'].get('persistentId')
    if dvpid:
        self.dryad.dvpid = updata['data'].get('datasetPersistentId')
    return self.dvpid

dryad2dataverse.monitor

Dryad/Dataverse status tracker. Monitor creates a singleton object which writes to an SQLite database. Methods will (generally) take either a dryad2dataverse.serializer.Serializer instance or a dryad2dataverse.transfer.Transfer instance as an argument.

The monitor’s primary function is to allow state checking for Dryad studies so that files and studies aren’t downloaded unnecessarily.

Monitor

The Monitor object is a tracker and database updater, so that Dryad files can be monitored and updated over time. Monitor is a singleton, but is not thread-safe.
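
Because Monitor is a singleton, constructing it again returns the first instance and silently ignores any new arguments. A minimal sketch, assuming only a 'dbase' path (illustrative) is supplied:

>>> import dryad2dataverse.monitor
>>> m1 = dryad2dataverse.monitor.Monitor(dbase='dryad_monitor.db')
>>> m2 = dryad2dataverse.monitor.Monitor(dbase='some_other.db')  # arguments ignored after first call
>>> m1 is m2
True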

Source code in src/dryad2dataverse/monitor.py
class Monitor():
    '''
    The Monitor object is a tracker and database updater, so that
    Dryad files can be monitored and updated over time. Monitor is a singleton,
    but is not thread-safe.
    '''
    def __new__(cls, *args, **kwargs):
        '''
        Creates a new singleton instance of Monitor.

        Parameters
        ----------
        *args
        **kwargs
        '''
        if not hasattr(cls, 'inst'):
            cls.inst = super().__new__(cls)
            #This ensures only the first set of kwargs (on instantiation)
            #are used.
            cls.init = 0
            cls.kwargs = kwargs
            if not cls.kwargs.get('dbase'):
                try:
                    cls.kwargs['dbase'] = args[0]
                except ValueError as e:
                    raise KeyError from e
            cls.conn = sqlite3.connect(pathlib.Path(cls.kwargs['dbase']).expanduser().absolute())
            cls.cursor = cls.conn.cursor()
            LOGGER.info('Open database %s', cls.kwargs['dbase'])
        return cls.inst

    def __init__(self, *args, **kwargs):
        '''
        Initialize singleton instance of Monitor

        Parameters
        ----------
        *args
            Positional arguments. Only the first is used
        **kwargs
            Keyword arguments. Only dbase is used, and it overwrites args[0] if present

        Notes
        -----
        Normally you would just pass a dryad2dataverse.config.Config object,
        ie. Monitor(**config)

        These keyword parameters are required at a minimum, and are included as part of a
        Config instance.
        dbase : str
            Path to dryad2dataverse monitor database
        dry_url : str
            Dryad base URL
        '''
        #pylint: disable=unused-argument
        #arguments are parsed in __new__ to make a singleton
        #but they need to be passed in __init__
        if not self.init:

            conn = sqlite3.connect(pathlib.Path(self.kwargs['dbase']).expanduser().absolute())
            cursor = conn.cursor()
            create = ['CREATE TABLE IF NOT EXISTS dryadStudy \
                       (uid INTEGER PRIMARY KEY AUTOINCREMENT, \
                       doi TEXT, lastmoddate TEXT, dryadjson TEXT, \
                       dvjson TEXT);',
                       'CREATE TABLE IF NOT EXISTS dryadFiles \
                       (dryaduid INTEGER REFERENCES dryadStudy (uid), \
                       dryfilesjson TEXT);',
                       'CREATE TABLE IF NOT EXISTS dvStudy \
                       (dryaduid INTEGER references dryadStudy (uid), \
                       dvpid TEXT);',
                       'CREATE TABLE IF NOT EXISTS dvFiles \
                       (dryaduid INTEGER references dryadStudy (uid), \
                       dryfid INT, \
                       drymd5 TEXT, dvfid TEXT, dvmd5 TEXT, \
                       dvfilejson TEXT);',
                       'CREATE TABLE IF NOT EXISTS lastcheck \
                       (checkdate TEXT);',
                       'CREATE TABLE IF NOT EXISTS failed_uploads \
                       (dryaduid INTEGER references dryadstudy (uid), \
                       dryfid INT, status TEXT);'
                      ]

            for line in create:
                cursor.execute(line)
            conn.commit()
            conn.close()
        self.init = 1

    def __del__(self):
        '''
        Commits all database transactions on object deletion and closes database.
        '''
        self.conn.commit()
        self.conn.close()

    @property
    def lastmod(self):
        '''
        Returns last modification date from monitor.dbase.
        '''
        self.cursor.execute('SELECT checkdate FROM lastcheck ORDER BY rowid DESC;')
        last_mod = self.cursor.fetchall()
        if last_mod:
            return last_mod[0][0]
        return None

    def status(self, serial)->dict:
        '''
        Returns a dictionary with keys 'status' and 'dvpid' and 'notes'.

        Parameters
        ----------
        serial :  dryad2dataverse.serializer.Serializer

        Returns
        -------
        `{status :'updated', 'dvpid':'doi://some/ident'}`.

        Notes
        ------
        `status` is one of 'new', 'identical',  'lastmodsame',
        'updated'

        'new' is a completely new file.

        'identical' The metadata from Dryad is *identical* to the last time
        the check was run.

        'lastmodsame' Dryad lastModificationDate ==  last modification date
        in database AND output JSON is different.
        This can indicate a Dryad
        API output change, reindexing or something else.
        But the lastModificationDate
        is supposed to be an indicator of meaningful change, so this option
        exists so you can decide what to do given this option

        'updated' Indicates changes to lastModificationDate

        Note that Dryad constantly changes their API output, so the changes
        may not actually be meaningful.

        `dvpid` is a Dataverse persistent identifier.
        `None` in the case of status='new'

        `notes`: value of Dryad versionChanges field. One of `files_changed` or
        `metadata_changed`. Non-null value present only when status is
        not `new` or `identical`. Note that Dryad has no way to indicate *both*
        a file and metadata change, so this value reflects only the *last* change
        in the Dryad state.
        '''
        # Last mod date is indicator of change.
        # From email w/Ryan Scherle 10 Nov 2020
        #The versionNumber updates for either a metadata change or a
        #file change. Although we save all of these changes internally, our web
        #interface only displays the versions that have file changes, along
        #with the most recent metadata. So a dataset that has only two versions
        #of files listed on the web may actually have several more versions in
        #the API.
        #
        #If your only need is to track when there are changes to a
        #dataset, you may want to use the `lastModificationDate`, which we have
        #recently added to our metadata.
        #
        #Note that the Dryad API output ISN'T STABLE; they add fields etc.
        #This means that a comparison of JSON may yield differences even though
        #metadata is technically "the same". Just comparing two dicts doesn't cut
        #it.
        #############################
        ## Note: by inspection, Dryad outputs JSON that is different
        ## EVEN IF lastModificationDate is unchanged. (14 January 2022)
        ## So now what?
        #############################
        doi = serial.dryadJson['identifier']
        self.cursor.execute('SELECT * FROM dryadStudy WHERE doi = ?',
                            (doi,))
        result = self.cursor.fetchall()

        if not result:
            return {'status': 'new', 'dvpid': None, 'notes': ''}
        # dvjson = json.loads(result[-1][4])
        # Check the fresh vs. updated jsons for the keys
        try:
            dryaduid = result[-1][0]
            self.cursor.execute('SELECT dvpid from dvStudy WHERE \
                                 dryaduid = ?', (dryaduid,))
            dvpid = self.cursor.fetchall()[-1][0]
            serial.dvpid = dvpid
        except TypeError as exc:
            LOGGER.error('Dryad DOI : %s. Error finding Dataverse PID', doi)
            LOGGER.exception(exc)
            raise exceptions.DatabaseError from exc

        newfile = copy.deepcopy(serial.dryadJson)
        testfile = copy.deepcopy(json.loads(result[-1][3]))
        if newfile == testfile:
            return {'status': 'identical', 'dvpid': dvpid, 'notes': ''}
        if newfile['lastModificationDate'] != testfile['lastModificationDate']:
            return {'status': 'updated', 'dvpid': dvpid,
                    'notes': newfile['versionChanges']}
        return {'status': 'lastmodsame', 'dvpid': dvpid,
                     'notes': newfile.get('versionChanges')}

    def diff_metadata(self, serial):
        '''
        Analyzes differences in metadata between current serializer
        instance and last updated serializer instance.

        Parameters
        ----------
        serial : dryad2dataverse.serializer.Serializer

        Returns
        -------
        Returns a list of field changes consisting of:
        [{key: (old_value, new_value)}] or None if no changes.

        Notes
        -----
        For example:
        ```
        [{'title':
        ('Cascading effects of algal warming in a freshwater community',
         'Cascading effects of algal warming in a freshwater community theatre')}
        ]
        ```
        '''
        if self.status(serial)['status'] == 'updated':
            self.cursor.execute('SELECT dryadjson from dryadStudy \
                                 WHERE doi = ?',
                                (serial.dryadJson['identifier'],))
            oldJson = json.loads(self.cursor.fetchall()[-1][0])
            out = []
            for k in serial.dryadJson:
                if serial.dryadJson[k] != oldJson.get(k):
                    out.append({k: (oldJson.get(k), serial.dryadJson[k])})
            return out

        return None

    @staticmethod
    def __added_hashes(oldFiles, newFiles):
        '''
        Checks that two objects in dryad2dataverse.serializer.files format
        stripped of digestType and digest values are identical. Returns array
        of files with changed hash.

        Assumes name, mimeType, size, descr all unchanged, which is not
        necessarily a valid assumption

        Parameters
        ----------
        oldFiles : Union[list, tuple]
            (name, mimeType, size, descr, digestType, digest)

        newFiles : Union[list, tuple]
            (name, mimeType, size, descr, digestType, digest)
        '''
        hash_change = []
        old = [x[1:-2] for x in oldFiles]
        #URLs are not permanent
        old_no_url = [x[1:] for x in oldFiles]
        for fi in newFiles:
            if fi[1:-2] in old and fi[1:] not in old_no_url:
                hash_change.append(fi)
        return hash_change


    def diff_files(self, serial):
        '''
        Returns a dict with additions and deletions from previous Dryad
        to dataverse upload.

        Because checksums are not necessarily included in Dryad file
        metadata, this method uses dryad file IDs, size, or
        whatever is available.

        If dryad2dataverse.monitor.Monitor.status()
        indicates a change it will produce dictionary output with a list
        of additions, deletions or hash changes (ie, identical
        except for hash changes), as below:

        `{'add': [dryadfiletuples], 'delete': [dryadfiletuples],
          'hash_change': [dryadfiletuples]}`

        Parameters
        ----------
        serial : dryad2dataverse.serializer.Serializer
        '''
        #pylint: disable=too-many-locals

        diffReport = {}
        if self.status(serial)['status'] == 'new':
            #do we want to show what needs to be added?
            return {'add': serial.files}
            #return {}
        self.cursor.execute('SELECT uid from dryadStudy WHERE doi = ?',
                            (serial.doi,))
        mostRecent = self.cursor.fetchall()[-1][0]
        self.cursor.execute('SELECT dryfilesjson from dryadFiles WHERE \
                             dryaduid = ?', (mostRecent,))
        oldFileList = self.cursor.fetchall()[-1][0]
        if not oldFileList:
            oldFileList = []
        else:
            out = []
            #With Dryad API change, files are paginated
            #now stored as list
            for old in json.loads(oldFileList):
            #for old in oldFileList:
                oldFiles = old['_embedded'].get('stash:files')
                # comparing file tuples from dryad2dataverse.serializer.
                # Maybe JSON is better?
                # because of code duplication below.
                for f in oldFiles:
                    #Download links are not persistent. Be warned
                    try:
                        downLink = f['_links']['stash:file-download']['href']
                    except KeyError:
                        downLink = f['_links']['stash:download']['href']
                    downLink = f'{self.kwargs.get("dry_url", "https://datadryad.org")}{downLink}'
                    name = f['path']
                    mimeType = f['mimeType']
                    size = f['size']
                    descr = f.get('description', '')
                    digestType = f.get('digestType', '')
                    digest = f.get('digest', '')
                    out.append((downLink, name, mimeType, size, descr, digestType, digest))
                oldFiles = out
        newFiles = serial.files[:]
        # Tests go here
        #Check for identity first
        #if returned here there are definitely no changes
        if (set(oldFiles).issuperset(set(newFiles)) and
                set(newFiles).issuperset(oldFiles)):
            return diffReport
        #filenames for checking hash changes.
        #Can't use URL or hashes for comparisons because they can change
        #without warning, despite the fact that the API says that
        #file IDs are unique. They aren't. Verified by Ryan Scherle at
        #Dryad December 2021
        old_map = {x:{'orig':y, 'no_hash':y[1:4]} for x,y in enumerate(oldFiles)}
        new_map = {x:{'orig':y, 'no_hash':y[1:4]} for x,y in enumerate(newFiles)}
        old_no_hash = [old_map[x]['no_hash'] for x in old_map]
        new_no_hash = [new_map[x]['no_hash'] for x in new_map]

        #check for added hash only
        hash_change = Monitor.__added_hashes(oldFiles, newFiles)

        must = set(old_no_hash).issuperset(set(new_no_hash))
        if not must:
            needsadd = set(new_no_hash) - (set(old_no_hash) & set(new_no_hash))
            #Use the map created above to return the full file info
            diffReport.update({'add': [new_map[new_no_hash.index(x)]['orig']
                                       for x in needsadd]})
        must = set(new_no_hash).issuperset(old_no_hash)
        if not must:
            needsdel = set(old_no_hash) - (set(new_no_hash) & set(old_no_hash))
            diffReport.update({'delete' : [old_map[old_no_hash.index(x)]['orig']
                                           for x in needsdel]})
        if hash_change:
            diffReport.update({'hash_change': hash_change})
        return diffReport

    def get_dv_fid(self, url):
        '''
        Returns str — the Dataverse file ID from parsing a Dryad
        file download link.  Normally used for determining dataverse
        file ids for *deletion* in case of dryad file changes.

        Parameters
        ----------
        url : str
            *Dryad* file URL in form of
            'https://datadryad.org/api/v2/files/385819/download'.
        '''
        fid = url[url.rfind('/', 0, -10)+1:].strip('/download')
        try:
            fid = int(fid)
        except ValueError as e:
            LOGGER.error('File ID %s is not an integer', fid)
            LOGGER.exception(e)
            raise

        #File IDs are *CHANGEABLE* according to Dryad, Dec 2021
        #SQLite default returns are by ROWID ASC, so the last record
        #returned should still be the correct, ie. most recent, one.
        #However, just in case, this is now done explicitly.
        self.cursor.execute('SELECT dvfid, ROWID FROM dvFiles WHERE \
                             dryfid = ? ORDER BY ROWID ASC;', (fid,))
        dvfid = self.cursor.fetchall()
        if dvfid:
            return dvfid[-1][0]
        return None

    def get_dv_fids(self, filelist):
        '''
        Returns Dataverse file IDs from a list of Dryad file tuples.
        Generally, you would use the output from
        dryad2dataverse.monitor.Monitor.diff_files['delete']
        to discover Dataverse file ids for deletion.

        Parameters
        ----------
        filelist : list
            List of Dryad file tuples: eg:

            ```
            [('https://datadryad.org/api/v2/files/385819/download',
              'GCB_ACG_Mortality_2020.zip',
              'application/x-zip-compressed', 23787587),
             ('https://datadryad.org/api/v2/files/385820/download',
             'Readme_ACG_Mortality.txt',
             'text/plain', 1350)]
             ```
        '''
        fids = []
        for f in filelist:
            fids.append(self.get_dv_fid(f[0]))
        return fids
        # return [self.get_dv_fid(f[0]) for f in filelist]

    def get_json_dvfids(self, serial)->list:
        '''
        Return a list of Dataverse file ids for Dryad JSONs which were
        uploaded to Dataverse.
        Normally used to discover the file IDs to remove Dryad JSONs
        which have changed.

        Parameters
        ----------
        serial : dryad2dataverse.serializer.Serializer

        Returns
        -------
        list
        '''
        self.cursor.execute('SELECT max(uid) FROM dryadStudy WHERE doi=?',
                            (serial.doi,))
        try:
            uid = self.cursor.fetchone()[0]
            self.cursor.execute('SELECT dvfid FROM dvFiles WHERE \
                                 dryaduid = ? AND dryfid=?', (uid, 0))
            jsonfid = [f[0] for f in self.cursor.fetchall()]
            return jsonfid

        except TypeError:
            return []

    def update(self, transfer):
        '''
        Updates the Monitor database with information from a
        dryad2dataverse.transfer.Transfer instance.

        If a Dryad primary metadata record has changes, it will be
        deleted from the database.

        This method should be called after all transfers are completed,
        including Dryad JSON updates, as the last action for transfer.

        Parameters
        ----------
        transfer : dryad2dataverse.transfer.Transfer
        '''
        #pylint: disable=too-many-branches, too-many-statements, too-many-locals
        # get the pre-update dryad uid in case we need it.
        self.cursor.execute('SELECT max(uid) FROM dryadStudy WHERE doi = ?',
                            (transfer.dryad.dryadJson['identifier'],))
        olduid = self.cursor.fetchone()[0]
        if olduid:
            olduid = int(olduid)
        if self.status(transfer.dryad)['status'] != 'unchanged':
            doi = transfer.doi
            lastmod = transfer.dryad.dryadJson.get('lastModificationDate')
            dryadJson = json.dumps(transfer.dryad.dryadJson)
            dvJson = json.dumps(transfer.dvStudy)

            # Update study metadata
            self.cursor.execute('INSERT INTO dryadStudy \
                                 (doi, lastmoddate, dryadjson, dvjson) \
                                 VALUES (?, ?, ?, ?)',
                                (doi, lastmod, dryadJson, dvJson))
            self.cursor.execute('SELECT max(uid) FROM dryadStudy WHERE \
                                 doi = ?', (doi,))
            dryaduid = self.cursor.fetchone()[0]
            #if type(dryaduid) != int:
            if not isinstance(dryaduid, int):
                try:
                    raise TypeError('Dryad UID is not an integer')
                except TypeError as e:
                    LOGGER.error(e)
                    raise

            # Update dryad file json
            self.cursor.execute('INSERT INTO dryadFiles VALUES (?, ?)',
                                (dryaduid,
                                 json.dumps(transfer.dryad.fileJson)))
            # Update dataverse study map
            self.cursor.execute('SELECT dvpid FROM dvStudy WHERE \
                                 dvpid = ?', (transfer.dryad.dvpid,))
            if not self.cursor.fetchone():
                self.cursor.execute('INSERT INTO dvStudy VALUES (?, ?)',
                                    (dryaduid, transfer.dryad.dvpid))
            else:
                self.cursor.execute('UPDATE dvStudy SET dryaduid=?, \
                                     dvpid=? WHERE dvpid =?',
                                    (dryaduid, transfer.dryad.dvpid,
                                     transfer.dryad.dvpid))

            # Update the files table
            # Because we want to have a *complete* file list for each
            # dryaduid, we have to copy any existing old files,
            # then add and delete.
            if olduid:
                self.cursor.execute('SELECT * FROM dvFiles WHERE \
                                     dryaduid=?', (olduid,))
                inserter = self.cursor.fetchall()
                for rec in inserter:
                    # TODONE FIX THIS #I think it's fixed 11 Feb 21
                    self.cursor.execute('INSERT INTO dvFiles VALUES \
                                         (?, ?, ?, ?, ?, ?)',
                                        (dryaduid, rec[1], rec[2],
                                         rec[3], rec[4], rec[5]))
            # insert newly uploaded files
            for rec in transfer.fileUpRecord:
                try:
                    dvfid = rec[1]['data']['files'][0]['dataFile']['id']
                    # Screw you for burying the file ID this deep
                    recMd5 = rec[1]['data']['files'][0]['dataFile']['checksum']['value']
                except (KeyError, IndexError) as err:
                    #write to failed uploads table instead
                    status = rec[1].get('status')
                    if not status:
                        LOGGER.error('JSON read error for Dryad file ID %s', rec[0])
                        LOGGER.error('File %s for DOI %s may not be uploaded', rec[0], transfer.doi)
                        LOGGER.exception(err)
                        msg = {'status': 'Failure: Other non-specific '
                                         'failure. Check logs'}

                        self.cursor.execute('INSERT INTO failed_uploads VALUES \
                                        (?, ?, ?);', (dryaduid, rec[0], json.dumps(msg)))
                        continue
                    self.cursor.execute('INSERT INTO failed_uploads VALUES \
                                        (?, ?, ?);', (dryaduid, rec[0], json.dumps(rec[1])))
                    LOGGER.warning(type(err))
                    LOGGER.warning('%s. DOI %s, File ID %s',
                                   rec[1].get('status'),
                                   transfer.doi, rec[0])
                    continue
                # md5s verified during upload step, so they should
                # match already
                self.cursor.execute('INSERT INTO dvFiles VALUES \
                                     (?, ?, ?, ?, ?, ?)',
                                    (dryaduid, rec[0], recMd5,
                                     dvfid, recMd5, json.dumps(rec[1])))

            # Now the deleted files
            for rec in transfer.fileDelRecord:
                # fileDelRecord consists only of [fid,fid2, ...]
                # Dryad record ID is int not str
                self.cursor.execute('DELETE FROM dvFiles WHERE dvfid=? \
                                     AND dryaduid=?',
                                    (int(rec), dryaduid))
                LOGGER.debug('deleted dryfid = %s, dryaduid = %s', rec, dryaduid)

            # And lastly, any JSON metadata updates:
            # NOW WHAT?
            # JSON has dryfid==0
            self.cursor.execute('SELECT * FROM dvfiles WHERE \
                                 dryfid=? and dryaduid=?',
                                (0, dryaduid))
            try:
                exists = self.cursor.fetchone()[0]
                # Old metadata must be deleted on a change.
                if exists:
                    shouldDel = self.status(transfer.dryad)['status']
                    if shouldDel == 'updated':
                        self.cursor.execute('DELETE FROM dvfiles WHERE \
                                             dryfid=? and dryaduid=?',
                                            (0, dryaduid))
            except TypeError:
                pass

            if transfer.jsonFlag:
                # update dryad JSON
                djson5 = transfer.jsonFlag[1]['data']['files'][0]['dataFile']['checksum']['value']
                dfid = transfer.jsonFlag[1]['data']['files'][0]['dataFile']['id']
                self.cursor.execute('INSERT INTO dvfiles VALUES \
                                     (?, ?, ?, ?, ?, ?)',
                                    (dryaduid, 0, djson5, dfid,
                                     djson5, json.dumps(transfer.jsonFlag[1])))

        self.conn.commit()

    def set_timestamp(self, curdate=None):
        '''
        Adds current time to the database table. Can be queried and be used
        for subsequent checking for updates. To query last modification time,
        use the dryad2dataverse.monitor.Monitor.lastmod attribute.

        Parameters
        ----------
        curdate : str
            UTC datetime string in the format suitable for the Dryad API.
            eg. 2021-01-21T21:42:40Z
               or .strftime('%Y-%m-%dT%H:%M:%SZ').
        '''
        #Dryad API uses Zulu time
        if not curdate:
            curdate = datetime.datetime.now(datetime.timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
        self.cursor.execute('INSERT INTO lastcheck VALUES (?)',
                            (curdate,))
        self.conn.commit()

lastmod property

Returns last modification date from monitor.dbase.
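
A minimal sketch, assuming `monitor` is an existing Monitor instance; the value is whatever set_timestamp() last recorded, or None if no timestamp has been stored:

>>> monitor.lastmod
'2021-01-21T21:42:40Z'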

__del__()

Commits all database transactions on object deletion and closes database.

Source code in src/dryad2dataverse/monitor.py
def __del__(self):
    '''
    Commits all database transactions on object deletion and closes database.
    '''
    self.conn.commit()
    self.conn.close()

__init__(*args, **kwargs)

Initialize singleton instance of Monitor

Parameters:
  • *args

    Positional arguments. Only the first is used

  • **kwargs

    Keyword arguments. Only dbase is used, and it overwrites args[0] if present

Notes

Normally you would just pass a dryad2dataverse.config.Config object, ie. Monitor(**config)

These keyword parameters are required at a minimum, and are included as part of a Config instance.

  • dbase (str) –

    Path to dryad2dataverse monitor database

  • dry_url (str) –

    Dryad base URL

Source code in src/dryad2dataverse/monitor.py
def __init__(self, *args, **kwargs):
    '''
    Initialize singleton instance of Monitor

    Parameters
    ----------
    *args
        Positional arguments. Only the first is used
    **kwargs
        Keyword arguments. Only dbase is used, and it overwrites args[0] if present

    Notes
    -----
    Normally you would just pass a dryad2dataverse.config.Config object,
    ie. Monitor(**config)

    These keyword parameters are required at a minimum, and are included as part of a
    Config instance.
    dbase : str
        Path to dryad2dataverse monitor database
    dry_url : str
        Dryad base URL
    '''
    #pylint: disable=unused-argument
    #arguments are parsed in __new__ to make a singleton
    #but they need to be passed in __init__
    if not self.init:

        conn = sqlite3.connect(pathlib.Path(self.kwargs['dbase']).expanduser().absolute())
        cursor = conn.cursor()
        create = ['CREATE TABLE IF NOT EXISTS dryadStudy \
                   (uid INTEGER PRIMARY KEY AUTOINCREMENT, \
                   doi TEXT, lastmoddate TEXT, dryadjson TEXT, \
                   dvjson TEXT);',
                   'CREATE TABLE IF NOT EXISTS dryadFiles \
                   (dryaduid INTEGER REFERENCES dryadStudy (uid), \
                   dryfilesjson TEXT);',
                   'CREATE TABLE IF NOT EXISTS dvStudy \
                   (dryaduid INTEGER references dryadStudy (uid), \
                   dvpid TEXT);',
                   'CREATE TABLE IF NOT EXISTS dvFiles \
                   (dryaduid INTEGER references dryadStudy (uid), \
                   dryfid INT, \
                   drymd5 TEXT, dvfid TEXT, dvmd5 TEXT, \
                   dvfilejson TEXT);',
                   'CREATE TABLE IF NOT EXISTS lastcheck \
                   (checkdate TEXT);',
                   'CREATE TABLE IF NOT EXISTS failed_uploads \
                   (dryaduid INTEGER references dryadstudy (uid), \
                   dryfid INT, status TEXT);'
                  ]

        for line in create:
            cursor.execute(line)
        conn.commit()
        conn.close()
    self.init = 1

__new__(*args, **kwargs)

Creates a new singleton instance of Monitor.

Parameters:
  • *args
  • **kwargs
Source code in src/dryad2dataverse/monitor.py
def __new__(cls, *args, **kwargs):
    '''
    Creates a new singleton instance of Monitor.

    Parameters
    ----------
    *args
    **kwargs
    '''
    if not hasattr(cls, 'inst'):
        cls.inst = super().__new__(cls)
        #This ensures only the first set of kwargs (on instantiation)
        #are used.
        cls.init = 0
        cls.kwargs = kwargs
        if not cls.kwargs.get('dbase'):
            try:
                cls.kwargs['dbase'] = args[0]
            except ValueError as e:
                raise KeyError from e
        cls.conn = sqlite3.connect(pathlib.Path(cls.kwargs['dbase']).expanduser().absolute())
        cls.cursor = cls.conn.cursor()
        LOGGER.info('Open database %s', cls.kwargs['dbase'])
    return cls.inst

diff_files(serial)

Returns a dict with additions and deletions from previous Dryad to dataverse upload.

Because checksums are not necessarily included in Dryad file metadata, this method uses dryad file IDs, size, or whatever is available.

If dryad2dataverse.monitor.Monitor.status() indicates a change it will produce dictionary output with a list of additions, deletions or hash changes (ie, identical except for hash changes), as below:

{'add': [dryadfiletuples], 'delete': [dryadfiletuples], 'hash_change': [dryadfiletuples]}

Parameters:
  • serial (dryad2dataverse.serializer.Serializer) –
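
A minimal sketch, assuming `monitor` is a Monitor instance and `serial` is a dryad2dataverse.serializer.Serializer for a study already recorded in the database (the file tuple shown is illustrative):

>>> changes = monitor.diff_files(serial)
>>> changes.get('add', [])
[('https://datadryad.org/api/v2/files/385820/download',
  'Readme_ACG_Mortality.txt', 'text/plain', 1350, '', '', '')]
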
Source code in src/dryad2dataverse/monitor.py
def diff_files(self, serial):
    '''
    Returns a dict with additions and deletions from previous Dryad
    to dataverse upload.

    Because checksums are not necessarily included in Dryad file
    metadata, this method uses dryad file IDs, size, or
    whatever is available.

    If dryad2dataverse.monitor.Monitor.status()
    indicates a change it will produce dictionary output with a list
    of additions, deletions or hash changes (ie, identical
    except for hash changes), as below:

    `{'add': [dryadfiletuples], 'delete': [dryadfiletuples],
      'hash_change': [dryadfiletuples]}`

    Parameters
    ----------
    serial : dryad2dataverse.serializer.Serializer
    '''
    #pylint: disable=too-many-locals

    diffReport = {}
    if self.status(serial)['status'] == 'new':
        #do we want to show what needs to be added?
        return {'add': serial.files}
        #return {}
    self.cursor.execute('SELECT uid from dryadStudy WHERE doi = ?',
                        (serial.doi,))
    mostRecent = self.cursor.fetchall()[-1][0]
    self.cursor.execute('SELECT dryfilesjson from dryadFiles WHERE \
                         dryaduid = ?', (mostRecent,))
    oldFileList = self.cursor.fetchall()[-1][0]
    if not oldFileList:
        oldFileList = []
    else:
        out = []
        #With Dryad API change, files are paginated
        #now stored as list
        for old in json.loads(oldFileList):
        #for old in oldFileList:
            oldFiles = old['_embedded'].get('stash:files')
            # comparing file tuples from dryad2dataverse.serializer.
            # Maybe JSON is better?
            # because of code duplication below.
            for f in oldFiles:
                #Download links are not persistent. Be warned
                try:
                    downLink = f['_links']['stash:file-download']['href']
                except KeyError:
                    downLink = f['_links']['stash:download']['href']
                downLink = f'{self.kwargs.get("dry_url", "https://datadryad.org")}{downLink}'
                name = f['path']
                mimeType = f['mimeType']
                size = f['size']
                descr = f.get('description', '')
                digestType = f.get('digestType', '')
                digest = f.get('digest', '')
                out.append((downLink, name, mimeType, size, descr, digestType, digest))
            oldFiles = out
    newFiles = serial.files[:]
    # Tests go here
    #Check for identity first
    #if returned here there are definitely no changes
    if (set(oldFiles).issuperset(set(newFiles)) and
            set(newFiles).issuperset(oldFiles)):
        return diffReport
    #filenames for checking hash changes.
    #Can't use URL or hashes for comparisons because they can change
    #without warning, despite the fact that the API says that
    #file IDs are unique. They aren't. Verified by Ryan Scherle at
    #Dryad December 2021
    old_map = {x:{'orig':y, 'no_hash':y[1:4]} for x,y in enumerate(oldFiles)}
    new_map = {x:{'orig':y, 'no_hash':y[1:4]} for x,y in enumerate(newFiles)}
    old_no_hash = [old_map[x]['no_hash'] for x in old_map]
    new_no_hash = [new_map[x]['no_hash'] for x in new_map]

    #check for added hash only
    hash_change = Monitor.__added_hashes(oldFiles, newFiles)

    must = set(old_no_hash).issuperset(set(new_no_hash))
    if not must:
        needsadd = set(new_no_hash) - (set(old_no_hash) & set(new_no_hash))
        #Use the map created above to return the full file info
        diffReport.update({'add': [new_map[new_no_hash.index(x)]['orig']
                                   for x in needsadd]})
    must = set(new_no_hash).issuperset(old_no_hash)
    if not must:
        needsdel = set(old_no_hash) - (set(new_no_hash) & set(old_no_hash))
        diffReport.update({'delete' : [old_map[old_no_hash.index(x)]['orig']
                                       for x in needsdel]})
    if hash_change:
        diffReport.update({'hash_change': hash_change})
    return diffReport

diff_metadata(serial)

Analyzes differences in metadata between current serializer instance and last updated serializer instance.

Parameters:
  • serial (dryad2dataverse.serializer.Serializer) –

Returns:
  • A list of field changes consisting of [{key: (old_value, new_value)}], or None if no changes.
Notes

For example:

[{'title':
('Cascading effects of algal warming in a freshwater community',
 'Cascading effects of algal warming in a freshwater community theatre')}
]
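
A minimal sketch, assuming `monitor` and `serial` as above; the result is None unless status() reports 'updated':

>>> delta = monitor.diff_metadata(serial)
>>> if delta:
...     changed_fields = [list(d)[0] for d in delta]
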
Source code in src/dryad2dataverse/monitor.py
def diff_metadata(self, serial):
    '''
    Analyzes differences in metadata between current serializer
    instance and last updated serializer instance.

    Parameters
    ----------
    serial : dryad2dataverse.serializer.Serializer

    Returns
    -------
    Returns a list of field changes consisting of:
    [{key: (old_value, new_value)}] or None if no changes.

    Notes
    -----
    For example:
    ```
    [{'title':
    ('Cascading effects of algal warming in a freshwater community',
     'Cascading effects of algal warming in a freshwater community theatre')}
    ]
    ```
    '''
    if self.status(serial)['status'] == 'updated':
        self.cursor.execute('SELECT dryadjson from dryadStudy \
                             WHERE doi = ?',
                            (serial.dryadJson['identifier'],))
        oldJson = json.loads(self.cursor.fetchall()[-1][0])
        out = []
        for k in serial.dryadJson:
            if serial.dryadJson[k] != oldJson.get(k):
                out.append({k: (oldJson.get(k), serial.dryadJson[k])})
        return out

    return None

get_dv_fid(url)

Returns str — the Dataverse file ID from parsing a Dryad file download link. Normally used for determining dataverse file ids for deletion in case of dryad file changes.

Parameters:
  • url (str) –

    Dryad file URL in form of ‘https://datadryad.org/api/v2/files/385819/download’.
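
A minimal sketch, assuming `monitor` is a Monitor instance that has previously recorded this Dryad file; the returned Dataverse file id is illustrative:

>>> monitor.get_dv_fid('https://datadryad.org/api/v2/files/385819/download')
'12345'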

Source code in src/dryad2dataverse/monitor.py
def get_dv_fid(self, url):
    '''
    Returns str — the Dataverse file ID from parsing a Dryad
    file download link.  Normally used for determining dataverse
    file ids for *deletion* in case of dryad file changes.

    Parameters
    ----------
    url : str
        *Dryad* file URL in form of
        'https://datadryad.org/api/v2/files/385819/download'.
    '''
    fid = url[url.rfind('/', 0, -10)+1:].strip('/download')
    try:
        fid = int(fid)
    except ValueError as e:
        LOGGER.error('File ID %s is not an integer', fid)
        LOGGER.exception(e)
        raise

    #File IDs are *CHANGEABLE* according to Dryad, Dec 2021
    #SQLite default returns are by ROWID ASC, so the last record
    #returned should still be the correct, ie. most recent, one.
    #However, just in case, this is now done explicitly.
    self.cursor.execute('SELECT dvfid, ROWID FROM dvFiles WHERE \
                         dryfid = ? ORDER BY ROWID ASC;', (fid,))
    dvfid = self.cursor.fetchall()
    if dvfid:
        return dvfid[-1][0]
    return None

get_dv_fids(filelist)

Returns Dataverse file IDs from a list of Dryad file tuples. Generally, you would use the output from dryad2dataverse.monitor.Monitor.diff_files[‘delete’] to discover Dataverse file ids for deletion.

Parameters:
  • filelist (list) –

    List of Dryad file tuples: eg:

    [('https://datadryad.org/api/v2/files/385819/download', 'GCB_ACG_Mortality_2020.zip', 'application/x-zip-compressed', 23787587), ('https://datadryad.org/api/v2/files/385820/download', 'Readme_ACG_Mortality.txt', 'text/plain', 1350)]
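
A minimal sketch of the usual workflow, assuming `monitor` and `serial` as above; deleted Dryad files found by diff_files() are mapped to their Dataverse file ids:

>>> changes = monitor.diff_files(serial)
>>> doomed = monitor.get_dv_fids(changes.get('delete', []))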

Source code in src/dryad2dataverse/monitor.py
def get_dv_fids(self, filelist):
    '''
    Returns Dataverse file IDs from a list of Dryad file tuples.
    Generally, you would use the output from
    dryad2dataverse.monitor.Monitor.diff_files['delete']
    to discover Dataverse file ids for deletion.

    Parameters
    ----------
    filelist : list
        List of Dryad file tuples: eg:

        ```
        [('https://datadryad.org/api/v2/files/385819/download',
          'GCB_ACG_Mortality_2020.zip',
          'application/x-zip-compressed', 23787587),
         ('https://datadryad.org/api/v2/files/385820/download',
         'Readme_ACG_Mortality.txt',
         'text/plain', 1350)]
         ```
    '''
    fids = []
    for f in filelist:
        fids.append(self.get_dv_fid(f[0]))
    return fids

get_json_dvfids(serial)

Return a list of Dataverse file ids for Dryad JSONs which were uploaded to Dataverse. Normally used to discover the file IDs to remove Dryad JSONs which have changed.

Parameters:
  • serial (dryad2dataverse.serializer.Serializer) –

Returns:
  • list
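
A minimal sketch, assuming `monitor` and `serial` as above:

>>> json_fids = monitor.get_json_dvfids(serial)
>>> # Delete these from Dataverse before uploading a refreshed Dryad JSON
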
Source code in src/dryad2dataverse/monitor.py
def get_json_dvfids(self, serial)->list:
    '''
    Return a list of Dataverse file ids for Dryad JSONs which were
    uploaded to Dataverse.
    Normally used to discover the file IDs to remove Dryad JSONs
    which have changed.

    Parameters
    ----------
    serial : dryad2dataverse.serializer.Serializer

    Returns
    -------
    list
    '''
    self.cursor.execute('SELECT max(uid) FROM dryadStudy WHERE doi=?',
                        (serial.doi,))
    try:
        uid = self.cursor.fetchone()[0]
        self.cursor.execute('SELECT dvfid FROM dvFiles WHERE \
                             dryaduid = ? AND dryfid=?', (uid, 0))
        jsonfid = [f[0] for f in self.cursor.fetchall()]
        return jsonfid

    except TypeError:
        return []

set_timestamp(curdate=None)

Adds the current time to the database table. It can be queried and used for subsequent update checks. To query the last modification time, use the dryad2dataverse.monitor.Monitor.lastmod attribute.

Parameters:
  • curdate (str, default: None ) –

    UTC datetime string in the format suitable for the Dryad API. eg. 2021-01-21T21:42:40Z or .strftime(‘%Y-%m-%dT%H:%M:%SZ’).
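
A minimal sketch; passing an explicit UTC timestamp is optional and the value shown is illustrative:

>>> monitor.set_timestamp('2021-01-21T21:42:40Z')
>>> monitor.lastmod
'2021-01-21T21:42:40Z'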

Source code in src/dryad2dataverse/monitor.py
def set_timestamp(self, curdate=None):
    '''
    Adds current time to the database table. Can be queried and be used
    for subsequent checking for updates. To query last modification time,
    use the dryad2dataverse.monitor.Monitor.lastmod attribute.

    Parameters
    ----------
    curdate : str
        UTC datetime string in the format suitable for the Dryad API.
        eg. 2021-01-21T21:42:40Z
           or .strftime('%Y-%m-%dT%H:%M:%SZ').
    '''
    #Dryad API uses Zulu time
    if not curdate:
        curdate = datetime.datetime.now(datetime.timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
    self.cursor.execute('INSERT INTO lastcheck VALUES (?)',
                        (curdate,))
    self.conn.commit()

status(serial)

Returns a dictionary with keys ‘status’ and ‘dvpid’ and ‘notes’.

Parameters:
  • serial ( dryad2dataverse.serializer.Serializer) –
Returns:
  • `{'status': 'updated', 'dvpid': 'doi://some/ident'}`.
Notes

status is one of ‘new’, ‘identical’, ‘lastmodsame’, ‘updated’

‘new’ is a completely new file.

‘identical’ The metadata from Dryad is identical to the last time the check was run.

‘lastmodsame’ Dryad lastModificationDate == last modification date in database AND output JSON is different. This can indicate a Dryad API output change, reindexing or something else. But because lastModificationDate is supposed to be an indicator of meaningful change, this status exists so you can decide what to do in this case.

‘updated’ Indicates changes to lastModificationDate

Note that Dryad constantly changes their API output, so the changes may not actually be meaningful.

dvpid is a Dataverse persistent identifier. None in the case of status=’new’

notes: value of Dryad versionChanges field. One of files_changed or metadata_changed. Non-null value present only when status is not new or identical. Note that Dryad has no way to indicate both a file and metadata change, so this value reflects only the last change in the Dryad state.
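
A minimal sketch, assuming `monitor` is a Monitor instance and `serial` is a Serializer for a previously transferred study whose files have changed (the persistent ID is illustrative):

>>> monitor.status(serial)
{'status': 'updated', 'dvpid': 'doi:10.12345/FK2/EXAMPLE', 'notes': 'files_changed'}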

Source code in src/dryad2dataverse/monitor.py
def status(self, serial)->dict:
    '''
    Returns a dictionary with keys 'status' and 'dvpid' and 'notes'.

    Parameters
    ----------
    serial :  dryad2dataverse.serializer.Serializer

    Returns
    -------
    `{status :'updated', 'dvpid':'doi://some/ident'}`.

    Notes
    ------
    `status` is one of 'new', 'identical',  'lastmodsame',
    'updated'

    'new' is a completely new file.

    'identical' The metadata from Dryad is *identical* to the last time
    the check was run.

    'lastmodsame' Dryad lastModificationDate ==  last modification date
    in database AND output JSON is different.
    This can indicate a Dryad
    API output change, reindexing or something else.
    But the lastModificationDate
    is supposed to be an indicator of meaningful change, so this option
    exists so you can decide what to do given this option

    'updated' Indicates changes to lastModificationDate

    Note that Dryad constantly changes their API output, so the changes
    may not actually be meaningful.

    `dvpid` is a Dataverse persistent identifier.
    `None` in the case of status='new'

    `notes`: value of Dryad versionChanges field. One of `files_changed` or
    `metadata_changed`. Non-null value present only when status is
    not `new` or `identical`. Note that Dryad has no way to indicate *both*
    a file and metadata change, so this value reflects only the *last* change
    in the Dryad state.
    '''
    # Last mod date is indicator of change.
    # From email w/Ryan Scherle 10 Nov 2020
    #The versionNumber updates for either a metadata change or a
    #file change. Although we save all of these changes internally, our web
    #interface only displays the versions that have file changes, along
    #with the most recent metadata. So a dataset that has only two versions
    #of files listed on the web may actually have several more versions in
    #the API.
    #
    #If your only need is to track when there are changes to a
    #dataset, you may want to use the `lastModificationDate`, which we have
    #recently added to our metadata.
    #
    #Note that the Dryad API output ISN'T STABLE; they add fields etc.
    #This means that a comparison of JSON may yield differences even though
    #metadata is technically "the same". Just comparing two dicts doesn't cut
    #it.
    #############################
    ## Note: by inspection, Dryad outputs JSON that is different
    ## EVEN IF lastModificationDate is unchanged. (14 January 2022)
    ## So now what?
    #############################
    doi = serial.dryadJson['identifier']
    self.cursor.execute('SELECT * FROM dryadStudy WHERE doi = ?',
                        (doi,))
    result = self.cursor.fetchall()

    if not result:
        return {'status': 'new', 'dvpid': None, 'notes': ''}
    # dvjson = json.loads(result[-1][4])
    # Check the fresh vs. updated jsons for the keys
    try:
        dryaduid = result[-1][0]
        self.cursor.execute('SELECT dvpid from dvStudy WHERE \
                             dryaduid = ?', (dryaduid,))
        dvpid = self.cursor.fetchall()[-1][0]
        serial.dvpid = dvpid
    except TypeError as exc:
        LOGGER.error('Dryad DOI : %s. Error finding Dataverse PID', doi)
        LOGGER.exception(exc)
        raise exceptions.DatabaseError from exc

    newfile = copy.deepcopy(serial.dryadJson)
    testfile = copy.deepcopy(json.loads(result[-1][3]))
    if newfile == testfile:
        return {'status': 'identical', 'dvpid': dvpid, 'notes': ''}
    if newfile['lastModificationDate'] != testfile['lastModificationDate']:
        return {'status': 'updated', 'dvpid': dvpid,
                'notes': newfile['versionChanges']}
    return {'status': 'lastmodsame', 'dvpid': dvpid,
            'notes': newfile.get('versionChanges')}

update(transfer)

Updates the Monitor database with information from a dryad2dataverse.transfer.Transfer instance.

If a Dryad primary metadata record has changes, it will be deleted from the database.

This method should be called after all transfers are completed, including Dryad JSON updates, as the final step of the transfer.

Parameters:
  • transfer (Transfer) –
Source code in src/dryad2dataverse/monitor.py
def update(self, transfer):
    '''
    Updates the Monitor database with information from a
    dryad2dataverse.transfer.Transfer instance.

    If a Dryad primary metadata record has changes, it will be
    deleted from the database.

    This method should be called after all transfers are completed,
    including Dryad JSON updates, as the last action for transfer.

    Parameters
    ----------
    transfer : dryad2dataverse.transfer.Transfer
    '''
    #pylint: disable=too-many-branches, too-many-statements, too-many-locals
    # get the pre-update dryad uid in case we need it.
    self.cursor.execute('SELECT max(uid) FROM dryadStudy WHERE doi = ?',
                        (transfer.dryad.dryadJson['identifier'],))
    olduid = self.cursor.fetchone()[0]
    if olduid:
        olduid = int(olduid)
    if self.status(transfer.dryad)['status'] != 'unchanged':
        doi = transfer.doi
        lastmod = transfer.dryad.dryadJson.get('lastModificationDate')
        dryadJson = json.dumps(transfer.dryad.dryadJson)
        dvJson = json.dumps(transfer.dvStudy)

        # Update study metadata
        self.cursor.execute('INSERT INTO dryadStudy \
                             (doi, lastmoddate, dryadjson, dvjson) \
                             VALUES (?, ?, ?, ?)',
                            (doi, lastmod, dryadJson, dvJson))
        self.cursor.execute('SELECT max(uid) FROM dryadStudy WHERE \
                             doi = ?', (doi,))
        dryaduid = self.cursor.fetchone()[0]
        #if type(dryaduid) != int:
        if not isinstance(dryaduid, int):
            try:
                raise TypeError('Dryad UID is not an integer')
            except TypeError as e:
                LOGGER.error(e)
                raise

        # Update dryad file json
        self.cursor.execute('INSERT INTO dryadFiles VALUES (?, ?)',
                            (dryaduid,
                             json.dumps(transfer.dryad.fileJson)))
        # Update dataverse study map
        self.cursor.execute('SELECT dvpid FROM dvStudy WHERE \
                             dvpid = ?', (transfer.dryad.dvpid,))
        if not self.cursor.fetchone():
            self.cursor.execute('INSERT INTO dvStudy VALUES (?, ?)',
                                (dryaduid, transfer.dryad.dvpid))
        else:
            self.cursor.execute('UPDATE dvStudy SET dryaduid=?, \
                                 dvpid=? WHERE dvpid =?',
                                (dryaduid, transfer.dryad.dvpid,
                                 transfer.dryad.dvpid))

        # Update the files table
        # Because we want to have a *complete* file list for each
        # dryaduid, we have to copy any existing old files,
        # then add and delete.
        if olduid:
            self.cursor.execute('SELECT * FROM dvFiles WHERE \
                                 dryaduid=?', (olduid,))
            inserter = self.cursor.fetchall()
            for rec in inserter:
                # TODONE FIX THIS #I think it's fixed 11 Feb 21
                self.cursor.execute('INSERT INTO dvFiles VALUES \
                                     (?, ?, ?, ?, ?, ?)',
                                    (dryaduid, rec[1], rec[2],
                                     rec[3], rec[4], rec[5]))
        # insert newly uploaded files
        for rec in transfer.fileUpRecord:
            try:
                dvfid = rec[1]['data']['files'][0]['dataFile']['id']
                # Screw you for burying the file ID this deep
                recMd5 = rec[1]['data']['files'][0]['dataFile']['checksum']['value']
            except (KeyError, IndexError) as err:
                #write to failed uploads table instead
                status = rec[1].get('status')
                if not status:
                    LOGGER.error('JSON read error for Dryad file ID %s', rec[0])
                    LOGGER.error('File %s for DOI %s may not be uploaded', rec[0], transfer.doi)
                    LOGGER.exception(err)
                    msg = {'status': 'Failure: Other non-specific '
                                     'failure. Check logs'}

                    self.cursor.execute('INSERT INTO failed_uploads VALUES \
                                    (?, ?, ?);', (dryaduid, rec[0], json.dumps(msg)))
                    continue
                self.cursor.execute('INSERT INTO failed_uploads VALUES \
                                    (?, ?, ?);', (dryaduid, rec[0], json.dumps(rec[1])))
                LOGGER.warning(type(err))
                LOGGER.warning('%s. DOI %s, File ID %s',
                               rec[1].get('status'),
                               transfer.doi, rec[0])
                continue
            # md5s verified during upload step, so they should
            # match already
            self.cursor.execute('INSERT INTO dvFiles VALUES \
                                 (?, ?, ?, ?, ?, ?)',
                                (dryaduid, rec[0], recMd5,
                                 dvfid, recMd5, json.dumps(rec[1])))

        # Now the deleted files
        for rec in transfer.fileDelRecord:
            # fileDelRecord consists only of [fid,fid2, ...]
            # Dryad record ID is int not str
            self.cursor.execute('DELETE FROM dvFiles WHERE dvfid=? \
                                 AND dryaduid=?',
                                (int(rec), dryaduid))
            LOGGER.debug('deleted dryfid = %s, dryaduid = %s', rec, dryaduid)

        # And lastly, any JSON metadata updates:
        # NOW WHAT?
        # JSON has dryfid==0
        self.cursor.execute('SELECT * FROM dvfiles WHERE \
                             dryfid=? and dryaduid=?',
                            (0, dryaduid))
        try:
            exists = self.cursor.fetchone()[0]
            # Old metadata must be deleted on a change.
            if exists:
                shouldDel = self.status(transfer.dryad)['status']
                if shouldDel == 'updated':
                    self.cursor.execute('DELETE FROM dvfiles WHERE \
                                         dryfid=? and dryaduid=?',
                                        (0, dryaduid))
        except TypeError:
            pass

        if transfer.jsonFlag:
            # update dryad JSON
            djson5 = transfer.jsonFlag[1]['data']['files'][0]['dataFile']['checksum']['value']
            dfid = transfer.jsonFlag[1]['data']['files'][0]['dataFile']['id']
            self.cursor.execute('INSERT INTO dvfiles VALUES \
                                 (?, ?, ?, ?, ?, ?)',
                                (dryaduid, 0, djson5, dfid,
                                 djson5, json.dumps(transfer.jsonFlag[1])))

    self.conn.commit()
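
Because the Monitor's tracking data is plain SQLite, the outcome of an update can be checked directly through the cursor attribute used in the source above. A minimal sketch, assuming monitor is the Monitor instance from the usage examples and that update() has already run:

>>> # update() records unparseable upload responses in the failed_uploads
>>> # table; an empty result means every file was written to the dvFiles table
>>> monitor.cursor.execute('SELECT * FROM failed_uploads').fetchall()
[]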

dryad2dataverse.handlers

Custom log handlers for sending log information to recipients.

SSLSMTPHandler

Bases: SMTPHandler

An SSL handler for logging.handlers
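
Because only emit() is overridden, the constructor is inherited unchanged from logging.handlers.SMTPHandler, so attaching the handler looks like any other SMTPHandler setup. A minimal sketch with placeholder host, addresses, and credentials:

>>> import logging
>>> from dryad2dataverse.handlers import SSLSMTPHandler
>>> # All connection details below are placeholders
>>> mailer = SSLSMTPHandler(mailhost=('smtp.example.com', 465),
                            fromaddr='d2d@example.com',
                            toaddrs=['admin@example.com'],
                            subject='dryad2dataverse error',
                            credentials=('username', 'password'))
>>> mailer.setLevel(logging.ERROR)
>>> # Attach to the root logger so any propagated ERROR records are mailed
>>> logging.getLogger().addHandler(mailer)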

Source code in src/dryad2dataverse/handlers.py
class SSLSMTPHandler(SMTPHandler):
    '''
    An SSL handler for logging.handlers
    '''
    def emit(self, record:logging.LogRecord):
        '''
        Emit a record while using an SSL mail server.

        Parameters
        ----------
        record : logging.LogRecord
        '''
        #Praise be to
        #https://stackoverflow.com/questions/36937461/
        #how-can-i-send-an-email-using-python-loggings-
        #smtphandler-and-ssl
        try:
            port = self.mailport
            if not port:
                port = smtplib.SMTP_PORT
            smtp = smtplib.SMTP_SSL(self.mailhost, port)
            msg = self.format(record)
            out = EmailMessage()
            out['Subject'] = self.getSubject(record)
            out['From'] = self.fromaddr
            out['To'] = self.toaddrs
            out.set_content(msg)
            #global rec2
            #rec2 = record
            if self.username:
                smtp.login(self.username, self.password)
            #smtp.sendmail(self.fromaddr, self.toaddrs, msg)
            #Attempting to send using smtp.sendmail as above
            #results in messages with no text, so use
            smtp.send_message(out)
            smtp.quit()
        except (KeyboardInterrupt, SystemExit):
            raise
        except: # pylint: disable=bare-except
            self.handleError(record)

emit(record)

Emit a record while using an SSL mail server.

Parameters:
  • record (LogRecord) –
Source code in src/dryad2dataverse/handlers.py
def emit(self, record:logging.LogRecord):
    '''
    Emit a record while using an SSL mail server.

    Parameters
    ----------
    record : logging.LogRecord
    '''
    #Praise be to
    #https://stackoverflow.com/questions/36937461/
    #how-can-i-send-an-email-using-python-loggings-
    #smtphandler-and-ssl
    try:
        port = self.mailport
        if not port:
            port = smtplib.SMTP_PORT
        smtp = smtplib.SMTP_SSL(self.mailhost, port)
        msg = self.format(record)
        out = EmailMessage()
        out['Subject'] = self.getSubject(record)
        out['From'] = self.fromaddr
        out['To'] = self.toaddrs
        out.set_content(msg)
        #global rec2
        #rec2 = record
        if self.username:
            smtp.login(self.username, self.password)
        #smtp.sendmail(self.fromaddr, self.toaddrs, msg)
        #Attempting to send using smtp.sendmail as above
        #results in messages with no text, so use
        smtp.send_message(out)
        smtp.quit()
    except (KeyboardInterrupt, SystemExit):
        raise
    except: # pylint: disable=bare-except
        self.handleError(record)

dryad2dataverse.exceptions

Custom exceptions for error handling.
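
Dryad2DataverseError is the base class; the other exceptions below derive from it, so callers can catch the base class for any dryad2dataverse failure, or a subclass when the failure mode matters. A minimal sketch, assuming dv is a Transfer instance and that the download step is the operation being guarded (exactly which subclasses a given method raises depends on the operation):

>>> import dryad2dataverse.exceptions
>>> try:
    dv.download_files()
except (dryad2dataverse.exceptions.DownloadSizeError,
        dryad2dataverse.exceptions.HashError) as err:
    print(f'File integrity problem: {err}')
except dryad2dataverse.exceptions.Dryad2DataverseError as err:
    print(f'Some other dryad2dataverse error: {err}')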

DatabaseError

Bases: Dryad2DataverseError

Tracking database error.

Source code in src/dryad2dataverse/exceptions.py
class DatabaseError(Dryad2DataverseError):
    '''
    Tracking database error.
    '''

DataverseBadApiKeyError

Bases: Dryad2DataverseError

Returned on a not-OK response (i.e., request.request.json()['message'] == 'Bad api key ').

Source code in src/dryad2dataverse/exceptions.py
class DataverseBadApiKeyError(Dryad2DataverseError):
    '''
    Returned on a not-OK response (i.e., request.request.json()['message'] == 'Bad api key ').
    '''

DataverseDownloadError

Bases: Dryad2DataverseError

Returned on a not-OK response (i.e., not requests.status_code == 200).

Source code in src/dryad2dataverse/exceptions.py
class DataverseDownloadError(Dryad2DataverseError):
    '''
    Returned on a not-OK response (i.e., not requests.status_code == 200).
    '''

DataverseUploadError

Bases: Dryad2DataverseError

Returned on a not-OK response (i.e., not requests.status_code == 200).

Source code in src/dryad2dataverse/exceptions.py
class DataverseUploadError(Dryad2DataverseError):
    '''
    Returned on a not-OK response (i.e., not requests.status_code == 200).
    '''

DownloadSizeError

Bases: Dryad2DataverseError

Raised when download sizes don’t match reported Dryad file size.

Source code in src/dryad2dataverse/exceptions.py
class DownloadSizeError(Dryad2DataverseError):
    '''
    Raised when download sizes don't match reported
    Dryad file size.
    '''

Dryad2DataverseError

Bases: Exception

Base exception class for Dryad2Dataverse errors.

Source code in src/dryad2dataverse/exceptions.py
class Dryad2DataverseError(Exception):
    '''
    Base exception class for Dryad2Dataverse errors.
    '''

HashError

Bases: Dryad2DataverseError

Raised on hex digest mismatch.

Source code in src/dryad2dataverse/exceptions.py
class HashError(Dryad2DataverseError):
    '''
    Raised on hex digest mismatch.
    '''

NoTargetError

Bases: Dryad2DataverseError

No dataverse target supplied error.

Source code in src/dryad2dataverse/exceptions.py
class NoTargetError(Dryad2DataverseError):
    '''
    No dataverse target supplied error.
    '''