API Reference

dataverse_utils

Generalized dataverse utilities. Note that import dataverse_utils is the equivalent of import dataverse_utils.dataverse_utils

dataverse_utils.dvdata

Dataverse studies and files

Study Objects

class Study(dict)

Dataverse record. Dataverse study records are pure metadata, so this is represented as a dictionary.

__init__
def __init__(pid: str, url: str, key: str, **kwargs)

pid : str
   Record persistent identifier: hdl or doi

url : str
   Base URL of the host Dataverse instance

key : str
   Dataverse API key with downloader privileges

get_version
@classmethod
def get_version(cls, url: str, timeout: int = 100) -> float

Returns a float representing a Dataverse version number.

The float is composed as float(f'{major_version}.{minor_version:03d}{patch:03d}'); ie, version 5.9.2 becomes 5.009002.

url : str
   URL of the base Dataverse instance, eg 'https://abacus.library.ubc.ca'

timeout : int
   Request timeout in seconds
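
As a quick illustration of the version composition (pure arithmetic, no Dataverse access required):

    # Version 5.9.2 -> 5.009002, per the composition described above
    major, minor, patch = 5, 9, 2
    version = float(f'{major}.{minor:03d}{patch:03d}')
    assert version == 5.009002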

set_version
def set_version(url: str, timeout: int = 100) -> None

Sets self['target_version'] to the appropriate value AND formats self['upload_json'] into the correct JSON format for that version.

url : str
   URL of the target Dataverse instance

timeout : int
   Request timeout in seconds

fix_licence
def fix_licence() -> None

With Dataverse v5.10+, a licence type of 'NONE' is forbidden. As per https://guides.dataverse.org/en/5.14/api/sword.html?highlight=invalid%20license, non-standard licences may be replaced with None.

This method edits the Study object in place, so it returns nothing.

production_location
def production_location() -> None

Changes 'multiple' to True where typeName == 'productionPlace' in Study['upload_json']. Changes are made in place. This change came into effect with Dataverse v5.13.

File Objects

class File(dict)

Class representing a file on a Dataverse instance

__init__
def __init__(url: str, key: str, **kwargs)

url : str
   Base URL of the host Dataverse instance

key : str
   Dataverse API key with downloader privileges

id : int or str
   File identifier; can be a file ID or PID

args : list

kwargs : dict

To initialize correctly, pass a value from Study['file_info'].

Eg: File('https://test.invalid', 'ABC123', **Study_instance['file_info'][0])

download_file
def download_file()

Downloads the file to a temporary location. Data will be in the ORIGINAL format, not Dataverse-processed TSVs

del_tempfile
def del_tempfile()

Delete tempfile if it exists

produce_digest
def produce_digest(prot: str = 'md5', blocksize: int = 2**16) -> str

Returns hex digest for object

fname : str
   Path to a file object

prot : str
   Hash type. Supported hashes: 'sha1', 'sha224', 'sha256',
      'sha384', 'sha512', 'blake2b', 'blake2s', 'md5'.
      Default: 'md5'

blocksize : int
   Read block size in bytes

verify
def verify() -> None

Compares the computed checksum with the stated checksum.
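
A minimal end-to-end sketch using only the methods documented above; the URL, API key, and PID are hypothetical placeholders, and it assumes the Study instance exposes its files via Study['file_info'] as described under __init__:

    from dataverse_utils.dvdata import Study, File

    URL = 'https://dataverse.example.edu'  # placeholder instance
    KEY = 'API-KEY'                        # placeholder API key
    stud = Study('doi:10.5072/FK2/EXAMPLE', URL, KEY)

    # Initialize a File from the study's file_info, as documented above
    fil = File(URL, KEY, **stud['file_info'][0])
    fil.download_file()   # original format, not Dataverse-processed TSV
    fil.verify()          # compare computed and stated checksums
    fil.del_tempfile()    # remove the temporary download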

dataverse_utils.scripts.dv_record_copy

Copies a Dataverse record to a collection, OR copies a record to an existing PID.

That way, all you have to do is edit a few fields in the GUI instead of painfully editing JSON or laboriously clicking through the Dataverse GUI.

parsley
def parsley() -> argparse.ArgumentParser()

Parses the arguments from the command line.

Returns argparse.ArgumentParser

main
def main()

Command-line entry point.

dataverse_utils.scripts.dv_study_migrator

Copies an entire record and migrates it, including the data.

parsley
def parsley() -> argparse.ArgumentParser()

Parses the arguments from the command line.

Returns argparse.ArgumentParser

upload_file_to_target
def upload_file_to_target(indict: dict, pid, source_url, source_key,
                          target_url, target_key)

Uploads a single file with metadata to a dataverse record

remove_target_files
def remove_target_files(record: dataverse_utils.dvdata.Study,
                        timeout: int = 100)

Removes all files from a dataverse record.

record : dataverse_utils.dvdata.Study

timeout : int
   Timeout in seconds

main
def main()

Command-line entry point.

dataverse_utils.scripts.dv_ldc_uploader

Auto download/upload LDC metadata and files.

Usage example:

    python3 uploadme.py LDC20201S01 . . . LDC2021T21 apikey

parse
def parse() -> argparse.ArgumentParser()

Parses the arguments from the command line.

Returns argparse.ArgumentParser

upload_meta
def upload_meta(ldccat: str,
                url: str,
                key: str,
                dvs: str,
                verbose: bool = False,
                certchain: str = None) -> str

Uploads metadata to target dataverse collection. Returns persistentId.

ldccat : str
   Linguistic Data Consortium catalogue number

url : str
   URL to the base instance of the Dataverse installation

key : str
   API key

dvs : str
   Target Dataverse collection short name

certchain : str
   Path to LDC .PEM certificate chain
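
A hedged usage sketch; the catalogue number, URL, key, and collection alias below are all placeholders:

    from dataverse_utils.scripts.dv_ldc_uploader import upload_meta

    # Placeholder values throughout
    pid = upload_meta('LDC2015T05',
                      'https://dataverse.example.edu',
                      'API-KEY',
                      'ldc')
    print(pid)  # persistentId of the new study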

main
def main() -> None

Uploads metadata and data to Dataverse collection/study respectively

dataverse_utils.scripts.dv_pg_facet_date

Reads the date from a Dataverse study and forces the facet sidebar to use that date by manually updating the Dataverse Postgres database.

This must be run on the server that hosts a Dataverse installation, and the user must supply, at a minimum, the database password and a persistent ID to be read, as well as a date type.

Requires two non-standard python libraries: psycopg2 (use psycopg2-binary to avoid installing from source) and requests.

Psycopg2 is not part of the requirements for dataverse_utils because it is only used for the server portion of these utilities, and is therefore unnecessary for client-side use.

parsely
def parsely() -> argparse.ArgumentParser

Command line argument parser

parse_dtype
def parse_dtype(dtype) -> str

Returns correctly formatted date type string for Dataverse API


dtype : str One of the allowable values from the parser

write_old
def write_old(data) -> None

Writes older data to a tsv file. Assumes 4 values per item: id, authority, identifier, publicationdate.

publicationdate is assumed to be a datetime.datetime instance.

Arguments:

data : list Postgres query output list (ie, data = cursor.fetchall())

write_sql
def write_sql(data) -> None

Write SQL to file

get_datetime
def get_datetime(datestr) -> (datetime.datetime, str)

Returns a datetime from a poorly formatted Dataverse date string


datestr : str Dataverse date returned by API

fetch_date_api
def fetch_date_api(url, key, pid, dtype) -> str

Returns the requested date string from the Dataverse study record


url : str
   Base URL of the Dataverse installation

key : str
   API key for the Dataverse user

pid : str
   Persistent identifier for the Dataverse study

dtype : str
   Date type required

reindex
def reindex(pid) -> dict

Reindexes study in place. Localhost access only.


pid : str PersistentId for Dataverse study

main
def main()

The main function: updates the date in the Postgres database and reindexes the study.

dataverse_utils.scripts.dv_release

Bulk release script for Dataverse.

This is almost identical to the dryad2dataverse bulk releaser except that the defaults are changed to https://abacus.library.ubc.ca

argp
def argp()

Parses the arguments from the command line.

Returns argparse.ArgumentParser

Dverse Objects

class Dverse()

An object representing a Dataverse installation

__init__
def __init__(dvurl, apikey, dvs)

Initializes the Dataverse installation object.

Arguments:

  • dvurl: str. URL to the base Dataverse installation (eg. 'https://abacus.library.ubc.ca')
  • apikey: str. API key for the Dataverse user
  • dvs: str. Short name of the target Dataverse collection (eg. 'statcan')

study_list
@property
def study_list() -> list

Returns a list of all studies (published or not) in the Dataverse collection

unreleased
@property
def unreleased(all_stud: list = None) -> list

Finds only unreleased studies from a list of studies

Arguments:

  • all_stud: list. List of Dataverse studies. Defaults to the output of Dverse.study_list

Study Objects

class Study()

Instance representing a Dataverse study

__init__
def __init__(**kwargs)

:kwarg dvurl: str. Base URL for the Dataverse instance
:kwarg apikey: str. API key for the Dataverse user
:kwarg pid: str. Persistent identifier for the study
:kwarg stime: int. Time between file lock checks. Default 10
:kwarg verbose: Verbose output. Default False

status_ok
def status_ok()

Checks to see if study has a lock. Returns True if OK to continue, else False.

release_me
def release_me(interactive=False)

Releases the study and waits until it is unlocked before returning.

main
def main()

The primary function. Releases all unreleased studies in the target Dataverse collection, or selected studies as required.

dataverse_utils.scripts.dv_del

Dataverse bulk deleter. Deletes unpublished studies at the command line.

delstudy
def delstudy(dvurl, key, pid)

Deletes Dataverse study

dvurl : str
   Dataverse installation base URL

key : str
   Dataverse user API key

pid : str
   Dataverse collection study persistent identifier

conf
def conf(tex)

Confirmation dialogue checker. Returns True if the input is 'Y' or 'y'.

getsize
def getsize(dvurl, pid, key)

Returns the size of a Dataverse study. Mostly here for debugging.

dvurl : str
   Dataverse installation base URL

pid : str
   Dataverse collection study persistent identifier

key : str
   Dataverse user API key

parsley
def parsley() -> argparse.ArgumentParser

Argument parser as separate function

main
def main()

Command line bulk deleter

dataverse_utils.scripts.dv_replace_licence

Replaces all licences in a study with one read from an external Markdown file. This requires using a different API, the “semantic metadata API”: https://guides.dataverse.org/en/5.6/developers/dataset-semantic-metadata-api.html

parsley
def parsley() -> argparse.ArgumentParser()

Parses the command line.

replace_licence
def replace_licence(hdl, lic, key, url='https://abacus.library.ubc.ca')

Replace the licence for a dataverse study with persistent ID hdl.


hdl : str
   Dataverse persistent ID

lic : str
   Licence text in Markdown format

key : str
   Dataverse API key

url : str
   Dataverse installation base URL

republish
def republish(hdl, key, url='https://abacus.library.ubc.ca')

Republish study without updating version


hdl : str
   Persistent ID

key : str
   Dataverse API key

url : str
   Dataverse installation base URL

print_stat
def print_stat(rjson)

Prints error status to stdout

main
def main()

Main script function

dataverse_utils.scripts.dv_upload_tsv

Uploads data sets to a Dataverse installation from the contents of a TSV (tab-separated value) file. Metadata, file tags, paths, etc. are all read from the TSV.

parse
def parse() -> argparse.ArgumentParser()

Parses the arguments from the command line.

Returns argparse.ArgumentParser

main
def main() -> None

Uploads data to an already existing Dataverse study

dataverse_utils.scripts.dv_manifest_gen

Creates a file manifest in tab-separated value format which can be used with other dataverse_utils library utilities and functions to upload files complete with metadata.

parse
def parse() -> argparse.ArgumentParser()

Parses the arguments from the command line.

Returns argparse.ArgumentParser

quotype
def quotype(quote: str) -> int

Parses the quotation type for the csv parser.

Returns a csv quote constant.

main
def main() -> None

The main function call

dataverse_utils.dataverse_utils

A collection of Dataverse utilities for file and metadata manipulation

DvGeneralUploadError Objects

class DvGeneralUploadError(Exception)

Raised on non-200 URL response

Md5Error Objects

class Md5Error(Exception)

Raised on md5 mismatch

make_tsv
def make_tsv(start_dir,
             in_list=None,
             def_tag='Data',
             inc_header=True,
             mime=False,
             quotype=csv.QUOTE_MINIMAL) -> str

Recurses the tree for files and produces tsv output with headers 'file', 'description', 'tags'.

The ‘description’ is the filename without an extension.

Returns tsv as string.


Arguments:

start_dir : str Path to start directory

in_list : list Input file list. Defaults to recursive walk of current directory.

def_tag : str Default Dataverse tag (eg, Data, Documentation, etc). Separate tags with a comma, eg 'Data, 2016'

inc_header : bool Include header row

mime : bool Include automatically determined mimetype

quotype : int
   Quote type constant from the csv module. Default: csv.QUOTE_MINIMAL
   Acceptable values:
   csv.QUOTE_MINIMAL / 0
   csv.QUOTE_ALL / 1
   csv.QUOTE_NONNUMERIC / 2
   csv.QUOTE_NONE / 3
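
A brief usage sketch; the directory path is a placeholder, and it assumes make_tsv is importable from the package root as noted at the top of this reference:

    import csv
    from dataverse_utils import make_tsv

    # '/tmp/project' is a placeholder start directory
    manifest = make_tsv('/tmp/project',
                        def_tag='Data, 2016',
                        mime=True,
                        quotype=csv.QUOTE_MINIMAL)
    print(manifest)  # tsv text with 'file', 'description', 'tags' headers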

dump_tsv
def dump_tsv(start_dir, filename, in_list=None, **kwargs)

Dumps output of make_tsv manifest to a file.


Arguments:

start_dir : str Path to start directory

in_list : list List of files for which to create manifest entries. Will default to recursive directory crawl

OPTIONAL KEYWORD ARGUMENTS

def_tag : str Default Dataverse tag (eg, Data, Documentation, etc). Separate tags with a comma, eg 'Data, 2016'. Default: 'Data'

inc_header : bool Include header for tsv. Default : True

quotype : int
   Quote type constant from the csv module. Default: csv.QUOTE_MINIMAL
   Acceptable values:
   csv.QUOTE_MINIMAL / 0
   csv.QUOTE_ALL / 1
   csv.QUOTE_NONNUMERIC / 2
   csv.QUOTE_NONE / 3
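
A companion sketch for dump_tsv; the paths are placeholders, and the keyword arguments are assumed to be forwarded to make_tsv:

    from dataverse_utils import dump_tsv

    # Placeholder paths; kwargs pass through to make_tsv
    dump_tsv('/tmp/project', '/tmp/manifest.tsv',
             def_tag='Data', inc_header=True)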

file_path
def file_path(fpath, trunc='') -> str

Create relative file path from full path string

    >>> file_path('/tmp/Data/2011/excelfile.xlsx', '/tmp/')
    'Data/2011'
    >>> file_path('/tmp/Data/2011/excelfile.xlsx', '/tmp')
    'Data/2011'


Arguments:

fpath : str File location (ie, complete path)

trunc : str Leftmost portion of path to remove

check_lock
def check_lock(dv_url, study, apikey) -> bool

Checks study lock status; returns True if locked.


Arguments:

dv_url : str
   URL of the Dataverse installation

study : str
   Persistent ID of the study

apikey : str
   API key for the user
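
A hedged sketch of polling until a lock clears; the URL, PID, and key are placeholders:

    import time
    from dataverse_utils import check_lock

    # Placeholder values for illustration
    while check_lock('https://dataverse.example.edu',
                     'doi:10.5072/FK2/EXAMPLE',
                     'API-KEY'):
        time.sleep(10)  # wait and re-check until the study is unlocked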

force_notab_unlock
def force_notab_unlock(study, dv_url, fid, apikey, try_uningest=True) -> int

Forcibly unlocks and uningests to prevent tabular file processing. Required if mime and filename spoofing is not sufficient.

Returns 0 if unlocked, file id if locked (and then unlocked).


Arguments:

study : str Persistent identifier of study

dv_url : str URL to base Dataverse installation

fid : str File ID for file object

apikey : str API key for user

try_uningest : bool Try to uningest the file that was locked. - Default - True

uningest_file
def uningest_file(dv_url, fid, apikey, study='n/a')

Tries to uningest a file that has been ingested. Requires superuser API key.


Arguments:

dv_url : str URL to base Dataverse installation

fid : int or str File ID of file to uningest

apikey : str API key for superuser

study : str Optional handle parameter for log messages

upload_file
def upload_file(fpath, hdl, **kwargs)

Uploads file to Dataverse study and sets file metadata and tags.


Arguments:

fpath : str file location (ie, complete path)

hdl : str Dataverse persistent ID for study (handle or DOI)

kwargs : dict

other parameters. Acceptable keywords and contents are:

dv : str
   REQUIRED. URL to the base Dataverse installation, eg 'https://abacus.library.ubc.ca'

apikey : str
   REQUIRED. API key for the user

descr : str
   OPTIONAL. File description

md5 : str
   OPTIONAL. md5sum for file checking

tags : list
   OPTIONAL. List of text file tags, eg ['Data', 'June 2020']

dirlabel : str
   OPTIONAL. Unix-style relative pathname for the Dataverse file path, eg path/to/file/

nowait : bool
   OPTIONAL. Force a file unlock and uningest instead of waiting for processing to finish

trunc : str
   OPTIONAL. Leftmost portion of the path to remove

rest : bool
   OPTIONAL. Restrict file. Defaults to False unless True is supplied

mimetype : str
   OPTIONAL. Mimetype of the file. Useful if using File Previewers. The mimetype for zip files (application/zip) will be ignored to circumvent Dataverse's automatic unzipping function.

label : str
   OPTIONAL. If included in kwargs, this value will be used for the label

timeout : int
   OPTIONAL. Timeout in seconds
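
A hedged call sketch; the file path, PID, URL, and key below are illustrative placeholders:

    from dataverse_utils import upload_file

    # All values below are placeholders
    upload_file('/home/user/Data/ASCII/file.txt',
                'hdl:11272.1/AB2/EXAMPLE',
                dv='https://dataverse.example.edu',
                apikey='API-KEY',
                descr='ASCII data file',
                tags=['Data'],
                trunc='/home/user/')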

restrict_file
def restrict_file(**kwargs)

Restrict file in Dataverse study.


Arguments:

kwargs : dict

other parameters. Acceptable keywords and contents are:

One of pid or fid is required.

pid : str
   File persistent ID

fid : str
   File database ID

dv : str
   REQUIRED. URL to the base Dataverse installation, eg 'https://abacus.library.ubc.ca'

apikey : str
   REQUIRED. API key for the user

rest : bool
   On True, restrict. Default True
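
A hedged call sketch; the file ID, URL, and key are placeholders:

    from dataverse_utils import restrict_file

    # One of pid or fid is required; values here are placeholders
    restrict_file(fid='12345',
                  dv='https://dataverse.example.edu',
                  apikey='API-KEY',
                  rest=True)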

upload_from_tsv
def upload_from_tsv(fil, hdl, **kwargs)

Utility for bulk uploading. Assumes fil is formatted as tsv with headers ‘file’, ‘description’, ‘tags’.

‘tags’ field will be split on commas.


Arguments:

fil : filelike object
   Open file object or io.IOStream()

hdl : str
   Dataverse persistent ID for the study (handle or DOI)

trunc : str
   Leftmost portion of the Dataverse study file path to remove. Eg, trunc='/home/user/' with a tsv field of '/home/user/Data/ASCII' would set the path for that line of the tsv to 'Data/ASCII'. Defaults to None.

kwargs : dict
   Other parameters. Acceptable keywords and contents are:

dv : str
   REQUIRED. URL to the base Dataverse installation, eg 'https://abacus.library.ubc.ca'

apikey : str
   REQUIRED. API key for the user

rest : bool
   On True, restrict access. Default False
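
A hedged bulk-upload sketch; the manifest path, PID, URL, and key are placeholders:

    from dataverse_utils import upload_from_tsv

    # Placeholder manifest produced by make_tsv/dump_tsv
    with open('manifest.tsv', encoding='utf-8') as fil:
        upload_from_tsv(fil,
                        'hdl:11272.1/AB2/EXAMPLE',
                        dv='https://dataverse.example.edu',
                        apikey='API-KEY')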

dataverse_utils.ldc

Creates Dataverse JSON from a Linguistic Data Consortium website page.

Ldc Objects

class Ldc(ds.Serializer)

An LDC item (eg, LDC2021T01)

__init__
def __init__(ldc, cert=None)

Returns a dict with keys created from an LDC catalogue web page.


Arguments:

ldc : str
   Linguistic Data Consortium catalogue number (eg, 'LDC2015T05'). This is what forms the last part of the LDC catalogue URL.

cert : str
   Path to certificate chain; LDC has had a problem with intermediate certificates, so you can download the chain with a browser and supply a path to the .pem with this parameter
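
A hedged sketch of building Dataverse JSON for an LDC item; the catalogue number is a placeholder, and calling fetch_record explicitly before reading the JSON properties is an assumption here:

    from dataverse_utils.ldc import Ldc

    ldc = Ldc('LDC2015T05')   # placeholder catalogue number
    ldc.fetch_record()        # scrape the record from the LDC website
    print(ldc.dvJson)         # metadata as Dataverse JSON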

ldcJson
@property
def ldcJson()

Returns JSON based on scraping the LDC web page

dryadJson
@property
def dryadJson()

LDC metadata in Dryad JSON format

dvJson
@property
def dvJson()

LDC metadata in Dataverse JSON format

embargo
@property
def embargo()

Boolean indicating embargo status

fileJson
@property
def fileJson(timeout=45)

Returns False: No attached files possible at LDC

files
@property
def files()

Returns None. No files possible

fetch_record
def fetch_record(url=None, timeout=45)

Downloads record from LDC website

make_ldc_json
def make_ldc_json()

Returns a dict with keys created from an LDC catalogue web page.

name_parser
@staticmethod
def name_parser(name)

Returns lastName/firstName JSON snippet from name


Arguments:

name : str A name

make_dryad_json
def make_dryad_json(ldc=None)

Creates a Dryad-style dict from an LDC dictionary


Arguments:

ldc : dict Dictionary containing LDC data. Defaults to self.ldcJson

find_block_index
@staticmethod
def find_block_index(dvjson, key)

Finds the index number of an item in Dataverse’s idiotic JSON list


Arguments:

dvjson : dict Dataverse JSON

key : str key for which to find list index

make_dv_json
def make_dv_json(ldc=None)

Returns complete Dataverse JSON


Arguments:

ldc : dict LDC dictionary. Defaults to self.ldcJson

upload_metadata
def upload_metadata(**kwargs) -> dict

Uploads metadata to dataverse

Returns json from connection attempt.


Arguments:

kwargs:

url : str base url to Dataverse

key : str api key

dv : str Dataverse collection to which it is being uploaded
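
A final hedged sketch of the metadata upload; all values are placeholders:

    from dataverse_utils.ldc import Ldc

    ldc = Ldc('LDC2015T05')   # placeholder catalogue number
    resp = ldc.upload_metadata(url='https://dataverse.example.edu',
                               key='API-KEY',
                               dv='ldc')
    print(resp)  # JSON returned from the connection attempt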