API Reference

dataverse_utils

Generalized dataverse utilities. Note that import dataverse_utils is the equivalent of import dataverse_utils.dataverse_utils

dataverse_utils.dvdata

Dataverse studies and files

Study Objects

class Study(dict)

Dataverse record. Dataverse study records are pure metadata, so this is represented as a dictionary.

__init__
def __init__(pid: str, url: str, key: str, **kwargs)

pid : str
   Record persistent identifier: hdl or doi

url : str
   Base URL of the host Dataverse instance

key : str
   Dataverse API key with downloader privileges

get_version
@classmethod
def get_version(cls, url: str, timeout: int = 100) -> float

Returns a float representing a Dataverse version number.

The float is composed as float(f'{major_version}.{minor_version:03d}{patch:03d}'); ie, version 5.9.2 becomes 5.009002.

url : str
   URL of the base Dataverse instance, eg 'https://abacus.library.ubc.ca'

timeout : int
   Request timeout in seconds
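
As a quick illustration of the version composition (pure arithmetic, no Dataverse access required):

    # Version 5.9.2 -> 5.009002, per the composition described above
    major, minor, patch = 5, 9, 2
    version = float(f'{major}.{minor:03d}{patch:03d}')
    assert version == 5.009002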

set_version
def set_version(url: str, timeout: int = 100) -> None

Sets self['target_version'] to the appropriate value AND formats self['upload_json'] into the correct JSON format for that version.

url : str
   URL of the target Dataverse instance

timeout : int
   Request timeout in seconds

fix_licence
def fix_licence() -> None

With Dataverse v5.10+, a licence type of 'NONE' is forbidden. As per https://guides.dataverse.org/en/5.14/api/sword.html?highlight=invalid%20license, non-standard licences may be replaced with None.

This method edits the Study object in place, so it returns nothing.

production_location
def production_location() -> None

Changes 'multiple' to True where typeName == 'productionPlace' in Study['upload_json']. Changes are made in place. This change came into effect with Dataverse v5.13.

File Objects

class File(dict)

Class representing a file on a Dataverse instance

__init__
def __init__(url: str, key: str, **kwargs)

url : str
   Base URL of the host Dataverse instance

key : str
   Dataverse API key with downloader privileges

id : int or str
   File identifier; can be a file ID or PID

args : list

kwargs : dict

To initialize correctly, pass a value from Study['file_info'].

Eg: File('https://test.invalid', 'ABC123', **Study_instance['file_info'][0])

download_file
def download_file()

Downloads the file to a temporary location. Data will be in the ORIGINAL format, not Dataverse-processed TSVs

del_tempfile
def del_tempfile()

Delete tempfile if it exists

produce_digest
def produce_digest(prot: str = 'md5', blocksize: int = 2**16) -> str

Returns hex digest for object

fname : str
   Path to a file object

prot : str
   Hash type. Supported hashes: 'sha1', 'sha224', 'sha256',
      'sha384', 'sha512', 'blake2b', 'blake2s', 'md5'.
      Default: 'md5'

blocksize : int
   Read block size in bytes

verify
def verify() -> None

Compares the computed checksum with the stated checksum.
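
A minimal end-to-end sketch using only the methods documented above; the URL, API key, and PID are hypothetical placeholders, and it assumes the Study instance exposes its files via Study['file_info'] as described under __init__:

    from dataverse_utils.dvdata import Study, File

    URL = 'https://dataverse.example.edu'  # placeholder instance
    KEY = 'API-KEY'                        # placeholder API key
    stud = Study('doi:10.5072/FK2/EXAMPLE', URL, KEY)

    # Initialize a File from the study's file_info, as documented above
    fil = File(URL, KEY, **stud['file_info'][0])
    fil.download_file()   # original format, not Dataverse-processed TSV
    fil.verify()          # compare computed and stated checksums
    fil.del_tempfile()    # remove the temporary download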

dataverse_utils.scripts.dv_record_copy

Copies a Dataverse record to a collection, OR copies a record to an existing PID.

That way, all you have to do is edit a few fields in the GUI instead of painfully editing JSON or laboriously clicking through the Dataverse GUI.

parsley
def parsley() -> argparse.ArgumentParser()

Parses the arguments from the command line.

Returns argparse.ArgumentParser

main
def main()

Command-line entry point.

dataverse_utils.scripts.dv_study_migrator

Copies an entire record and migrates it, including the data.

parsley
def parsley() -> argparse.ArgumentParser()

Parses the arguments from the command line.

Returns argparse.ArgumentParser

upload_file_to_target
def upload_file_to_target(indict: dict, pid, source_url, source_key,
                          target_url, target_key)

Uploads a single file with metadata to a dataverse record

remove_target_files
def remove_target_files(record: dataverse_utils.dvdata.Study,
                        timeout: int = 100)

Removes all files from a dataverse record.

record : dataverse_utils.dvdata.Study

timeout : int
   Timeout in seconds

main
def main()

Command-line entry point.

dataverse_utils.scripts.dv_ldc_uploader

Auto download/upload LDC metadata and files.

Usage example:

    python3 uploadme.py LDC20201S01 . . . LDC2021T21 apikey

parse
def parse() -> argparse.ArgumentParser()

Parses the arguments from the command line.

Returns argparse.ArgumentParser

upload_meta
def upload_meta(ldccat: str,
                url: str,
                key: str,
                dvs: str,
                verbose: bool = False,
                certchain: str = None) -> str

Uploads metadata to target dataverse collection. Returns persistentId.

ldccat : str
   Linguistic Data Consortium catalogue number

url : str
   URL to the base instance of the Dataverse installation

key : str
   API key

dvs : str
   Target Dataverse collection short name

certchain : str
   Path to LDC .PEM certificate chain
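
A hedged usage sketch; the catalogue number, URL, key, and collection alias below are all placeholders:

    from dataverse_utils.scripts.dv_ldc_uploader import upload_meta

    # Placeholder values throughout
    pid = upload_meta('LDC2015T05',
                      'https://dataverse.example.edu',
                      'API-KEY',
                      'ldc')
    print(pid)  # persistentId of the new study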

main
def main() -> None

Uploads metadata and data to Dataverse collection/study respectively

dataverse_utils.scripts.dv_pg_facet_date

Reads the date from a Dataverse study and forces the facet sidebar to use that date by manually updating the Dataverse Postgres database.

This must be run on the server that hosts a Dataverse installation, and the user must supply, at a minimum, the database password and a persistent ID to be read, as well as a date type.

Requires two non-standard python libraries: psycopg2 (use psycopg2-binary to avoid installing from source) and requests.

Psycopg2 is not part of the requirements for dataverse_utils because it is only used for the server portion of these utilities, and is therefore unnecessary for client-side use.

parsely
def parsely() -> argparse.ArgumentParser

Command line argument parser

parse_dtype
def parse_dtype(dtype) -> str

Returns correctly formatted date type string for Dataverse API


dtype : str One of the allowable values from the parser

write_old
def write_old(data) -> None

Writes older data to a tsv file. Assumes 4 values per item: id, authority, identifier, publicationdate.

publicationdate is assumed to be a datetime.datetime instance.

Arguments:

data : list Postgres query output list (ie, data = cursor.fetchall())

write_sql
def write_sql(data) -> None

Write SQL to file

get_datetime
def get_datetime(datestr) -> (datetime.datetime, str)

Returns a datetime from a poorly formatted Dataverse date string


datestr : str Dataverse date returned by API

fetch_date_api
def fetch_date_api(url, key, pid, dtype) -> str

Returns the requested date string from the Dataverse study record


url : str
   Base URL of the Dataverse installation

key : str
   API key for the Dataverse user

pid : str
   Persistent identifier for the Dataverse study

dtype : str
   Date type required

reindex
def reindex(pid) -> dict

Reindexes study in place. Localhost access only.


pid : str PersistentId for Dataverse study

main
def main()

The main function: updates the date in the Postgres database and reindexes the study.

dataverse_utils.scripts.dv_release

Bulk release script for Dataverse.

This is almost identical to the dryad2dataverse bulk releaser except that the defaults are changed to https://abacus.library.ubc.ca

argp
def argp()

Parses the arguments from the command line.

Returns argparse.ArgumentParser

Dverse Objects

class Dverse()

An object representing a Dataverse installation

__init__
def __init__(dvurl, apikey, dvs)

Initializes the Dataverse installation object.

Arguments:

  • dvurl: str. URL to the base Dataverse installation (eg. 'https://abacus.library.ubc.ca')
  • apikey: str. API key for the Dataverse user
  • dvs: str. Short name of the target Dataverse collection (eg. 'statcan')

study_list
@property
def study_list() -> list

Returns a list of all studies (published or not) in the Dataverse collection

unreleased
@property
def unreleased(all_stud: list = None) -> list

Finds only unreleased studies from a list of studies

Arguments:

  • all_stud: list. List of Dataverse studies. Defaults to the output of Dverse.study_list

Study Objects

class Study()

Instance representing a Dataverse study

__init__
def __init__(**kwargs)

:kwarg dvurl: str. Base URL for the Dataverse instance
:kwarg apikey: str. API key for the Dataverse user
:kwarg pid: str. Persistent identifier for the study
:kwarg stime: int. Time between file lock checks. Default 10
:kwarg verbose: Verbose output. Default False

status_ok
def status_ok()

Checks to see if study has a lock. Returns True if OK to continue, else False.

release_me
def release_me(interactive=False)

Releases the study and waits until it is unlocked before returning.

main
def main()

The primary function. Releases all unreleased studies in the target Dataverse collection, or selected studies as required.

dataverse_utils.scripts.dv_del

Dataverse bulk deleter. Deletes unpublished studies at the command line.

delstudy
def delstudy(dvurl, key, pid)

Deletes Dataverse study

dvurl : str
   Dataverse installation base URL

key : str
   Dataverse user API key

pid : str
   Dataverse collection study persistent identifier

conf
def conf(tex)

Confirmation dialogue checker. Returns True if the input is 'Y' or 'y'.

getsize
def getsize(dvurl, pid, key)

Returns the size of a Dataverse study. Mostly here for debugging.

dvurl : str
   Dataverse installation base URL

pid : str
   Dataverse collection study persistent identifier

key : str
   Dataverse user API key

parsley
def parsley() -> argparse.ArgumentParser

Argument parser as separate function

main
def main()

Command line bulk deleter

dataverse_utils.scripts.dv_replace_licence

Replaces all licences in a study with one read from an external Markdown file. This requires using a different API, the “semantic metadata API”: https://guides.dataverse.org/en/5.6/developers/dataset-semantic-metadata-api.html

parsley
def parsley() -> argparse.ArgumentParser()

Parses the command line.

replace_licence
def replace_licence(hdl, lic, key, url='https://abacus.library.ubc.ca')

Replace the licence for a dataverse study with persistent ID hdl.


hdl : str
   Dataverse persistent ID

lic : str
   Licence text in Markdown format

key : str
   Dataverse API key

url : str
   Dataverse installation base URL

republish
def republish(hdl, key, url='https://abacus.library.ubc.ca')

Republish study without updating version


hdl : str
   Persistent ID

key : str
   Dataverse API key

url : str
   Dataverse installation base URL

print_stat
def print_stat(rjson)

Prints error status to stdout

main
def main()

Main script function

dataverse_utils.scripts.dv_upload_tsv

Uploads data sets to a Dataverse installation from the contents of a TSV (tab-separated value) file. Metadata, file tags, paths, etc. are all read from the TSV.

parse
def parse() -> argparse.ArgumentParser()

Parses the arguments from the command line.

Returns argparse.ArgumentParser

main
def main() -> None

Uploads data to an already existing Dataverse study

dataverse_utils.scripts.dv_manifest_gen

Creates a file manifest in tab-separated value format which can be used with other dataverse_utils library utilities and functions to upload files complete with metadata.

parse
def parse() -> argparse.ArgumentParser()

Parses the arguments from the command line.

Returns argparse.ArgumentParser

quotype
def quotype(quote: str) -> int

Parses the quotation type for the csv parser.

Returns a csv quote constant.

main
def main() -> None

The main function call

dataverse_utils.dataverse_utils

A collection of Dataverse utilities for file and metadata manipulation

DvGeneralUploadError Objects

class DvGeneralUploadError(Exception)

Raised on non-200 URL response

Md5Error Objects

class Md5Error(Exception)

Raised on md5 mismatch

make_tsv
def make_tsv(start_dir,
             in_list=None,
             def_tag='Data',
             inc_header=True,
             mime=False,
             quotype=csv.QUOTE_MINIMAL) -> str

Recurses the tree for files and produces tsv output with headers 'file', 'description', 'tags'.

The ‘description’ is the filename without an extension.

Returns tsv as string.


Arguments:

start_dir : str Path to start directory

in_list : list Input file list. Defaults to recursive walk of current directory.

def_tag : str Default Dataverse tag (eg, Data, Documentation, etc). Separate tags with a comma, eg 'Data, 2016'

inc_header : bool Include header row

mime : bool Include automatically determined mimetype

quotype : int
   Quote type constant from the csv module. Default: csv.QUOTE_MINIMAL
   Acceptable values:
   csv.QUOTE_MINIMAL / 0
   csv.QUOTE_ALL / 1
   csv.QUOTE_NONNUMERIC / 2
   csv.QUOTE_NONE / 3
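
A brief usage sketch; the directory path is a placeholder, and it assumes make_tsv is importable from the package root as noted at the top of this reference:

    import csv
    from dataverse_utils import make_tsv

    # '/tmp/project' is a placeholder start directory
    manifest = make_tsv('/tmp/project',
                        def_tag='Data, 2016',
                        mime=True,
                        quotype=csv.QUOTE_MINIMAL)
    print(manifest)  # tsv text with 'file', 'description', 'tags' headers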

dump_tsv
def dump_tsv(start_dir, filename, in_list=None, **kwargs)

Dumps output of make_tsv manifest to a file.


Arguments:

start_dir : str Path to start directory

in_list : list List of files for which to create manifest entries. Will default to recursive directory crawl

OPTIONAL KEYWORD ARGUMENTS

def_tag : str Default Dataverse tag (eg, Data, Documentation, etc). Separate tags with a comma, eg 'Data, 2016'. Default: 'Data'

inc_header : bool Include header for tsv. Default : True

quotype : int
   Quote type constant from the csv module. Default: csv.QUOTE_MINIMAL
   Acceptable values:
   csv.QUOTE_MINIMAL / 0
   csv.QUOTE_ALL / 1
   csv.QUOTE_NONNUMERIC / 2
   csv.QUOTE_NONE / 3
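
A companion sketch for dump_tsv; the paths are placeholders, and the keyword arguments are assumed to be forwarded to make_tsv:

    from dataverse_utils import dump_tsv

    # Placeholder paths; kwargs pass through to make_tsv
    dump_tsv('/tmp/project', '/tmp/manifest.tsv',
             def_tag='Data', inc_header=True)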

file_path
def file_path(fpath, trunc='') -> str

Create relative file path from full path string

    >>> file_path('/tmp/Data/2011/excelfile.xlsx', '/tmp/')
    'Data/2011'
    >>> file_path('/tmp/Data/2011/excelfile.xlsx', '/tmp')
    'Data/2011'


Arguments:

fpath : str File location (ie, complete path)

trunc : str Leftmost portion of path to remove

check_lock
def check_lock(dv_url, study, apikey) -> bool

Checks study lock status; returns True if locked.


Arguments:

dv_url : str
   URL of the Dataverse installation

study : str
   Persistent ID of the study

apikey : str
   API key for the user
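
A hedged sketch of polling until a lock clears; the URL, PID, and key are placeholders:

    import time
    from dataverse_utils import check_lock

    # Placeholder values for illustration
    while check_lock('https://dataverse.example.edu',
                     'doi:10.5072/FK2/EXAMPLE',
                     'API-KEY'):
        time.sleep(10)  # wait and re-check until the study is unlocked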

force_notab_unlock
def force_notab_unlock(study, dv_url, fid, apikey, try_uningest=True) -> int

Forcibly unlocks and uningests to prevent tabular file processing. Required if mime and filename spoofing is not sufficient.

Returns 0 if unlocked, file id if locked (and then unlocked).


Arguments:

study : str Persistent identifier of study

dv_url : str URL to base Dataverse installation

fid : str File ID for file object

apikey : str API key for user

try_uningest : bool Try to uningest the file that was locked. - Default - True

uningest_file
def uningest_file(dv_url, fid, apikey, study='n/a')

Tries to uningest a file that has been ingested. Requires superuser API key.


Arguments:

dv_url : str URL to base Dataverse installation

fid : int or str File ID of file to uningest

apikey : str API key for superuser

study : str Optional handle parameter for log messages

upload_file
def upload_file(fpath, hdl, **kwargs)

Uploads file to Dataverse study and sets file metadata and tags.


Arguments:

fpath : str file location (ie, complete path)

hdl : str Dataverse persistent ID for study (handle or DOI)

kwargs : dict

other parameters. Acceptable keywords and contents are:

dv : str
   REQUIRED. URL to the base Dataverse installation, eg 'https://abacus.library.ubc.ca'

apikey : str
   REQUIRED. API key for the user

descr : str
   OPTIONAL. File description

md5 : str
   OPTIONAL. md5sum for file checking

tags : list
   OPTIONAL. List of text file tags, eg ['Data', 'June 2020']

dirlabel : str
   OPTIONAL. Unix-style relative pathname for the Dataverse file path, eg path/to/file/

nowait : bool
   OPTIONAL. Force a file unlock and uningest instead of waiting for processing to finish

trunc : str
   OPTIONAL. Leftmost portion of the path to remove

rest : bool
   OPTIONAL. Restrict file. Defaults to False unless True is supplied

mimetype : str
   OPTIONAL. Mimetype of the file. Useful if using File Previewers. The mimetype for zip files (application/zip) will be ignored to circumvent Dataverse's automatic unzipping function.

label : str
   OPTIONAL. If included in kwargs, this value will be used for the label

timeout : int
   OPTIONAL. Timeout in seconds
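
A hedged call sketch; the file path, PID, URL, and key below are illustrative placeholders:

    from dataverse_utils import upload_file

    # All values below are placeholders
    upload_file('/home/user/Data/ASCII/file.txt',
                'hdl:11272.1/AB2/EXAMPLE',
                dv='https://dataverse.example.edu',
                apikey='API-KEY',
                descr='ASCII data file',
                tags=['Data'],
                trunc='/home/user/')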

restrict_file
def restrict_file(**kwargs)

Restrict file in Dataverse study.


Arguments:

kwargs : dict

other parameters. Acceptable keywords and contents are:

One of pid or fid is required.

pid : str
   File persistent ID

fid : str
   File database ID

dv : str
   REQUIRED. URL to the base Dataverse installation, eg 'https://abacus.library.ubc.ca'

apikey : str
   REQUIRED. API key for the user

rest : bool
   On True, restrict. Default True
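
A hedged call sketch; the file ID, URL, and key are placeholders:

    from dataverse_utils import restrict_file

    # One of pid or fid is required; values here are placeholders
    restrict_file(fid='12345',
                  dv='https://dataverse.example.edu',
                  apikey='API-KEY',
                  rest=True)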

upload_from_tsv
def upload_from_tsv(fil, hdl, **kwargs)

Utility for bulk uploading. Assumes fil is formatted as tsv with headers ‘file’, ‘description’, ‘tags’.

‘tags’ field will be split on commas.


Arguments:

fil : filelike object
   Open file object or io.IOStream()

hdl : str
   Dataverse persistent ID for the study (handle or DOI)

trunc : str
   Leftmost portion of the Dataverse study file path to remove. Eg, trunc='/home/user/' with a tsv field of '/home/user/Data/ASCII' would set the path for that line of the tsv to 'Data/ASCII'. Defaults to None.

kwargs : dict
   Other parameters. Acceptable keywords and contents are:

dv : str
   REQUIRED. URL to the base Dataverse installation, eg 'https://abacus.library.ubc.ca'

apikey : str
   REQUIRED. API key for the user

rest : bool
   On True, restrict access. Default False
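
A hedged bulk-upload sketch; the manifest path, PID, URL, and key are placeholders:

    from dataverse_utils import upload_from_tsv

    # Placeholder manifest produced by make_tsv/dump_tsv
    with open('manifest.tsv', encoding='utf-8') as fil:
        upload_from_tsv(fil,
                        'hdl:11272.1/AB2/EXAMPLE',
                        dv='https://dataverse.example.edu',
                        apikey='API-KEY')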

dataverse_utils.ldc

Creates Dataverse JSON from a Linguistic Data Consortium website page.

Ldc Objects

class Ldc(ds.Serializer)

An LDC item (eg, LDC2021T01)

__init__
def __init__(ldc, cert=None)

Returns a dict with keys created from an LDC catalogue web page.


Arguments:

ldc : str
   Linguistic Data Consortium catalogue number (eg, 'LDC2015T05'). This is what forms the last part of the LDC catalogue URL.

cert : str
   Path to certificate chain; LDC has had a problem with intermediate certificates, so you can download the chain with a browser and supply a path to the .pem with this parameter
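
A hedged sketch of building Dataverse JSON for an LDC item; the catalogue number is a placeholder, and calling fetch_record explicitly before reading the JSON properties is an assumption here:

    from dataverse_utils.ldc import Ldc

    ldc = Ldc('LDC2015T05')   # placeholder catalogue number
    ldc.fetch_record()        # scrape the record from the LDC website
    print(ldc.dvJson)         # metadata as Dataverse JSON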

ldcJson
@property
def ldcJson()

Returns JSON based on scraping the LDC web page

dryadJson
@property
def dryadJson()

LDC metadata in Dryad JSON format

dvJson
@property
def dvJson()

LDC metadata in Dataverse JSON format

embargo
@property
def embargo()

Boolean indicating embargo status

fileJson
@property
def fileJson(timeout=45)

Returns False: No attached files possible at LDC

files
@property
def files()

Returns None. No files possible

fetch_record
def fetch_record(url=None, timeout=45)

Downloads record from LDC website

make_ldc_json
def make_ldc_json()

Returns a dict with keys created from an LDC catalogue web page.

name_parser
@staticmethod
def name_parser(name)

Returns lastName/firstName JSON snippet from name


Arguments:

name : str A name

make_dryad_json
def make_dryad_json(ldc=None)

Creates a Dryad-style dict from an LDC dictionary


Arguments:

ldc : dict Dictionary containing LDC data. Defaults to self.ldcJson

find_block_index
@staticmethod
def find_block_index(dvjson, key)

Finds the index number of an item in Dataverse’s idiotic JSON list


Arguments:

dvjson : dict Dataverse JSON

key : str key for which to find list index

make_dv_json
def make_dv_json(ldc=None)

Returns complete Dataverse JSON


Arguments:

ldc : dict LDC dictionary. Defaults to self.ldcJson

upload_metadata
def upload_metadata(**kwargs) -> dict

Uploads metadata to dataverse

Returns json from connection attempt.


Arguments:

kwargs:

url : str base url to Dataverse

key : str api key

dv : str Dataverse collection to which it is being uploaded
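
A final hedged sketch of the metadata upload; all values are placeholders:

    from dataverse_utils.ldc import Ldc

    ldc = Ldc('LDC2015T05')   # placeholder catalogue number
    resp = ldc.upload_metadata(url='https://dataverse.example.edu',
                               key='API-KEY',
                               dv='ldc')
    print(resp)  # JSON returned from the connection attempt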