API Reference
fcheck
Manifest generator for data files.
Produces a text file with user-specified checksums for all files from the top of a specified tree, and checks line length and ASCII character status for text files.
For statistics program files (SAS .sas7bdat, SPSS .sav, Stata .dta), Checker() reports the number of cases and variables as rows and columns respectively.
Checker Objects
class Checker()
A collection of various tools attached to a file
__init__
| __init__(fname: str)
Initializes Checker instance
fname : str
Path to file
__del__
| __del__()
Destructor closes file
produce_digest
| produce_digest(prot: str = 'md5', blocksize: int = 2*16) -> str
Returns hex digest for object
fname : str
Path to a file object
prot : str
Hash type. Supported hashes: 'sha1', 'sha224', 'sha256',
'sha384', 'sha512', 'blake2b', 'blake2s', 'md5'.
Default: 'md5'
blocksize : int
Read block size in bytes
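The block-wise hashing described above can be sketched with the standard library alone. This is a minimal illustration, not the fcheck implementation: the function name file_digest and the 65536-byte default block size are assumptions made for the example.

```python
import hashlib


def file_digest(path: str, prot: str = "md5", blocksize: int = 65536) -> str:
    """Return the hex digest of the file at `path`, reading it in blocks.

    Hypothetical helper for illustration; not part of fcheck.
    """
    hasher = hashlib.new(prot)  # raises ValueError for an unknown algorithm name
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (end of file)
        for block in iter(lambda: handle.read(blocksize), b""):
            hasher.update(block)
    return hasher.hexdigest()
```

Reading in blocks keeps memory use bounded regardless of file size; the digest is identical to hashing the whole file at once.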
flat_tester
| flat_tester(**kwargs) -> dict
Checks file for line length and number of records.
Returns a dictionary:
{'min_cols': int, 'max_cols': int, 'numrec': int, 'constant': bool}
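A rectangularity check of this kind could look roughly like the sketch below. The function name line_length_stats, the UTF-8 default, and the handling of an empty file are assumptions for illustration; only the dictionary keys come from the description above.

```python
def line_length_stats(path: str, encoding: str = "utf-8") -> dict:
    """Measure line widths in a text file (hypothetical helper, not fcheck code)."""
    min_cols = None
    max_cols = 0
    numrec = 0
    with open(path, encoding=encoding) as handle:
        for line in handle:
            width = len(line.rstrip("\n"))  # width excludes the line terminator
            min_cols = width if min_cols is None else min(min_cols, width)
            max_cols = max(max_cols, width)
            numrec += 1
    if min_cols is None:
        min_cols = 0  # assumption: an empty file reports zero width
    return {
        "min_cols": min_cols,
        "max_cols": max_cols,
        "numrec": numrec,
        "constant": min_cols == max_cols,  # True when every line has the same width
    }
```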
non_ascii_tester
| non_ascii_tester(**kwargs) -> list
Returns a list of dicts of positions of non-ASCII characters in a text file.
[{'row': int, 'col': int, 'char': str}, ...]
fname : str
Path/filename
Keyword arguments:
flatfile : bool
— Perform rectangularity check. If False, returns dictionary
with all values as 'N/A'
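As a rough sketch of how non-ASCII positions might be collected, the example below scans a text file character by character. The function name find_non_ascii, the UTF-8 default, and the 1-based row and column numbering are assumptions; the reference above does not say how fcheck numbers positions.

```python
def find_non_ascii(path: str, encoding: str = "utf-8") -> list:
    """List non-ASCII characters with their positions (hypothetical helper)."""
    hits = []
    with open(path, encoding=encoding) as handle:
        # Assumption: rows and columns are counted from 1
        for row, line in enumerate(handle, start=1):
            for col, char in enumerate(line, start=1):
                if ord(char) > 127:  # code points above 127 are outside ASCII
                    hits.append({"row": row, "col": col, "char": char})
    return hits
```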
null_count
| null_count(**kwargs) -> dict
Returns an integer count of null characters ('\x00') in the file, or None if skipped
Keyword arguments:
flatfile : bool
— Test is useless if not a text file. If False, returns 'N/A'
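Counting null bytes needs only a binary read. The sketch below is one way to do it under the assumption that the file is scanned in fixed-size blocks; count_nulls and its blocksize parameter are hypothetical names, not part of fcheck.

```python
def count_nulls(path: str, blocksize: int = 65536) -> int:
    """Count b'\\x00' bytes in a file (hypothetical helper for illustration)."""
    total = 0
    with open(path, "rb") as handle:
        # A null is a single byte, so it can never straddle a block boundary
        for block in iter(lambda: handle.read(blocksize), b""):
            total += block.count(b"\x00")
    return total
```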
dos
| dos(**kwargs) -> bool
Checks for presence of carriage returns in file
Returns True if a carriage return, i.e. chr(13), is present
Keyword arguments:
flatfile : bool
— Perform rectangularity check. If False, returns dictionary
with all values as 'N/A'
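A carriage-return check can be sketched as a binary scan for the byte 13 ('\r'). The helper name has_carriage_return and the block size are assumptions for this example, not fcheck internals.

```python
def has_carriage_return(path: str, blocksize: int = 65536) -> bool:
    """Return True if the file contains any '\\r' byte (hypothetical helper)."""
    with open(path, "rb") as handle:
        # Binary mode avoids newline translation, so '\r' bytes are seen as stored;
        # any() stops reading as soon as the first carriage return is found
        return any(
            b"\r" in block
            for block in iter(lambda: handle.read(blocksize), b"")
        )
```

Opening the file in text mode with default settings would translate '\r\n' to '\n' and hide the carriage returns, which is why the sketch reads bytes.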
manifest
| manifest(out: str = 'txt', **kwargs)
Returns the manifest as a string in the requested output format
out : str
Acceptable values are 'txt', 'csv', 'json'
Accepted keywords and defaults:
digest : str
— Hash algorithm. Default 'md5'
flat : bool
— Flat file checking. Default True
nonascii : bool
— Check for non-ASCII characters. Default True
dos : bool
— Check for Windows CR/LF line endings. Default True
flatfile : bool
— Perform rectangularity check. If False, returns dictionary
with all values as 'N/A'
headers : bool
— Include CSV header row (only has an effect when out='csv')
Default is False
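To illustrate how a single set of check results might be rendered in the three output types, here is a minimal sketch. The exact layout fcheck produces is not described above, so the function render_manifest, the "key: value" text layout, and the sample field names are all assumptions.

```python
import csv
import io
import json


def render_manifest(record: dict, out: str = "txt", headers: bool = False) -> str:
    """Serialize one result record as txt, csv, or json (hypothetical layout)."""
    if out == "json":
        return json.dumps(record)
    if out == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(record))
        if headers:
            writer.writeheader()  # header row is only meaningful for CSV output
        writer.writerow(record)
        return buffer.getvalue()
    if out == "txt":
        # Assumed plain-text layout: one "key: value" pair per line
        return "\n".join(f"{key}: {value}" for key, value in record.items())
    raise ValueError(f"Unsupported output type: {out!r}")
```

In this sketch the headers flag is ignored for 'txt' and 'json', mirroring the note above that it only affects CSV output.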