Why Should I Use This?
The goal of pyPreservica is to allow you to make use of the Preservica Entity API for reading and writing objects within a Preservica repository without having to manage the underlying REST HTTPS requests and XML parsing. The library provides a level of abstraction which reflects the underlying data model, such as structural and information objects.
The pyPreservica library allows Preservica users to build applications which interact with the repository such as metadata synchronisation with 3rd party systems etc.
Hint
Access to the Preservica API’s for the cloud hosted system does depend on which Preservica Edition has been licensed. See https://preservica.com/digital-archive-software/products-editions for details.
SDK Features
Entity API Features
Fetch and Update Entity Objects (Folders, Assets, Content Objects)
Add, Delete and Update External Identifiers
Add, Delete and Update Descriptive Metadata Fragments
Change Security tags on Folders and Assets
Create new Folder Entities
Move Assets and Folders within the repository
Deleting Assets and Folders
Fetch Folders and Assets belonging to parent Folders
Retrieve Representations, Generations & Bitstreams from Assets
Download digital files and thumbnails
Fetch lists of changed entities over the last n days
Request information on completed integrity checks
Add or remove asset and folder icons
Replace existing content objects within an Asset
Export OPEX Package
Fetch audit trail events on Entities and across the repository
Create Relationships between Assets
Content API Features
Fetch a list of indexed Solr Fields
Search based on a single query term
Filtered searches on indexed fields
Upload API Features
Create single Content Object Packages with multiple Representations
Create multiple Content Object Packages with multiple Representations
Upload packages to Preservica
Spreadsheet Metadata
Ingest Web Video
Ingest Twitter Feeds
Admin API Features
Schema Management (XML Templates, XSD Schema’s & XSLT Transforms)
User Management (create and remove user accounts)
Security Tags (add and remove security tags)
Retention Management API Features
Create new retention policies
Delete retention policies
Update retention policies
Assign retention policies to entities
Workflow API Features
Get Workflow Contexts
Get Workflow Instance
Start Workflow Instances
Webhook API Features
Subscribe to Webhook endpoints
Unsubscribe
List Subscriptions
Background
They key to working with the pyPreservica library is that the services follow the Preservica core data model closely.
The Preservica data model represents a hierarchy of entities, starting with the structural objects which are used to represent aggregations of digital assets. Structural objects define the organisation of the data. In a library context they may be referred to as collections, in an archival context they may be Fonds, Sub-Fonds, Series etc and in a records management context they could be simply a hierarchy of folders or directories.
These structural objects may contain other structural objects in the same way as a computer filesystem may contain folders within folders.
Within the structural objects comes the information objects. These objects which are sometimes referred to as the digital assets are what PREMIS defines as an Intellectual Entity. Information objects are considered a single intellectual unit for purposes of management and description: for example, a book, document, map, photograph or database etc.
Representations are used to define how the information object are composed in terms of technology and structure. For example, a book may be represented as a single multiple page PDF, a single eBook file or a set of single page image files.
Representations are usually associated with a use case such as access or long-term preservation. All Information objects have a least one representation defined by default. Multiple representations can be either created outside of Preservica through a process such as digitisation or within Preservica through preservation processes such a normalisation.
Content Objects represent the components of the asset. Simple assets such as digital images may only contain a single content object whereas more complex assets such as books or 3d models may contain multiple content objects. In most cases content objects will map directly to digital files or bitstreams.
Generations represent changes to content objects over time, as formats become obsolete new generations may need to be created to make the information accessible.
Bitstreams represent the actual computer files as ingested into Preservica, i.e. the TIFF photograph or the PDF document.
PIP Installation
pyPreservica is available from the Python Package Index (PyPI)
https://pypi.org/project/pyPreservica/
pyPreservica is built and tested against Python 3.8. Older versions of Python may not work.
To install pyPreservica, simply run this simple command in your terminal of choice:
$ pip install pyPreservica
or you can install in a virtual python environment using:
$ pipenv install pyPreservica
pyPreservica is under active development and the latest version is installed using
$ pip install --upgrade pyPreservica
Get the Source Code
pyPreservica is developed on GitHub, where the code is always available.
You can clone the public repository
$ git clone git://github.com/carj/pyPreservica.git
Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/carj/pyPreservica
Support
pyPreservica is 3rd party open source client and is not affiliated or supported by Preservica Ltd
For announcements about new versions and discussion of pyPreservica please subscribe to the google groups forum https://groups.google.com/g/pypreservica
Bug reports can be raised directly on either GitHub or on the google group forum
General questions and queries about using pyPreservica posted on the google group forum above.
Examples
Using the python console, create the entity API client object and request an Asset (Information Object) by its unique reference and display some of its attributes.
All entities within the Preservica system have one unique reference which can be used to retrieve them.
The reference used to fetch entities (Assets, Folders) is the Preservica internal unique identifier. This is a universally unique identifier (UUID)
You can find the reference when viewing the object metadata within Explorer. Later on we will look at how we can fetch entities using other 3rd party external identifiers which may be more meaningful such as ISBNs DOIs etc.
To create the client object you will need valid credentials to connect to the Preservica server. See the following section on available authentication options.
>>> from pyPreservica import *
>>> client = EntityAPI()
>>> client
pyPreservica version: 0.8.5 (Preservica 6.2 Compatible)
Connected to: us.preservica.com Version: 6.2.0 as test@test.com
>>> asset = client.asset("dc949259-2c1d-4658-8eee-c17b27a8823d")
>>> asset.reference
'dc949259-2c1d-4658-8eee-c17b27a8823d'
>>> asset.title
'LC-USZ62-20901'
>>> asset.parent
'ae108c8f-b058-4228-b099-6049175d2f0c'
>>> asset.security_tag
'open'
>>> asset.entity_type
<EntityType.ASSET: 'IO'>
If your credentials are valid, pyPreservica returns a client object which is the connection to the server. Printing the client returns information about the connection such as the server and the user name etc. This can be useful to check that you are connected to the correct system.
All entities have a parent reference attribute, for Assets this always points to the parent Folder.
For Content Objects the parent points to the Asset and for Folders it points to the parent Folder if it exists.
Folders at the root level of the repository do not have a parent and the attribute returns the special Python
value of None
This example shows how pyPreservica can be used to upload and ingest a local file, picture.tiff
into Preservica using the UploadAPI class. The tiff file will be ingested as a new Asset object inside the existing Preservica folder given
by the folder UUID.
The simple_asset_package
function creates the package, in this case an XIPv6 formatted package and the upload_zip_package
method
uploads it directly to the Preservica server using the S3 protocol.
>>> from pyPreservica import *
>>> client = UploadAPI()
>>> folder = "dc949259-2c1d-4658-8eee-c17b27a8823d"
>>> zip_p = simple_asset_package(preservation_file="picture.tiff", parent_folder=folder)
>>> client.upload_zip_package(zip_p)
Authentication
pyPreservica provides 4 different methods for authentication. The library requires the username and password of a Preservica user and an optional Tenant identifier along with the server hostname.
Tip
The Tenant parameter is now optional when connecting to a Preservica 6.3 system.
1 Method Arguments
Include the user credentials as arguments to the EntityAPI Class
from pyPreservica import *
client = EntityAPI(username="test@test.com", password="123444",
tenant="PREVIEW", server="preview.preservica.com")
If you don’t want to include your Preservica credentials within your python script because you are sharing scripts or using a version control system then one of the following two methods should be used.
2 Environment Variable
Export the credentials as environment variables as part of the session
$ export PRESERVICA_USERNAME="test@test.com"
$ export PRESERVICA_PASSWORD="123444"
$ export PRESERVICA_TENANT="PREVIEW"
$ export PRESERVICA_SERVER="preview.preservica.com"
$ python3
from pyPreservica import *
client = EntityAPI()
3 Properties File
Create a properties file called “credentials.properties” with the following property names and save to the working directory
[credentials]
username=test@test.com
password=123444
tenant=PREVIEW
server=preview.preservica.com
from pyPreservica import *
client = EntityAPI()
You can create a new credentials.properties file automatically using the save_config()
method
from pyPreservica import *
client = EntityAPI(username="test@test.com", password="123444",
tenant="PREVIEW", server="preview.preservica.com")
client.save_config()
4 Shared Secrets
pyPreservica now supports authentication using shared secrets rather than a login account username and password. This allows a trusted external applications such as pyPreservica to acquire a Preservica API authentication token without having to use a set of login credentials.
This option is useful if you want to provide limited API access to a 3rd party without providing login access to Preservica.
To use the shared secret authentication you need to add a secure secret key to your Preservica system.
The username, password, tenant and server attributes are used as normal, the password field now holds the shared secret and not the users password.
from pyPreservica import *
client = EntityAPI(username="test@test.com", password="shared-secret", tenant="PREVIEW",
server="preview.preservica.com", use_shared_secret=True)
If you are using a credentials.properties file then
from pyPreservica import *
client = EntityAPI(use_shared_secret=True)
2 Factor Authentication
pyPreservica now supports the new 2-Factor authentication for APIs introduced with Preservica 6.8
The Preservica system should be first setup for 2-Factor authentication and the one time password key used to seed the 2FA (HMAC-Based One-Time Password Algorithm) should be retained and used with the API.
The one time password or seed key is available to view and should be saved when setting up the 2FA for a user. You can find the two factor seed key from the user 2FA setup page under the “Reveal Key” button at the bottom of the page.
Keep this key secret along with your account password as it will be required when authenticating the API calls.
To call pyPreservica once 2-Factor authentication process has been setup, you need the username and password as normal along with the additional two factor key.
You can pass the additional two factor key as an argument to the constructor for the API classes or use environment variables or the credentials file.
from pyPreservica import *
client = EntityAPI(username="test@test.com", password="my-login-password", tenant="PREVIEW",
server="preview.preservica.com", two_fa_secret_key="AJC5DEGUVM6UQ1TT")
The environment variable for holding the 2 factor seed key is called PRESERVICA_2FA_TOKEN and the credential file property name is twoFactorToken.
$ export PRESERVICA_2FA_TOKEN=AJC5DEGUVM6UQ1TT
i.e
[credentials]
username=test@test.com
password=123444
tenant=PREVIEW
server=preview.preservica.com
twoFactorToken=AJC5DEGUVM6UQ1TT
Tip
Preservica uses time based One Time Passwords (OTP), this means the time on your local machine must match time on the server.
SSL Certificates
pyPreservica will by default connect to servers which use the https:// protocol and will always validate certificates when connected via https.
For Enterprise on Premise customers on secure networks, you can change the default protocol to use http:// via the constructor.
client = EntityAPI(protocol="http")
pyPreservica uses the Certifi project to provide SSL certificate validation.
Self-signed certificates used by on-premise deployments are not part of the Certifi certification authority (CA) bundle and therefore need to be set explicitly.
The CA bundle is a file that contains root and intermediate certificates. The end-entity certificate along with a CA bundle constitutes the certificate chain.
For on-premise deployments the trusted CAs can be specified through the REQUESTS_CA_BUNDLE
environment variable. e.g.
$ export REQUESTS_CA_BUNDLE=/usr/local/share/ca-certificates/my-server.cert
Application Logging
You can add logging to your pyPreservica scripts by simply including the following
import logging
from pyPreservica import *
logging.basicConfig(level=logging.DEBUG)
client = EntityAPI()
This will log all messages from level DEBUG or higher to standard output, i.e the console.
When logging to files, the main thing to be wary of is that log files need to be rotated regularly. The application needs to detect the log file being renamed and handle that situation. While Python provides its own file rotation handler, it is best to leave log rotation to dedicated tools such as logrotate. The WatchedFileHandler will keep track of the log file and reopen it if it is rotated, making it work well with logrotate without requiring any specific signals.
Here’s a sample implementation.
import logging
import logging.handlers
import os
from pyPreservica import *
handler = logging.handlers.WatchedFileHandler("pyPreservica.log")
formatter = logging.Formatter(logging.BASIC_FORMAT)
handler.setFormatter(formatter)
root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(handler)
client = EntityAPI()
Low Level Logging
pyPreservica now provides low level event hooks into the underlying API requests to the server. To use this functionality, create a call back function with the following signature call_back(r, *args, **kwargs)
The first argument is the request object which can be queried.
This API allows clients to do things such as audit all the API endpoints which are called for example.
def print_url(r, *args, **kwargs):
print(r.url)
client = EntityAPI(request_hook=print_url)
for f in client.descendants():
pass
https://us.preservica.com/api/accesstoken/login https://us.preservica.com/api/entity/versiondetails/version https://us.preservica.com/api/user/details https://us.preservica.com/api/entity/root/children?start=0&max=100