Entity API¶
Making a call to the Preservica repository is very simple.
Begin by importing the pyPreservica module
from pyPreservica import *
Now, let’s create the EntityAPI client
client = EntityAPI()
Fetching Entities (Assets, Folders & Content Objects)¶
Fetch an Asset by its reference and print its attributes
asset = client.asset("9bad5acf-e7a1-458a-927d-2d1e7f15974d")
print(asset.reference)
print(asset.title)
print(asset.description)
print(asset.security_tag)
print(asset.parent)
print(asset.entity_type)
We can also fetch the same attributes for both Folders
folder = client.folder("0b0f0303-6053-4d4e-a638-4f6b81768264")
print(folder.reference)
print(folder.title)
print(folder.description)
print(folder.security_tag)
print(folder.parent)
print(folder.entity_type)
and Content Objects
content_object = client.content_object("1a2a2101-6053-4d4e-a638-4f6b81768264")
print(content_object.reference)
print(content_object.title)
print(content_object.description)
print(content_object.security_tag)
print(content_object.parent)
print(content_object.entity_type)
We can fetch any of Assets, Folders and Content Objects using the entity type and the unique reference
asset = client.entity(EntityType.ASSET, "9bad5acf-e7a1-458a-927d-2d1e7f15974d")
folder = client.entity(EntityType.FOLDER, asset.parent)
To get a list of parent Folders of an Asset all the way to the root of the repository
folder = client.folder(asset.parent)
print(folder.title)
while folder.parent is not None:
    folder = client.folder(folder.parent)
    print(folder.title)
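The loop above can be wrapped in a small helper that returns the folder titles ordered from the root down. This is only a sketch: `folder_path` is a hypothetical name, not part of pyPreservica, and it relies solely on the `client.folder()` call and the `parent` and `title` attributes already shown.

```python
def folder_path(client, asset):
    """Return the titles of the folders containing `asset`, root first.

    Hypothetical helper, not a pyPreservica function. `client` is assumed
    to be an EntityAPI instance (or anything with a compatible folder()).
    """
    titles = []
    parent_ref = asset.parent
    # Walk upwards until we reach a folder with no parent (a root folder).
    while parent_ref is not None:
        folder = client.folder(parent_ref)
        titles.append(folder.title)
        parent_ref = folder.parent
    # The walk collects titles leaf-to-root, so reverse for a root-first path.
    titles.reverse()
    return titles
```

For example, `print(" / ".join(folder_path(client, asset)))` would print the path as a single line. Each level costs one request to the server.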
Fetching Children of Entities¶
The immediate children of a Folder can also be retrieved using the library.
To get a set of all the root Folders use
root_folders = client.children(None)
or
root_folders = client.children()
To get a set of children of a particular Folder use
entities = client.children(folder.reference)
To get the siblings of an Asset you can use
entities = client.children(asset.parent)
The set of entities returned may contain both Assets and other Folders. The default size of the result set is 50 items. The size can be configured and for large result sets paging is available.
next_page = None
while True:
    root_folders = client.children(None, maximum=10, next_page=next_page)
    for e in root_folders.results:
        print(f'{e.title} : {e.reference} : {e.entity_type}')
    if not root_folders.has_more:
        break
    else:
        next_page = root_folders.next_page
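If you want the paged results as a single list, the same loop can be folded into a function. The sketch below assumes nothing beyond the `children()` arguments and the `results`, `has_more` and `next_page` attributes used above; `collect_children` is a hypothetical name.

```python
def collect_children(client, folder_ref=None, page_size=10):
    """Gather every child of a folder into one list using explicit paging.

    Hypothetical helper, not part of pyPreservica. Passing folder_ref=None
    asks for the root folders, as in the example above.
    """
    entities = []
    next_page = None
    while True:
        page = client.children(folder_ref, maximum=page_size, next_page=next_page)
        entities.extend(page.results)
        if not page.has_more:
            return entities
        # Hand the server-supplied token back to request the following page.
        next_page = page.next_page
```

Note that this holds every result in memory at once, which may not suit very large folders.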
A version of this method is also available as a generator function, which does not require explicit paging. It returns a lazy iterator that does the paging internally, fetching 100 items per server request by default.
for entity in client.descendants():
    print(entity.title)
You can pass a parent reference to get the children of any folder, in the same way as the explicit paging version
for entity in client.descendants(folder.parent):
    print(entity.title)
This is the preferred way to get children of folders as the paging is managed automatically.
If you only need the folders or Assets from a parent you can filter the results using a pre-defined filter
for asset in filter(only_assets, client.descendants(asset.parent)):
    print(asset.title)
or
for folders in filter(only_folders, client.descendants(asset.parent)):
    print(folders.title)
Note
Entities within the returned set only contain the attributes (type, reference and title). If you need the full object you have to request it.
You can request the entity back without knowing exactly what type it is by using the entity() call
for f in client.descendants():
    e = client.entity(f.entity_type, f.reference)
    print(e)
If you want all the entities below a point in the hierarchy, i.e. a recursive list of all Folders and Assets, you can call all_descendants(). This is a generator function which returns a lazy iterator that makes repeated calls to the server for each page of results.
The following will return all entities within the repository from the root folders down
for e in client.all_descendants():
    print(e.title)
Again, if you need a list of every Asset in the system you can filter using
for asset in filter(only_assets, client.all_descendants()):
    print(asset.title)
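Because `all_descendants()` is lazy, it can be passed straight to anything that consumes an iterator. As a small illustration, the sketch below tallies entities by their `entity_type` attribute; `count_by_type` is a hypothetical helper and not part of the library.

```python
from collections import Counter

def count_by_type(entities):
    """Tally an iterable of entities by their `entity_type` attribute.

    Hypothetical helper: it consumes the iterable once and keeps only
    the counts, so it works with a lazy iterator.
    """
    return Counter(e.entity_type for e in entities)
```

A call such as `count_by_type(client.all_descendants())` would walk the whole repository, so expect it to issue one server request per page of results.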
Creating new Folders¶
Folder objects can be created directly in the repository. The create_folder() function takes 3 mandatory parameters: folder title, description and security tag.
new_folder = client.create_folder("title", "description", "open")
print(new_folder.reference)
This will create a folder at the top level of the repository. You can create child folders by passing the reference of the parent as the last argument.
new_folder = client.create_folder("title", "description", "open", folder.reference)
print(new_folder.reference)
assert new_folder.parent == folder.reference
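To build a nested hierarchy in one go, you can chain these calls so that each new folder becomes the parent of the next. The following is a sketch under stated assumptions: `create_folder_path` is a hypothetical helper, and it reuses each title as the folder description purely for brevity.

```python
def create_folder_path(client, titles, security_tag, parent_ref=None):
    """Create one folder per title, each nested inside the previous one.

    Hypothetical helper, not part of pyPreservica. Returns the created
    folders, outermost first. The title doubles as the description here.
    """
    created = []
    for title in titles:
        if parent_ref is None:
            # Three-argument form: the folder is created at the top level.
            folder = client.create_folder(title, title, security_tag)
        else:
            # Four-argument form: the last argument is the parent reference.
            folder = client.create_folder(title, title, security_tag, parent_ref)
        created.append(folder)
        parent_ref = folder.reference
    return created
```

For instance, `create_folder_path(client, ["Fonds", "Series", "File"], "open")` would create three folders, one inside the other.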
Adding Physical Assets¶
Preservica supports the creation of intellectual entities which correspond to physical objects. These are similar to regular assets, but they do not point to digital files like regular assets.
To use Physical Assets, a system property must be set to activate the functionality; this can be done by the Preservica help desk.
parent = client.folder("9bad5acf-e7a1-458a-927d-2d1e7f15974d")
physical_asset = client.add_physical_asset("title", "description", parent, "open")
print(physical_asset.reference)
Physical assets support 3rd party identifiers, thumbnails and descriptive metadata in the same way as regular assets.
client.add_identifier(physical_asset, "ISBN", "978-3-16-148410-0")
client.add_thumbnail(physical_asset, "icon.png")
Updating Entities¶
We can update either the title or description attribute for assets, folders and content objects using the save() method
asset = client.asset("9bad5acf-e7a1-458a-927d-2d1e7f15974d")
asset.title = "New Asset Title"
asset.description = "New Asset Description"
asset = client.save(asset)
folder = client.folder("0b0f0303-6053-4d4e-a638-4f6b81768264")
folder.title = "New Folder Title"
folder.description = "New Folder Description"
folder = client.save(folder)
content_object = client.content_object("1a2a2101-6053-4d4e-a638-4f6b81768264")
content_object.title = "New Content Object Title"
content_object.description = "New Content Object Description"
content_object = client.save(content_object)
This method can also be used to set the Type of an asset or folder. By default Information objects have a type “Asset” and Structural objects have a type “Folder”. You can use the API to change these defaults; for example, you may want to use the type field to set the level of description of a Structural object to “Fonds” or “Series”.
To change the type use the custom_type attribute on the object, e.g.
folder = client.folder("9bad5acf-e7a1-458a-927d-2d1e7f15974d")
folder.custom_type = "Series"
folder = client.save(folder)
asset = client.asset("9bad5acf-e7a1-458a-927d-2d1e7f15974d")
asset.custom_type = "Manuscript"
asset = client.save(asset)
If you want to change the type back, just set the value to None
asset = client.asset("9bad5acf-e7a1-458a-927d-2d1e7f15974d")
asset.custom_type = None
asset = client.save(asset)
Security Tags¶
To change the security tag on an Asset or Folder we have a separate API, since this may be a long-running process. You can choose either an asynchronous (non-blocking) call, which returns immediately, or a synchronous (blocking) call, which waits for the security tag to be changed before returning.
This is the asynchronous call, which returns a process id immediately
pid = client.security_tag_async(entity, new_tag)
You can determine the current status of the asynchronous call by passing the process id to get_async_progress
status = client.get_async_progress(pid)
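If you need to wait for an asynchronous call to finish, one option is to poll `get_async_progress` at intervals. The status values the server returns are not listed here, so the sketch below leaves that decision to a caller-supplied predicate; `wait_for_progress` and its parameters are hypothetical names.

```python
import time

def wait_for_progress(client, pid, is_finished, interval=5.0, timeout=600.0):
    """Poll get_async_progress() until `is_finished(status)` returns True.

    Hypothetical helper, not part of pyPreservica. `is_finished` is a
    function supplied by the caller because the possible status values
    are an assumption this sketch does not make.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = client.get_async_progress(pid)
        if is_finished(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"process {pid} still reports {status!r}")
        time.sleep(interval)
```

The timeout guards against polling forever if the process never reaches a state the predicate accepts.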
The synchronous version will block until the security tag has been updated on the entity. This call does not recursively change entities within a folder. It only applies to the named entity passed as an argument.
entity = client.security_tag_sync(entity, new_tag)
3rd Party External Identifiers¶
3rd party or external identifiers are a useful way to give objects additional names or identities, providing an alternate way of accessing them. For example, if you are synchronising metadata between an external metadata catalogue and Preservica, adding the catalogue identifiers to the Preservica objects allows the catalogue to query Preservica using its own ids.
Each Preservica entity can hold as many external identifiers as you need.
Note
Adding, Updating and Deleting external identifiers is only available in version 6.1 and above
We can add external identifiers to either Assets, Folders or Content Objects. External identifiers have a name or type and a value. External identifiers do not have to be unique in the same way as internal identifiers. The same external identifiers can be added to multiple entities to form sets of objects.
asset = client.asset("9bad5acf-e7ce-458a-927d-2d1e7f15974d")
client.add_identifier(asset, "ISBN", "978-3-16-148410-0")
client.add_identifier(asset, "DOI", "https://doi.org/10.1109/5.771073")
client.add_identifier(asset, "URN", "urn:isan:0000-0000-2CEA-0000-1-0000-0000-Y")
Fetch external identifiers on an entity. This call returns a set of tuples (identifier_type, identifier_value)
identifiers = client.identifiers_for_entity(folder)
for identifier in identifiers:
    identifier_type = identifier[0]
    identifier_value = identifier[1]
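Since an entity can carry several identifiers of the same type, it is sometimes handy to group the returned tuples by type. A minimal sketch follows; `identifiers_by_type` is a hypothetical helper that only assumes the (identifier_type, identifier_value) tuple shape described above.

```python
from collections import defaultdict

def identifiers_by_type(identifiers):
    """Group (identifier_type, identifier_value) tuples into a dict of lists.

    Hypothetical helper, not part of pyPreservica.
    """
    grouped = defaultdict(list)
    for identifier_type, identifier_value in identifiers:
        grouped[identifier_type].append(identifier_value)
    # Return a plain dict so missing types raise KeyError as usual.
    return dict(grouped)
```

You could then write `identifiers_by_type(client.identifiers_for_entity(folder)).get("ISBN", [])` to list only the ISBN values.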
You can search the repository for entities with matching external identifiers. The call returns a set of objects which may include any type of entity.
for e in client.identifier("ISBN", "978-3-16-148410-0"):
    print(e.entity_type, e.reference, e.title)
Note
Entities within the set only contain the attributes (type, reference and title). If you need the full object you have to request it.
For example
for ident in client.identifier("DOI", "urn:nbn:de:1111-20091210269"):
    entity = client.entity(ident.entity_type, ident.reference)
    print(entity.title)
    print(entity.description)
To delete identifiers attached to an entity
client.delete_identifiers(entity)
Will delete all identifiers on the entity
client.delete_identifiers(entity, identifier_type="ISBN")
Will delete all identifiers which have type “ISBN”
client.delete_identifiers(entity, identifier_type="ISBN", identifier_value="978-3-16-148410-0")
Will only delete identifiers which match the type and value
Descriptive Metadata¶
You can query an entity to determine if it has any attached descriptive metadata using the metadata attribute. This returns a dictionary: each key is a url which can be used to fetch the metadata, and each value is the schema name
for url, schema in entity.metadata.items():
    print(url, schema)
The descriptive XML metadata document can be returned as a string by passing the key of the map (url) to the metadata() method
for url in entity.metadata:
    xml_string = client.metadata(url)
An alternative is to call metadata_for_entity directly
xml_string = client.metadata_for_entity(entity, "https://www.person.com/person")
This will fetch the first metadata document on the entity which matches the schema argument.
If you need all the descriptive XML fragments attached to an Asset or Folder you can call all_metadata. This is a generator which returns tuples containing the schema as the first item and the XML document as the second.
for metadata in client.all_metadata(entity):
    schema = metadata[0]
    xml_string = metadata[1]
Metadata can be attached to entities either by passing an XML document as a string
folder = client.folder("723f6f27-c894-4ce0-8e58-4c15a526330e")
xml = "<person:Person xmlns:person='https://www.person.com/person'>" \
      "<person:Name>Bob Smith</person:Name>" \
      "<person:Phone>01234 100 100</person:Phone>" \
      "<person:Email>test@test.com</person:Email>" \
      "<person:Address>Abingdon, UK</person:Address>" \
      "</person:Person>"
folder = client.add_metadata(folder, "https://www.person.com/person", xml)
or by reading the metadata from a file
with open("DublinCore.xml", 'r', encoding="utf-8") as md:
    asset = client.add_metadata(asset, "http://purl.org/dc/elements/1.1/", md)
Descriptive metadata can also be updated to amend values or change the document structure. To update an existing metadata document call
client.update_metadata(entity, schema, xml_string)
For example, the following Python fragment appends a new element to an existing document.
from xml.etree import ElementTree
folder = client.folder("723f6f27-c894-4ce0-8e58-4c15a526330e")  # call into the API
for url, schema in folder.metadata.items():
    if schema == "https://www.person.com/person":
        xml_string = client.metadata(url)  # call into the API
        xml_document = ElementTree.fromstring(xml_string)
        postcode = ElementTree.Element('{https://www.person.com/person}Postcode')
        postcode.text = "OX14 3YS"
        xml_document.append(postcode)
        xml_string = ElementTree.tostring(xml_document, encoding='UTF-8').decode("utf-8")
        client.update_metadata(folder, schema, xml_string)  # call into the API
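The XML manipulation in the fragment above does not depend on the server, so it can be separated out and tried locally before calling `update_metadata`. The sketch below uses only the standard library; `append_element` is a hypothetical helper name.

```python
from xml.etree import ElementTree

def append_element(xml_string, namespace, tag, text):
    """Return `xml_string` with a new namespaced child appended to the root.

    Hypothetical helper, not part of pyPreservica. The element name is
    built in ElementTree's "{namespace}tag" form.
    """
    root = ElementTree.fromstring(xml_string)
    child = ElementTree.SubElement(root, f"{{{namespace}}}{tag}")
    child.text = text
    # encoding="unicode" makes tostring() return str rather than bytes.
    return ElementTree.tostring(root, encoding="unicode")
```

For example, `append_element(xml_string, "https://www.person.com/person", "Postcode", "OX14 3YS")` yields the updated document as a string. ElementTree may rewrite the namespace prefix when serialising (typically to `ns0`); the namespace URI itself is unchanged.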
Relationships Between Entities¶
Preservica allows arbitrary relationships between entities such as Assets and Folders. These relationships appear in the Preservica user interface as links from one entity to another. All entities have existing vertical parent child relationships which determine the level of description for an asset. These relationships are additional relationships which relate different entities across the repository.
For example relationships may be used to link different editions of the same work, or a translation of an existing document etc.
Any type of relationship is supported. For example, the Dublin Core Metadata Initiative provides a set of standard relationships between entities, and these are available as part of the Relationship class, but any text string is allowed for the relationship type.
>>>Relationship.DCMI_isVersionOf
http://purl.org/dc/terms/isVersionOf
>>>Relationship.DCMI_isReplacedBy
http://purl.org/dc/terms/isReplacedBy
Relationships are created between two entities A and B and have a type, for example:
A isVersionOf B.
This is a relationship from A to B. You can also create links going in the other direction and have bi-directional links between the same assets. For example:
A isVersionOf B and B hasVersion A.
To create a relationship between entities use the add_relation method.
A_asset = client.asset("de1c32a3-bd9f-4843-a5f1-46df080f83d2")
B_asset = client.asset("683f9db7-ff81-4859-9c03-f68cfa5d9c3d")
client.add_relation(A_asset, Relationship.DCMI_isVersionOf, B_asset)
client.add_relation(B_asset, Relationship.DCMI_hasVersion, A_asset)
client.add_relation(A_asset, "Supersedes", B_asset)
Note
The Relationship API is only available when connected to Preservica version 6.3.1 or above
You can list the relationships from an asset using:
for r in client.relationships(A_asset):
    print(r)
This returns a generator of Relationship objects.
To delete relationships between assets use:
client.delete_relationships(A_asset)
This will delete all relationships FROM the specified entity to another entity. It does not delete relationships TO this entity.
If you only need to delete a specific relationship, you can pass the relationship name as a second argument
client.delete_relationships(A_asset, "Supersedes")
Representations, Content Objects & Generations¶
Each asset in Preservica contains one or more representations, such as Preservation or Access etc.
To get a list of all the representations of an Asset
for representation in client.representations(asset):
    print(representation.rep_type)
    print(representation.name)
    print(representation.asset.title)
Each Representation will contain one or more Content Objects. Simple Assets contain a single Content Object whereas more complex objects such as 3D models, books, multi-page documents may have several content objects.
for content_object in client.content_objects(representation):
    print(content_object.reference)
    print(content_object.title)
    print(content_object.description)
    print(content_object.parent)
    print(content_object.metadata)
    print(content_object.asset.title)
Each content object will contain at least one Generation; migrated content may have multiple Generations.
for generation in client.generations(content_object):
    print(generation.original)
    print(generation.active)
    print(generation.content_object)
    print(generation.format_group)
    print(generation.effective_date)
    print(generation.bitstreams)
Each Generation has a list of Bitstream objects which can be used to fetch the actual content from the server or to fetch technical metadata about the bitstream itself
for bitstream in generation.bitstreams:
    print(bitstream.filename)
    print(bitstream.length)
    for algorithm, value in bitstream.fixity.items():
        print(algorithm, value)
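The fixity values can be used to verify a file after it has been downloaded. The sketch below compares a local file against a mapping shaped like `bitstream.fixity`. It assumes the algorithm names are ones that `hashlib` recognises once lower-cased with hyphens removed (for example "SHA1" or "SHA-256"); that naming is an assumption, and `file_matches_fixity` is a hypothetical helper.

```python
import hashlib

def file_matches_fixity(path, fixity):
    """Check a local file against an {algorithm: hex_digest} mapping.

    Hypothetical helper, not part of pyPreservica. Returns True only if
    every listed digest matches; an empty mapping trivially returns True.
    """
    for algorithm, expected in fixity.items():
        # Assumed name normalisation: "SHA-256" -> "sha256", "SHA1" -> "sha1".
        digest = hashlib.new(algorithm.replace("-", "").lower())
        with open(path, "rb") as handle:
            # Read in 1 MiB chunks so large files are not loaded whole.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        if digest.hexdigest().lower() != expected.lower():
            return False
    return True
```

`hashlib.new()` raises `ValueError` for a name it does not recognise, so an unexpected algorithm label fails loudly rather than passing silently.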
If you have an Asset object and you would like to fetch all the available bitstreams you would use something like:
for representation in client.representations(asset):
    for content_object in client.content_objects(representation):
        for generation in client.generations(content_object):
            for bitstream in generation.bitstreams:
                do_something(bitstream)
If you only need the current or active Generations, then you can use the following shortcut method, which returns each Bitstream from all the Representations and Content Objects within the Asset.
for bitstream in client.bitstreams_for_asset(asset):
    do_something(bitstream)
The actual content files can be downloaded using bitstream_content()
client.bitstream_content(bitstream, bitstream.filename)
To download all the Access bitstreams to the current folder you would use
for representation in client.representations(asset):
    if representation.rep_type == "Access":
        for content_object in client.content_objects(representation):
            for generation in client.generations(content_object):
                for bitstream in generation.bitstreams:
                    client.bitstream_content(bitstream, bitstream.filename)
Integrity Check History¶
You can request the history of all integrity checks which have been carried out on a bitstream
for bitstream in generation.bitstreams:
    for check in client.integrity_checks(bitstream):
        print(check)
The list of returned checks includes both full and quick integrity checks.
Note
This call does not start a new check, it only returns information about previous checks.
Moving Entities¶
We can move entities between folders using the move call
client.move(entity, dest_folder)
Here entity is the object to move, either an Asset or a Folder, and the second argument is the destination folder the entity is moved to.
Folders can be moved to the root of the repository by passing None as the second argument.
entity = client.move(folder, None)
The move() call is an alias for move_sync(), which is a synchronous (blocking) call
entity = client.move_sync(entity, dest_folder)
An asynchronous (non-blocking) version is also available which returns a progress id.
pid = client.move_async(entity, dest_folder)
You can determine the completed status of the asynchronous move call by passing the progress id to get_async_progress
status = client.get_async_progress(pid)
Deleting Entities¶
You can initiate and approve a deletion request using the API.
Note
Deletion is a two stage process within Preservica and requires two distinct sets of credentials. To use the delete functions you must be using the “credentials.properties” authentication method.
Note
The Deletion API is only available when connected to Preservica version 6.2 or above
Add manager.username and manager.password to the credentials file.
[credentials]
username=
password=
server=
tenant=
manager.username=
manager.password=
Deleting an asset
asset_ref = client.delete_asset(asset, "operator comments", "supervisor comments")
print(asset_ref)
Deleting a folder
folder_ref = client.delete_folder(folder, "operator comments", "supervisor comments")
print(folder_ref)
Warning
This API call deletes entities within the repository. It both initiates and approves the deletion request and therefore must be used with care.
Finding Updated Entities¶
We can query Preservica for entities which have changed over the last n days using
for e in client.updated_entities(previous_days=30):
    print(e)
The argument is the number of previous days to check for changes. This call does paging internally.
Downloading Files¶
The pyPreservica library also provides a call, part of the content API, which downloads digital content directly without having to request the Representations and Generations first. This call is a shortcut to request the Bitstream from the latest Generation of the first Content Object in the Access Representation of an Asset. If the asset does not have an Access Representation then the Preservation Representation is used.
For very simple assets which comprise a single digital file in a single Representation, this call will probably do what you expect.
asset = client.asset("edf403d0-04af-46b0-ab21-e7a620bfdedf")
filename = client.download(asset, "asset.jpg")
For complex multi-part assets which have been through preservation actions it may be better to use the data model and the bitstream_content() function to fetch the exact bitstream you need.
Events on Specific Entities¶
List actions performed against this entity
entity_events() returns an iterator which contains events on an entity, either an asset or a folder
asset = client.asset("edf403d0-04af-46b0-ab21-e7a620bfdedf")
for event in client.entity_events(asset):
    print(event)
Events Across Entities¶
List actions performed against all entities within the repository. Each event is a dict() object containing the event attributes. This call is a generator function which returns the events as needed.
for event in client.all_events():
    print(event)
Ingest Events¶
Return a generator of ingest events over the last n days
for ingest_event in client.all_ingest_events(previous_days=1):
    print(ingest_event)
Get, Add or Remove asset and folder icons¶
You can now add and remove icons on assets and folders using the API. The icons will be displayed in the Explorer and Universal Access interfaces.
folder = client.folder("edf403d0-04af-46b0-ab21-e7a620bfdedf")
client.add_thumbnail(folder, "../my-icon.png")
client.remove_thumbnail(folder)
and for assets
asset = client.asset("edf403d0-04af-46b0-ab21-e7a620bfdedf")
client.add_thumbnail(asset, "../my-icon.png")
client.remove_thumbnail(asset)
We also have a function to fetch the thumbnail image for an asset or folder
asset = client.asset("edf403d0-04af-46b0-ab21-e7a620bfdedf")
filename = client.thumbnail(asset, "thumbnail.png")
You can specify the size of the thumbnail by passing a second argument
asset = client.asset("edf403d0-04af-46b0-ab21-e7a620bfdedf")
filename = client.thumbnail(asset, "thumbnail.png", Thumbnail.LARGE) ## 400×400 pixels
filename = client.thumbnail(asset, "thumbnail.png", Thumbnail.MEDIUM) ## 150×150 pixels
filename = client.thumbnail(asset, "thumbnail.png", Thumbnail.SMALL) ## 64×64 pixels
Replacing Content Objects¶
Preservica now supports replacing individual Content Objects within an Asset. The use case here is that you have uploaded a large digitised object such as a book and you subsequently discover that a page has been digitised incorrectly. You would like to replace a single page (Content Object) without having to delete and re-ingest the complete Asset.
The non-blocking (asynchronous) API call will replace the last active Generation of the Content Object
content_object = client.content_object('0f2997f7-728c-4e55-9f92-381ed1260d70')
file = "C:/book/page421.tiff"
pid = client.replace_generation_async(content_object, file)
This will return a process id which can be used to monitor the replacement workflow using
status = client.get_async_progress(pid)
By default the API will generate a new fixity value on the client using the same fixity algorithm as the original Generation you are replacing. If you want to use a different fixity algorithm or you want to use a pre-calculated or existing fixity value you can specify the algorithm and value.
content_object = client.content_object('0f2997f7-728c-4e55-9f92-381ed1260d70')
file = "C:/book/page421.tiff"
pid = client.replace_generation_async(content_object, file, fixity_algorithm='SHA1', fixity_value='2fd4e1c67a2d28fced849ee1bb76e7391b93eb12')
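If you do not already hold a pre-calculated value, one way to produce one is with the standard library's `hashlib`. The sketch below computes a SHA-1 hex digest of a local file; `sha1_of_file` is a hypothetical helper, and whether a lower-case hex string is the exact form the server expects is an assumption based on the example value above.

```python
import hashlib

def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-1 hex digest of the file at `path`.

    Hypothetical helper, not part of pyPreservica. The file is read in
    chunks so that large images are not loaded into memory at once.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result could then be passed along as `fixity_algorithm='SHA1', fixity_value=sha1_of_file(file)`.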
There is also a synchronous (blocking) version which will wait for the replace workflow to complete before returning to the caller.
content_object = client.content_object('0f2997f7-728c-4e55-9f92-381ed1260d70')
file = "C:/book/page421.tiff"
workflow_status = client.replace_generation_sync(content_object, file)
Export OPEX Package¶
pyPreservica allows clients to request a full package export from the system by folder or asset. This will start an export workflow and download the resulting dissemination package when the export workflow has completed.
The resulting package will be a zipped OPEX formatted package containing the digital content and metadata.
The export_opex API is a blocking call which will wait for the export workflow to complete before downloading the package.
folder = client.folder('0f2997f7-728c-4e55-9f92-381ed1260d70')
opex_zip = client.export_opex(folder)
The output is the name of the downloaded zip file in the current working directory.
By default the OPEX package includes metadata, digital content with the latest active generations and the parent hierarchy.
The API can be called on either a folder or a single asset.
asset = client.asset('1f2129f7-728c-4e55-9f92-381ed1260d70')
opex_zip = client.export_opex(asset)
The call also takes the following optional arguments
IncludeContent: “Content” or “NoContent”
IncludeMetadata: “Metadata” or “NoMetadata” or “MetadataWithEvents”
IncludedGenerations: “LatestActive” or “AllActive” or “All”
IncludeParentHierarchy: “true” or “false”
e.g.
folder = client.folder('0f2997f7-728c-4e55-9f92-381ed1260d70')
opex_zip = client.export_opex(folder, IncludeContent="Content", IncludeMetadata="MetadataWithEvents")
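Because these options are plain strings, a typo would only surface once the export workflow is already under way. One possible safeguard is to check them locally first. The sketch below encodes the option names and values listed above; `EXPORT_OPTIONS` and `checked_export_options` are hypothetical names, not part of pyPreservica.

```python
# Allowed values, copied from the list of optional arguments above.
EXPORT_OPTIONS = {
    "IncludeContent": {"Content", "NoContent"},
    "IncludeMetadata": {"Metadata", "NoMetadata", "MetadataWithEvents"},
    "IncludedGenerations": {"LatestActive", "AllActive", "All"},
    "IncludeParentHierarchy": {"true", "false"},
}

def checked_export_options(**options):
    """Validate export keyword arguments and return them unchanged.

    Hypothetical helper: raises ValueError for an unknown option name
    or for a value outside the documented set.
    """
    for name, value in options.items():
        if name not in EXPORT_OPTIONS:
            raise ValueError(f"unknown export option: {name}")
        if value not in EXPORT_OPTIONS[name]:
            allowed = ", ".join(sorted(EXPORT_OPTIONS[name]))
            raise ValueError(f"{name} must be one of: {allowed}")
    return options
```

A call might then look like `client.export_opex(folder, **checked_export_options(IncludeContent="NoContent"))`.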