Content API

pyPreservica provides an interface to the Preservica Content API, which supports searching the repository.

https://us.preservica.com/api/content/documentation.html

The content API is a read-only interface that returns JSON documents rather than XML. It duplicates some of the functionality of the entity API, but it also provides search capabilities.

The content API client is created using:

from pyPreservica import *

client = ContentAPI()

object-details

Get the details for an Asset or Folder as a Python dictionary containing CMIS attributes. Pass "IO" as the first argument for an Asset or "SO" for a Folder.

client = ContentAPI()

client.object_details("IO", "uuid")
client.object_details("SO", "uuid")

For example:

from pyPreservica import *

client = ContentAPI()

details = client.object_details("IO", "de1c32a3-bd9f-4843-a5f1-46df080f83d2")
print(details['name'])

or, using the EntityType enumeration instead of the string:

from pyPreservica import *

client = ContentAPI()

details = client.object_details(EntityType.ASSET, "de1c32a3-bd9f-4843-a5f1-46df080f83d2")
print(details['name'])

Indexed Fields

Get a list of all the indexed metadata fields within the Preservica search engine. This includes the default xip.* fields and any custom indexes which have been created through custom index files.

client = ContentAPI()

client.indexed_fields()
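
As a rough sketch of how the result might be used, the helper below separates the default xip.* fields from any custom index fields. The name split_indexed_fields is hypothetical and is not part of pyPreservica. It assumes only that iterating over the value returned by indexed_fields() yields the field names as strings.

```python
from typing import Iterable, List, Tuple


def split_indexed_fields(field_names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split index names into (default xip.* fields, custom fields).

    Hypothetical helper: assumes each item is a field name such as "xip.title".
    """
    default_fields: List[str] = []
    custom_fields: List[str] = []
    for name in field_names:
        if name.startswith("xip."):
            default_fields.append(name)
        else:
            custom_fields.append(name)
    # Sort so the output is stable regardless of the order returned by the server
    return sorted(default_fields), sorted(custom_fields)
```

With a configured connection it could be called as split_indexed_fields(client.indexed_fields()).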

Full Text Index

If a document contains text, for example a PDF or Word document, or if it has been through OCR, the full text index will contain the extracted text.

To extract the value of the full text index for an Asset use the following call:

from pyPreservica import *

content = ContentAPI()

text: str = content.full_text("48c79abd-01f3-4b77-8132-546a76e0d337")

The reference supplied must be a valid Asset reference.

You can use this to copy the full text into an Asset's description field so that users can view the OCR text, for example:

from pyPreservica import *

content = ContentAPI()
entity = EntityAPI()

asset = entity.asset("48c79abd-01f3-4b77-8132-546a76e0d337")

asset.description = content.full_text(asset.reference)
entity.save(asset)
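
The same steps could be wrapped in a small function so they can be applied to many Assets. This is only a sketch: copy_full_text_to_description is a hypothetical name, and the check for missing text assumes that full_text() returns an empty string or None when nothing was extracted, which has not been verified here.

```python
def copy_full_text_to_description(content, entity, asset_reference: str) -> bool:
    """Copy an Asset's full text index into its description.

    `content` and `entity` are expected to be ContentAPI and EntityAPI clients.
    Returns True if the description was updated, False if there was no text.
    """
    text = content.full_text(asset_reference)
    if not text or not text.strip():
        # Assumption: empty or None means no text was extracted,
        # so the existing description is left untouched
        return False
    asset = entity.asset(asset_reference)
    asset.description = text
    entity.save(asset)
    return True
```

Skipping Assets without extracted text avoids overwriting an existing description with an empty value.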

Reporting Examples

Create a spreadsheet containing all Assets within the repository

Generate a CSV report on all Assets within the system. The spreadsheet columns include the Asset title, description, security tag and so on.

from pyPreservica import *

client = ContentAPI()


if __name__ == '__main__':
    # Index names to include in the report, with the filter value for each
    metadata_fields = {
        "xip.reference": "*",
        "xip.title": "",
        "xip.description": "",
        "xip.document_type": "IO",  # restrict the report to Assets
        "xip.parent_ref": "",
        "xip.security_descriptor": "*",
        "xip.identifier": "",
        "xip.bitstream_names_r_Preservation": ""}

    client.search_callback(client.ReportProgressCallBack())

    client.search_index_filter_csv("", "assets.csv", metadata_fields)

Create a spreadsheet containing all Assets and Folders within the repository

from pyPreservica import *

client = ContentAPI()

if __name__ == '__main__':
    # Index names to include in the report, with the filter value for each
    metadata_fields = {
        "xip.reference": "*",
        "xip.title": "",
        "xip.description": "",
        "xip.document_type": "*",  # include both Assets and Folders
        "xip.parent_ref": "",
        "xip.security_descriptor": "*",
        "xip.identifier": "",
        "xip.bitstream_names_r_Preservation": ""}

    client.search_callback(client.ReportProgressCallBack())

    client.search_index_filter_csv("", "all_objects.csv", metadata_fields)
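
The reports in this section use nearly the same filter dictionary and differ mainly in the xip.document_type value and in whether a folder restriction is present. One way to avoid repeating the dictionary is a small builder such as the sketch below. The function build_report_filter and its parameters are hypothetical and are not part of pyPreservica. The index names and values are copied from the examples in this section. Unlike the folder-scoped report in this section, the sketch keeps xip.parent_ref alongside xip.parent_hierarchy.

```python
from typing import Dict, Optional


def build_report_filter(document_type: str = "*",
                        parent_hierarchy: Optional[str] = None) -> Dict[str, str]:
    """Build the index-name to filter-value dictionary used by the reports.

    document_type: "IO" for Assets only, "*" for Assets and Folders.
    parent_hierarchy: optional folder reference to restrict the report to.
    """
    fields = {
        "xip.reference": "*",
        "xip.title": "",
        "xip.description": "",
        "xip.document_type": document_type,
        "xip.parent_ref": "",
        "xip.security_descriptor": "*",
        "xip.identifier": "",
        "xip.bitstream_names_r_Preservation": "",
    }
    if parent_hierarchy is not None:
        # Only add the folder restriction when one was requested
        fields["xip.parent_hierarchy"] = parent_hierarchy
    return fields
```

The result can be passed as the third argument to search_index_filter_csv, for example client.search_index_filter_csv("", "assets.csv", build_report_filter("IO")).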

Create a spreadsheet containing all Assets and Folders underneath a specific folder

import sys

from pyPreservica import *

content = ContentAPI()
entity = EntityAPI()

if __name__ == '__main__':
    # The folder reference is passed as the first command-line argument
    folder = entity.folder(sys.argv[1])

    print(f"Searching inside folder {folder.title}")

    metadata_fields = {
        "xip.reference": "*", "xip.title": "", "xip.description": "", "xip.document_type": "*",
        "xip.parent_hierarchy": folder.reference,  # only objects beneath this folder
        "xip.security_descriptor": "*",
        "xip.identifier": "", "xip.bitstream_names_r_Preservation": ""}

    content.search_callback(content.ReportProgressCallBack())

    content.search_index_filter_csv("", "assets.csv", metadata_fields)
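
Once a report has been written, it can be summarised with the Python standard library. The sketch below counts rows per value of one column. It assumes the CSV file starts with a header row whose column names match the index names in the filter dictionary and that the file is UTF-8 encoded. Neither assumption has been verified here. The function count_by_column is a hypothetical helper and is not part of pyPreservica.

```python
import csv
from collections import Counter


def count_by_column(csv_path: str, column: str = "xip.document_type") -> Counter:
    """Count how many rows of a report share each value in `column`.

    Assumes the first row of the CSV is a header containing `column`.
    """
    counts: Counter = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Rows from a file without the column are counted under ""
            counts[row.get(column, "")] += 1
    return counts
```

For a report covering Assets and Folders, count_by_column("assets.csv") would give the number of rows for each document type.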

User Security Tags

You can get a list of available security tags for the current user by calling:

client = ContentAPI()

client.user_security_tags()