Welcome to python-pdf-analytics-client API’s documentation!

PDFAnalytics is a web service that lets you verify PDF content for free.

The python-pdf-analytics-client library lets you automate the most common PDFAnalytics operations with Python 2 or Python 3.

python-pdf-analytics-client can be installed with pip or downloaded from PyPI: https://pypi.python.org/pypi/python-pdf-analytics-client

The source code is available at: https://github.com/pdf-analytics/python-pdf-analytics-client

Contents:

Introduction

Purpose

The purpose of python-pdf-analytics-client is to provide a library that helps you automate the most common PDFAnalytics operations through its REST API.

python-pdf-analytics-client can verify:

  • textual content, such as text, font style and location (using coordinates and page number)
  • image content against a locally stored image (pixel-by-pixel comparison), including its actual size and location in the PDF
  • PDF-to-PDF comparison: compare an uploaded PDF with a local one pixel-by-pixel, page-by-page

Examples

This example verifies that the image figure.png appears on page 4 of the demo.pdf file.

>>> from pdf_analytics_client import APIClient
>>> server = APIClient(token='my_token')
>>> pdf_job = server.create_job(local_file='/Users/tester/demo.pdf')
>>> pdf_job.verify_image(path='/Users/tester/figure.png', left=64, top=24, page=4)
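The same calls can be grouped into a reusable check that runs several verifications against one uploaded PDF. A minimal sketch, assuming that verify_image() and verify_text() return 200 on success as described in the reference; check_report is a hypothetical helper, not part of the library:

```python
def check_report(server, pdf_path, image_checks=(), text_checks=()):
    """Upload a PDF and run several image and text checks against it.

    Hypothetical helper. `server` is an APIClient instance; each check is a
    dict of keyword arguments for verify_image() or verify_text().
    Returns a list of (kind, arguments, result) tuples for failed checks.
    """
    # create_job() waits for the analysis to finish by default
    pdf_job = server.create_job(local_file=pdf_path)

    failures = []
    for kind, verify, checks in (('image', pdf_job.verify_image, image_checks),
                                 ('text', pdf_job.verify_text, text_checks)):
        for kwargs in checks:
            result = verify(**kwargs)
            # Assumption: 200 means the check passed; anything else is an error message
            if result != 200:
                failures.append((kind, kwargs, result))
    return failures
```

For example, check_report(APIClient(token='my_token'), '/Users/tester/demo.pdf', image_checks=[{'path': '/Users/tester/figure.png', 'left': 64, 'top': 24, 'page': 4}]) should return an empty list when the image check passes.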

Dependencies

python-pdf-analytics-client has only one dependency: python-requests. It is installed automatically when you install the python-pdf-analytics-client module with pip.

Examples

You can find more examples in the GitHub repository: https://github.com/pdf-analytics/python-pdf-analytics-client/tree/master/examples

To run the examples, you need to register on the PDF Analytics site and obtain your API token.

To run the examples:

$ cd examples
# Install the dependencies
$ pip install -r requirements_examples
# Run the examples
$ behave -D token=<your_token_id>
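The behave examples receive the token through -D token=.... In your own scripts you may prefer not to hard-code the token. A minimal sketch, assuming you export it in an environment variable; the variable name PDF_ANALYTICS_TOKEN and the load_token helper are hypothetical, and the library does not read this variable itself:

```python
import os


def load_token(env_var='PDF_ANALYTICS_TOKEN'):
    """Read the PDFAnalytics API token from an environment variable.

    The variable name is a hypothetical convention chosen for this sketch.
    """
    token = os.environ.get(env_var, '').strip()
    if not token:
        raise RuntimeError('Set the %s environment variable to your API token' % env_var)
    return token


# Usage (requires python-pdf-analytics-client and a valid token):
# from pdf_analytics_client import APIClient
# server = APIClient(token=load_token())
```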

Installation

Python 3

To install python-pdf-analytics-client, install the python-pdf-analytics-client package and its dependencies from PyPI.

On Windows, this is:

C:\Python36\pip.exe install python-pdf-analytics-client

(The path will differ if you have a Python version other than 3.6 installed.)

On OS X, this is:

pip3 install python-pdf-analytics-client

On Linux, this is:

pip install python-pdf-analytics-client

Python 2

To install python-pdf-analytics-client, install the python-pdf-analytics-client package and its dependencies from PyPI.

On Windows, this is:

C:\Python27\pip.exe install python-pdf-analytics-client

(The path will differ if you have a Python version other than 2.7 installed.)

On OS X, this is:

pip install python-pdf-analytics-client

On Linux, this is:

pip install python-pdf-analytics-client

When pip installs python-pdf-analytics-client, it also installs its only dependency, the python-requests library.
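To confirm that pip installed both the client and its dependency, you can query the installed distributions from Python. A minimal sketch using the standard library's importlib.metadata, which requires Python 3.8 or later; the helper name installed_version is illustrative only:

```python
# Python 3.8+ only: importlib.metadata is not available on Python 2.
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


for name in ('python-pdf-analytics-client', 'requests'):
    print(name, installed_version(name) or 'not installed')
```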

References

This is a quick reference for using PyPDFAnalyticsClient.

PDF Analytics Client

The PDF Analytics Client is a high-level module for verifying the images and text of a local PDF file.

class api_client.APIClient(token, url=u'https://pdf-analytics.com/api/')

Main API client class

create_job(local_file, wait_to_complete=True)

Create a PDF analysis job

Parameters:
  • local_file – the path of the local PDF file that needs to be uploaded to the server for the analysis
  • wait_to_complete – wait for the PDF analysis to complete. Default value is True.
Returns:

The JobClass object.

get_account_details()

Get my account details

Returns: a dictionary with the user’s account details, e.g. {'max_pdf_size_mb': 3, 'daily_max_count': 10, 'today_remaining': 4}

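The dictionary returned by get_account_details() can be used to check a file before uploading it. A minimal sketch: can_upload is a hypothetical helper, and interpreting max_pdf_size_mb as mebibytes (1024 * 1024 bytes) is an assumption about how the service counts.

```python
def can_upload(account_details, pdf_size_bytes):
    """Check a PDF size against the limits from get_account_details().

    Hypothetical helper. Returns a (ok, reason) tuple.
    Assumption: 'max_pdf_size_mb' is measured in units of 1024 * 1024 bytes.
    """
    if account_details.get('today_remaining', 0) <= 0:
        return False, 'daily limit reached (%s analyses per day)' % account_details.get('daily_max_count')
    size_mb = pdf_size_bytes / (1024.0 * 1024.0)
    limit_mb = account_details['max_pdf_size_mb']
    if size_mb > limit_mb:
        return False, 'file is %.1f MB, limit is %s MB' % (size_mb, limit_mb)
    return True, ''


# Usage (requires a real client):
# details = server.get_account_details()
# ok, reason = can_upload(details, os.path.getsize('/Users/tester/demo.pdf'))
```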
get_job(job_id)

Get PDF analysis job

Parameters: job_id – the PDF analysis job ID
Returns: The JobClass object.

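get_job() lets you return to an analysis started earlier, assuming you kept its job ID. A minimal sketch that combines it with get_status() and the three documented status strings; get_completed_job is a hypothetical helper:

```python
def get_completed_job(server, job_id):
    """Reopen an existing analysis job and report whether it can be verified.

    Hypothetical helper. Returns the job if its status is "Complete",
    None while it is still "In progress", and raises RuntimeError on "Error".
    """
    pdf_job = server.get_job(job_id)
    status = pdf_job.get_status()
    if status == 'Complete':
        return pdf_job
    if status == 'Error':
        raise RuntimeError('PDF analysis job %s failed' % job_id)
    # Any other value is treated as "still in progress"
    return None
```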
class api_client.JobClass(id, client)

Basic PDF analysis Job class

get_item(left, top, page, type=u'any')

Get any item from the PDF at the given position (getting figures is not yet supported)

Parameters:
  • left – Distance from the left of the page in points. Accepts single integer. e.g. 150
  • top – Distance from the top of the page in points. Accepts single integer. e.g. 200
  • page – Number of page, e.g. 4
  • type – Type of the item. Default value is 'any'.
Returns:

A JSON object with the item’s information
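The left and top arguments are in points and must be integers. If your layout measurements are in millimetres or inches, a small conversion helper can produce the integer values. A minimal sketch, assuming these are standard PDF points (1 point = 1/72 inch); the helper names are hypothetical and not part of the library:

```python
POINTS_PER_INCH = 72  # standard PDF unit: 1 point = 1/72 inch
MM_PER_INCH = 25.4


def mm_to_points(mm):
    """Convert a distance in millimetres to whole PDF points."""
    return int(round(mm * POINTS_PER_INCH / MM_PER_INCH))


def inches_to_points(inches):
    """Convert a distance in inches to whole PDF points."""
    return int(round(inches * POINTS_PER_INCH))


# e.g. pdf_job.get_item(left=mm_to_points(52), top=mm_to_points(70), page=4)
```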

get_metadata()

Get the metadata of the PDF

Returns: A JSON object with the metadata of the PDF

get_status()

Get the status of the PDF analysis

Returns: The analysis status as a string: “In progress”, “Error” or “Complete”
Return type: str

verify_image(path, left, top, page, compare_method=u'pbp', tolerance=0.0)

Verify that a local image file exists in the PDF

Parameters:
  • path – The absolute or relative path of the locally stored image e.g. ‘/User/tester/apple.png’
  • left – Distance from the left of the page in points. Accepts single integer. e.g. 150
  • top – Distance from the top of the page in points. Accepts single integer. e.g. 200
  • page – Number of page, e.g. an integer 4 or a string ‘all’, ‘last’, ‘1-4’
  • compare_method – Image comparison method. Default value is 'pbp'.
  • tolerance – Comparison tolerance. Default value 0.0. Example: 0.02
Returns:

If the request is successful, it returns 200; otherwise, it returns the error message.

Return type:

JSON
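The page argument accepts either an integer or one of the strings 'all', 'last' or a range such as '1-4'. To catch typos before a request is sent, you could validate it locally. A minimal sketch of how those forms might be interpreted; 1-based pages and inclusive ranges are assumptions about the server's behaviour, expand_pages is a hypothetical helper, and page_count must come from your own knowledge of the document:

```python
def expand_pages(page, page_count):
    """Expand a page argument into a list of 1-based page numbers.

    Hypothetical helper. Handles an integer (4), 'all', 'last' or a range
    such as '1-4'. Treating ranges as inclusive is an assumption.
    """
    if isinstance(page, int):
        pages = [page]
    elif page == 'all':
        pages = list(range(1, page_count + 1))
    elif page == 'last':
        pages = [page_count]
    elif isinstance(page, str) and '-' in page:
        first, last = (int(part) for part in page.split('-', 1))
        pages = list(range(first, last + 1))
    else:
        raise ValueError('Unsupported page value: %r' % (page,))

    if not pages:
        raise ValueError('Empty page range: %r' % (page,))
    out_of_range = [p for p in pages if not 1 <= p <= page_count]
    if out_of_range:
        raise ValueError('Pages %s are outside 1-%d' % (out_of_range, page_count))
    return pages
```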

verify_pdf(path, excluded_areas=u'', tolerance=0.0)

Compare a local PDF file with the uploaded job’s PDF

Parameters:
  • path – The absolute or relative path of the locally stored PDF file e.g. ‘/User/tester/report.pdf’
  • excluded_areas – Areas to exclude from the comparison, as a list of dictionaries. Example: [{'left': 146, 'top': 452, 'width': 97, 'height': 13, 'page': 2}, {'left': 414, 'top': 747, 'width': 45, 'height': 16, 'page': 'all'}]
  • tolerance – Comparison tolerance. Default value 0.0. Example: 0.02
Returns:

If the request is successful, it returns 200; otherwise, it returns the error message.

Return type:

JSON
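Because excluded areas are plain dictionaries, a typo in a key silently produces an invalid entry. A small builder keeps the keys consistent with the documented example. A minimal sketch: excluded_area is a hypothetical helper, and the non-negative integer check is an assumption carried over from the left/top parameters of the other methods:

```python
def excluded_area(left, top, width, height, page):
    """Build one excluded-area entry in the shape shown for verify_pdf().

    Hypothetical helper. Assumes all sizes are non-negative integer points.
    """
    for name, value in (('left', left), ('top', top), ('width', width), ('height', height)):
        if not isinstance(value, int) or value < 0:
            raise ValueError('%s must be a non-negative integer, got %r' % (name, value))
    return {'left': left, 'top': top, 'width': width, 'height': height, 'page': page}


# The two areas from the documented verify_pdf() example
areas = [
    excluded_area(146, 452, 97, 13, page=2),
    excluded_area(414, 747, 45, 16, page='all'),
]
# pdf_job.verify_pdf(path='/User/tester/report.pdf', excluded_areas=areas, tolerance=0.02)
```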

verify_text(text, left, top, page, method=u'contains')

Verify that a text exists in the PDF

Parameters:
  • text – The expected textual content. Accepts string. e.g. ‘This is the expected text’
  • left – Distance from the left of the page in points. Accepts single integer. e.g. 150
  • top – Distance from the top of the page in points. Accepts single integer. e.g. 200
  • page – Number of page, e.g. an integer 4 or a string ‘all’, ‘last’, ‘1-4’
  • method – Text comparison method. Default value is 'contains'.
Returns:

If the request is successful, it returns 200; otherwise, it returns the error message.

wait_analysis_to_complete()

Wait for the PDF analysis to complete

After you submit the PDF to the PDF Analytics website, the analysis takes a few seconds before the PDF is ready for verification.

Returns: True if the analysis completes; False if the job is not complete within 20 seconds
Return type: bool
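wait_analysis_to_complete() gives up after 20 seconds. If you need a different timeout, or want to stop early on an “Error” status, you could create the job with wait_to_complete=False and poll get_status() yourself. A minimal sketch: wait_for_job and its parameters are hypothetical, and the injectable clock and sleep arguments exist only to make it easy to test:

```python
import time

STATUS_COMPLETE = 'Complete'
STATUS_ERROR = 'Error'


def wait_for_job(pdf_job, timeout=20.0, interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until the analysis finishes or the timeout expires.

    Hypothetical helper. Returns True when the status is "Complete",
    False on timeout, and raises RuntimeError if the status is "Error".
    """
    deadline = clock() + timeout
    while True:
        status = pdf_job.get_status()
        if status == STATUS_COMPLETE:
            return True
        if status == STATUS_ERROR:
            raise RuntimeError('PDF analysis failed')
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage (requires a real client):
# pdf_job = server.create_job(local_file='/Users/tester/demo.pdf', wait_to_complete=False)
# ready = wait_for_job(pdf_job, timeout=60)
```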

Changelog

This document will track major changes in the project.

1.0.6, December 13, 2017

  • Add requests as dependency

1.0.5, December 11, 2017

  • Fix ModuleNotFoundError

1.0.4, November 25, 2017

  • Fix the documentation
  • Add logo to the documentation

1.0.3, November 24, 2017

  • Add Python 3 support

1.0.2

  • Add Categories and keywords in pip / setup

1.0.1

  • Fix the SSL cert verification
  • Fix the documentation
  • Cosmetic changes to the python-behave examples

1.0.0

  • First release
