Voyance Vision is an exciting technology that allows users to automatically detect objects (regions of interest) in images and extract text from those regions.

With Vision, you can capture data from documents (such as invoices and ID cards) and export it to CSV or JSON via an API call.
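To make the export step concrete, here is a minimal sketch of writing extracted fields out as JSON and CSV using only Python's standard library. The record layout, field names and values are assumptions made for illustration. They are not the documented shape of a Voyance Vision API response.

```python
import csv
import io
import json

# Hypothetical extraction results: one dict per document.
# The keys and values are made up for illustration only.
records = [
    {"company_name": "Acme Ltd", "invoice_number": "INV-001", "total_amount": "1250.00"},
    {"company_name": "Globex", "invoice_number": "INV-002", "total_amount": "310.50"},
]


def to_json(rows):
    """Serialise the records as a JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialise the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_json(records))
print(to_csv(records))
```

Both formats carry the same information. JSON suits further processing in code, while CSV opens directly in a spreadsheet.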

The platform lets you define regions of interest in your dataset (images). It then uses state-of-the-art computer vision algorithms to encode that information and make accurate predictions.

Sign up for Voyance Vision here.

Now that you have signed up, sign in to your account and let's get started!

We are going to create a demo model that extracts information from invoices using Voyance Vision.


  1. A minimum of 40 images. For good results, it is crucial that the uploaded images are consistent and representative of the images you expect to use the model on...


  1. The first thing we’re going to do is upload and annotate our images. Annotating means drawing bounding boxes around the regions in our images that we would like to extract text from. Click on the “create model” icon at the top right of the screen to get started.

On the following web page, you can choose to either use a pre-trained model or train your model from scratch. Pre-trained models have already been trained and optimised, and are ready for use. Currently, we provide pre-trained models for extracting ID information from international passports, and we will release support for more document types in due course.

However, we will be training a model from scratch. Thus, we will select “Create your Model”, as shown below.

2. Now we will create labels for our model. Labels are the names of the fields in the document that we will extract data from. Labels vary from document to document, and they depend entirely on the use case. For this tutorial, we will be extracting text from three fields (company name, invoice number and total amount) in invoices that follow the template shown below. In this scenario, our labels will be company_name, invoice_number and total_amount.

3. Next, we will upload the images that will serve as the dataset for training the model. Voyance Vision supports uploading images from three primary data sources: Amazon S3 buckets, Google Drive and your local machine. To train a model, you need at least 40 images, and those images should be a good representation of the images the model will be used on going forward.

4. After uploading, it’s time to annotate your images. Annotation here means labelling our images by drawing bounding boxes around the areas of each image that we want to extract text from. These bounding boxes help the model identify the regions to extract text from. We do this for all the uploaded images.
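To make the idea of an annotation concrete, the sketch below models a bounding box as a labelled rectangle and runs a few sanity checks on it. This is only an illustration: the BoundingBox class, the pixel-coordinate layout and the helper functions are hypothetical and do not describe how Voyance Vision stores annotations. It also assumes that each invoice should carry one box per label.

```python
from dataclasses import dataclass

# The three labels used in this tutorial.
LABELS = {"company_name", "invoice_number", "total_amount"}


@dataclass(frozen=True)
class BoundingBox:
    """A labelled rectangle in pixel coordinates (hypothetical layout)."""
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def box_problems(box, image_width, image_height):
    """Return a list of problems found with one box (empty if it looks fine)."""
    problems = []
    if box.label not in LABELS:
        problems.append(f"unknown label: {box.label}")
    if box.x_min >= box.x_max or box.y_min >= box.y_max:
        problems.append("box has no area")
    if (box.x_min < 0 or box.y_min < 0
            or box.x_max > image_width or box.y_max > image_height):
        problems.append("box extends outside the image")
    return problems


def missing_labels(boxes):
    """Return the labels that have no box on this image, sorted by name."""
    return sorted(LABELS - {box.label for box in boxes})


# Example: one invoice image of 1000 x 1400 pixels with two boxes drawn so far.
boxes = [
    BoundingBox("company_name", 60, 40, 420, 90),
    BoundingBox("invoice_number", 700, 40, 940, 80),
]
for box in boxes:
    print(box.label, box_problems(box, 1000, 1400))
print("missing:", missing_labels(boxes))
```

In this example both boxes pass the checks, and the image is still missing a box for total_amount.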

Now it's time to train! At this point, you should have annotated all your images and be ready to train.
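If your images live on your local machine, a quick check before training can confirm the two conditions described above: at least 40 images, all of them annotated. The following is a minimal sketch under stated assumptions. The accepted file extensions, the function names and the idea of tracking annotated file names in a set are all hypothetical, and the demo uses empty placeholder files rather than real invoices.

```python
import tempfile
from pathlib import Path

MIN_IMAGES = 40  # minimum dataset size stated in this tutorial
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed formats, not confirmed by the source


def list_images(folder):
    """Return the image files directly inside `folder`, sorted by name."""
    return sorted(
        path for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def readiness_report(folder, annotated_names):
    """Summarise whether a local dataset meets the two conditions above."""
    images = list_images(folder)
    unannotated = [path.name for path in images if path.name not in annotated_names]
    enough = len(images) >= MIN_IMAGES
    return {
        "image_count": len(images),
        "enough_images": enough,
        "unannotated": unannotated,
        "ready": enough and not unannotated,
    }


# Demo with 40 empty placeholder files, 39 of which are marked as annotated.
with tempfile.TemporaryDirectory() as folder:
    for i in range(40):
        (Path(folder) / f"invoice_{i:03d}.png").touch()
    annotated = {f"invoice_{i:03d}.png" for i in range(39)}
    print(readiness_report(folder, annotated))
```

Here the dataset is large enough, but the report flags the one image that still needs annotating, so it is not yet ready.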

We are here to help. Your questions and concerns mean a lot to us.

Please contact us at jen@voyancehq.com.