X-Ray: Document Understanding
Comprehensive Document Parsing for Modern Applications
In this guide, we'll introduce EyeLevel's X-Ray, a modern parser designed to extract high-quality data from complicated real-world documents. X-Ray employs cutting-edge parsing techniques designed specifically to support modern workflows like RAG, agents, and document summarization, allowing developers to connect data from human-centric documents to LLM-powered applications.
X-Ray in a Nutshell
You can think of X-Ray as a cocktail of document understanding and advanced parsing approaches packaged together under a single API. To give you an idea, these are some of the components which X-Ray employs to understand human-centric documents:
- Bespoke document understanding models to detect key elements within documents.
- Advanced OCR processes which facilitate textual extraction from a variety of document representations.
- A repairing and reformatting pipeline that improves parse interpretability.
- A re-contextualization system that produces fully contextualized summaries of parsed results.
The upshot is a system which can extract complete ideas from complex documents, and represent those ideas in a way which is easy for both developers and LLMs to understand.
See it for yourself
X-Ray's fine-tuned vision model is one of the most critical components of the system. Over the last four years, EyeLevel has collected a comprehensive set of documents from a variety of domains and used them to train what we believe is the highest-quality vision model to date for understanding complex real-world documents. You can use this demo to get an idea of how X-Ray works with your documents.
An example of X-Ray identifying and extracting key elements from a real-world document.
Or you can get started with our APIs by following these simple steps:
How to use X-Ray
1) Account Setup
X-Ray exists as a sub-component of a product called GroundX. We won't be using GroundX's core functionality in this article, but we will use GroundX to invoke X-Ray and query the results. Thus, our first step is to get a GroundX API key: first create an account, then find your API key on the API Key page. GroundX has a free trial tier which you can use to experiment with X-Ray.
Once you're set up, install the SDK.
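Before creating an SDK client, it helps to keep the API key out of source code. The sketch below reads the key from an environment variable and fails loudly if it is missing. The variable name GROUNDX_API_KEY is an assumption for illustration; how you pass the key to the GroundX client is covered by the SDK itself and is not shown here.

```python
import os


def load_api_key(env_var: str = "GROUNDX_API_KEY") -> str:
    """Read the GroundX API key from an environment variable.

    GROUNDX_API_KEY is a hypothetical default name chosen for this sketch;
    use whatever variable your own setup defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(
            f"Set the {env_var} environment variable to your GroundX API key"
        )
    return key
```

You would then export the key in your shell (for example `export GROUNDX_API_KEY=...`) and hand `load_api_key()` to the client constructor.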
2) Install Dependencies for this Guide
If you are using Python, you may skip this step. The TypeScript version of this guide uses third-party dependencies to demonstrate the GroundX APIs.
You may already have these installed on your system. If not, you will need to install the following dependencies to run the code in this guide.
3) Creating a Bucket
Once you have a GroundX API key, you may wish to create a bucket. Buckets organize documents into separate groupings, which can be useful for certain applications. The GroundX SDK lets us list all available buckets and create a new one.
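Because listing and creating are separate calls, rerunning a script can easily create duplicate buckets. A small get-or-create helper avoids that. In this sketch, `list_buckets` and `create_bucket` are hypothetical wrappers you would write around the SDK's list and create calls, and each bucket is assumed to be a dict with at least a "name" key; the real SDK return types may differ.

```python
from typing import Callable, Dict, List


def get_or_create_bucket(
    list_buckets: Callable[[], List[Dict]],
    create_bucket: Callable[[str], Dict],
    name: str,
) -> Dict:
    """Return the bucket called `name`, creating it only if it is missing.

    `list_buckets` and `create_bucket` are hypothetical callables wrapping
    the GroundX SDK; buckets are assumed to be dicts with a "name" key.
    """
    for bucket in list_buckets():
        if bucket.get("name") == name:
            # Reuse the existing bucket instead of creating a duplicate.
            return bucket
    return create_bucket(name)
```

Injecting the two calls keeps the helper independent of any particular SDK version and makes it easy to test with fakes.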
4) Uploading Documents
Uploading documents to a GroundX bucket automatically triggers X-Ray. GroundX offers several upload options suited to different use cases. In this example we upload a document stored locally.
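A local upload generally needs a target bucket, the file's name and path, and its file type. The sketch below checks that a local file exists and infers its type from the extension before anything is sent. The dictionary keys and the set of accepted extensions are assumptions for illustration, not the exact fields or format list the GroundX SDK expects.

```python
from pathlib import Path

# Extensions accepted by this sketch only; check the GroundX documentation
# for the formats X-Ray actually supports.
ASSUMED_FILE_TYPES = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".png": "png",
    ".jpg": "jpg",
    ".txt": "txt",
}


def describe_local_upload(path: str, bucket_id: int) -> dict:
    """Build a hypothetical upload description for a local file."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"No such file: {file_path}")
    suffix = file_path.suffix.lower()  # ".PDF" and ".pdf" map the same way
    if suffix not in ASSUMED_FILE_TYPES:
        raise ValueError(f"Unsupported extension: {suffix or '(none)'}")
    return {
        "bucket_id": bucket_id,
        "file_name": file_path.name,
        "file_path": str(file_path.resolve()),
        "file_type": ASSUMED_FILE_TYPES[suffix],
    }
```

Validating locally means a typo in a path surfaces immediately rather than as a failed ingestion later.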
5) Querying Upload Status
Ingesting a document returns a process_id, which can be used to query the progress of the upload. A simple approach is to check the status of the process every 10 seconds until ingestion is done.
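The polling loop described above can be sketched as follows. `get_status` is a hypothetical callable that takes a process_id and returns a status string by wrapping the SDK's status call; the terminal status names ("complete", "error", "cancelled") are assumptions, so check them against what the status endpoint actually returns. A timeout keeps the loop from spinning forever.

```python
import time
from typing import Callable

# Assumed terminal statuses; verify against the real GroundX responses.
TERMINAL_STATUSES = {"complete", "error", "cancelled"}


def wait_for_ingest(
    get_status: Callable[[str], str],
    process_id: str,
    interval_s: float = 10.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status(process_id) every interval_s seconds until it finishes."""
    waited = 0.0
    while True:
        status = get_status(process_id)
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(
                f"Ingest {process_id} still '{status}' after {waited:.0f}s"
            )
        sleep(interval_s)  # injectable so tests don't actually wait
        waited += interval_s
```

The function returns the final status rather than assuming success, so the caller can decide how to handle an "error" result.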
6) Getting X-Ray Results
Now that our document is fully uploaded, we can get all the documents in our bucket. We only uploaded a single document, so we can take the one and only document at index 0 and then get the URL where its X-Ray output is stored.
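Once we have that URL, the X-Ray output can be downloaded and decoded with the standard library alone. This sketch assumes the URL is directly readable without extra authentication headers and that the payload is UTF-8 encoded JSON.

```python
import json
import urllib.request


def fetch_xray(url: str, timeout_s: float = 30.0) -> dict:
    """Download and decode the X-Ray JSON stored at `url`.

    Assumes the URL needs no extra auth headers and serves UTF-8 JSON.
    """
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `xray = fetch_xray(xray_url)`, where `xray_url` is the URL taken from the document in the previous step.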
7) Interpreting X-Ray Results
X-Ray provides a rich set of results which may be useful in a variety of use cases. Here are some noteworthy outputs of X-Ray:
- fileKeywords: A list of keywords which describe the document
- fileSummary: A summary of the entire document
- boundingBoxes: Key regions within the document which contain meaningful content.
- contentType: The type of content in a given chunk: a textual paragraph, a graphical figure, or a table.
- json: A reformatted representation of graphs and figures in JSON format, useful for both LLM and programmatic workflows.
- narrative: A reformatted representation of graphs and figures in a narrative format, often useful in LLM applications.
- sectionSummary: A contextually summarized representation of a particular section of the document.
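As a starting point for working with these fields, the sketch below pulls the file-level summary and keywords out of a parsed result, counts chunks by contentType, and collects the narrative text of figures. It assumes that per-chunk fields such as contentType and narrative live in a top-level "chunks" list; that nesting is an assumption of this sketch, so compare it with the full structure of a real parse before relying on it.

```python
from collections import Counter
from typing import Any, Dict


def summarize_xray(xray: Dict[str, Any]) -> Dict[str, Any]:
    """Pull a few headline fields out of a parsed X-Ray result.

    Assumes per-chunk fields sit under a top-level "chunks" list.
    """
    content_types: Counter = Counter()
    narratives = []
    for chunk in xray.get("chunks", []):
        kinds = chunk.get("contentType", [])
        if isinstance(kinds, str):
            kinds = [kinds]  # tolerate a single string as well as a list
        content_types.update(kinds)
        # narrative holds the LLM-friendly rewrite of graphs and figures
        if chunk.get("narrative"):
            narratives.append(chunk["narrative"])
    return {
        "summary": xray.get("fileSummary", ""),
        "keywords": xray.get("fileKeywords", []),
        "content_types": dict(content_types),
        "narratives": narratives,
    }
```

A summary like this is a quick sanity check that X-Ray found the tables and figures you expected before you feed the chunks into a RAG or agent pipeline.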
This is the full structure of an X-Ray parse: