Remote Hosted Content Ingest

Introduction

In this tutorial, we'll go over how to add, or ingest, remotely hosted content into GroundX. This is the step where the magic begins. GroundX's ingestion pipeline does much more than extract content from your files.

Through its ingestion pipeline, which consists of three critical stages, GroundX:

  • Formats your content for LLM use
  • Parses content into intelligible text chunks
  • Generates contextual search data

Unlike platforms that require content to be converted to plain text before ingestion, GroundX can automatically ingest a wide range of content types. It also recognizes document structures, such as tables or page numbers, eliminates clutter, and rewrites content so that an LLM can clearly understand it.

GroundX's fine-tuned computer vision model identifies the coordinates of these objects, extracts their content, and then converts them into LLM-readable formats.

Getting started

Required information

To add online files to GroundX, you simply need the following information:

  • bucketId: The ID of the GroundX bucket in which you will store your document.
  • sourceUrl: The URL of the file you want to add to your GroundX bucket.

Example:

Python:

bucketId = 6839
sourceUrl = "https://data.chhs.ca.gov/dataset/hci_walk_bicycle.xls"

TypeScript:

const bucketId = 6839;
const sourceUrl = "https://data.chhs.ca.gov/dataset/hci_walk_bicycle.xls";
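
Before sending anything, it can help to sanity-check the two required values. The sketch below is a hypothetical helper, not part of the GroundX SDK. It assumes that a bucket ID is a positive integer and that the source URL must be an absolute http or https URL; neither rule is stated by GroundX, so adjust them to match what the API actually accepts.

```python
from urllib.parse import urlparse


def validate_ingest_inputs(bucket_id, source_url):
    """Return a list of problems; an empty list means the inputs look usable.

    Hypothetical helper (not a GroundX SDK function). The rules below are
    assumptions made for illustration.
    """
    problems = []

    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(bucket_id, bool) or not isinstance(bucket_id, int) or bucket_id <= 0:
        problems.append("bucketId must be a positive integer")

    # Only parse real strings; anything else is reported as a bad URL.
    parsed = urlparse(source_url) if isinstance(source_url, str) else None
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("sourceUrl must be an absolute http(s) URL")

    return problems


problems = validate_ingest_inputs(
    6839, "https://data.chhs.ca.gov/dataset/hci_walk_bicycle.xls"
)
print(problems)
```

Returning a list of problems rather than raising on the first one lets you report every issue with a document in a single pass.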

Adding extra search data

You can provide additional document context with extra search data. This is optional, because GroundX automatically generates contextual data for your files. Adding it lets you take fuller advantage of GroundX's search capabilities, helps maintain document context in search query responses, and lets you attach tags or notes with instructions on how to handle the search results.

Example:

Python:

searchData = {
    "title": "Time Walk Bike to Work, 2001-2011",
    "publisher": "California Department of Public Health",
    "homepage": "https://catalog.data.gov/dataset/",
    "abstract": "This table contains data on the percent of population aged 16 years or older whose commute to work is 10 or more minutes/day by walking or biking for California, its regions, counties, and cities/towns."
}

TypeScript:

const searchData = {
    title: "Time Walk Bike to Work, 2001-2011",
    publisher: "California Department of Public Health",
    homepage: "https://catalog.data.gov/dataset/",
    abstract: "This table contains data on the percent of population aged 16 years or older whose commute to work is 10 or more minutes/day by walking or biking for California, its regions, counties, and cities/towns."
};

Set up environment

Import the GroundX SDK and create a client with your API key.

Example:

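Python:

from groundx import Groundx

# GROUNDX_API_KEY must hold your GroundX API key.
groundx = Groundx(
    api_key=GROUNDX_API_KEY,
)

TypeScript:

import { Groundx } from 'groundx-typescript-sdk';
import fs from "fs";

// GROUNDX_API_KEY must hold your GroundX API key.
const groundx = new Groundx({
    apiKey: GROUNDX_API_KEY,
});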
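
The examples above use GROUNDX_API_KEY without showing where it comes from. One common approach, sketched here, is to read it from an environment variable so the key never appears in source code. The helper name load_api_key and the choice of the GROUNDX_API_KEY variable name are assumptions for illustration, not something the SDK requires.

```python
import os


def load_api_key(env=None, name="GROUNDX_API_KEY"):
    """Read the API key from an environment mapping, failing loudly if absent.

    Hypothetical helper (not part of the GroundX SDK). `env` defaults to
    os.environ; passing a plain dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {name} environment variable before creating the client"
        )
    return key


# Demonstration with a stand-in mapping; in real use call load_api_key()
# with no arguments and pass the result as the client's API key.
print(load_api_key({"GROUNDX_API_KEY": "example-key"}))
```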

API request

Next, make an API request to ingest the remote document. Include the bucket ID, the source URL, and any extra search data in the request body.

Example:

Python:

ingest = groundx.documents.ingest_remote(
    documents=[
        {
            "bucketId": bucketId,
            "sourceUrl": sourceUrl,
            "searchData": searchData,
        }
    ],
)
print(ingest.body)

TypeScript:

const ingest = await groundx.documents.ingestRemote({
    documents: [
        {
            bucketId: bucketId,
            sourceUrl: sourceUrl,
            searchData: searchData
        }
    ]
});
console.log(ingest.data);
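
Because the request takes a list of documents, you may want to build that list programmatically. The following is a minimal sketch of one way to do it; build_documents is a hypothetical helper, the second URL is a placeholder, and whether a single request accepts several documents (and how many) is an assumption to confirm against the API reference.

```python
def build_documents(bucket_id, source_urls, search_data=None):
    """Build the `documents` list for an ingest request.

    Hypothetical helper (not part of the GroundX SDK). Each entry uses the
    same field names as the request examples: bucketId, sourceUrl, and an
    optional searchData.
    """
    documents = []
    for url in source_urls:
        entry = {"bucketId": bucket_id, "sourceUrl": url}
        if search_data:
            # Copy so that editing one entry later does not change the others.
            entry["searchData"] = dict(search_data)
        documents.append(entry)
    return documents


documents = build_documents(
    6839,
    [
        "https://data.chhs.ca.gov/dataset/hci_walk_bicycle.xls",
        "https://example.com/second-file.pdf",  # placeholder URL
    ],
    {"publisher": "California Department of Public Health"},
)
print(len(documents))
```

The resulting list can then be passed as the documents argument of the ingest call shown above.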

API response

After making the request, you should receive a response with processId and status. This response indicates that GroundX is uploading or ingesting your file into the indicated bucket.

Example:

{
    "ingest": {
        "processId": "744aaf18-ff7f-459e-831c-071866dcfa2d",
        "status": "queued"
    }
}
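
To use the response in your own code, you will typically pull out the processId and status. The sketch below assumes the body has the shape shown in the example response and arrives either as a JSON string or as an already-parsed dictionary. read_ingest_status is a hypothetical helper, and the exact object your SDK version returns may differ.

```python
import json


def read_ingest_status(response_body):
    """Return (processId, status) from an ingest response body.

    Hypothetical helper (not part of the GroundX SDK). Accepts a JSON
    string or a dict shaped like the example response.
    """
    if isinstance(response_body, str):
        payload = json.loads(response_body)
    else:
        payload = response_body

    ingest = payload.get("ingest") or {}
    process_id = ingest.get("processId")
    status = ingest.get("status")
    if not process_id or not status:
        raise ValueError("response is missing ingest.processId or ingest.status")
    return process_id, status


example = """
{
    "ingest": {
        "processId": "744aaf18-ff7f-459e-831c-071866dcfa2d",
        "status": "queued"
    }
}
"""
process_id, status = read_ingest_status(example)
print(process_id, status)
```

Keeping the processId is useful because it identifies this ingestion job in the response you received.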

Final details

Processing time depends on the size of your files. File size can be up to ten megabytes.
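
If you know a remote file's size in advance, for example from the Content-Length header of an HTTP response, you can check it against the limit before submitting it. This is a rough sketch under stated assumptions: fits_size_limit is a hypothetical helper, and "ten megabytes" is interpreted as 10,000,000 bytes (the stricter decimal reading), which GroundX does not specify.

```python
# Assumption: decimal megabytes. Use 10 * 1024 * 1024 if the limit is binary.
MAX_BYTES = 10 * 1000 * 1000


def fits_size_limit(content_length, max_bytes=MAX_BYTES):
    """Return True/False for a known size, or None when the size is unknown.

    Hypothetical helper (not part of the GroundX SDK). `content_length` is
    the value of a Content-Length header (str or int), or None if the
    server did not send one.
    """
    if content_length is None:
        return None  # size unknown, so no verdict
    return int(content_length) <= max_bytes


print(fits_size_limit("2048"))
print(fits_size_limit(25_000_000))
```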

Once ingestion completes, GroundX has handled the complexity of your files for you and prepared their content for search and for automated response generation to your queries.