How to ingest documents in AI Assistant
Document ingestion is the process of preparing documents so they can be effectively processed and understood by large language models (LLMs). This step ensures the content of the document is indexed, analyzed, and made accessible for AI-powered features such as summarization, question answering, and contextual assistance.
Methods for document ingestion
Ingestion can happen in two ways:
- Automatic ingestion: When using our iOS, Android, or Web SDKs, documents are automatically ingested when a user opens a chat window. However, this initial processing may take some time.
- Manual ingestion: Depending on your setup, you can manually ingest documents either:
  - Directly, if AI Assistant runs as a standalone service.
  - Through Document Engine, if you’re leveraging its collaboration features alongside AI capabilities.
For an improved user experience, we recommend preingesting documents whenever possible. This is particularly useful when the set of documents intended for AI assistance is known in advance.
Direct document ingestion
Here’s a sample cURL command that ingests document.pdf (assuming AI Assistant is running locally on port 4000):
curl -X POST "http://localhost:4000/server/api/v1/documents/ingest" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/pdf" \
  --data-binary @document.pdf
Response:
{
  "permanentId": "1defd934dbbf77587eb9b7f45d162d2a3aea16c840a9e7cfa190fb2ea1f40a76",
  "changingId": "3a4b5c6d7e8f90123456789abcdef0123456789abcdef0123456789abcdef0123"
}
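When the set of documents is known in advance, the direct ingestion call above can be wrapped in a loop to preingest a whole directory. The following is a minimal sketch, not an official tool: the function names, the `AI_ASSISTANT_URL` and `API_TOKEN` variables, and the one-directory layout are assumptions made for illustration. Only the endpoint and headers come from the command above.

```shell
# Assumed configuration; override through the environment as needed.
AI_ASSISTANT_URL="${AI_ASSISTANT_URL:-http://localhost:4000}"
API_TOKEN="${API_TOKEN:-YOUR_API_TOKEN}"

# Print the PDF files found directly inside a directory, one per line.
list_pdfs() {
  local dir="$1" file
  for file in "$dir"/*.pdf; do
    [ -f "$file" ] && printf '%s\n' "$file"
  done
  return 0
}

# POST a single PDF to the direct ingestion endpoint and print the JSON response.
ingest_pdf() {
  curl --silent --show-error --fail -X POST \
    "$AI_ASSISTANT_URL/server/api/v1/documents/ingest" \
    -H "Authorization: Bearer $API_TOKEN" \
    -H "Content-Type: application/pdf" \
    --data-binary "@$1"
}

# Ingest every PDF in a directory; a failure is reported but does not stop the loop.
preingest_dir() {
  local file
  list_pdfs "$1" | while IFS= read -r file; do
    if ingest_pdf "$file"; then
      printf '\ningested: %s\n' "$file"
    else
      printf 'failed: %s\n' "$file" >&2
    fi
  done
}
```

Calling `preingest_dir ./documents` would then send each PDF in `./documents` (a hypothetical folder) to AI Assistant before any user opens a chat window.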
Document ingestion with Document Engine
For applications that use Document Engine, you can ingest documents that are already stored in Document Engine. This approach is ideal when you need both Document Engine’s collaboration features and AI Assistant capabilities.
Step 1: Upload a document to Document Engine
First, upload your document to Document Engine:
curl -X POST "https://localhost:5000/api/documents" \
  -H "Authorization: Bearer YOUR_DOCUMENT_ENGINE_TOKEN" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@document.pdf"
Response:
{
  "document_id": "01H2XZ38NNVT6SGKH7PPTE0000",
  "title": "My Document",
  "sourcePdfSha256": "1defd934dbbf77587eb9b7f45d162d2a3aea16c840a9e7cfa190fb2ea1f40a76",
  "fileSize": 1024000,
  "pages": 10
}
Step 2: Ingest the document from Document Engine into AI Assistant
After the document is uploaded to Document Engine, ingest it into AI Assistant:
curl -X GET "https://localhost:4000/server/api/v1/documents/01H2XZ38NNVT6SGKH7PPTE0000/ingest/1defd934dbbf77587eb9b7f45d162d2a3aea16c840a9e7cfa190fb2ea1f40a76" \
  -H "Authorization: Bearer YOUR_AI_ASSISTANT_TOKEN"
A successful response returns a 204 No Content status code, indicating the document was processed.
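The ingestion URL in step 2 is assembled from two fields of the step 1 response: `document_id` and `sourcePdfSha256`. The sketch below shows one way to chain the two steps in a script. The helper names and environment variables are hypothetical, and the `sed`-based field extraction is a deliberately naive stand-in for a real JSON parser: it assumes flat string fields with no escaped quotes.

```shell
# Assumed configuration; override through the environment as needed.
AI_ASSISTANT_URL="${AI_ASSISTANT_URL:-https://localhost:4000}"
AI_ASSISTANT_TOKEN="${AI_ASSISTANT_TOKEN:-YOUR_AI_ASSISTANT_TOKEN}"

# Extract a string field from a flat JSON object (naive: no escaped quotes).
# Usage: json_string_field FIELD_NAME JSON_TEXT
json_string_field() {
  printf '%s' "$2" |
    sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Build the ingestion URL from a Document Engine document ID and file hash.
ingest_url() {
  printf '%s/server/api/v1/documents/%s/ingest/%s' "$AI_ASSISTANT_URL" "$1" "$2"
}

# Given the JSON returned by the upload in step 1, trigger ingestion (step 2).
ingest_from_upload_response() {
  local response="$1" id hash
  id="$(json_string_field document_id "$response")"
  hash="$(json_string_field sourcePdfSha256 "$response")"
  curl --silent --show-error --fail -X GET "$(ingest_url "$id" "$hash")" \
    -H "Authorization: Bearer $AI_ASSISTANT_TOKEN"
}
```

In a production script, a proper JSON tool would be a safer choice than `sed`; the point here is only that the two identifiers flow from the upload response into the ingestion URL.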
Working with document layers
If you’re using Document Engine layers for annotations or other content, you can ingest a specific layer:
curl -X GET "https://localhost:4000/server/api/v1/documents/01H2XZ38NNVT6SGKH7PPTE0000/layers/myLayer/ingest/1defd934dbbf77587eb9b7f45d162d2a3aea16c840a9e7cfa190fb2ea1f40a76" \
  -H "Authorization: Bearer YOUR_AI_ASSISTANT_TOKEN"
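If a document has several layers, the same call can be repeated once per layer. This is a rough sketch under stated assumptions: the function names and variables are illustrative, and layer names are assumed to be URL-safe (no encoding is applied).

```shell
# Assumed configuration; override through the environment as needed.
AI_ASSISTANT_URL="${AI_ASSISTANT_URL:-https://localhost:4000}"
AI_ASSISTANT_TOKEN="${AI_ASSISTANT_TOKEN:-YOUR_AI_ASSISTANT_TOKEN}"

# Build the layer ingestion URL: document ID, layer name, file hash.
layer_ingest_url() {
  printf '%s/server/api/v1/documents/%s/layers/%s/ingest/%s' \
    "$AI_ASSISTANT_URL" "$1" "$2" "$3"
}

# Ingest each named layer of one document.
# Usage: ingest_layers DOCUMENT_ID FILE_HASH LAYER [LAYER...]
ingest_layers() {
  local document_id="$1" file_hash="$2" layer
  shift 2
  for layer in "$@"; do
    if curl --silent --show-error --fail -X GET \
         "$(layer_ingest_url "$document_id" "$layer" "$file_hash")" \
         -H "Authorization: Bearer $AI_ASSISTANT_TOKEN"; then
      printf 'ingested layer: %s\n' "$layer"
    else
      printf 'failed layer: %s\n' "$layer" >&2
    fi
  done
}
```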
Verifying document ingestion status
To check if a document has been successfully ingested into AI Assistant, you can use the following API call. This will verify the ingestion status by matching the document’s unique identifier and file hash:
curl -X GET "https://localhost:4000/server/api/v1/documents/01H2XZ38NNVT6SGKH7PPTE0000/fileHash/1defd934dbbf77587eb9b7f45d162d2a3aea16c840a9e7cfa190fb2ea1f40a76" \
  -H "Authorization: Bearer YOUR_AI_ASSISTANT_TOKEN"
- A 204 No Content response confirms the document has been successfully ingested and is ready for use.
- A 404 Not Found response indicates the document hasn’t been ingested.
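Because the result is carried entirely by the HTTP status code, a script can capture just that code and branch on it. The sketch below uses curl's `--write-out '%{http_code}'` option for this; the function names and environment variables are assumptions for illustration, and any status other than 204 or 404 is simply reported as unexpected.

```shell
# Assumed configuration; override through the environment as needed.
AI_ASSISTANT_URL="${AI_ASSISTANT_URL:-https://localhost:4000}"
AI_ASSISTANT_TOKEN="${AI_ASSISTANT_TOKEN:-YOUR_AI_ASSISTANT_TOKEN}"

# Map the HTTP status code of the verification call to a short label.
ingestion_status_label() {
  case "$1" in
    204) echo "ingested" ;;
    404) echo "not ingested" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Query the verification endpoint and print the label for its status code.
# Usage: check_ingestion DOCUMENT_ID FILE_HASH
check_ingestion() {
  local code
  code="$(curl --silent --output /dev/null --write-out '%{http_code}' \
    -H "Authorization: Bearer $AI_ASSISTANT_TOKEN" \
    "$AI_ASSISTANT_URL/server/api/v1/documents/$1/fileHash/$2")"
  ingestion_status_label "$code"
}
```

A caller could use the label to decide whether to trigger ingestion, for example by ingesting only when `check_ingestion` prints `not ingested`.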
Full reference
For the complete API reference, which includes document ingestion and other AI Assistant capabilities, refer to the following link: