API Documentation
This documentation provides detailed information on how to use the API for efficient document processing and text extraction.
Document Text Extraction API
The StructHub API endpoint for text extraction from documents:
POST https://api.structhub.io/extract
Request Parameters
Headers
API-KEY: Your subscription API key for authentication.
Form Data
file: Upload the document file. Ensure the correct file path is provided.
ocr: (Optional) Set to “auto” (default), true, or false to control Optical Character Recognition (OCR).
lang: (Optional) Explicitly set the language of the uploaded document. Languages are auto-detected, but you can set the lang parameter explicitly if the results are not optimal. For example, use “eng+fra” for documents containing both English and French text.
out_format: (Optional) Set the output format to text, xml, json, or html.
Example Curl Request
curl --location 'https://api.structhub.io/extract' \
--header 'API-KEY: YOUR_API_KEY' \
--form 'file=@"/path/to/your/document.docx"' \
--form 'ocr="auto"' \
--form 'lang="eng+fra"' \
--form 'out_format="text"'
Response
[
  { "page": 1, "text": "<text output>" },
  ...
]
Request Parameters Details
| Parameter | Default Value | Required | Description |
|---|---|---|---|
| file | - | Yes | Use the file parameter to upload the document file. Only one file can be uploaded at a time. Ensure the correct file path is provided. |
| ocr | “auto” | No | By default, OCR is set to “auto”. The API detects if OCR is required (e.g., for scanned documents or documents with images). Set ocr to true or false as needed. Enabling OCR can significantly slow down processing. |
| lang | - | No | Use the lang parameter to explicitly set the language of the uploaded document. While the API detects some languages, explicitly setting “hin” can enhance processing for Hindi documents. |
| out_format | “text” | No | Set the out_format parameter to define the output format. Choose from text, xml, json, or html. Each page’s extracted data will be returned in the specified format. |
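The curl request above can also be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names (`build_multipart`, `extract_text`, `join_pages`) are hypothetical, and the code assumes the endpoint returns the JSON list of `page`/`text` objects shown in the sample response.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

EXTRACT_URL = "https://api.structhub.io/extract"


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n".encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def extract_text(path, api_key, ocr="auto", lang=None, out_format="text"):
    """POST one document to the extract endpoint and return the parsed JSON."""
    fields = {"ocr": ocr, "out_format": out_format}
    if lang:  # lang is optional; omit it to rely on auto-detection
        fields["lang"] = lang
    doc = Path(path)
    body, content_type = build_multipart(fields, "file", doc.name, doc.read_bytes())
    request = urllib.request.Request(
        EXTRACT_URL,
        data=body,
        method="POST",
        headers={"API-KEY": api_key, "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: the body is JSON shaped like the sample response.
        return json.loads(response.read().decode("utf-8"))


def join_pages(pages):
    """Concatenate page texts in page order (shape taken from the sample response)."""
    return "\n".join(p["text"] for p in sorted(pages, key=lambda p: p["page"]))
```

A call such as `join_pages(extract_text("/path/to/your/document.docx", "YOUR_API_KEY", lang="eng+fra"))` would then yield the document text as a single string, provided the response has the shape assumed here.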
Knowledge Base As a Service API
The StructHub API endpoint for searching the knowledge base:
POST https://api.structhub.io/search
Request Parameters
Headers
API-KEY: Your subscription API key for authentication.
Request Body
q: Search query string used to search the knowledge base.
topk: Number of top-ranked results to return.
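As a rough Python equivalent of a search call, the sketch below builds the JSON request with the standard library. Several details are assumptions: the function names are hypothetical, the `Content-Type: application/json` header is not stated in this documentation, and the query field is sent as `query` to match the curl example, even though the parameter list names it `q`.

```python
import json
import urllib.request

SEARCH_URL = "https://api.structhub.io/search"


def build_search_request(api_key, query, topk):
    """Build (but do not send) a POST request for the search endpoint."""
    # Assumption: the field is named "query", as in the curl example;
    # the parameter list calls it "q".
    payload = json.dumps({"query": query, "topk": topk}).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=payload,
        method="POST",
        headers={
            "API-KEY": api_key,
            # Assumption: the API accepts an explicit JSON content type.
            "Content-Type": "application/json",
        },
    )


def search(api_key, query, topk):
    """Send the search request and return the parsed JSON response."""
    request = build_search_request(api_key, query, topk)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the payload and headers without making a network call.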
Example Curl Request
curl --location 'https://api.structhub.io/search' \
--header 'API-KEY: YOUR_API_KEY' \
--data-raw '{"query":"dd","topk":10}'
Response
{
  "count": 9,
  "data": [
    { "source": "sample-pdf.pdf", "page": 2.0, "text": "matched document text chunk" },
    ...
  ]
}
Rate Limit
Each subscription comes with a per-minute rate limit. The rate limit is calculated within a moving 1-minute window. If the rate limit is exceeded, the API will respond with a 429 error. Ensure your application adheres to the rate limits to avoid disruptions.
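One common way to cope with 429 responses is to retry with exponential backoff. The sketch below is an illustration under assumptions, not documented StructHub behavior: `call_with_retry` is a hypothetical helper, the delay values are arbitrary, and it expects the wrapped call to raise `urllib.error.HTTPError`, as `urllib.request.urlopen` does for error statuses.

```python
import time
import urllib.error


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff while it raises HTTP 429."""
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate-limit error, and give up
            # once the final attempt has failed.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * (2 ** attempt))
```

Since the limit is evaluated over a moving 1-minute window, delays that grow toward a minute give the window time to clear. The `sleep` parameter exists only so the waiting can be replaced in tests.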
Response Codes
200 OK: Successful operation.
401 Unauthorized: Invalid API key or no API key provided.
429 Too Many Requests: Rate limit exceeded.
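The codes listed above could be mapped to client-side actions along these lines. The function name and the action labels are hypothetical, and any status not listed here is treated as unexpected.

```python
def next_action(status_code):
    """Suggest what a client might do for a given HTTP status code."""
    if status_code == 200:
        return "use_response"        # successful operation
    if status_code == 401:
        return "check_api_key"       # invalid or missing API key
    if status_code == 429:
        return "wait_and_retry"      # rate limit exceeded
    return "unexpected"              # not covered by this documentation
```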
Be sure to replace YOUR_API_KEY in the example curl requests with your actual subscription API key.
For any questions or assistance, feel free to contact our support team.