Getting Started Guide
This guide will help you get started with the ParseMyFile API.
Overview
The ParseMyFile API processes PDF, DOCX, and XLSX documents, as well as images, and returns the extracted data as structured JSON. A custom YAML configuration file defines which fields to extract.
Prerequisites
- A valid API key
- A file to process (PDF, DOCX, XLSX or image)
- A YAML configuration file
- An HTTP client (cURL, Postman, or code in your preferred language)
Installation and Configuration
1. Get an API Key
To get an API key, create an account on the website: https://parsemyfile.com. Once logged in, go to the "API Keys" section. Create a key or use an existing valid key. This key will be required for all API requests.
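Because the key accompanies every request, it is usually safer to read it from the environment than to hard-code it in source files. The following is a minimal sketch of that idea. The environment variable name `PARSEMYFILE_API_KEY` and the helper `build_headers` are illustrative assumptions, not part of the ParseMyFile API; only the `X-API-KEY` header name comes from the request examples in this guide.

```python
import os

# Hypothetical environment variable name; pick whatever fits your deployment.
API_KEY_ENV_VAR = "PARSEMYFILE_API_KEY"


def build_headers(env=None):
    """Return request headers carrying the API key read from the environment."""
    env = os.environ if env is None else env
    api_key = env.get(API_KEY_ENV_VAR, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {API_KEY_ENV_VAR} before calling the API")
    # X-API-KEY is the header name used by the request examples in this guide.
    return {"X-API-KEY": api_key}
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.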
2. Prepare your files
File to process
The following limits depend on your subscription:
- Supported formats: PDF, JPG, JPEG, PNG, XLSX, DOCX
- Maximum size: 1 to 10 MB, depending on your plan
- Recommended quality: 300 DPI minimum
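The criteria above can be checked locally before uploading, which avoids a round trip for files the API would likely reject. This is a rough sketch under stated assumptions: the function name `check_upload` is hypothetical, and the 10 MB default is only the upper end of the range listed above, so substitute the limit that applies to your subscription. Image resolution (DPI) is not checked here.

```python
from pathlib import Path

# Formats listed in this guide.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".xlsx", ".docx"}

# Assumption: 10 MB is the upper end of the documented range; your plan may allow less.
MAX_SIZE_BYTES = 10 * 1024 * 1024


def check_upload(path, max_size_bytes=MAX_SIZE_BYTES):
    """Return a list of problems with the file; an empty list means none were found."""
    file_path = Path(path)
    if not file_path.is_file():
        return [f"{file_path} does not exist or is not a regular file"]
    problems = []
    if file_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {file_path.suffix or '(none)'}")
    size = file_path.stat().st_size
    if size > max_size_bytes:
        problems.append(f"file is {size} bytes, limit is {max_size_bytes}")
    return problems
```

Returning a list rather than raising on the first issue lets a caller report every problem at once.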
YAML configuration file
Create a YAML file describing the fields to extract.
Here's an example:
yaml
schemas:
  data:
    type: object
    properties:
      name:
        type: string
        description: client name
      email:
        type: string
        description: client email address
      phone:
        type: string
        description: client phone number
      amount:
        type: double
        description: total invoice amount
First API Call
With cURL
bash
curl -X POST "https://api.parsemyfile.com/api/v1/generate" \
  -H "X-API-KEY: your_api_key_here" \
  -F "file=@my_document.pdf" \
  -F "yaml_file=@my_configuration.yaml"
With JavaScript
javascript
// fileInput and yamlFileInput are <input type="file"> elements on the page.
const formData = new FormData();
formData.append('file', fileInput.files[0]);
formData.append('yaml_file', yamlFileInput.files[0]);

// Run this inside an async function or an ES module so that await is allowed.
const response = await fetch('https://api.parsemyfile.com/api/v1/generate', {
  method: 'POST',
  headers: {
    'X-API-KEY': 'your_api_key_here'
  },
  body: formData
});
const result = await response.json();
console.log(result);
With Python
python
import requests

url = "https://api.parsemyfile.com/api/v1/generate"
headers = {"X-API-KEY": "your_api_key_here"}

# Open both files in binary mode; the with block closes them after the upload.
with open("document.pdf", "rb") as document, open("configuration.yaml", "rb") as config:
    files = {
        "file": ("document.pdf", document, "application/pdf"),
        "yaml_file": ("configuration.yaml", config, "text/yaml"),
    }
    response = requests.post(url, headers=headers, files=files)

response.raise_for_status()  # stop here on a 4xx or 5xx response
result = response.json()
print(result)
API Health Check
Before processing your documents, you can verify that the API is working correctly:
bash
curl -X GET "https://api.parsemyfile.com/health"
Expected response:
json
{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00Z",
  "version": "1.0.0"
}
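The same check can be scripted so that a program only submits documents when the API reports itself as healthy. The sketch below uses only the Python standard library. The function names are hypothetical, and treating any status other than "healthy" as not ready is an assumption, since this guide documents only the healthy response.

```python
import json
import urllib.request

HEALTH_URL = "https://api.parsemyfile.com/health"


def is_healthy(payload):
    """Return True when the payload reports the "healthy" status shown above."""
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_api_health(url=HEALTH_URL, timeout=5.0):
    """Fetch the health endpoint; treat network or parsing failures as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError and HTTPError subclass OSError; invalid JSON raises ValueError.
        return False
    return is_healthy(payload)
```

A caller might run `check_api_health()` once at startup and skip or delay uploads when it returns `False`.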