Category: Artificial Intelligence
Authentication: Bearer Token
Inference APIs (AI) REST API
Run AI models via REST API for production deployments
Inference APIs provide REST endpoints for running machine learning models in production. They let developers deploy and scale AI models for tasks such as text generation, image classification, embedding creation, and natural language processing without managing infrastructure, which makes them popular for building AI-powered applications that need low latency and high availability.
Base URL
https://api.inference.rest/v1
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /completions | Generate text completions from a prompt using language models |
| POST | /chat/completions | Generate chat-based responses with conversation history support |
| POST | /embeddings | Create vector embeddings from text for semantic search and similarity |
| POST | /images/generations | Generate images from text prompts using diffusion models |
| POST | /images/edits | Edit or modify existing images using AI models |
| POST | /audio/transcriptions | Transcribe audio files to text using speech recognition models |
| POST | /audio/translations | Translate audio from one language to another |
| POST | /classifications | Classify text or images into predefined categories |
| GET | /models | List all available AI models and their capabilities |
| GET | /models/{model_id} | Get detailed information about a specific model |
| POST | /predictions | Run custom model predictions with arbitrary inputs |
| GET | /predictions/{prediction_id} | Get the status and results of a prediction job |
| POST | /batch | Submit batch inference jobs for processing multiple requests |
| GET | /batch/{batch_id} | Check the status of a batch inference job |
| DELETE | /predictions/{prediction_id} | Cancel a running prediction job |
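The /predictions endpoints above imply an asynchronous flow: submit a job with POST /predictions, then poll GET /predictions/{prediction_id} until it finishes. A minimal polling sketch in Python; the injected `get_status` callable, the `status`/`id` field names, and the terminal state names are all assumptions for illustration, not documented behavior:

```python
import time

# Terminal states assumed for illustration; check the API's actual values.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}

def poll_prediction(get_status, prediction_id, interval=2.0, timeout=60.0):
    """Poll GET /predictions/{prediction_id} (via the injected get_status
    callable) until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = get_status(prediction_id)  # e.g. requests.get(...).json()
        if job.get("status") in TERMINAL_STATES:
            return job
        time.sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} did not finish in {timeout}s")

# Example with a fake status function standing in for the HTTP call:
_states = iter(["queued", "running", "succeeded"])
fake_get_status = lambda pid: {"id": pid, "status": next(_states)}
result = poll_prediction(fake_get_status, "pred_123", interval=0.01)
```

Injecting the fetch function keeps the retry logic separate from the HTTP client, so the same loop works with any library that can issue the GET request.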
Code Examples
curl -X POST https://api.inference.rest/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-2-70b",
"messages": [
{"role": "user", "content": "Explain quantum computing"}
],
"temperature": 0.7,
"max_tokens": 500
}'
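The page does not show the response body for the request above. Assuming an OpenAI-style chat completion payload (a `choices` array whose items carry a `message` object), extracting the generated text might look like this; the field names and the sample JSON are assumptions, not the documented Inference APIs schema:

```python
import json

# Assumed response shape, modeled on OpenAI-style chat completion bodies;
# the real Inference APIs payload may differ.
raw = """
{
  "model": "llama-2-70b",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "Quantum computing uses qubits..."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 5, "completion_tokens": 9, "total_tokens": 14}
}
"""

response = json.loads(raw)
reply = response["choices"][0]["message"]["content"]
print(reply)
```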
Connect Inference APIs (AI) to AI
Deploy an Inference APIs (AI) MCP server on IOX Cloud and connect it to Claude, ChatGPT, Cursor, or any AI client. Your AI assistant gets direct access to Inference APIs (AI) through these tools:
generate_text
Generate text completions using specified language models with customizable parameters
create_embeddings
Convert text into vector embeddings for semantic search and similarity comparison
classify_content
Classify text or images into categories using pre-trained classification models
transcribe_audio
Transcribe audio files to text using speech-to-text models
list_available_models
Query available AI models and their capabilities for different inference tasks
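The create_embeddings tool returns vectors intended for similarity comparison; a common way to compare two embeddings is cosine similarity. A self-contained sketch (the toy vectors are made up, not real model output):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors:
    dot(a, b) / (||a|| * ||b||), ranging from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings returned by create_embeddings:
doc = [0.1, 0.3, 0.5]
query = [0.2, 0.1, 0.4]
score = cosine_similarity(doc, query)
```

A higher score means the two texts are closer in the embedding space; identical vectors score 1.0.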
Deploy in 60 seconds
Describe what you need, AI generates the code, and IOX deploys it globally.
Deploy Inference APIs (AI) MCP Server →