Inference APIs (AI) REST API
Run AI models through a REST API in production deployments
Inference APIs provide REST endpoints for running machine learning models in production. They let developers deploy and scale AI models for tasks such as text generation, image classification, embedding creation, and natural language processing without managing the underlying infrastructure. They are commonly used to build AI-powered applications that need low latency and high availability.
Base URL: https://api.inference.rest/v1
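Every endpoint path on this page is relative to the base URL above. Below is a minimal Bash sketch of a reusable request wrapper. It assumes the key is stored in an environment variable named `INFERENCE_API_KEY` and lets the base URL be overridden through `INFERENCE_BASE_URL`. Both names are chosen here for illustration and are not part of the API. It sends the same Bearer-token header used in the chat example on this page.

```bash
# Base URL from this page; every endpoint path is appended to it.
INFERENCE_BASE_URL="${INFERENCE_BASE_URL:-https://api.inference.rest/v1}"

# inference_api METHOD PATH [extra curl args...]
# Hypothetical helper: INFERENCE_API_KEY is an assumed variable name.
# Sends an authenticated JSON request and prints the response body.
inference_api() {
  local method="$1" path="$2"
  shift 2
  curl -sS -X "$method" "${INFERENCE_BASE_URL}${path}" \
    -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
    -H "Content-Type: application/json" \
    "$@"
}
```

For example, `inference_api GET /models` would list the available models. Because the helper always sends a JSON `Content-Type`, it is not suited to file uploads.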
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /completions | Generate text completions from a prompt using language models |
| POST | /chat/completions | Generate chat-based responses with conversation history support |
| POST | /embeddings | Create vector embeddings from text for semantic search and similarity |
| POST | /images/generations | Generate images from text prompts using diffusion models |
| POST | /images/edits | Edit or modify existing images using AI models |
| POST | /audio/transcriptions | Transcribe audio files to text using speech recognition models |
| POST | /audio/translations | Translate audio from one language to another |
| POST | /classifications | Classify text or images into predefined categories |
| GET | /models | List all available AI models and their capabilities |
| GET | /models/{model_id} | Get detailed information about a specific model |
| POST | /predictions | Run custom model predictions with arbitrary inputs |
| GET | /predictions/{prediction_id} | Get the status and results of a prediction job |
| DELETE | /predictions/{prediction_id} | Cancel a running prediction job |
| POST | /batch | Submit batch inference jobs for processing multiple requests |
| GET | /batch/{batch_id} | Check the status of a batch inference job |
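The prediction endpoints describe an asynchronous job pattern. A client submits a job with `POST /predictions`, polls `GET /predictions/{prediction_id}`, and cancels with `DELETE /predictions/{prediction_id}`.

The Bash sketch below makes several assumptions that this page does not document:

- The request body has the form `{"model": ..., "input": {...}}`.
- The response carries a single top-level `"status"` field.
- The terminal status values are `succeeded`, `failed`, and `canceled`.
- The API key is held in a variable named `INFERENCE_API_KEY`.

The status is extracted with `sed` to avoid a `jq` dependency.

```bash
INFERENCE_BASE_URL="${INFERENCE_BASE_URL:-https://api.inference.rest/v1}"
# INFERENCE_API_KEY (assumed variable name) must hold your API key.

# create_prediction MODEL INPUT_JSON
# Submits a job and prints the raw JSON response (assumed to include "id").
# The {"model": ..., "input": ...} body shape is an assumption.
create_prediction() {
  curl -sS -X POST "${INFERENCE_BASE_URL}/predictions" \
    -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$1\", \"input\": $2}"
}

# wait_for_prediction ID [INTERVAL_SECONDS]
# Polls until the assumed "status" field reaches a terminal value,
# then prints the final JSON body. Returns 1 if no status is found.
wait_for_prediction() {
  local id="$1" interval="${2:-2}" body status
  while :; do
    body=$(curl -sS "${INFERENCE_BASE_URL}/predictions/${id}" \
      -H "Authorization: Bearer ${INFERENCE_API_KEY}") || return 1
    # Extract the status value (assumes one top-level "status" key).
    status=$(printf '%s\n' "$body" |
      sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
    case "$status" in
      succeeded|failed|canceled)
        printf '%s\n' "$body"
        return 0
        ;;
      "")
        printf 'unexpected response: %s\n' "$body" >&2
        return 1
        ;;
    esac
    sleep "$interval"
  done
}

# cancel_prediction ID
# Cancels a running job via DELETE /predictions/{prediction_id}.
cancel_prediction() {
  curl -sS -X DELETE "${INFERENCE_BASE_URL}/predictions/$1" \
    -H "Authorization: Bearer ${INFERENCE_API_KEY}"
}
```

A typical flow is as follows:

1. Read the job `id` from the `create_prediction` response.
2. Call `wait_for_prediction "$id" 5` to poll until the job finishes.
3. Call `cancel_prediction "$id"` if the job is no longer needed.

A fixed polling interval keeps the sketch simple. A production client would likely add a timeout or backoff.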
Code Examples
```bash
# Single-turn chat request; replace YOUR_API_KEY with your key.
curl -X POST https://api.inference.rest/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-2-70b",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
```
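The chat example sends a JSON body. File-based endpoints such as `/audio/transcriptions` are typically called with a `multipart/form-data` upload instead. The Bash sketch below makes a few assumptions that are not confirmed for this API:

- The form fields are named `file` and `model`, following common OpenAI-compatible conventions.
- The key is stored in a variable named `INFERENCE_API_KEY`.

The sketch uses curl's `-F` option, which builds the multipart body and sets the `Content-Type` header, including its boundary, automatically.

```bash
INFERENCE_BASE_URL="${INFERENCE_BASE_URL:-https://api.inference.rest/v1}"
# INFERENCE_API_KEY (assumed variable name) must hold your API key.

# transcribe_audio FILE MODEL
# Uploads an audio file as multipart/form-data and prints the response.
# The "file" and "model" field names are assumptions, not documented here.
transcribe_audio() {
  local file="$1" model="$2"
  if [ ! -f "$file" ]; then
    printf 'no such file: %s\n' "$file" >&2
    return 1
  fi
  # -F makes curl send multipart/form-data and set the boundary itself,
  # so no explicit Content-Type header is passed.
  curl -sS -X POST "${INFERENCE_BASE_URL}/audio/transcriptions" \
    -H "Authorization: Bearer ${INFERENCE_API_KEY}" \
    -F "file=@${file}" \
    -F "model=${model}"
}
```

For example, after `export INFERENCE_API_KEY=YOUR_API_KEY`, run `transcribe_audio meeting.mp3 YOUR_MODEL_ID`.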
Use Inference APIs (AI) from Claude / Cursor / ChatGPT
Get a hosted MCP (Model Context Protocol) endpoint for Inference APIs (AI). Paste your Inference APIs (AI) API key, copy the single URL you get back, and add it to Claude Desktop, Cursor, or any other AI client that supports remote MCP servers. Your AI client then calls Inference APIs (AI) with your credentials. Nothing needs to be installed locally, and it also works on mobile.
| MCP tool | Description |
|---|---|
| generate_text | Generate text completions using specified language models with customizable parameters |
| create_embeddings | Convert text into vector embeddings for semantic search and similarity comparison |
| classify_content | Classify text or images into categories using pre-trained classification models |
| transcribe_audio | Transcribe audio files to text using speech-to-text models |
| list_available_models | Query available AI models and their capabilities for different inference tasks |
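An MCP client invokes these tools by sending JSON-RPC 2.0 `tools/call` requests. Each request's `params` carry the tool `name` and an `arguments` object.

The tools' input schemas are not listed here. A client discovers them via `tools/list`, so any argument keys in the examples below are hypothetical.

The Bash sketch below only prints such messages for inspecting or debugging the wire format. A real client also performs the MCP `initialize` handshake before calling tools.

```bash
# mcp_tools_list ID
# Prints a JSON-RPC 2.0 "tools/list" request (MCP method for tool discovery).
mcp_tools_list() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/list"}\n' "$1"
}

# mcp_tool_call ID TOOL_NAME ARGUMENTS_JSON
# Prints a JSON-RPC 2.0 "tools/call" request as defined by the MCP spec.
# ARGUMENTS_JSON must already be a valid JSON object; it is not validated.
mcp_tool_call() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"%s","arguments":%s}}\n' \
    "$1" "$2" "$3"
}
```

For example, `mcp_tool_call 1 generate_text '{"model": "llama-2-70b", "prompt": "Explain quantum computing"}'` builds a call in which the `model` and `prompt` keys are hypothetical.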
Connect in 60 seconds
Paste your Inference APIs (AI) key → get an MCP URL → paste into Claude/Cursor. Hosted by IOX, encrypted at rest.