Category: Artificial Intelligence. Authentication: Bearer Token

Inference APIs (AI) REST API

Run AI models via REST API for production deployments

Inference APIs provide REST endpoints for running machine learning models in production. They let developers deploy and scale AI models for tasks such as text generation, image classification, embedding creation, and natural language processing without managing infrastructure. They are popular for building AI-powered applications that need low latency and high availability.

Base URL: https://api.inference.rest/v1

API Endpoints

Method  Endpoint                      Description
POST    /completions                  Generate text completions from a prompt using language models
POST    /chat/completions             Generate chat-based responses with conversation history support
POST    /embeddings                   Create vector embeddings from text for semantic search and similarity
POST    /images/generations           Generate images from text prompts using diffusion models
POST    /images/edits                 Edit or modify existing images using AI models
POST    /audio/transcriptions         Transcribe audio files to text using speech recognition models
POST    /audio/translations           Translate audio from one language to another
POST    /classifications              Classify text or images into predefined categories
GET     /models                       List all available AI models and their capabilities
GET     /models/{model_id}            Get detailed information about a specific model
POST    /predictions                  Run custom model predictions with arbitrary inputs
GET     /predictions/{prediction_id}  Get the status and results of a prediction job
POST    /batch                        Submit batch inference jobs for processing multiple requests
GET     /batch/{batch_id}             Check the status of a batch inference job
DELETE  /predictions/{prediction_id}  Cancel a running prediction job
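
The three /predictions endpoints suggest a submit, poll, cancel lifecycle. The Python sketch below shows one way the polling step might look. It rests on assumptions this page does not confirm: that the job JSON has a "status" field, and that "succeeded", "failed", and "canceled" are its terminal values. The names wait_for_prediction and fetch_json are hypothetical helpers, not part of the API.

```python
import time
from typing import Callable

BASE_URL = "https://api.inference.rest/v1"  # base URL as listed on this page

# Assumed terminal states; the listing does not document the real status values.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def prediction_url(prediction_id: str) -> str:
    """URL shared by GET (status) and DELETE (cancel) for one prediction."""
    return f"{BASE_URL}/predictions/{prediction_id}"


def wait_for_prediction(
    prediction_id: str,
    fetch_json: Callable[[str], dict],
    interval_s: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll GET /predictions/{prediction_id} until an assumed terminal status.

    fetch_json is any function that performs an authenticated GET and returns
    the decoded JSON body; injecting it keeps this sketch free of network code.
    """
    url = prediction_url(prediction_id)
    for attempt in range(max_polls):
        job = fetch_json(url)
        if job.get("status") in TERMINAL_STATES:
            return job
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(
        f"prediction {prediction_id} not finished after {max_polls} polls"
    )
```

If the wait times out, a caller could send DELETE to the same prediction_url to cancel the job, as listed in the table above.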

Code Examples

curl -X POST https://api.inference.rest/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-2-70b",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
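
For readers calling the API from code rather than a shell, the same request can be sketched in Python using only the standard library. The URL, headers, and body mirror the curl example. The helper names build_chat_request and send are illustrative, and the response format is not documented on this page, so the sketch only decodes the response as JSON.

```python
import json
import urllib.request

BASE_URL = "https://api.inference.rest/v1"  # base URL as listed on this page


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) the POST /chat/completions request."""
    payload = {
        "model": "llama-2-70b",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
        "max_tokens": 500,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like send(build_chat_request("YOUR_API_KEY", "Explain quantum computing")). Building the request separately from sending it makes the payload easy to inspect or log before any network traffic happens.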

Use Inference APIs (AI) from Claude / Cursor / ChatGPT

Get a hosted MCP endpoint for Inference APIs (AI). Paste your Inference APIs (AI) API key, copy the generated URL, and add it to Claude Desktop, Cursor, or any AI client that supports remote MCP. Your AI client then calls Inference APIs (AI) directly with your credentials. No local install is needed, and it works on mobile.

generate_text          Generate text completions using specified language models with customizable parameters
create_embeddings      Convert text into vector embeddings for semantic search and similarity comparison
classify_content       Classify text or images into categories using pre-trained classification models
transcribe_audio       Transcribe audio files to text using speech-to-text models
list_available_models  Query available AI models and their capabilities for different inference tasks
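
An MCP client normally sends tool calls on the user's behalf, but it can help to see roughly what one looks like on the wire. MCP uses JSON-RPC 2.0, and a tool is invoked with the tools/call method. The Python sketch below builds such a message for create_embeddings. The argument names "input" and "model" and the model name are hypothetical, since the tools' input schemas are not shown on this page.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical arguments: the tool's real input schema is not shown on this page.
example = build_tool_call(
    1,
    "create_embeddings",
    {"input": "low latency inference", "model": "example-embedding-model"},
)
```

In practice Claude Desktop or Cursor constructs this message itself after discovering the tools, so the sketch is only meant to make the tool list above more concrete.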

Connect in 60 seconds

Paste your Inference APIs (AI) key → get an MCP URL → paste into Claude/Cursor. Hosted by IOX, encrypted at rest.
