Category: Artificial Intelligence
Authentication: Bearer Token

Inference APIs (AI) REST API

Run AI models via REST API for production deployments

Inference APIs provide REST endpoints for running machine learning models in production. These APIs enable developers to deploy and scale AI models for tasks like text generation, image classification, embedding creation, and natural language processing without managing infrastructure. They are popular for building AI-powered applications that require low latency and high availability.

Base URL: https://api.inference.rest/v1

API Endpoints

Method  Endpoint                        Description
POST    /completions                    Generate text completions from a prompt using language models
POST    /chat/completions               Generate chat-based responses with conversation history support
POST    /embeddings                     Create vector embeddings from text for semantic search and similarity
POST    /images/generations             Generate images from text prompts using diffusion models
POST    /images/edits                   Edit or modify existing images using AI models
POST    /audio/transcriptions           Transcribe audio files to text using speech recognition models
POST    /audio/translations             Translate audio from one language to another
POST    /classifications                Classify text or images into predefined categories
GET     /models                         List all available AI models and their capabilities
GET     /models/{model_id}              Get detailed information about a specific model
POST    /predictions                    Run custom model predictions with arbitrary inputs
GET     /predictions/{prediction_id}    Get the status and results of a prediction job
POST    /batch                          Submit batch inference jobs for processing multiple requests
GET     /batch/{batch_id}               Check the status of a batch inference job
DELETE  /predictions/{prediction_id}    Cancel a running prediction job
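The /predictions endpoints follow an asynchronous job pattern: POST submits a job, GET /predictions/{prediction_id} polls its status, and DELETE cancels it. A minimal polling sketch in Python follows; the response shape (a `status` field with terminal values like "succeeded", "failed", or "canceled") is an assumption based on common job APIs, not documented behavior:

```python
import time
from typing import Callable

BASE_URL = "https://api.inference.rest/v1"

def poll_prediction(prediction_id: str,
                    fetch_status: Callable[[str], dict],
                    interval: float = 1.0,
                    max_attempts: int = 30) -> dict:
    """Poll a prediction job until it reaches a terminal state.

    `fetch_status` should perform GET {BASE_URL}/predictions/{prediction_id}
    and return the parsed JSON body. The terminal status values below are
    assumptions for illustration.
    """
    for _ in range(max_attempts):
        body = fetch_status(prediction_id)
        if body.get("status") in ("succeeded", "failed", "canceled"):
            return body
        time.sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} did not finish in time")
```

Passing the HTTP call in as a callable keeps the retry logic separate from transport concerns, so the same loop works with any HTTP client.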

Code Examples

curl -X POST https://api.inference.rest/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-2-70b",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
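The same request can be sketched in Python using only the standard library. The helper name below is ours, and the request mirrors the curl example exactly (same endpoint, headers, and body fields); sending it requires a valid API key and network access:

```python
import json
import urllib.request

BASE_URL = "https://api.inference.rest/v1"

def build_chat_request(api_key: str, model: str, messages: list,
                       temperature: float = 0.7,
                       max_tokens: int = 500) -> urllib.request.Request:
    """Build a POST /chat/completions request matching the curl example."""
    payload = {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To send (requires a real key and network access):
# req = build_chat_request("YOUR_API_KEY", "llama-2-70b",
#                          [{"role": "user", "content": "Explain quantum computing"}])
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```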

Connect Inference APIs (AI) to AI

Deploy an Inference APIs (AI) MCP server on IOX Cloud and connect it to Claude, ChatGPT, Cursor, or any AI client. Your AI assistant gets direct access to Inference APIs (AI) through these tools:

generate_text: Generate text completions using specified language models with customizable parameters
create_embeddings: Convert text into vector embeddings for semantic search and similarity comparison
classify_content: Classify text or images into categories using pre-trained classification models
transcribe_audio: Transcribe audio files to text using speech-to-text models
list_available_models: Query available AI models and their capabilities for different inference tasks

Deploy in 60 seconds

Describe what you need, AI generates the code, and IOX deploys it globally.

Deploy Inference APIs (AI) MCP Server →
