Category: Artificial Intelligence | Authentication: Bearer Token

Inference APIs (AI) REST API

Run AI models via REST API for production deployments

Inference APIs provide REST endpoints for running machine learning models in production. Developers can use them to deploy and scale AI models for tasks such as text generation, image classification, embedding creation, and natural language processing without managing infrastructure. They are commonly used to build AI-powered applications that need low latency and high availability.

Base URL: https://api.inference.rest/v1

API Endpoints

Method	Endpoint	Description
POST	`/completions`	Generate text completions from a prompt using language models
POST	`/chat/completions`	Generate chat-based responses with conversation history support
POST	`/embeddings`	Create vector embeddings from text for semantic search and similarity
POST	`/images/generations`	Generate images from text prompts using diffusion models
POST	`/images/edits`	Edit or modify existing images using AI models
POST	`/audio/transcriptions`	Transcribe audio files to text using speech recognition models
POST	`/audio/translations`	Translate audio from one language to another
POST	`/classifications`	Classify text or images into predefined categories
GET	`/models`	List all available AI models and their capabilities
GET	`/models/{model_id}`	Get detailed information about a specific model
POST	`/predictions`	Run custom model predictions with arbitrary inputs
GET	`/predictions/{prediction_id}`	Get the status and results of a prediction job
POST	`/batch`	Submit batch inference jobs for processing multiple requests
GET	`/batch/{batch_id}`	Check the status of a batch inference job
DELETE	`/predictions/{prediction_id}`	Cancel a running prediction job
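
The `/predictions` rows above describe an asynchronous job: POST creates it, GET reports its status and results, and DELETE cancels it. The following Python sketch shows one way to poll such a job. The `status` field and the terminal values `succeeded`, `failed`, and `canceled` are assumptions, since this page does not document the response schema; `poll_prediction` and `fetch_status` are hypothetical names used only for illustration.

```python
import time
from typing import Callable

BASE_URL = "https://api.inference.rest/v1"

# Assumed terminal states; the page does not list the real status values.
TERMINAL_STATES = {"succeeded", "failed", "canceled"}


def poll_prediction(
    prediction_id: str,
    fetch_status: Callable[[str], dict],
    interval_s: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status(url) until the job reports a terminal state."""
    url = f"{BASE_URL}/predictions/{prediction_id}"
    for _ in range(max_attempts):
        job = fetch_status(url)
        if job.get("status") in TERMINAL_STATES:
            return job
        # Wait before the next poll to avoid hammering the endpoint.
        sleep(interval_s)
    raise TimeoutError(
        f"prediction {prediction_id} still running after {max_attempts} polls"
    )
```

In practice `fetch_status` could wrap `requests.get(url, headers=headers, timeout=30).json()`. Passing it in as a parameter keeps the polling loop testable without network access.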

Code Examples

# cURL: send a single-turn chat completion request
curl -X POST https://api.inference.rest/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-2-70b",
    "messages": [
      {"role": "user", "content": "Explain quantum computing"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'

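// JavaScript (fetch): the same chat completion request.
// Top-level await requires an ES module or an enclosing async function.
const response = await fetch('https://api.inference.rest/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'llama-2-70b',
    messages: [
      { role: 'user', content: 'Explain quantum computing' }
    ],
    temperature: 0.7,
    max_tokens: 500
  })
});

// Surface HTTP errors (for example an invalid API key) before parsing the body
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const data = await response.json();
console.log(data.choices[0].message.content);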

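# Python (requests): the same chat completion request
import requests

url = 'https://api.inference.rest/v1/chat/completions'
headers = {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
}
data = {
    'model': 'llama-2-70b',
    'messages': [
        {'role': 'user', 'content': 'Explain quantum computing'}
    ],
    'temperature': 0.7,
    'max_tokens': 500
}

# Set a timeout so a stalled connection does not hang the caller
response = requests.post(url, headers=headers, json=data, timeout=30)
# Raise on 4xx/5xx responses (for example an invalid API key) before parsing
response.raise_for_status()
result = response.json()
print(result['choices'][0]['message']['content'])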
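
The endpoint table describes `/chat/completions` as supporting conversation history, while the examples above send a single user message. The sketch below shows one way to carry history across turns by resending the accumulated `messages` list. The `assistant` and `system` roles are assumptions (only `user` appears on this page), and `append_turn` and `build_chat_payload` are hypothetical helper names.

```python
from typing import Dict, List

Message = Dict[str, str]

# Only "user" appears on this page; "system" and "assistant" are assumed.
ALLOWED_ROLES = {"system", "user", "assistant"}


def append_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new message list with one more turn; the input is not mutated."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"unsupported role: {role!r}")
    return history + [{"role": role, "content": content}]


def build_chat_payload(
    history: List[Message],
    model: str = "llama-2-70b",
    temperature: float = 0.7,
    max_tokens: int = 500,
) -> dict:
    """Build the JSON body for POST /chat/completions from the full history."""
    return {
        "model": model,
        "messages": list(history),
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


# Example: a second user question that depends on the first answer.
history = append_turn([], "user", "Explain quantum computing")
history = append_turn(history, "assistant", "(text of the model's first reply)")
history = append_turn(history, "user", "Now give a one-sentence summary")
payload = build_chat_payload(history)
```

The resulting `payload` can be passed as `json=payload` to the `requests.post` call shown above. After each response, append the returned message content as an `assistant` turn before adding the next user message.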

Use Inference APIs (AI) from Claude / Cursor / ChatGPT

Get a hosted MCP endpoint for Inference APIs (AI): paste your Inference APIs (AI) API key, copy the URL you get back, and add it to Claude Desktop, Cursor, or any AI client that supports remote MCP. The client then calls Inference APIs (AI) directly with your credentials. No local install is needed, and it works on mobile.

generate_text: Generate text completions using specified language models with customizable parameters

create_embeddings: Convert text into vector embeddings for semantic search and similarity comparison

classify_content: Classify text or images into categories using pre-trained classification models

transcribe_audio: Transcribe audio files to text using speech-to-text models

list_available_models: Query available AI models and their capabilities for different inference tasks
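
The similarity comparison mentioned for `create_embeddings` is commonly done with cosine similarity between embedding vectors. The sketch below illustrates that step only. It assumes embeddings arrive as plain lists of floats, uses made-up toy vectors instead of real model output, and the names `cosine_similarity` and `rank_by_similarity` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], documents: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Sort document names by similarity to the query, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional vectors for illustration; real embeddings are much longer.
ranked = rank_by_similarity(
    [1.0, 0.0],
    {"doc_a": [0.0, 1.0], "doc_b": [2.0, 0.0], "doc_c": [1.0, 1.0]},
)
```

Here `ranked` lists `doc_b` first because it points in the same direction as the query, then `doc_c`, then `doc_a`, which is orthogonal to the query.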

Connect in 60 seconds

Paste your Inference APIs (AI) key → get an MCP URL → paste into Claude/Cursor. Hosted by IOX, encrypted at rest.
