AI API Services Directory
Curated providers for LLMs, chat, vision, multimodal
Alibaba Cloud Model Studio (Bailian) API - OpenAI Compatible
Qwen via OpenAI‑compatible /v1
Alibaba Cloud Model Studio (Bailian) exposes Qwen-family models via OpenAI-compatible interfaces (/v1). Core endpoints include Chat Completions (/v1/chat/completions) and Embeddings (/v1/embeddings), supporting streaming, tool calls, system prompts, and multi-message context. Configure region-specific BASE_URL: Beijing (https://dashscope.aliyuncs.com/compatible-mode/v1) and Singapore (https://dashscope-intl.aliyuncs.com/compatible-mode/v1).
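A minimal request-construction sketch for the region-specific Chat Completions endpoint; the model name "qwen-plus" and the API key are illustrative placeholders, and the actual HTTP send is left to any OpenAI-compatible client.

```python
# Region-specific OpenAI-compatible base URLs from the entry above.
BASE_URLS = {
    "beijing": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "singapore": "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
}

def build_qwen_chat_request(region, api_key, user_text):
    # Assemble URL, headers, and JSON payload for /v1/chat/completions.
    url = f"{BASE_URLS[region]}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": "qwen-plus",  # illustrative model choice
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,  # set True for server-sent streaming chunks
    }
    return url, headers, payload

url, headers, payload = build_qwen_chat_request("singapore", "sk-demo", "Hello")
```

Because the interface is OpenAI-compatible, the same payload works unchanged against either regional base URL.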
AWS AI Services
Bedrock inference + the AWS AI suite
AWS AI Services span both generative and traditional AI: Amazon Bedrock (foundation model hosting and inference), Comprehend (NLP), Rekognition (Vision), Polly (Text-to-Speech), Transcribe (Speech-to-Text), Translate (MT), and Lex (Conversational AI). This file uses Amazon Polly’s HTTP endpoint (/v1/speech) as a concrete example; see links for other services.
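A sketch of the HTTP shape of Polly's SynthesizeSpeech call; note that a real request must carry AWS Signature Version 4 authentication (normally handled by an SDK such as boto3), which is omitted here, and the voice and output format are illustrative choices.

```python
def build_polly_request(region, text):
    # POST /v1/speech on the regional Polly endpoint.
    # SigV4 signing of this request is omitted in this sketch.
    url = f"https://polly.{region}.amazonaws.com/v1/speech"
    payload = {
        "Text": text,
        "OutputFormat": "mp3",   # also supports other formats, e.g. pcm
        "VoiceId": "Joanna",     # illustrative voice choice
    }
    return url, payload

url, payload = build_polly_request("us-east-1", "Hello from Polly")
```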
Azure AI
OpenAI‑compatible /v1 wired into Azure’s enterprise stack
Azure AI is Microsoft’s integrated AI platform on Azure for building AI applications and AI agents, spanning Azure OpenAI Service, Azure AI Services (Vision/Speech/Language/Translator/Content Safety), Azure AI Search, Azure Machine Learning, and Azure AI Studio. This solutions page offers an overview and entry points. Azure AI Inference provides OpenAI‑compatible /v1 endpoints for quick integration and migration.
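One common request shape is the deployment-scoped Azure OpenAI endpoint, sketched below; the resource name, deployment name, and api-version value are all placeholders for your own Azure configuration.

```python
def build_azure_chat_request(resource, deployment, api_key, api_version, user_text):
    # Azure OpenAI routes by deployment name and requires an
    # api-version query parameter plus an api-key header.
    url = (f"https://{resource}.openai.azure.com/openai/deployments/"
           f"{deployment}/chat/completions?api-version={api_version}")
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    payload = {"messages": [{"role": "user", "content": user_text}]}
    return url, headers, payload

url, headers, payload = build_azure_chat_request(
    "my-resource", "my-deployment", "demo-key", "2024-06-01", "Hi")
```

Unlike the public OpenAI endpoint, the model is selected by the deployment in the URL path rather than a "model" field in the body.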
Claude API — Anthropic Developer Platform
Messages API + prompt caching, tool use done right
Anthropic's Claude API centers on the Messages API, supporting text generation, image understanding, tool calls (functions/skills), computer use, system prompts, streaming outputs, token counting, and prompt caching. Suitable for assistants, RAG, automation workflows, and enterprise integrations.
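The Messages API request shape can be sketched as follows; Anthropic uses an x-api-key header plus a required anthropic-version header, max_tokens is mandatory, and the model name shown is one illustrative choice.

```python
def build_claude_request(api_key, user_text):
    url = "https://api.anthropic.com/v1/messages"
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",  # required version header
        "content-type": "application/json",
    }
    payload = {
        "model": "claude-3-5-sonnet-20241022",  # illustrative model
        "max_tokens": 1024,  # required field on the Messages API
        "messages": [{"role": "user", "content": user_text}],
    }
    return url, headers, payload

url, headers, payload = build_claude_request("demo-key", "Hello, Claude")
```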
DeepSeek API
High-performance, low-cost, and ecosystem-compatible reasoning LLM API
DeepSeek offers reasoning-enhanced LLM APIs compatible with OpenAI and Anthropic formats, supporting chat completions, tool calls, Chain-of-Thought output, and Beta features (prefix completion, FIM).
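Because DeepSeek mirrors the OpenAI request format, switching an existing client is mostly a base-URL change; the sketch below assumes the documented deepseek-chat and deepseek-reasoner model names, with the key as a placeholder.

```python
def build_deepseek_request(api_key, user_text, reasoning=False):
    # OpenAI-format chat completions against DeepSeek's base URL.
    url = "https://api.deepseek.com/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        # deepseek-reasoner returns its Chain-of-Thought alongside
        # the final answer; deepseek-chat is the standard model.
        "model": "deepseek-reasoner" if reasoning else "deepseek-chat",
        "messages": [{"role": "user", "content": user_text}],
    }
    return url, headers, payload

url, headers, payload = build_deepseek_request(
    "demo-key", "Why is the sky blue?", reasoning=True)
```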
Google AI for Developers — Gemini API
Multimodal done right: structured outputs + function calling
The Gemini API by Google AI is a multimodal generative AI service that supports understanding and generation across text, images, video, audio, and PDF. It provides structured outputs, function calling, context caching, batch processing, embeddings, and token counting, suitable for chat agents, content generation, RAG, agent tool use, and large-scale pipelines.
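A sketch of the REST generateContent request shape: the model name sits in the URL path and the body nests text under contents and parts; the model shown is an illustrative choice and the key is a placeholder.

```python
def build_gemini_request(api_key, model, user_text):
    # generateContent addresses the model in the path; auth can be
    # supplied via the x-goog-api-key header.
    url = (f"https://generativelanguage.googleapis.com/v1beta/"
           f"models/{model}:generateContent")
    headers = {"x-goog-api-key": api_key, "Content-Type": "application/json"}
    payload = {"contents": [{"parts": [{"text": user_text}]}]}
    return url, headers, payload

url, headers, payload = build_gemini_request("demo-key", "gemini-1.5-flash", "Hi")
```

The contents/parts nesting is what carries multimodality: additional parts can hold inline image, audio, or PDF data alongside text.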
Hugging Face
Open‑source ecosystem with unified inference and OpenAI routing
Hugging Face is a leading open-source and open-science AI platform. Its core includes the Hugging Face Hub for models/datasets/Spaces, inference and hosting (Inference Providers and Inference Endpoints), and a rich ecosystem of open-source libraries (Transformers, Diffusers, Datasets, Tokenizers, Accelerate, PEFT, TRL, Safetensors, Transformers.js, smolagents, Text Generation Inference, etc.). The platform supports text, image, audio, video and 3D modalities with unified access via Python, JavaScript and REST/OpenAI-compatible endpoints.
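As a sketch of the unified REST access, the request below assumes the Inference Providers router's OpenAI-compatible chat endpoint, keyed by a Hub model id; the router URL, model id, and token are assumptions made to illustrate the shape.

```python
def build_hf_chat_request(hf_token, model_id, user_text):
    # Assumed Inference Providers router endpoint (OpenAI-compatible);
    # the "model" field carries a Hugging Face Hub model id.
    url = "https://router.huggingface.co/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {hf_token}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model_id,
        "messages": [{"role": "user", "content": user_text}],
    }
    return url, headers, payload

url, headers, payload = build_hf_chat_request(
    "hf_demo", "meta-llama/Llama-3.1-8B-Instruct", "Hello")
```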
Kimi API
Long context with JSON/Partial Modes — OpenAI‑compatible
Kimi by Moonshot AI provides OpenAI-compatible APIs/SDKs for chat completions, tool calls, JSON Mode, Partial Mode, long context, and vision (via kimi-latest). The primary endpoint is POST /v1/chat/completions with streaming and context caching support.
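JSON Mode on the OpenAI-compatible endpoint can be sketched as below: setting response_format to json_object constrains the output to valid JSON; the base URL and kimi-latest model follow the entry above, while the key is a placeholder.

```python
def build_kimi_request(api_key, user_text, json_mode=False):
    url = "https://api.moonshot.cn/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": "kimi-latest",
        "messages": [{"role": "user", "content": user_text}],
    }
    if json_mode:
        # JSON Mode: the model is constrained to emit a JSON object.
        payload["response_format"] = {"type": "json_object"}
    return url, headers, payload

url, headers, payload = build_kimi_request(
    "demo-key", "Reply in JSON", json_mode=True)
```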
OpenAI API
Unified Responses + Realtime — flagship developer API
OpenAI offers developer REST, streaming, and realtime APIs including the unified Responses API, legacy Chat Completions, Embeddings, Image generation, Audio (TTS/transcription), and Assistants with Threads/Runs. Base URL is https://api.openai.com/v1 and authentication uses Bearer tokens. Usage and rate limits vary by account tier and model; see official rate limit docs and your account console.
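A sketch of the unified Responses API request shape: it accepts a flat "input" string for the simple case instead of a messages array; the model name is an illustrative choice and the key is a placeholder.

```python
def build_responses_request(api_key, user_text):
    # POST /v1/responses with Bearer-token auth, per the entry above.
    url = "https://api.openai.com/v1/responses"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"model": "gpt-4o-mini", "input": user_text}
    return url, headers, payload

url, headers, payload = build_responses_request("sk-demo", "Hello")
```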
Replicate HTTP API
One unified predictions API — sync, async, and SSE
Replicate provides a unified HTTP API to run community and official AI models (text, image, audio, video, etc.). The core resource is a "prediction". Developers create runs via POST /v1/predictions or model/deployment-specific endpoints, with support for sync (Prefer: wait) and async modes, SSE streaming, and webhooks. Authentication uses Bearer tokens and the base is https://api.replicate.com/v1. Rate limits are ~600 requests/min for create prediction and ~3000 requests/min for other endpoints. Pricing is per model (time-based or token-based).
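Creating a prediction synchronously can be sketched as follows: the Prefer: wait header asks the API to hold the connection until the run finishes; the version id and input schema below are placeholders, since each model defines its own.

```python
def build_prediction_request(api_token, version_id, model_input, sync=True):
    # POST /v1/predictions with Bearer-token auth, per the entry above.
    url = "https://api.replicate.com/v1/predictions"
    headers = {
        "Authorization": f"Bearer {api_token}",
        "Content-Type": "application/json",
    }
    if sync:
        # Sync mode: block until the prediction completes.
        headers["Prefer"] = "wait"
    payload = {"version": version_id, "input": model_input}
    return url, headers, payload

url, headers, payload = build_prediction_request(
    "r8_demo", "model-version-id", {"prompt": "a photo of a cat"})
```

Omitting the Prefer header yields async behavior: the create call returns immediately and the result is fetched by polling or delivered via webhook.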
Production-grade AI API providers across LLMs, chat/messages, reasoning, embeddings, image/audio/video and model hosting. Includes OpenAI, Anthropic Claude, Google Gemini, Azure AI, AWS AI Services, Alibaba Model Studio (Bailian/Qwen), DeepSeek, Kimi (Moonshot AI), Hugging Face, and Replicate, with many OpenAI-compatible endpoints.