Alibaba Cloud Model Studio (Bailian) API - OpenAI Compatible

Alibaba Cloud

Qwen via OpenAI‑compatible /v1

Alibaba Cloud Model Studio (Bailian) exposes Qwen-family models via OpenAI-compatible interfaces (/v1). Core endpoints include Chat Completions (/v1/chat/completions) and Embeddings (/v1/embeddings), supporting streaming, tool calls, system prompts, and multi-message context. Configure the region-specific base URL: Beijing (https://dashscope.aliyuncs.com/compatible-mode/v1) or Singapore (https://dashscope-intl.aliyuncs.com/compatible-mode/v1).
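As a minimal sketch of the OpenAI-compatible surface, the helper below assembles a /chat/completions request against the Beijing base URL from above; the model name `qwen-plus` is illustrative — check the Model Studio console for the models available to your account.

```python
import json

# Beijing base URL from the section above; swap in the Singapore URL
# (dashscope-intl) for the international endpoint.
BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"

def build_chat_request(api_key: str, model: str, messages: list, stream: bool = False):
    """Assemble URL, headers, and JSON body for an OpenAI-compatible chat call."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # Model Studio API key
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages, "stream": stream})
    return url, headers, body

url, headers, body = build_chat_request(
    "sk-...",      # your API key
    "qwen-plus",   # illustrative model name
    [{"role": "user", "content": "Hello"}],
)
```

Sending the request with any HTTP client and reading `choices[0].message.content` then follows the standard OpenAI response shape.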

AWS AI Services

Amazon Web Services

Bedrock inference + the AWS AI suite

AWS AI Services span both generative and traditional AI: Amazon Bedrock (foundation model hosting and inference), Comprehend (NLP), Rekognition (Vision), Polly (Text-to-Speech), Transcribe (Speech-to-Text), Translate (MT), and Lex (Conversational AI). This file uses Amazon Polly’s HTTP endpoint (/v1/speech) as a concrete example; see links for other services.
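As a sketch of the /v1/speech request mentioned above, the body fields below (Text, VoiceId, OutputFormat) follow Polly's SynthesizeSpeech operation; a real call is a SigV4-signed POST, typically made through an AWS SDK such as boto3, and the signing is omitted here.

```python
import json

def polly_speech_body(text: str, voice_id: str = "Joanna", output_format: str = "mp3") -> str:
    """Build the JSON body for POST /v1/speech (SynthesizeSpeech)."""
    return json.dumps({
        "Text": text,
        "VoiceId": voice_id,            # e.g. "Joanna"; see Polly's voice list
        "OutputFormat": output_format,  # "mp3", "ogg_vorbis", or "pcm"
    })

body = polly_speech_body("Hello from Polly")
```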

Azure AI

Microsoft

OpenAI‑compatible /v1 wired into Azure’s enterprise stack

Azure AI is Microsoft’s integrated AI platform on Azure for building AI applications and AI agents, spanning Azure OpenAI Service, Azure AI Services (Vision/Speech/Language/Translator/Content Safety), Azure AI Search, Azure Machine Learning, and Azure AI Studio. This solutions page offers an overview and entry points. Azure AI Inference provides OpenAI‑compatible /v1 endpoints for quick integration and migration.
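A minimal sketch of addressing an Azure OpenAI deployment: the resource, deployment, and api-version values below are placeholders — Azure authenticates with an `api-key` header (or Entra ID tokens) and versions the REST API via an `api-version` query parameter.

```python
import json

def azure_chat_request(resource: str, deployment: str, api_key: str,
                       messages: list, api_version: str = "2024-06-01"):
    """Assemble an Azure OpenAI chat/completions request (placeholder names)."""
    url = (f"https://{resource}.openai.azure.com/openai/deployments/"
           f"{deployment}/chat/completions?api-version={api_version}")
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    body = json.dumps({"messages": messages})
    return url, headers, body

url, headers, body = azure_chat_request(
    "my-resource", "my-deployment", "AZURE_KEY",
    [{"role": "user", "content": "Hello"}],
)
```

Note that the model is selected by the deployment name in the URL rather than a `model` field in the body, which is the main difference from the plain OpenAI /v1 shape.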

Claude API — Anthropic Developer Platform

Anthropic PBC

Messages API + prompt caching, tool use done right

Anthropic's Claude API centers on the Messages API, supporting text generation, image understanding, tool use (function calling), computer use, system prompts, streaming output, token counting, and prompt caching. Suitable for assistants, RAG, automation workflows, and enterprise integrations.
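A sketch of a Messages API call, assuming the documented POST /v1/messages shape with `x-api-key` and `anthropic-version` headers; the model name in the example call is illustrative.

```python
import json

API_URL = "https://api.anthropic.com/v1/messages"

def build_messages_request(api_key: str, model: str, user_text: str,
                           system=None, max_tokens: int = 1024):
    """Assemble headers and JSON body for the Messages API."""
    headers = {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",  # documented version header
        "content-type": "application/json",
    }
    payload = {
        "model": model,
        "max_tokens": max_tokens,  # required by the Messages API
        "messages": [{"role": "user", "content": user_text}],
    }
    if system:
        payload["system"] = system  # system prompt is a top-level field
    return headers, json.dumps(payload)

headers, body = build_messages_request(
    "sk-ant-...",        # your Anthropic API key
    "claude-sonnet-...", # illustrative model name; see the model list
    "Hello",
    system="Answer concisely.",
)
```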

DeepSeek API

DeepSeek AI

High-performance, low-cost, and ecosystem-compatible reasoning LLM API

DeepSeek offers reasoning-enhanced LLM APIs compatible with OpenAI and Anthropic formats, supporting chat completions, tool calls, Chain-of-Thought output, and Beta features (prefix completion, FIM).
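The reasoner model returns its chain of thought in a separate `reasoning_content` field next to the final `content`; the parsing below assumes that documented response shape and uses a hand-written sample rather than a live call.

```python
import json

# Sample response in the OpenAI-compatible shape; deepseek-reasoner adds a
# reasoning_content field alongside the final answer.
sample = json.loads("""
{
  "choices": [{
    "message": {
      "role": "assistant",
      "reasoning_content": "First compare the integer parts, then the decimals...",
      "content": "9.8 is larger than 9.11."
    }
  }]
}
""")

def split_reasoning(response: dict):
    """Return (chain_of_thought, final_answer) from a reasoner-style response."""
    msg = response["choices"][0]["message"]
    return msg.get("reasoning_content"), msg["content"]

cot, answer = split_reasoning(sample)
```

Using `.get()` for `reasoning_content` keeps the same parser working for non-reasoner models, which simply omit the field.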

Google AI for Developers — Gemini API

Google LLC

Multimodal done right: structured outputs + function calling

The Gemini API by Google AI is a multimodal generative AI service that supports understanding and generation across text, images, video, audio, and PDF. It provides structured outputs, function calling, context caching, batch processing, embeddings, and token counting, suitable for chat agents, content generation, RAG, agent tool use, and large-scale pipelines.
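The structured-output feature can be sketched by building a generateContent body with a response schema; the field names below (`generationConfig.responseMimeType` / `responseSchema`) follow the Gemini REST API, and the schema shown is a small assumed example.

```python
import json

def gemini_structured_body(prompt: str, schema: dict) -> str:
    """Build a generateContent body that requests JSON matching a schema."""
    return json.dumps({
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseMimeType": "application/json",  # force JSON output
            "responseSchema": schema,                # OpenAPI-style schema
        },
    })

body = gemini_structured_body(
    "List two primary colors.",
    {"type": "ARRAY", "items": {"type": "STRING"}},
)
```

The same body shape is POSTed to a model-specific :generateContent URL; function calling uses a parallel `tools` field with function declarations instead of a response schema.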

Hugging Face

Hugging Face Inc.

Open‑source ecosystem with unified inference and OpenAI routing

Hugging Face is a leading open-source and open-science AI platform. Its core includes the Hugging Face Hub for models/datasets/Spaces, inference and hosting (Inference Providers and Inference Endpoints), and a rich ecosystem of open-source libraries (Transformers, Diffusers, Datasets, Tokenizers, Accelerate, PEFT, TRL, Safetensors, Transformers.js, smolagents, Text Generation Inference, etc.). The platform supports text, image, audio, video and 3D modalities with unified access via Python, JavaScript and REST/OpenAI-compatible endpoints.
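The OpenAI-compatible routing can be sketched as below; the router base URL is an assumption here (check the Inference Providers docs for the current endpoint), and the Hub-style `org/model` id is illustrative.

```python
import json

# Assumed OpenAI-compatible router for Inference Providers; verify the
# current URL in the Hugging Face documentation.
BASE_URL = "https://router.huggingface.co/v1"

def hf_chat_request(token: str, model_id: str, messages: list):
    """Assemble a chat call routed by Hub-style model id (e.g. "org/model")."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {token}",  # Hugging Face access token
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model_id, "messages": messages})
    return url, headers, body

url, headers, body = hf_chat_request(
    "hf_...",                    # access token
    "meta-llama/Llama-3.1-8B-Instruct",  # illustrative Hub model id
    [{"role": "user", "content": "Hello"}],
)
```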

Kimi API

Moonshot AI

Long context with JSON/Partial Modes — OpenAI‑compatible

Kimi by Moonshot AI provides OpenAI-compatible APIs/SDKs for chat completions, tool calls, JSON Mode, Partial Mode, long context, and vision (via kimi-latest). The primary endpoint is POST /v1/chat/completions with streaming and context caching support.
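JSON Mode can be sketched by setting `response_format` on the chat body, following the OpenAI-compatible convention; the base URL below is the commonly documented one but should be verified for your region.

```python
import json

# Assumed base URL for Moonshot's OpenAI-compatible API; verify in the docs.
BASE_URL = "https://api.moonshot.cn/v1"

def kimi_json_mode_body(model: str, messages: list) -> str:
    """Chat body with JSON Mode enabled via response_format."""
    return json.dumps({
        "model": model,
        "messages": messages,
        "response_format": {"type": "json_object"},  # JSON Mode
    })

body = kimi_json_mode_body(
    "kimi-latest",  # vision-capable model named in the section above
    [{"role": "user", "content": "Return {\"ok\": true} as JSON."}],
)
```

With JSON Mode enabled, the prompt itself should still ask for JSON explicitly, which is why the example message does so.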

OpenAI API

OpenAI

Unified Responses + Realtime — flagship developer API

OpenAI offers developer REST, streaming, and realtime APIs including the unified Responses API, the established Chat Completions API, Embeddings, Image generation, Audio (TTS/transcription), and Assistants with Threads/Runs. Base URL is https://api.openai.com/v1 and authentication uses Bearer tokens. Usage and rate limits vary by account tier and model; see the official rate limit docs and your account console.
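A sketch of the Responses API shape: unlike Chat Completions, POST /v1/responses takes an `input` field (a string or a structured item list) rather than `messages`; the model name in the example call is illustrative.

```python
import json

BASE_URL = "https://api.openai.com/v1"  # base URL from the section above

def responses_request(api_key: str, model: str, text: str):
    """Assemble a minimal POST /v1/responses call (Responses API)."""
    url = f"{BASE_URL}/responses"
    headers = {
        "Authorization": f"Bearer {api_key}",  # Bearer-token auth
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "input": text})  # "input", not "messages"
    return url, headers, body

url, headers, body = responses_request(
    "sk-...",       # your API key
    "gpt-4o-mini",  # illustrative model name
    "Say hello.",
)
```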

Replicate HTTP API

Replicate

One unified predictions API — sync, async, and SSE

Replicate provides a unified HTTP API to run community and official AI models (text, image, audio, video, etc.). The core resource is a "prediction". Developers create runs via POST /v1/predictions or model/deployment-specific endpoints, with support for sync (Prefer: wait) and async modes, SSE streaming, and webhooks. Authentication uses Bearer tokens and the base is https://api.replicate.com/v1. Rate limits are ~600 requests/min for create prediction and ~3000 requests/min for other endpoints. Pricing is per model (time-based or token-based).
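The sync mode described above can be sketched as follows: a create-prediction request with a `Prefer: wait` header holds the connection open until the run finishes (or the timeout elapses); the version hash and input fields in any real call are model-specific.

```python
import json

BASE_URL = "https://api.replicate.com/v1"  # base URL from the section above

def replicate_sync_prediction(token: str, version: str, inputs: dict, wait_s: int = 60):
    """Assemble a synchronous POST /v1/predictions call using Prefer: wait."""
    url = f"{BASE_URL}/predictions"
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Prefer": f"wait={wait_s}",  # hold the connection up to wait_s seconds
    }
    body = json.dumps({"version": version, "input": inputs})
    return url, headers, body

url, headers, body = replicate_sync_prediction(
    "r8_...",           # API token
    "model-version-id", # placeholder for a model version hash
    {"prompt": "a watercolor fox"},
)
```

Dropping the `Prefer` header gives the async mode: the response then returns a prediction id to poll, or results can be pushed via the webhook support mentioned above.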

Production-grade AI API providers across LLMs, chat/messages, reasoning, embeddings, image/audio/video and model hosting. Includes OpenAI, Anthropic Claude, Google Gemini, Azure AI, AWS AI Services, Alibaba Model Studio (Bailian/Qwen), DeepSeek, Kimi (Moonshot AI), Hugging Face, and Replicate, with many OpenAI-compatible endpoints.