[FEEDBACK] Inference Providers

#49

by julien-c - opened Jan 17, 2025

julien-c

Hugging Face org Jan 17, 2025

Any inference provider you love, and that you'd like to be able to access directly from the Hub?

reach-vb

Jan 28, 2025 (edited)

Love that I can call DeepSeek R1 directly from the Hub 🔥

```python
from huggingface_hub import InferenceClient

# Route the request through the "together" provider; the key below is a placeholder.
client = InferenceClient(
    provider="together",
    api_key="xxxxxxxxxxxxxxxxxxxxxxxx",
)

messages = [
    {
        "role": "user",
        "content": "What is the capital of France?",
    }
]

# OpenAI-style chat completion call exposed by InferenceClient
completion = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",
    messages=messages,
    max_tokens=500,
)

print(completion.choices[0].message)
```

benhaotang

Jan 28, 2025 (edited)

Is it possible to set a monthly spending budget or rate limits for all the external providers? I don't see such options in the Billing tab. If a key or session token is stolen, it could be quite dangerous for my thin wallet :(

julien-c

Hugging Face org Jan 28, 2025

@benhaotang you already get spending notifications when crossing important thresholds ($10, $100, $1,000) but we'll add spending limits in the future
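For readers wondering how threshold notifications like these behave, here is a hypothetical sketch (not Hugging Face's actual billing code): given the previous and current cumulative spend, report every threshold from the reply above that was crossed in between. The function and constant names are illustrative.

```python
from decimal import Decimal

# Notification thresholds mentioned in the reply above, in USD.
THRESHOLDS_USD = (Decimal("10"), Decimal("100"), Decimal("1000"))


def crossed_thresholds(previous_spend: Decimal, current_spend: Decimal) -> list[Decimal]:
    """Return the thresholds crossed when spend moves from previous_spend to current_spend.

    A threshold counts as crossed if previous_spend < threshold <= current_spend,
    so each threshold is reported at most once as spend grows.
    """
    return [t for t in THRESHOLDS_USD if previous_spend < t <= current_spend]


if __name__ == "__main__":
    # Jumping from $8.50 to $120 crosses both the $10 and $100 thresholds.
    print(crossed_thresholds(Decimal("8.50"), Decimal("120")))
```

A hard spending limit, as promised for the future, would use the same comparison but reject the request instead of sending a notification.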

benhaotang

Jan 28, 2025 (edited)

> @benhaotang you already get spending notifications when crossing important thresholds ($10, $100, $1,000) but we'll add spending limits in the future

Thanks for your quick reply, good to know!

sylanaustin

Jan 28, 2025

Would be great if you could add Nebius AI Studio to the list :) New inference provider on the market, with the absolute cheapest prices and the highest rate limits...

Hazzzardous

Jan 28, 2025

Could be good to add featherless.ai

teentitan

Jan 28, 2025

TitanML !!

160 hidden messages

uomi-admin

29 days ago

Hi @Wauplin @SBrandeis @julien-c @hanouticelina. We'd like to register UomiRouter as a new inference provider.

UomiRouter is an OpenAI-compatible inference network. Traffic is served by accredited operator nodes that are part of the UOMI network: each operator runs the engine on their own GPU hardware (datacenter or homelab) after passing hardware and reliability vetting. Throughput and quality SLAs are guaranteed across the listed catalog. Operators commit to a strict privacy policy (no prompt logging, no training-data collection), payload obfuscation in transit and at rest, and OPoC (Off-chain Proof of Computation): every response is signed by the operator's wallet key and carries a SHA-256 hash of the output (returned as x-wallet-signature / x-wallet-pubkey headers), and a sampled fraction of requests is cross-dispatched to an independent operator for re-verification. The on-chain anchoring layer on UOMI L1 is the next milestone and is not live yet.

The differentiator vs centralized APIs (closed box) and naive decentralized GPU markets (no proof of computation at all) is verifiability today: clients can check off-chain that the operator they were billed for actually produced the tokens they got.
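The hash half of the OPoC check described above can be sketched with the standard library alone. This assumes the digest is a hex-encoded SHA-256 of the UTF-8 output text and that the client has already extracted it from the response (the post does not say which field or header carries it). Verifying the wallet signature itself (x-wallet-signature / x-wallet-pubkey) depends on a key scheme the post does not specify, so it is left out; `output_hash_matches` is a hypothetical helper name.

```python
import hashlib


def output_hash_matches(output_text: str, claimed_sha256_hex: str) -> bool:
    """Recompute the SHA-256 of the returned output and compare it to the claimed digest.

    Assumption: the operator hashes the UTF-8 bytes of the exact output text
    and reports the digest as hex (either case).
    """
    actual = hashlib.sha256(output_text.encode("utf-8")).hexdigest()  # lowercase hex
    return actual == claimed_sha256_hex.strip().lower()


if __name__ == "__main__":
    text = "Paris is the capital of France."
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    print(output_hash_matches(text, digest))        # True
    print(output_hash_matches(text + " ", digest))  # False: any change to the output breaks the match
```

The hash only proves the tokens were not altered after hashing; binding them to a specific operator still requires checking the signature against the operator's public key.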

Three PRs are already open, per the new-provider checklist:

huggingface.js: https://github.com/huggingface/huggingface.js/pull/2193
huggingface_hub: https://github.com/huggingface/huggingface_hub/pull/4256
hub-docs (page + sidebar + table + logos): https://github.com/huggingface/hub-docs/pull/2499

Integration details

Endpoint: https://gateway.uomi.ai (OpenAI Chat Completions spec; streaming, tool calling, structured output, vision via Qwen3.6-VL all supported)
Billing endpoint live: POST /partner/hf/billing returns {requests:[{requestId, costNanoUsd}]} per spec, batched up to 10k. Auth token ready to share via DM.
Per-request Inference-Id header: UUID4 emitted on every response.
Org: uomi-network — Team plan active, ready for the server-side partner-flag flip so we can call /api/partners/uomirouter/models for the staging mappings.
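Reconciling costs against a billing endpoint like the one above comes down to splitting request IDs into batches of at most 10,000 and converting the returned `costNanoUsd` values. A minimal sketch that only builds request bodies and parses responses, with no HTTP call; the request body shape (`{"requestIds": [...]}`) is an assumption, since the post documents only the response, and the helper names are hypothetical.

```python
import json
from decimal import Decimal
from typing import Iterator

MAX_BATCH = 10_000          # batch limit stated in the post
NANO_USD_PER_USD = 10**9    # 1 USD = 1,000,000,000 nano-USD


def batch_payloads(request_ids: list[str], max_batch: int = MAX_BATCH) -> Iterator[str]:
    """Yield JSON request bodies, each holding at most max_batch request IDs.

    Assumption: the endpoint accepts {"requestIds": [...]}.
    """
    for start in range(0, len(request_ids), max_batch):
        yield json.dumps({"requestIds": request_ids[start:start + max_batch]})


def costs_in_usd(response_body: str) -> dict[str, Decimal]:
    """Parse {"requests": [{"requestId", "costNanoUsd"}]} into request ID -> cost in USD."""
    data = json.loads(response_body)
    return {
        # str() first so that int or float JSON values both convert to Decimal exactly as written
        item["requestId"]: Decimal(str(item["costNanoUsd"])) / NANO_USD_PER_USD
        for item in data["requests"]
    }


if __name__ == "__main__":
    ids = [f"req-{i}" for i in range(25_000)]
    print([len(json.loads(p)["requestIds"]) for p in batch_payloads(ids)])  # [10000, 10000, 5000]

    sample = '{"requests": [{"requestId": "a", "costNanoUsd": 1500000}]}'
    print(costs_in_usd(sample))
```

Using `Decimal` rather than `float` keeps sub-cent amounts exact when thousands of per-request costs are summed.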

Initial catalog (3 conversational models)

The two Qwen models are served internally in FP8, and Gemma in FP8-Dynamic:

| HF model ID | Type |
|---|---|
| `Qwen/Qwen3.6-27B` | dense |
| `Qwen/Qwen3.6-35B-A3B` | MoE |
| `google/gemma-4-31b-it` | VLM |

Contact

info@uomi.ai

Happy to jump on a call or set up a Slack channel for the integration review.

TonyWu888

27 days ago

This is a great idea for Wan model inference. By the way, if you ever need to share HTML previews of your AI video outputs, the HTML to URL Converter is super handy for quick sharing without any signup.

marvintong

26 days ago (edited)

Hi @julien-c @Wauplin @SBrandeis @hanouticelina,

We're the team behind Phala and we'd like to formally express our interest in joining the Hugging Face Inference Providers program.

About Phala

Phala (phala.com) is a confidential AI cloud that delivers private LLM inference on hardware-protected GPU infrastructure. Phala runs LLMs on dedicated GPU clusters with hardware-level isolation, providing runtime attestation so users can cryptographically verify that their prompts and outputs were never exposed. Our OpenAI-compatible API gateway at https://api.redpill.ai/v1 gives developers a drop-in replacement for the OpenAI SDK with verifiable privacy guarantees.

We are already a verified inference provider on OpenRouter (openrouter.ai/provider/phala), where we have processed over 2.9 billion tokens across 14 models as of April 2026 — demonstrating production-grade reliability and scale.

Why We're a Strong Fit

Unique Differentiation — Confidential Inference: Phala is the only inference provider in the current HF lineup that offers hardware-attested, confidential GPU inference. For users handling sensitive data, regulated workloads, or privacy-critical applications, this is a meaningful capability gap we can fill.

OpenAI-Compatible API: Our API (/v1/chat/completions, /v1/embeddings) is a full drop-in replacement for the OpenAI SDK — making JS and Python client integration straightforward.

Proven Scale on OpenRouter: Our top models by token volume on OpenRouter:

| Model | Tokens processed (Apr 2026) |
|---|---|
| Qwen2.5 7B Instruct | 1.03B |
| gpt-oss-120b | 653M |
| Kimi K2.6 | 246M |
| Qwen3.5-27B | 196M |
| GLM 5.1 | 171M |
| Qwen3 VL 30B A3B Instruct | 154M |
| Kimi K2.5 | 114M |
| GLM 4.7 Flash | 113M |
| Gemma 3 27B | 70.1M |
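A quick tally of the nine rows above shows how much of the "over 2.9 billion tokens across 14 models" figure they cover. The counts are copied from the table; the post does not break down the other five models.

```python
# Token counts from the table above, in units of 0.1 million tokens,
# so that 70.1M is represented exactly as an integer.
TOKENS_TENTH_M = {
    "Qwen2.5 7B Instruct": 10_300,
    "gpt-oss-120b": 6_530,
    "Kimi K2.6": 2_460,
    "Qwen3.5-27B": 1_960,
    "GLM 5.1": 1_710,
    "Qwen3 VL 30B A3B Instruct": 1_540,
    "Kimi K2.5": 1_140,
    "GLM 4.7 Flash": 1_130,
    "Gemma 3 27B": 701,
}

listed_total_m = sum(TOKENS_TENTH_M.values()) / 10  # millions of tokens
print(f"Nine listed models: {listed_total_m:.1f}M tokens")              # 2747.1M
print(f"Other five models (if total > 2900M): > {2900 - listed_total_m:.1f}M")  # 152.9M
```

So the listed models account for about 2.75B tokens, leaving more than roughly 0.15B for the remaining five if the 2.9B total holds.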

Supported Models & Tasks

We currently serve the following open-weight models on Phala's confidential GPU infrastructure:

| HF Model | Task | Context window (tokens) |
|---|---|---|
| Qwen/Qwen3.5-27B | Chat Completion (LLM) | 256K |
| Qwen/Qwen3.5-397B-A17B | Chat Completion (LLM) | 256K |
| Qwen/Qwen3-VL-30B-A3B-Instruct | Chat Completion (VLM) | 262K |
| Qwen/Qwen2.5-7B-Instruct | Chat Completion (LLM) | 131K |
| google/gemma-3-27b-it | Chat Completion (VLM) | 131K |
| google/gemma-4-31b-it | Chat Completion (VLM) | 262K |
| openai/gpt-oss-120b | Chat Completion (LLM) | 131K |
| openai/gpt-oss-20b | Chat Completion (LLM) | 131K |
| moonshotai/Kimi-K2.6 | Chat Completion (VLM) | 262K |
| moonshotai/Kimi-K2.5 | Chat Completion (VLM) | 262K |
| THUDM/GLM-5.1 | Chat Completion (LLM) | 203K |
| THUDM/GLM-5 | Chat Completion (LLM) | 203K |
| THUDM/GLM-4.7-Flash | Chat Completion (LLM) | 200K |
| THUDM/GLM-4.7 | Chat Completion (LLM) | 200K |
| minimax/MiniMax-M2.5 | Chat Completion (LLM) | 205K |
| Qwen/Qwen3-Embedding-8B | Feature Extraction | 32K |
| sentence-transformers/all-MiniLM-L6-v2 | Feature Extraction | 512 |

Supported Tasks: Chat Completion (LLM), Chat Completion (VLM), Feature Extraction (Embeddings)

Integration Readiness

Our API is fully OpenAI-compatible, so the JS and Python client integration should be straightforward.
We have an existing organization on the Hub (huggingface.co/phalanetwork) and are ready to upgrade to a Team/Enterprise plan as required.
We're prepared to submit the JS client PR, register model mappings via the Model Mapping API, implement the billing endpoint, and follow the full onboarding checklist.
We can have a working integration ready within 1–2 weeks of receiving guidance.
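Because the API follows the OpenAI Chat Completions format, a request can be built with nothing but the standard library. A minimal sketch, assuming the usual `POST {base_url}/chat/completions` path with Bearer auth, the base URL from the Contact section, and that the gateway accepts the Hub model ID as-is; the API key is a placeholder and `build_chat_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.redpill.ai/v1"  # API base URL from the Contact section
API_KEY = "sk-xxxxxxxx"                 # placeholder, not a real key


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request("openai/gpt-oss-20b", "What is the capital of France?")
    print(req.get_method(), req.full_url)
    # Sending it would be urllib.request.urlopen(req), which returns the JSON completion.
```

This is the same request shape the OpenAI SDK produces when its `base_url` is pointed at the gateway, which is what makes the client integration "drop-in".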

Contact

Website: https://phala.com
API Base URL: https://api.redpill.ai/v1
OpenRouter Profile: https://openrouter.ai/provider/phala
Hub Org: https://huggingface.co/phalanetwork

We're happy to provide API access for testing, jump on a call, or align our implementation to any specific requirements. Looking forward to hearing from you!

Best,
The Phala Team

TonyWu888

17 days ago

This is a really cool move for UomiRouter! Network reliability is so important for inference. By the way, if you ever need to check your connection stability, you can test packet loss online at packetlosstest.dev. Super handy for ensuring smooth performance.

iastafev

13 days ago

New Inference Provider — GonkaBroker (gonkabroker)

Hi! We'd like to be listed as an Inference Provider. We route to the Gonka.ai decentralized inference network via an OpenAI-compatible API.

Org / provider slug: gonkabroker (Team plan active)
JS client PR: https://github.com/huggingface/huggingface.js/pull/2226 (conversational + text-generation)
Base URL: https://proxy.gonkabroker.com (/v1/chat/completions, /v1/completions, /v1/models)
/v1/models exposes per-model pricing (USD/M) + context length for the comparison table / :cheapest.
Billing API: POST https://app.gonkabroker.com/api/partners/huggingface/billing, Bearer auth, {"requestIds":[...]} → {"requests":[{"requestId","costNanoUsd"}]} (nano-USD).
Request ID header: Inference-Id on every response.
Logo (light + dark) ready to send.
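Per-model pricing and context length from `/v1/models` are the inputs a `:cheapest` choice would rank on. A minimal sketch of that selection, assuming a hypothetical simplified record per provider (`ProviderPrice` and its fields are illustrative, not the actual `/v1/models` schema) and ranking by estimated cost for a given input/output token mix, which is one possible policy rather than the Hub's documented one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderPrice:
    """Hypothetical pricing record for one provider serving one model (USD per 1M tokens)."""
    provider: str
    input_usd_per_m: float
    output_usd_per_m: float
    context_length: int


def estimated_cost_usd(p: ProviderPrice, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at the listed per-million-token prices."""
    return (input_tokens * p.input_usd_per_m + output_tokens * p.output_usd_per_m) / 1_000_000


def cheapest(prices: list[ProviderPrice], input_tokens: int, output_tokens: int) -> ProviderPrice:
    """Pick the lowest estimated cost among providers whose context window fits the request."""
    eligible = [p for p in prices if p.context_length >= input_tokens + output_tokens]
    if not eligible:
        raise ValueError("no provider can fit this request in its context window")
    return min(eligible, key=lambda p: estimated_cost_usd(p, input_tokens, output_tokens))


if __name__ == "__main__":
    offers = [
        ProviderPrice("provider-a", 0.20, 0.80, 131_072),
        ProviderPrice("provider-b", 0.30, 0.50, 32_768),
    ]
    # 2,000 input + 1,000 output tokens: provider-a costs $0.0012, provider-b $0.0011.
    print(cheapest(offers, 2_000, 1_000).provider)  # provider-b
```

Filtering on context length first matters: a provider that is cheaper per token is useless if the request does not fit its window.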

Could you enable the org server-side so we can register model mappings? Happy to share the master key for routed traffic over a private channel. Thanks!

iastafev

8 days ago

Following up on the GonkaBroker provider request above — the JS PR is open and ready for review: https://github.com/huggingface/huggingface.js/pull/2226 (CI awaiting maintainer approval to run). Paid Team plan active, all prerequisites done. Would love help getting it moving.

GamerHash

6 days ago (edited)

New Inference Provider integration — deAPI

Hi HF team 👋

We're deAPI (https://deapi.ai, HF org: https://huggingface.co/deAPI-ai) and we'd like to register as an official Inference Provider on the Hub.

deAPI provides an OpenAI-compatible inference API at:

https://oai.deapi.ai/v1

with Bearer token authentication. The endpoint works with the official OpenAI SDK by changing the base_url, and we also maintain a native Python SDK for deAPI workflows.

For the initial integration, we'd like to start with image, audio and embedding tasks:

text-to-image
- FLUX.2 Klein 4B BF16
- Z-Anime
- Z-Image-Turbo INT8
- FLUX.1 schnell
image-to-image
- FLUX.2 Klein 4B BF16
- Qwen Image Edit Plus
automatic-speech-recognition
- Whisper Large V3
text-to-speech
- Qwen3 TTS CustomVoice
- Qwen3 TTS VoiceDesign
- Qwen3 TTS VoiceClone
- Chatterbox
- Kokoro
feature-extraction / embeddings
- BGE M3

Exact model slugs, limits and defaults can be resolved through our Model Selection endpoint during the mapping setup.

LLM/chat models are also on our near-term roadmap, and we expect to add conversational / text-generation support shortly after the initial provider setup is live.

We've read the “register as a provider” guide and have already opened the following huggingface.js PR:

https://github.com/huggingface/huggingface.js/pull/2240

The PR starts with text-to-image on a single model in staging. We're also adding a per-response correlation header using Inference-Id with a UUID value, so it can support routed billing and request reconciliation.
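The `Inference-Id` correlation header described above can be sketched as follows: the provider mints a random UUID4 per response and records its cost, and the billing side returns rows only for well-formed, known IDs, in the `{requestId, costNanoUsd}` shape used elsewhere in this thread. The helper names and the in-memory store are hypothetical stand-ins for a real database.

```python
import uuid

# Hypothetical in-memory store: Inference-Id -> cost in nano-USD.
COSTS_NANO_USD: dict[str, int] = {}


def record_response(cost_nano_usd: int) -> dict[str, str]:
    """Mint a UUID4 Inference-Id for one response, remember its cost, and return the header."""
    inference_id = str(uuid.uuid4())
    COSTS_NANO_USD[inference_id] = cost_nano_usd
    return {"Inference-Id": inference_id}


def is_uuid4(value: str) -> bool:
    """True if value parses as a UUID whose version field is 4."""
    try:
        return uuid.UUID(value).version == 4
    except ValueError:
        return False


def reconcile(request_ids: list[str]) -> list[dict]:
    """Return billing rows for known, well-formed IDs; unknown or malformed IDs are skipped."""
    return [
        {"requestId": rid, "costNanoUsd": COSTS_NANO_USD[rid]}
        for rid in request_ids
        if is_uuid4(rid) and rid in COSTS_NANO_USD
    ]


if __name__ == "__main__":
    rid = record_response(cost_nano_usd=42_000)["Inference-Id"]
    print(reconcile([rid, "not-a-uuid"]))  # one row, for rid only
```

Random UUID4 values make IDs practically collision-free across nodes without any coordination, which is why they suit per-response correlation.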

One clarification on plan requirements before we finalize the setup. We understand a Team/Enterprise org plan is needed for the Model Mapping API (the routed path). Is the same plan required for a custom-key-only listing (clientSideRoutingOnly, like OpenAI), where users add their own deAPI key and we bill them directly? We're considering starting custom-key-only and adding routed billing (dedicated key + reconciliation) in a later phase, so we'd like to understand the plan prerequisites for each path.

Happy to continue here or by email if that works better. Thanks!

wassel-umans-ai

2 days ago (edited)

Request to become an Inference Provider: Umans AI (GLM 5.2 + Kimi K2.7-Code)

Hi HF team 👋

We'd like to list Umans AI as an Inference Provider. We serve open-weight models on our own infrastructure, starting with:

zai-org/GLM-5.2
moonshot/kimi-k2.7-code (Kimi K2.7-Code)

On our side:

HF org umans-ai is on the Team plan.
We expose an OpenAI-compatible API (/v1/chat/completions, and /v1/models already serving pricing + context_length), which is the surface HF integrates with, plus an Anthropic Messages-compatible API (/v1/messages).
We've implemented the billing callback (it maps each request id to its cost in nano-USD) and emit a unique Inference-Id response header, per the provider docs.
Ready to open the huggingface.js provider PR and register model mappings as soon as our org is enabled server-side.

Could you point us to the next step to enable the org for the Model Mapping API, and the right contact for the commercial/payout setup?

Public API base: https://api.code.umans.ai
Contact: contact@umans.ai

Thanks!
