Approaching.AI’s Pre-A Round Bets on China’s Token Production Gap

Approaching.AI raised a Pre-A round of hundreds of millions of yuan to expand its ATaaS platform, which handles nearly one trillion tokens daily.

Approaching.AI has closed a Pre-A round worth hundreds of millions of yuan, co-led by Xinglian Capital and Huakong Technology, to expand an inference platform that already handles nearly one trillion tokens of enterprise AI output every day. The company, whose legal name is Qujing Technology, built its platform around a concept its CEO calls TaaS, short for Token as a Service, positioning the business above the Model-as-a-Service tier where China’s cloud giants compete and focusing on the predictable, high-throughput delivery of each output token.

China’s National Data Administration confirmed in March that the country’s AI ecosystem processes 140 trillion tokens daily, up from 100 billion at the start of 2024, per China Europe International Business School’s analysis of China’s token economy. The round is priced on a bet about who can convert raw compute into the stable, measurable output stream that enterprise production depends on.

The Pre-A Round and What It’s Buying

Xinglian Capital, a VC fund focused on China’s large-model ecosystem, co-led the round with Huakong Technology. Follow-on commitments came from Honghui Capital, Tianhao Energy, Shangshi Capital, Tianjin Ren’ai Hongsheng, and Hangzhou Fucheng. GL Ventures, which backed the company at an earlier stage, increased its position.

Approaching.AI says the proceeds go toward two priorities. First, computing-power reserves: securing GPU capacity before competitors bid it up in a domestic chip environment expected to stay tight through the next two years. Second, the underlying inference system that converts hardware capacity into sustained token delivery. Enterprise customers benchmark AI infrastructure not on raw hardware capacity but on the output rate and latency that hardware delivers under production load.

The platform already serves enterprise clients including Zhipu, developer of the GLM model family and one of China’s five leading foundation-model labs, and Kimi, Moonshot AI’s consumer AI assistant. Those two names confirm the platform has cleared the gap from concept to production under demanding real-world workloads.

Honghui Capital, Tianhao Energy, and the other follow-on backers bring capital and strategic reach into sectors, including energy infrastructure and industrial supply chains, where enterprise AI adoption is accelerating and token-production reliability is a harder requirement than it is in consumer applications.

From Model Access to Token Production

Ai Zhiyuan, Approaching.AI’s founder and CEO and a computer science PhD from Tsinghua University, built the TaaS argument around a specific observation: the question enterprise engineers ask in production is not “which models are available?” but “does each call complete reliably, and at what speed?”

The MaaS (Model as a Service) model’s unit of commerce is the API call. That pricing structure leaves the engineering burden of reliability, throughput, and structured output stability with the buyer. An enterprise running a hundred parallel jobs on a MaaS platform deals with spikes in TTFT (time to first token, the delay between sending a request and receiving the first output token) under load, function-call failures during high-concurrency bursts, and garbled structured outputs when the model runs hot. Those failures show up as business-logic errors in the product.

ATaaS (AI Token as a Service, the full platform name) moves those problems into the infrastructure layer. The billing unit becomes the output token; the contract guarantee is TTFT stability, a TPS (tokens per second) floor, and reliable structured output across concurrent workloads. The enterprise pays per token of useful output delivered.
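As a rough illustration of the two metrics named above, the following Python sketch computes TTFT and TPS from the arrival timestamps of a streamed response. It is a minimal sketch, not Approaching.AI's measurement method: the `StreamTiming` type, the `meets_guarantee` helper, and the 0.5-second TTFT ceiling are hypothetical, and the 30 TPS floor is simply the lower end of the range the company reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamTiming:
    """Timestamps for one streamed response (hypothetical structure)."""
    request_sent: float          # seconds: when the request left the client
    token_arrivals: List[float]  # seconds: arrival time of each output token


def ttft(timing: StreamTiming) -> float:
    """Time to first token: delay between sending and the first output token."""
    return timing.token_arrivals[0] - timing.request_sent


def tps(timing: StreamTiming) -> float:
    """Output tokens per second, measured after the first token arrives."""
    n = len(timing.token_arrivals)
    if n < 2:
        return 0.0
    span = timing.token_arrivals[-1] - timing.token_arrivals[0]
    return (n - 1) / span if span > 0 else 0.0


def meets_guarantee(timing: StreamTiming, max_ttft_s: float, min_tps: float) -> bool:
    """Check one response against an assumed TTFT ceiling and TPS floor."""
    return ttft(timing) <= max_ttft_s and tps(timing) >= min_tps


# Synthetic stream: first token at 0.4 s, then one token every 25 ms.
sample = StreamTiming(0.0, [0.4 + 0.025 * i for i in range(41)])
print(round(ttft(sample), 3), round(tps(sample), 1))
print(meets_guarantee(sample, max_ttft_s=0.5, min_tps=30.0))
```

Measuring TPS from the first token onward keeps the two numbers independent: a slow first token lowers TTFT compliance without also dragging down the reported output rate.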

Approaching.AI’s model strategy follows from this commitment. The company focuses on a short list of high-productivity models and optimizes each one’s inference path continuously, tuning for output quality, TTFT stability, and TPS on the actual job patterns its customers run. The company reports sustained 30-to-50 TPS output with stable TTFT across high-concurrency enterprise workloads.

The Platform Built for Enterprise Throughput

First-Token Latency and Output Speed

ATaaS delivers throughput guarantees through five core capabilities: heterogeneous computing power scheduling (allocating different hardware types to different workload stages), cross-cluster cache sharing (keeping KV caches warm across separate compute clusters), inference link isolation (preventing one enterprise job’s load from degrading another’s), elastic scaling, and full-link quality monitoring. Together these address the four failure modes enterprise engineers encounter in production inference: latency spikes, cold-start delays, resource contention between concurrent workloads, and undetected output degradation.
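One of these capabilities, inference link isolation, can be pictured with a toy example. The Python sketch below is an assumption about the general idea only, not a description of how ATaaS implements it: the `IsolatedLinks` class, the tenant names, and the per-tenant limit are all hypothetical. Each tenant gets its own concurrency cap, so a burst from one tenant queues behind that tenant's own limit instead of crowding out another tenant's job.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict


class IsolatedLinks:
    """Hypothetical per-tenant concurrency caps (not the ATaaS design)."""

    def __init__(self, per_tenant_limit: int) -> None:
        self._limit = per_tenant_limit
        self._slots: Dict[str, asyncio.Semaphore] = {}
        self._active: Dict[str, int] = defaultdict(int)
        self.peak: Dict[str, int] = defaultdict(int)  # highest concurrency seen

    def _slot(self, tenant: str) -> asyncio.Semaphore:
        # Lazily create one semaphore per tenant.
        if tenant not in self._slots:
            self._slots[tenant] = asyncio.Semaphore(self._limit)
        return self._slots[tenant]

    async def run(self, tenant: str, job: Callable[[], Awaitable[str]]) -> str:
        async with self._slot(tenant):
            self._active[tenant] += 1
            self.peak[tenant] = max(self.peak[tenant], self._active[tenant])
            try:
                return await job()
            finally:
                self._active[tenant] -= 1


async def fake_inference() -> str:
    await asyncio.sleep(0.01)  # stand-in for a model call
    return "ok"


async def demo() -> Dict[str, int]:
    links = IsolatedLinks(per_tenant_limit=2)
    burst = [links.run("tenant_a", fake_inference) for _ in range(10)]
    single = [links.run("tenant_b", fake_inference)]
    await asyncio.gather(*burst, *single)
    return dict(links.peak)


peaks = asyncio.run(demo())
print(peaks)
```

In this toy run, the ten-job burst from `tenant_a` never holds more than two slots at once, and the single `tenant_b` job does not wait behind it.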

| Dimension | Traditional MaaS | Approaching.AI’s Platform |
| --- | --- | --- |
| Primary billing unit | API call | Token of useful output |
| TTFT guarantee | Variable, unspecified | Stable and predictable |
| Output throughput | No floor guarantee | 30-to-50 TPS |
| Structured output reliability | Best-effort | Full-link quality monitoring |
| Concurrency behaviour | Shared resource pool | Isolated inference links per job |
| Model breadth | Hundreds of models | Select few, deeply optimized |

Why Fewer Models Beats More Models

A MaaS platform that supports hundreds of models maintains hundreds of inference stacks, each running at generalist performance levels. Approaching.AI instead concentrates its engineering on a short roster, tuning each model’s inference path against the job patterns its customers actually run. Enterprise buyers whose production environments rely on two or three models are paying for how those specific models perform under their specific workloads, and depth of optimization per model is what delivers that.

The GLM and Moonshot AI deployments sit among China’s most demanding inference workloads by concurrency, context length, and structured-output requirement. Running both at production scale, under contract, puts the platform’s isolation and scheduling capabilities through conditions no internal benchmark replicates.

The Tsinghua Research Transfer

The technical base Approaching.AI is commercializing comes from more than two decades of academic work at Tsinghua’s High-Performance Computing (HPC) Institute, whose MADSys research group built expertise in parallel computing, distributed storage systems, and large-model inference infrastructure well before the current investment cycle began.

Academician Wei-Min Zheng, appointed Approaching.AI’s Chief Scientific Advisor in March 2026, is a globally recognized authority in high-performance computing and scalable storage architectures, with multiple national science and technology awards. Professor Yongwei Wu, the Chief Scientist, is an IEEE Fellow and AAIA Fellow whose career has concentrated on parallel and distributed systems and big data infrastructure. Associate Professor Zhang Mingxing leads the teams behind both KTransformers and Mooncake, focusing specifically on large-model inference architectures.

Approaching.AI completed a capital increase in which relevant Tsinghua technical achievements were contributed as equity, transferring IP from those research programs onto the company’s commercial balance sheet. IP ownership on the balance sheet ties the academic research to the company’s valuation in a way advisory roles don’t.

Ren Xuyang, the chairman, brings commercial experience from the early years of Baidu and from founding iQiyi and Yidian Zixun. Dr. Wu Wenjie, the president, holds a finance PhD and a CFA designation, spent years as a senior executive at industrial and capital institutions, and oversees global strategy and operations.

Open-Source Projects Turned Enterprise Infrastructure

KTransformers, the CPU-GPU heterogeneous inference framework co-developed with Tsinghua’s MADSys group and presented at SOSP ’25 (the ACM SIGOPS 31st Symposium on Operating Systems Principles), has accumulated more than 17,200 GitHub stars. Its specific contribution: enabling efficient inference of MoE (mixture-of-experts) large language models on CPU-GPU hybrid hardware, which lets high-performance inference run on resource-constrained setups that pure-GPU deployments can’t serve. GLM, MiniMax, and Qwen now ship day-zero compatibility with the framework, and it has been integrated into the SGLang inference stack.

The Mooncake distributed inference platform is the actual production infrastructure Kimi runs on, co-built in November 2024 alongside Moonshot AI, Tsinghua’s MADSys Lab, 9#AISoft, Alibaba Cloud, and Ant Group. Its Transfer Engine and Mooncake Store have since been integrated into vLLM, SGLang, TensorRT-LLM, and NVIDIA Dynamo. In July 2025, the platform powered a Kimi K2 deployment across 128 H200 GPUs, sustaining 224,000 tokens per second in prefill throughput.

  • KTransformers: 17,200+ GitHub stars; MoE CPU-GPU hybrid inference; peer-reviewed at SOSP ’25; integrated into SGLang; day-zero compatibility from GLM, MiniMax, and Qwen
  • Mooncake: production serving platform for Kimi; Transfer Engine and Mooncake Store integrated into vLLM v1, SGLang, TensorRT-LLM, and NVIDIA Dynamo; 224,000 tokens per second prefill throughput on 128 H200 GPUs in July 2025
  • Active contributions by Approaching.AI team members to SGLang, vLLM, and NVIDIA Dynamo open-source communities
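For scale, the reported Kimi K2 figure can be broken down per GPU. The short Python calculation below uses only the two numbers quoted above and assumes, as a simplification, that the prefill load was spread evenly across all 128 GPUs.

```python
PREFILL_TOKENS_PER_SECOND = 224_000  # reported cluster-wide prefill throughput
GPU_COUNT = 128                      # reported number of H200 GPUs

# Assumes an even split across GPUs, which the source does not state.
per_gpu = PREFILL_TOKENS_PER_SECOND / GPU_COUNT
print(per_gpu)  # 1750.0
```

That works out to 1,750 prefill tokens per second per GPU under the even-split assumption.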

China’s Token Economy Gets Its Infrastructure Layer

JPMorgan projects a 370-fold growth in China’s inference token consumption between 2025 and 2030, a trajectory that sits above any supply-side capacity plan the market currently has. China’s entire MaaS market was worth 710 million yuan in 2024, per IDC, a baseline an Alibaba Cloud senior executive said could grow to 30% or more of total cloud revenue. Both figures point to the same structural gap: high-quality token delivery at enterprise scale is undersupplied.

  • 140 trillion tokens processed daily across China’s AI ecosystem as of March 2026, per the National Data Administration, up from 100 billion at the start of 2024
  • 370-fold projected growth in China’s inference token consumption, 2025 to 2030, per JPMorgan
  • 710 million yuan – China’s entire MaaS market in 2024, per IDC, a number an Alibaba Cloud executive said could grow to 30% or more of total cloud revenue
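To put the growth figures on a common footing, the Python sketch below converts JPMorgan's 370-fold projection into an implied yearly multiple and expresses the National Data Administration's figures as a fold increase. The `annual_multiple` helper is hypothetical, and treating 2025 to 2030 as five compounding years of steady growth is an assumption made for illustration.

```python
def annual_multiple(total_multiple: float, years: float) -> float:
    """Constant yearly growth multiple that compounds to total_multiple."""
    return total_multiple ** (1 / years)


# JPMorgan: 370-fold growth, 2025 to 2030 (assumed to be five compounding years).
jpm_yearly = annual_multiple(370, 5)

# National Data Administration: 100 billion -> 140 trillion tokens per day.
nda_fold = (140 * 10**12) // (100 * 10**9)

print(round(jpm_yearly, 2))  # about 3.26x per year
print(nda_fold)              # 1400
```

Under that assumption, a 370-fold increase means consumption more than tripling every year, while the reported daily volume has already grown 1,400-fold since the start of 2024.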

“The AI Infra industry, which can stably provide high-quality Tokens on a large scale, will become the key infrastructure for the booming development of the AI industry, with broad market space and high investment value.”

Zhang Yang, chairman of Huakong Fund, made that case at deal close. Xinglian Capital partner Li Wenjue cited “the systematic breakthrough of its ATaaS platform and the company’s ability to quickly transform top-notch academic achievements into large-scale commercial implementation,” naming AI agent workloads as the forward demand driver: autonomous, multi-step workflows consume far more tokens per task than single-query interactions do.

A 370-fold increase in inference consumption over five years leaves room for very few companies to own the token-production layer, and Approaching.AI has spent two years building the case that it should be one of them.

Logan Pierce is a writer and web publisher with over seven years of experience covering consumer technology. He has published work on independent tech blogs and freelance bylines covering Android devices, privacy-focused software, and budget gadgets. Logan founded Oton Technology to publish clear, no-nonsense tech news and reviews based on real hands-on testing. He has personally tested and reviewed dozens of mid-range and budget Android phones, written extensively about app privacy, and built and managed multiple WordPress publications over the past decade. Logan holds a bachelor's degree in English and studied digital marketing at a certificate level.
