As of March 20, 2025, who is the best embedding model to support Chinese?

Created on: March 20, 2025

Answered using GPT-4o by Chat01

Question

As of March 20, 2025, who is the best embedding model to support Chinese?

Answer

As of March 20, 2025, several embedding models have demonstrated strong performance in supporting Chinese language tasks. Here are some of the top models:

acge_text_embedding
Developed by Intsig (合合信息), acge_text_embedding has held the top position on the Chinese Massive Text Embedding Benchmark (C-MTEB). It supports a maximum input of 1,024 tokens and has a model size of about 0.65 GB, making it resource-efficient and suitable for a wide range of applications. (Source: aspire/acge_text_embedding on Hugging Face)

jina-embeddings-v2-base-zh
This bilingual Chinese/English embedding model supports inputs of up to 8,192 tokens. It performs strongly on the C-MTEB leaderboard, particularly on Chinese-language tasks, and notably outperforms OpenAI's text-embedding-ada-002 on Chinese-specific applications while remaining competitive in English. (Source: jinaai/jina-embeddings-v2-base-zh on Hugging Face)

bge-large-zh-v1.5
Developed by the Beijing Academy of Artificial Intelligence (BAAI), this model is part of the BGE family of embedding models. It has been evaluated on the C-MTEB benchmark, which covers six task types and 35 datasets, and shows strong performance on Chinese text embeddings. (Source: CSDN blog on query vectorization in RAG with the BGE model)

M3E-base
Developed by Moka AI, M3E-base is a multilingual embedding model that has been evaluated on a range of benchmarks, including Chinese-specific ones. It offers a good balance between performance and model size, making it suitable for applications that require efficient processing. (Source: ibbd-dev/m3e-embedding on GitHub)

tao-8k
Published on Hugging Face by the developer amu, tao-8k supports inputs of up to 8,192 tokens. It has been evaluated on Chinese language tasks and offers a good trade-off between performance and supported input length. (Source: Amu/tao-8k on Hugging Face)

When selecting an embedding model for Chinese language tasks, it is essential to weigh factors such as model size, maximum input length, and performance on relevant benchmarks like C-MTEB. The models listed above have demonstrated strong capabilities and are among the top choices as of March 20, 2025.
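Whichever model you choose, the embeddings it produces are typically compared with cosine similarity (for example, to rank documents against a query in retrieval). The sketch below shows just that comparison step in plain NumPy; the two vectors are placeholders standing in for real model outputs, since actually encoding text would require downloading one of the models above.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (range -1.0 to 1.0)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for the embeddings of a Chinese query and a
# candidate document; a real pipeline would obtain these from one of the
# models listed above.
emb_query = np.array([0.1, 0.3, 0.5])
emb_doc = np.array([0.2, 0.6, 1.0])  # parallel to emb_query, so similarity is 1.0

score = cosine_similarity(emb_query, emb_doc)
```

In practice you would encode a query and many documents, compute this score for each pair, and return the highest-scoring documents; some models additionally recommend L2-normalizing their outputs first, which reduces cosine similarity to a plain dot product.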
