As of March 20, 2025, who is the best embedding model to support Chinese?

Created on: March 20, 2025

Answered using GPT-4o by Chat01

Question

As of March 20, 2025, who is the best embedding model to support Chinese?

Answer

As of March 20, 2025, several embedding models have demonstrated strong performance in supporting Chinese language tasks. Here are some of the top models:

acge_text_embedding
Developed by Intsig (合合信息), acge_text_embedding has held the top position on the Chinese Massive Text Embedding Benchmark (C-MTEB). It supports a maximum input of 1,024 tokens and has a model size of about 0.65 GB, making it resource-efficient and suitable for a wide range of applications. (Source: aspire/acge_text_embedding on Hugging Face)

jina-embeddings-v2-base-zh
This bilingual Chinese/English embedding model supports inputs of up to 8,192 tokens. It performs strongly on the C-MTEB leaderboard, particularly on Chinese-language tasks, and notably outperforms OpenAI's text-embedding-ada-002 on Chinese-specific applications while remaining competitive in English. (Source: jinaai/jina-embeddings-v2-base-zh on Hugging Face)

bge-large-zh-v1.5
Developed by the Beijing Academy of Artificial Intelligence (BAAI), this model is part of the BGE family of embedding models. It has been evaluated on the C-MTEB benchmark, which covers six task types and 35 datasets, and shows strong performance on Chinese text embeddings. (Source: CSDN blog on query vectorization in RAG with the BGE model)

M3E-base
Developed by Moka AI, M3E-base is a multilingual embedding model that has been evaluated on a range of benchmarks, including Chinese-specific ones. It offers a good balance between performance and model size, making it suitable for applications that require efficient processing. (Source: ibbd-dev/m3e-embedding on GitHub)

tao-8k
Published on Hugging Face by the developer amu, tao-8k supports inputs of up to 8,192 tokens. It has been evaluated on Chinese language tasks and offers a good trade-off between performance and supported input length. (Source: Amu/tao-8k on Hugging Face)

When selecting an embedding model for Chinese language tasks, it is essential to weigh factors such as model size, maximum input length, and performance on relevant benchmarks like C-MTEB. The models listed above have demonstrated strong capabilities and are among the top choices as of March 20, 2025.
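Whichever model you choose, the embeddings it produces are typically compared with cosine similarity (for example, to rank documents against a query in retrieval). The sketch below shows just that comparison step in plain NumPy; the two vectors are placeholders standing in for real model outputs, since actually encoding text would require downloading one of the models above.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (range -1.0 to 1.0)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for the embeddings of a Chinese query and a
# candidate document; a real pipeline would obtain these from one of the
# models listed above.
emb_query = np.array([0.1, 0.3, 0.5])
emb_doc = np.array([0.2, 0.6, 1.0])  # parallel to emb_query, so similarity is 1.0

score = cosine_similarity(emb_query, emb_doc)
```

In practice you would encode a query and many documents, compute this score for each pair, and return the highest-scoring documents; some models additionally recommend L2-normalizing their outputs first, which reduces cosine similarity to a plain dot product.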
