Created: March 20, 2025
As of March 20, 2025, which is the best embedding model for Chinese?
As of March 20, 2025, several embedding models have demonstrated strong performance on Chinese-language tasks. Here are some of the top models:
acge_text_embedding
Developed by 合合信息 (INTSIG Information), the acge_text_embedding model has reached the top position on the Chinese Massive Text Embedding Benchmark (C-MTEB). It accepts inputs of up to 1024 tokens and has a model size of 0.65GB, which keeps its resource requirements modest.
jina-embeddings-v2-base-zh
This bilingual Chinese/English embedding model supports inputs of up to 8192 tokens. It has performed strongly on the C-MTEB leaderboard and outperforms OpenAI's text-embedding-ada-002 on Chinese-specific tasks while remaining competitive on English tasks.
bge-large-zh-v1.5
Developed by the Beijing Academy of Artificial Intelligence (BAAI), this model is part of the BGE family of embedding models. It has been evaluated on C-MTEB, which covers six task types and 35 datasets, and has shown strong performance on Chinese text embedding.
M3E-base
Developed by Moka AI, M3E-base is a multilingual embedding model that has been evaluated on various benchmarks, including those specific to Chinese language tasks. It offers a balance between performance and model size, making it suitable for applications requiring efficient processing.
tao8k
Developed by Hugging Face developer amu, tao8k is an embedding model that supports up to 8192 tokens. It has been evaluated on Chinese language tasks and offers a good balance between performance and token length support.
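The token limits quoted above (1024 for acge_text_embedding, 8192 for jina-embeddings-v2-base-zh and tao8k) decide how finely a long document has to be split before it can be embedded. Below is a minimal sketch of that splitting step. It assumes, very roughly, that one character counts as one token; real tokenizers differ, so a production pipeline should count tokens with the model's own tokenizer. The function name `chunk_text` and its parameters are hypothetical and not part of any of these models' APIs.

```python
def chunk_text(text: str, max_tokens: int, overlap: int = 0) -> list[str]:
    """Split text into windows of at most max_tokens "tokens".

    Assumption: one character counts as one token. This is only a rough
    stand-in for the model's real tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")

    step = max_tokens - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_tokens])
        if start + max_tokens >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    document = "字" * 2500  # stand-in for a 2500-character Chinese document
    print(len(chunk_text(document, 1024)))  # windows needed at a 1024 limit
    print(len(chunk_text(document, 8192)))  # windows needed at an 8192 limit
```

Under this approximation, a document that fits in one window for an 8192-token model needs several windows for a 1024-token model, which is the practical cost of a shorter context length.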
When selecting an embedding model for Chinese language tasks, it's essential to consider factors such as model size, token length support, and performance on relevant benchmarks. The models listed above have demonstrated strong capabilities and are among the top choices as of March 20, 2025.
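The selection factors above can be turned into a simple shortlist filter. The sketch below is illustrative only: it contains just the figures quoted in this answer (token limits for three models, a size for one) and marks everything else as unknown rather than guessing. Benchmark scores are left out because no numbers are given here. The names `Candidate` and `shortlist` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    max_tokens: Optional[int]  # None = not stated in the answer above
    size_gb: Optional[float]   # None = not stated in the answer above


# Only figures quoted in the answer above are filled in.
CANDIDATES = [
    Candidate("acge_text_embedding", 1024, 0.65),
    Candidate("jina-embeddings-v2-base-zh", 8192, None),
    Candidate("bge-large-zh-v1.5", None, None),
    Candidate("M3E-base", None, None),
    Candidate("tao8k", 8192, None),
]


def shortlist(candidates, min_tokens, max_size_gb=None):
    """Return (fits, unknown).

    fits: models known to satisfy every requested constraint.
    unknown: models not ruled out, but missing a figure needed to decide.
    """
    fits, unknown = [], []
    for c in candidates:
        if c.max_tokens is not None and c.max_tokens < min_tokens:
            continue  # known to have too short a context
        if (max_size_gb is not None and c.size_gb is not None
                and c.size_gb > max_size_gb):
            continue  # known to be too large
        needed = [c.max_tokens]
        if max_size_gb is not None:
            needed.append(c.size_gb)
        if any(value is None for value in needed):
            unknown.append(c.name)
        else:
            fits.append(c.name)
    return fits, unknown


if __name__ == "__main__":
    fits, unknown = shortlist(CANDIDATES, min_tokens=4096)
    print("fits:", fits)
    print("check the model card for:", unknown)
```

Keeping an explicit "unknown" bucket is a deliberate choice: a missing figure should prompt a look at the model card, not a silent exclusion or inclusion.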