A Small Example of Using an Embedding Extractor for Parallel Corpus Alignment


Published 2023-12-09 10:12:37
from sentence_transformers import SentenceTransformer 
import numpy as np
from os import path

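# Use a local copy of the model if it exists; otherwise fall back to the model ID.
model_path = (
    '/data/m3e-base'
    if path.isdir('/data/m3e-base')
    else 'moka-ai/m3e-base'
)
model = SentenceTransformer(model_path)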

zh_list = [
    "国际高等教育研究机构QS Quacquarelli Symonds于2023年6月28日正式发布第20版世界大学排名,首次将就业能力和可持续发展指标纳入排名体系,成为全球唯一一个同时包含这两项指标的排名。",
    "瑞典皇家科学院2022年10月10日在斯德哥尔摩宣布,将2022年诺贝尔经济学奖授予经济学家本·伯南克(Ben Bernanke)、道格拉斯·戴蒙德(Douglas Diamond)和菲利普·迪布维格(Philip Dybvig),以表彰他们在银行与金融危机研究领域的突出贡献。",
    "费曼学习法可以简化为四个单词:Concept (概念)、Teach (教给别人)、Review (评价)、Simplify (简化)。  费曼学习法的灵感源于诺贝尔物理奖获得者理查德•费曼(Richard Feynman),运用费曼技巧,你只需花上20分钟就能深入理解知识点,而且记忆深刻,难以遗忘。知识有两种类型,我们绝大多数人关注的都是错误的那类。第一类知识注重了解某个事物的名称。第二类知识注重了解某件事物。这可不是一回事儿。著名的诺贝尔物理学家理查德·费曼(Richard Feynman)能够理解这二者间的差别,这也是他成功最重要的原因之一。事实上,他创造了一种学习方法,确保他会比别人对事物了解的更透彻。",
]

en_list = [
    "On November 10th, 2022, Forbes published the 2022 China Mainland Rich List. The total wealth of the people on this list dropped from $1.48 trillion last year to $907.1 billion, a drop of 39%, which was the biggest drop since Forbes surveyed the richest people in mainland China for more than 20 years. " ,
    "New energy refers to various forms of energy other than traditional energy. All its forms come directly or indirectly from the heat energy generated by the sun or the earth. Including solar energy, wind energy, biomass energy, geothermal energy, water energy and ocean energy, as well as energy generated by biofuels and hydrogen derived from renewable energy. It can also be said that new energy includes all kinds of renewable energy and nuclear energy. Compared with traditional energy sources, new energy sources generally have the characteristics of less pollution and large reserves, which is of great significance to solve the serious environmental pollution problem and the depletion of resources (especially fossil energy) in the world today. " ,
    "QS Quacquarelli Symonds, an international higher education research institution, officially released the 20th edition of the World University Rankings on June 28th, 2023, which brought employability and sustainable development indicators into the ranking system for the first time, becoming the only ranking in the world that includes both indicators." ,
    "Feynman learning method can be simplified to four words: Concept, Teach, Review and Simplify. Feynman's learning method is inspired by Richard Feynman, the Nobel Prize winner in physics. With Feynman's skills, you can understand the knowledge points in depth in just 20 minutes, and it is memorable and hard to forget. There are two types of knowledge, and most of us pay attention to the wrong kind. The first kind of knowledge focuses on knowing the name of something. The second kind of knowledge focuses on understanding something. This is not the same thing. Richard Feynman, a famous Nobel physicist, can understand the difference between the two, which is one of the most important reasons for his success. In fact, he created a learning method to ensure that he would know things better than others. " ,
    "The Royal Swedish Academy of Sciences announced in Stockholm on October 10th, 2022 that it would award the 2022 Nobel Prize in Economics to economists Ben Bernanke, Douglas Diamond and Philip Dybvig in recognition of their outstanding contributions in the field of banking and financial crisis research." ,
]

zh_vecs = model.encode(zh_list)
en_vecs = model.encode(en_list)

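def l2_norm(arr, axis=-1):
    # Euclidean length along `axis`; keepdims=True lets the result broadcast in the division below.
    return (arr ** 2).sum(axis=axis, keepdims=True) ** 0.5

# Normalize to unit length so that the dot product equals cosine similarity.
en_vecs /= l2_norm(en_vecs)
zh_vecs /= l2_norm(zh_vecs)

# sim_mat[i, j] is the cosine similarity between en_list[i] and zh_list[j].
sim_mat = en_vecs @ zh_vecs.T
# Sort each row in descending order, keeping both the similarities and their indices.
sims = np.sort(sim_mat, axis=-1)[:, ::-1]
idcs = np.argsort(sim_mat, axis=-1)[:, ::-1]

# Top-1 Chinese match (index and similarity) for each English sentence.
idcs_top1 = idcs[:, 0].ravel()
sims_top1 = sims[:, 0].ravel()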

# Print each English sentence with its best-matching Chinese sentence and the similarity.
for i, (j, sim) in enumerate(zip(idcs_top1, sims_top1)):
    print(en_list[i] + '\n' + zh_list[j] + f'\nSimilarity: {sim}\n' + '=' * 30)

On November 10th, 2022, Forbes published the 2022 China Mainland Rich List. The total wealth of the people on this list dropped from $1.48 trillion last year to $907.1 billion, a drop of 39%, which was the biggest drop since Forbes surveyed the richest people in mainland China for more than 20 years.
New energy refers to various forms of energy other than traditional energy. All its forms come directly or indirectly from the heat energy generated by the sun or the earth. Including solar energy, wind energy, biomass energy, geothermal energy, water energy and ocean energy, as well as energy generated by biofuels and hydrogen derived from renewable energy. It can also be said that new energy includes all kinds of renewable energy and nuclear energy. Compared with traditional energy sources, new energy sources generally have the characteristics of less pollution and large reserves, which is of great significance to solve the serious environmental pollution problem and the depletion of resources (especially fossil energy) in the world today.
QS Quacquarelli Symonds, an international higher education research institution, officially released the 20th edition of the World University Rankings on June 28th, 2023, which brought employability and sustainable development indicators into the ranking system for the first time, becoming the only ranking in the world that includes both indicators.
国际高等教育研究机构QS Quacquarelli Symonds于2023年6月28日正式发布第20版世界大学排名,首次将就业能力和可持续发展指标纳入排名体系,成为全球唯一一个同时包含这两项指标的排名。
Feynman learning method can be simplified to four words: Concept, Teach, Review and Simplify. Feynman's learning method is inspired by Richard Feynman, the Nobel Prize winner in physics. With Feynman's skills, you can understand the knowledge points in depth in just 20 minutes, and it is memorable and hard to forget. There are two types of knowledge, and most of us pay attention to the wrong kind. The first kind of knowledge focuses on knowing the name of something. The second kind of knowledge focuses on understanding something. This is not the same thing. Richard Feynman, a famous Nobel physicist, can understand the difference between the two, which is one of the most important reasons for his success. In fact, he created a learning method to ensure that he would know things better than others.
费曼学习法可以简化为四个单词:Concept (概念)、Teach (教给别人)、Review (评价)、Simplify (简化)。  费曼学习法的灵感源于诺贝尔物理奖获得者理查德•费曼(Richard Feynman),运用费曼技巧,你只需花上20分钟就能深入理解知识点,而且记忆深刻,难以遗忘。知识有两种类型,我们绝大多数人关注的都是错误的那类。第一类知识注重了解某个事物的名称。第二类知识注重了解某件事物。这可不是一回事儿。著名的诺贝尔物理学家理查德·费曼(Richard Feynman)能够理解这二者间的差别,这也是他成功最重要的原因之一。事实上,他创造了一种学习方法,确保他会比别人对事物了解的更透彻。
The Royal Swedish Academy of Sciences announced in Stockholm on October 10th, 2022 that it would award the 2022 Nobel Prize in Economics to economists Ben Bernanke, Douglas Diamond and Philip Dybvig in recognition of their outstanding contributions in the field of banking and financial crisis research.
瑞典皇家科学院2022年10月10日在斯德哥尔摩宣布,将2022年诺贝尔经济学奖授予经济学家本·伯南克(Ben Bernanke)、道格拉斯·戴蒙德(Douglas Diamond)和菲利普·迪布维格(Philip Dybvig),以表彰他们在银行与金融危机研究领域的突出贡献。
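
A top-1 lookup pairs every English sentence with some Chinese sentence, even when the Chinese list holds no real translation of it (the Forbes and new-energy passages have no counterpart in zh_list). Below is a minimal sketch of one common way to discard such pairs: keep a pair only if each sentence is the other's nearest neighbor and their similarity clears a cutoff. The function names, the 0.8 threshold, and the toy 2-D vectors are illustrative assumptions and are not part of the original script; a suitable threshold depends on the model and the data.

```python
import numpy as np


def l2_normalize(vecs):
    """Scale each row to unit length so dot products equal cosine similarity."""
    return vecs / np.linalg.norm(vecs, axis=-1, keepdims=True)


def filter_matches(src_vecs, tgt_vecs, min_sim=0.8):
    """Return (src_idx, tgt_idx, sim) for mutual top-1 pairs above min_sim.

    min_sim is a hypothetical cutoff chosen for illustration only.
    """
    sim_mat = l2_normalize(src_vecs) @ l2_normalize(tgt_vecs).T
    best_tgt = sim_mat.argmax(axis=1)  # best target for each source row
    best_src = sim_mat.argmax(axis=0)  # best source for each target column
    pairs = []
    for i, j in enumerate(best_tgt):
        sim = float(sim_mat[i, j])
        # Keep the pair only if the match is mutual and confident enough.
        if best_src[j] == i and sim >= min_sim:
            pairs.append((i, int(j), sim))
    return pairs


# Toy 2-D "embeddings" standing in for model.encode() output (made-up values).
src = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.2]])
tgt = np.array([[0.1, 1.0], [1.0, 0.05]])

pairs = filter_matches(src, tgt)
for i, j, sim in pairs:
    print(f"source {i} <-> target {j} (similarity {sim:.3f})")
```

With real embeddings, `src` and `tgt` would be the arrays returned by `model.encode(en_list)` and `model.encode(zh_list)`. Source sentences that appear in no returned pair can then be treated as unaligned.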
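This article is part of the Tencent Cloud self-media syndication program and is shared from the author's personal site/blog.
Originally published: 2023-12-08. In case of infringement, contact cloudcommunity@tencent.com for removal.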
