Live Updates: COVID-19 Cases
  • World 30,975,579
    World
    Confirmed: 30,975,579
    Active: 7,439,358
    Recovered: 22,575,358
    Death: 960,863
  • USA 6,967,221
    USA
    Confirmed: 6,967,221
    Active: 2,541,168
    Recovered: 4,222,229
    Death: 203,824
  • India 5,398,230
    India
    Confirmed: 5,398,230
    Active: 1,011,732
    Recovered: 4,299,724
    Death: 86,774
  • Brazil 4,528,347
    Brazil
    Confirmed: 4,528,347
    Active: 571,687
    Recovered: 3,820,095
    Death: 136,565
  • Russia 1,097,251
    Russia
    Confirmed: 1,097,251
    Active: 171,450
    Recovered: 906,462
    Death: 19,339
  • Peru 762,865
    Peru
    Confirmed: 762,865
    Active: 123,659
    Recovered: 607,837
    Death: 31,369
  • Mexico 688,954
    Mexico
    Confirmed: 688,954
    Active: 123,959
    Recovered: 492,192
    Death: 72,803
  • South Africa 659,656
    South Africa
    Confirmed: 659,656
    Active: 54,282
    Recovered: 589,434
    Death: 15,940
  • Spain 659,334
    Spain
    Confirmed: 659,334
    Active: 628,839
    Recovered: ?
    Death: 30,495
  • Chile 444,674
    Chile
    Confirmed: 444,674
    Active: 14,319
    Recovered: 418,101
    Death: 12,254
  • France 442,194
    France
    Confirmed: 442,194
    Active: 319,346
    Recovered: 91,574
    Death: 31,274
  • Iran 419,043
    Iran
    Confirmed: 419,043
    Active: 37,293
    Recovered: 357,632
    Death: 24,118
  • UK 390,358
    UK
    Confirmed: 390,358
    Active: 348,599
    Recovered: ?
    Death: 41,759
  • Bangladesh 347,372
    Bangladesh
    Confirmed: 347,372
    Active: 88,073
    Recovered: 254,386
    Death: 4,913
  • Saudi Arabia 329,271
    Saudi Arabia
    Confirmed: 329,271
    Active: 15,383
    Recovered: 309,430
    Death: 4,458
  • Pakistan 305,031
    Pakistan
    Confirmed: 305,031
    Active: 6,572
    Recovered: 292,044
    Death: 6,415
  • Turkey 301,348
    Turkey
    Confirmed: 301,348
    Active: 27,786
    Recovered: 266,117
    Death: 7,445
  • Italy 296,569
    Italy
    Confirmed: 296,569
    Active: 43,161
    Recovered: 217,716
    Death: 35,692
  • Germany 272,308
    Germany
    Confirmed: 272,308
    Active: 19,342
    Recovered: 243,500
    Death: 9,466
  • Canada 142,774
    Canada
    Confirmed: 142,774
    Active: 9,376
    Recovered: 124,187
    Death: 9,211
  • Netherlands 91,934
    Netherlands
    Confirmed: 91,934
    Active: 85,659
    Recovered: ?
    Death: 6,275
  • China 85,269
    China
    Confirmed: 85,269
    Active: 171
    Recovered: 80,464
    Death: 4,634
  • Australia 26,885
    Australia
    Confirmed: 26,885
    Active: 2,079
    Recovered: 23,962
    Death: 844
  • S. Korea 22,893
    S. Korea
    Confirmed: 22,893
    Active: 2,545
    Recovered: 19,970
    Death: 378
  • New Zealand 1,811
    New Zealand
    Confirmed: 1,811
    Active: 67
    Recovered: 1,719
    Death: 25

Facebook in developing single encoder based technology to translate 93 languages

Author at TechGenyz Facebook
Facebook Cross Language Migration

Facebook researchers recently published a paper based on Schwenk (2018a) which proposes an architecture for learning joint multilingual sentence representation in 93 languages using a single BiLSTM encoder and BPE vocabulary shared by all languages. There have been other researches in this area as well but all of them have somewhat been limited in terms of performance primarily because they work on a separate model for each language and a cross connection between different languages is barred.

Facebook researchers are interested in the representation of sentence vectors common to both the input language and NLP tasks. The aim of this research is to help languages with limited resources, to achieve zero-shot migration of NLP models and to implement code conversion. What makes this research different from any other NLPs is that this research sets out to study the joint sentence representation in 93 different languages whereas the common NLP focuses on two languages at the most.

Facebook Language Training Samples

Figure 1: 75 out of 93 languages used to train the proposed model

The study covers a huge number of 34 languages and 28 different writing systems. This herculean task is achieved through the use of zero-shot cross-language natural language inference (XNLI datasets), classification (MLDoc datasets), bitext mining (BUCC datasets), and multilingual similarity searches (Tatoeba datasets). The new data obtained from the research based on Tatoeba Corpus acts as the baseline results for 122 languages.

The architecture of the study works in an encoding-decoding manner. Once a sentence is embedded, it is linearly transformed to initialize the LSTM decoder. There is only one encoder and decoder in the system and the researchers have used a joint byte-pair encoding vocabulary which will make the encoder learn language independent representations. The encoder is limited to 1-5 layers and each layer of every dimension is limited to 512 dimensions. The decoder generates meaning using the language ID and has a 2048 dimensional layer.

Facebook Multi Language Sentence Embedding

Figure 2: Architecture of our system to learn multilingual sentence embeddings.

Moses statistical machine translation system is used for the pre-processing except for Chinese and Japanese texts for which Jieba and Mecab are used to split the texts respectively.

Career

Subscribe