-rw-r--r-- README.md 21
1 files changed, 10 insertions, 11 deletions
diff --git a/README.md b/README.md
index eb09836..338c51d 100644
--- a/README.md
+++ b/README.md
@@ -19,14 +19,13 @@ pip install -r requirements.txt
 - [lapjv](https://pypi.org/project/lapjv/)
 - [POT](https://pypi.org/project/POT/)
 - [mosestokenizer](https://pypi.org/project/mosestokenizer/)
-- (Optional) If using VecMap
- * NumPy
- * SciPy
+- NumPy
+- SciPy
 
 <details><summary>We recommend using a virtual environment</summary>
 <p>
 
-In order to create a [virtual environment](https://docs.python.org/3/library/venv.html#venv-def) that resides in a directory `.env` under home;
+In order to create a [virtual environment](https://docs.python.org/3/library/venv.html#venv-def) that resides in a directory `.env` under your home directory;
 
 ```bash
 cd ~
@@ -35,11 +34,12 @@ python -m venv evaluating
 source ~/.env/evaluating/bin/activate
 ```
 
-After the virtual environment is activated, the python interpreter and the installed packages are isolated. In order for our code to work, the correct environment has to be sourced/activated.
-In order to install all dependencies automatically use the [pip](https://pypi.org/project/pip/) package installer using `requirements.txt`, which resides under the repository directory.
+After the virtual environment is activated, the python interpreter and the installed packages are isolated within it. In order for our code to work, the correct environment has to be sourced/activated.
+In order to install all dependencies automatically use the [pip](https://pypi.org/project/pip/) package installer. `pre_requirements.txt` includes requirements that packages in `requirements.txt` depend on. Both files come with the repository, so first navigate to the repository and then;
 
 ```bash
 # under Evaluating-Dictionary-Alignment
+pip install -r pre_requirements.txt
 pip install -r requirements.txt
 ```
 
@@ -50,7 +50,7 @@ Rest of this README assumes that you are in the repository root directory.
 
 ## Acquiring The Data
 
-nltk is required for this stage;
+`nltk` is required for this stage;
 
 ```python
 import nltk
@@ -63,8 +63,7 @@ Then;
 ./get_data.sh
 ```
 
-This will create two directories; `dictionaries` and `wordnets`.
-Linewise aligned definition files are in `wordnets/ready`.
+This will create two directories; `dictionaries` and `wordnets`. Definition files that are used by the unsupervised methods are in `wordnets/ready`. They come in pairs, `a_to_b.def` and `b_to_a.def`, for wordnet definitions in languages `a` and `b`. The pairs are aligned linewise; definitions on the same line of either file belong to the same wordnet synset, in the respective language.
 
 <details><summary>Language pairs and number of available aligned glosses</summary>
 <p>
@@ -94,7 +93,7 @@ Romaian | Albanian | 4646
 
 We use [VecMap](https://github.com/artetxem/vecmap) on [fastText](https://fasttext.cc/) embeddings. You can skip this step if you are providing your own polylingual embeddings.
 
-Otherwise;
+Otherwise,
 
 * initialize and update the VecMap submodule;
 
@@ -110,7 +109,7 @@ git submodule init && git submodule update
 ./get_embeddings.sh
 ```
 
-Bear in mind that this will require around 50 GB free space.
+Bear in mind that this will require around 50 GB free space. The mapped embeddings are stored under `bilingual_embedings` using the same naming scheme that `.def` files use.
 
 ## Quick Demo
 
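The updated README states that the definition files in `wordnets/ready` come in pairs (`a_to_b.def`, `b_to_a.def`) aligned line by line, so that line i of each file describes the same wordnet synset. The following is a minimal Python sketch of consuming such a pair, not code from the repository. It assumes that `a_to_b.def` holds the language-`a` side and `b_to_a.def` the language-`b` side, that the files are UTF-8 with one definition per line, and that `a`/`b` are language codes; the helper name `read_aligned_definitions` is hypothetical.

```python
from itertools import zip_longest
from pathlib import Path
from typing import Iterator, Tuple


def read_aligned_definitions(
    directory: Path, lang_a: str, lang_b: str
) -> Iterator[Tuple[str, str]]:
    """Yield (definition in lang_a, definition in lang_b), one pair per synset.

    Hypothetical helper. Assumes {lang_a}_to_{lang_b}.def holds the lang_a
    definitions and {lang_b}_to_{lang_a}.def the lang_b ones, aligned linewise.
    """
    path_ab = directory / f"{lang_a}_to_{lang_b}.def"
    path_ba = directory / f"{lang_b}_to_{lang_a}.def"
    with path_ab.open(encoding="utf-8") as fa, path_ba.open(encoding="utf-8") as fb:
        # zip_longest (unlike zip) pads the shorter file with None,
        # so a line-count mismatch is detected instead of silently truncated.
        for def_a, def_b in zip_longest(fa, fb, fillvalue=None):
            if def_a is None or def_b is None:
                raise ValueError(f"{path_ab} and {path_ba} are not aligned linewise")
            yield def_a.rstrip("\n"), def_b.rstrip("\n")
```

For example, `for def_a, def_b in read_aligned_definitions(Path("wordnets/ready"), "en", "ro"): ...` would walk an English/Romanian pair, where `"en"` and `"ro"` are placeholder codes; check the actual file names produced by `./get_data.sh`. Streaming both files in lockstep keeps memory flat, and raising on a length mismatch surfaces a broken alignment early rather than pairing definitions from different synsets.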