Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 21 |
1 files changed, 10 insertions, 11 deletions
@@ -19,14 +19,13 @@ pip install -r requirements.txt | |||
19 | - [lapjv](https://pypi.org/project/lapjv/) | 19 | - [lapjv](https://pypi.org/project/lapjv/) |
20 | - [POT](https://pypi.org/project/POT/) | 20 | - [POT](https://pypi.org/project/POT/) |
21 | - [mosestokenizer](https://pypi.org/project/mosestokenizer/) | 21 | - [mosestokenizer](https://pypi.org/project/mosestokenizer/) |
22 | - (Optional) If using VecMap | 22 | - NumPy |
23 | * NumPy | 23 | - SciPy |
24 | * SciPy | ||
25 | 24 | ||
26 | <details><summary>We recommend using a virtual environment</summary> | 25 | <details><summary>We recommend using a virtual environment</summary> |
27 | <p> | 26 | <p> |
28 | 27 | ||
29 | In order to create a [virtual environment](https://docs.python.org/3/library/venv.html#venv-def) that resides in a directory `.env` under home; | 28 | In order to create a [virtual environment](https://docs.python.org/3/library/venv.html#venv-def) that resides in a directory `.env` under your home directory; |
30 | 29 | ||
31 | ```bash | 30 | ```bash |
32 | cd ~ | 31 | cd ~ |
@@ -35,11 +34,12 @@ python -m venv evaluating | |||
35 | source ~/.env/evaluating/bin/activate | 34 | source ~/.env/evaluating/bin/activate |
36 | ``` | 35 | ``` |
37 | 36 | ||
38 | After the virtual environment is activated, the python interpreter and the installed packages are isolated. In order for our code to work, the correct environment has to be sourced/activated. | 37 | After the virtual environment is activated, the python interpreter and the installed packages are isolated within. In order for our code to work, the correct environment has to be sourced/activated. |
39 | In order to install all dependencies automatically use the [pip](https://pypi.org/project/pip/) package installer using `requirements.txt`, which resides under the repository directory. | 38 | In order to install all dependencies automatically use the [pip](https://pypi.org/project/pip/) package installer. `pre_requirements.txt` includes requirements that packages in `requirements.txt` depend on. Both files come with the repository, so first navigate to the repository and then; |
40 | 39 | ||
41 | ```bash | 40 | ```bash |
42 | # under Evaluating-Dictionary-Alignment | 41 | # under Evaluating-Dictionary-Alignment |
42 | pip install -r pre_requirements.txt | ||
43 | pip install -r requirements.txt | 43 | pip install -r requirements.txt |
44 | ``` | 44 | ``` |
45 | 45 | ||
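Note on the hunk above: the README says the correct environment has to be activated for the code to work. A minimal sketch of how that could be checked from Python before installing or running anything, assuming the environment was created with the standard `venv` module as in the instructions. The function name `in_virtual_environment` is hypothetical and not part of the repository.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv.

    Inside an environment created by `python -m venv`, sys.prefix points
    at the environment directory while sys.base_prefix still points at
    the base Python installation, so the two differ.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtual_environment():
        print("Virtual environment active:", sys.prefix)
    else:
        print("No virtual environment active; packages would install globally.")
```

Running this with the interpreter that `pip` will use shows whether the two `pip install -r` commands would land inside the isolated environment.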
@@ -50,7 +50,7 @@ Rest of this README assumes that you are in the repository root directory. | |||
50 | 50 | ||
51 | ## Acquiring The Data | 51 | ## Acquiring The Data |
52 | 52 | ||
53 | nltk is required for this stage; | 53 | `nltk` is required for this stage; |
54 | 54 | ||
55 | ```python | 55 | ```python |
56 | import nltk | 56 | import nltk |
@@ -63,8 +63,7 @@ Then; | |||
63 | ./get_data.sh | 63 | ./get_data.sh |
64 | ``` | 64 | ``` |
65 | 65 | ||
66 | This will create two directories; `dictionaries` and `wordnets`. | 66 | This will create two directories; `dictionaries` and `wordnets`. Definition files that are used by the unsupervised methods are in `wordnets/ready`, they come in pairs, `a_to_b.def` and `b_to_a.def` for wordnet definitions in language `a` and `b`. The pairs are aligned linewise; definitions on the same line for either file belong to the same wordnet synset, in the respective language. |
67 | Linewise aligned definition files are in `wordnets/ready`. | ||
68 | 67 | ||
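Note on the hunk above: the linewise alignment it describes can be illustrated with a short sketch that reads one pair of `.def` files and yields the definitions that share a line. This assumes the files are UTF-8 text with one definition per line; the function name `read_aligned_definitions` is hypothetical and not part of the repository.

```python
def read_aligned_definitions(source_path, target_path):
    """Return (source, target) definition pairs, one pair per shared line.

    Assumption: both files are UTF-8 text with one definition per line,
    and line i of each file describes the same wordnet synset.
    """
    with open(source_path, encoding="utf-8") as src:
        source_lines = [line.rstrip("\n") for line in src]
    with open(target_path, encoding="utf-8") as tgt:
        target_lines = [line.rstrip("\n") for line in tgt]

    # Files that differ in length cannot be aligned line by line.
    if len(source_lines) != len(target_lines):
        raise ValueError(
            f"line counts differ: {len(source_lines)} vs {len(target_lines)}"
        )
    return list(zip(source_lines, target_lines))
```

For a pair named after the scheme in the README, a call would look like `read_aligned_definitions("wordnets/ready/a_to_b.def", "wordnets/ready/b_to_a.def")`, where `a` and `b` stand for the two languages of the pair.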
69 | <details><summary>Language pairs and number of available aligned glosses</summary> | 68 | <details><summary>Language pairs and number of available aligned glosses</summary> |
70 | <p> | 69 | <p> |
@@ -94,7 +93,7 @@ Romaian | Albanian | 4646 | |||
94 | 93 | ||
95 | We use [VecMap](https://github.com/artetxem/vecmap) on [fastText](https://fasttext.cc/) embeddings. You can skip this step if you are providing your own polylingual embeddings. | 94 | We use [VecMap](https://github.com/artetxem/vecmap) on [fastText](https://fasttext.cc/) embeddings. You can skip this step if you are providing your own polylingual embeddings. |
96 | 95 | ||
97 | Otherwise; | 96 | Otherwise, |
98 | 97 | ||
99 | * initialize and update the VecMap submodule; | 98 | * initialize and update the VecMap submodule; |
100 | 99 | ||
@@ -110,7 +109,7 @@ git submodule init && git submodule update | |||
110 | ./get_embeddings.sh | 109 | ./get_embeddings.sh |
111 | ``` | 110 | ``` |
112 | 111 | ||
113 | Bear in mind that this will require around 50 GB free space. | 112 | Bear in mind that this will require around 50 GB free space. The mapped embeddings are stored under `bilingual_embedings` using the same naming scheme that `.def` files use. |
114 | 113 | ||
115 | ## Quick Demo | 114 | ## Quick Demo |
116 | 115 | ||