
deryrahman/word2vec-bahasa-indonesia


Word2Vec Bahasa Indonesia

Word2Vec for Indonesian (bahasa Indonesia), trained on a Wikipedia dataset

Installation

git clone https://github.com/deryrahman/word2vec-bahasa-indonesia.git
cd word2vec-bahasa-indonesia
pip install -r requirements.txt

Train

python train.py

Some useful arguments

usage: train.py [-h] [--model_path MODEL_PATH]
                [--extracted_path EXTRACTED_PATH] [--dump_path DUMP_PATH]
                [--dim DIM] [--stem STEM]

Word2Vec: Generating word2vec model for bahasa Indonesia

optional arguments:
  -h, --help                        show this help message and exit
  --model_path MODEL_PATH           path for saving trained models
  --extracted_path EXTRACTED_PATH   path for extracting text
  --dump_path DUMP_PATH             path for dump data
  --dim DIM                         embedding size
  --stem STEM                       use stemmer or not. (default false)
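
As a rough illustration of the interface above, here is a minimal argparse sketch that accepts the same flags. It is not the repository's actual train.py: the `str2bool` helper and the `None` defaults for the path and dimension flags are assumptions made for this example, since the help text only states that stemming defaults to false.

```python
import argparse


def str2bool(value):
    """Parse a true/false style string (hypothetical helper, not from the repo)."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes", "y"):
        return True
    if lowered in ("false", "0", "no", "n"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean value, got %r" % value)


def build_parser():
    """Build a parser mirroring the flags listed in the usage text."""
    parser = argparse.ArgumentParser(
        prog="train.py",
        description="Word2Vec: Generating word2vec model for bahasa Indonesia",
    )
    # The real defaults are not stated in the README, so None is used here.
    parser.add_argument("--model_path", default=None,
                        help="path for saving trained models")
    parser.add_argument("--extracted_path", default=None,
                        help="path for extracting text")
    parser.add_argument("--dump_path", default=None,
                        help="path for dump data")
    parser.add_argument("--dim", type=int, default=None,
                        help="embedding size")
    # The help text documents false as the default for stemming.
    parser.add_argument("--stem", type=str2bool, default=False,
                        help="use stemmer or not. (default false)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With a parser like this, a call such as `python train.py --dim 100 --stem true` would yield an integer `dim` and a boolean `stem`.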

Use Pre-Trained Model

You can use a trained model from the model folder, or download one directly from my drive and extract it into the model folder.

See example.py for a quick look at how to use the model, and refer to the gensim documentation for details.
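
For orientation, the following sketch shows one way a pre-trained model might be queried with gensim. It is not a copy of example.py. It assumes the model was saved with gensim's native `Word2Vec.save` format, and both the file name `model/idwiki_word2vec.model` and the helper `load_similar_words` are hypothetical, so substitute the actual file found in the model folder.

```python
from pathlib import Path


def load_similar_words(model_path, word, topn=5):
    """Return the nearest neighbours of `word`, or None if no model file exists."""
    path = Path(model_path)
    if not path.exists():
        # Nothing to load at this location.
        return None

    # Imported lazily so this module can be imported without gensim installed.
    from gensim.models import Word2Vec

    # Assumption: the file was written with Word2Vec.save().
    model = Word2Vec.load(str(path))
    if word not in model.wv:
        # The word is not in the model's vocabulary.
        return []
    # List of (word, cosine similarity) pairs, most similar first.
    return model.wv.most_similar(word, topn=topn)


if __name__ == "__main__":
    # Hypothetical file name; replace it with the real file in the model folder.
    print(load_similar_words("model/idwiki_word2vec.model", "raja"))
```

If the model was instead exported in the plain word2vec text or binary format, `gensim.models.KeyedVectors.load_word2vec_format` would be the loader to try.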


References

Medium - diekanugraha

License

Open sourced under the MIT license.
