1) Download StarDict files:
git clone https://github.com/freedict/fd-dictionaries.git
2) Download the Python package to read StarDict files:
git clone https://github.com/ilius/pyglossary.git
cd pyglossary/
cp /home/ubuntu/fd-dictionaries/eng-hin/eng-hin.tei .
python3 main.py
# convert eng-hin.tei file to out.txt
3) Select the first 3 columns:
Replace each HTML tag with a pipe (|), collapse runs of pipes into a single | delimiter,
and print the first 3 columns:
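awk '{
    gsub(/<[^>]*>/, "|");               # replace every HTML tag with a pipe
    gsub(/\|+/, "|");                   # collapse consecutive pipes into one
    match($0, /([^|]*\|){3}/);          # locate the first three pipe-terminated fields
    first_three = substr($0, RSTART, RLENGTH);
    print first_three                   # prints an empty line if fewer than 3 fields exist
}' out.txt > test.csv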
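For anyone who would rather stay in Python (python3 is already needed for pyglossary), the same filter can be sketched roughly as below. This is only an illustration, not part of the original steps: the function name first_three_columns and the sample line are made up, and the real lines in out.txt may look different.

```python
import re

TAG = re.compile(r"<[^>]*>")                  # any HTML tag
PIPES = re.compile(r"\|+")                    # one or more consecutive pipes
FIRST_THREE = re.compile(r"(?:[^|]*\|){3}")   # three pipe-terminated fields


def first_three_columns(line: str) -> str:
    """Mirror the awk script: tags -> pipes, squeeze pipes, keep 3 fields."""
    line = TAG.sub("|", line)
    line = PIPES.sub("|", line)
    found = FIRST_THREE.search(line)
    # Like the awk version, return an empty string when there are
    # fewer than three pipe-terminated fields on the line.
    return found.group(0) if found else ""


if __name__ == "__main__":
    # Hypothetical input line, used only to show the transformation.
    sample = "cat<br>noun<br><i></i>billi<br>extra"
    print(first_three_columns(sample))  # cat|noun|billi|
```

To run it over the whole file, loop over the lines of out.txt and write each result to test.csv, the same way the awk command does.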