February 19, 2025


Apply LibreOffice styles using a macro and create a PDF

I have this Dockerfile, and it works as expected. I use it to convert a .txt file to PDF after formatting it with a style created by a macro.

FROM ubuntu:latest

# Install LibreOffice and scripting dependencies
RUN apt-get update && apt-get install -y libreoffice libreoffice-script-provider-python libreoffice-script-provider-bsh libreoffice-script-provider-js

# Install required dependencies
RUN apt-get update && apt-get install -y wget unzip fonts-dejavu

# Download and install Shobhika font
RUN mkdir -p /usr/share/fonts/truetype/shobhika && wget -O /tmp/ && unzip /tmp/ -d /tmp/shobhika && mv /tmp/shobhika/Shobhika-1.05/*.otf /usr/share/fonts/truetype/shobhika/

# Create necessary directories with proper permissions
RUN mkdir -p /app/.config/libreoffice/4/user/basic/Standard
RUN chmod -R 777 /app/.config

# Set LibreOffice user profile path
ENV UserInstallation=file:///app/.config/libreoffice/4/user

COPY StyleLibrary.oxt /app/
COPY marathi_spell_check.oxt /app/
COPY myfile.txt /app/

RUN unopkg add /app/StyleLibrary.oxt --shared
RUN unopkg add /app/marathi_spell_check.oxt --shared

# Run the LibreOffice macro
CMD soffice --headless --invisible --norestore "macro:///StyleLibrary.Module1.myStyleMacro2(\"/app/myfile.txt\")"

# Build the image:
docker build -t shantanuo/mylibre .

# Run the container:
docker run -v .:/app/ --rm shantanuo/mylibre

As you can see, I applied the styles from StyleLibrary to myfile.txt and then successfully created a PDF document.
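The CMD line needs backslash-escaped quotes around the macro argument because the shell form of CMD goes through /bin/sh. The following is a minimal Python sketch, assuming the same image and macro, that builds the same `macro:///Library.Module.Macro("arg")` URL and passes the command to `subprocess` as a list, so no shell escaping is needed. The helper names `macro_url`, `soffice_command` and `run_macro` are hypothetical, and joining several arguments with commas is an assumption (only the single-argument form appears above).

```python
import shlex
import subprocess


def macro_url(library: str, module: str, macro: str, *args: str) -> str:
    """Build a URL such as macro:///StyleLibrary.Module1.myStyleMacro2("/app/myfile.txt")."""
    for arg in args:
        # This sketch wraps each argument in double quotes and does no escaping
        if '"' in arg:
            raise ValueError(f"argument contains a double quote: {arg!r}")
    # Assumption: multiple arguments are comma-separated
    quoted = ",".join(f'"{arg}"' for arg in args)
    return f"macro:///{library}.{module}.{macro}({quoted})"


def soffice_command(url: str) -> list[str]:
    """Same flags as the CMD line in the Dockerfile."""
    return ["soffice", "--headless", "--invisible", "--norestore", url]


def run_macro(url: str) -> None:
    # A list argument bypasses the shell, so the quotes need no backslashes
    subprocess.run(soffice_command(url), check=True)


cmd = soffice_command(macro_url("StyleLibrary", "Module1", "myStyleMacro2", "/app/myfile.txt"))
# Show the equivalent shell command, with quoting handled by shlex
print(shlex.join(cmd))
```

Inside the container, `run_macro(cmd[-1])` would start LibreOffice the same way the CMD line does.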

January 14, 2025


RAG made easy using LlamaIndex

# Use a virtual environment to install Python and the packages

uv init ai-app2

cd ai-app2

uv add llama-index

# Download the documents to index into ./data

mkdir data

cd data


cd ..

# Start a Python prompt


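import os

# LlamaIndex uses OpenAI for both embeddings and the LLM by default,
# so OPENAI_API_KEY must be set in the environment before this point

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load every file in ./data as a Document
documents = SimpleDirectoryReader("data").load_data()

# Chunk and embed the documents, keeping them in an in-memory vector index
index = VectorStoreIndex.from_documents(documents)

# The query engine retrieves the most similar chunks and passes them to the LLM with the question
query_engine = index.as_query_engine()

response = query_engine.query("What did the author do growing up?")
print(response)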
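To show what the index and query engine do conceptually, here is a toy, dependency-free sketch of the RAG loop: score each chunk against the question, keep the top k, and paste them into the prompt. It uses word-count vectors instead of a real embedding model, the chunks are made up, and it is not how LlamaIndex implements `VectorStoreIndex`.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Augment the question with the retrieved chunks before it goes to an LLM."""
    lines = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{lines}\n\nQuestion: {question}"


# Made-up chunks, standing in for the files in ./data
chunks = [
    "The author grew up writing short stories.",
    "The company sold software.",
    "Rain fell in the valley.",
]
question = "What did the author do growing up?"
print(build_prompt(question, retrieve(question, chunks, k=1)))
```

A real pipeline swaps `embed` for an embedding model and sends the built prompt to an LLM; the retrieve-then-augment shape stays the same.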


December 03, 2024


Avoid uploading a file to S3 again

Let's assume I uploaded a file to S3:

aws s3 cp dictionaries.xcu s3://cf-templates-us-east-1/

I want to upload the file only if it does not already exist. For that, I use the --if-none-match parameter:

aws s3api put-object --bucket cf-templates-us-east-1 --key dictionaries.xcu --body dictionaries.xcu --if-none-match "*"   

Because the object already exists, the command fails with "An error occurred (PreconditionFailed)".

This protects me from accidentally uploading (and overwriting) a large file a second time.
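The same conditional upload can be done from Python. This is a minimal sketch, assuming a boto3 S3 client is passed in as `s3` and that botocore raises a `ClientError` whose `response["Error"]["Code"]` is `PreconditionFailed`; the function names are hypothetical. The error is detected by duck typing, so the helpers themselves do not import botocore.

```python
def is_precondition_failed(err: Exception) -> bool:
    """True if the exception carries S3's PreconditionFailed error code."""
    # botocore's ClientError exposes the parsed error as err.response["Error"]["Code"]
    response = getattr(err, "response", None) or {}
    return response.get("Error", {}).get("Code") == "PreconditionFailed"


def upload_if_absent(s3, bucket: str, key: str, body) -> bool:
    """PUT the object only if the key does not exist yet.

    Returns True if it was uploaded, False if S3 rejected it because the key exists.
    `s3` is a boto3 S3 client; `body` is bytes or a binary file object.
    """
    try:
        # IfNoneMatch="*" is the SDK equivalent of --if-none-match "*"
        s3.put_object(Bucket=bucket, Key=key, Body=body, IfNoneMatch="*")
    except Exception as err:  # botocore raises ClientError here
        if is_precondition_failed(err):
            return False
        raise
    return True
```

Usage: with `s3 = boto3.client("s3")`, open dictionaries.xcu in `"rb"` mode and call `upload_if_absent(s3, "cf-templates-us-east-1", "dictionaries.xcu", f)`. A `False` result means the object was already there. Any other error is re-raised.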


The following features are available for S3 Express One Zone:

1) In directory buckets, clients can make a delete conditional on an object's last modified time, size, and ETag using the x-amz-if-match-last-modified-time, x-amz-if-match-size, and HTTP If-Match headers.

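2) Append data to an existing object in a directory bucket. The --write-offset-bytes value must equal the object's current size in bytes, and --body is the file holding the data to append:

aws s3api put-object --bucket amzn-s3-demo-bucket--use2-az2--x-s3 --key 2024-11-05-sdk-test --body file001.bin --write-offset-bytes 9

Or use Python (boto3):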

s3.put_object(Bucket='amzn-s3-demo-bucket--use2-az2--x-s3', Key='2024-11-05-sdk-test', Body=b'123456789', WriteOffsetBytes=9)

It cannot replace a database or a message queue, because each object supports only a few thousand appends.

3) You can configure S3 Lifecycle rules for S3 Express One Zone to expire objects on your behalf. For example, you can create an S3 Lifecycle rule that expires all objects smaller than 512 KB after 3 days and another rule that expires all objects in a prefix after 10 days.
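Items 2 and 3 can be sketched in Python as well. This assumes a boto3 client for a directory bucket is passed in as `s3`. `append_bytes` reads the current size with `head_object` and uses it as `WriteOffsetBytes`. If another writer appends between the two calls, the offset no longer matches and S3 should reject the request. `lifecycle_config` builds a dict for `put_bucket_lifecycle_configuration` that mirrors the example rules. Several parts are assumptions: that the standard `LifecycleConfiguration` shape applies to directory buckets, that 512 KB means 512 * 1024 bytes, and the placeholder prefix. Both function names are hypothetical.

```python
def append_bytes(s3, bucket: str, key: str, data: bytes) -> int:
    """Append data to an existing object in a directory bucket; return the new size."""
    # WriteOffsetBytes must equal the object's current size, so read it first
    size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    s3.put_object(Bucket=bucket, Key=key, Body=data, WriteOffsetBytes=size)
    return size + len(data)


def lifecycle_config(prefix: str) -> dict:
    """Two rules: expire small objects after 3 days, and everything under prefix after 10."""
    return {
        "Rules": [
            {
                "ID": "expire-small-objects",
                "Status": "Enabled",
                # Assumption: 512 KB interpreted as 512 * 1024 bytes
                "Filter": {"ObjectSizeLessThan": 512 * 1024},
                "Expiration": {"Days": 3},
            },
            {
                "ID": "expire-prefix",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Expiration": {"Days": 10},
            },
        ]
    }
```

Usage: `s3.put_bucket_lifecycle_configuration(Bucket="amzn-s3-demo-bucket--use2-az2--x-s3", LifecycleConfiguration=lifecycle_config("logs/"))`, where `logs/` is a placeholder prefix.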


November 17, 2024


Language prediction

The fastText library from Facebook includes a pretrained language identification model (lid.176).

import fasttext
model = fasttext.load_model("/tmp/lid.176.ftz")
model.predict(" विकिपीडिया पर", k=2)

The above code correctly returns Hindi ("hi"). Another option is langdetect, a Python port of Google's language-detection library. The following code correctly returns Marathi ("mr").

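from langdetect import detect, DetectorFactory
DetectorFactory.seed = 0  # langdetect is non-deterministic; a fixed seed makes results repeatable
detect("आत्मा आणि")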

The polyglot library has supported this, along with other language tools, for a long time.
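fastText's `predict()` returns a tuple of labels prefixed with `__label__` plus an array of probabilities, and its Python bindings reject input containing newlines. Below is a small sketch of helpers for both, assuming the `model` loaded in the fastText example; the helper names are hypothetical.

```python
def single_line(text: str) -> str:
    """Collapse all whitespace, including newlines, into single spaces."""
    return " ".join(text.split())


def parse_predictions(labels, probs) -> list[tuple[str, float]]:
    """Turn ('__label__hi', ...) and their probabilities into [('hi', p), ...]."""
    return [(label.removeprefix("__label__"), float(p)) for label, p in zip(labels, probs)]
```

Usage: `labels, probs = model.predict(single_line(text), k=2)` followed by `parse_predictions(labels, probs)` gives plain language codes such as "hi" with their scores.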

October 28, 2024


awk Case Study - 14

1) Download the StarDict files:

git clone

2) Download PyGlossary, a Python package that can read StarDict files:
git clone
cd pyglossary/
cp /home/ubuntu/fd-dictionaries/eng-hin/eng-hin.tei .

# convert eng-hin.tei file to out.txt

3) Replace each HTML tag with a pipe (|), collapse repeated pipes into a single delimiter, and keep only the first 3 columns:

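awk '{
    gsub(/<[^>]*>/, "|");                       # replace every HTML tag with a pipe
    gsub(/\|+/, "|");                           # collapse runs of pipes into one delimiter
    # {3} is a POSIX interval expression: gawk supports it, older mawk builds may not
    match($0, /([^|]*\|){3}/);                  # text up to and including the 3rd pipe
    first_three = substr($0, RSTART, RLENGTH);  # empty if the line has fewer than 3 pipes
    print first_three
}' out.txt > test.csv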
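The following Python sketch applies the same transformation, which may help if your awk lacks interval expressions. It mirrors the awk logic: tags become pipes, runs of pipes collapse, and the output is everything up to and including the third pipe, or an empty line when there are fewer than three. The function names are hypothetical; the file names are the ones used above.

```python
import re

TAG = re.compile(r"<[^>]*>")                 # any HTML tag
PIPES = re.compile(r"\|+")                   # a run of one or more pipes
FIRST_THREE = re.compile(r"(?:[^|]*\|){3}")  # text up to and including the 3rd pipe


def first_three_columns(line: str) -> str:
    """Same result as the awk program for a single line (without its newline)."""
    line = TAG.sub("|", line)
    line = PIPES.sub("|", line)
    m = FIRST_THREE.match(line)
    return m.group(0) if m else ""


def convert(src: str = "out.txt", dst: str = "test.csv") -> None:
    """Apply first_three_columns to every line of src and write the results to dst."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(first_three_columns(line.rstrip("\n")) + "\n")
```

A line that starts with a tag produces an empty first column, just as in the awk version.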



