Project Description:
This technical lexicography project develops an automated parallel alignment tool that speeds up the processing of bilingual transcripts from People's Republic of China (PRC) government press conferences. The system detects and retrieves new press releases from ministry websites, extracts the Chinese and English sentences, and aligns them at the sentence level by semantic similarity, using the LaBSE model combined with customized scoring rules. Because the text is treated as structured data throughout, the end-to-end pipeline generates publishable HTML pages directly from raw transcripts.
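
The alignment step described above can be illustrated with a minimal sketch. It is not the project's actual implementation: the description does not specify the customized scoring rules, so the position penalty below is a hypothetical stand-in, and the names `cosine`, `position_penalty`, `align`, `threshold`, and `weight` are all assumptions. The toy three-dimensional vectors take the place of LaBSE sentence embeddings, which in practice might be computed with something like `SentenceTransformer("sentence-transformers/LaBSE").encode(sentences)` from the sentence-transformers package.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def position_penalty(i, j, n_src, n_tgt, weight=0.2):
    """Hypothetical custom rule: penalize pairs whose relative
    positions in the two transcripts are far apart."""
    rel_i = i / max(n_src - 1, 1)
    rel_j = j / max(n_tgt - 1, 1)
    return weight * abs(rel_i - rel_j)


def align(src_vecs, tgt_vecs, threshold=0.5):
    """Monotonic one-to-one alignment via dynamic programming.

    Maximizes the total adjusted score; sentences on either side
    may be left unaligned. Returns a list of (src_index, tgt_index).
    """
    n, m = len(src_vecs), len(tgt_vecs)
    score = [
        [cosine(src_vecs[i], tgt_vecs[j]) - position_penalty(i, j, n, m)
         for j in range(m)]
        for i in range(n)
    ]

    # best[i][j]: best total score using the first i source
    # and the first j target sentences.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j], best[i][j - 1])
            s = score[i - 1][j - 1]
            if s >= threshold:
                best[i][j] = max(best[i][j], best[i - 1][j - 1] + s)

    # Trace back from the bottom-right corner to recover the pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = score[i - 1][j - 1]
        if s >= threshold and best[i][j] == best[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


# Toy stand-ins for embeddings: three Chinese sentences, two English ones.
zh_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
en_vecs = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9]]

pairs = align(zh_vecs, en_vecs)
print(pairs)  # [(0, 0), (2, 1)]
```

In this toy run the second Chinese sentence has no English counterpart above the threshold and is left unaligned. The dynamic program assumes that both transcripts present their sentences in the same order, which is a simplifying assumption of this sketch rather than something stated in the description.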