Natural Language and Document Processing

Our research group focuses on the intelligent processing of textual and visual data using deep learning methods. We pay special attention to multilingual processing and to multimodality, that is, the ability of models to integrate and evaluate information across different data formats such as text, images, and document structure. The goal of our work is to make the analysis of large-scale textual data more efficient, automate routine tasks related to document understanding, and develop tools that help people and institutions work with information faster, more accurately, and in broader contexts.

Our research activities are primarily concentrated in the following key areas:

Document Classification and Categorization

We develop models that automatically recognize and categorize documents based on their content, structure, or purpose. Whether the input is a legal text, a corporate document, a research record, or a media report, our systems identify the document type and assign the appropriate label or category. This simplifies archiving, retrieval, and further automated processing.

Sentiment and Opinion Analysis

We focus on the automatic detection of emotions, opinions, and attitudes in natural language, whether in reviews, discussions, media, or internal communication. Our models can determine whether a text is positive, negative, or neutral, and whether it conveys trust, dissatisfaction, or sarcasm. Beyond overall sentiment, we also identify and assess specific aspects of content, i.e., the particular properties that are being evaluated. For example, a single restaurant review may praise the food while criticizing the service. These tools are useful in customer support, public opinion monitoring, and brand crisis management.

Historical Document Processing

We combine language technologies, computer vision, and deep learning to improve access to historical documents. Our work covers the full processing chain for scanned printed and handwritten materials, from page segmentation and OCR to transcription and content analysis. We develop tools for automatic summarization and multilingual semantic search that account for historical context and the linguistic specifics of the period.

Medical Document Analysis

We place particular emphasis on the processing of medical data, which is often informal, incomplete, and domain-specific. We develop methods for extracting key information from clinical reports, automatically classifying diagnoses and procedures, and sorting documentation by type and severity. The resulting tools help doctors, researchers, and insurance companies navigate large volumes of data more effectively, without compromising the confidentiality of sensitive data or ethical standards.

Technologies Used

In our work, we use transformer architectures (e.g., BERT, RoBERTa) as well as large language models (e.g., LLaMA, Mistral) for natural language understanding and generation. For tasks involving visual documents, we integrate computer vision methods, including vision-language models such as LayoutLM, Donut, or CLIP, which process text and images jointly. We design custom pipelines tailored to specific languages and domain-specific data (e.g., historical or medical documents). We place strong emphasis on deploying our solutions effectively in real-world environments.

More detailed information is available on the website of our research group: https://nlp.kiv.zcu.cz.