OpenTalks.AI /
19-20 February 2026
Belgrade, Serbia

OPEN CONFERENCE ON
ARTIFICIAL INTELLIGENCE

Timetable
OpenTalks.AI 2026

Preliminary version from January 12, 2026
Belgrade time, GMT+1

Day 1

Thursday, February 19
09:00 – 09:45
Registration and welcome coffee
10:00 – 10:10
Opening of the conference and the first day
Igor Pivovarov, OpenTalks.AI
What will be at the conference: main ideas, figures, highlights.
10:10 – 11:30
Plenary Session 1 - Reviews
Large conference hall
10:10 – 10:50
AI agents - key trends in 2025
Tatyana Shavrina
Meta
10:50 – 11:30
Evolutionary algorithms
Alexander Novikov
DeepMind (UK)
11:30 – 12:00
Break
12:00 – 13:00
Parallel sessions
AI agents
Hall 3 - Business
Hall 2 - Development
Hall 1 - Academy
The section will be announced later
Hall 4
The section will be announced later
LLM research
Overview: LLM Pre-training in 2025
Vladislav Savinov,
Yandex
Details
To be announced
Moderator
To be announced
Moderator
AI scientist
Andrey Ustyuzhanin,
Constructor University
Details
Ilya Makarov,
AIRI
Cooperative AI Agents in Science and Digital Twin of Human Interactions
Details
Over the past year, we've seen a lot of open-source model releases: DeepSeek V3, Kimi K2, Qwen3-Next, and others. These models are now competitive with GPT-5 and Claude on many benchmarks, and the teams behind them have been openly sharing their methods. The papers describe several breakthroughs that change how we think about pre-training.

In this talk, Vladislav will cover the main ideas that emerged in 2025: FP8 training at 600B-parameter scale, new optimizers like Muon that are finally here to challenge AdamW, and other advances in training efficiency and MoE architecture.
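One of the advances mentioned, the Muon optimizer, replaces AdamW's elementwise update with an approximately orthogonalized momentum matrix computed via a Newton-Schulz iteration. A minimal sketch using the classic cubic iteration (Muon's exact tuned coefficients differ; this version is an assumption for illustration):

```python
import numpy as np

def newton_schulz_orthogonalize(g, steps=15):
    """Approximately map a matrix to its nearest orthogonal factor.
    Muon applies an iteration of this kind to the momentum buffer;
    the cubic coefficients here are illustrative, not Muon's exact ones."""
    x = g / np.linalg.norm(g)  # scale so singular values lie in (0, 1]
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x  # drives all singular values to 1
    return x

# Build a well-conditioned toy "gradient" with known singular values
rng = np.random.default_rng(0)
q1, _ = np.linalg.qr(rng.standard_normal((4, 4)))
q2, _ = np.linalg.qr(rng.standard_normal((4, 4)))
g = q1 @ np.diag([1.0, 0.8, 0.6, 0.4]) @ q2

update = newton_schulz_orthogonalize(g)
# The result is near-orthogonal: update @ update.T is close to identity
print(np.allclose(update @ update.T, np.eye(4), atol=1e-3))
```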
The growing complexity of modern scientific problems and the exponential growth of data pose fundamental challenges to traditional research methods. Human cognitive and temporal limitations are becoming a significant factor hindering the pace of discovery. A promising approach to overcoming these barriers is the development of cooperative AI agents. Such systems are designed to autonomously conduct research, enabling deeper and more systematic analysis of complex subject areas.

Our talk will cover our advances in AI agents for science, our participation in the Google DeepMind Concordia NeurIPS'24 challenge on cooperative agents, where we placed in the top 5, and our NeurIPS'24 paper on emotional biases in LLM agents affecting the rationality of decision making.
13:00 – 13:15
Break
13:15 – 14:00
Parallel sessions
AI agents
Hall 3 - Business
Hall 2 - Development
Hall 1 - Academy
The section will be announced later
Hall 4
The section will be announced later
LLM research
A Rigorous Evaluation of LLM Data Generation Strategies for Low-Resource Languages
Tatyana Anikina,
DFKI
Details
To be announced
Moderator
To be announced
Moderator
Alexander Svetkin, Microsoft
Using LLMs to Improve Incident Handling in Azure
Details
Cooperative Agents
Arkady Sandler,
True Click Technologies
Details
Andrey Kuzminykh,
Andre AI Technologies
AI Super Agent: Self-Organizing Multi-Agent System for Complex Task Solving
Details
Large Language Models (LLMs) are increasingly used to generate synthetic textual data for training smaller specialized models. However, a comparison of various generation strategies for low-resource language settings is lacking. While various prompting strategies have been proposed—such as demonstrations, label-based summaries, and self-revision—their comparative effectiveness remains unclear, especially for low-resource languages. In this paper, we systematically evaluate the performance of these generation strategies and their combinations across 11 typologically diverse languages, including several extremely low-resource ones. Using three NLP tasks and four open-source LLMs, we assess downstream model performance on generated versus gold-standard data. Our results show that strategic combinations of generation methods—particularly target-language demonstrations with LLM-based revisions—yield strong performance, narrowing the gap with real data to as little as 5% in some settings. We also find that smart prompting techniques can reduce the advantage of larger LLMs, highlighting efficient generation strategies for synthetic data generation in low-resource scenarios with smaller models.
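The strongest combination reported, target-language demonstrations followed by LLM-based revision, can be sketched as a two-stage prompt pipeline; the template wording and function names below are hypothetical, not the paper's own:

```python
def build_generation_prompt(task, language, demonstrations):
    """Stage 1: generation prompt seeded with target-language
    demonstrations. The template wording is a hypothetical sketch."""
    demo_block = "\n".join(f"- {d}" for d in demonstrations)
    return (
        f"Generate one new {language} example for the task: {task}.\n"
        f"Match the style of these {language} examples:\n{demo_block}"
    )

def build_revision_prompt(language, draft):
    """Stage 2: ask the LLM to revise its own draft for fluency
    and label consistency before the text is used for training."""
    return (
        f"Revise the following {language} text for fluency and label "
        f"consistency, keeping the meaning unchanged:\n{draft}"
    )

# Demonstrations stay in the target language (Kazakh placeholders here)
prompt = build_generation_prompt(
    "topic classification", "Kazakh",
    ["Мысал мәтін 1 ...", "Мысал мәтін 2 ..."],
)
print(prompt.splitlines()[0])
```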

This presentation explores the application of large language models (LLMs) to improve incident response within Microsoft Azure, one of the world's largest cloud platforms. It reviews several AI-driven initiatives, including incident triage and automated mitigation. While more generic solutions such as autogenerated incident summaries improved user experience, measurable reductions in incident mitigation time were primarily achieved through narrowly scoped, team-specific solutions.
Key learnings and outcomes highlight the importance of context and high-quality data, as well as the risks of LLM hallucinations. While the team-specific agentic approach demonstrated impact, successful adoption requires thoughtful implementation and deep integration with existing workflows.
This paper presents AI Super Agent, a self-organizing multi-agent system designed to autonomously decompose, plan, and execute complex tasks across multimodal domains.

At its core lies a Cognitive Core — a unifying control architecture that integrates perception, reasoning, memory, and goal management within a continuous Plan–Execute–Control (PEC) loop. This core dynamically orchestrates Model Context Protocol (MCP) servers, maintaining coherence between reasoning processes, action execution, and long-term memory.

The framework incorporates a Graph-based Memory (GraphRAG) enhanced with Deep Research Algorithms, enabling contextual retrieval, semantic graph reasoning, and iterative knowledge synthesis. An Action Graph Engine represents and manages causal task dependencies, allowing agents to construct, evaluate, and refine strategies in real time.

Through this architecture, AI Super Agent demonstrates the capability to self-organize, spawn specialized sub-agents, and adaptively learn from multimodal feedback. Experimental evaluations in domains such as business process automation, financial analytics, and research intelligence reveal substantial improvements in reasoning depth, task completion rate, and coordination efficiency compared to conventional multi-agent baselines.

Beyond its technical contributions, AI Super Agent establishes a foundation for autonomous cognitive ecosystems — systems capable of co-evolving with human collaborators, enabling scalable problem-solving, continuous discovery, and the expansion of collective intelligence.
14:00 – 15:00
Lunch
15:00 – 16:30
Plenary Session 2 - Reviews
To be announced later
Large conference hall
16:30 – 17:00
Break
17:00 – 18:00
Parallel sessions
Poster session
Hall 3 - Business
Hall 2 - Development
Hall 1 - Academy
ML in business
Hall 4
The section will be announced later
Alexander Rassadin, Severstal
Optimization of computations
Computer Vision for Ore Pass Functioning Control
Details
Lightweight Data Transformations for Effective Lossless Compression of Low-Bit LLMs
Alexander Demidovsky, Huawei, HSE
Details
To be announced
Moderator
One GPU, Hundred Eyes: Real-Time Multi-Camera Analytics for Cargo-Drop Detection on the Edge
Mikhail Krasilnikov,
Bia-technologies
Details
To be announced
Moderator
RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation
Yulia Agafonova,
Details
Daniil Anisimov,
Optic
Precursors, Proxies, and Predictive Models for Long-Horizon Tasks
Details
Vladimir Arkhipkin,
Kandinsky Lab
Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework
Details
Large Language Models (LLMs) are an essential part of modern artificial intelligence systems in the fields of natural language processing, computer vision, and reinforcement learning. However, the rapidly growing complexity of LLMs, characterized by a 450x increase, leads to a widening gap between model size and hardware capacity and to increased communication overhead during distributed training, fine-tuning, and inference. To tackle these issues, leading hardware vendors are adding support for low-bit formats in the new generation of hardware. However, lossy approaches such as quantization and sparsification reduce communication but lead to a significant quality drop. Lossless compression is a promising alternative that reduces data volume without degrading quality, but it demands substantial computational resources. Compressing low-bit values is challenging due to the high entropy of these data and the lack of simple repeating patterns. To address this limitation and increase low-bit data compressibility, we focus on entropy-based data transformations: grouping blocks of data with similar entropy properties and lossless entropy-based data clustering are promising approaches in this field. These transformations are computationally inexpensive and consistently improve lossless compression ratios for low-bit LLM data across multiple state-of-the-art compressors, thereby reducing end-to-end communication overhead in distributed workflows.
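The entropy-grouping transformation can be sketched minimally as reordering fixed-size blocks by their Shannon entropy before compression, keeping the permutation so the transform stays lossless. Block size, compressor (zlib here), and the toy data are assumptions for illustration:

```python
import zlib
import numpy as np

def entropy(block):
    """Shannon entropy (bits/byte) of a byte block."""
    counts = np.bincount(np.frombuffer(block, dtype=np.uint8), minlength=256)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def group_by_entropy(data, block=256):
    """Reorder fixed-size blocks by entropy (a simple stand-in for the
    entropy-based clustering described in the talk) and return the
    permutation so the transform can be inverted exactly."""
    blocks = [data[i:i + block] for i in range(0, len(data), block)]
    order = sorted(range(len(blocks)), key=lambda i: entropy(blocks[i]))
    return b"".join(blocks[i] for i in order), order

def ungroup(transformed, order, block=256):
    """Invert the reordering, restoring the original byte stream."""
    blocks = [transformed[i:i + block]
              for i in range(0, len(transformed), block)]
    original = [None] * len(blocks)
    for pos, i in enumerate(order):
        original[i] = blocks[pos]
    return b"".join(original)

rng = np.random.default_rng(0)
# Interleave low-entropy (small alphabet) and high-entropy blocks,
# mimicking mixed-entropy regions in packed low-bit weights.
low = [rng.integers(0, 4, 256, dtype=np.uint8).tobytes() for _ in range(32)]
high = [rng.integers(0, 256, 256, dtype=np.uint8).tobytes() for _ in range(32)]
data = b"".join(b for pair in zip(low, high) for b in pair)

grouped, order = group_by_entropy(data)
assert ungroup(grouped, order) == data  # the transform is lossless
# Grouping similar-entropy blocks often improves the compression ratio
print(len(zlib.compress(grouped)), len(zlib.compress(data)))
```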
Ore mining is where the metal industry begins, and the quality of metal products depends directly on ore quality. Ore mining is a complex process that still relies on manual control. At Severstal Digital we are aiming for complete digitalization of manufacturing. In this talk we present a solution for intelligent monitoring of ore passes. This technology gives more automated control over mining, prevents production downtime, and increases ore quality. The developed system has proven its economic efficiency and is currently in use at Severstal plants.
This talk presents an end-to-end system for real-time, multi-camera incident analytics that operates under strict latency and resource constraints typical of industrial edge deployments. A key feature is the system’s ability to process 100 live RTSP camera streams simultaneously on a single A100-80GB GPU and to detect a specific incident: cargo thrown by workers or forklifts during loading/unloading operations. The system continuously scans the video stream, locates a 40-frame segment containing a cargo-drop incident, forwards it to an operator for verification, and logs it into a dedicated report for statistical analysis and subsequent decision-making.


We first articulate the practical pain points encountered when ingesting and analysing such streams:
A) A flood of independent camera channels that must be synchronized in real time while respecting network bandwidth;
B) GPU-memory limits;
C) Sparse, non-stationary, and biased data that preclude defining metrics at project kick-off;
D) Camera-drift phenomena that require auxiliary neural networks to re-align sensors;
E) The difficulty of drawing a strict boundary between "good" and "bad" video frame quality;
F) A sub-second human-in-the-loop interaction model whose UI wireframes and feedback pipeline are described in detail.

Second, we propose a neural pipeline that slices incoming frames, classifies them for action presence, extracts regions of interest at tile-level granularity, and applies a channel-separated neural network as an alternative to 3D convolutions for final tile classification, achieving end-to-end latency below 200 ms on a 10 GB MIG instance of a single A100-80GB GPU.

Third, we detail the data-engineering workflow: training a segmentation model on full frames, labelling tile-level objects via a custom web tool, and closing an active-learning loop within a closed corporate network.

Finally, we define and continuously track both business metrics (incident-to-damage regression, false-negative cost, operator NPS) and technical metrics (like frame-drop percentage), while monitoring model drift.

The talk will be of interest both to CV engineers and to business stakeholders who want to demonstrate the economic impact of such a system.
AI agents show remarkable success at various short tasks, and are rapidly improving at longer-horizon tasks, creating a need to evaluate AI capabilities on dangerous tasks which require high autonomy. Evaluations (evals) comprising long-running "real-world" tasks may be the best proxies for predicting general performance, but they are expensive to create, run, and compare to human baselines. Furthermore, these tasks often rely on a large, interwoven set of agent skills, which makes predicting capabilities development difficult. We hypothesize that precursor capabilities including 'persistence', 'dexterity', and 'adaptability' are upstream of robust autonomous performance on long-horizon tasks, and design simple procedurally-generated 'proxy' evals to target these precursors. We then use agent performance on our proxy evals to calibrate a preliminary method of capability prediction on a more complex task: SWE-bench. Our preliminary results show that performance on certain proxy evals can be unusually predictive of performance on other evals. We find that a simple adaptability proxy based on developmental psychology correlates with SWE-bench, and three other proxies correlate with SWE-bench at r > 0.8. A proxy eval which only takes 10 steps is strongly correlated with the performance of many other evals, which otherwise take much longer to terminate (hundreds of steps). For our predictive model, our initial results correctly predict agent scores on SWE-bench, but have large error bars, suggesting that, by testing more models on more synthetic evals, we can quickly and cheaply predict performance on important long-horizon tasks.
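The proxy-to-benchmark analysis reduces to a Pearson correlation plus a simple fit over per-model scores. The numbers below are invented for illustration, not the talk's data:

```python
import numpy as np

# Hypothetical per-model scores: a cheap proxy eval vs. a long-horizon
# benchmark (SWE-bench-like). Values are illustrative only.
proxy = np.array([0.21, 0.35, 0.48, 0.55, 0.70, 0.83])
long_horizon = np.array([0.05, 0.12, 0.22, 0.30, 0.41, 0.55])

# Pearson correlation between the two score series
r = np.corrcoef(proxy, long_horizon)[0, 1]
print(f"Pearson r = {r:.2f}")  # strongly correlated by construction

# A least-squares line gives a crude capability predictor for new models
slope, intercept = np.polyfit(proxy, long_horizon, 1)
predicted = slope * 0.60 + intercept  # predicted score for proxy = 0.60
```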
Text-to-image (T2I) diffusion models are popular for introducing image manipulation methods, such as editing, image fusion, inpainting, etc. At the same time, image-to-video (I2V) and text-to-video (T2V) models are also built on top of T2I models. We present Kandinsky 3, a novel T2I model based on latent diffusion, achieving a high level of quality and photorealism. The key feature of the new architecture is the simplicity and efficiency of its adaptation for many types of generation tasks. We extend the base T2I model for various applications and create a multifunctional generation system that includes text-guided inpainting/outpainting, image fusion, text-image fusion, image variations generation, I2V and T2V generation. We also present a distilled version of the T2I model, achieving inference in 4 steps of the reverse process, 3 times faster than the base model, without reducing image quality. We deployed a user-friendly demo system in which all the features can be tested in the public domain. Additionally, we released the source code and checkpoints for the Kandinsky 3 and extended models. Human evaluations show that Kandinsky 3 demonstrates one of the highest quality scores among open source generation systems.

Text-to-image generation models have gained popularity among users around the world. However, many of these models exhibit a strong bias toward English-speaking cultures, ignoring or misrepresenting the unique characteristics of other language groups, countries, and nationalities. The lack of cultural awareness can reduce generation quality and lead to undesirable consequences such as unintentional insult and the spread of prejudice. In contrast to the field of natural language processing, cultural awareness in computer vision has not been explored as extensively. In this paper, we strive to reduce this gap. We propose the RusCode benchmark for evaluating the quality of text-to-image generation containing elements of the Russian cultural code. To do this, we form a list of 19 categories that best represent the features of Russian visual culture. Our final dataset consists of 1250 text prompts in Russian and their translations into English. The prompts cover a wide range of topics, including complex concepts from art, popular culture, folk traditions, famous people's names, natural objects, scientific achievements, etc. We present the results of a human evaluation of the side-by-side comparison of Russian visual concept representations using popular generative models.

Day 2

Friday, February 20
09:00 – 10:00
Registration
10:00 – 10:10
Opening of the day
Igor Pivovarov, OpenTalks.AI
10:10 – 11:30
Plenary Session 3 - Reviews
Large conference hall
10:10 – 10:50
AI Agents for the Pharmaceutical Industry
Roman Doronin,
EORA
10:50 – 11:30
To be announced
The speaker will be announced later.
11:30 – 12:00
Break
12:00 – 13:00
Parallel sessions
Generative models
Hall 3 - Academy
Hall 2 - Development
Hall 1 - Business
The section will be announced later
Hall 4
The section will be announced later
Predictive Analytics
Thunderstorm Nowcasting: Forecasting Lightnings with 10min Time Discretisation Using Weather Radars and Geostationary Satellites
Petr Vytovtov,
Yandex
Details
To be announced
Moderator
To be announced
Moderator
Distillation of diffusion generative models
Evgeniy Burnaev,
Skoltech
Details
Anton Konushin,
MSU
3D reconstruction into a structured representation (CAD, BIM)
Details
Extreme weather events such as heavy rain, thunderstorms, and hail play a huge role in many areas of human life: aviation, agriculture, everyday life, etc. We decided to focus on thunderstorm nowcasting for two reasons: (1) thunderstorms are often accompanied by heavy rain and hail, so they have a big impact on industry and everyday life, and (2) thunderstorms often develop rapidly, so it is useful and necessary to forecast them with a small time discretisation, which is 10 minutes in our case. We built a vision-transformer-based model that uses real-time data from weather radars and geostationary satellites to forecast areas with a high probability of lightning, alongside precipitation nowcasting. The resulting quality of our model is better than classical approaches to this task, such as numerical weather forecasting and optical flow, in terms of F1-measure and IoU, as well as in visual assessment. The proposed model is integrated into the Yandex Weather service as a production model.
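The F1-measure and IoU used for the comparison can be computed directly on binary lightning-occurrence grids. The masks below are illustrative, not the talk's data:

```python
import numpy as np

def f1_and_iou(pred, truth):
    """F1 and IoU for binary occurrence masks, the metrics used to
    compare the model against numerical forecasts and optical flow."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()   # correctly forecast cells
    fp = np.logical_and(pred, ~truth).sum()  # false alarms
    fn = np.logical_and(~pred, truth).sum()  # missed cells
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 1.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    return f1, iou

# Toy 10-minute-ahead forecast grid vs. observed lightning cells
pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])
f1, iou = f1_and_iou(pred, truth)
print(round(f1, 3), round(iou, 3))  # 0.667 0.5
```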
Learning diffusion bridge models is easy; making them fast and practical is an art. Diffusion bridge models (DBMs) are a promising extension of diffusion models for applications in image-to-image translation. However, like many modern diffusion and flow models, DBMs suffer from slow inference. To address this, we propose a novel distillation technique based on the inverse bridge matching formulation and derive a tractable objective to solve it in practice. Unlike previously developed DBM distillation techniques, the proposed method can distill both conditional and unconditional types of DBMs, distill models into a one-step generator, and use only the corrupted images for training. We evaluate our approach for both conditional and unconditional types of bridge matching on a wide set of setups, including super-resolution, JPEG restoration, sketch-to-image, and other tasks, and show that our distillation technique accelerates the inference of DBMs by 4x to 100x and can even provide better generation quality than the teacher model, depending on the particular setup.
13:00 – 13:15
Break
13:15 – 14:00
Parallel sessions
The section will be announced later
Hall 3 - Academy
Hall 2 - Development
Hall 1 - Business
GEN AI for Business
Hall 4
The section will be announced later
Anna-Veronica Dorogush,
Recraft
The section will be announced later
Building the Creative AI Stack
Details
To be announced
Moderator
To be announced
Moderator
Generative AI is already transforming how designers work — accelerating ideation and opening a way to creating with fewer limitations. Yet today's tools still address only part of the creative process. This talk explores how new design workflows are emerging, what AI already does well, and where it still falls short for creative professionals.

At Recraft, we focus on building models and other tech that give creative professionals full control over their vision. Achieving this means solving some hard technological challenges, which we will also discuss during this talk.
14:00 – 15:00
Lunch
15:00 – 15:45
Reinforcement Learning (topic will be updated)
Ruslan Salakhutdinov,
Carnegie Mellon University
15:45 – 16:00
Break
16:00 – 16:45
Parallel sessions
Reinforcement learning
Hall 3 - Academy
Hall 2 - Development
Hall 1 - Business
The section will be announced later
Hall 4
The section will be announced later
Vision for robots and drones
Scene Graph-driven Spatial Understanding and Reasoning
Dmitry Yudin,
MIPT
Details
To be announced
Moderator
To be announced
Moderator
Alexey Kovalev,
AIRI, MIPT
Vision-Language-Action Models: From Foundation to Future
Details
Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach
Anton Plaksin,
Nebius
Details
Spatial understanding and reasoning is a fundamental challenge in computer vision and artificial intelligence. Scene graphs are structured representations that capture objects and their relationships, providing a powerful framework for this task. In this talk, we will explore how scene graph-driven methods enable robots and autonomous vehicles to interpret complex 3D dynamic scenes, support reasoning about object interactions, and improve performance in tasks such as visual question answering, navigation, and robotic manipulation. The presentation will cover key concepts, recent advances, and real-world applications illustrating how scene graphs bridge perception and reasoning in intelligent systems.
This lecture provides a comprehensive overview of Vision-Language-Action (VLA) models, the cutting-edge systems that connect visual perception and natural language to physical action. We will explore the current state of the art, including their architecture, training methods, and applications in robotics and autonomous systems. The discussion will then shift to the future, addressing key challenges such as safety, generalization, and real-world deployment, and outlining the exciting prospects for truly general-purpose embodied AI.
Robust Reinforcement Learning (RRL) is a promising Reinforcement Learning (RL) paradigm aimed at training models that are robust to uncertainty or disturbances, making them more efficient for real-world applications. In this paradigm, uncertainty or disturbances are interpreted as the actions of a second, adversarial agent, and the problem thus reduces to seeking agent policies that are robust to any opponent's actions. This paper is the first to propose considering RRL problems within positional differential game theory, which helps us obtain theoretically justified intuition for developing a centralized Q-learning approach. Namely, we prove that under Isaacs's condition (sufficiently general for real-world dynamical systems), the same Q-function can be utilized as an approximate solution of both the minimax and maximin Bellman equations. Based on these results, we present the Isaacs Deep Q-Network algorithms and
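The claim that one Q-function can serve both the minimax and maximin Bellman equations has a simple finite analogue: in a zero-sum matrix game with a (pure) saddle point, a discrete stand-in for Isaacs's condition, the minimax and maximin values coincide. A toy check with an illustrative payoff matrix:

```python
import numpy as np

# Zero-sum payoff matrix: the row player maximizes, the column player
# minimizes. Values are illustrative; (row 0, col 0) is a saddle point.
Q = np.array([[3.0, 5.0],
              [1.0, 4.0]])

maximin = Q.min(axis=1).max()  # row player's guaranteed value
minimax = Q.max(axis=0).min()  # column player's guaranteed value
print(maximin, minimax)        # equal when a saddle point exists
```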
16:45 – 17:15
Break
17:15 – 18:30
To be announced
The speaker will be announced later.
18:30 – 18:45
Closing of the conference
Large conference hall