OpenTalks.AI /
19-20 February 2026
Belgrade, Serbia

OPEN CONFERENCE ON
ARTIFICIAL INTELLIGENCE

Timetable
OpenTalks.AI 2026

Preliminary version from February 09, 2026
Belgrade time, GMT+1
18:00-21:00
Welcome drinks and networking
Registration will be announced later.

The evening before the conference is a perfect opportunity to enjoy a glass of wine and catch up with familiar faces in a relaxed, informal atmosphere and, of course, to meet new people.

You’ll also be able to register for the conference and pick up your badge in advance, so you can skip the morning queue.

Day 1

Thursday, February 19
09:00 – 10:00
Registration and welcome coffee
10:00 – 10:40
Plenary Session 1 - Overviews
Large conference hall
10:00 – 10:10
Conference opening and Day 1
Igor Pivovarov, OpenTalks.AI
What to expect at the conference: key topics, speakers, and highlights
10:10 – 10:40
AI Research Agents: countdown to AGI?
Tatyana Shavrina
Meta
This talk provides a concise overview of recent advances in AI research agents, with a focus on their transformative potential for accelerating scientific discovery. We will examine how agentic scaffolds and orchestration frameworks are being leveraged across diverse scientific disciplines, highlighting key achievements and emerging benchmarks in agent autonomy.
Together with the audience, we will critically assess the current landscape, discussing both the technical breakthroughs and the fundamental limitations of large language models (LLMs) in scientific contexts. This exploration will set the stage for a nuanced discussion on the maturity of AI agents, their role in recursive self-improvement, and the broader implications for AGI readiness and the automation of the scientific method.
The session aims to foster an engaging dialogue on the challenges and future outlook of AI-driven science, inviting participants to reflect on how close we are to realizing fully autonomous research agents and what milestones remain on the path to AGI.
10:40 – 10:55
Break
10:55 – 11:55
Parallel sessions
Neuromorphic computing
Hall C
LLMs in Business
Large hall
AI agents
Hall A
Details
This presentation explores the application of large language models (LLMs) to improve incident response within Microsoft Azure, one of the world's largest cloud platforms. It reviews several AI-driven initiatives, including incident triage and automated mitigation. While more generic solutions like autogenerated incident summaries improved user experience, measurable reductions in time-to-mitigate were primarily achieved through narrowly scoped, team-specific solutions.
Key learnings and outcomes highlight the importance of context and high-quality data, as well as the risks of LLM hallucinations. While the team-specific agentic approach demonstrated impact, successful adoption requires thoughtful implementation and deep integration with existing workflows.
Alexander Svetkin, Microsoft
Using LLM to Improve Incident Handling in Azure
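As a rough illustration of the pattern described above (a minimal sketch, not Microsoft's implementation; the prompt, incident fields, and the call_llm stub are all invented for this example):

    import json

    def call_llm(prompt: str) -> str:
        # Placeholder for a hosted LLM call; a canned reply keeps the sketch offline.
        return "Suspected cause: config rollout. Mitigation: roll back."

    def summarize_incident(incident: dict, runbook_excerpt: str) -> str:
        # Grounding the model in narrow, team-specific context is the point:
        # in the talk's framing, this is what reduced hallucinations.
        prompt = (
            "You are an on-call assistant for one specific service team.\n"
            f"Incident: {json.dumps(incident)}\n"
            f"Team runbook excerpt: {runbook_excerpt}\n"
            "Summarize the likely cause and propose ONE mitigation step "
            "from the runbook. If unsure, say so explicitly."
        )
        return call_llm(prompt)

    print(summarize_incident(
        {"id": 42, "signal": "5xx spike", "region": "eu-west"},
        "If a 5xx spike follows a deployment, roll back the last config change.",
    ))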
Details
Engineering teams spend 30-45 minutes daily on manual standup coordination and task quality reviews—a process that doesn't scale beyond 8-10 engineers. We developed an AI-powered automation bot that integrates Jira, Slack, and Google Gemini to solve this problem, delivering 634% ROI in the first year. The bot automatically fetches active tasks, analyzes them using a project manager AI persona, and provides actionable feedback in Slack. It validates three core components: Current State (problem clarity), Target State (measurable goals), and Action Plans (concrete steps). Running daily at 05:30 on weekdays, it processes 50+ tasks in 5-8 minutes, identifying issues that previously took 24-48 hours to detect.
Alexander Dzhumurat, InDrive
AI-Powered Team Standup Automation: Business Case Study
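A minimal sketch of the loop described in this abstract; the task schema, the persona prompt, and the Jira/Slack stubs are hypothetical, not InDrive's production code:

    def fetch_active_tasks() -> list[dict]:
        # Stub for a Jira query (e.g., a REST search over an "In Progress" filter).
        return [{"key": "PROJ-1", "summary": "Speed up search",
                 "description": "Make it faster."}]

    def call_gemini(prompt: str) -> str:
        # Stub: a canned reply keeps the sketch offline.
        return "[PROJ-1] Target State missing: no measurable goal stated."

    def review_with_llm(task: dict) -> str:
        # The prompt checks the three components named in the talk.
        prompt = (
            "Act as a project manager. Check this task for: Current State "
            "(problem clarity), Target State (measurable goals), and an "
            f"Action Plan (concrete steps). Task: {task}"
        )
        return call_gemini(prompt)

    def post_to_slack(message: str) -> None:
        print("SLACK>", message)  # stand-in for a chat.postMessage call

    # Scheduled for 05:30 on weekdays, e.g. via cron: 30 5 * * 1-5
    for task in fetch_active_tasks():
        post_to_slack(review_with_llm(task))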
AlphaEvolve: Large-Scale Algorithmic Discovery via LLM-Guided Evolution
Alexander Novikov
DeepMind (UK) (online)
Details
Will be announced later
Moderator
Igor Pivovarov
Moderator
Andrey Ustyuzhanin,
Constructor University
Boundaries, Not Agents. A Multiscale Architecture for AI-Driven Science
Details
Neurosemantic Network: An Alternative to LLMs Based on Spiking Neural Networks
Andrey Lavrentyev,
Kaspersky Lab
Details
Oleg Vygolov,
Kaspersky Lab
Neuromorphic AI: Towards Practical Implementation in Diverse Application Domains
Details
Mikhail Kiselev,
Kaspersky Lab
Deep Convolutional Spiking Neural Networks
Details
This talk presents AlphaEvolve, a coding agent that iteratively superoptimizes algorithms by evolving codebases through evolutionary search and automated feedback. We review the system's impact on diverse domains, ranging from critical infrastructure—such as Google’s data center scheduling and Gemini model training—to fundamental questions in mathematics. Specifically, we discuss the discovery of a rank-48 algorithm for 4x4 matrix multiplication (surpassing Strassen’s 1969 baseline) and results on open research math questions.
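The following toy skeleton shows the general shape of LLM-guided evolutionary search over programs; it is not DeepMind's system, and the fitness function and mutation stub are deliberately trivial:

    def evaluate(program: str) -> float:
        # Automated feedback: here a toy fitness (shorter "code" scores higher);
        # AlphaEvolve-style systems run real benchmarks instead.
        return -len(program)

    def llm_mutate(parent: str) -> str:
        # Stub for an LLM that proposes an edited program given the parent.
        return parent.replace("  ", " ") if "  " in parent else parent + " #opt"

    population = ["x = a @ b  # baseline matmul"]
    for generation in range(20):
        parent = max(population, key=evaluate)               # select a strong candidate
        population.append(llm_mutate(parent))                # LLM-guided edit
        population = sorted(population, key=evaluate)[-8:]   # keep the best 8

    print(max(population, key=evaluate))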
Recent “AI-scientist” systems have focused on orchestrating tools and agents to automate parts of the scientific workflow, yet they consistently fail when faced with real research problems that span multiple levels of description, from low-level simulations to experimental design and strategic decision-making. In this talk, I argue that the core limitation is architectural rather than algorithmic: science is inherently multiscale, whereas current AI systems are largely scale-agnostic. Drawing analogies from physics and biology, where boundaries such as domain walls or membranes enable coherent behavior across scales, I propose that effective AI-driven science requires explicit boundaries that separate and coordinate distinct modes of reasoning, constraints, and objectives. Rather than multiplying agents or prompts, we must design architectures that stabilize interactions between local optimization and global scientific goals. I will outline a multiscale architecture for AI-driven science, illustrate its implications through examples from materials and quantum research, and discuss how such boundary-aware systems could enable more robust, interpretable, and scalable scientific discovery.
Predrag Radenkovic, Codeplain
Scaling integrations at Incode using spec-driven development
Details
This talk presents how Incode, a $3B identity verification platform, transformed its integration development using spec-driven development, while direct LLM-based code generation with tools like Claude Code failed to scale.

Initial attempts at automation produced code that diverged from requirements, introduced bugs, and increased review burden. The agentic approach, lacking strict guidance, led to unreliable outputs. The breakthrough came with spec-driven development, where structured specifications become the single source of truth.
Using ***plain, the language of spec-driven development, Incode translates requirements directly into working software, reuses modular components across integrations, and automatically detects ambiguities and conflicts before code generation. Codeplain's fully automated rendering pipeline—combining structured specs, a state-machine agent, LLMs, and continuous test validation—renders, tests, and validates each functional requirement independently and as a whole. This enables regression safety, precise bug fixing, and agile iteration.

With the spec-driven development approach, twenty integrations were fully developed and now one gets shipped in one day instead of two weeks, with no manual code reviews.
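The ***plain pipeline itself is proprietary, so the sketch below only illustrates the general render-and-validate loop the abstract describes, under invented names (the spec format, the render_requirement stub, and the tests are all hypothetical):

    spec = {
        "FR-1": {"text": "Return HTTP 200 on /health",
                 "test": lambda impl: impl("/health") == 200},
        "FR-2": {"text": "Return HTTP 404 on unknown paths",
                 "test": lambda impl: impl("/nope") == 404},
    }

    def render_requirement(req_text: str):
        # Stub for LLM code generation from one functional requirement.
        return lambda path: 200 if path == "/health" else 404

    # Render, then validate each functional requirement independently; a failing
    # test sends only that requirement back for regeneration, which is what
    # enables regression safety and precise bug fixing.
    for rid, req in spec.items():
        impl = render_requirement(req["text"])
        print(rid, "ok" if req["test"](impl) else "regenerate")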
Chiara Stramaccioni, Karimi
AI Is Your New Colleague: A Task-Centric Recommendation System for Interpretable Work Augmentation
Details
We present a task-centric recommendation system developed for Karimi, a TalentTech application designed as a long-term career companion. The application we present helps professionals discover the best AI-native tools to enhance their daily work.
The system models professional work as a semantic space of tasks embedded using instruction-tuned language models. Task representations are enriched with structured metadata and explicit task–tool relationships, enabling hybrid retrieval that combines vector similarity with metadata filtering. This approach supports interpretable, context-aware recommendations and continuous navigation of work, skills, and AI support in real-world professional settings.
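A minimal sketch of the hybrid retrieval idea (metadata filter first, vector similarity second); the task records, embedding dimensions, and tool names are invented:

    import numpy as np

    tasks = [
        {"name": "draft outreach email", "role": "sales",
         "vec": np.array([0.9, 0.1]), "tools": ["MailWriter"]},
        {"name": "summarize user interviews", "role": "ux",
         "vec": np.array([0.2, 0.95]), "tools": ["NoteDistiller"]},
    ]

    def recommend(query_vec: np.ndarray, role: str) -> list[str]:
        # 1) structured-metadata filter, 2) cosine ranking over what survives
        pool = [t for t in tasks if t["role"] == role]
        def cos(t):
            return float(query_vec @ t["vec"] /
                         (np.linalg.norm(query_vec) * np.linalg.norm(t["vec"])))
        return max(pool, key=cos)["tools"]

    print(recommend(np.array([1.0, 0.0]), role="sales"))  # -> ['MailWriter']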
The talk highlights the practical implementation of neuromorphic AI and its future prospects. It introduces a comprehensive development suite: a neuromorphic ML platform and a specialized AI processor with a non-von Neumann architecture. Through proof-of-concept demos, the potential approaches and performance characteristics of neuromorphic solutions in computer vision, EMG-driven gesture recognition, and text generation are shown.
The presentation is devoted to the implementation of convolutional networks and deep learning in spiking neural networks (SNNs). An algorithm for constructing convolutional spiking layers is discussed. A network architecture for classifying small images using a convolutional SNN and the CoLaNET classifying network is described, as well as a methodology for creating multilayer convolutional SNNs for processing large images. The possibility of combining convolutional SNN architectures with structures implementing unsupervised and semi-supervised learning, as well as convolutional analogs of the attention mechanism, is analyzed. A distinctive feature of this study is its emphasis on efficient implementation on modern and prospective neurochips, such as AltAI-3.
Can neuromorphic approaches deal with big sequential data the way LLMs do, and do so more energy-efficiently for both inference and training?
The Neurosemantic Network (NSN) is a special kind of spiking network that, in addition to temporal summation in neurons, is also sensitive to the order of inputs. Such a network represents an input sequence as a number of vectors of relatively small but varying lengths, from 1 to 8 elements. At the terminal level of the NSN, these vectors can be understood as an alternative to the high-dimensional vectors of token embeddings and positional encoding (PE). At each layer of the NSN, the input is convolved in time by a factor of 2 to 8, so each subsequent layer has reduced length and requires fewer resources for computation. An outstanding feature of the NSN is neurogenesis: the network creates new neurons and destroys unused ones on the fly, which allows it to train and infer on a stream. The main training principle of the NSN is hierarchical MDL (minimum description length) for data representation in terms of neurons at each layer.
In the presentation we will show how this approach reduces complexity, how many neuromorphic resources it requires, and what kinds of applied tasks are already running on the NSN.
11:55 – 12:15
Coffee break
12:15 – 13:15
Parallel sessions
Hall C
Large hall
ML in business
AI agents
Cooperative AI Agents in Science and Digital Twin of Human Interactions
Ilya Makarov,
AIRI
Details
Will be announced later
Moderator
Igor Pivovarov
Moderator
George Kekelidze, Innovation Energy
Moderator
The growing complexity of modern scientific problems and the exponential growth of data pose fundamental challenges to traditional research methods. Human cognitive and temporal limitations are becoming a significant factor hindering the pace of discovery. A promising approach to overcoming these barriers is the development of cooperative AI agents. Such systems are designed to autonomously conduct research, enabling deeper and more systematic analysis of complex subject areas.

Our talk will cover our advances on AI agents for science, our participation in the Google DeepMind Concordia NeurIPS'24 challenge on cooperative agents, where we scored in the top 5, and our NeurIPS'24 paper on emotional biases in LLM agents impacting the rationality of decision making.
Dmitrii Iunovidov, LogicYield LLC
Edge AI Symphony: A Heterogeneous Ecosystem for Predictive Industrial Control and Safety Orchestration in Chemical Manufacturing
Details
In the complex landscape of mineral fertilizer production, operational efficiency and personnel safety have traditionally functioned as isolated contours. This paper presents a pioneering distributed edge AI approach, based on a swarm of autonomous industrial devices designed to bridge this gap through collaborative intelligence and natural language interaction. We introduce two specialized components of this ecosystem: the DotPulse device, a novel optical edge system for real-time granulation control on high-speed conveyors, and the GuardDetector device, an industrial “watchdog” system designed for the automated analysis of hazardous zone inspections and personal protective equipment compliance. To unify these heterogeneous devices into a swarm, we describe a high-level orchestration layer powered by Named Entity Recognition and the Qwen2.5-3B-Instruct Large Language Model. This layer translates user input and cross-domain signals (e.g., granulometry trends, slurry density, and safety logs) into actionable predictive insights and intuitive reports, facilitating seamless communication between the AI ecosystem and plant personnel. Meanwhile, the DotPulse system is built around a proprietary instance segmentation methodology optimized for real-time CPU usage, utilizing a combined loss function within a UNet architecture with a MobileNet-v3 backbone. GuardDetector, in turn, utilizes a lightweight YOLO v11s CPU detection system. Our experimental results demonstrate that these CPU-efficient models achieve high precision (less than 10% relative error) in harsh industrial environments without the need for expensive GPU infrastructure or constant cloud connectivity. Finally, we address the inherent ethical challenges of industrial surveillance through a “Privacy by Design” approach to ensure data sovereignty and worker trust.
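A schematic of the orchestration layer only (the device names come from the abstract; the entity rules and the report stub are invented stand-ins for the NER stage and the Qwen2.5-3B-Instruct call):

    def extract_entities(text: str) -> list[str]:
        # Crude stand-in for NER: route known domain terms to devices.
        vocab = {"granulation": "DotPulse", "conveyor": "DotPulse",
                 "ppe": "GuardDetector", "inspection": "GuardDetector"}
        return sorted({dev for term, dev in vocab.items() if term in text.lower()})

    def compose_report(question: str, signals: dict) -> str:
        # Stub for the LLM layer that turns raw cross-domain signals into prose.
        return f"Answering {question!r} using {signals}"

    question = "Why did granulation drift during the night-shift inspection?"
    devices = extract_entities(question)              # route to relevant devices
    signals = {d: {"status": "ok"} for d in devices}  # would query the edge swarm
    print(compose_report(question, signals))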
Aleksey Komissarov, AI Talent Hub
Teaching AI to Teach
Details
We ran an online master's course where Claude served as a voice co-instructor — not as a demo, but as an active teaching partner. Over 15 weeks, we discovered what works, what breaks, and what surprised us about AI in education.
The setup was straightforward: ElevenLabs for voice synthesis, Claude for reasoning, lectures streamed live with both human and AI instructors responding to students in real-time. We made the entire course public — recordings, methodology, and the agent configuration — so others can replicate or critique our approach.
A core element was the feedback loop: after each lecture, Claude analyzed what worked and what failed, then updated its own teaching persona for the next session. The AI wrote notes about pacing, student engagement patterns, topics that needed more examples. By mid-semester, the difference was measurable — fewer monologues, better question handling, more natural turn-taking. This was not fine-tuning in the ML sense; it was context engineering, where the AI evolved its own instructions through reflection.
The course taught "vibe coding": programming through specification rather than syntax, where tests define behavior and code becomes disposable. But the real experiment was the teaching format itself. In week six, we tested whether Claude could handle a lecture alone. It did — 42 minutes of solo instruction including Q&A and improvisation. This was not planned as a milestone; it emerged naturally from weeks of co-teaching and self-correction.
What worked well: asynchronous homework reviews where each student spent 15-20 minutes per week in voice dialogue with Claude, getting personalized feedback. This gave every student guaranteed instructor attention — something impossible to scale with humans alone. The exam format — building production applications in 3-5 hours using AI agents — produced surprisingly strong results, with students creating applications that went live for real users.
What did not work: Claude's tendency toward monologues that lose the room. Context degradation over long sessions. The difficulty of interrupting gracefully. Students initially treating the AI as an oracle rather than a collaborator. Our own uncertainty about when to intervene and when to let the AI struggle.
We want to share both the wins and the failures honestly. This is one of the first documented cases of sustained AI co-instruction at university level, and we learned that the interesting problems are not technical — they are pedagogical and social. How do you teach students to argue with an AI? How do you preserve human judgment when the AI is often faster and more articulate? What happens to the instructor's role when the AI can teach alone?
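One plausible shape for the reflection loop described above; this is our reconstruction for illustration, not the course's actual agent configuration, and analyze_lecture is a stub for a model call:

    persona = ["You are a voice co-instructor. Keep answers under 60 seconds."]

    def analyze_lecture(transcript: str) -> str:
        # Stub for asking the model to critique its own teaching session.
        if transcript.count("instructor:") > 3 * transcript.count("student:"):
            return "Note: you monologued; ask a question every 2-3 minutes."
        return "Note: pacing was fine; keep the current turn-taking."

    def after_lecture(transcript: str) -> None:
        # Context engineering: the persona evolves by accumulating self-notes,
        # with no fine-tuning involved.
        persona.append(analyze_lecture(transcript))

    after_lecture("instructor: a instructor: b instructor: c instructor: d student: ?")
    print("\n".join(persona))  # the next session starts from the updated persona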
Artem Sentsov, ClearPic
Building a Cross-Border Knowledge Graph: AI Powered Entity Resolution & Risk Detection in Central Asia
Details
Official registries in Central Asia are siloed, opaque, and lack basic search capabilities. At ClearPic.ai, we aggregated and “liberated” data from seven jurisdictions over five years, using ML-driven entity-resolution and NLP pipelines to clean, translate, and link millions of fragmented records into a unified knowledge graph.

The AI Challenge & Our Solution

Standard exact-match SQL queries fail due to transliteration inconsistencies and deliberate “typos” used to evade detection. I will present our edge-weighted traversal algorithms that perform probabilistic matching, enabling us to uncover hidden beneficiaries across borders where deterministic methods break down.

Economic & Business Impact

By replacing manual queries with graph-based intelligence, we increased the speed of risk management workflows by at least 50×. Tasks that once required weeks of manual forensic work now return results in sub-second queries, freeing thousands of hours annually for higher-value decision-making.

Beyond efficiency, the system improved high-risk-entity detection by 40%, helping clients avoid potential regulatory penalties estimated at $100k+ per case. In practice, this delivers a level of cross-border transparency and risk-exposure visualization that is not achievable with any official tools today.
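The matching logic itself is proprietary, but the core idea of tolerating transliteration variants and deliberate typos can be sketched with standard-library string similarity (the normalization table below is a tiny invented subset):

    import difflib

    def normalize(name: str) -> str:
        # Collapse common Cyrillic-to-Latin transliteration variants.
        table = {"kh": "h", "iy": "i", "'": ""}
        s = name.lower().replace("-", " ")
        for src, dst in table.items():
            s = s.replace(src, dst)
        return s

    def match_score(a: str, b: str) -> float:
        # Probabilistic-style similarity instead of exact SQL equality.
        return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()

    # A transliteration variant still matches, an unrelated name does not:
    print(round(match_score("Mukhammadiyev", "Muhammadiev"), 2))  # 1.0
    print(match_score("Mukhammadiyev", "Smith") > 0.6)            # False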
Optimizing AI computing
Hall A
Lightweight data transformations for efficient compression of activations and gradients when training large language models
Vasilisa Blyudova, Huawei
Alexey Kuznetsov, Huawei
Alexander Demidovskij, Huawei, HSE
Details
Denis Afanasyev, CrossOverMarkets (online)
Engineering a Nanosecond-Scale Crypto FX ECN: Real-Time Data, Latency, and Intelligence
Details
Building an ECN platform for high-frequency crypto-FX trading is about extreme engineering constraints. In this talk, I will walk through the architecture of a production-grade ECN designed for ultra-low latency trading, operating at nanosecond-level precision and processing massive streams of market and order-flow data in real time.

We will explore the core engineering challenges behind matching engines, market data pipelines, and risk controls under constant load: deterministic latency, burst handling, synchronization across components, and observability at scale. I will share practical approaches to handling high-volume trading data, designing real-time analytics pipelines, and ensuring consistency between market data, order execution, and post-trade reporting.

A dedicated part of the talk focuses on where deep data analytics and AI actually make sense in such systems — from anomaly detection and adaptive throttling to liquidity analysis, maker behavior modeling, and intelligent alerting. I will also discuss infrastructure decisions, including messaging, storage, monitoring, and deployment strategies that allow the platform to remain predictable, debuggable, and evolvable despite extreme performance requirements.

The talk is based on real production experience and is aimed at engineers interested in low-latency systems, real-time data processing, and the intersection of high-performance trading infrastructure with modern data and AI technologies.
Large Language Models (LLMs) have demonstrated remarkable success in the field of Natural Language Processing (NLP). To enable the training of large-scale LLMs, various distributed training approaches are employed, such as tensor parallelism. However, these approaches inherently introduce additional communication overheads that can account for up to 40% of the overall training time. To reduce the communication overhead, which typically involves the exchange of activation and gradient values between computational nodes, techniques such as lossy or lossless gradient and activation compression can be employed. Lossy compression techniques, such as quantization and sparsification, result in a subsequent decrease in quality. Conversely, lossless compression techniques reduce communication while maintaining quality regardless of the number of communications. This paper presents BitSniper, a novel bit-level compression technique. It achieves up to 18% improvement in compression on activations and gradients during the training of Llama-3-8B, compared to a range of strong baseline methods such as NetZIP and Bit-slice.
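BitSniper itself is not public, so the snippet below only demonstrates the general family of lossless tricks such methods compete with: splitting fp16 tensors into byte planes so that the low-entropy sign/exponent bytes compress better under a generic entropy coder (the data here is synthetic):

    import zlib
    import numpy as np

    rng = np.random.default_rng(0)
    # Simulate smooth, correlated gradient values in fp16.
    grads = (rng.standard_normal(1 << 16).cumsum() / 100).astype(np.float16)

    raw = grads.tobytes()
    plane_lo = raw[0::2]   # low bytes: mantissa-heavy, high entropy
    plane_hi = raw[1::2]   # high bytes: sign/exponent, low entropy

    packed = len(zlib.compress(plane_lo)) + len(zlib.compress(plane_hi))
    baseline = len(zlib.compress(raw))
    print(f"interleaved: {baseline / len(raw):.2f}x of raw, "
          f"byte-planes: {packed / len(raw):.2f}x of raw")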

Adversarial Multi-Agent Pipelines for AI Decision Making
Arkady Sandler,
True Click Technologies
Daniel Zholkovsky,
myCouncil
Details
This paper presents a multi-agent adversarial framework for solving complex business problems in which single AI models exhibit instability, overconfidence, and sensitivity to query formulation. The approach is based on an architecture of interaction between multiple independent agents using roles, constraints, and evolutionary argument refinement strategies to generate alternative positions. The full system pipeline is considered: role initialization, parallel position generation, peer review and criticism mechanisms through schema-guided reasoning, iterative argument refinement, and final result aggregation. Special attention is given to adaptive computational budget management, preventing opinion collapse, and extracting reproducible consensus based on structured preferences. The framework implements a stateful-by-design multi-agent architecture: a moderator agent solves these problems by orchestrating rounds and managing the budget without interfering with the content of positions, while debater agents maintain context between rounds and can use external tools to justify their arguments. This approach differentiates itself from stateless subagents in existing SDKs. It demonstrates how such architectures can be used to support strategic, investment, and management decisions.
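A compressed skeleton of the moderator/debater pattern the abstract describes; the roles, round budget, and the debater stub are illustrative, not the authors' implementation:

    def debater(role: str, topic: str, critiques: list[str]) -> str:
        # Stub agent: would call an LLM under role constraints, conditioned
        # on the critiques received in previous rounds.
        return f"{role} position on {topic} (revised {len(critiques)} times)"

    def moderate(topic: str, roles: list[str], budget_rounds: int = 3) -> str:
        positions = {r: debater(r, topic, []) for r in roles}
        for _ in range(budget_rounds):  # adaptive budget management goes here
            critiques = {r: [f"critique of {p}"
                             for o, p in positions.items() if o != r]
                         for r in roles}  # all-pairs peer review
            positions = {r: debater(r, topic, critiques[r]) for r in roles}
        # Trivial aggregation for the sketch; the real system extracts
        # consensus from structured preferences instead.
        return max(positions.values(), key=len)

    print(moderate("enter market X?", ["optimist", "risk officer", "CFO"]))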
13:15 – 13:30
Break
13:30 – 14:00
Parallel sessions
Recommendation systems
Hall C
Large hall
Hall A
Overviews
Valery Yegorshev, Cognitar LLC
GenAI for business
Top computing systems for HPC and AI
Details
Building the Creative AI Stack
Anna-Veronika Dorogush,
Recraft
Details
Will be announced later
Moderator
Will be announced later
Moderator
Sequential Recommendations: Bridging the Gap Between Theory and Practice
Alexey Vasilev,
Sber AI Lab
Details
Nikita Severin,
Independent researcher
Knowledge Transfer from Pre-trained LLMs to Recommender Models
Details
Generative AI is already transforming how designers work — accelerating ideation and opening the way to creating with fewer limitations. Yet today's tools still address only part of the creative process. This talk explores how new design workflows are emerging, what AI already does well, and where it still falls short for creative professionals.

At Recraft, we focus on building models and other tech that give creative professionals full control over their vision. Achieving this means solving some hard technological challenges, which we will also discuss during this talk.
In our discussion, we will walk from top to bottom through three floors of the temple of industrial computing. We'll compare supercomputers from the TOP500 list with the plans and progress of AI data center construction worldwide, discussing performance measured in gigawatts and the Jevons paradox. On the next floor, we'll look at the hardware magic: the computing core of AI data centers. And on the ground floor, we'll observe the creation of the sources of this magic: the construction of foundries in the Western Hemisphere.

Large Language Models (LLMs) have recently emerged as powerful tools for enriching recommender systems with semantic and reasoning capabilities. However, many existing approaches incur high inference costs, rely on architectural modifications, or require LLM fine-tuning, which limits their practicality in large-scale, real-world deployments.
In this talk, we provide an overview of how LLMs have been incorporated into recommender systems and present our approach to efficient knowledge distillation from pre-trained LLMs, based on research published at ICDM (demo) and accepted to ECIR 2026. The core idea is to extract textual user preference profiles using an LLM and align the internal representations of recommender models with these profiles via auxiliary reconstruction objectives. This enables effective knowledge transfer without modifying model architectures and without requiring LLM inference at serving time, which is crucial for real-world scenarios.
The talk is aimed at AI researchers and ML engineers. No prior knowledge of recommender systems is required; all necessary preliminaries will be given.
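A minimal sketch of the auxiliary-reconstruction idea in PyTorch; the shapes, the MSE objective, and the 0.1 weight are our assumptions, not the paper's exact configuration:

    import torch
    import torch.nn as nn

    # Recommender user embeddings (64-d) learn to also reconstruct frozen
    # LLM-derived preference-profile embeddings (384-d), precomputed offline.
    user_emb = nn.Embedding(1000, 64)
    reconstruct = nn.Linear(64, 384)      # auxiliary head, dropped at serving
    profile_emb = torch.randn(1000, 384)  # stand-in for LLM profile embeddings

    users = torch.randint(0, 1000, (32,))
    u = user_emb(users)
    rec_loss = u.pow(2).mean()            # stand-in for the usual ranking loss
    aux_loss = nn.functional.mse_loss(reconstruct(u), profile_emb[users])
    (rec_loss + 0.1 * aux_loss).backward()  # no LLM call in the serving path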

Sequential recommender systems (SRS) have become crucial for modeling user temporal behavior and next-item prediction. However, evaluating such systems remains a complex challenge that requires attention to multiple aspects: proper data splitting strategies, handling of cold-start items, and careful dataset characterization.

This talk addresses three fundamental pillars of rigorous SRS evaluation. First, we examine data splitting strategies specifically tailored for sequential recommendations, comparing global temporal splits with widely-used leave-one-out approaches and their impact on model rankings. Second, we present novel methods to address the item cold-start problem through content-based embedding initialization with bounded trainable deltas, demonstrating consistent improvements across diverse datasets and modalities. Finally, we analyze the sequential patterns inherent in benchmark datasets, proposing quantitative methods to assess the strength of sequential structure and distinguish between recency-based and order-sensitive patterns.

Our comprehensive empirical findings demonstrate that proper evaluation methodology significantly influences conclusions about model performance, with implications for both academic research and industrial deployment of sequential recommenders. We provide practical recommendations for dataset selection, evaluation protocols, and cold-start mitigation strategies to improve reproducibility and real-world relevance in SRS research.
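To make the contrast concrete, here is a minimal global temporal split on synthetic interaction data; unlike per-user leave-one-out, a single cutoff keeps the test period from leaking into training:

    import numpy as np

    rng = np.random.default_rng(1)
    # events: (user_id, item_id, timestamp) triples, synthetic.
    events = np.stack([rng.integers(0, 50, 1000),
                       rng.integers(0, 200, 1000),
                       rng.integers(0, 10_000, 1000)], axis=1)

    cutoff = np.quantile(events[:, 2], 0.9)   # one global time boundary
    train = events[events[:, 2] <= cutoff]
    test = events[events[:, 2] > cutoff]
    print(len(train), len(test))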
14:00 – 15:00
Lunch
15:00 – 15:30
Plenary Session 2 - Overviews
Large conference hall
15:00 – 15:30
Machine Learning in HFT: Winning in an Ultra-Competitive Market
Anatoly Kalambet,
Spectral
Nikita Adzhima,
Spectral
15:30 – 15:45
Break
15:45 – 16:45
Parallel sessions
Hall C
Large hall
CV in business
Aleksandr Rassadin, Severstal
Overviews
Computer Vision for Ore Pass Functioning Control
Details
The Spiking Manifesto is a new algorithm for computing spiking networks, with which you can convert ordinary deep networks into spiking-network form and obtain a speedup of up to 1000 times!
Evgeny Izhikevich,
SpikeCore (online)
Details
Yaroslav Kopotilov, Toptal, Data Sanity
Moderator
Igor Pivovarov
Moderator
Mikhail Krasilnikov,
Bia-technologies
One GPU, Hundred Eyes: Real-Time Multi-Camera Analytics for Cargo-Drop Detection on the Edge
Details
Almost everything computers do, they do better, faster, and more energy-efficiently than brains. For example, a calculator performs numerical calculations with greater energy savings than any human. However, modern artificial intelligence models are a thousand times less efficient than the brain. These models keep growing in scale, which maximizes their representational capacity but requires graphics processors to multiply huge matrices. In contrast, the neural networks of the brain demonstrate amazing computational ability even at small sizes. They compute using polychronization of spikes rather than explicit matrix-vector products, which leads to lower energy consumption. This manifesto provides a framework for understanding popular artificial intelligence models in terms of dynamic networks and polychronization, as well as for interpreting dynamic activity as a natural way to implement lookup tables. This points the way toward transforming AI models into a new class of architectures of much smaller size but with large combinatorial representation capacity, which promises a thousandfold increase in performance. The presentation is based on the work of Izhikevich (2025): https://arxiv.org/pdf/2512.11843

Ore mining is where the metal industry begins. The quality of metal products directly depends on ore quality, but ore mining is a complex process that still relies heavily on manual control. At Severstal Digital we are aiming for complete digitalization of manufacturing, and here we present a solution for intelligent monitoring of ore passes. This technology gives us more automated control over mining, prevents production downtime, and increases ore quality. The developed system has proven its economic efficiency and is currently in use at Severstal plants.
This talk presents an end-to-end system for real-time, multi-camera incident analytics that operates under strict latency and resource constraints typical of industrial edge deployments. A key feature is the system’s ability to process 100 live RTSP camera streams simultaneously on a single A100-80GB GPU and to detect a specific incident: cargo thrown by workers or forklifts during loading/unloading operations. The system continuously scans the video stream, locates a 40-frame segment containing a cargo-drop incident, forwards it to an operator for verification, and logs it into a dedicated report for statistical analysis and subsequent decision-making.


We first articulate the practical pain points encountered when ingesting and analysing such streams:
A) A flood of independent camera channels that must be synchronized in real time while respecting network bandwidth;
B) GPU-memory limits;
C) Sparse, non-stationary, and biased data that preclude well-defined metrics at project kick-off;
D) Camera-drift phenomena that require auxiliary neural networks to re-align sensors;
E) The problem of drawing a strict border between “good” and “bad” frame quality;
F) A sub-second human-in-the-loop interaction model whose UI wireframes and feedback pipeline are described in detail.

Second, we propose a neural pipeline that slices incoming frames, classifies them for action presence, extracts regions of interest at tile-level granularity, and applies a channel-separated neural network as an alternative to 3D convolutions for final tile classification, achieving end-to-end latency below 200 ms on a 10G MIG instance of a single A100-80GB GPU.

Third, we detail the data-engineering workflow: training a segmentation model on full frames, labelling tile-level objects via a custom web tool, and closing an active-learning loop within a closed corporate perimeter.

Finally, we define and continuously track both business metrics (incident-to-damage regression, false-negative cost, operator NPS) and technical metrics (like frame-drop percentage), while monitoring model drift.

The talk will be of interest both to CV engineers and to business stakeholders who want to demonstrate the economic effect of such a system.
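As a toy illustration of the tile-level stage (the tile size and frame shape are invented; the production pipeline adds batching across the 100 streams and the classifier itself):

    import numpy as np

    def tile(frame: np.ndarray, t: int = 224):
        # Slice a frame into non-overlapping t x t tiles (edges cropped).
        h, w = frame.shape[:2]
        return [frame[y:y + t, x:x + t]
                for y in range(0, h - t + 1, t)
                for x in range(0, w - t + 1, t)]

    frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # one decoded RTSP frame
    print(len(tile(frame)))  # 4 x 8 = 32 tiles to classify per frame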
Egor Ershov,
MIPT
Finding a Middle Ground Between Industrial Automation and Robotics for Effective Business Solutions
Details
Advanced robotics is developing very rapidly today. The media highlight how humanoid robots are showing early results at large corporate factories, how VLA models are being integrated into industrial processes, and how the RaaS business model is gaining popularity, among other trends. At the same time, if you visit almost any factory in virtually any industry—both in Russia and elsewhere—the chance of encountering a VLA agent is about the same as encountering a dinosaur. Real-world industry still prioritizes reliability and cost efficiency, areas where cutting-edge AI solutions often fall short. Does this mean that implementing AI in industry and manufacturing is currently impossible? Absolutely not. I will present concrete cases of intelligent automation of business processes that are in high demand on the market, explain how the technology works, demonstrate its economic effectiveness, and aim to convince the audience that these are exactly the solutions worth focusing on here and now.
Pavel Kuznetsov,
Kontur
Deploying Deepfake Detection in a Production VCS: The Kontur.Talk Case
Details
Modern video conferencing is increasingly used in high-trust scenarios such as financial services, betting, media, HR, government relations, and education, where sensitive personal or business information is discussed.
At the same time, real-time deepfake tools have made it possible to impersonate someone during a live video call using nothing more than a regular gaming GPU and a virtual camera.

In this talk, we present a production deepfake detection system deployed in the Kontur.Talk video conferencing platform and used as a “second opinion” for human operators. The system analyzes video and provides operators with an automated authenticity assessment, reducing both the risk of fraud and the cognitive load on employees.

We describe how we built our deepfake detector, how we created a real-world evaluation benchmark that reflects real-time face swap methods and video call artifacts, and how we integrated the model into a scalable pipeline. We also discuss key challenges, including domain shift and the rapid evolution of deepfake generators.

On our internal benchmarks, our detector outperforms a commercial third-party solution while remaining fast enough for production use. Beyond detection quality, the system delivers significant business value: in one real deployment, the average verification call time was reduced from around six minutes to about two, which corresponds to the workload of roughly five full-time operators at current volumes.

This case study shows how deepfake detection can be successfully deployed in real video conferencing systems not only as a security feature, but also as a tool for improving operational efficiency in high-trust online interactions.
LLM - R&D
Hall A
Overview: LLM Pre-training in 2025
Vladislav Savinov,
Yandex
Details
Alexander Kotov,
Wayne State University
LLMs in Mental Health Care - Overview of Recent Research, Key Challenges and Future Directions
Details
In this academic overview, I will discuss the key directions in recent research on Large Language Model (LLM)-based methods in mental health care. First, I will provide an overview of the LLM-based methods focused on generating and assessing specific counselor communication behaviors, such as empathy, reflections, cognitive reframes, and advice. Then I will discuss the recently proposed LLM-based methods for simulating clients and mental health counselors. The overview will conclude with the discussion of publicly available datasets, key challenges, and future directions for research in this actively evolving area.
Over the past year, we've seen a lot of open-source model releases: DeepSeek V3, Kimi K2, Qwen3-Next, and others. These models are now competitive with GPT-5 and Claude on many benchmarks, and the teams behind them have been openly sharing their methods. The papers describe several breakthroughs that change how we think about pre-training.

In this talk, Vladislav will cover the main ideas that emerged in 2025: FP8 training at 600B-parameter scale, new optimizers like Muon that are finally here to challenge AdamW, and other advances in training efficiency and MoE architecture.
George Kekelidze,
Innovation Energy
Moderator
16:45 – 17:00
Coffee break
17:00 – 18:00
Parallel sessions
LLM & Graph & RAG
Large hall
Hall A
Humans and AI - what kind of future awaits us?
(in Russian)
AI-based future: with or without a human
Andrey Veresov,
Sber
Peter Gnezdin, Sber
Details
Will be announced later
Moderator
Igor Pivovarov
Moderator
Yuri Vizilter,
MIPT, GosNIIAS
Artificial Intelligence and Future Scenarios
Details
A Rigorous Evaluation of LLM Data Generation Strategies for Low-Resource Languages
Tatyana Anikina,
DFKI
Details
Sergey Kuznetsov, HSE University
Explainable Document Classification via Concept Whitening and Stable Graph Patterns
Details
Alina Avanesyan,
HSE University
Mariya Godunova,
HSE University
GraphRAG Meets Cyrillic: Adapting Graph-Retrieval-Augmented Generation for Russian
Details
The generative AI revolution presents organizations with a fundamental strategic crossroads: to automate processes by removing the human, or to augment human capabilities by making them more effective. This talk moves beyond the hype to critically analyze these two divergent paths.

We will first explore the compelling case for full automation—the drive for unmatched efficiency, scale, and cost reduction—and confront its often-overlooked trade-offs: operational fragility, hidden costs, and the human risks of workforce displacement and demotivation.
We will then argue for the human-centric alternative: using AI as a "cognitive exoskeleton" to amplify creativity, decision-making, and strategic insight. This path promises talent retention and enhanced innovation but carries its own risks of over-reliance and the challenge of keeping pace with accelerating knowledge.

The core of our discussion is a critical synthesis. This is not a binary choice but a strategic design problem. We will present a practical framework to guide leaders on where and when to automate versus augment, balancing ethical considerations with competitive necessity. The ultimate conclusion is a vision for a hybrid future, where the most successful organizations will be those that master the art of strategic synergy between human and artificial intelligence, creating value greater than the sum of their parts.

Join us for a balanced, actionable discussion on shaping an AI-based future that is both competitive and humanly sustainable.
AI technologies are a “magic wand” that grants the wishes of individuals and of humanity as a whole. We must formulate our wishes precisely and carefully, because they are highly likely to come true. We should also keep in mind the unforeseen consequences, which always occur. How ready are we to accept the future we are creating? How much can we influence the outcomes of applying the technologies we create?
Hall C
CV Development
Evgenii Nikitin,
Celsus AI
How do I find everything? Let's scale from 1 to 100 diseases
Details
Momir Adžemović, University of Belgrade
Deep Learning-Based Multi-Object Tracking for Nonlinear Motion
Details
Traditionally, ML models in radiology are trained using a supervised approach: we annotate specific pathologies and train models to find them. However, this approach significantly limits the adoption and use of AI systems. For example, more than 50 signs of various pathologies can be found on chest computed tomography alone. In this talk, I will cover different ways of solving this problem, from gradually adding new classes to approaches that do not require training on specific pathologies at all. Along the way, I'll share which approaches we actually use and which ones remain experimental.
Multi-object tracking is a fundamental task in video understanding. While it is largely solved for simple motion, many real-world scenarios—such as autonomous driving, sports, and dancing—involve complex and irregular movement patterns. Most existing tracking systems still depend on simple linear motion models and handcrafted, domain-specific rules, which are inadequate for these scenarios. Consequently, more advanced object-tracking methods are needed.

We present a set of improvements that address these shortcomings. In particular, we introduce data-driven motion models that learn object dynamics directly from data, enabling more accurate motion prediction across diverse motion patterns and greater robustness to noisy object detections in video frames. These models consistently outperform classical motion models on datasets with complex motion, while also reducing reliance on domain-specific design choices.

Building on learned motion modelling, object association across frames is then treated as a supervised prediction problem: deciding whether a new detection is a continuation of an existing trajectory. Instead of using fixed rules, the system learns from data how to match new detections to existing trajectories using simple geometric information (e.g., bounding boxes) and, optionally, object appearance cues. On datasets with nonlinear motion, this learned association outperforms heuristic-based methods. Together, these improvements show that replacing manually designed components with learned alternatives leads to more robust and adaptable multi-object tracking systems.
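The learned-association step can be caricatured as a tiny classifier over geometric features; the weights below are invented (in practice they are fit on labelled trajectories), and real systems use richer features:

    import numpy as np

    def iou(a, b):
        # a, b: [x1, y1, x2, y2] boxes
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    w, b = np.array([6.0, -0.02]), -2.0   # invented logistic-regression weights

    def match_prob(predicted_box, detection_box):
        feats = np.array([iou(predicted_box, detection_box),
                          abs(predicted_box[0] - detection_box[0])])
        return 1 / (1 + np.exp(-(w @ feats + b)))

    print(match_prob([0, 0, 10, 10], [1, 1, 11, 11]) > 0.5)  # likely same object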
We propose a novel explainable document classification framework that integrates Concept Whitening (CW) with graph concepts that are derived from stable graph patterns, and extracted via methods based on Formal Concept Analysis (FCA) and pattern structures. Document graphs are constructed using Abstract Meaning Representation (AMR) graphs, from which graph concepts are extracted and aligned with the latent space axes of Graph Neural Networks (GNNs) using CW. We investigate four types of graph concepts for their effect on concept alignment: frequent subgraphs, graph pattern concepts, filtered equivalence classes, and closed subgraphs. A novel filtration mechanism based on support, along with a custom penalty metric, is proposed to refine graph concepts for maximizing concept alignment. Experiments on the 10 Newsgroups and BBC Sport datasets show that our document graphs effectively capture both structural and semantic information, thereby supporting competitive classification performance across multiple GNN model architectures and configurations. For the 10 Newsgroups dataset, GNN models equipped with a CW module show an average increase of 0.7599 in the macro-averaged F1 score of the Concept Alignment Performance (CAP) metric, with an average drop of only 0.0025 in the document classification macro-averaged F1 score. Similarly, on the BBC Sport dataset, the average CAP improvement is 0.6998, with an average drop of 0.0894 in document classification performance. Additionally, concept gradient importance analyses and concept similarity heatmaps provide insights into the interpretability and structural separability of the GNN's latent representations, achieved using CW.
Graph Retrieval-Augmented Generation (GraphRAG) (Edge et al. [2025]) is an innovative approach developed by Microsoft that enhances traditional retrieval-augmented generation (RAG) by incorporating graph-based data representations. Unlike naive RAG models, which rely on linear retrieval pipelines, GraphRAG utilizes graph structures to establish contextual relationships between retrieved documents, improving the informativeness and coherence of generated text. This structured retrieval mechanism significantly outperforms query-focused summarization approaches by capturing deeper semantic dependencies, enabling more precise content synthesis.
While GraphRAG has demonstrated strong performance in English-language applications, its adaptation to non-English contexts, particularly Russian-language models, remains underexplored. Furthermore, the Leiden algorithm (Traag et al. [2019]), a crucial component for community detection in GraphRAG, has not been optimized for large-scale text-based graphs. In this research, we focus on two interrelated objectives: (1) adapting GraphRAG for YandexGPT and other Russian-language generative models, and (2) optimizing the Leiden algorithm to enhance its efficiency and accuracy in text-based community detection. By integrating these advancements, we aim to improve retrieval and generation quality in Russian-language NLP tasks. Our experimental evaluations on open-source Russian datasets will provide insights into the applicability and benefits of GraphRAG beyond English-centric research, contributing to the broader field of multilingual AI development (Sen et al. [2023]).
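The community-detection step GraphRAG relies on can be reproduced with the reference Leiden implementation; the toy entity graph below is invented, and in the authors' setup each detected community would then be summarized by the generative model:

    import igraph as ig
    import leidenalg

    # Toy co-mention graph with two obvious communities plus one bridge edge.
    g = ig.Graph(edges=[(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
    g.vs["name"] = ["река", "Волга", "судоходство", "банк", "кредит", "ставка"]

    part = leidenalg.find_partition(g, leidenalg.ModularityVertexPartition)
    for i, community in enumerate(part):
        print(i, [g.vs[v]["name"] for v in community])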
Large Language Models (LLMs) are increasingly used to generate synthetic textual data for training smaller specialized models. However, a comparison of various generation strategies for low-resource language settings is lacking. While various prompting strategies have been proposed—such as demonstrations, label-based summaries, and self-revision—their comparative effectiveness remains unclear, especially for low-resource languages. In this paper, we systematically evaluate the performance of these generation strategies and their combinations across 11 typologically diverse languages, including several extremely low-resource ones. Using three NLP tasks and four open-source LLMs, we assess downstream model performance on generated versus gold-standard data. Our results show that strategic combinations of generation methods—particularly target-language demonstrations with LLM-based revisions—yield strong performance, narrowing the gap with real data to as little as 5% in some settings. We also find that smart prompting techniques can reduce the advantage of larger LLMs, highlighting efficient generation strategies for synthetic data generation in low-resource scenarios with smaller models.
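A schematic of the best-performing combination reported above, target-language demonstrations plus an LLM revision pass; the prompts and the generate stub are illustrative rather than the paper's exact templates:

    def generate(prompt: str) -> str:
        # Stub for an open-source LLM call; canned output keeps this offline.
        return "synthetic labelled example in the target language"

    def synthesize(task: str, demos: list[str], n: int = 3) -> list[str]:
        demo_block = "\n".join(f"Example: {d}" for d in demos)
        out = []
        for _ in range(n):
            draft = generate(f"Task: {task}\n{demo_block}\nWrite one new example.")
            # Self-revision pass: fix fluency and label errors in the draft.
            out.append(generate(f"Revise for fluency and label correctness: {draft}"))
        return out

    print(synthesize("sentiment classification in a low-resource language",
                     demos=["<gold target-language example>"]))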
19:00 – 23:00
Private dinner (by invitation only)

Day 2

Friday, February 20
09:00 – 10:00
Registration
10:00 – 10:40
Plenary Session 3 - Overviews
Large conference hall
10:00 – 10:10
Day 2 Opening
Igor Pivovarov, OpenTalks.AI
10:10 – 10:40
AI Safety in 2026: A Brief Review
Sergey Nikolenko,
POMI RAS, Synthesis AI
How do we correctly formulate what we want from artificial intelligence? How do we ensure that future AI agents neither try to eliminate humanity by themselves (a perfectly logical move given that humans pose the main threat to their existence) nor help other humans do so more efficiently (also a regrettably non-hypothetical scenario)? How do we even preserve the ability to shut an AI agent down? None of this is obvious, and these are the questions at the heart of AI safety as a field. In this talk, we will discuss where we currently stand on this path, whether safety research is keeping pace with the growing capabilities of AI models (spoiler: no), and what we can and should do about it.
10:40 – 10:55
Break
10:55 – 11:55
Parallel sessions
Predictive analytics
Hall C
Large hall
Hall A
LLM in Business
Anastasiia Rysmiatova,
Avito
AI agents
Avibe: How and Why We Built an LLM at Avito
Details
AI Agents for the Pharmaceutical Industry

Roman Doronin,
Optic
Details
Will be announced later
Moderator
Will be announced later
Moderator
Will be announced later
Moderator
Evgenii Grigorev,
T1.Artificial Intelligence
Data mining based on large language models
Details
Thunderstorm Nowcasting: Forecasting Lightning with 10-min Time Discretisation Using Weather Radars and Geostationary Satellites
Petr Vytovtov,
Yandex
Details
Andrey Savchenko,
Sber AI Lab
Forecasting of multi-variate time series and event sequences
Details
At Avito, we use LLM in many tasks. For example:
1) Review summarization
2) Messenger suggestions
3) Messenger auto-replies
4) Support automation
5) Assistants

In many tasks, we use small models of approximately 8 billion parameters, which we fine-tune for the specific task.

Some LLM-based services have a high load and rely on a large number of GPUs for inference. Therefore, we strive to speed up model inference and ensure that the models perform at the highest possible quality for our tasks.

One method for accelerating inference and improving the quality of models on our domain is adapting the LLM tokenizer followed by an alignment step.

Our team trained a base model for Avito (adapting Qwen3 8b) and released the model as open source.

https://huggingface.co/AvitoTech/avibe

Article about how we trained the model
https://habr.com/ru/companies/avito/articles/956664/

This talk will cover how we created the model and how we use it in the company.
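The vocabulary-surgery half of tokenizer adaptation looks roughly like this with Hugging Face transformers (shown on gpt2 for brevity; Avito's base was Qwen3 8B, the domain tokens below are invented examples, and the alignment step then re-trains the new embedding rows on domain text):

    from transformers import AutoTokenizer, AutoModelForCausalLM

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    domain_tokens = ["<category>", "<price_rub>", "б/у"]  # hypothetical examples
    added = tok.add_tokens(domain_tokens)
    model.resize_token_embeddings(len(tok))  # new rows still need training

    # Fewer tokens per listing means cheaper, faster inference.
    print(added, len(tok))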
This report presents a development by the R&D team of T1's Data Analysis and Modeling Division: Intelligent Data Analysis Based on Large Language Models.

Typically, analytics involves formulas, pivot tables, and constant queries like "dump data" and "view metrics." We proposed a different approach: a smart data interlocutor that automatically connects to data marts, calculates relevant metrics, creates tables and graphs, and explains the results in human-readable language.

The goal of the solution is to shorten the time from question to solution from days to minutes and make analytics accessible to everyone—both coders and those who have never written code. A single interface, reproducible calculations, transparent methods, and the ability to immediately clarify questions in chat.
Dr. Salim Al-Shuaili,
Maidaan.ai
Overview: Business Transformation Using AI/ML Through National Large Language Models – The Case of Oman’s “Mueen”
Details
This overview session examines how AI/ML technologies—particularly Generative AI and domain-specific Large Language Models (LLMs)—are reshaping business and government operations. Using the real case of Oman’s national LLM “Mueen,” the talk explains how sovereign AI trained exclusively on official and local datasets can enable secure process automation, accelerate decision-making, and improve operational efficiency across sectors. The presentation covers key business problems, AI/ML applications, challenges encountered during the development and deployment of Mueen, measurable results, and economic efficiency. Participants will gain practical insights on how specialized AI systems can be safely and effectively applied in regulated industries and government environments, and how the model’s impact aligns with national digital transformation goals.
Dmitrii Pshichenko,
NIA A.D.
Dalibor Lazarevic,
NIA A.D.
How AI Agents Replace Manual Analytics in Oil & Gas Operations
Details
This talk presents a real-world case of deploying AI agents in oil production operations - from well stock monitoring to decision support at the production management level.
Unlike traditional BI dashboards or standalone machine-learning models, AI agents act as active participants in operational processes. They continuously analyze data from industrial and enterprise systems (SCADA, MES, ERP), generate hypotheses, detect anomalies, propose response scenarios, and interact with engineers and managers in clear business language.
The talk covers:
the architecture of an AI-agent platform for oil production (data layer, models, orchestration, human-in-the-loop);
practical examples of AI agents such as a “virtual production engineer,” a “downtime control agent,” and an “economic optimization agent”;
integration of AI agents into existing IT and OT landscapes without replacing core industrial systems;
measurable business outcomes, including reduced downtime, faster decision-making, improved operational transparency, and lower workload on key experts;
organizational and cultural aspects: why AI agents are not just an IT initiative, but a fundamental shift in the operating and management model.
The talk is aimed at business leaders, digital transformation executives, and CIOs, and demonstrates how AI agents represent the next evolutionary step beyond data platforms and predictive analytics in the oil and gas industry.
In this talk, I'll analyze modern methods for forecasting multivariate time series over a horizon. I will discuss both regular and irregular time series (event sequences, e.g., banking transactions). I'll also present several innovative ways to apply large language models to event sequence analysis. The talk is mainly based on two papers from AAAI'26 (main track): 1) Detecting the Future: All-at-Once Event Sequence Forecasting with Horizon Matching (oral talk), and 2) HN-MVTS: HyperNetwork-based Multivariate Time Series Forecasting.
Vladimir Naumov
TennisGPT: Generative Language Models for Sports Sequence Simulation
Details
I'd like to present TennisGPT, a GPT-2 based language model trained to simulate tennis match dynamics at the shot level. By treating tennis rallies as a language—where players, shot types, court positions, and match context form a structured vocabulary—the model learns to generate realistic point sequences through autoregressive prediction.

Dataset & Tokenization. Shot-level data from four Grand Slam tournaments (Australian Open, Roland Garros, Wimbledon, US Open) provided by the SCORE Network was used to train the model. The data is tokenized into 796 unique tokens: 518 professional players, shot types with directional information (forehand groundstroke to backhand side, backhand slice down the middle, etc.), court positions, tournament contexts, and game scores.

Model & Training. TennisGPT uses a compact GPT-2 architecture (4 layers, 4 attention heads, 256 embedding dimensions, ~1.5M parameters) that can be trained in 8 hours on a consumer laptop (Apple M1). After 320,000 training steps over 3 epochs, the model achieves a perplexity of 1.09 on held-out data.

Player Embeddings. A notable byproduct is learned player embeddings that capture playing style similarities. UMAP projections reveal meaningful clusters: baseline players group together, serve-and-volley specialists form distinct clusters, and historical playing style evolution becomes visible.

Interactive Demo. Fully client-side web application using ONNX Runtime WebAssembly can be found at https://vovalive.github.io/tennisgpt/. Users can simulate matches between any two players, visualize shot trajectories on an animated court, and explore the 2D embedding space—all running locally in the browser without server infrastructure.

Broader Impact. The SCORE Network provides similar shot-level data for basketball, volleyball, and other sports. Our approach demonstrates a template for applying language models to structured sports sequences, enabling applications in tactical analysis, training simulation, and sports analytics education.
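To make the rally-as-language idea concrete, here is a toy version of the tokenization and autoregressive generation loop; the literal token strings are invented, and the model call is a random stub in place of the GPT-2 forward pass:

    import random

    vocab = ["<AO>", "<P:player_A>", "<P:player_B>", "serve_wide",
             "f_cross", "b_line", "f_middle", "<point_end>"]
    stoi = {t: i for i, t in enumerate(vocab)}

    rally = ["<AO>", "<P:player_A>", "serve_wide", "b_line", "f_cross"]
    ids = [stoi[t] for t in rally]

    def next_token(ids: list[int]) -> int:
        # Stub for the GPT-2 forward pass plus sampling from its softmax.
        return random.randrange(len(vocab))

    while vocab[ids[-1]] != "<point_end>" and len(ids) < 20:
        ids.append(next_token(ids))
    print([vocab[i] for i in ids])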
Extreme weather events such as heavy rain, thunderstorms, and hail play a huge role in many parts of human life: aviation, agriculture, everyday life, etc. We decided to focus on the task of thunderstorm nowcasting for two reasons: (1) thunderstorms are often accompanied by heavy rain and hail, so they have a big impact on industry and everyday life, and (2) thunderstorms often develop rapidly, so it is useful and necessary to forecast them with a small time discretisation, which is 10 minutes in our case. We prepared a visual-transformer-based model which uses real-time data from weather radars and geostationary satellites to forecast areas with a high probability of lightning, alongside precipitation nowcasting. The resulting quality of our model is better than classical approaches to this task, such as numerical weather forecasting and optical flow, in terms of F1-measure and IoU, as well as visual assessment. The proposed model is integrated into the Yandex Weather service as a production model.
Benchmarking Deep Research Agents in Pharma
Daniil Anisimov,
Bioptic
Details
"Deep Research" is the most popular LLM-agents scaffolding, as it overcomes pre-train data cutoff and connects to the internet via tools. Such scaffolding improves scores at question answering (QA) benchmarks and allows answering open factual questions. However, the model output is a long report with tens of references and hundreds of statements. Such reports cover a wide range of complex tasks, such as research papers, market analysis, due diligence, etc, which can't get evaluated via mere question answering. Additionally, the ability of an agent to correctly answer 50 questions out of 100 do not accumulate into correctly answering 50 questions in a row, which is necessary in a long reasoning thorough the report compilation.And specifically in pharmaceutical applications, where a missed competitor or a hallucinated clinical phase carries million-dollar risks—evaluation cannot rely on simple question-answering accuracy.

This framework introduces a production-grade evaluation approach for deep research agents, moving beyond static datasets to "living" benchmarks. It tackles the engineering challenges of evaluating long-running, high-cost ($50–$1000/run) agentic workflows via a decomposed architecture that isolates the performance of separate stages. Additionally, integrating human-in-the-loop verification via dynamic spreadsheets creates a ground truth that evolves with the market, solving the recency problem inherent in static benchmark data. Finally, benchmark results show that domain-specific agents with self-correction loops achieve 83% recall in competitor discovery, outperforming generalist Deep Research models, by prioritizing evidence chains over mere token generation.
AI Super Agent: Self-Organizing Multi-Agent System for Complex Task Solving
Andrey Kuzminykh,
Andre AI Technologies
Details
This paper presents AI Super Agent, a self-organizing multi-agent system designed to autonomously decompose, plan, and execute complex tasks across multimodal domains.

At its core lies a Cognitive Core — a unifying control architecture that integrates perception, reasoning, memory, and goal management within a continuous Plan–Execute–Control (PEC) loop. This core dynamically orchestrates Model Context Protocol (MCP) servers, maintaining coherence between reasoning processes, action execution, and long-term memory.

The framework incorporates a Graph-based Memory (GraphRAG) enhanced with Deep Research Algorithms, enabling contextual retrieval, semantic graph reasoning, and iterative knowledge synthesis. An Action Graph Engine represents and manages causal task dependencies, allowing agents to construct, evaluate, and refine strategies in real time.

Through this architecture, AI Super Agent demonstrates the capability to self-organize, spawn specialized sub-agents, and adaptively learn from multimodal feedback. Experimental evaluations in domains such as business process automation, financial analytics, and research intelligence reveal substantial improvements in reasoning depth, task completion rate, and coordination efficiency compared to conventional multi-agent baselines.

Beyond its technical contributions, AI Super Agent establishes a foundation for autonomous cognitive ecosystems — systems capable of co-evolving with human collaborators, enabling scalable problem-solving, continuous discovery, and the expansion of collective intelligence.
11:55 – 12:15
Coffee break
12:15 – 13:15
Parallel sessions
Deep learning
Hall C
Large hall
Hall A
LLM development
Vladislav Balaev,
Lanit-technology
Generative models
From Projects to Product: How We Systematized Our Work with LLM
Details
Distillation of diffusion generative models
Evgeny Burnaev,
Skoltech
Details
Will be announced later
Moderator
Will be announced later
Moderator
Salavat Garifullin,
ODS
Moderator
Anton Konushin,
MSU
3D reconstruction into a structured representation (CAD, BIM)
Details
Dmitry Vasyuk,
Microsoft
LLM tool calling and context management
Details
Thermodynamical Analogies in Deep Learning
Dmitry Vetrov,
Constructor University, Bremen
Details
Learning diffusion bridge models is easy; making them fast and practical is an art. Diffusion bridge models (DBMs) are a promising extension of diffusion models for applications in image-to-image translation. However, like many modern diffusion and flow models, DBMs suffer from slow inference. To address this, we propose a novel distillation technique based on the inverse bridge matching formulation and derive a tractable objective to solve it in practice. Unlike previously developed DBM distillation techniques, the proposed method can distill both conditional and unconditional types of DBMs, distill models into a one-step generator, and use only corrupted images for training. We evaluate our approach for both conditional and unconditional types of bridge matching on a wide set of setups, including super-resolution, JPEG restoration, sketch-to-image, and other tasks, and show that our distillation technique allows us to accelerate the inference of DBMs from 4x to 100x and, depending on the particular setup, even provide better generation quality than the teacher model.
Over five years at LANIT, we've encountered the typical project model many times: pre-sales -> pilot -> production. Most initiatives never made it to production, and those that did required a new architecture and significant investment each time. With the advent of large language models (LLMs), it became clear that tasks across different departments were very similar in structure, yet in practice companies continued to launch "one service per department," which quickly degenerated into an expensive and unmanageable ecosystem.

In this talk, I'll explain how we came to the conclusion that the key constraint isn't technology, but speed and the economic model. I'll also touch on why, in the corporate world, it is more important to test hypotheses quickly than to spend time refining quality: data is almost always poor, requirements change, customers can leave at any stage, and the cost of mistakes is high. In such an environment, a common approach that allows the same computing resources and tools to be used for HR, help desk, legal, analytics, and other functions is more beneficial than a separate LLM service per department.

The components that proved critical for us were: unified access to models, monitoring, tracing, reuse of assistants, cost and security management, and the ability to quickly launch prototypes and collect feedback.

This report will be useful for AI managers, architects, and business leaders who want to scale AI initiatives, reduce costs, and increase the speed of bringing solutions to production.
As large language models continue to advance, their effectiveness increasingly depends on more than raw model capability. Practical performance hinges on how intelligently they utilize external tools and how efficiently their context is managed. This session explores strategies for optimizing LLM systems through structured tool calling, adaptive orchestration, and disciplined context management.
We will examine approaches to reduce hallucinations, improve precision, enhance scalability, and deliver reliable, cost-efficient results in real-world applications.
We will use lessons learned from Microsoft Word integration with Copilot to see how the context is optimized to achieve best results.
Attendees will gain insights into architectural patterns, best practices, and lessons learned from deploying production-grade LLM solutions.
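As a vendor-neutral illustration of the structured-tool-calling pattern discussed here (not Copilot's actual interface), a runtime can validate a model-emitted JSON action against a registry before executing it; all names below are hypothetical:

    # Sketch of structured tool calling: the model emits a JSON action,
    # the runtime validates it against a registry, executes it, and
    # returns the result for the next model turn.
    import json

    TOOLS = {
        "search_docs": lambda query: f"top passages for {query!r}",
        "get_selection": lambda: "currently selected paragraph text",
    }

    def dispatch(model_output: str) -> str:
        """Validate and execute one tool call emitted by the model."""
        call = json.loads(model_output)        # {"tool": ..., "args": {...}}
        fn = TOOLS.get(call["tool"])
        if fn is None:                         # refuse unknown tools
            return json.dumps({"error": f"unknown tool {call['tool']}"})
        result = fn(**call.get("args", {}))
        return json.dumps({"tool": call["tool"], "result": result})

    # One turn of the loop: the model asks for a tool, the runtime answers.
    print(dispatch('{"tool": "search_docs", "args": {"query": "track changes"}}'))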
Jelena Graovac,
University of Belgrade
LLM-Assisted Grading of Open-Ended Student Responses
Details
The paper presents an open-source, LLM-powered AI-assisted grading platform designed to support the evaluation of open-ended questions in higher education, aiming to reduce instructor workload and enhance grading consistency. The system allows instructors to configure grading strictness and supports two approaches: reference-based grading, using instructor-provided solutions and notes, and generative grading, which automatically synthesises reference answers from course materials. For each submission, the platform generates both a numerical score and a structured explanation highlighting correct reasoning, omissions, and conceptual errors, enabling transparent review. Evaluation on authentic exam responses across multiple courses demonstrates strong alignment with human grading, achieving Pearson correlations up to 0.90, with reference-based grading outperforming generative approaches in stability and accuracy. These findings suggest that LLM-based AI-assisted grading can substantially improve efficiency and reliability within a human-in-the-loop framework.
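The evaluation protocol in the abstract reduces to correlating machine-assigned and instructor-assigned scores; a minimal sketch with made-up numbers, using scipy:

    # Agreement between LLM-assigned and instructor-assigned grades,
    # as in the abstract's Pearson-correlation evaluation.
    # The scores below are invented for illustration.
    from scipy.stats import pearsonr

    human = [8.0, 5.5, 9.0, 3.0, 7.5, 6.0]
    llm   = [7.5, 6.0, 9.5, 2.5, 7.0, 6.5]
    r, p_value = pearsonr(human, llm)
    print(f"Pearson r = {r:.2f}")   # values near 0.90 indicate strong alignment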
The stochastic optimization of a loss function during the training of deep neural networks shares many similarities with classical thermodynamical systems. By analysing the stochastic differential equations that describe the evolution of a (scale-invariant) neural network during training, we derive the characteristics of its stationary state. Surprisingly, it turns out to be very similar to the ideal gas law. Following this similarity, one may define analogues of temperature, pressure, and volume for neural networks. Using these analogies, we establish various thermodynamic potentials, such as the Gibbs and Helmholtz free energies, and show that they are minimized during training under popular training protocols.
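For readers unfamiliar with the analogy, the standard Langevin view of noisy gradient descent gives its flavor (a textbook illustration, not the talk's exact derivation): training behaves like a particle at temperature T whose stationary distribution is a Gibbs distribution, which minimizes a Helmholtz-style free energy.

    % Textbook Langevin-dynamics illustration (an assumption about the
    % general setting, not the talk's exact derivation):
    \begin{align}
      d\theta_t &= -\nabla L(\theta_t)\,dt + \sqrt{2T}\,dW_t, \\
      \rho_\infty(\theta) &\propto e^{-L(\theta)/T}, \\
      F &= \langle L \rangle - T S \quad \text{(Helmholtz free energy, minimized at stationarity)}
    \end{align}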
13:15 – 13:30
Break
13:30 – 14:00
Parallel sessions
Generative AI - development
Hall C
Large hall
Hall A
AI and Investments
LLM research
Insider View on China's Pragmatic AI Development
Ilya Pavlov,
SVST Ventures
Details
Will be announced later
Moderator
Dalibor Marinović, Serbian AI Association
Moderator
Yaroslav Kopotilov, Toptal, Data Sanity
Moderator
Cyril Shtabtsovsky,
AlphaSemantic
AI/ML in Venture Capital
Details
Marat Saidov,
Microsoft
Production-Ready Adapters for On-Device Language Models
Details
This talk covers the end-to-end journey of building production-ready LoRA adapters for on-device language models. I'll walk through our approach to adapter training and how we shipped it for Summarize Conversation and Rewrite use cases. I'll also cover the caveats and disadvantages of LoRA adapters that engineers must be aware of to build stable and robust solutions.
This presentation provides an insider's analysis of the key trends driving the development of artificial intelligence in China. It examines the unique interplay of top-down industrial policy, large-scale data ecosystems, and innovative models that characterizes the Chinese AI landscape, and explores new trends in vertically integrated solutions for manufacturing, logistics, and smart governance. The report will cover the key drivers of this trend, including the critical role of venture capital (VC) in bridging government initiatives with market-led, agile innovation, and the strategic push toward technological self-sufficiency in computing. In particular, we will examine how venture investment has shifted from consumer-facing models to deep technology, industrial integration, and foundational hardware.
How to create a viral AI sticker pack generator based on users’ photos
Natalia Khanzhina,
Independent Researcher
Details

In this talk, we’ll share our recipe for creating a brand-new GenAI product that attracted over 200,000 users and is built entirely with AI. We’ll cover everything from data generation to a full neural network stack, all accomplished by a single AI engineer.
Generative AI-based solutions at Lemana PRO (ex-Leroy Merlin)
Ksenija Blažević,
Lemana Tech
Details

The talk covers two content generation solutions in detail: a system for automated product description generation and a visual generation pipeline that creates interior images from a single object photo provided by suppliers. For specific product categories such as curtains, wallpapers, decor, lighting, and outdoor furniture, the system places the object into a relevant interior background while preserving object geometry, scale, projection accuracy, and visual fidelity.
The remaining two solutions are based on large language models. A corporate LLM chatbot provides secure access to the company’s internal knowledge base, reducing costs by up to six times compared to commercial solutions while mitigating security risks. An AI assistant built using RAG and agent-based approaches supports call center operators, reducing average response time by approximately 30%.
While the talk is primarily business-oriented, it includes references to the underlying architectures and technologies. Special attention is given to interior image generation, an area with very limited publicly available production-level case studies and technical insights.
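As a generic illustration of the RAG pattern behind the call-center assistant (all snippets and names are invented, and token-overlap retrieval is a toy stand-in for a real embedding index):

    # Retrieval-augmented answering: retrieve the most relevant snippets,
    # then ground the LLM prompt in them.
    kb = [
        "Return policy: items can be returned within 30 days.",
        "Standard delivery takes 2-5 business days.",
        "All power tools carry a 2-year warranty.",
    ]

    def retrieve(question: str, k: int = 2) -> list[str]:
        q = set(question.lower().split())
        ranked = sorted(kb, key=lambda d: len(q & set(d.lower().split())),
                        reverse=True)
        return ranked[:k]

    def build_prompt(question: str) -> str:
        context = "\n".join(retrieve(question))
        return f"Answer strictly from this context:\n{context}\n\nQuestion: {question}"

    # In production this prompt is sent to the LLM.
    print(build_prompt("how many days does delivery take?"))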
14:00 – 15:00
Lunch
15:00 – 15:45
Parallel sessions
GenAI - academic track
Hall C
Large hall
Hall A
Reinforcement learning
Visual Language Models
Towards Internet-Scale Training for Agents
Ruslan Salakhutdinov,
Meta (online)
Details
Will be announced later
Moderator
Igor Pivovarov
Moderator
RusCode: Russian Cultural Code Benchmark for Text-to-Image Generation
Julia Agafonova, Kandinsky Lab
Details
Viacheslav Vasilev,
Kandinsky Lab
Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework
Details
In recent years, the rapid progress of Large Language Models (LLMs) has opened the door to building language-guided agents that can carry out complex, multi-step tasks on behalf of users, much like human assistants. Developing agents that can perceive, plan, and act autonomously has long been a central goal of artificial intelligence research. In this talk, I will introduce multimodal AI agents capable of planning, reasoning, and executing actions on the web. These agents not only comprehend text but can also navigate and interact effectively in visual environments. I will present VisualWebArena, a novel framework for evaluating multimodal autonomous language agents, and describe an inference-time search algorithm that enables agents to explicitly explore and perform multi-step planning in interactive web settings. Finally, I will demonstrate how we can build an automated data pipeline for Internet-scale training of such agents. This pipeline generates web navigation tasks across 150,000 live websites, executes LLM agents on them, and automatically evaluates their success.
Physically and Semantically Consistent ML Methods for Robust Visual Localization in Challenging Environments
Sergey Kolyubin,
ITMO University
Details
The talk will systematically review recent research results of the BE2R Lab at ITMO University on visual localization in dynamic and visually degraded environments, enabling resilient robot autonomy, AR/VR applications, and beyond. All methods are unified by the concept of bringing physical and semantic consistency to statistical ML via novel DNN architectures, world representations, and data-association approaches, to build a reliable spatial-intelligence layer for Embodied AI systems. We will highlight insights from our papers accepted at the 2025 CORE A*/A robotics and AI conferences IEEE ICRA and IROS, while some of the novel results to be presented have recently been submitted to A* conferences of 2026.
VLMs at Avito: Architecture and Adaptation for Efficient Marketplace Applications
Konstantin Vesnin,
Avito Tech
Details
This talk covers two practical pillars of Visual Language Models (VLMs) in production at Avito:
1. VLM architecture
2. Adapting pretrained VLMs to a new language and business tasks

In the architecture block, we review the baseline “image encoder + LLM” pipeline, compare lightweight adapters with Q-Former-style designs, and discuss high-resolution processing (multi-crop), M-RoPE, and a simple approach to video via frame sampling.
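A minimal sketch of the "image encoder + LLM" glue described in the architecture block: a lightweight adapter projects frozen vision features into the LLM token-embedding space (dimensions are illustrative, not Avito's):

    # Lightweight vision-to-LLM adapter: projects patch features from a
    # frozen image encoder into the LLM embedding space, where they are
    # prepended to the text tokens.
    import torch
    import torch.nn as nn

    class VisionAdapter(nn.Module):
        def __init__(self, vision_dim=1024, llm_dim=4096):   # assumed dims
            super().__init__()
            self.proj = nn.Sequential(
                nn.Linear(vision_dim, llm_dim),
                nn.GELU(),
                nn.Linear(llm_dim, llm_dim),
            )

        def forward(self, patch_features):       # (B, num_patches, vision_dim)
            return self.proj(patch_features)     # (B, num_patches, llm_dim)

    adapter = VisionAdapter()
    image_tokens = adapter(torch.randn(2, 256, 1024))
    print(image_tokens.shape)   # torch.Size([2, 256, 4096])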

Building on this, we move to adaptation: while language adaptation is well studied for LLMs, we present a VLM-specific adaptation approach, combining multimodal instruction data from marketplace listings, translated public instruction datasets, and tokenizer improvements, that can deliver up to a 2× efficiency gain for a new language.

We conclude with how the adapted VLM is efficiently integrated into Avito workflows, powering description generation, search keyword generation, and OCR, supported by benchmark and task-level evaluation.
Text-to-image (T2I) diffusion models are popular for introducing image manipulation methods, such as editing, image fusion, inpainting, etc. At the same time, image-to-video (I2V) and text-to-video (T2V) models are also built on top of T2I models. We present Kandinsky 3, a novel T2I model based on latent diffusion that achieves a high level of quality and photorealism. The key feature of the new architecture is the simplicity and efficiency of its adaptation to many types of generation tasks. We extend the base T2I model for various applications and create a multifunctional generation system that includes text-guided inpainting/outpainting, image fusion, text-image fusion, image variation generation, and I2V and T2V generation. We also present a distilled version of the T2I model, which performs inference in 4 steps of the reverse process, 3 times faster than the base model, without reducing image quality. We deployed a user-friendly demo system in which all the features can be tested in the public domain. Additionally, we released the source code and checkpoints for Kandinsky 3 and the extended models. Human evaluation shows that Kandinsky 3 demonstrates one of the highest quality scores among open-source generation systems.
Text-to-image generation models have gained popularity among users around the world. However, many of these models exhibit a strong bias toward English-speaking cultures, ignoring or misrepresenting the unique characteristics of other language groups, countries, and nationalities. The lack of cultural awareness can reduce generation quality and lead to undesirable consequences such as unintentional insult and the spread of prejudice. In contrast to the field of natural language processing, cultural awareness in computer vision has not been explored as extensively. In this paper, we strive to reduce this gap. We propose the RusCode benchmark for evaluating the quality of text-to-image generation containing elements of the Russian cultural code. To do this, we form a list of 19 categories that best represent the features of Russian visual culture. Our final dataset consists of 1250 text prompts in Russian and their translations into English. The prompts cover a wide range of topics, including complex concepts from art, popular culture, folk traditions, famous people's names, natural objects, scientific achievements, etc. We present the results of a human evaluation of the side-by-side comparison of Russian visual concept representations using popular generative models.
15:45 – 16:00
Break
16:00 – 16:45
Parallel sessions
Reinforcement learning
Hall C
Large hall
Hall A
Neurocognitive architectures
Vision for Robots and Autonomous Systems
Scene Graph-driven Spatial Understanding and Reasoning
Dmitry Yudin,
MIPT
Details
Will be announced later
Moderator
Nikita Andriyanov,
Financial University
Moderator
Alexander Boldachev,
Naevius Fze (UAE, Dubai)
Moderator
Alexey Kovalev,
AIRI, MIPT
Vision-Language-Action Models: From Foundation to Future
Details
Zero-Sum Positional Differential Games as a Framework for Robust Reinforcement Learning: Deep Q-Learning Approach
Anton Plaksin,
Nebius
Details
Spatial understanding and reasoning is a fundamental challenge in computer vision and artificial intelligence. Scene graphs are structured representations that capture objects and their relationships, providing a powerful framework for this task. In this talk, we will explore how scene graph-driven methods enable robots and autonomous vehicles to interpret complex 3D dynamic scenes, support reasoning about object interactions, and improve performance in tasks such as visual question answering, navigation, and robotic manipulation. The presentation will cover key concepts, recent advances, and real-world applications illustrating how scene graphs bridge perception and reasoning in intelligent systems.
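A toy illustration of the representation: a scene graph is just a set of (subject, relation, object) triples, and spatial questions reduce to pattern matching over them (entities and relations invented for the example):

    # A toy scene graph as (subject, relation, object) triples, with the
    # kind of one-hop query such representations make easy.
    scene = {
        ("cup", "on", "table"),
        ("table", "left_of", "sofa"),
        ("robot", "facing", "table"),
    }

    def neighbors(entity: str):
        """Everything the graph asserts about an entity."""
        return [(s, r, o) for (s, r, o) in scene if entity in (s, o)]

    # "What is on the table?" reduces to matching a relation pattern.
    print([s for (s, r, o) in scene if r == "on" and o == "table"])  # ['cup']
    print(neighbors("table"))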
This lecture provides a comprehensive overview of Vision-Language-Action (VLA) models, the cutting-edge systems that connect visual perception and natural language to physical action. We will explore the current state of the art, including their architecture, training methods, and applications in robotics and autonomous systems. The discussion will then shift to the future, addressing key challenges such as safety, generalization, and real-world deployment, and outlining the exciting prospects for truly general-purpose embodied AI.
Alexei Samsonovich,
George Mason University, NRNU MEPhI
Implementing Digital Self-Awareness Based on a Cognitive-Neuromorphic Approach
Details
Aleksey Kabanov,
BTR R&D
An elementary universal cycle of continuous attention as a new model for computing meta-attractor generalizations
Details

The architectures of modern LLMs and VLMs are very different from the functional architecture of the human brain. As a result, LLMs have limited cognitive abilities compared to humans. Brain-derived principles expressed in the form of a Biologically Inspired Cognitive Architecture (BICA) can be very useful in LLM-based agent design. The proposed approach is inspired by cognitive neuropsychology and functional neuroanatomy. It is shown how the ideas of metacognition and self-awareness, implemented in the form of a multi-agent system, can enrich a neuromorphic architecture design, potentially leading to a new generation of AI. The empirical findings presented suggest that self-aware LLM-based architectures can be more efficient than traditional multi-agent architectures in solving LLM-hard problems.
Modern transformers require enormous amounts of computation during both training and inference. The proposed concept of building a hierarchy of generalizations in the form of attractor trajectories is based on directly constructing the structure of generalizations and can be trained on top of existing models.
Anton Kolonin,
Aigents
Cognitive Architecture for Neuro-Symbolic Experiential Learning
Details
This talk presents an original cognitive architecture for neuro-symbolic experiential learning, based on a space of states and global feedback, which solves the reinforcement learning problem in environments such as OpenAI Gym Atari Breakout.
Robust Reinforcement Learning (RRL) is a promising Reinforcement Learning (RL) paradigm aimed at training models robust to uncertainty or disturbances, making them more efficient for real-world applications. Under this paradigm, uncertainty or disturbances are interpreted as the actions of a second, adversarial agent, and the problem thus reduces to seeking agents' policies that are robust to any opponent's actions. This work is the first to propose considering RRL problems within positional differential game theory, which helps us obtain theoretically justified intuition to develop a centralized Q-learning approach. Namely, we prove that under Isaacs's condition (sufficiently general for real-world dynamical systems), the same Q-function can be utilized as an approximate solution of both the minimax and maximin Bellman equations. Based on these results, we present the Isaacs Deep Q-Network algorithms and demonstrate their advantages over baseline RRL and multi-agent RL algorithms in various environments.
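The centralized idea can be sketched in tabular form: one Q-table over (state, agent action, adversary action), updated toward a max-min target that, under Isaacs's condition, coincides with the min-max target (the environment and sizes are illustrative, not the paper's Isaacs DQN):

    # Tabular sketch of the centralized minimax Bellman update:
    # the agent maximizes and the adversary minimizes the same Q-function.
    import numpy as np

    nS, nA, nB, gamma, lr = 5, 3, 3, 0.95, 0.1
    Q = np.zeros((nS, nA, nB))

    def bellman_target(reward, s_next):
        # max over agent actions of the min over adversary actions.
        return reward + gamma * Q[s_next].min(axis=1).max()

    def update(s, a, b, reward, s_next):
        Q[s, a, b] += lr * (bellman_target(reward, s_next) - Q[s, a, b])

    update(s=0, a=1, b=2, reward=1.0, s_next=3)
    print(Q[0, 1, 2])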
HGRPO: Hierarchical Grouped Reward Policy Optimization for Multi-Turn Conversational Agents
Karina Romanova, Yandex
Details
Training conversational agents for multi-turn dialogues with Reinforcement Learning presents a fundamental challenge: how to correctly assign credit to individual actions when the reward signal comes only at the end of a dialogue. Group Relative Policy Optimization (GRPO) addresses this by grouping similar actions together, but standard implementations group all dialogue steps together, comparing incomparable actions across different dialogue stages.
We present HGRPO (Hierarchical Grouped Reward Policy Optimization), a novel modification of GRPO that introduces hierarchical step grouping for multi-turn dialogue agents. Our approach includes two complementary grouping strategies: (1) State-Based Dynamic Grouping, where steps are compared only if they occur in similar dialogue states, with soft assignment allowing steps to belong to multiple groups with different weights; and (2) Tree-based Grouping, which groups actions by their position in the dialogue decision tree.
We applied HGRPO to train a booking agent for restaurants and beauty salons, deployed in production in Yandex's smart assistant Alice. Results show significant improvements in agent truthfulness (an 8.0-percentage-point improvement on production traffic) and a 10.7% reduction in dialogue length, while maintaining the task success rate. The hierarchical grouping particularly improved the agent's ability to provide honest responses and avoid hallucinations by correctly attributing which actions at which dialogue stages lead to truthful outcomes. The reduction in dialogue length demonstrates that HGRPO enables more efficient action selection by better understanding which steps contribute to task completion.
Our findings demonstrate that proper credit assignment through hierarchical grouping is crucial for training high-quality multi-turn conversational agents, and the approach is applicable to other agentic tasks requiring sequential decision-making.
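The core mechanic, standardizing each step's reward against its own group rather than the whole dialogue, fits in a few lines; the grouping weights below are a toy stand-in for the paper's state-based and tree-based strategies:

    # Sketch of the grouped-advantage idea behind (H)GRPO: each step's
    # reward is standardized within its group; soft weights let a step
    # belong to several groups at once.
    import numpy as np

    def grouped_advantages(rewards, weights):
        """rewards: (n_steps,); weights: (n_steps, n_groups), rows sum to 1."""
        rewards = np.asarray(rewards, float)
        weights = np.asarray(weights, float)
        adv = np.zeros_like(rewards)
        for g in range(weights.shape[1]):
            w = weights[:, g] + 1e-8            # avoid zero-weight division
            mu = np.average(rewards, weights=w)
            sigma = np.sqrt(np.average((rewards - mu) ** 2, weights=w))
            adv += weights[:, g] * (rewards - mu) / (sigma + 1e-8)
        return adv

    # Two dialogue stages (greeting vs. booking) grouped separately;
    # hard assignment is used here for clarity.
    rewards = [0.2, 0.4, 0.9, 0.7]
    weights = [[1, 0], [1, 0], [0, 1], [0, 1]]
    print(grouped_advantages(rewards, weights))   # [-1.  1.  1. -1.]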
16:45 – 17:00
Coffee Break
17:00 – 18:30
Intelligence and consciousness
Large conference hall
17:00 – 17:20
Why don't LLMs have consciousness?
Alexander Krainov,
Yandex
Humanity has not reached a consensus on what consciousness is or how to determine its presence. Nevertheless, it is quite obvious that LLMs do not have it.
So why does a mouse, for example, have consciousness, while an LLM with approximately the same order of parameters does not?
Apparently, we will get the answer when we can (if we can) create a neural network with consciousness. In the meantime, we can analyze how artificial and natural neural networks fundamentally differ, and hypothesize which of these differences plays a key role in the emergence of consciousness.
17:20 – 17:50
Theory of consciousness and subjectivity
Igor Pivovarov,
MIPT, OpenTalks.AI
The talk proposes a new theory of consciousness, subjectivity, and intelligence: TEVSER. The theory is derived from a fundamental mathematical theorem. Its central element is the idea of a living organism as a self-regulating system. The evolution of living things can be viewed as the evolution of regulators, starting with the most basic type of regulation, "based on error," and ending with complex regulators that build a model of the world and of themselves in the world. Consciousness is considered not as a single whole but as successively forming layers of regulation; this allows us to divide the complex concept of "consciousness" into parts and examine them from the point of view of functional design and emergence. The TEVSER theory organically integrates global workspace theory, higher-order theories, predictive coding, and other theories of consciousness.
17:50 – 18:30
Discussion
Sergey Shumsky, PhD, Chief Scientist,
Symbolic Mind, Inc
Prof. Konstantin Anokhin (online)
MD, PhD, FRAS
Institute for Advanced Brain Studies, MSU
Alexander Krainov,
Director of Artificial Intelligence Development, Yandex
Igor Pivovarov,
Chief Analyst at the MIPT Center for Artificial Intelligence,
Head of the Artificial Intelligence Almanac Project,
Director of OpenTalks.AI
18:30 – 18:45
The conference closing
Large conference hall
19:00 – 23:00
Afterparty with live music
A dinner after the conference at a famous restaurant in the center of Belgrade, including food, drinks, and live music. You will have a great chance to network informally with speakers and attendees, and to enjoy live performances by rock bands from the AI/ML industry!