OpenTalks.AI /
6-7 March 2024
Tbilisi

OPEN CONFERENCE ON
ARTIFICIAL INTELLIGENCE

Agenda
OpenTalks.AI 2024

version from 15.02.2024
Tbilisi time, GMT+4
19:00-21:00
Welcome drinks and networking
The evening before the conference is a great time to have a glass of wine and meet familiar faces in an informal setting. And, of course, to meet new people!

You can also register and get your badge here, so you spend less time in the queue in the morning.

The location will be announced to participants by email the day before.

LLM and GenAI Day

Wednesday, March 6
08:30 – 10:00
Registration and welcome coffee
09:00 – 09:45
Introduction to AI for beginners
Igor Pivovarov, OpenTalks.AI
Before the conference begins, a brief introduction to AI for beginners. In simple words, it covers the main technologies and applications: what Computer Vision and Large Language Models are, what training and inference mean, what Transformers and Attention are, and much more. It also gives a brief guide to where these technologies and applications appear in the conference program.
10:00 – 11:30
Plenary session 1 - NLP & LLM overviews
Main conference hall
10:00 – 10:10
Opening of the conference and first day
Igor Pivovarov, OpenTalks.AI
What to expect at the conference: main ideas, numbers, and highlights.
10:10 – 10:50
From Language understanding to Autonomous agents: the evolving landscape of Large Language Models
Mikhail Burtsev
London Institute for Mathematical Sciences (UK)
In this talk, we will explore the rapid advancements and nuanced limitations of Large Language Models (LLMs) like ChatGPT that have revolutionized AI in the past year. The first part provides a general overview of LLMs, highlighting their proficiency in solving a broad range of natural language understanding problems. However, we will also present data showing that LLMs may lag behind more specialized traditional NLP models in certain specific tasks, illustrating the trade-off between universality and task-specific quality. We then delve into the fundamental limitations of transformer input size and discuss our innovative solution: the development of a recurrent memory transformer that sets a new record for sequence length processed by a neural network. The latter part of the talk shifts focus to the exciting potential of LLMs in creating autonomous agents, capable of independent action and decision-making. We will review popular prompting techniques like the chain of thought and tree of thought, and address the current challenges in enabling LLMs to learn and apply abstract rules, particularly in non-standard domains. This talk aims to provide a comprehensive understanding of where LLMs excel, where they falter, and the exciting possibilities and challenges that lie ahead in AI research and applications.
10:50 – 11:30
Surpassing training data: getting more from LLMs at inference time
Alexander Novikov (online)
DeepMind (UK)
Everyone agrees now that LLMs are here to stay and can interpolate the training data – collective intelligence of the internet – really well. But can they go much beyond?
I'll present an overview of recent ideas on how to approach surpassing human abilities with LLMs in different domains: code generation (FunSearch (Nature, 2023), AlphaCode and AlphaCodium), maths (AlphaGeometry), actions (Voyager, the Minecraft agent) and reasoning (Tree of Thoughts).
11:30 – 12:00
Coffee break
12:00 – 12:45
Parallel sessions
AI in legal practice
Hall 3 - Academy
Hall 2 - R&D
Hall 1 - Business
Recommendation systems - under the hood
Hall 4
Generative AI: multi-modality
To be announced later
Moderator
Alexander Tuzhilin, New York University
LLM Tutorials
When Variety Seeking Meets Unexpectedness: Incorporating Variety-Seeking Behaviors into Design of Unexpected Recommender Systems
Details
Tutorial - How to train large language models
Murat Apishev,
Samokat.tech
Details
Holger Zscheige,
Infotropic Media
Moderator
Exact Algorithms for Boolean Matrix Factorisation of Contranominal Scales and its Applications in Recommender Systems
Dmitry Ignatov, HSE
Details
To be announced later
Moderator
Denis Dimitrov,
MSU
Large multimodal models - the path to AGI?
Details
Irina Abdullaeva,
AIRI
AIRI multimodal event model
Details
Large Language Models are the basis of most AI products, and many companies are constantly competing with each other to train the strongest models. The process of LLM creation is non-trivial and consists of a wide range of steps and subtasks. Although there are no perfect solutions, through many experiments over the past years, researchers and engineers have identified key ideas and techniques that help produce a higher-quality model with less time and fewer resources. This tutorial will cover the main aspects of modern LLM training (data, architecture, pre-training and fine-tuning, scaling and optimization, modifications and evaluation of models) and the most common practices associated with them. It is intended for an audience with experience in ML and DL, but without specialization in training LLMs.
Variety seekers are those customers who easily get bored with the products they purchased before and, therefore, prefer new and fresh content to expand their horizons. Despite its prevalence, variety-seeking behavior is hardly studied in recommendation applications because of various limitations in existing variety-seeking measures. To fill the research gap, we present a variety-seeking framework in this paper to measure the level of variety-seeking behavior of customers in recommendations based on their consumption records. We validate the effectiveness of our framework through user questionnaire studies conducted at Alibaba, where our variety-seeking measures match well with consumers' self-reported levels of their variety-seeking behaviors. Furthermore, we present a recommendation framework that combines the identified variety-seeking levels with unexpected recommender systems in the data mining literature to address consumers' heterogeneous desire for product variety, in which we provide more unexpected product recommendations to variety-seeking consumers and vice versa. Through off-line experiments on three different recommendation scenarios and a large-scale online controlled experiment at a major video-streaming platform, we demonstrate that those models following our recommendation framework significantly increase various business performance metrics and generate tangible economic impact for the company. Our findings lead to important managerial implications to better understand consumers' variety-seeking behaviors and design recommender systems. As a result, the best-performing model in our proposed frameworks has been deployed by the company to serve all consumers on the video-streaming platform.
In this talk, we examine certain properties of state-of-the-art algorithms for Boolean matrix factorisation (like GreConD and IterEss), a popular technique in Data Mining with binary relational data. GreConD, a greedy algorithm, was inspired by the fact that the optimal number of factors for the Boolean Matrix Factorisation (BMF) can be chosen among the formal concepts of the corresponding formal context. In particular, we consider one of the hardest cases (in terms of the number of possible factors), the so-called contranominal scales, and show that the output of GreConD is not optimal in this case. Moreover, we formally analyse its output by means of recurrences and generating functions and obtain the closed form for the returned number of factors. We also provide an algorithm that generates the optimal number of factors and the corresponding product matrices P and Q for the case of contranominal scales. In addition to algorithmic studies, we give a short summary of our previous results on BMF applications for Collaborative Filtering (in collaboration with E. Nenova, M. Ahmatnurov et al.) along with some recent results for Boolean tensors. (This is joint work with Alexandra Yakovleva and Yazag Meziane.)
Nowadays, large language models are very popular in both science and everyday life. We read news about them, see impressive video presentations from large corporations, and hear conspiracy theories that these language models already understand the world better than the average person. Let's try to understand what LLMs can do now, what else they can work with besides text, and how they can help in creating a super-powerful intelligent machine. As part of the talk, I will share my research experience in this area and talk about experiments, benchmarks, and other major challenges and open questions, as well as our OmniFusion multimodal architecture. We'll also discuss the multi-agent approach, how LLMs "communicate", Chain-of-Thought and Tree-of-Thought mechanisms, shared memory, self-reflection, and other aspects that are already worth looking into. I will also talk a bit about our research in the field of generative AI (namely Kandinsky 3.0 and Kandinsky Video) and, most importantly, about how to link OmniFusion and, for example, Kandinsky into a single system that can solve almost the entire range of tasks at the intersection of different modalities.
In the field of event sequences, unlike computer vision (CV) or natural language processing (NLP), it is not common to use a pre-trained model to solve multiple problems at once and generalize to new ones. Existing approaches have limitations in terms of flexibility, generalization, and computational efficiency. In addition, integrating long sequences of events into neural network-based approaches remains challenging.
To address these challenges, this paper proposes a novel approach called Event Sequences Question Answering (ESQA) based on the Large Language Model (LLM). We present all event sequence based tasks in question-answering form. Moreover, we propose a generic method for encoding event sequences using a trainable coder based on the Transformer architecture. Efficient feature extraction from the coder output and a significant reduction in sequence length are achieved by using the Q-Former model as a connecting layer between the coder and the LLM.
Our empirical results show that applying pre-trained large language models to the event sequence modality in ESQA provides quality comparable to state-of-the-art approaches for a variety of prediction tasks in multi-task environments on various open-source financial datasets. In addition, ESQA has demonstrated adaptability to new tasks with quality that exceeds statistical performance.
Alexey Goncharov, Compress.ai
Effective LLM inferencing for applied tasks
Details
How can LLM inference be made fast, cost-effective and customizable when running on company servers without expensive GPUs? In this talk I will share my development experience and discuss methods for scaling infrastructure for LLMs, approaches to improving efficiency, and a sandbox for experimentation.
Topic will be announced later
Thomas G. Martin,
Lawdroid, CA
Details
TBD
Alan Ragueneau,
Denton Nextlaw, SW
Topic to be announced later
Details
TBD
Anna Romanova,
MIPT
Elements of legislation for autonomous artificial intelligence systems
Details
A significant part of the operational context for autonomous company management systems is the regulatory and legal environment in which corporations operate. In order to create a dedicated operational context for autonomous artificial intelligence systems, the wording of local regulatory documents can be simultaneously presented in two versions: for use by people and for use by autonomous systems. In this case, the artificial intelligence system will get a well-defined operational context that allows such a system to perform functions within the required standards. Local regulations that provide a basis for the joint work of individuals and autonomous artificial intelligence systems can form the grounds for the relevant legislation governing the development and implementation of autonomous systems.
Alexey Vasiliev, Sber
Turning Dross Into Gold Loss: is BERT4Rec really better than SASRec?
Details
Recently, sequential recommendation and the next-item prediction task have become increasingly popular in the field of recommender systems. Currently, the two state-of-the-art baselines are the Transformer-based models SASRec and BERT4Rec. Over the past few years, there have been quite a few publications comparing these two algorithms and proposing new state-of-the-art models. In most of the publications, BERT4Rec achieves better performance than SASRec. But BERT4Rec uses cross-entropy over softmax for all items, while SASRec uses negative sampling and calculates binary cross-entropy loss for one positive and one negative item. In our work, we show that if both models are trained with the same loss, the one used by BERT4Rec, then SASRec significantly outperforms BERT4Rec both in terms of quality and training speed. In addition, we show that SASRec can be effectively trained with negative sampling and still outperform BERT4Rec, but the number of negative examples should be much larger than one.
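To make the difference between the two losses named in this abstract concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the authors' implementation: the function names `softmax_cross_entropy` and `sampled_bce` are hypothetical, and the item scores are toy values rather than model outputs.

```python
import math
import random


def softmax_cross_entropy(scores, positive):
    """Cross-entropy over a softmax across ALL items (the loss the abstract attributes to BERT4Rec)."""
    m = max(scores)
    # log-sum-exp with the max subtracted for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[positive]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sampled_bce(scores, positive, num_negatives, rng):
    """Binary cross-entropy on one positive item and `num_negatives` sampled negatives."""
    candidates = [i for i in range(len(scores)) if i != positive]
    negatives = rng.sample(candidates, num_negatives)
    loss = -log_sigmoid(scores[positive])                     # push the positive score up
    loss -= sum(log_sigmoid(-scores[j]) for j in negatives)   # push sampled negative scores down
    return loss


# Toy scores for a 5-item catalogue (assumed values); item 0 is the true next item.
toy_scores = [2.0, 0.5, -1.0, 0.0, 1.5]
full_loss = softmax_cross_entropy(toy_scores, positive=0)
one_negative = sampled_bce(toy_scores, positive=0, num_negatives=1, rng=random.Random(0))
many_negatives = sampled_bce(toy_scores, positive=0, num_negatives=4, rng=random.Random(0))
```

With a single negative, the sampled loss only ever sees two items per step, while the softmax loss compares the positive against the whole catalogue; raising `num_negatives` narrows that gap, which is the knob the abstract refers to.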
Alexander Rezanov,
Rask AI
Moderator
Ilia Nenakhov,
Yandex Market
Yandex Market neural banners. Under the hood.
Details
I'll be discussing how we tackled an AdTech challenge: creating advertising banners on Yandex Market using neural networks. We'll delve into the origins of this task within advertising systems and its unique characteristics within E-commerce. Our primary focus will be on the technical details of the solution, including YaGPT and its tuning for specific tasks, ptune, SAM architecture, and its optimization for performance. Additionally, I'll cover the runtime design and the entirety of the production process, shedding light on the difficulties we faced and the outcomes we achieved.
12:45 – 13:00
Break
13:00 – 14:00
Parallel sessions
Business solutions based on LLM
Hall 3 - Academy
Hall 2 - R&D
Hall 1 - Business
CUDA Tutorial
Mathematics and AI
Hall 4
Poster session
Implementing AI into office software
Anna Pleshakova, OnlyOffice
Details
RAG and its derivatives. Product cases where LLM brings real value to business.
Inna Lizunova,
Speech Technology Center
Details
Custom CUDA Kernels: Practical Approaches to Low-Level Optimizations
Grigorii Alekseev, Perplexity
Details
Mathematics and AI
Ivan Oseledets,
AIRI
Details
Sergey Kuznetsov, HSE
Moderator
Roman Doronin, Bioptic.io
Moderator
Creating applications with an LLM: more than just writing a prompt
Details
Sergey Verentsov, EORA
GigaSearch or Search Engine on GigaChat
Details
Prohor Gladkih, SberDevices
I'll tell you how we combat hallucinations and data obsolescence in GigaChat using the RAG (Retrieval-Augmented Generation) approach.
At first glance, it may seem that application creators using LLM simply write prompts and integrate them with a public API. However, when automating scenarios using LLM, developers need to pay special attention to the correctness of responses and security when interacting with the model. At the same time, methods for designing LLM call chains are evolving, allowing prompt engineers to develop not just prompts, but entire scenarios of data retrieval and model calls using techniques such as ReAct, RAG, FLARE, and others. This presentation will cover the main challenges in creating LLM-based applications, the list of required competencies, as well as the peculiarities of planning, development and support of such applications.
Alexander Gasnikov, Innopolis, MIPT
AI wine, AI chocolate, and other new optimization techniques
Details
We will talk about how to solve optimization problems when it is impossible to obtain the gradient of the objective function, or even its value, but it is possible to compare the values of the objective function at different points. That is, by querying the objective function at a set of points, one can tell at which point the value was the smallest (or largest), but one cannot tell what that value is. Such problems arise when developing various food products (for example, chocolate) with the help of Artificial Intelligence. The talk will discuss which algorithms are optimal for a class of smooth optimization problems (convex and non-convex) of large and small dimension.
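The setting described in this abstract can be illustrated with a small sketch in which the optimizer sees only an order oracle. This is a simple coordinate search chosen for clarity, not one of the optimal algorithms the talk refers to; the names `order_oracle_minimize`, `better` and `hidden_objective` are hypothetical, and the quadratic objective is an assumed toy example.

```python
def order_oracle_minimize(better, x0, step=1.0, shrink=0.5, tol=1e-6):
    """Coordinate search that never sees objective values or gradients.

    `better(a, b)` must return True when the objective at `a` is strictly
    smaller than at `b`; that comparison is the only information used.
    """
    x = list(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                candidate = list(x)
                candidate[i] += delta
                if better(candidate, x):   # one oracle query
                    x = candidate
                    improved = True
                    break
        if not improved:
            step *= shrink                 # no neighbour is better: refine the step
    return x


# The objective is hidden behind the oracle (think of a tasting panel that
# can only say which of two recipes is better).
def hidden_objective(p):
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


def better(a, b):
    return hidden_objective(a) < hidden_objective(b)


result = order_oracle_minimize(better, [0.0, 0.0])
```

The search moves along one coordinate at a time and halves the step whenever no neighbouring point wins the comparison, so it needs nothing beyond pairwise rankings.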
With the growing popularity of LLMs, we at Speech Technology Center have gone through a large number of product pilots. Most of them are related in some way to generative search (RAG, Retrieval-Augmented Generation) over very different sources of information. In this talk, we will share our own experience in solving product cases using LLMs:
- What can "vanilla" RAG transform into in product cases? How to recognize it?
- Which RAG-like cases are worth pursuing, and which ones are not, and under what conditions?
- Where is the business value in such cases?
Innovations in artificial intelligence have made it an integral part of society with applications in a variety of fields. In this session, we will cover the implementation of AI in office software, highlight the benefits AI can bring to users when working with documents, and consider how the use of leading AI providers differs across markets.


In this talk, I will highlight several fundamental problems in AI that lack mathematical formalism, for example, alignment of large language models. On the other hand, many concepts in Mathematics can be effectively used to improve the quality of ML algorithms. Hyperbolic geometry is a vivid example: replacing ordinary embeddings with hyperbolic ones leads to SOTA in metric learning.
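As a small illustration of what "hyperbolic embeddings" means in practice, the sketch below computes the distance between two points in the Poincaré ball model. The abstract does not say which model of hyperbolic space is used, so the Poincaré ball is an assumption, and the function name `poincare_distance` is hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball
    in the Poincare ball model of hyperbolic space (assumed model)."""
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# The same Euclidean gap of 0.1 costs more hyperbolic distance near the boundary.
near_origin = poincare_distance([0.0, 0.0], [0.1, 0.0])
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])
```

Distances grow rapidly toward the boundary of the ball, which is why such embeddings are often used for tree-like or hierarchical data.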
Agenda:
1. How to fuse a QKV Attention Layer into a single CUDA kernel?
2. Step-by-step guide to writing efficient kernels using a basic algorithm

Quick Overview:
1. We will explore kernel-level operations to understand how LLM layers function, specifically focusing on the QKV attention layer from llama_7b. I will present my approach for implementing a fused CUDA kernel, including code snippets. This session will also cover benchmark analysis and potential optimization strategies.
2. This segment involves a deeper examination of CUDA kernels, focusing on optimization techniques and profiling. We'll walk through the process of enhancing a basic histogram kernel, analyzing its behavior from various aspects and comparing it to a third-party solution.
14:00 – 15:00
Lunch
15:00 – 16:30
Plenary session 2 - overviews
Main conference hall
15:00 – 15:45
The main things in Generative AI in 2023
Alexander Notchenko
Co-founder of OpenDataScience, Organizer of ODS London
Last year was monumental for generative AI, and we all probably understand the importance of LLMs in that revolution. But in this talk I will outline all the other important developments in generative AI over the past year, specifically in 2D images, videos, audio, 3D models, animations and much more. I will analyse the main reasons that drive the development of these models.
15:45 – 16:05
AI in retail - overview
Mikhail Neverov
X5 Tech
We will show how Data Science and AI are transforming grocery retail, from choosing a store location to personalizing service and optimizing employee performance. Let's look at examples of using analytics to predict trends, manage inventory, and develop loyalty programs that increase sales and improve the customer experience.
Join us to find out how data-driven solutions make retail more adaptive and customer-oriented.
16:05 – 16:30
CTOs' perspective on generative AI

Marina Dorokhova
Yakov & Partners
The talk focuses on the outlook for generative AI in business as seen by CTOs. The results come from our own survey of CTOs at 100 companies in Russia across 15 industries and cover their expectations from implementing generative AI, the most popular use cases, expected budgets for generative AI, and common risks and problems seen in the field. The talk synthesizes the main lessons that businesses, developers and researchers can take into account when developing generative AI models and use-case-specific products for industries.
16:30 – 17:00
Break
17:00 – 18:00
Parallel sessions
Recommendation systems in industry
Hall 3 - Academy
Hall 2 - R&D
Hall 1 - Business
Generative AI: images & video
Reinforcement learning
Hall 4
Tutorials
Tinder in Direct Selling
Elizaveta Pushkareva,
Tochka
Sergey Lukashkin, VTB
Moderator
Yuri Visilter, GOSNIIAS
Moderator
Arkady Sandler,
True Click Technologies
Igor Pivovarov
OpenTalks.AI
Moderator
Recommendation systems in media platforms
Fedor Smirnov,
Glowbyte
Probability programming
Methods for optimizing AI algorithms
Details
Dmitry Ivanov
MSU, Cifrum
Evgeny Burnaev,
Skoltech
From stochastic differential equations to the Monge-Kantorovich problem and back: the path to artificial intelligence?
Ruslan Ermagambetov, Kontur
Pulse-index: dynamic estimation of bankruptcy probability
Details
Vyacheslav Korzun, MIPT
Autoregressive models for Conversational Gestures Generation. The Path through GENEA challenges
Details
Details
Details
Details
Details
Will be announced later
Moderator
Neural network for optimizing the number and cost of conversions in advertising campaigns
Details
Alexey Biryukov,
Andata
Ruslan Salakhutdinov,
Carnegie Mellon
Reinforcement learning - recent advances (talk name will be updated)
Details
(online)
Every day, sales managers at Tochka call 3000+ customers, but what if we could influence this process and select which customers each seller should call? I'll tell you how we built a machine that ranks companies by the likelihood of converting into a client, and then turned it into a kind of Tinder: cards with contacts are divided between sellers every day based on personal speeds, predicted productivity and urgency of the call. I will touch on the technical side of the personal speed prediction machine, the card booking strategy, and how to choose the optimal booking time window. I'll also show you how much money we made from this.
Using a neural network to scale and then optimize brand advertising campaigns, while increasing the share of targeted bids and avoiding cannibalization of organic traffic.
During my lecture, I'll explain how we at Kontur.Focus developed and integrated a dynamic model for assessing the probability of bankruptcy into our product. This model is designed to help Kontur.Focus users in assessing the reliability of counterparties, as it has a predictive ability to assess the risk of bankruptcy of a company based on financial statements, arbitration claims and other events. We will analyze the details of training and deploying the model, what difficulties we encountered during integration, and how we collected feedback on the new feature.
The development of large language models and speech synthesis systems has led to the emergence of "intelligent" agents in virtual worlds. These agents also require correct gestures during interaction with humans. In this talk I will tell you how these gestures can be generated from speech and more. I will present our approaches to conversational gesture generation that emerged from participation in the GENEA Challenges and led us to three papers. I will describe the problem itself, the first models for solving it, and our approaches. I will also cover the main limitation of autoregressive models that we encountered, how we tried to overcome it, and how video games helped us.
In the realm of Over-The-Top (OTT) and Video On Demand (VOD) services, two principal challenges significantly impact operational efficiency: churn rate and content utilization. A high churn rate undermines the efforts invested in user acquisition, while suboptimal content utilization can decrease viewer interest, further exacerbating the churn issue. These platforms often allocate the largest portion of their budgets to acquiring new users and securing content rights. Therefore, addressing the critical questions of how to maintain viewer engagement and which content to acquire—or how to better leverage existing content—becomes essential for success in the OTT/VOD industry.
A study by PWC highlights that viewer retention correlates positively with the breadth of content consumed, indicating that a diverse and engaging content library is key to reducing churn rates. Given this, market leaders heavily invest in sophisticated recommendation algorithms, seeking to differentiate themselves in a highly competitive market.
The upcoming conference presentation will explore strategies for companies that may not have the resources to compete directly with industry giants. It will delve into how these organizations can still access cutting-edge technology in recommendation systems to enhance viewer engagement and make informed content acquisition decisions, thereby finding their own path to success in the crowded OTT/VOD marketplace.
The topic is probabilistic programming and generative probabilistic models. Probabilistic programming allows incorporating expert knowledge and assumptions into machine learning models, primarily about the interrelationships of various factors, and taking into account unobservable factors that control the process of data generation. As a practical example, let us consider the task of determining the latent needs that determine the structure of customers' receipts.
A.N. Kolmogorov is the greatest mathematician of the 20th century, the founder of modern probability theory, who also laid the foundations for the theory of Markov random processes with continuous time. These results, which had a huge impact on the development of applied methods of signal processing, filtering, modeling and financial data processing, have again come into the spotlight in the 21st century due to the development of artificial intelligence and its applications. Indeed, to solve such important applied problems as increasing the resolution of images, synthesizing speech from text, generating images based on text descriptions, etc., effective generative modeling methods are required that are capable of generating objects from a distribution specified by a sample of examples. Recent advances in the field of generative modeling are precisely based on diffusion models and use the mathematical foundation laid in the last century by A.N. Kolmogorov and his followers. The report will talk about modern approaches to generative modeling based on diffusion processes and based on solving the Monge-Kantorovich problem. The connection between the solution of the entropy-regularized Monge-Kantorovich problem and the problem of constructing a diffusion process with certain extremal properties will be shown. The operation of the corresponding algorithms will be demonstrated using the example of solving various image processing problems.
Modern neural networks are extremely resource-intensive algorithms in terms of memory, computation, and energy. This results in additional costs for their use and also limits their use on Edge devices. The talk discusses the problems of neural network inference from a hardware and a software point of view. In the first part, we briefly discuss the von Neumann bottleneck problem and how to bypass it. In the second part, we discuss the main neural network optimization approaches, such as pruning, quantization and distillation, their variants and their combinations. At the same time, we will compare modern AI systems with the brain and explain the reasons for the greater efficiency of the brain. We will show that the most effective approaches to optimizing AI systems use (in some sense) brain-based principles.
18:00 – 18:15
Break
18:15 – 19:00
Parallel sessions
AI in industry
Hall 3 - Academy
Hall 2 - R&D
Hall 1 - Business
What can go wrong?
LLM - multiagent models
Hall 4
Startup success prediction and VC portfolio simulation
Cyril Shtabtsovsky,
Aloniq
Details
Mark Potanin,
Xihelm
George Kekelidze, IITech
Moderator
Anastasia Semyonova, Smile2impress
Moderator
Victor Nosko, FractalTech
Multi-agent approach in solving mathematical problems with arithmetic reasoning
Details
Sergey Shumsky,
Adam & Eva Inc.
Neuro-symbolic dialog intelligence for cheap
Details
Machine learning in metallurgy
Details
Dmitry Muravev, MMK Informservice
Ilya Makarov,
MISIS, ITMO
LLM4Anything: Multi-Agent Personalized Large Language Model Agents
Alexander Krainov, Yandex
Moderator
Evgeny Burnaev, Skoltech
Dmitry Vetrov,
Constructor University
Ivan Oseledets,
AIRI
Mikhail Burtsev,
London Institute for Mathematical Sciences
Tatiana Shavrina,
Snapchat
Aleksei Shpilman,
Gazpromneft
Details
Transformers and synthetic data for defect detection on conveyor belts
Details
Oleg Kartashev,
Severstal Digital
We will show you how we use and improve transformers and what kinds of algorithms we use to generate synthetic data to predict rare defects on conveyor belts.
1. Who are the mathematicians at MMK?
2. What kinds of problems do they solve, and with which methods and tools?
3. What is the economic effect of applying mathematical modelling?
4. Why is mathematical modelling related to machine learning?
5. What case studies have we already carried out?
We will talk about the ability of Large Language Models (LLMs) to provide personalized business-oriented communication with the help of agents. We also tackle the problem of finetuning and adding new modalities for practical applications. Finally, we formulate core challenges and approaches for building applications over LLMs.
We propose a novel, multi-agent approach to solving mathematical reasoning problems. LLMs have shown significant progress in solving math problems, but they have fundamental limitations and do not achieve high quality solutions consistently. In the proposed approach, agents self-organise to create a strategy for solving the problem on the fly, thus achieving robust solutions to a whole class of mathematical arithmetic reasoning tasks.
A new neuro-symbolic architecture of large language models is presented. It combines unsupervised learning and reinforcement learning and requires several orders of magnitude less computing for learning compared to neural network language models. The complexity of learning in the proposed architecture increases linearly with the size of the data, in contrast to the quadratic dependence in neural network models of the language.
We explore predicting startup success using CrunchBase data and deep learning. Our model forecasts milestones like IPOs, unicorn status, and M&A for Series B and C startups. We achieved 14x capital growth (98th percentile of VC funds), identified high-potential startups, and stress the importance of diverse data for accuracy. This work shows deep learning's promise in startup success prediction.

CV, RL and AGI Day

Thursday, March 7
09:00 – 10:00
Registration
10:00 – 11:30
Plenary session 3 - overviews
Main conference hall
10:00 – 10:10
Opening of the second day
Igor Pivovarov, OpenTalks.AI
10:10 – 10:50
Computer Vision - the main things that happened in 2023
Artsiom Sanakoyeu
Senior Research Scientist, Meta AI
In this talk I will spotlight the year's most exciting papers and advancements in Computer Vision. From novel scaled-up architectures that boosted the recognition capabilities, to the strides made in self-supervised pre-training that unlock new levels of understanding without extensive labeled datasets. We'll explore the fusion of vision and language in multimodal systems, demonstrating how these combined inputs enhance machine perception. The talk will also cover the latest in fine-grained tasks, including segmentation, detection, and tracking, showcasing the precision and detail now achievable. Plus, discover the role of generative models in visual representation learning, and their application in tasks like segmentation and depth estimation, setting new research avenues.
10:50 – 11:30
Reinforcement learning - the main things that happened in 2023
Aleksei Shpilman
Head of AI Development Programs, Gazprom neft
Reinforcement learning in 2023 - a year in review.
We are going to go through the most important, the most interesting and a couple of just fun papers.
11:30 – 12:00
Break
12:00 – 13:00
Parallel sessions
Computer Vision in Healthcare
Hall 3 - Academy
Hall 2 - R&D
Hall 1 - Business
Computation Optimization
LLM: language models
Hall 4
Reinforcement learning
AI decreases time and increases recall during routine CT examination
Anvar Kurmukov,
AUMI.AI
Arkady Sandler,
True Click Technologies
Moderator
Stanislav Moiseev, Tinkoff
Moderator
Details
Large Language Model Fine-Tuning Acceleration with Data Reduction via Losses
Alexander Demidovsky,
Huawei RRI
Fast Implementation of the Node2Vec Algorithm
Polina Plastova,
YADRO
Foundation models in medical imaging.
Evgeny Sidorov,
Third Opinion Platform
Details
Anastasia Semyonova, Smile2impress
Moderator
Overview of Federated Learning Methods
Denis Afanasyev, CrossOverMarkets
Human-AI interaction in healthcare
Ilya Pershin, Innopolis
Details
Automated system for analysis of OCT retina images: development and testing
Kirill Aksenov,
LLC PREDICT SPACE
Details
Yury Chernyshov,
CyberLympha
Multi-Agent Reinforcement Learning - overview
Anton Plaksin,
Yandex Research
Reinforcement Learning in Zero-Sum Differential Games.
Details
Details
Details
Details
Details
Andrey Filchenkov,
ITMO University
Moderator
Deep Reinforcement Learning-based Congestion Control for File Transfer
Alexander Blokhin, Huawei
Details
Vitaly Kalev,
Huawei
Pavel Braslavski,
Nazarbayev University
You Told Me That Joke Twice: A Systematic Investigation of Transferability and Robustness of Humor Detection Models
Details
Linguistic and logical structures for text analysis
Dmitry Ilvovsky,
HSE
Sergey Kuznetsov, HSE
Details
Maria Tikhonova,
SberDevices, HSE
mGPT: LLM speaking 61 languages including Georgian and Russian
Details
Neovascular age-related macular degeneration (n-AMD) is a form of AMD that is responsible for most cases of severe vision loss. Anti-VEGF therapy, the gold standard for treating this pathology, is accompanied by OCT monitoring. However, this process is hampered by the lack of methods for accurately quantifying OCT images. The aim of this study is to develop an automated method for calculating the quantitative characteristics of the PED, SRF and IRF biomarkers and to evaluate its accuracy. The study material included OCT B-scans of patients with n-AMD and pigment epithelial detachment (PED) who underwent anti-VEGF therapy from 2014 to 2021. The B-scans were obtained on a Cirrus HD-OCT 5000 device (Carl Zeiss Meditec). The neural network for OCT image segmentation was trained on datasets of 251 and 385 images in Experiments 1 and 2, respectively. Experts annotated the images, highlighting the PED, SRF and IRF biomarkers using Labelme software. Data preprocessing included image resizing, normalization, and conversion to grayscale. The dataset was divided into training and validation subsets. To segment retinal structures, the U-Net architecture was used with the Adam optimizer and the categorical cross-entropy loss function. The algorithm for calculating quantitative biomarker characteristics was based on edge detection using the method of Satoshi Suzuki and Keiichi Abe. The test set used to assess the efficiency of the system, which combined the segmentation algorithm and the algorithm for calculating quantitative biomarker characteristics, included 241 images for which the length and height of the PED had been measured by a physician using built-in software. The images were also labeled with one of 3 anatomical treatment outcomes: attached PED; non-attached PED; PED tear. The developed method for processing OCT images made it possible to segment the biomarkers PED, SRF and IRF with high accuracy.
The segmentation model shows the best results for PED (0.9), but also shows good accuracy for SRF and IRF (0.72 and 0.69) with the larger training set of Experiment 2. On test data from patients with n-AMD, the automated algorithm for calculating quantitative biomarker characteristics showed no statistically significant difference from the physician's measurements. The study also showed that the attached and non-attached PED groups differed statistically significantly in the height, extent and area of the PED. In addition, IRF area may also be a predictor of PED tear, since its values are statistically significantly different for groups 2 and 3. Thus, automated segmentation and calculation of biomarkers can achieve performance comparable to an ophthalmologist in assessing the quantitative characteristics of biomarkers in cases of neovascular macular degeneration.
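To make the quantification step in the abstract above concrete, here is a minimal sketch of measuring the extent, height and area of one segmented biomarker region. It is not the authors' algorithm: the abstract names Suzuki-Abe edge detection, while this stand-in simply scans a binary mask. The function name `measure_region`, the mask layout and the pixel-spacing parameters are all hypothetical.

```python
def measure_region(mask, mm_per_px_x=1.0, mm_per_px_y=1.0):
    """Measure width, maximum height and area of the foreground in a binary mask.

    `mask` is a non-empty list of rows, each a list of 0/1 values (an assumed
    layout, e.g. one class of a segmentation output). Spacing values are
    hypothetical and default to 1, so results are in pixels.
    """
    # Columns that contain at least one foreground pixel.
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not cols:
        return {"width": 0.0, "height": 0.0, "area": 0.0}
    # Foreground pixel count per occupied column.
    column_heights = [sum(1 for row in mask if row[x]) for x in cols]
    pixel_count = sum(column_heights)
    return {
        "width": (cols[-1] - cols[0] + 1) * mm_per_px_x,
        "height": max(column_heights) * mm_per_px_y,
        "area": pixel_count * mm_per_px_x * mm_per_px_y,
    }


mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
]
m = measure_region(mask)
```

With real scans, the two spacing arguments would come from the device metadata, and a contour-based method would be needed to separate several disconnected regions.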
The main part of the presentation will address the problem of effective planning of radiation therapy. For planning, it is necessary to segment a large number of anatomical structures. The task of segmentation is complicated by the fact that 1) three-dimensional medical images are used and 2) the organs of patients are abnormal. For these reasons, the results of automatic segmentation require manual corrections. An approach will be presented to optimize the segmentation correction process in real time based on information about the doctor's view. In the additional part of the presentation, the problem of interpretability of deep models will be considered.
Radiologists dedicate more than half of their diagnostic time to interpreting computed tomography (CT) scans, with chest and abdominal scans being particularly detailed and time-intensive due to the need to meticulously identify and describe a variety of diseases. Our cutting-edge product simultaneously analyzes 10 different diseases in these scans, including disorders affecting the lungs, heart, bones, and abdominal regions. In this study, we demonstrate how introducing an AI-assisted study provides a substantial time-saving advantage and lessens the heavy workload currently borne by radiologists. Specifically, it saves up to 20% of the time spent on CT examinations (≈ 2.5 mins on average), and increases the average recall by over 29%, while preserving the same level of positive predictive value.
In this talk we will describe the challenges of congestion control for file transfer, propose an implementation of a congestion control algorithm based on Reinforcement Learning techniques, and show how it was applied in real life.
Over the past years, foundation models and LLMs have demonstrated enhancements in measurable aspects and the development of new qualitative features, creating a need for their comprehensive evaluation and analysis of the associated risks. To address these issues, we present MERA, a new instruction benchmark for evaluating foundation models oriented toward the Russian language. The benchmark encompasses 21 evaluation tasks for generative models. The talk presents the new evaluation methodology, an open-source code base for the MERA assessment, a leaderboard with a submission system, and the evaluated baselines' results.
This presentation aims to provide a comprehensive overview of Federated Learning, highlighting its recent developments, applications, and trends as of 2023. Federated Learning, a rapidly evolving field in machine learning, involves training algorithms across decentralized devices or servers while keeping data localized. The talk will commence with a brief introduction to Federated Learning, elucidating its core principles and significance.

Following this, the presentation will delve into various key cases and application areas, demonstrating the practical utility and versatility of Federated Learning in diverse sectors. A significant portion of the talk will be dedicated to discussing the advancements in this domain over the course of 2023. This examination is grounded in a thorough study of the general informational landscape on this topic, encompassing an analysis of thematic conferences, academic publications, updates to open-source tools, and GitHub repositories.

Additionally, the presentation will showcase a curated collection of news from companies developing solutions in this area, aiming to provide insights into the business and technological implications of these developments. A critical evaluation of the maturity level of Federated Learning technology will be offered, assessing its readiness for widespread adoption. This assessment will touch upon the challenges faced, potential risks, and the future prospects of Federated Learning, providing a well-rounded perspective on its current state and future trajectory.
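As a small illustration of the core principle mentioned in the abstract above (training across clients while data stays local), here is a sketch of the aggregation step of federated averaging. It is only one of many methods such an overview may cover; the function name and the flat-list parameter format are assumptions made for this example.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    Only parameters and sample counts leave each client; the raw data
    never does. Parameters are modeled as flat lists of floats.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients: the second holds three times more data,
# so its parameters pull the global model three times harder.
global_weights = federated_average(
    client_weights=[[1.0, 0.0], [3.0, 4.0]],
    client_sizes=[1, 3],
)
```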
Node2Vec is a widely used algorithm for learning feature representations of graph nodes. The algorithm is used intensively in many high-load applications, so its performance is very important. There are two reference implementations of Node2Vec, in C++ and Python, from the Stanford Network Analysis Project (SNAP). However, their performance is not optimal. We introduce an optimized implementation of the Node2Vec algorithm whose performance is 2.5-5.1 times higher than that of the reference implementations. We also show that the accuracy of the optimized algorithm stays the same by solving a multi-label node classification problem on several datasets.
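For readers unfamiliar with Node2Vec, the sketch below shows its sampling step: a second-order biased random walk controlled by the return parameter p and the in-out parameter q. It is a plain, unoptimized illustration for an unweighted graph stored as a dict of neighbor sets, not the optimized implementation from the talk nor the SNAP reference code; the function name and the graph format are assumptions.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Sample one node2vec-style biased random walk of up to `length` nodes."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])  # sorted for reproducibility
        if not neighbors:
            break  # dead end: stop early
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move further away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
walk = node2vec_walk(graph, "a", length=6, p=0.5, q=2.0, rng=random.Random(0))
```

In the full algorithm, many such walks are fed to a skip-gram model to learn the node embeddings; recomputing the transition weights at every step, as done here, is the kind of cost an optimized implementation would try to avoid.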
Linguistic and logical text structures are very useful for some applied tasks like dialogue generation, argument mining and fact verification. We will consider several cases of such tasks: multi-party dialogue generation by means of discourse structure and also fact correction based on information retrieval combined with logical reasoning.
Robust Reinforcement Learning (RRL) is a promising Reinforcement Learning (RL) paradigm aimed at training models that are robust to uncertainty or disturbances, making them more efficient for real-world applications. Following this paradigm, uncertainty or disturbances are interpreted as actions of a second adversarial agent, and thus the problem is reduced to seeking agent policies that are robust to any opponent's actions. This paper is the first to propose considering RRL problems within positional differential game theory, which helps us to obtain theoretically justified intuition to develop a centralized Q-learning approach. Namely, we prove that under Isaacs's condition (sufficiently general for real-world dynamical systems), the same Q-function can be utilized as an approximate solution of both the minimax and maximin Bellman equations, and we also indicate conditions under which this Q-function can be decomposed. Based on these results, we present the Isaacs Deep Q-Networks (IDQN) and Decomposed Isaacs Deep Q-Networks (DIDQN) algorithms, respectively. We analyze their performance by comparing them with other baseline RRL and Multi-Agent RL algorithms. We consider both simple environments with known accurate solutions and complex large-dimensional MuJoCo environments. In each experiment, we thoroughly evaluate the agents' policies obtained after learning, training opponents against them using various RL algorithms with various parameters. The experiment results demonstrate the superiority of the presented algorithms in all experiments under consideration.
As industry needs to process growing amounts of training data, reduce the cost of fine-tuning a single model, and minimize the environmental effects, the task of accelerating the fine-tuning of large language models (LLM) has become extremely demanding. DAREL is a novel training data reduction method that operates with training samples based on losses obtained from a currently trained model or a pre-trained one. The proposed method is devoted to Large Language Models fine-tuning and is designed primarily to be combined with Parameter-Efficient fine-tuning methods, such as LoRA. The results of computational experiments provide compelling evidence of the enhancement of the fine-tuning quality and time of Large Language Models. DAREL allows an average 1.26x fine-tuning acceleration for GPT2-S, GPT2-M and GPT2-L on a variety of datasets, including E2E-NLG, DART and WebNLG, with an average BLEU drop of 1.44 p.p.
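The general idea of loss-based data reduction can be sketched as follows. The abstract does not say which samples DAREL keeps or how often losses are refreshed, so the rule below (keep the highest-loss fraction) is purely an assumption for illustration, and `select_by_loss` is a hypothetical helper, not part of DAREL.

```python
def select_by_loss(losses, keep_fraction):
    """Return indices of the highest-loss samples, in original order.

    `losses` holds one per-sample loss from the current or a pre-trained
    model. Keeping high-loss samples is an assumed criterion.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(len(losses) * keep_fraction))
    # Rank sample indices from largest to smallest loss.
    ranked = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(ranked[:n_keep])


losses = [0.2, 1.5, 0.9, 0.1, 2.3, 0.4]
kept = select_by_loss(losses, keep_fraction=0.5)
```

Training on only the selected indices shortens each epoch roughly in proportion to `keep_fraction`, which is where an acceleration of the kind reported in the abstract could come from.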
Automatic humor detection is a highly relevant task for conversational AI. To date, there are several English datasets for this task, but little research on how models trained on them generalize and behave in the wild. To fill this gap, we carefully analyze existing datasets, train RoBERTa-based and Naïve Bayes classifiers on each of them, and test on the rest. Training and testing on the same dataset yields good results, but the transferability of the models varies widely. Models trained on datasets with jokes from different sources show better transferability, while the amount of training data has a smaller impact. The behavior of the models on out-of-domain data is unstable, suggesting that some of the models overfit, while others learn non-specific humor characteristics. An adversarial attack shows that models trained on pun datasets are less robust. We also evaluate the sense of humor of the ChatGPT and Flan-UL2 models in a zero-shot scenario. The LLMs demonstrate competitive results on humor datasets and a more stable behavior on out-of-domain data. We believe that the obtained results will facilitate the development of new datasets and evaluation methodologies in the field of computational humor. We've made all the data from the study and the trained models publicly available.
Reinforcement Learning is used to solve problems in many subject areas (traffic control, behavior modelling, software testing, cybersecurity, etc.). In many real-world tasks a single agent has to deal with other agents (to coordinate or compete), and multi-agent systems (MAS) are used for such situations. A high-dimensional RL-MAS environment causes the "curse of dimensionality" problem, and deep learning helps to solve this problem efficiently. This presentation covers some examples of using RL and Deep RL for multi-agent systems.
We will discuss why we decided to combine multimodal networks, unlabelled data, and a fresh perspective on the DICOM format into a single fundamental model. We'll explore what this has brought us and why the future lies in this direction.
Alexey Trutnev,
Huawei RRI
Regina Gareeva,
AUMI.AI
13:00 – 13:15
Break
13:15 – 14:00
Parallel sessions
Neuromorphic Computing
Hall 3 - Academy
Hall 2 - R&D
Hall 1 - Business
Generative AI: LLMs for Science and Industry
Generative AI: diffusion models
Hall 4
Datasets, markups and testing
Stanislav Moiseev, Tinkoff
Moderator
Dmitry Vetrov,
Constructor University
Diffusion models: different viewpoints and perspectives
Nikita Andriyanov, Financial University
Moderator
Details
Neuromorphic Approach to Reinforcement Learning
Mikhail Kiselev, Kaspersky
Details
Spiking Neural Networks on a Neuromorphic Chip: Hardware-Specific Design with Safety in Mind
Oleg Vygolov,
Kaspersky
Details
Andrey Lavrentyev,
Kaspersky
Overview of the direction of neuromorphic AI systems
Denis Larionov,
Cifrum
Details
Overview of methods and tools for generating synthetic datasets
Roman Kutsev, Training Data.Pro
Details
Why do ML system failures come as a surprise?
Evgeny Nikitin, Celsus
Details
Anonymization of personal data. ML methods and approaches.
Alexander Platonov,
Smart Solutions
Details
Mikhail Hushchyn, HSE
Generative AI in Science and Industry

Details
Andrey Ustyuzhanin,
Constructor University, NUS
HypoFinder: Streamlining Scientific Discovery with an AI-Driven Tool for Formalism Selection, Hypothesis Generation, and Automated Research Synthesis
Details
Denis Fedoseev, WorldQuant
Moderator
Diffusion models have become a state-of-the-art tool for generative modelling. However, they remain underexplored, and the reasons for their success are not yet fully understood. In the talk we will consider them from different perspectives and discuss how the basic model can be extended based on those perspectives. In particular, we will try to understand which elements of a diffusion model are crucially important and which can be omitted without loss in quality.
Spiking Neural Networks (SNNs) implemented on neuromorphic processors are one of the promising approaches to creating energy-efficient, high-performance, and autonomously adaptive Edge AI. The development of such systems is influenced by the binary and event-driven nature of data in SNNs and the "near-memory computing" principle implemented in neuromorphic processors. This talk explores methods and key implementation features of SNNs on the neuromorphic chip "AltAI" using the Kaspersky Neuromorphic Platform. An interesting aspect of hardware-oriented SNN topologies, such as resilience to common AI attack methods, is discussed. As a proof-of-concept, a neuromorphic detector for adversarial attacks on a facial biometric identification system is presented.
The report will provide an overview of current trends in the direction of neuromorphic artificial intelligence systems. In terms of neuromorphic properties (connectionism, parallelism, asynchrony, pulsed nature of information transfer, on-device learning, local learning, sparsity, analog computing and in-memory computing), the most striking projects in the world will be considered. Particular attention will be paid to new products of the last year - IBM NorthPole, the second generation of Akida, chips based on memristive computing.
Calculate metrics on a test dataset, test API performance - is this enough to ensure reliable operation of medical CV systems in production? Obviously not, especially for offline work scenarios. So in this report I will tell you about the most dangerous and frequent errors in the testing process that we have encountered in 5 years of development.
We will talk about the ML approaches we used to anonymize the personal data in a 15 TB database of a large medical company, which will make it possible to train new algorithms for the diagnosis and treatment of diseases.
Medicine is the most difficult case in the field of anonymization: it contains 3 times more heterogeneous unstructured data and complex personal data (PD) patterns. Our methods may therefore be of interest to a large number of participants in various industries such as banks, retail and telecom.
Spiking neural networks (SNN) and non-von-Neumann massively parallel computers are considered as a theoretical and hardware basis for creation of so-called neuromorphic intelligent systems, which are more similar to human brain functioning than traditional artificial neural networks (ANN) based on deep learning. First of all, instead of the biologically implausible error backpropagation algorithm, SNN learning is based on the local synaptic plasticity rules stipulating that synaptic connection strength dynamics only depend on activity and properties of the neurons connected. The local learning algorithms are possible in completely asynchronous SNNs – when the SNN is a huge ensemble of independent simple computational units. This feature and the explicit inclusion of the temporal dimension in the SNN functioning in the form of neuron state dynamics and non-zero spike propagation time explain the growing attention to SNNs from the viewpoint of reinforcement learning (RL) tasks – because this learning regime requires continuous learning and should work under conditions of significant time gaps between network decisions and their evaluations. In this presentation, the SNN structures, neuron and synaptic plasticity models used for RL are considered. The specific SNN-based mechanisms of the model-free and model-based RL are analyzed. The efficiency of implementation of the SNNs used for RL on modern and future neuroprocessors is estimated.
The emergence of generative AI and the improved quality of 3D rendering and modeling significantly expand the use of synthetic data for training ML models. In this overview report, I will tell you how things stand at the beginning of 2024:
- which methods prevail in the market
- pros/cons of each method
- examples of generations
- assessment of the prospects of the methods
In this talk, we introduce HypoFinder, an innovative tool utilizing a state-of-the-art Large Language Model to fundamentally enhance the initial phases of scientific inquiry. We will showcase HypoFinder's robust capabilities, starting with its automated formalism selection, vital for crafting solid hypotheses. Our exploration extends to the tool's ability to compose meticulous research plans informed through the meticulous analysis of a curated body of scholarly articles. We will spotlight HypoFinder's powerful background search function, which automates the extraction and summarization of information from relevant papers, thus equipping researchers with concise, essential knowledge of their field's current and foundational works. The talk will provide insight into the LLM technology propelling HypoFinder, with demonstration cases including the search for new solid ion electrolyte materials—pivotal in battery technology—and the formulation of a winning strategy for pinewood derby, showcasing HypoFinder's versatility across diverse research scenarios. We will discuss practical implementations, reflecting on how automation like HypoFinder could remodel scientific creativity, efficiency, and collaboration, reshaping the future of scientific endeavors.



Generative AI is one of the cutting-edge areas in machine learning. It is primarily associated with images and has gained worldwide popularity thanks to networks for image creation: Dall-E, Stable Diffusion, and Midjourney. But in this talk we will not talk about images. Generative models have also been widely used in natural sciences and industrial applications. We will consider several cases in astronomy and high-energy physics experiments at the Large Hadron Collider. Finally, we will discuss how generative models are used to model the behavior of complex systems and to anonymize data.
GigaSearch or Search Engine on GigaChat
Details
Prohor Gladkih, SberDevices
I'll tell you how we combat hallucinations and data obsolescence in GigaChat using the RAG (Retrieval-Augmented Generation) approach.
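For context, the RAG pattern named above can be sketched in a few lines: retrieve the most relevant documents for a query, then put them into the prompt so the model answers from fresh, checkable text. This is a toy illustration, not the GigaSearch design: the word-overlap retriever stands in for a real search engine, and every name in the code is hypothetical.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def retrieve(query, documents, k=2):
    """Rank documents by word overlap with the query (toy retriever)."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Assemble an LLM prompt that grounds the answer in retrieved text."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


docs = [
    "OpenTalks.AI 2024 takes place in Tbilisi on 6-7 March 2024.",
    "Node2Vec learns feature representations of graph nodes.",
    "Lunch is served from 14:00 to 15:00.",
]
prompt = build_prompt("Where does OpenTalks.AI 2024 take place?", docs, k=1)
```

The resulting prompt would then be sent to the language model; keeping the document store up to date is what addresses data obsolescence, while the instruction to answer only from context is one common way to reduce hallucinations.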
14:00 – 15:00
Lunch
15:00 – 16:00
Parallel sessions
Startups pitches
Hall 3 - Academy
Hall 2 - R&D
Hall 1 - Business
Intellect for robots and drones: LLM and planning
Computer vision - academic talks
Hall 4
Neuromorphic and neural networks
Alexander Rezanov
Rask AI
Moderator
Roman Doronin, Bioptic.ai
Moderator
Mikhail Kiselev, Kaspersky Lab
Moderator
Towards Unified Intelligence: Next-Generation Robotics Brain for Humanoids
Alexey Postnikov,
Sber Robotics Lab
LLM-based agents and their planning capabilities
Aleksey Kovalev, MIPT
Nonlinear dynamics and machine learning for computational neuroscience and vice versa
Oleg Maslennikov,
IAP RAS
The brain's oscillatory system is reconfigured as a whole to perform a different cognitive task
Viktor Vvedensky,
Kurchatovski NRC
Neuromorphic structures
Mikhail Zheludev, Bosch
Efficient Video Classification Algorithms and Facial Expression Recognition
Andrey Savchenko,
Sber AI Lab
Details
JPEG AI and artifacts - problems of implementing neural network algorithms
Dmitry Vatolin,
Institute of AI MSU
Details
Real-time Face Recognition: methods of training fast and accurate models for inference on mobile devices
Vadim Selyutin,
VisionLabs
Details
Details
Details
Details
Details
Details
Athletic Intelligence of an Anthropomorphic Robot and Why It's Difficult
Egor Davydenko,
MIPT
Details
Memory-Driven Robotics: A New Paradigm for Trajectory Planning
Polina Fedotova,
Sber Robotics Lab
Details
Natalia Podsosonnaya, Skoltech
Moderator
A new JPEG AI compression standard will be released in 2024. The author, one of the few in Russia, is a member of the standardization committee. The opportunities and problems of the new standard will be discussed.
In this talk, I will present the problem of high computational complexity in frame-wise video classifiers. I will overview known efficient algorithms, such as AdaFrame, LiteEval, FrameExit, OCSampler, etc. Moreover, I will present a novel approach, presented at the ICML 2023 conference, based on the ideas of sequential analysis and adaptive frame rate.
A text or voice message is evaluated by the amount of information transmitted. However, a living reader or listener understands that these messages carry a certain meaning. Meaning is a loosely defined concept with an unclear scale. The minimal message that conveys meaning is generally considered to be the word. Words can be close or distant in meaning, and a single word can have different meanings. We used the semantic proximity of words as a measure to construct an embedding space for all verbs and adjectives of the Russian language. As a result, it turned out that the thesauri of verbs and adjectives can be mapped onto compact round regions located on a two-dimensional surface. Each of these regions is divided into three roughly equal sectors containing words with positive, neutral and negative emotional content. Within these sectors lie groups of words arranged in ordered lists, each belonging to a particular semantic category. The sizes of the lists vary smoothly from tens of words to a few words per group. We believe that this result reflects how lexical memory is organized in the human cerebral cortex.
The most accurate face recognition solutions are based on large deep learning models. For successful product deployment on mobile platforms in conditions of limited computational resources, these models must not only be precise, but also fast and lightweight. In this talk, we will examine the following issues: 1) How to choose a modern compact architecture with the best balance of speed and accuracy? 2) What difficulties may arise with distributed face recognition model training on datasets with millions of images and hundreds of thousands of classes? 3) What methods of transferring knowledge from large models to smaller ones can minimize the loss of accuracy due to the architecture size reduction?
The talk will give an overview of the state of the art in the field of intelligent LLM-based agents, with special emphasis on the behavioral planning capabilities of such agents. The results of using the LLM-based agent for the task of controlling a real robotic platform will be considered.
The human brain continuously performs elementary cognitive tasks, and it is unclear how the cortical neural networks interact with each other in doing this. We set up two similar experiments on recognition of spoken words and simple visual patterns. This can be attributed to the study of the execution of elementary cognitive tasks. We see that in the process of making a decision about which word was heard, about a dozen different places of the cortex synchronously stop running processes at the moment the button is pressed. Most likely, it is precisely the coordinated activity of these neural populations that triggers this button press. To further advance this research, one needs methods from artificial intelligence that fragment the audio stream into separate words for speech processing systems. The encephalogram can also be presented as a chain of episodes that obviously perform certain functions. The episodes are probably analogues of words for the internal communication between different areas of the brain, workin
This article examines the problems that prevent the full realization of the potential of neural networks, and proposes a new non-traditional learning approach based on the use of neuromorphic structures. Neuromorphic structures do not require an error back propagation algorithm, due to the introduction of a "generalized perceptron" that increases the input dimension according to the data topology of each layer, implementing an anti-diffusion dimensionality increase algorithm. This algorithm does not require the use of gradient descent on each layer, which makes it possible to avoid getting stuck in local minima of the minimized error functional.
In this talk, I am going to review recent results at the intersection of machine learning, nonlinear dynamics and computational neuroscience. Networks of coupled model neurons are a traditional tool for studying emerging phenomena underlying sensorimotor and cognitive processes in computational neuroscience. Until recently, these models have been designed heuristically and have usually been studied using approaches developed in the nonlinear dynamics community. Machine learning, stemming from theoretical neuroscience, has achieved impressive success while developing as an independent field. Nowadays, it influences a variety of disciplines, including its predecessor, neuroscience. Next-generation models in computational neuroscience take inspiration from machine learning to explain basic principles of neurocognitive phenomena based on traditional and newly developed methods of nonlinear dynamics, network science and data science. I will discuss a series of modern frameworks in computational neuroscience and illustrate them with several models, mainly in the form of recurrent neural networks, which are used for uncovering dynamic and population mechanisms of neural computations.
We will delve into our strides in developing a next-generation general-purpose brain designed for (but not limited to) humanoid robots. The research encompasses the following components: 1. Task Planner: Employing LLM/VLM-based models, the system decomposes human-provided tasks, ensuring integration with the robot's understanding of its surroundings. It dynamically adjusts plans based on real-time video feedback, enhancing adaptability to changing environments. 2. Manipulation Model: The transformer/diffusion-based imitation learning model interprets video input from robot sensors and task instructions from the Task Planner. It generates action trajectories, empowering humanoid robots to execute tasks with human-like precision and efficiency. 3. Navigation Model: Our multimodal navigation model, with a Navigation Planner, orchestrates humanoid robot movement from point A to B, incorporating language-conditioned navigation, mapping, localization, target search, global route planning, and waypoint generation. We will also cover recent advances in whole-body control. This component optimizes trajectories from the manipulation and navigation models, ensuring stability, preventing falls, and maintaining the required velocity and orientation during movements. In this presentation, we share our vision for the future of robotics intelligence, emphasizing recent breakthroughs in each module. Beyond showcasing individual advancements, our goal is to initiate a discourse on seamlessly integrating these modules into a unified brain. Our vision goes beyond isolating separate models of task planning, manipulation, and navigation, toward a harmonious architecture that transforms humanoid robots into truly intelligent entities.
This presentation presents a comprehensive exploration of innovative methodologies aimed at refining the capabilities of robotic manipulators through the integration and adaptation of advanced data models. Initially, we delve into the critical role of historical data, emphasizing its significance in formulating behavioral cloning policies. By methodically integrating past sensor-visio-motor data into prediction algorithms, this research demonstrates substantial improvements in the generation of future trajectories for robotic manipulators. Further, we introduce the employment of Recurrent Memory Transformers in robotic models, showcasing their effectiveness in capturing and utilizing historical action data. This approach significantly boosts the predictive accuracy and reliability of the robot's future action policies, marking a substantial advancement in robotic cognitive functions. In a groundbreaking exploration, we adapt diffusion models, primarily designed for image synthesis, to the field of robotic trajectory forecasting. This segment focuses on the innovative integration of covariance matrices within these models, enabling precise predictions of the probabilistic distribution of noise in future robotic trajectory paths. The thesis also addresses the limitations inherent in deploying conventional, pretrained large vision models within robotic contexts. It highlights the inefficiencies in feature extraction, particularly due to the high similarity of sequential robotic imagery, and underscores the necessity for more specialized vision models tailored to the unique demands of robotic applications. Conclusively, the research synthesizes these advanced techniques, illustrating how their integration significantly enhances the generalization capabilities of manipulation diffusion models. This integration fosters substantial advancements, paving the way for more accurate, adaptable, and efficient robotic manipulation across a wide range of practical scenarios. 
The findings of this thesis are poised to contribute profoundly to the field of robotics, offering novel insights and robust solutions for complex manipulation tasks.
Kirill Shtabtsovsky, Aloniq
Jury
Tamaz Khunjua,
Rpv VC Fund
Jury
Ilya Partin, Brayne.vc
Jury
Dmitry Stepanov, Yandex, Armenia
Jury
16:00 – 16:15
Break
16:15 – 17:00
Parallel sessions
AI and education
Hall 3 - Academy
Hall 2 - R&D
Hall 1 - Business
Intellect for robots and drones: CV and navigation
Generative AI: manipulation and detection
Hall 4
AI in internet
Autonomous system for detecting attacks on industrial networks: parsing unknown protocols and anomaly detection using system model construction
Alexey Sinadsky, CyberLympha
To be announced later
Moderator
Details
Integrating machine learning into web firewalls
Aleksander Kozhevnikov,
UDV group
Details
Prohor Gladkih, SberDevices
Moderator
From semantic to multimodal maps in robotics
Dmitry Yudin,
MIPT, AIRI
Details
3D CV in energy systems monitoring
Javid Sadreddinov, Innopolis
Details
Recent Advances in Autonomous Driving
Aleksey Voropaev, SberAutoTech
Details
Roman Gorbachev, MIPT
Roki X - a tabletop educational platform for research on anthropomorphic robots
Details
Nikita Andriyanov, Fin. University
Moderator
Andrey Filchenkov,
ITMO University
Moderator
The path from university to company:
How corporations cultivate talent together with higher education institutions
Alexander Sakhnov,
X5 Tech
Details
The psychometrics of large language models
Irina Piontkovskaya,
Huawei Noah Lab
Details
Social engineering AI: Human Persuasion Techniques in LLM Attacks
A Reworr,
Deteact
Details
Aleksey Korepanov, Kontur
Implementing a realtime audio processing model in the Web
Details
At SKB Kontur we have recently started deploying various ML models on the Web, where the model runs on the user's computer. I would like to share the challenges and problems we have encountered and how we overcame them.
The report covers two main topics:
ML topic. How to adapt a regular torch model to run on the Web for realtime audio processing, and what options exist for running a model on the Web.
Frontend topic. Which web architecture to choose for running such models, what each architecture is convenient for, and what problems each has. We will also highlight the constraints that realtime audio processing places on the ML model.
A WAF relies on blacklists and whitelists to work. Whitelists are difficult to create, and blacklists often cause false positives. We suggest using ML to solve these problems.
The paper presents a method for restoring the structure of network traffic, turning a stream of bytes into understandable fields. The method makes it possible to analyze and protect systems without obtaining specifications of the network protocols in use. We also present a method for detecting anomalous network activity based on modeling the system as a hybrid automaton. Data transmitted over the network is processed and combined into states, for which acceptable boundaries and directions of change are defined. Anomalies are detected as deviations of the real (observed) behavior of the system from the expected (modeled) behavior.
The report will present a new approach to developing autonomous driving technologies based on neural networks. Modern advances in artificial intelligence open up opportunities to replace traditional, extensive code bases for autonomous driving with more compact, efficient and adaptive systems. This approach promises not only to simplify and accelerate the development of self-driving cars, but also to significantly improve their safety and reliability.
Intelligent robotics is currently moving from semantic maps to multimodal ones, which can contain not only information about obstacles and their types, but also text descriptions of objects or even the sounds they make. The report will discuss modern neural network methods that make it possible to construct such maps. We will also discuss original algorithms and examples of their use on open datasets, data from a photorealistic simulator and a real mobile robot. Finally, we will note the prospects for using them in navigation problems and the challenges of running such approaches on embedded devices.
Monitoring energy infrastructure is a crucial responsibility for companies and requires substantial resources. Power outages result in financial losses, but AI, combined with drone technologies, can provide early diagnosis of objects. Such solutions are becoming an integral part of the operational assets of energy companies. We will discuss how the implementation of AI has impacted the monitoring of power lines and other objects using 3D Computer Vision algorithms. Additionally, an analysis of the delineation of security zones, including vegetation, will be presented to enhance overall system security.
The report will introduce, for the first time, Roki X, an advanced educational and research platform for humanoid robotics developed at the MIPT laboratory in collaboration with the Starkit team.
The platform consists of a humanoid robot built on proprietary servomotors, circuitry and architectural solutions, a simulator, a library of motion control and computer vision algorithms, and an API that provides user access to the platform. The platform is designed for high school students and undergraduates, and also allows them to participate in international competitions such as RoboCup and FIRA.
In the report, we will examine the complete path from the academic bench to the first job, from the perspective of both the student and the company. We will discuss how well fundamental education meets the needs of companies and how the synergy between corporations and universities helps to bridge the gap. We will also discuss why organizations want to work with universities.
How LLMs are investigated using psychometrics (i.e., human psychology) methods, and what is gained by doing so
This study examines social engineering and psychology in LLM security. Drawing on social science research, we consider the parallels between deceiving humans and deceiving LLMs, aiming to understand the potential for such models to be influenced or 'jailbroken'.
17:00 – 17:30
Break
17:30 – 18:45
Plenary Session 4
Emergence in foundation models - is this a path to AGI?
Main conference hall
17:30 – 18:00
Emergence in artificial neural networks
Tatiana Shavrina
Snapchat
The talk will give an overview of the LLM evaluation techniques used to examine general performance, safety, and various human-centered metrics.
Both AI alignment methods and experiments verifying emergence in LLMs are reviewed through the lens of corpus linguistics and a meta-analysis of 20+ existing studies. In the end, is there actually evidence that emergent properties are not just data leaks?
18:00 – 18:30
Emergence in natural neural networks
Prof. Konstantin Anokhin MD, PhD, FRAS
Institute for Advanced Brain Studies, MSU
Emergence is the development of higher-order phenomena from lower-order ones. Classical examples are the emergence of life or the emergence of mind and consciousness in the developing nervous system. Neural hypernetwork theory aims to explain the latter case. From its first principles follow three forms of emergence, the most important of which is downward emergence. Because of it, the individual elements of the neural network acquire properties inherent to the whole cognitive agent. It is surprising that artificial neural networks, despite their vast differences from natural ones, demonstrate the same emergent property. The talk will examine the mechanisms of deep emergence in the brain's neural networks and discuss possible parallels with the appearance of this phenomenon in artificial neural networks.
18:45 – 19:00
Conference closing
Main conference hall
20:30 – 22:00
Networking afterparty & live music
You will have a great opportunity to network with speakers and other participants, enjoy a nice meal and wine, and listen to live music by rock bands from AI industry companies!

The afterparty will be held at a secret location. A transfer will be organized for all participants after the conference closing.
The day is dedicated to new experiences and networking!

First half of the day: excursions to Mtskheta, the old capital of Georgia, with its Svetitskhoveli temple, or around Tbilisi, as you prefer.

Afternoon: we will take a real journey into the world of Georgian wines and hospitality! We will go to a wonderful winery, see how Georgian technology (aging wine in qvevri) differs from European technology (aging wine in stainless steel and/or oak barrels) and try a lot of white and red wines in excellent company! After that you can buy the wines you like to take with you.

Both activities are subject to additional payment. Registration for the excursion and the trip to the winery will be announced separately in a newsletter sent to conference participants. You can also go to both events with a +1.

If you did not receive an email from the organizing committee about registration for the excursions, please write to us at org@opentalks.ai with the subject "Excursions".
After the conference you can join a group of participants going skiing and snowboarding in Gudauri, the best resort for it in Georgia. Great company in the morning and in the evening, as well as on the slopes! This is the perfect ending to the conference!

Accommodation and ski pass are subject to additional payment. To register for this activity, please write to org@opentalks.ai with the subject "Mountain skiing".