OpenTalks.AI /
6-7 March 2024
Tbilisi

OPEN CONFERENCE ON
ARTIFICIAL INTELLIGENCE

Agenda
OpenTalks.AI 2024

version from 15.02.2024
Tbilisi time, GMT+4
19:00-21:00
Welcome drinks and networking
The evening before the conference is a great time to enjoy a glass of wine and catch up with familiar faces in an informal setting. And of course, to meet new people!

This is also where you can register and pick up your badge, so you spend less time in the queue in the morning.

Location will be announced to participants by email a day before.

LLM and GenAI Day

Wednesday, March 6
08:30 – 10:00
Registration and welcome coffee
09:00 – 09:45
Introduction to AI for beginners
Igor Pivovarov, OpenTalks.AI
Before the conference begins, a brief introduction to AI for beginners: in simple words, the main technologies and applications, what Computer Vision and Large Language Models are, training and inference, what Transformers and Attention are, and much more. Plus a brief guide to where these technologies and applications appear in the conference program.
10:00 – 11:30
Plenary session 1 - NLP & LLM overviews
Main conference hall
10:00 – 10:10
Opening of the conference and first day
Igor Pivovarov, OpenTalks.AI
What to expect at the conference: main ideas, numbers, and highlights.
10:10 – 10:50
From Language understanding to Autonomous agents: the evolving landscape of Large Language Models
Mikhail Burtsev
London Institute for Mathematical Sciences (UK)
In this talk, we will explore the rapid advancements and nuanced limitations of Large Language Models (LLMs) like ChatGPT that have revolutionized AI in the past year. The first part provides a general overview of LLMs, highlighting their proficiency in solving a broad range of natural language understanding problems. However, we will also present data showing that LLMs may lag behind more specialized traditional NLP models in certain specific tasks, illustrating the trade-off between universality and task-specific quality. We then delve into the fundamental limitations of transformer input size and discuss our innovative solution: the development of a recurrent memory transformer that sets a new record for sequence length processed by a neural network. The latter part of the talk shifts focus to the exciting potential of LLMs in creating autonomous agents, capable of independent action and decision-making. We will review popular prompting techniques like the chain of thought and tree of thought, and address the current challenges in enabling LLMs to learn and apply abstract rules, particularly in non-standard domains. This talk aims to provide a comprehensive understanding of where LLMs excel, where they falter, and the exciting possibilities and challenges that lie ahead in AI research and applications.
10:50 – 11:30
Surpassing training data: getting more from LLMs at inference time
Alexander Novikov (online)
DeepMind (UK)
Everyone agrees now that LLMs are here to stay and can interpolate the training data – the collective intelligence of the internet – really well. But can they go much beyond it?
I'll present an overview of recent ideas on how to approach surpassing human abilities with LLMs in different domains: code generation (things like FunSearch (Nature, 2023), AlphaCode and AlphaCodium), maths (AlphaGeometry), actions (Voyager: the minecraft agent) and reasoning (Tree of thoughts).
11:30 – 12:00
Coffee break
12:00 – 12:45
Parallel sessions
AI in legal practice
Hall 3 - Academy
Hall 2 - R&D
Hall 1 - Business
Recommendation systems - under the hood
Hall 4
Generative AI: multi-modality
To be announced later
Moderator
Alexander Tuzhilin, New York University
LLM Tutorials
When Variety Seeking Meets Unexpectedness: Incorporating Variety-Seeking Behaviors into Design of Unexpected Recommender Systems
Details
Tutorial - How to train large language models
Murat Apishev,
Samokat.tech
Details
Holger Zscheige,
Infotropic Media
Moderator
Exact Algorithms for Boolean Matrix Factorisation of Contranominal Scales and its Applications in Recommender Systems
Dmitry Ignatov, HSE
Details
To be announced later
Moderator
Denis Dimitrov,
MSU
Large multimodal models - the path to AGI?
Details
Irina Abdullaeva,
AIRI
AIRI multimodal event model
Details
Large Language Models are the basis of most AI products, and many companies are constantly competing with each other to train the strongest models. The process of LLM creation is non-trivial and consists of a wide range of steps and subtasks. Although there are no perfect solutions, through many experiments over the past years researchers and engineers have identified key ideas and techniques that can help one produce a higher-quality model with less time and resource consumption. This tutorial will cover the main aspects of modern LLM training (data, architecture, pre-training and fine-tuning, scaling and optimization, modifications and evaluation of models) and the most common practices associated with them. It is intended for an audience with experience in ML and DL, but without specialization in training LLMs.
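The training loop the tutorial builds up to can be sketched in miniature. The toy below (my illustration, not the tutorial's material) trains a single-layer next-token predictor in plain numpy, showing the forward pass, cross-entropy loss, and gradient step that real LLM training scales up by many orders of magnitude.

```python
import numpy as np

# Toy next-token "language model": one embedding table and one softmax layer,
# trained with plain full-batch gradient descent. Real LLM training scales
# this same loop (forward, cross-entropy, gradient step) up enormously.
vocab, dim, lr = 16, 8, 0.5
rng = np.random.default_rng(0)
E = rng.normal(0, 0.1, (vocab, dim))   # token embeddings (kept frozen here)
W = rng.normal(0, 0.1, (dim, vocab))   # output projection (trained)

def step(tokens):
    """One gradient step on next-token prediction; returns the loss."""
    global W
    x, y = tokens[:-1], tokens[1:]     # predict token t+1 from token t
    h = E[x]                           # (T, dim) "hidden states"
    logits = h @ W                     # (T, vocab)
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(len(y)), y]).mean()
    g = p.copy()
    g[np.arange(len(y)), y] -= 1.0     # dL/dlogits for softmax cross-entropy
    W -= lr * h.T @ (g / len(y))
    return loss

seq = rng.integers(0, vocab, size=64)
losses = [step(seq) for _ in range(200)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Only the output projection is trained here; a real model would also update the embeddings and stack attention layers in between.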
Variety seekers are those customers who easily get bored with the products they purchased before and, therefore, prefer new and fresh content to expand their horizons. Despite its prevalence, variety-seeking behavior is hardly studied in recommendation applications because of various limitations in existing variety-seeking measures. To fill the research gap, we present a variety-seeking framework in this paper to measure the level of variety-seeking behavior of customers in recommendations based on their consumption records. We validate the effectiveness of our framework through user questionnaire studies conducted at Alibaba, where our variety-seeking measures match well with consumers' self-reported levels of their variety-seeking behaviors. Furthermore, we present a recommendation framework that combines the identified variety-seeking levels with unexpected recommender systems in the data mining literature to address consumers' heterogeneous desire for product variety, in which we provide more unexpected product recommendations to variety-seeking consumers and vice versa. Through off-line experiments on three different recommendation scenarios and a large-scale online controlled experiment at a major video-streaming platform, we demonstrate that those models following our recommendation framework significantly increase various business performance metrics and generate tangible economic impact for the company. Our findings lead to important managerial implications to better understand consumers' variety-seeking behaviors and design recommender systems. As a result, the best-performing model in our proposed frameworks has been deployed by the company to serve all consumers on the video-streaming platform.
In this talk, we examine certain properties of state-of-the-art algorithms for Boolean matrix factorisation (such as GreConD and IterEss), a popular technique in Data Mining with binary relational data. The greedy GreConD algorithm was inspired by the fact that the optimal number of factors for Boolean Matrix Factorisation (BMF) can be chosen among the formal concepts of the corresponding formal context. In particular, we consider one of the hardest cases (in terms of the number of possible factors), the so-called contranominal scales, and show that the output of GreConD is not optimal in this case. Moreover, we formally analyse its output by means of recurrences and generating functions and obtain a closed form for the returned number of factors. We also provide an algorithm that generates the optimal number of factors and the corresponding product matrices P and Q for the case of contranominal scales. In addition to algorithmic studies, we give the listeners a short summary of our previous results on BMF applications for Collaborative Filtering (in collaboration with E. Nenova, M. Ahmatnurov et al.), along with some recent results for Boolean tensors as well. (This is joint work with Alexandra Yakovleva and Yazag Meziane.)
Nowadays, large language models are very popular both in science and in everyday life. We read news about them, see impressive video presentations from large corporations, and hear conspiracy theories that these very language models already understand the world better than the average person. Let's try to understand what LLMs can do now, what else they can work with besides text, and how they can help in creating a super-powerful intelligent machine. As part of the talk, I will share my research experience in this area and talk about experiments, benchmarks, and other big open challenges, as well as our OmniFusion multimodal architecture. We'll also discuss the multi-agent approach, how LLMs "communicate", Chain-of-Thought and Tree-of-Thought mechanisms, shared memory, self-reflection, and other aspects that are already worth looking into. I will also talk a bit about our research in the field of generative AI (namely Kandinsky 3.0 and Kandinsky Video), and most importantly about how to link OmniFusion and, for example, Kandinsky into a single system that can solve almost the entire range of tasks at the intersection of different modalities.
In the field of event sequences, unlike computer vision (CV) or natural language processing (NLP), it is not common to use a pre-trained model to solve multiple problems at once and generalize to new ones. Existing approaches have limitations in terms of flexibility, generalization, and computational efficiency. In addition, integrating long sequences of events into neural network-based approaches remains challenging.
To address these challenges, this paper proposes a novel approach called Event Sequences Question Answering (ESQA) based on the Large Language Model (LLM). We present all event sequence based tasks in question-answering form. Moreover, we propose a generic method for encoding event sequences using a trainable coder based on the Transformer architecture. Efficient feature extraction from the coder output and a significant reduction in sequence length are achieved by using the Q-Former model as a connecting layer between the coder and the LLM.
Our empirical results show that applying pre-trained large language models to the event sequence modality in ESQA provides quality comparable to state-of-the-art approaches for a variety of prediction tasks in multi-task environments on various open-source financial datasets. In addition, ESQA has demonstrated adaptability to new tasks with quality that exceeds statistical performance.
Alexey Goncharov, Compress.ai
Effective LLM inferencing for applied tasks
Details
How to make LLM inference fast, cost-effective and customizable when running on company servers without expensive GPUs? In this talk I will share my development experience: methods of scaling infrastructure for LLMs, approaches to improving efficiency, and a sandbox for experimentation.
Topic will be announced later
Thomas G. Martin,
Lawdroid, CA
Details
TBD
Alan Ragueneau,
Denton Nextlaw, SW
Topic to be announced later
Details
TBD
Anna Romanova,
MIPT
Elements of legislation for autonomous artificial intelligence systems
Details
A significant part of the operational context for autonomous company management systems is the regulatory and legal environment in which corporations operate. In order to create a dedicated operational context for autonomous artificial intelligence systems, the wording of local regulatory documents can be presented simultaneously in two versions: one for use by people and one for use by autonomous systems. In this case, the artificial intelligence system gets a well-defined operational context that allows it to perform its functions within the required standards. Local regulations that provide a basis for the joint work of individuals and autonomous artificial intelligence systems can form the grounds for legislation governing the development and implementation of autonomous systems.
Alexey Vasiliev, Sber
Turning Dross Into Gold Loss: is BERT4Rec really better than SASRec?
Details
Recently, sequential recommendation and the next-item prediction task have become increasingly popular in the field of recommender systems. Currently, the two state-of-the-art baselines are the Transformer-based models SASRec and BERT4Rec. Over the past few years, there have been quite a few publications comparing these two algorithms and proposing new state-of-the-art models. In most of the publications, BERT4Rec achieves better performance than SASRec. But BERT4Rec uses cross-entropy over softmax for all items, while SASRec uses negative sampling and calculates binary cross-entropy loss for one positive and one negative item. In our work, we show that if both models are trained with the same loss, which is used by BERT4Rec, then SASRec significantly outperforms BERT4Rec both in terms of quality and training speed. In addition, we show that SASRec can be effectively trained with negative sampling and still outperform BERT4Rec, but the number of negative examples should be much larger than one.
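The two objectives being compared can be written down directly. A minimal numpy sketch (illustrative only, not the authors' code) of full-softmax cross-entropy versus sampled binary cross-entropy over the same score vector:

```python
import numpy as np

# Two training objectives over the same item-score vector.
# BERT4Rec-style: cross-entropy with a softmax over ALL items.
# SASRec's original: binary cross-entropy over the positive item and a few
# sampled negatives (a single negative in the original paper).

def full_softmax_ce(logits, target):
    """Cross-entropy over the full catalogue: -log softmax(logits)[target]."""
    z = logits - logits.max()                      # numerical stability
    return float(np.log(np.exp(z).sum()) - z[target])

def sampled_bce(logits, target, negatives):
    """BCE on one positive item plus sampled negative items."""
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    loss = -np.log(sig(logits[target]))
    loss += sum(-np.log(1.0 - sig(logits[n])) for n in negatives)
    return float(loss)

rng = np.random.default_rng(0)
scores = rng.normal(size=10_000)          # scores for a 10k-item catalogue
print(full_softmax_ce(scores, target=42))
print(sampled_bce(scores, target=42, negatives=[7, 8, 9]))
```

The full-softmax loss touches every item per step (expensive but informative); the sampled loss only touches the positive and the drawn negatives, which is why the number of negatives matters so much in the comparison above.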
Alexander Rezanov,
Rask AI
Moderator
Ilia Nenakhov,
Yandex Market
Yandex Market neural banners. Under the hood.
Details
I'll be discussing how we tackled an AdTech challenge: creating advertising banners on Yandex Market using neural networks. We'll delve into the origins of this task within advertising systems and its unique characteristics within E-commerce. Our primary focus will be on the technical details of the solution, including YaGPT and its tuning for specific tasks, ptune, SAM architecture, and its optimization for performance. Additionally, I'll cover the runtime design and the entirety of the production process, shedding light on the difficulties we faced and the outcomes we achieved.
12:45 – 13:00
Break
13:00 – 14:00
Parallel sessions
Business solutions based on LLM
Hall 3 - Academy
Hall 2 - R&D
Hall 1 - Business
CUDA Tutorial
Mathematics and AI
Hall 4
Poster session
Implementing AI into office software
Anna Pleshakova, OnlyOffice
Details
RAG and its derivatives. Product cases where LLM brings real value to business.
Inna Lizunova,
Speech Technology Center
Details
Custom CUDA Kernels: Practical Approaches to Low-Level Optimizations
Grigorii Alekseev, Perplexity
Details
Mathematics and AI
Ivan Oseledets,
AIRI
Details
Sergey Kuznetsov, HSE
Moderator
Roman Doronin, Bioptic.io
Moderator
Creating applications with an LLM: more than just writing a prompt
Details
Sergey Verentsov, EORA
GigaSearch or Search Engine on GigaChat
Details
Prohor Gladkih, SberDevices
I'll tell you how we combat hallucinations and data obsolescence in GigaChat using the RAG (Retrieval-Augmented Generation) approach.
At first glance, it may seem that application creators using LLM simply write prompts and integrate them with a public API. However, when automating scenarios using LLM, developers need to pay special attention to the correctness of responses and security when interacting with the model. At the same time, methods for designing LLM call chains are evolving, allowing prompt engineers to develop not just prompts, but entire scenarios of data retrieval and model calls using techniques such as ReAct, RAG, FLARE, and others. This presentation will cover the main challenges in creating LLM-based applications, the list of required competencies, as well as the peculiarities of planning, development and support of such applications.
Alexander Gasnikov, Innopolis, MIPT
AI wine, AI chocolate, and other new optimization techniques
Details
We will talk about how to solve optimization problems when it is impossible to obtain the gradient of the objective function, and impossible even to obtain its value, but we can compare the values of the objective function at different points. That is, by requesting the objective function at a set of points, one can tell at which point the value was the smallest (or largest), but not what that value is. Such problems arise when developing various food products (for example, chocolate) with the help of Artificial Intelligence. The talk will discuss which algorithms are optimal for classes of smooth optimization problems (convex and non-convex) of both large and small dimension.
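The comparison-oracle setting described above can be tried on a one-dimensional toy: golden-section search needs exactly this kind of oracle. A minimal Python sketch (my illustration, not the speaker's material):

```python
# Golden-section search driven only by a comparison oracle: better(x, y)
# answers "is f(x) < f(y)?" but never reveals f's value, like a taster who
# says which of two chocolate recipes they prefer. Works for 1-D unimodal f.

def argmin_by_comparison(better, lo, hi, iters=100):
    """Minimize an unseen unimodal function over [lo, hi] via comparisons."""
    phi = (5 ** 0.5 - 1) / 2               # golden ratio conjugate, ~0.618
    x1 = hi - phi * (hi - lo)
    x2 = lo + phi * (hi - lo)
    for _ in range(iters):
        if better(x1, x2):                 # minimum lies in [lo, x2]
            hi, x2 = x2, x1
            x1 = hi - phi * (hi - lo)
        else:                              # minimum lies in [x1, hi]
            lo, x1 = x1, x2
            x2 = lo + phi * (hi - lo)
    return (lo + hi) / 2

# The hidden objective: f(x) = (x - 0.3)^2. The optimizer never sees f(x).
oracle = lambda x, y: (x - 0.3) ** 2 < (y - 0.3) ** 2
print(argmin_by_comparison(oracle, 0.0, 1.0))
```

Each iteration shrinks the search interval by a factor of about 0.618 while reusing one of the two probe points, so only one new comparison is needed per step.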
With the popularity of LLM, we at Speech Technology Center have gone through a large number of product pilots. Most of them are somehow related to generative search (RAG, Retrieval Augmented Generation) from very different sources of information. In this report, we will share our own experience in solving product cases using LLM:
- What can "vanilla" RAG transform into in product cases? How to recognize it?
- Which RAG-like cases are worth pursuing, and which ones are not, and under what conditions?
- Where is the business value in such cases?
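The "vanilla" RAG mentioned above reduces to retrieve-then-prompt. A toy Python sketch (hypothetical helper names; word overlap stands in for vector search, and the actual LLM call is omitted):

```python
# Toy retrieve-then-prompt pipeline. `retrieve` and `build_prompt` are
# hypothetical names; real RAG systems use vector search over embeddings
# and send the assembled prompt to an LLM, both omitted here.

def retrieve(query, docs, k=2):
    """Rank documents by word overlap with the query (toy retriever)."""
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def build_prompt(query, docs):
    """Ground the model by prepending retrieved context to the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "RAG retrieves documents and feeds them to the model at inference time",
    "Retrieval keeps answers grounded in up-to-date sources",
    "Tbilisi hosts the OpenTalks.AI conference in March",
]
print(build_prompt("how does RAG keep answers grounded", docs))
```

Product cases mostly differ in what replaces each toy stage: the retriever, the chunking of the sources, and the instructions wrapped around the retrieved context.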
Innovations in artificial intelligence have led to it becoming an integral part of society and finding applications in a variety of fields. In this session, we will cover AI implementation in office software, highlight the benefits AI can bring to users when working with documents, and consider various markets in terms of their use of the leading AI providers.


In this talk, I will highlight several fundamental problems in AI that lack mathematical formalism, for example, alignment of large language models. On the other hand, many concepts in Mathematics can be effectively used to improve the quality of ML algorithms. Hyperbolic geometry is a vivid example: replacing ordinary embeddings with hyperbolic ones leads to SOTA in metric learning.
Agenda:
1. How to fuse a QKV Attention Layer into a single CUDA kernel?
2. Step-by-step guide to writing efficient kernels using a basic algorithm

Quick Overview:
1. We will explore kernel-level operations to understand how LLM layers function, specifically focusing on the QKV attention layer from llama_7b. I will present my approach for implementing a fused CUDA kernel, including code snippets. This session will also cover benchmark analysis and potential optimization strategies.
2. This segment involves a deeper examination of CUDA kernels, focusing on optimization techniques and profiling. We'll research the process of enhancing a basic histogram kernel, analyzing its behavior from various aspects and comparing it to a third-party solution.
14:00 – 15:00
Lunch
15:00 – 16:30
Plenary session 2 - overviews
Main conference hall
15:00 – 15:45
The main things in Generative AI in 2023
Alexander Notchenko
Co-founder of OpenDataScience, Organizer of ODS London
Last year was monumental for generative AI, and we all probably understand the importance of LLMs in that revolution. But in this talk I will outline all the other important developments in generative AI over the past year, specifically in 2D images, video, audio, 3D models, animation and much more. I will analyse the main reasons that drive the development of these models.
15:45 – 16:05
AI in retail - overview
Mikhail Neverov
X5 Tech
We will show how Data Science and AI are transforming grocery retail, from choosing a store location to personalizing service and optimizing employee performance. Let's look at examples of using analytics to predict trends, manage inventory, and develop loyalty programs that increase sales and improve the customer experience.
Join us to find out how data-driven solutions make retail more adaptive and customer-oriented.
16:05 – 16:30
CTOs' perspective on generative AI

Marina Dorokhova
Yakov & Partners
The talk focuses on the outlook for generative AI in business as seen by CTOs. The results, obtained through our own survey of CTOs at 100 companies in Russia across 15 industries, cover their expectations from implementing generative AI, the most popular use cases, the budgets they expect to spend on generative AI, and common risks and problems seen in the field. Thus, it synthesizes the main lessons that businesses, developers and researchers can take into account when developing generative AI models and use-case-specific products for industries.
16:30 – 17:00
Break
17:00 – 18:00
Parallel sessions
Recommendation systems in industry
Hall 3 - Academy
Hall 2 - R&D
Hall 1 - Business
Generative AI: images & video
Reinforcement learning
Hall 4
Tutorials
Tinder in Direct Selling
Elizaveta Pushkareva,
Tochka
Sergey Lukashkin, VTB
Moderator
Yuri Visilter, GOSNIIAS
Moderator
Arkady Sandler,
True Click Technologies
Igor Pivovarov
OpenTalks.AI
Moderator
Recommendation systems in media platforms
Fedor Smirnov,
Glowbyte
Probabilistic programming
Methods for optimizing AI algorithms
Details
Dmitry Ivanov
MSU, Cifrum
Evgeny Burnaev,
Skoltech
From stochastic differential equations to the Monge-Kantorovich problem and back: the path to artificial intelligence?
Ruslan Ermagambetov, Kontur
Pulse-index: dynamic estimation of bankruptcy probability
Details
Vyacheslav Korzun, MIPT
Autoregressive models for Conversational Gestures Generation. The Path through GENEA challenges
Details
Will be announced later
Moderator
Neural network for optimizing the number and cost of conversions in advertising campaigns
Details
Alexey Biryukov,
Andata
Ruslan Salakhutdinov,
Carnegie Mellon
Reinforcement learning - recent advances (talk name will be updated)
Details
(online)
Every day, sales managers at Tochka call 3,000+ customers, but what if we can influence this process and select which customers each seller calls? I'll tell you how we built a machine that ranks companies by the likelihood of converting into a client, and then turned it into a kind of Tinder: cards with contacts are divided between sellers every day based on personal speeds, predicted productivity and urgency of the call. I will touch on the technical side of the personal-speed prediction machine and the card-booking strategy, including how to choose the optimal booking time window. I'll also show you how much money we made from this.
Using a neural network to scale and then optimize brand advertising campaigns, while increasing the share of targeted bids and avoiding cannibalization of organic traffic.
During my lecture, I'll explain how we at Kontur.Focus developed and integrated a dynamic model for assessing the probability of bankruptcy into our product. This model is designed to help Kontur.Focus users in assessing the reliability of counterparties, as it has a predictive ability to assess the risk of bankruptcy of a company based on financial statements, arbitration claims and other events. We will analyze the details of training and deploying the model, what difficulties we encountered during integration, and how we collected feedback on the new feature.
The development of large language models and speech synthesis systems has led to the emergence of "intelligent" agents in virtual worlds. These agents also require correct gestures when interacting with humans. In this talk I will explain how these gestures can be generated from speech and more. I present our approaches to conversational gesture generation that emerged from participation in the GENEA Challenges, which led to three papers. I will describe the problem itself, the first models for solving it, and our approaches: the main limitation of autoregressive models that we encountered, how we tried to overcome it, and how video games helped us.
In the realm of Over-The-Top (OTT) and Video On Demand (VOD) services, two principal challenges significantly impact operational efficiency: churn rate and content utilization. A high churn rate undermines the efforts invested in user acquisition, while suboptimal content utilization can decrease viewer interest, further exacerbating the churn issue. These platforms often allocate the largest portion of their budgets to acquiring new users and securing content rights. Therefore, addressing the critical questions of how to maintain viewer engagement and which content to acquire—or how to better leverage existing content—becomes essential for success in the OTT/VOD industry.
A study by PWC highlights that viewer retention correlates positively with the breadth of content consumed, indicating that a diverse and engaging content library is key to reducing churn rates. Given this, market leaders heavily invest in sophisticated recommendation algorithms, seeking to differentiate themselves in a highly competitive market.
The upcoming conference presentation will explore strategies for companies that may not have the resources to compete directly with industry giants. It will delve into how these organizations can still access cutting-edge technology in recommendation systems to enhance viewer engagement and make informed content acquisition decisions, thereby finding their own path to success in the crowded OTT/VOD marketplace.
The topic is probabilistic programming and generative probabilistic models. Probabilistic programming allows incorporating expert knowledge and assumptions into machine learning models, primarily about the interrelationships of various factors, and taking into account unobservable factors that control the process of data generation. As a practical example, we consider the task of identifying the latent needs that shape the structure of customers' receipts.
A.N. Kolmogorov, one of the greatest mathematicians of the 20th century and the founder of modern probability theory, also laid the foundations of the theory of continuous-time Markov random processes. These results, which had a huge impact on the development of applied methods of signal processing, filtering, modeling and financial data processing, have come back into the spotlight in the 21st century due to the development of artificial intelligence and its applications. Indeed, to solve such important applied problems as increasing the resolution of images, synthesizing speech from text, generating images from text descriptions, etc., effective generative modeling methods are required that are capable of generating objects from a distribution specified by a sample of examples. Recent advances in generative modeling are based precisely on diffusion models and use the mathematical foundation laid in the last century by A.N. Kolmogorov and his followers. The talk will cover modern approaches to generative modeling based on diffusion processes and on solving the Monge-Kantorovich problem. The connection between the solution of the entropy-regularized Monge-Kantorovich problem and the problem of constructing a diffusion process with certain extremal properties will be shown. The operation of the corresponding algorithms will be demonstrated on various image processing problems.
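For reference, the entropy-regularized Monge-Kantorovich problem mentioned in the abstract has a standard compact form (a textbook formulation, not specific to this talk):

```latex
\min_{\pi \in \Pi(\mu,\nu)} \; \int c(x,y)\, \mathrm{d}\pi(x,y) \;+\; \varepsilon\, \mathrm{KL}\left(\pi \,\middle\|\, \mu \otimes \nu\right)
```

where \(\Pi(\mu,\nu)\) is the set of couplings with marginals \(\mu\) and \(\nu\), \(c\) is the transport cost, and \(\varepsilon > 0\) is the regularization strength. As \(\varepsilon \to 0\) one recovers classical optimal transport, and for quadratic cost the solution coincides with the Schrödinger bridge: the diffusion process with the prescribed endpoint distributions that is closest in relative entropy to a reference Brownian motion.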
Modern neural networks are extremely resource-intensive algorithms in terms of memory, computation, and energy. This results in additional costs for their use and also limits their deployment on edge devices. The talk discusses the problems of neural network inference from a hardware and a software point of view. On the hardware side, we briefly discuss the von Neumann bottleneck and how to bypass it. On the software side, we discuss the main neural network optimization approaches, such as pruning, quantization and distillation, their variants and combinations. At the same time, we will compare modern AI systems with the brain and explain the reasons for the brain's greater efficiency. We will show that the most effective approaches to optimizing AI systems use (in some sense) brain-inspired principles.
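Of the optimization approaches listed, quantization is the easiest to show in a few lines. A toy numpy sketch of symmetric int8 post-training quantization (my illustration, not the speaker's material):

```python
import numpy as np

# Symmetric int8 post-training quantization of a weight matrix: store the
# weights as int8 plus one fp32 scale, cutting memory 4x at the cost of a
# bounded rounding error (at most half the scale per weight).

def quantize_int8(w):
    scale = float(np.abs(w).max()) / 127.0     # per-tensor symmetric scale
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)   # a "layer" of weights
q, s = quantize_int8(w)
err = float(np.abs(w - dequantize(q, s)).max())
print(f"int8: {q.nbytes} B vs fp32: {w.nbytes} B, max abs error {err:.5f}")
```

Production schemes refine this with per-channel scales, asymmetric zero points, or calibration data, but the memory/accuracy trade-off is the same.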
18:00 – 18:15
Break
18:15 – 19:00
Parallel sessions
AI in industry
Hall 3 - Academy
Hall 2 - R&D
Hall 1 - Business
What can go wrong?
LLM - multiagent models
Hall 4
Startup success prediction and VC portfolio simulation
Cyril Shtabtsovsky,
Aloniq
Details
Mark Potanin,
Xihelm
George Kekelidze, IITech
Moderator
Anastasia Semyonova, Smile2impress
Moderator
Victor Nosko, FractalTech
Multi-agent approach in solving mathematical problems with arithmetic reasoning
Details
Sergey Shumsky,
Adam & Eva Inc.
Neuro-symbolic dialog intelligence for cheap
Details
Machine learning in metallurgy
Details
Dmitry Muravev, MMK Informservice
Ilya Makarov,
MISIS, ITMO
LLM4Anything: Multi-Agent Personalized Large Language Model Agents
Alexander Krainov, Yandex
Moderator
Evgeny Burnaev, Skoltech
Dmitry Vetrov,
Constructor University
Ivan Oseledets,
AIRI
Mikhail Burtsev,
London Institute for Mathematical Sciences
Tatiana Shavrina,
Snapchat
Aleksei Shpilman,
Gazpromneft
Details
Transformers and synthetic data for defect detection on conveyor belts
Details
Oleg Kartashev,
Severstal Digital
We will show how we use and improve transformers, and what kinds of algorithms we use to generate synthetic data to predict rare defects on conveyor belts.
1. Who are the mathematicians at MMK?
2. What kinds of problems do they solve, and with which methods and instruments?
3. What is the economic effect of applying mathematical modelling?
4. How is mathematical modelling related to machine learning?
5. What case studies have we already carried out?
We will talk about the ability of Large Language Models (LLMs) to provide personalized business-oriented communication with the help of agents. We also tackle the problem of finetuning and adding new modalities for practical applications. Finally, we formulate core challenges and approaches for building applications over LLMs.
We propose a novel, multi-agent approach to solving mathematical reasoning problems. LLMs have shown significant progress in solving math problems, but they have fundamental limitations and do not achieve high quality solutions consistently. In the proposed approach, agents self-organise to create a strategy for solving the problem on the fly, thus achieving robust solutions to a whole class of mathematical arithmetic reasoning tasks.
A new neuro-symbolic architecture of large language models is presented. It combines unsupervised learning and reinforcement learning and requires several orders of magnitude less computing for learning compared to neural network language models. The complexity of learning in the proposed architecture increases linearly with the size of the data, in contrast to the quadratic dependence in neural network models of the language.
We explore predicting startup success using CrunchBase data and deep learning. Our model forecasts milestones like IPOs, unicorn status, and M&A for Series B and C startups. We achieved 14x capital growth (98th percentile of VC funds), identified high-potential startups, and stress the importance of diverse data for accuracy. This work shows deep learning's promise in startup success prediction.

CV, RL and AGI Day

Thursday, March 7
09:00 – 10:00
Registration
10:00 – 11:30
Plenary session 3 - overviews
Main conference hall
10:00 – 10:10
Opening of the second day
Igor Pivovarov, OpenTalks.AI
10:10 – 10:50
Computer Vision - the main things that happened in 2023
Artsiom Sanakoyeu
Senior Research Scientist, Meta AI
In this talk I will spotlight the year's most exciting papers and advancements in Computer Vision: from novel scaled-up architectures that boosted recognition capabilities to the strides made in self-supervised pre-training that unlock new levels of understanding without extensive labeled datasets. We'll explore the fusion of vision and language in multimodal systems, demonstrating how these combined inputs enhance machine perception. The talk will also cover the latest in fine-grained tasks, including segmentation, detection, and tracking, showcasing the precision and detail now achievable. Finally, we'll discuss the role of generative models in visual representation learning and their application in tasks like segmentation and depth estimation, opening new research avenues.
10:50 – 11:30
Reinforcement learning - main things happened in 2023
Aleksei Shpilman
Head of AI Development Programs, Gazprom neft
Reinforcement learning in 2023 - a year in review.
We are going to go through the most important, the most interesting and a couple of just fun papers.
11:30 – 12:00
Break
12:00 – 13:00
Parallel sessions
Halls: Hall 1 - Business, Hall 2 - R&D, Hall 3 - Academy, Hall 4

Computer Vision in Healthcare
AI decreases time and increases recall during routine CT examination
Anvar Kurmukov, AUMI.AI
Foundation models in medical imaging
Evgeny Sidorov, Third Opinion Platform
Human-AI interaction in healthcare
Ilya Pershin, Innopolis
Development and testing of an automated system for the analysis of OCT retina images
Kirill Aksenov, LLC PREDICT SPACE

Computation Optimization
Large Language Model Fine-Tuning Acceleration with Data Reduction via Losses
Alexander Demidovsky, Huawei RRI
Fast Implementation of the Node2Vec Algorithm
Polina Plastova, YADRO
Overview of Federated Learning Methods
Denis Afanasyev, CrossOverMarkets

LLM: language models
You Told Me That Joke Twice: A Systematic Investigation of Transferability and Robustness of Humor Detection Models
Pavel Braslavski, Nazarbayev University
Linguistic and logical structures for text analysis
Dmitry Ilvovsky, HSE and Sergey Kuznetsov, HSE
mGPT: LLM speaking 61 languages including Georgian and Russian
Maria Tikhonova, SberDevices, HSE

Reinforcement learning
Multi-Agent Reinforcement Learning - overview
Yury Chernyshov, CyberLympha
Reinforcement Learning in Zero-Sum Differential Games
Anton Plaksin, Yandex Research
Deep Reinforcement Learning-based Congestion Control for File Transfer
Alexander Blokhin and Vitaly Kalev, Huawei

Session moderators: Arkady Sandler (True Click Technologies), Stanislav Moiseev (Tinkoff), Anastasia Semyonova (Smile2impress), Andrey Filchenkov (ITMO University)
Neovascular age-related macular degeneration (n-AMD) is a form of AMD that is responsible for most cases of severe vision loss. Anti-VEGF therapy, the gold standard for the treatment of this pathology, is accompanied by OCT monitoring. However, this process is hampered by the lack of methods for accurately quantifying OCT images. The aim of this study is to develop an automated calculation of the quantitative characteristics of the PED, SRF and IRF biomarkers and to evaluate its accuracy. The study material included OCT B-scans of patients with n-AMD and pigment epithelial detachment who underwent anti-VEGF therapy from 2014 to 2021; the B-scans were obtained with a CirrusHD-OCT 5000 Carl Zeiss Meditec device. The neural network for OCT image segmentation was trained on a dataset of 251 and 385 images in Experiments 1 and 2, respectively. The images were annotated by experts, who highlighted the PED, SRF and IRF biomarkers using the Labelme software. Data preprocessing included image resizing, normalization, and conversion to grayscale. The dataset was divided into training and validation sets. To segment retinal structures, the UNET architecture was used with the Adam optimizer and the categorical cross-entropy loss function. The algorithm for calculating quantitative biomarker characteristics was based on edge detection using the method of Satoshi Suzuki and Keiichi Abe. The test set for assessing the efficiency of the system, which included the algorithms for segmentation and for calculating quantitative biomarker characteristics, comprised 241 images for which the length and height of the PED were measured by a physician using built-in software. The images were also labeled with respect to three anatomical treatment outcomes: attached PED, non-attached PED, and PED tear. The developed method for processing OCT images made it possible to segment the PED, SRF and IRF biomarkers with high accuracy.
The segmentation model shows the best results for PED (0.9) and good accuracy for SRF and IRF (0.72 and 0.69) as the amount of training data increases in Experiment 2. On the test set of data from patients with n-AMD, the automated algorithm for calculating quantitative biomarker characteristics showed no statistically significant difference from the physician's measurements. The study also showed that the attached and non-attached PED groups differed statistically significantly in the height, extent and area of the PED. In addition, IRF area may be a predictor of PED tear, since its values differ statistically significantly between groups 2 and 3. Thus, automated segmentation and calculation of biomarkers can achieve performance comparable to an ophthalmologist in assessing the quantitative characteristics of biomarkers in cases of neovascular macular degeneration.
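The segmentation scores quoted above (0.9 for PED, 0.72/0.69 for SRF/IRF) are typical of overlap metrics such as the Dice coefficient; the abstract does not state which metric was used, so the following is only an illustrative sketch, assuming flattened binary masks:

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two flattened binary masks:
    2 * |A intersect B| / (|A| + |B|)."""
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    size = sum(mask_a) + sum(mask_b)
    return 2 * inter / size if size else 1.0

# Prediction covers one of the two labeled pixels: 2*1 / (1+2) ≈ 0.667
print(dice([1, 0, 0, 0], [1, 1, 0, 0]))
```

A score of 0.9, as reported for PED, means the predicted and expert-drawn regions overlap almost completely.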
The main part of the presentation will address the problem of effective planning of radiation therapy. For planning, it is necessary to segment a large number of anatomical structures. The task of segmentation is complicated by the fact that 1) three-dimensional medical images are used and 2) the patients' organs are abnormal. For these reasons, the results of automatic segmentation require manual corrections. An approach to optimizing the segmentation-correction process in real time, based on information about the doctor's view, will be presented. In the additional part of the presentation, the problem of interpretability of deep models will be considered.
Radiologists dedicate more than half of their diagnostic time to interpreting computed tomography (CT) scans, with chest and abdominal scans being particularly detailed and time-intensive due to the need to meticulously identify and describe a variety of diseases. Our cutting-edge product simultaneously analyzes 10 different diseases in these scans, including disorders affecting the lungs, heart, bones, and abdominal regions. In this study, we demonstrate how introducing an AI-assisted study provides a substantial time-saving advantage and lessens the heavy workload currently borne by radiologists. Specifically, it saves up to 20% of the time spent on CT examinations (≈ 2.5 mins on average) and increases the average recall by over 29%, while preserving the same level of positive predictive value.
In this talk we will describe the challenges of congestion control for file transfer, propose an implementation of a congestion control algorithm based on Reinforcement Learning techniques, and show how it was applied in real life.
Over the past years, foundation models and LLMs have demonstrated enhancements in measurable aspects and the development of new qualitative features, creating a need for their comprehensive evaluation and analysis of the associated risks. To address these issues, we present MERA, a new instruction benchmark for evaluating foundation models oriented toward the Russian language. The benchmark encompasses 21 evaluation tasks for generative models. The talk presents the new evaluation methodology, an open-source code base for the MERA assessment, a leaderboard with a submission system, and the evaluated baselines' results.
This presentation aims to provide a comprehensive overview of Federated Learning, highlighting its recent developments, applications, and trends as of 2023. Federated Learning, a rapidly evolving field in machine learning, involves training algorithms across decentralized devices or servers while keeping data localized. The talk will commence with a brief introduction to Federated Learning, elucidating its core principles and significance.

Following this, the presentation will delve into various key cases and application areas, demonstrating the practical utility and versatility of Federated Learning in diverse sectors. A significant portion of the talk will be dedicated to discussing the advancements in this domain over the course of 2023. This examination is grounded in a thorough study of the general informational landscape on this topic, encompassing an analysis of thematic conferences, academic publications, updates to open-source tools, and GitHub repositories.

Additionally, the presentation will showcase a curated collection of news from companies developing solutions in this area, aiming to provide insights into the business and technological implications of these developments. A critical evaluation of the maturity level of Federated Learning technology will be offered, assessing its readiness for widespread adoption. This assessment will touch upon the challenges faced, potential risks, and the future prospects of Federated Learning, providing a well-rounded perspective on its current state and future trajectory.
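The core aggregation step that most federated learning methods build on can be sketched in a few lines. Below is a minimal pure-Python illustration of Federated Averaging (FedAvg); the function name and the flat-list parameter representation are simplifications for this sketch, not part of any particular framework:

```python
def fedavg(client_weights, client_sizes):
    """Federated Averaging: the server aggregates client model parameters
    weighted by local dataset size, so data never leaves the clients.

    client_weights: one parameter vector (list of floats) per client.
    client_sizes:   number of local training examples per client.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i in range(dim):
            global_w[i] += (n / total) * w[i]
    return global_w

# Two clients: the one with more data pulls the average towards itself.
print(fedavg([[1.0, 0.0], [0.0, 1.0]], [3, 1]))  # [0.75, 0.25]
```

In a real round, each client would first run local gradient steps on its private data before sending its updated weights to this aggregation step.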
Node2Vec is a widely used algorithm for learning feature representations of graph nodes. The algorithm is intensively used in multiple high-load applications, so its performance is very important. There are two reference implementations of Node2Vec, in C++ and Python, from the Stanford Network Analysis Project (SNAP); however, their performance is not optimal. We introduce an optimized implementation of the Node2Vec algorithm whose performance is 2.5-5.1 times higher than that of the reference ones. We also demonstrate that the accuracy of the optimized algorithm stays the same by solving a multi-label node classification problem on several datasets.
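As an illustration of what Node2Vec computes before any optimization, here is a minimal pure-Python sketch of its biased second-order random walk (p and q are the standard return and in-out parameters of the original algorithm; this is a toy sketch, not the optimized implementation described above):

```python
import random

def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """One biased second-order random walk.

    graph: dict mapping node -> list of neighbours (undirected).
    p: return parameter (higher -> less likely to step back to the previous node).
    q: in-out parameter (higher -> BFS-like walks that stay near the start).
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = graph[cur]
        if not nbrs:
            break
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step: uniform
            continue
        prev = walk[-2]
        # Unnormalised transition weights alpha_pq(prev, x)
        weights = []
        for x in nbrs:
            if x == prev:              # distance 0: returning to previous node
                weights.append(1.0 / p)
            elif x in graph[prev]:     # distance 1: common neighbour
                weights.append(1.0)
            else:                      # distance 2: moving outward
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Toy graph: a square with one diagonal
g = {0: [1, 3], 1: [0, 2, 3], 2: [1, 3], 3: [0, 1, 2]}
print(node2vec_walk(g, start=0, length=5))
```

The walks are then fed to a word2vec-style skip-gram model to produce node embeddings; the neighbour-lookup in the inner loop is exactly the kind of hot spot an optimized implementation targets.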
Linguistic and logical text structures are very useful for some applied tasks like dialogue generation, argument mining and fact verification. We will consider several cases of such tasks: multi-party dialogue generation by means of discourse structure and also fact correction based on information retrieval combined with logical reasoning.
Robust Reinforcement Learning (RRL) is a promising Reinforcement Learning (RL) paradigm aimed at training models that are robust to uncertainty or disturbances, making them more efficient for real-world applications. In this paradigm, uncertainty or disturbances are interpreted as the actions of a second, adversarial agent, and thus the problem is reduced to seeking agents' policies that are robust to any opponent's actions. This paper is the first to propose considering RRL problems within positional differential game theory, which gives us theoretically justified intuition for developing a centralized Q-learning approach. Namely, we prove that under Isaacs's condition (sufficiently general for real-world dynamical systems), the same Q-function can be utilized as an approximate solution of both the minimax and maximin Bellman equations, and we also indicate conditions under which this Q-function can be decomposed. Based on these results, we present the Isaacs Deep Q-Networks (IDQN) and Decomposed Isaacs Deep Q-Networks (DIDQN) algorithms, respectively. We analyze their performance by comparing them with baseline RRL and Multi-Agent RL algorithms on both simple environments with known exact solutions and complex high-dimensional MuJoCo environments. In each experiment, we thoroughly evaluate the agents' policies obtained after learning by training opponents against them using various RL algorithms with various parameters. The results demonstrate the superiority of the presented algorithms in all experiments under consideration.
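The minimax/maximin symmetry at the heart of this abstract can be illustrated on a single-state matrix game (a toy sketch, not the IDQN algorithm itself; the Q matrix below is made up): when the game has a saddle point — the tabular analogue of Isaacs's condition holding — the maximin and minimax values computed from the same Q-function coincide.

```python
def maximin(Q):
    # Value to the maximising agent assuming a worst-case opponent:
    # pick the row whose worst column entry is largest.
    return max(min(row) for row in Q)

def minimax(Q):
    # Value to the minimising opponent moving "second":
    # pick the column whose best row entry is smallest.
    cols = list(zip(*Q))
    return min(max(col) for col in cols)

# A payoff matrix with a pure saddle point at (row 0, column 0):
Q = [[1.0, 2.0],
     [0.0, 3.0]]
print(maximin(Q), minimax(Q))  # both equal 1.0
```

When no pure saddle point exists the two values differ, which is why sharing one Q-function between the minimax and maximin Bellman equations requires a condition like Isaacs's.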
As industry needs to process growing amounts of training data, reduce the cost of fine-tuning a single model, and minimize environmental effects, the task of accelerating the fine-tuning of large language models (LLMs) has become extremely important. DAREL is a novel training-data reduction method that selects training samples based on losses obtained from the currently trained model or a pre-trained one. The method targets Large Language Model fine-tuning and is designed primarily to be combined with parameter-efficient fine-tuning methods such as LoRA. The results of computational experiments provide compelling evidence of improvements in both the quality and the time of fine-tuning Large Language Models. DAREL achieves an average 1.26x fine-tuning acceleration for GPT2-S, GPT2-M and GPT2-L on a variety of datasets, including E2E-NLG, DART and WebNLG, with an average BLEU drop of 1.44 p.p.
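The general idea of loss-based data reduction can be sketched as follows. This is a hypothetical illustration of the family of methods DAREL belongs to, not the actual DAREL criterion (which the abstract does not specify): score each training sample by its current loss and keep only the fraction assumed to be most informative.

```python
def reduce_by_loss(samples, losses, keep_fraction=0.5):
    """Keep the highest-loss fraction of samples. High current loss is used
    here as a proxy for 'not yet learned', so these samples are assumed to
    contribute most to further fine-tuning steps."""
    k = max(1, int(len(samples) * keep_fraction))
    ranked = sorted(range(len(samples)), key=lambda i: losses[i], reverse=True)
    keep = sorted(ranked[:k])          # preserve original sample order
    return [samples[i] for i in keep]

samples = ["ex_a", "ex_b", "ex_c", "ex_d"]
losses = [0.1, 0.9, 0.4, 0.7]
print(reduce_by_loss(samples, losses))  # ['ex_b', 'ex_d']
```

In a fine-tuning loop, the reduced subset would replace the full dataset for the next epoch, cutting per-epoch cost roughly in proportion to `keep_fraction`.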
Automatic humor detection is a highly relevant task for conversational AI. To date, there are several English datasets for this task, but little research on how models trained on them generalize and behave in the wild. To fill this gap, we carefully analyze existing datasets, train RoBERTa-based and Naïve Bayes classifiers on each of them, and test on the rest. Training and testing on the same dataset yields good results, but the transferability of the models varies widely. Models trained on datasets with jokes from different sources show better transferability, while the amount of training data has a smaller impact. The behavior of the models on out-of-domain data is unstable, suggesting that some of the models overfit, while others learn non-specific humor characteristics. An adversarial attack shows that models trained on pun datasets are less robust. We also evaluate the sense of humor of the ChatGPT and Flan-UL2 models in a zero-shot scenario. The LLMs demonstrate competitive results on humor datasets and more stable behavior on out-of-domain data. We believe that the obtained results will facilitate the development of new datasets and evaluation methodologies in the field of computational humor. We have made all the data from the study and the trained models publicly available.
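One of the baselines mentioned above, a Naïve Bayes classifier, fits in a few lines of pure Python. This is a toy sketch with made-up two-document training data, not the paper's experimental setup:

```python
from collections import Counter
import math

def train_nb(docs, labels):
    """Tiny multinomial Naive Bayes trainer with whitespace tokenization."""
    classes = set(labels)
    prior = {c: labels.count(c) / len(labels) for c in classes}
    counts = {c: Counter() for c in classes}
    vocab = set()
    for doc, y in zip(docs, labels):
        words = doc.lower().split()
        counts[y].update(words)
        vocab.update(words)
    return prior, counts, vocab

def predict_nb(model, doc):
    """Pick the class with the highest log-probability (add-one smoothing)."""
    prior, counts, vocab = model
    best, best_lp = None, float("-inf")
    for c in prior:
        total = sum(counts[c].values())
        lp = math.log(prior[c])
        for w in doc.lower().split():
            lp += math.log((counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

model = train_nb(
    ["why did the chicken cross the road", "stocks fell sharply today"],
    ["joke", "news"],
)
print(predict_nb(model, "the chicken told a joke"))  # joke
```

The paper's transferability question amounts to training such a model on one joke dataset and running `predict_nb` on documents drawn from another.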
Reinforcement Learning is used to solve different problems and tasks in various subject areas (traffic control, behavior modelling, software testing, cybersecurity, etc.). In many real-world tasks a single agent has to cope with other agents (coordinating or competing with them), and multi-agent systems (MAS) are used in such situations. High-dimensional RL-MAS environments cause the "curse of dimensionality" problem, and deep learning helps to solve this problem efficiently. This presentation covers some examples of using RL and Deep RL for multi-agent systems.
We will discuss why we decided to combine multimodal networks, unlabelled data, and a fresh perspective on the DICOM format into a single fundamental model. We'll explore what this has brought us and why the future lies in this direction.
Co-speakers: Alexey Trutnev, Huawei RRI; Regina Gareeva, AUMI.AI
13:00 – 13:15
Break
13:15 – 14:00
Parallel sessions