Alexey Voropaev,
Evocargo
Autonomous Truck Perception System for Closed Areas
Andrey Kuzminykh,
DocetTI
Synthetic data: Learning self-driving cars in simulation
Nikita Andriyanov,
Fin. University
Computer vision for an agrobot-manipulator for picking apples
Alexander Notchenko, Deepcake
Generative AI for Creative Industries
Alexander Platonov,
Poehali.ru
Vladimir Novoselov, Realweb
Development and practice of using tools based on generative neural networks in the work of a Digital agency
Anastasia Myshkina, Realweb
We apply topological data analysis (TDA) to speech classification problems and to the introspection of pretrained Transformer models, namely BERT and RoBERTa in NLP and HuBERT for speech data. Our results demonstrate that TDA is a promising new approach for speech and language analysis, especially for tasks that require structural prediction. We also show that topological features can reveal the functional roles of Transformer heads. For example, we find heads capable of distinguishing between pairs of sample sources (natural vs. synthetic) or between voices, without any downstream fine-tuning.
The talk will cover one of the main topics in the international AI community: Creative Artificial Intelligence. First, I will speak about the task itself and its history, how we started with classic CV tasks and moved on to text2image models. Then I will describe the main trends in multimedia data synthesis in 2022-2023 and survey current SoTA architectures, including a brief description of our diffusion-based text2image model Kandinsky 2.0. After that, we will discuss applications of Creative AI today and in the near future, as I see them. Finally, I will show how we are advancing Creative AI for high-fidelity face swapping in images and video, describe our current SoTA solution, the GHOST model, and show its marketing applications in movie production, advertising, etc.
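One simple way to turn an attention head into topological features is to threshold its attention matrix into an undirected graph. You can then count edges, connected components (Betti number β0) and independent cycles (β1 = E − V + β0). The sketch below assumes this symmetrize-and-threshold scheme and a hand-picked threshold. It is an illustration of graph-based TDA features, not the authors' exact feature set. The function name `attention_graph_betti` is hypothetical.

```python
from typing import List, Tuple


def attention_graph_betti(attn: List[List[float]], threshold: float) -> Tuple[int, int, int]:
    """Return (edges, beta0, beta1) for a thresholded attention graph.

    Assumption: tokens i and j are connected when
    max(attn[i][j], attn[j][i]) >= threshold (symmetrized, no self-loops).
    """
    n = len(attn)
    parent = list(range(n))  # union-find forest over tokens

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = 0
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            if max(attn[i][j], attn[j][i]) >= threshold:
                edges += 1
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj  # merge the two components

    beta0 = len({find(v) for v in range(n)})  # number of connected components
    beta1 = edges - n + beta0  # number of independent cycles in a graph
    return edges, beta0, beta1
```

Sweeping the threshold and recording how (edges, β0, β1) change gives a small per-head feature vector that a linear probe could use. This is one plausible design, not the only one.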
Generative models in business
Natural Language Processing - research & development
Robots and drones - research & development
Creative AI models design. New trends and applications.
Topology meets BERTology: Topological Data Analysis for the understanding of Transformers
Irina Piontkovskaya,
Huawei Noah's Ark Lab
The talk presents the experience we gained while developing an apple-picking robot. Particular attention is paid to the computer vision system for detecting apples. We will also talk about the positioning system relative to the camera and the robotic arm, and compare several stereo cameras, such as the Intel RealSense Depth Camera D415/D455 and the ZED 2. We will discuss the error in estimating coordinates, why the Internet of Things is involved, and how we managed to achieve a recall of 95%. We will talk about problems and difficulties, as well as the joy of the first picked apple.
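Two textbook relations help frame the coordinate-error and recall figures mentioned above. For a rectified stereo pair, depth is Z = f·B/d (focal length f in pixels, baseline B in meters, disparity d in pixels). To first order, a disparity error δd produces a depth error δZ ≈ Z²·δd/(f·B), so the error grows quadratically with distance. Recall is TP/(TP + FN). The sketch below assumes an ideal pinhole model. All numbers used with it are hypothetical placeholders, not specifications of the D415, D455 or ZED 2.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair (ideal pinhole model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px: float, baseline_m: float, depth_m: float, disparity_err_px: float) -> float:
    """First-order depth error: dZ ~= Z**2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of real apples that the detector found: TP / (TP + FN)."""
    return true_positives / (true_positives + false_negatives)
```

With hypothetical f = 640 px and B = 0.05 m, a half-pixel disparity error costs about 1.6 cm at 1 m but about 6.3 cm at 2 m. Shortening the working distance, or choosing a camera with a wider baseline, is therefore a lever for reducing coordinate error.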
Alexey Postnikov,
Sber Robotics Laboratory
What can large sequential models bring to robotics?
This talk will explore how generative artificial intelligence (AI) is being used to augment and enhance the creative process across a variety of industries. The talk will cover the basics of generative AI, including some history, key concepts, and the current state of the art. We will discuss specific applications of generative AI in fields such as music, film, and video games. I'll share some of the challenges of adapting the conventional ML lifecycle to the requirements of creative industries, and how we overcame them at Deepcake. Overall, I'll try to provide a comprehensive understanding of the role of generative AI in the creative industries and its potential to shape the future of creativity and innovation, from the perspective of an AI startup in the field.
The development of self-driving cars has been a major focus in the field of artificial intelligence. Achieving this goal requires large amounts of data for training machine learning algorithms. However, collecting and labeling real-world data can be time-consuming and expensive. To overcome these challenges, this paper proposes using synthetic data to train self-driving cars, which makes it possible to generate unlimited amounts of diverse and controllable data. We developed a solution for efficient and stable integration of RLlib with the CARLA simulator. We present an end-to-end solution for training self-driving cars in the CARLA simulation environment through a Gym interface. The results demonstrate the effectiveness of using synthetic data to train RL agents for autonomous vehicles. The findings suggest that synthetic data could significantly accelerate the deployment of self-driving cars by providing a cost-effective and scalable way to train machine learning models.
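RLlib trains on environments that expose the Gymnasium-style interface. In that interface, `reset(seed=..., options=...)` returns `(obs, info)` and `step(action)` returns `(obs, reward, terminated, truncated, info)`. The sketch below shows only that interface shape, using a toy 1-D lane-keeping task. `ToyLaneKeepingEnv`, its dynamics and its reward are hypothetical stand-ins for a CARLA client, which is not shown. A real integration would also subclass `gymnasium.Env` and declare `observation_space` and `action_space`.

```python
import random
from typing import Optional


class ToyLaneKeepingEnv:
    """Hypothetical stand-in for a CARLA-backed driving environment.

    Mirrors the Gymnasium method signatures without depending on gymnasium.
    Observation: lateral offset from the lane center (m).
    Action: steering command, clipped to [-1, 1].
    """

    def __init__(self, max_steps: int = 200, lane_half_width: float = 1.5):
        self.max_steps = max_steps
        self.lane_half_width = lane_half_width
        self._rng = random.Random()
        self.offset = 0.0
        self.steps = 0

    def reset(self, seed: Optional[int] = None, options: Optional[dict] = None):
        if seed is not None:
            self._rng.seed(seed)  # reproducible episodes
        self.offset = self._rng.uniform(-0.5, 0.5)  # random start position
        self.steps = 0
        return self.offset, {}

    def step(self, action: float):
        action = max(-1.0, min(1.0, action))  # clip steering
        drift = self._rng.gauss(0.0, 0.05)  # simulated disturbance
        self.offset += 0.1 * action + drift
        self.steps += 1
        terminated = abs(self.offset) > self.lane_half_width  # left the lane
        truncated = self.steps >= self.max_steps  # episode time limit
        if terminated:
            reward = -1.0
        else:
            reward = 1.0 - abs(self.offset) / self.lane_half_width  # stay centered
        return self.offset, reward, terminated, truncated, {}
```

A quick rollout with a proportional controller (`action = -2.0 * obs`) checks the loop before any RL agent is attached. Separating `terminated` (the car left the lane) from `truncated` (time limit) matters for correct value bootstrapping in RL.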
This presentation provides an overview of the current state of robotics and the latest developments in the application of large sequential models (such as GPT-3) to the field. The focus is on how these models can enhance the capabilities of robots and enable them to perform a wider range of tasks and interact with humans in new ways. The talk covers the latest trends in the field, including new models, such as SayCan, that are designed to enable more natural human-robot interaction, as well as the potential benefits and challenges of using large language models in robotics. The presentation concludes by exploring some of the future directions and opportunities in this rapidly evolving field.
I will talk about how we implemented a lidar- and camera-based perception system in our autonomous truck, and how we overcame the limitations of an industrial-grade onboard computer to operate the truck at customer sites.
The practical use of generative neural networks in companies' work should lead to concrete, applied implementations. In our talk, we use a large digital agency as an example. We show how modern generative neural networks, fine-tuned on a company's historical, marketing, analytical, and financial data and integrated into its internal ERP system, can become a native toolkit for a wide range of roles within the company. We will present real deployment experience, try to assess the results and their impact on the company's business, and discuss how the toolkit may evolve.
Svetlana Korobkova, Docet TI
Image generation for social media content
"Capture and share the world's wonderful moments" is the slogan of Instagram, which reflects the fact that images are the dominant medium of communication in contemporary social media.
We present an image generation technology for social media that can help bloggers, who have to produce a huge amount of visual content daily, to maintain a high engagement rate for their blogs.
Modern out-of-the-box image generation technologies are mostly based on a simple textual (and/or visual) "prompt", which cannot take into account many of the details that determine a blog's style.
Our approach performs an automatic, detailed analysis of blog content and uses all extracted details as a complex prompt to produce new content that is semantically close to the original. It also lets the user vary how closely the generated content follows the visual style of the original blog.
Anastasia Semenova,
CleverData
Disassembly and Modification of TiSASRec
The process of creating a script for a voice robot operator involves a number of routine operations performed by trained specialists. Our experience creating such scripts confirms that almost the entire process of building a robot script can be automated down to a magic "Create script" button. This would allow the robot to be programmed without special knowledge to handle communication tasks over the phone. We will talk about experiments with an AI generator that automates script creation based on real dialogues between live operators and subscribers.
Maria Tikhonova,
SberDevices, HSE
Overview of Controllable Text Style Transfer
Text Style Transfer is an important NLP task that aims to control certain attributes of generated text, and to generate or paraphrase text in a specific style. This talk concentrates on one approach, known as controllable text style transfer. In this approach, text in a target style is produced by controlling the generation of a language model so that the output is written in the desired style. The presentation gives a broad overview of controllable text style transfer methods, covering approaches such as CTRL, GeDi, ParaGeDi, FUDGE, DExperts, and CAIF, and highlights possible directions for developing this area of research.
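Several of the listed methods steer decoding by reweighting the language model's next-token distribution with an attribute model. GeDi, FUDGE and CAIF each do this in their own way. The sketch below shows the FUDGE-style rule p(x_i | x_<i, a) ∝ p_LM(x_i | x_<i) · p(a | x_≤i), restricted to the top-k LM candidates. The function `fudge_step`, the toy distribution and the keyword-based formality classifier are hypothetical placeholders for a real language model and a trained discriminator.

```python
import math
from typing import Callable, Dict, List


def fudge_step(
    prefix: List[str],
    lm_probs: Dict[str, float],
    attribute_prob: Callable[[List[str]], float],
    top_k: int = 5,
) -> Dict[str, float]:
    """Reweight next-token probabilities with an attribute classifier (FUDGE-style).

    p(x | prefix, a) is proportional to p_LM(x | prefix) * p(a | prefix + [x]),
    computed only over the top_k most likely LM tokens.
    attribute_prob must return a probability in (0, 1].
    """
    candidates = sorted(lm_probs, key=lm_probs.get, reverse=True)[:top_k]
    # Combine scores in log space for numerical stability.
    scores = {
        tok: math.log(lm_probs[tok]) + math.log(attribute_prob(prefix + [tok]))
        for tok in candidates
    }
    max_score = max(scores.values())
    unnorm = {tok: math.exp(s - max_score) for tok, s in scores.items()}
    total = sum(unnorm.values())
    return {tok: w / total for tok, w in unnorm.items()}  # renormalize to sum to 1
```

For example, with LM probabilities {"hey": 0.5, "hello": 0.3, "greetings": 0.2} and a toy classifier that gives formal greetings 0.9 and "hey" 0.1, the reweighted distribution becomes {"hey": 0.1, "hello": 0.54, "greetings": 0.36}. The formal "hello" now wins, even though the unconstrained LM preferred "hey".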