Conference On
Emerging Trends in AI and Computing Research

September 30 – October 1, 2022, Sofia, Bulgaria

World-leading computer and AI scientists from Google, Waymo, AWS, MIT, ETH Zurich, EPFL, Max-Planck, and Yale will present in Sofia the latest and most exciting research set to fundamentally change technology in the coming years.

INSAIT welcomes software engineers, entrepreneurs, academics, students and deep-tech investors to participate in this unique event. The conference will be held annually as part of INSAIT’s mission of establishing the region as a world-class research and deep tech destination.

The conference spans a diverse set of topics including Machine Learning, Computer Vision, Autonomous Driving, Cybersecurity, Natural Language Processing, Programming Languages, Cryptography, Verification, Computer Architecture, and Programmable Networks.



Day 1 – September 30 (Friday)

14:00 – 14:10 | Prof. Martin Vechev | ETH Zurich / Architect of INSAIT | Opening
14:10 – 14:40 | Dr. Kristina Toutanova | Google | Representing text in context
Natural Language Processing
Language models have been shown to provide powerful contextualized representations for a variety of natural language processing tasks. I will talk about some of our work on pre-training transferable representations of text in a rich context of growing scope. Examples of context include the immediate text passage, background text about mentioned entities, article text about tables, detailed visual context to understand text in images, and retrieved relevant passages from large-scale document collections. I will also touch on open problems in generalization and efficiency when applying large pre-trained representations to real-world applications.
14:40 – 15:10 | Dr. Dragomir Anguelov | Waymo | Machine Learning for Autonomous Driving
Computer Vision / Autonomous Driving
Coming Soon
15:10 – 15:40 | Prof. Luc Van Gool | ETH Zurich / INSAIT | Now that we know the answer is 42
Computer Vision
Deep learning is more popular than ever. Research goes in many directions and progress is made at a very high pace. In this talk, I want to take a bit of a step back and go through some issues that I find intriguing but that still pretty much represent uncharted territory. Let me give two examples. The first is the clearly important role of sensors, which are now taken and used as a given, but should actually themselves be part of a more selective optimization process. Achieving that would give end-to-end training a new meaning. It would also be a lot more true to what nature does. Staying with nature also brings me to the second issue, that of neural network architectures. Engineers play around with a handful of deep neural network architectures that prove successful in a range of signal processing fields. The situation has changed a lot compared to the old days, where very different, mutually incompatible approaches were followed in different fields. The fact that so many subareas of signal processing have now turned to deep learning creates ideal circumstances to leave our old, siloed approach to the different fields of signal processing behind. That is, just like the human brain, one can start to imagine similar processing principles being applied to multiple modalities, like seeing, touching, hearing, etc. As a matter of fact, the mammalian brain has brought about an architecture that seems to be pretty similar across the neo-cortex, and that does indeed manage to successfully process and reason about this plethora of sensory input types. Can we break the code of this shared piece of wetware?
15:40 – 15:50 | Break
15:50 – 16:20 | Prof. Virginia Vassilevska Williams | MIT | Fine-Grained Algorithms and Complexity
Algorithms
A central goal of algorithmic research is to determine how fast computational problems can be solved in the worst case. Unfortunately, for many central problems, the best known running times are essentially those of their classical algorithms from the 1950s and 1960s. For years, the main tool for explaining computational difficulty has been NP-hardness reductions, basing hardness on P≠NP. However, if one cares about exact running time (as opposed to merely polynomial vs non-polynomial), NP-hardness is not applicable, especially if the problem is already solvable in polynomial time. Over the last decade, a new theory has been developed, based on “fine-grained reductions” that focus on exact running times. In this talk I will give an overview of this area, and will highlight some exciting new developments.
16:20 – 16:50 | Dr. Rupak Majumdar | Max Planck Institute / Amazon | How to Randomly Test Programs and Why are Random Tests any good?
Verification
Programmers often test complex distributed programs by running them on random data. In this talk, I will describe how the effectiveness of these techniques depends on basic combinatorial insights. That is, one can sometimes prove properties of random testing procedures. I will also discuss how some other problems related to random testing can be computationally hard.
16:50 – 17:20 | Prof. Martin Odersky | EPFL | Tracking Captures in Types
Programming Languages
I am reporting on a new project to model resources and effects in a unified framework based on capabilities. Resources and effects are two areas where static typing has lagged behind. Despite many hundreds of papers there is no widespread adoption yet, because dealing with these issues still requires too much notational overhead. In this project we hope we can shift the cost-benefit ratio by modelling both resources and effects using object capabilities. We have developed a foundational calculus to model scoped capabilities with bounded lifetimes. The project will build on that calculus, extend it further, and implement it in experimental versions of the Scala programming language. The project offers great opportunities to advance the state of the art for people interested in language and compiler design.
17:20 – 17:40 | Break
17:40 – 18:10 | Dr. Mariana Raykova | Google | Exposure Notifications Private Analytics
Cryptography
This talk will provide an overview of the Exposure Notifications Private Analytics (ENPA) system developed by Apple, Google, ISRG, MITRE and NCI in conjunction with the Exposure Notifications System (ENS) provided by Apple and Google. The goal of ENPA is to enable health authorities to obtain key epidemiology metrics about the ENS deployment and corresponding indicators about the pandemic. We will motivate the need for the private analytics system in the context of Exposure Notification, describe its functionality and privacy properties, and discuss the practical challenges we encountered in the process of deployment. Finally, we will give examples of uses of the data generated by the ENPA system.
18:10 – 18:40 | Prof. Mathias Payer | EPFL | Tales of program crashes and vulnerabilities
Software Security
All software has bugs and some of these bugs can be exploited by an adversary to gain unintended access to private data and computation. We study vulnerabilities along three dimensions. First, by developing techniques to discover vulnerabilities, allowing developers to fix them before code is deployed to users. Here we embrace incompleteness to scale to the massive size of current software. Second, by creating mitigations that make exploitation of any remaining bugs harder, increasing the cost for adversaries. Third, by researching novel compartmentalization mechanisms that break large monolithic software into smaller fault domains to further limit adversaries. In this talk, we will give an overview of the software security landscape in general, and our three research dimensions in particular. The overarching goal is to increase security guarantees of software systems by fixing bugs early, prohibiting adversaries from exploiting remaining bugs, and restricting the power they get through any component.
18:40 – 19:10 | Prof. Srdjan Capkun | ETH Zurich | On the Security of (GNSS, 5G, UWB, …) Positioning
Computer Security
Proximity, distance and location information is predominantly provided to devices through their radio interfaces. Many systems, such as contactless payments, passive keyless entry and start systems, digital contact tracing, and autonomous navigation, rely on the correctness of distance and location information. The ability of the attacker to manipulate distance or position information via relays and other physical-layer attacks can, in part or fully, violate the functioning of these systems and lead to theft of property and funds, physical damage or denial of service. A number of such attacks have been demonstrated in the last decade. In recent years, these physical-layer attacks have been integrated into attacker models and have an increasing impact on radio designs and standards, specifically UWB, WiFi and 5G. The first distance measurement radios built specifically to resist physical-layer attacks have already been commercialized and are now deployed in the automotive industry. In this talk, I will provide an overview of this subject area, outlining the key challenges, proposed and deployed solutions, as well as ongoing research and standardization efforts. I will particularly focus on the security of the UWB ranging solutions that are currently used in cars and smartphones.

Day 2 – October 1 (Saturday)

09:30 – 10:00 | Prof. Martin Jaggi | EPFL | Towards Collaborative Decentralized Learning
Machine Learning
Coming Soon
10:00 – 10:30 | Prof. Otmar Hilliges | ETH Zurich | Human-Centric 3D Computer Vision for Future AI Systems
Computer Vision
Future AI systems such as self-driving cars, personalized healthcare robots, and AR/VR-based telepresence systems, will only be safe, useful and widely adopted if they are able to perceive and interpret human pose, shape and appearance at levels rivalling our own and if they can interact with us and the world in a human-like and natural fashion. This requires perceiving and analyzing human behavior from images and video, as well as the generation, control and synthesis of virtual humans. To this end we propose a novel representation of human pose, shape and appearance that combines the advantages of neural implicit surfaces with those of parametric body models: i) a continuous and resolution-independent surface representation that can capture highly detailed geometry and can naturally model topology changes, ii) coupled with the ease of use and generalization capabilities to unseen shapes and poses of polygonal mesh-based models. We also introduce algorithms to learn such representations without requiring manually specified skinning weights or other forms of direct supervision. We then discuss how to leverage this representation to reconstruct controllable avatars (full body, faces and more) directly from images, videos or short RGB-D sequences via differentiable rendering. Finally, to make 3D human avatars widely available, we will discuss work towards generative modelling of 3D virtual humans with diverse identities and shapes in arbitrary poses and of interactions with 3D objects in a physically plausible manner.
10:30 – 11:00 | Prof. Martin Vechev | ETH Zurich / Architect of INSAIT | Provably Fair and Robust Machine Learning
Trustworthy Machine Learning
In this talk I will present our latest results on building provably fair and robust machine learning systems, a fundamental challenge of increasing importance when it comes to deploying machine learning in the wild. I will discuss both tabular and vision data as well as deterministic and statistical certification methods.
11:00 – 11:10 | Break
11:10 – 11:40 | Prof. Peter Müller | ETH Zurich | Automated Modular Program Verification
Program Verification
Software is notoriously difficult to get right. Testing is useful to discover bugs, but is unable to provide any guarantees of correctness and security. In contrast, program verification proves mathematically that a program satisfies its specification for all possible inputs, thread interleavings, and interactions with the environment and, thus, can guarantee the absence of certain undesirable behaviors. This talk motivates program verification, demonstrates the use of a state-of-the-art verification tool, and outlines open research challenges.
11:40 – 12:10 | Prof. Laurent Vanbever | ETH Zurich | The three tales of (correct) network operations
Programmable Networks
How much do we know about the routing algorithms that sit at the heart of our networks and rule over traffic forwarding? Still relatively little, if you ask me. One thing we do know for sure though is that operating these algorithms is hard, so hard in fact that network operators often make mistakes (and cause downtimes) in doing so. In this talk, I’ll discuss our 10-year (and counting!) journey exploring the operational aspects of routing algorithms, focusing on three aspects in particular: verification, synthesis, and reconfiguration. I’ll explain how we can verify the correctness of a routing algorithm; how we can control its outputs; and how we can adapt its behavior over time. On the way, I’ll mention several open problems together with future research directions we are currently exploring in my group.
12:10 – 12:40 | Prof. Onur Mutlu | ETH Zurich | Intelligent Architectures for Intelligent Machines
Computer Architecture
Computing is bottlenecked by data. Large amounts of application data overwhelm storage capability, communication capability, and computation capability of the modern machines we design today. As a result, many key applications’ performance, efficiency and scalability are bottlenecked by data movement. We describe three major shortcomings of modern architectures in terms of 1) dealing with data, 2) taking advantage of the vast amounts of data, and 3) exploiting different semantic properties of application data. We argue that an intelligent architecture should be designed to handle data well. We show that handling data well requires designing architectures based on three key principles: 1) data-centric, 2) data-driven, 3) data-aware. We give several examples of how to exploit each of these principles to design a much more efficient and high-performance computing system. We will especially discuss recent research that aims to fundamentally accelerate machine learning and practically enable computation close to data, with at least two promising novel directions: 1) performing massively-parallel bulk operations in memory by exploiting the analog operational properties of memory, with low-cost changes, 2) exploiting the logic layer in 3D-stacked memory technology in various ways to accelerate important data-intensive applications. We discuss how to enable adoption of such fundamentally more intelligent architectures, which we believe are key to efficiency, performance, and sustainability. We conclude with some guiding principles for future computing architecture and system designs.
12:40 – 13:00 | Break
13:00 – 13:30 | Prof. Ce Zhang | ETH Zurich | Building Machine Learning Systems for the Era of Data-Centric AI
Machine Learning
Recent advances in machine learning (ML) systems have made it much simpler for many people to train their own ML/AI models. However, this does not mean that the job of an “ML developer” is becoming any easier. Many are still struggling to build high-quality and trustworthy ML, which is challenging given its data-centric nature: often the quality of ML is a reflection of the quality of the underlying data. As we sail past the era in which the primary goal of ML platforms was to support the building of models, we have to think about our next-generation ML platforms as systems that also support the enforcement of even higher quality, trustworthiness, and regulatory compliance, possibly via understanding, refining, and cleaning the underlying data. This is a challenging task, and it requires us to take a holistic view of data quality, data management, and ML altogether. In this talk, I will discuss some of our thoughts in this space, illustrated by several recent results about data debugging and data cleaning for ML models to improve their quality and trustworthiness systematically. In addition to these technical results, I will also provide a bird’s-eye view of my group’s past research and a vision of future ML system research.
13:30 – 14:00 | Prof. Dragomir Radev | Yale University | Closing the Loop in Natural Language Interfaces to Relational Databases: Parsing, Dialogue, and Generation
Natural Language Processing
Natural language (NL) is a very efficient method of communication among humans. However, when users want to talk to their computers, translating this NL to computer actions is a very challenging task. One possible approach to such human-computer interaction is to translate NL sentences to database queries and then to convert the output of these queries back to NL. In order for such an approach to work, one needs to address several challenges: the lack of annotated question-query pairs, the discourse issues present in multi-turn questions, and the issues that arise in a dialogue context.
In this presentation, I will talk about our work on natural language interfaces to databases. As part of the Yale Spider project, we have developed three new datasets and launched three matching shared tasks. Spider is a collection of 10,181 manually created natural language questions on databases from 138 domains, and the 5,693 database queries that correspond to them. SParC (Semantic Parsing in Context) consists of 4,298 coherent sequences of questions and the matching queries. Finally, CoSQL consists of 3k Wizard-of-Oz (WoZ) dialogues with a total of 30k turns, and their translations to SQL.
I will then introduce GraPPa, a pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data. We used GraPPa to obtain SOTA performance on four popular fully supervised and weakly supervised table semantic parsing benchmarks. I will conclude with some recent work on text generation from hybrid inputs, such as structured+unstructured text.
14:00 – 14:10 | Prof. Martin Vechev | ETH Zurich / Architect of INSAIT | Closing


Tutorials (October 3 – 4)

Because space is limited, you must first register for the conference in order to participate in the tutorials. Then, please follow the “Apply for Tutorials” link below. If selected, you will be notified.

If you are a high school student, please register for the conference by writing a short mail to [email protected]. Then you can apply for the tutorials.

*There will be 2 tutorials per day, between 3 pm and 8 pm; each tutorial will take around 90 minutes.

Tutorial on Trustworthy Machine Learning

This tutorial will cover some of the latest advances in building deep learning models with guarantees, including robustness, fairness and general safety. We will cover certification methods based on convex relaxations, branch-and-bound and their combination, both differentiable and otherwise. We will also cover recent methods for training neural networks which make them more amenable to certification, including methods which combine symbolic and differentiable reasoning. The methods are general and applicable to different data modalities including vision, NLP, tabular, etc. In the process we will also outline interesting open directions of both research and industrial interest. By the end of this tutorial, the student should be familiar with some of the latest advances in creating machine learning models with provable guarantees.

Tutorial on ‘What does cryptography study?’

This talk will give an overview of what cryptography studies. We will talk about how we formally model and prove security and privacy properties in cryptography. We will cover some basic concepts such as encryption and digital signatures, which are fundamental to any security system. Then, we will introduce advanced cryptographic techniques such as secure multiparty computation, homomorphic encryption, zero-knowledge proofs, and differential privacy. We will discuss the problems that they aim to solve and some construction approaches used in existing solutions.

Tutorial on Learning to build natural language interfaces for database access

This talk will introduce the challenges of building natural language interfaces to structured data. We will start with a quick introduction to natural language processing, semantic parsing, as well as sentence and structured data representations. We will then look at the most popular tasks in domain-independent text-to-data translation, such as WikiTableQuestions, WikiSQL, Spider, and SQUALL, and the approaches that achieve the best performance on these tasks. Then, we will switch to the issue of data-to-text generation. We will look at some common representations such as RDF triples and flattened tables. Then we will spend some time on popular tasks such as WebNLG, Rotowire, ToTTo, DART, FetaQA, and LogicNLG. If time allows, we will also look at recent work on pretraining large models on structured data such as TURL, TUTA, TaPaS, TABERT, GraPPa, and TABBIE.

Tutorial on Statistical Machine Learning: Foundations and Present Challenges

This tutorial will provide an introduction to the statistical foundations of machine learning theory and describe some recent advances and open problems in the field. We will cover several classic concepts in statistical learning, such as PAC-learnability, complexity measures and generalization. We will also discuss several topics of particular importance in the context of large-scale machine learning, such as the interplay between generalization and optimization and the transferability of ML models across different domains. In the last part of the tutorial we will cover some recent developments in statistical learning concerning metrics other than accuracy, such as robustness and fairness, and outline several open research directions in the area.


Ticket Type | Price
High School Students | Free (see below)
University Students | 10 BGN
Regular | 40 BGN

High-school students must register for the conference by writing a short mail to [email protected].



For any questions regarding the conference, please, write us at