Team 2 - 2024 | Deutscher Zukunftspreis

Democratization of Generative AI – Stable Diffusion from Development to Practice

Prof. Dr. Björn Ommer (Spokesperson)*
Dr.-Ing. Anna Lukasson-Herzig**
*Ludwig-Maximilians-Universität München
**nyris GmbH, Düsseldorf

(from left to right) Dr.-Ing. Anna Lukasson-Herzig, Prof. Dr. Björn Ommer

Communication between humans and computers is becoming increasingly simple. Today, people can operate computers by giving instructions or formulating requests in natural language. Earlier barriers, such as the tedious programming of software solutions or mastering the intricacies of complex programs, are consequently falling, making the computer ever more accessible to the masses. The computer understands us better and better; it is becoming more and more “intelligent”.

This is made possible by generative artificial intelligence (AI), which has become increasingly powerful in recent years. However, the gain in performance is largely due to the exponential scaling of model size, so the computing power required to apply these models is also increasing at a prohibitive rate.

This means that generative AI has reached a point where only large technology companies can continue to develop and host AI models, as only they have the necessary computing resources at their disposal.

The goal of the nominated team was to resolve the resulting dependencies and create AI models that are just as powerful but require significantly less computing power.

Prof. Dr. sc. ETH Björn Ommer from LMU in Munich and Dr. Anna Lukasson-Herzig from nyris GmbH have created the foundations and applications to democratize generative AI.

With the innovative and powerful AI model “Stable Diffusion” developed at LMU, it is now possible to run complex AI applications on conventional user hardware or even on an ordinary smartphone.

Generative AI learns the semantic detail of a scene by learning to synthesize content such as images. The goal is to generate both the local details of an image and the big picture, its meaningful context, as well as possible.

For an AI to be able to learn these relationships from training data, it usually must be very large, i.e. consist of a large artificial neural network. But that's exactly the catch. Such an artificial neural network requires powerful, expensive computing capacity when it is applied.

An innovative approach was found to minimize the storage and computing costs: Instead of describing images directly as a set of pixels, a new, efficient image description language for local image regions was first learned. What makes up the image of a dog? Ears, eyes and the fur on the various parts of the body should be consistent with each other. However, it is not necessary to know how each individual coat hair is curved in order to create a good image of a dog. Nevertheless, we can recognize whether the coat is short or long, smooth or curly. Local details are described efficiently, then the long-range context is captured. Stable Diffusion not only sees trees, but also the forest.
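To give a sense of scale for such a compact description, the following sketch compares how many numbers are needed to store an image pixel by pixel with how many are needed for a compressed representation. The sizes are assumptions based on the commonly cited configuration of the first public Stable Diffusion release (512 × 512 RGB images, a 64 × 64 representation with 4 channels); they are not stated in the text above, and the function name is purely illustrative.

```python
def value_count(height, width, channels):
    """Number of individual numbers needed to store a grid of this shape."""
    return height * width * channels


# Assumed sizes (illustrative, not taken from the text above).
pixel_values = value_count(512, 512, 3)   # RGB image described pixel by pixel
compact_values = value_count(64, 64, 4)   # learned compact description
compression_factor = pixel_values / compact_values

print(pixel_values, compact_values, compression_factor)
```

Under these assumptions, the expensive step-by-step generation works on a small fraction of the numbers that a pixel-level description would need, which is one plausible reading of why the approach fits on ordinary hardware.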

Stable Diffusion then learns a robust representation of objects or scenes by first adding noise to the image and then reconstructing it. This noise is removed in many small steps that gradually make more and more image details appear. The AI must therefore learn a robust representation of the image semantics in order to capture the global context and thus reconstruct the original as well as possible.

This process also gives the model its name: Stable Diffusion. The name is based on the physical process of diffusion. If you put a drop of ink in a glass of water, at first it is a sharply defined drop floating in the water. But then the drop dissolves and its outline becomes blurred, until it finally colors the water completely evenly and without any structure. The reason for this is the undirected movement of the individual ink and water particles: diffusion.

If you now take an image consisting of pixels and start to move the pixels slightly at random, this is a kind of digital diffusion process. The more often you move the pixels, the more blurred the image becomes until it is nothing more than noise.
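The gradual loss of structure can be sketched in a few lines. The example assumes a common formulation from the diffusion-model literature, in which each step slightly shrinks the pixel values and adds a little Gaussian noise, rather than literally moving pixels. The function names, the step size `beta` and the tiny four-pixel "image" are illustrative choices, not details of Stable Diffusion itself.

```python
import math
import random


def forward_diffusion(pixels, steps, beta=0.05, seed=0):
    """Repeatedly mix a little Gaussian noise into the pixel values.

    Each step keeps sqrt(1 - beta) of the current value and adds noise with
    variance beta, so the original signal fades while the noise takes over.
    """
    rng = random.Random(seed)
    keep = math.sqrt(1.0 - beta)
    noise_scale = math.sqrt(beta)
    current = list(pixels)
    for _ in range(steps):
        current = [keep * v + noise_scale * rng.gauss(0.0, 1.0) for v in current]
    return current


def signal_remaining(steps, beta=0.05):
    """Fraction of the original pixel value that survives after `steps` steps."""
    return math.sqrt(1.0 - beta) ** steps


image = [0.9, 0.1, -0.4, 0.7]  # hypothetical four-pixel "image"
for steps in (0, 10, 50, 200):
    print(steps, round(signal_remaining(steps), 3), forward_diffusion(image, steps))
```

With zero steps the image is returned unchanged; the more steps are applied, the smaller the surviving share of the original becomes, until the values are essentially noise.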

With Stable Diffusion, the process is reversed during image generation. You start with an image of pure noise; then, step by step, the image is changed and structures emerge, from which the desired semantic units and finally the desired image are formed. It is a reverse diffusion process that ends in a stable state.
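The reverse direction can be sketched as a loop that starts from random noise and removes a little of it at every step. This is a toy under strong assumptions: the trained neural network is replaced by a hypothetical stand-in, `toy_denoiser`, that "knows" only a single target image, and the update rule is a deterministic variant from the diffusion-model literature (often called DDIM-style sampling). Text prompts and the compact image description are left out, and all names and parameters are illustrative.

```python
import math
import random


def alpha_bar(t, beta=0.05):
    """Share of the signal variance left after t noising steps (1.0 at t = 0)."""
    return (1.0 - beta) ** t


def toy_denoiser(noisy, t, target):
    """Hypothetical stand-in for the trained network.

    A real model predicts the clean image from the noisy one; this toy
    simply returns the single image it 'knows'.
    """
    return list(target)


def generate(target, steps=50, beta=0.05, seed=0):
    """Start from pure noise and denoise step by step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # image of pure noise
    for t in range(steps, 0, -1):
        a_t, a_prev = alpha_bar(t, beta), alpha_bar(t - 1, beta)
        x0_hat = toy_denoiser(x, t, target)
        # Noise implied by the current image and the predicted clean image.
        eps_hat = [(xi - math.sqrt(a_t) * x0i) / math.sqrt(1.0 - a_t)
                   for xi, x0i in zip(x, x0_hat)]
        # Move to a slightly less noisy image.
        x = [math.sqrt(a_prev) * x0i + math.sqrt(1.0 - a_prev) * ei
             for x0i, ei in zip(x0_hat, eps_hat)]
    return x


target = [0.9, 0.1, -0.4, 0.7]  # hypothetical four-pixel "image"
print(generate(target))
```

Because the stand-in denoiser already knows the answer, the loop necessarily ends at the target; the point of the sketch is only the structure of the procedure, in which each pass removes part of the remaining noise until a stable image is left.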

Making generative AI compact and efficient turned it into a catalyst for countless applications, which is why it was important for the developers to make the software freely and openly accessible to everyone. This is the only way to democratize the use of generative AI.

nyris GmbH, based in Düsseldorf and Berlin, has developed a successful business model with Stable Diffusion. Consider the following scenario: a complex technical system has a fault. The technical staff can locate and even photograph the defective component. But where does this component come from? It now has to be laboriously searched for and identified in large spare parts catalogs. The nyris technology provides a remedy here. In just a few seconds, the correct component can be identified from a photo, even though such photos are often taken in poor lighting conditions and usually do not show the entire component. This reduces the time and cost of repairs.

For this technology, a visual search engine had to be trained with images of all spare parts. The problem is that such images do not exist, at least not in sufficient quantity. There are many pictures of cats, but few of specific technical spare parts. This is where Stable Diffusion came into play: it was used to generate large quantities of photorealistic images in different lighting conditions from CAD data, i.e. technical drawings of the spare parts, and these images were used to train the visual search engine. nyris has shown that generative AI can be used even when very little master data is available to train the corresponding systems. The development of this technology gives nyris a leading position in the market.

The nyris team and the developers at the LMU chair in Munich are working closely together to further develop Stable Diffusion and create new applications. The long-term goal is to expand the possibilities of generative AI and, above all, to make communication between humans and machines more efficient. There is great potential here for the future of all of us, and it needs to be exploited.

The right to nominate outstanding achievements for the Deutscher Zukunftspreis rests with leading German institutions in science and industry as well as foundations.
The project “Democratization of Generative AI – Stable Diffusion from Development to Practice” was submitted by the Bundesministerium für Bildung und Forschung.

Resume

Prof. Dr. Björn Ommer

28.10.1979: Born in Cologne, Germany
1998 – 2003: Graduate studies in Computer Science with a minor in Physics, Rheinische Friedrich-Wilhelms-Universität Bonn
2003: Diplom in Computer Science (summa cum laude), minor in Physics, Rheinische Friedrich-Wilhelms-Universität Bonn
2003 – 2007: Ph.D. student and teaching and research assistant, Inst. of Comp. Science, ETH Zurich
2007: Dr. sc. ETH Zürich, Switzerland; thesis awarded the ETH Medal
2008 – 2009: Postdoctoral Scholar, Computer Vision, Dept. of EECS, University of California, Berkeley, USA
2009 – 2013: Assistant Professor for Scientific Computing (W1), Heidelberg University, Interdisciplinary Center for Scientific Computing
Since 2011: Director of the Heidelberg Collaboratory for Image Processing (HCI)
2013 – 2021: Full professor (W3) for Scientific Computing, Heidelberg University, at the HCI/IWR, Department of Mathematics and Computer Science, with co-optation in the Departments of Philosophy and Physics
2016 – 2021: Chairman (~acting dean) of the MSc Scientific Computing
2016 – 2021: Director of the Interdisciplinary Center for Scientific Computing (IWR) Heidelberg
Since 2021: Full professor (W3) & Head of Computer Vision & Learning Group, LMU Munich
Since 2024: Member of the Bavarian AI Council

Patents

M.N.M. Afifi, M.S. Brown, K. Derpanis, and B. Ommer: Network for Correcting Overexposed and Underexposed Images, US Patent Application, 2020

Publications

More than 170 publications in renowned international journals and conference proceedings on computer vision and machine learning
Research interests: All aspects of semantic image and video understanding based on (deep) machine learning; esp.: generative approaches for visual synthesis (e.g. Stable Diffusion, VQGAN), invertible deep models for explainable AI, deep metric and representation learning, and self-supervised learning paradigms and their interdisciplinary applications in the digital humanities and neurosciences.
Associate Editor, Senior Area Chair and Program Chair for renowned journals and conferences in computer vision and AI (e.g. IEEE Transactions on Pattern Analysis and Machine Intelligence, NeurIPS, CVPR, ICCV, GCPR)

Honors and Awards

PhD thesis awarded the ETH Medal
Fellow of ELLIS Society
Falling Walls Science Breakthrough of the Year 2023 in Engineering and Technology: finalist
Best paper awards at conferences on computer vision and AI

Dr.-Ing. Anna Lukasson-Herzig

21.01.1975: Born in Guttentag, Poland
1996 – 2001: Degree in metallurgy and materials engineering at RWTH Aachen
2001 – 2005: Research assistant at the BFI - VDEh Research Institute, Düsseldorf
2007: PhD in engineering at RWTH Aachen
2005 – 2014: Employed at Boston Consulting Group GmbH, most recently as ‘Principal’; projects in various industries with a focus on manufacturing; assignments of several months in Brazil, Denmark, the USA, and India
Since 2014: Preparation and foundation (2015) of nyris GmbH, serves as managing director

Further activities

Since 2018: Founding member of the national KI Bundesverband e.V.
Since 2021: Economic Advisory Council of the Green Party NRW

Scholarships

1997 – 2001: Scholarship from VDEh, Düsseldorf

Patents

2005: Method and device for rolling a metal strip, EP1786577B1, withdrawn due to a lawsuit by Siemens AG

Publications

2008: “Optimization of steel strip geometry to reduce camber formation in hot wide strip mills”

Honors and Awards

2001: Springorum Medal of the RWTH Aachen University
Otto Junker Award of Otto Junker GmbH
VDEh Alumni Award
2017: nyris selected for the first batch of the German Google StartUp Programme and the first batch of the German Microsoft Accelerator
2018: Forbes names nyris as one of the ‘100 most innovative start-ups in Germany’
2021: nyris receives a multi-million euro grant from the European Innovation Council for the development of the synthetic data pipeline and completes the project in 2023 with the highest rating of ‘excellent’

Contact

Press

Sascha Lindemann
nyris GmbH
Max-Urich-Str. 3
13355 Berlin
Mobile: +49 (0) 170 / 22 77 224
E-Mail: press@nyris.io
Web: www.nyris.io

Spokesperson

Prof. Dr. Björn Ommer
Computer Vision & Learning Group
Ludwig-Maximilians-Universität München
Akademiestr. 7
80799 München
Phone: +49 (0) 89 / 21 80 73 431
E-Mail: b.ommer@lmu.de
Web: https://ommer-lab.com/people/ommer/

A description provided by the institutes and companies regarding their nominated projects

Stable Diffusion and nyris

The Ommer Chair at LMU Munich has developed an approach to democratising generative AI known as Stable Diffusion. Generative AI has quickly become a widely used enabling technology that is applied practically everywhere. Although its performance has continuously increased, its direct practical usability for users has decreased, as the gain in performance was mainly due to an excessive growth of the complexity of the models and the computing power required. As a result, generative AI quickly reached a point where the models could only be developed and operated by the largest (mostly American) technology companies. The possibilities for users and developers to use these models locally, without transferring their data, and to develop them further themselves have decreased more and more.

The Ommer Chair at LMU Munich recognised a critical problem: control over generative AI, which has become a widespread catalyst for new technologies, was in the hands of a few foreign companies. The goal was therefore to democratise generative AI and make the models powerful and at the same time compact enough for conventional, affordable user hardware.

To achieve this, the chair developed the innovative Stable Diffusion approach, which was published in the most prestigious AI proceedings. Stable Diffusion learns an efficient and compact description language for content, which focuses the AI on the essential details. Furthermore, the AI can follow natural-language instructions. This resulted in an AI that is powerful and, at the same time, easy to use without computer knowledge. To promote democratisation, the software was made open source and is not patented. Within its first two months, millions of users had used the AI, which also formed the basis for many other projects, company start-ups and further developments, such as those of nyris GmbH.

nyris is a visual search platform that gives people a more natural way to find what they are looking for. Based in Berlin and Düsseldorf, nyris serves leading companies in more than 50 countries. Founded in 2015, nyris is financially supported by experienced deep tech investors such as the European Investment Bank, eCapital, Axel Springer and FlixFounders, as well as two long-standing customers, TRUMPF and IKEA.

The nyris technology is based on the use of 3D data from CAD files and their transfer files as input for the Stable Diffusion model to generate high-quality synthetic spare parts images for training and indexing the visual search engine. nyris is the only provider that can derive the necessary data completely from CAD data and index it for use in AI technologies. This capability puts nyris in a leading position in the market, as most OEMs and their suppliers have limited master data, which is a major obstacle to the use of AI in industrial applications.

The nyris technology enables machine operators to reduce unplanned downtime. With access to nyris visual search for their spare parts, field engineers can identify parts from vast product catalogues rapidly and accurately, retrieve information and complete maintenance tasks. Tests show that the time to identify a part can be reduced from roughly 20 minutes on average to a few seconds. The nyris solution helps to minimize follow-up visits by enabling these technicians to identify the correct spare part on the spot. Current processes involving multiple first- and second-level service agents can be significantly streamlined, and operating costs can therefore be reduced. Sending around emails with product photos or returning to base to check manually with colleagues or product catalogues is now a thing of the past.

The nyris team works closely with the Ommer Chair to further develop the Stable Diffusion model and expand its application. The long-term goal is to massively extend the currently very complex human-machine communication to the highly efficient visual level. Machines, like humans, are already capable of capturing and interpreting images very quickly. This is a huge potential that needs to be exploited.
