Leading the DDU Research to Build an AI That Understands All Documents in the World
By LG AI Research | Medium
Apr 14, 2025
AI is now being applied to various industries, driving groundbreaking changes in our everyday lives. As AI becomes capable of understanding and analyzing specialized content within documents, it can suggest tailored applications for different job roles. It also processes diverse data from industrial sites and swiftly optimizes production scheduling.
LG AI Research has unveiled Deep Document Understanding (DDU), a technology that enables AI to comprehend various types of information, including pictures, graphs, and tables in documents. We aim to apply this technology to conversational AI services and AI prediction services to expand its scope of use and develop unique solutions.
Today, we’re interviewing Ahra Jo from the Vision Lab, who has been leading the development of DDU technology, to learn more about the research and life at LG AI Research.
The Beginning of My First Career
In my third year of undergrad, I developed an interest in HCI (Human-Computer Interaction), which naturally led me to study Vision on my own. I was particularly fascinated by the idea that what we see can be represented as RGB values and that computers can use this information to provide valuable insights. This sparked my broader interest in AI.
While considering ways to make a positive impact on society with Vision AI, I came up with the idea of sign language recognition. I wondered: if sign language could be recognized and converted into text or speech, wouldn’t it be possible to facilitate communication between hearing-impaired individuals and people who don’t know sign language? Motivated by this thought, I aimed to present my work to the world as my undergraduate graduation project.
I went on to graduate school with a strong desire to conduct research on sign language recognition. Once there, however, I was kept busy with other projects, and sign language recognition remained something I held close to my heart rather than something I worked on. Still, this journey allowed me to explore diverse areas of vision research, where I learned about the vast potential for expanding AI technology.
Right after receiving my PhD in 2016, I joined a startup, where I worked on a tax evasion prediction project for about a year and a half. I still remember the CEO saying, “As technology advances, it should serve a greater purpose. By identifying black money through tax evasion prediction, we can return it to the country and make the world a better place.” His vision of contributing to social justice through technology resonated with me, which is why I joined the company.
This was probably the only time I worked on AI models built on non-image data rather than images. I dedicated myself to the project for a year and a half, and while we made significant progress, I found myself longing to work with Vision again. That’s when I received an exciting offer from LG CNS and joined the team, marking my first connection with LG.
At LG, I worked on the Smart Store project, where customers could place the products they wanted to purchase on a self-checkout counter, click the checkout button, and a camera would scan the items on the shelf within a second to process the payment. I handled data collection and model development myself, achieving a recognition accuracy of over 99%.
Later, in 2019, I was invited to join LG AI Research (at the time, AI Research of LG Science Park). I was confident that at the institute, I would be able to introduce the AI models I developed to the world. In the early days of my tenure, I was in charge of a vision inspection project. After returning from parental leave in 2021, I became fully involved in the DDU project.
The Infinite Expansion Potential of DDU Technology
Currently, DDU technology is managed by the DDU Squad within the Vision Lab at LG AI Research. The squad started in 2021 with chemistry-related recognition tasks (Basic Structure, Super Atom, and Stereo Chemistry), then gradually expanded its scope to Polymers in 2023 and to Vision Markush and Reaction in 2024.
Since 2024, we have been broadening our research to general document-related areas, developing Vision-based models such as layout detection, Table to HTML, Chart to Table, and OCR. Our major achievements include delivering layout detection, Table to HTML, and molecular formula conversion models to LG Chem, as well as layout detection, molecular formula conversion, and Reaction models to a global publisher.
Additionally, we packaged all the models into APIs, making them easily accessible for projects that require them. For example, in ChatEXAONE, an AI assistant for enterprise professionals recently released by LG AI Research, the DDU technology has been partially applied to the process of understanding user documents, and we plan to gradually expand its use. In the Financial AI project, it is used to extract and organize numerical data from charts and tables in financial report documents to generate new financial reports.
In EXAONE Discovery, a platform for discovering new materials, novel substances, and drug discovery, DDU technology is utilized to detect and convert molecular structures. To enable efficient application across various projects, we have developed all our DDU models as APIs.
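To make the Table to HTML and financial-report use cases above more concrete, here is a minimal sketch of how a downstream project might turn an HTML table into numeric records. It is an illustration only and not LG AI Research's implementation: it assumes the model returns a plain `<table>` with one header row, and the names `TableRows`, `parse_number`, `table_to_records`, and the sample figures are all made up for this example.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in an HTML table (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row being built, or None when outside a <tr>
        self._cell = None   # text pieces of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_number(text):
    """Return a float for strings such as '1,234.5', or None if not numeric."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def table_to_records(html):
    """Map each body row to a dict keyed by the header row (assumed to come first)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    records = []
    for row in body:
        record = {}
        for name, cell in zip(header, row):
            number = parse_number(cell)
            record[name] = cell if number is None else number
        records.append(record)
    return records


# Made-up sample standing in for a Table to HTML result.
sample_html = """
<table>
  <tr><th>Quarter</th><th>Revenue</th><th>Operating profit</th></tr>
  <tr><td>Q1</td><td>1,200</td><td>150.5</td></tr>
  <tr><td>Q2</td><td>1,350</td><td>172</td></tr>
</table>
"""

records = table_to_records(sample_html)
```

Because the values are read directly from the extracted table rather than generated, a report built from `records` can be traced back to specific cells in the source document.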
LG’s DDU: What Sets It Apart?
An image explaining LG’s DDU technology
LG AI Research’s Deep Document Understanding (DDU) technology aims to understand all types of documents worldwide. To achieve this, we focused on developing technology that can analyze and understand all the contents of a document and extract the relevant information. Our competitive advantage lies in two key areas.
The first is accurate information extraction to prevent hallucination, along with the document understanding technology that supports it. LLM-based generative models have developed rapidly and permeated our lives, but they still suffer from hallucinations that stem from the data they have already learned. To minimize this problem, our DDU Squad aims to extract precise data using a Vision-only model.
The second is independence from document extension. Since we process all documents as…