Spotlight: Data Quality – Dimension 4, Data Updates

Data Quality | June 1, 2023 | Ángel Agudo, Patricia Pina, Juan Diego Martín, Ron Potok, Chris Ciompi

An Interview Series with Clarity AI Executive Team on the 8 Dimensions of Data Quality

How does Clarity AI ensure its data is of the highest quality?

Clarity AI uses an 8-dimension framework to ensure data is of the highest quality. Those dimensions are coverage, freshness/timeliness, accuracy, data updates, explainability, consistency, point-in-time, and feedback. In this series of interviews with Clarity AI executives, each of these dimensions is explored and explained. Clarity AI’s expert team creates scientific and evidence-based methodologies that leverage powerful, scalable artificial intelligence (e.g., machine learning) to collect, clean, analyze, and expand existing data sets, powering its sustainability tech platform or integrating directly into users’ existing workflows.

Dimension 4 – Data Updates

Clarity AI’s VP of Product, Ángel Agudo; Head of Product Research & Innovation, Patricia Pina; Head of Data Strategy, Juan Diego Martín; and Head of Data Science, Ron Potok, discuss – with Chris Ciompi, Clarity AI’s Chief Marketing Officer – the critical dimension of data updates and its relationship to data quality.

Chris Ciompi: Welcome, everyone. Let’s start with you, Ángel. Can you please define data updates, as it relates to data quality?

Ángel Agudo: Data updates are the processes we use to incorporate new information into the tool. This is a critical aspect of data quality: ensuring that we update at the right frequency to provide the freshest data to clients, so they can access the most up-to-date information available.

Chris Ciompi: Understood. Patricia, why are data updates important for consumers of sustainability data?

Patricia Pina: The infrastructure and processes that enable data updates are critical for consumers of sustainability data. Our clients need to be able to access the most accurate and latest information relevant to their decisions at any moment in time. Clarity AI has invested significant resources in ensuring our data updates can be deployed at different frequencies based on client demand, without disrupting their experience or the service we provide. For example, our clients need to know immediately if a new controversy about a corruption case has been leaked to the press, but a company will only report GHG data on an annual basis. This requires the ability to modularize data updates in a flexible and timely manner. Another very important aspect of our data updates is that we don’t need to wait for all companies to report an updated value before reflecting it in our product. Companies report at different moments; Clarity AI collects that data and makes it available to our clients as it is published. We are continuously collecting and updating the data to ensure our customers can make use of the best available information on each specific day.
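
To make Patricia’s last point concrete, here is a minimal sketch, in Python, of what serving the “best available information on each specific day” could look like: each metric returns the latest value published on or before the requested date. The function, data, and dates are hypothetical illustrations, not Clarity AI’s implementation.

```python
import datetime as dt

def best_available(updates, as_of):
    """updates: list of (published_date, value) pairs for one metric."""
    eligible = [(pub, val) for pub, val in updates if pub <= as_of]
    # Latest publication on or before the requested date wins.
    return max(eligible)[1] if eligible else None

history = [
    (dt.date(2022, 4, 1), 1_100.0),   # value from the 2021 annual report
    (dt.date(2023, 3, 28), 1_250.0),  # value from the 2022 annual report
]
print(best_available(history, dt.date(2022, 12, 31)))  # -> 1100.0
print(best_available(history, dt.date(2023, 6, 30)))   # -> 1250.0
```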

Chris Ciompi: Thanks for that. Okay, Juan Diego. How does Clarity AI ensure its data is up to date?

Juan Diego Martín: Clarity AI has a very streamlined process: first, detect when companies update specific information in their reports and public sources; second, process that information as quickly as possible; and third, apply an additional layer of data quality, combining AI and humans, to make sure this information is delivered to customers in the best possible way. Everything is connected. When a company updates certain information, we are already looking for it. We usually know when it is going to be updated: there is an alert or signal that lets us retrieve the information as soon as possible and automatically ingest it into our processing pipeline. From there, it is run through AI models built by the Data Science team, led by Ron. Then it goes through specific quality-review checks. If something needs to be reviewed by a human, we have the right person with the right skills to review this data. After everything is done, there is an additional layer that allows us to build the specific information we want to include in the product. So this is what I referred to as a streamlined process: detection, ingestion, AI processing, human review, and delivery to the tool.
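
As an illustration of the streamlined process Juan Diego outlines, the sketch below walks a data point through detection, ingestion, automated quality checks, and then either human review or delivery. All names and the quality rule are hypothetical stand-ins; Clarity AI’s actual pipeline is not public.

```python
from dataclasses import dataclass

@dataclass
class DataPoint:
    company_id: str
    metric: str            # e.g., "ghg_scope_1"
    value: float
    source: str            # the report or public source it came from
    needs_review: bool = False

def detect_updates(watched_sources):
    """Stand-in for detection: alerts/signals tell us a source changed."""
    return [{"company_id": "ACME", "metric": "ghg_scope_1",
             "value": 1_250.0, "source": src} for src in watched_sources]

def run_quality_checks(point):
    """Stand-in for the AI quality layer: flag suspect values."""
    if point.value < 0:    # a real system would use ML models here
        point.needs_review = True
    return point

def pipeline(watched_sources):
    delivered, review_queue = [], []
    for raw in detect_updates(watched_sources):   # 1. detection
        point = DataPoint(**raw)                  # 2. ingestion
        point = run_quality_checks(point)         # 3. AI processing
        if point.needs_review:
            review_queue.append(point)            # 4. human review
        else:
            delivered.append(point)               # 5. delivery to the tool
    return delivered, review_queue

print(pipeline(["acme_2022_annual_report"]))
```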

Chris Ciompi: Can you say a little bit more about how data updates at Clarity AI are influenced by artificial intelligence?

Ron Potok: There are really two components to it. One is efficiency on the quality side: ensuring that the data updates we make are high quality and frequent. The second is our automated process for flagging data points that look suspect, so that they never make it into the product. Those are all automated processes. Suspect points are caught by sophisticated machine learning models, including the reliability model we’ve spoken about in a previous conversation. That’s all fully automated: in roughly real time, we can see whether a data point is suspect and flag it for further manual review. Another differentiator we have in terms of data updates is being a technology company. We’ve leveraged AI to enable a much more efficient news coverage process than many of our competitors. This allows us to update our news module very frequently, surface ESG controversies very quickly, and increase the update frequency of controversies, which is very different from the majority of our competitors in the ESG space. This ability to frequently update news information is all based on our unique human-in-the-loop large language model process for controversies.
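
A minimal sketch of the kind of automated flagging Ron describes: score an incoming value against the company’s reporting history and hold suspect points back for manual review. The z-score rule and threshold are simplifying assumptions; a production reliability model would combine many such signals in a machine learning model.

```python
from statistics import mean, stdev

def is_suspect(new_value, history, z_threshold=3.0):
    """Flag a reported value that deviates sharply from past reports."""
    if len(history) < 2:
        return False             # not enough context to judge
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return new_value != mu
    return abs(new_value - mu) / sigma > z_threshold

# A plausible value passes; an order-of-magnitude jump is held back.
print(is_suspect(1_300.0, [1_100.0, 1_180.0, 1_250.0]))  # -> False
print(is_suspect(9_000.0, [1_100.0, 1_180.0, 1_250.0]))  # -> True
```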

Chris Ciompi: Okay, let’s talk a little bit more specifically about controversies and NLP. How is that working?

Ron Potok: Every day we ingest over 1.4 million articles from more than 30,000 trusted news sources, which are automatically processed through our AI-based news engine. The engine efficiently attributes articles to companies, detects ESG controversies, and assigns severity. Our analysts then verify the results and put them into production. Using AI makes the process much more efficient and enables us to find the needles in the haystack of articles.
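
To illustrate the flow Ron describes, here is a toy triage step for a news engine: attribute an article to a company, detect a potential ESG controversy, assign a severity, and queue the result for analyst verification. The real system uses large language models; the keyword rules here are only a stand-in to show the shape of the process.

```python
SEVERITY_TERMS = {
    "high": ("fraud", "corruption", "spill"),
    "medium": ("lawsuit", "fine"),
}

def triage(article):
    """Return a candidate controversy for analyst review, or None."""
    text = article["text"].lower()
    for severity, terms in SEVERITY_TERMS.items():
        if any(term in text for term in terms):
            return {
                "company": article["company"],   # attribution
                "severity": severity,            # severity assignment
                "status": "pending_review",      # an analyst verifies next
            }
    return None  # no ESG controversy detected; drop from the queue

print(triage({"company": "ACME", "text": "ACME fined over an oil spill"}))
```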

Chris Ciompi: I just want to make one point of clarification. So the AI is increasing efficiency in finding things that then still need to be reviewed by a human expert on our team?

Ron Potok: Correct.

Chris Ciompi: Thanks for that. Ok, Patricia, how do Data Updates help drive Product Innovation at Clarity AI?

Patricia Pina: As a digitally native company, we are committed to implementing transparency in a way that is embedded into our digital product and can be accessed and self-served directly by our customers. We apply the same principle to our data updates. Any client at any moment can navigate through our data update log and understand what changes were made on which date, and their effects. Furthermore, if our clients want to challenge any of our assessments or data points, they can do that directly through our tool, which will automatically launch a review process and, if deemed appropriate, trigger a data update. End-to-end, fully connected processes enabled by technology, including data updates, are part of our philosophy.

Ángel Agudo: Let me just add one thing here. There are two pieces to the software platform. One is the technology that enables us to process all the data. Once the information is there, we provide transparency, combining a top-down and a bottom-up view, to help clients understand and use the new data. Transparency means letting clients know every time the data is updated. We provide a top-down view of all the pieces that have been updated, consistently across the products, as well as a bottom-up view that allows clients to understand what has happened in individual data points. We also provide alerts and reports to help them keep track of the data. Finally, we provide a capability that allows clients to understand what motivated a data update. There could be different reasons. Maybe a company has published a new sustainability report that motivates an update of the information for a given year. It might be that the company has corrected a data point from a previous report. Whatever the reason, there is a capability connected to the data log that not only signals that the data has been updated, but also the reason why.
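
As a concrete illustration of what Ángel and Patricia describe, a data-update log entry might record what changed, when, and why, so that both the top-down digest and the per-data-point (bottom-up) view can read from the same log. The field names and reason codes below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class UpdateReason(Enum):
    NEW_REPORT = "company published a new sustainability report"
    CORRECTION = "company corrected a previously reported value"
    CLIENT_CHALLENGE = "client challenge triggered a review"

@dataclass(frozen=True)
class UpdateLogEntry:
    company_id: str
    metric: str
    old_value: float
    new_value: float
    updated_on: date
    reason: UpdateReason

entry = UpdateLogEntry("ACME", "ghg_scope_1", 1_100.0, 1_250.0,
                       date(2023, 3, 28), UpdateReason.CORRECTION)
print(entry.reason.value)  # surfaces the "why" alongside the update
```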

Chris Ciompi: Okay. Let me push a little on the angle you just put forward about the clients. Do you think the clients who are using this see the same amount of value across those three capabilities you were just talking about?

Ángel Agudo: Clients find the top-down view helpful because it aggregates what has been updated and makes it easier to digest the information. At the same time, they need the capability to go to individual data points when they are in a more granular analysis. Prioritizing the top-down view is very relevant because it provides a flavor of what has been updated. The granular explainability is helpful when clients need to do a more in-depth analysis.

Chris Ciompi: Got it. Okay, all right. Thanks, everyone.
