Spotlight: Data Quality – Dimension 2, Freshness
An Interview Series with Clarity AI Executive Team on the 8 Dimensions of Data Quality
How does Clarity AI ensure its data is of the highest quality?
Clarity AI uses an 8-dimension framework to ensure data is of the highest quality. Those dimensions are coverage, freshness / timeliness, accuracy, data updates, explainability, consistency, point-in-time, and feedback. In this series of interviews with Clarity AI executives, each of these dimensions is explored and explained. Clarity AI’s expert team creates science- and evidence-based methodologies that then leverage powerful, scalable artificial intelligence (e.g., machine learning) to collect, clean, analyze, and expand existing data sets to power its sustainability tech platform or to integrate directly into users’ existing workflows.
Dimension 2 – Freshness
Clarity AI’s VP of Product, Ángel Agudo; Head of Product Research & Innovation, Patricia Pina; Head of Data Strategy, Juan Diego Martín; and Head of Data Science, Ron Potok, join Chris Ciompi, Clarity AI’s Chief Marketing Officer, to discuss the importance of data freshness, or the timeliness and relevance of data, in driving accurate and impactful decision-making. The group highlights the impact of data freshness on various industries and use cases, and discusses the challenges of maintaining it, such as data silos, data infrastructure limitations, and technical debt.
The panelists stress the need for a data management strategy that prioritizes data freshness, including investing in data infrastructure, establishing clear data governance policies, and incorporating machine learning and AI technologies to automate data processing and ensure data accuracy. They also share insights on the different dimensions of data freshness, including data coverage, frequency, and latency, and on strategies to optimize each. Overall, the panel emphasizes the critical role of data freshness in enabling organizations to make informed decisions and drive positive impact, and underscores the need for ongoing investment in data management and technology so that freshness is maintained over time as a critical component of data quality.
Chris Ciompi: Hello again, everyone. This time we’ll focus on freshness and how it relates to data quality. Ángel, can you start by defining freshness as it pertains to data quality?
Ángel Agudo: Sure. To me, freshness means having the most up-to-date data available in Clarity AI. The specific service level agreement (SLA) for freshness should be defined by the market, but data should be made available as soon as possible, so we can offer relevant and timely information to users.
Patricia Pina: I agree with Ángel. In decision-making, having access to the latest information is critical, especially in a world where things are constantly changing. With sustainability data, for example, climate change is happening rapidly, and we’re running out of time. Therefore, freshness is essential. Additionally, companies are making commitments to reduce emissions, and it’s crucial to track their progress and hold them accountable for their promises. So, getting quick and fresh data on their performance is crucial to ensuring that they’re following through on their commitments.
Chris Ciompi: Juan Diego, can you elaborate more on how Clarity AI ensures the freshness of the data it provides?
Juan Diego Martín: Certainly. We have streamlined processes in place to ensure the freshness of the data. Firstly, we continuously monitor when companies update and report their public information. Secondly, we extract and process that information using a combination of technology and experts, applying quality controls in the same loop to prevent any suspicious data from being processed. Thirdly, we have an automated data ingestion pipeline that makes the information available to our customers through their preferred means, such as data feeds or an API. Additionally, we offer our customers a service terminal with very frequent updates; the most frequent occur when a new controversy is detected. We process structured and unstructured controversy information from more than 1.4 million news articles from over 33,000 trusted news sources daily.
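To make the shape of that loop concrete, here is a minimal sketch in Python, assuming a simplified record format and a toy 10x-jump rule as the quality control; the names, fields, and threshold are illustrative assumptions, not Clarity AI’s actual implementation:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Disclosure:
    company_id: str
    metric: str        # e.g., "scope1_emissions_t" (hypothetical field names)
    value: float
    reported_at: datetime

def is_suspicious(d: Disclosure, history: list[Disclosure]) -> bool:
    """Toy quality control: flag values that move more than 10x versus the last report."""
    prior = [h.value for h in history if h.metric == d.metric and h.value > 0]
    return bool(prior) and not (prior[-1] / 10 <= d.value <= prior[-1] * 10)

def ingest(extracted: list[Disclosure], history: list[Disclosure]) -> list[Disclosure]:
    """Apply quality controls in the same loop as extraction: suspicious records are
    held back for expert review rather than published to customers."""
    clean = []
    for d in extracted:
        if is_suspicious(d, history):
            continue  # route to an analyst review queue instead (not shown)
        clean.append(d)
    return clean
```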
Chris Ciompi: Thanks for the explanation, Juan Diego. I’d like to focus on the second point you mentioned earlier, about the right combination of technology and experts. Could you provide more details on this?
Juan Diego Martín: Sure. While we use AI to automate most of the work, such as spotting the right information and extracting it, there are instances where the information is spread throughout a report. For example, employee figures may be reported separately for different subsidiaries, and emissions may be disclosed per business line in various sections. In such cases, we need experts to make sense of the automatically extracted information and ensure that the aggregated data is accurate and reflects what our customers expect it to mean.
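As an illustration of the aggregation problem described above, the toy sketch below rolls subsidiary- or business-line-level figures up to company level and flags any metric reported across multiple units for expert review; the company, unit, and field names are purely hypothetical:

```python
from collections import defaultdict

# Hypothetical extracted records: one row per (company, reporting unit, metric).
records = [
    {"company": "ACME", "unit": "ACME Iberia",  "metric": "employees", "value": 1200},
    {"company": "ACME", "unit": "ACME Nordics", "metric": "employees", "value": 800},
    {"company": "ACME", "unit": "Logistics", "metric": "scope1_emissions_t", "value": 54000},
    {"company": "ACME", "unit": "Retail",    "metric": "scope1_emissions_t", "value": 21000},
]

def aggregate(records):
    """Sum unit-level figures to company level and flag metrics reported across
    several units, so an analyst can confirm the rollup has the intended scope."""
    totals, units_seen = defaultdict(float), defaultdict(set)
    for r in records:
        key = (r["company"], r["metric"])
        totals[key] += r["value"]
        units_seen[key].add(r["unit"])
    return [
        {"company": c, "metric": m, "value": v,
         "needs_expert_review": len(units_seen[(c, m)]) > 1}
        for (c, m), v in totals.items()
    ]
```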
Chris Ciompi: Thank you. Ron, how does artificial intelligence influence data freshness at Clarity AI?
Ron Potok: Well, as Juan Diego mentioned earlier, computers can read and see fairly well nowadays. We can leverage these technologies to help us quickly and efficiently collect data. However, we provide financial data to make financial decisions, which means that the data needs to be highly accurate. Statistical models can never achieve 100% accuracy, so a combination of computers and people is necessary to ensure both efficiency and accuracy. We need both to ensure that fast and high-quality data reaches our customers.
Chris Ciompi: That makes sense. Do you have any interesting cases to share about how Clarity AI uses machine learning techniques in data extraction?
Ron Potok: Yes, we have another case related to our estimation models. For companies that don’t report their sustainability information, we can’t use AI to extract their data. However, we can get financial information quite quickly for each fiscal year, because companies are generally quick to disclose it. We could estimate emissions for these companies right away, but we choose not to. We wait for companies to start disclosing their sustainability data first, so we can check that our models remain properly calibrated against each new year of data that comes in. In other words, we hold back our new estimates until newly reported data is available, to verify that the world hasn’t changed in a way that requires revising the estimation model. It gives us an additional quality control every year.
Chris Ciompi: That’s interesting. So, the reason for waiting is that the models learn from history, and if the past is no longer representative of today, we need to take that into account?
Ron Potok: Exactly. The models learn from history, and if the past is no longer a perfect predictor of the present, we need to modify them so they keep predicting today accurately. We expect innovation around environmental components, so the world will keep changing over time, and the past won’t always be a perfect predictor of the future. That means we need to keep adjusting our models as we move forward.
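One way to picture the yearly quality gate Ron describes is the sketch below: before rolling out new estimates, score last year’s model against companies that have since disclosed actuals, and only proceed if the error stays within a tolerance. The error metric, the 25% tolerance, and the function names are assumptions for illustration, not Clarity AI’s actual criteria:

```python
import statistics
from typing import Callable

def calibration_ok(
    estimate: Callable[[dict], float],          # last year's estimation model
    newly_disclosed: list[tuple[dict, float]],  # (features, newly reported actual)
    tolerance: float = 0.25,
) -> bool:
    """Return True if the median relative error against fresh disclosures is within tolerance."""
    errors = [
        abs(estimate(features) - actual) / actual
        for features, actual in newly_disclosed
        if actual > 0
    ]
    return bool(errors) and statistics.median(errors) <= tolerance

# Hypothetical usage: hold back the new vintage of estimates until the gate passes.
# if not calibration_ok(last_years_model, disclosures_this_year):
#     last_years_model = retrain(last_years_model, disclosures_this_year)  # hypothetical step
```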
Chris Ciompi: That’s great. Can the models learn and adapt themselves over time?
Ron Potok: Yes, the models can learn which features are driving the changes, and we can make them aware of changes we know are coming. We can leave room for new technologies and apply AI to make things more efficient. We can also incorporate features that are forward-looking in nature. For instance, if a country says it is going to phase out coal plants in three years, we can add that information into our models to know what’s going to happen in three years.
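A simple way to encode such announced, dated changes as model inputs is sketched below; the feature name, values, and year are purely hypothetical:

```python
def apply_announced_policies(features: dict, year: int, policies: list[dict]) -> dict:
    """Overlay announced commitments (e.g., a coal phase-out) onto the feature values
    used for a future year, so estimates reflect changes we already know are coming."""
    adjusted = dict(features)
    for p in policies:
        if p["effective_year"] <= year:
            adjusted[p["feature"]] = p["value"]
    return adjusted

# E.g., a country committing to phase out coal generation within three years:
policies = [{"feature": "coal_share_of_grid", "value": 0.0, "effective_year": 2026}]
features_2026 = apply_announced_policies({"coal_share_of_grid": 0.32}, 2026, policies)
```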
Chris Ciompi: Makes sense. Thanks, Ron. Patricia, how does data freshness help drive product innovation at Clarity AI?
Patricia Pina: At Clarity AI, we strive to find alternative sources of relevant data that are available earlier than the data published by companies once per year, typically a few months after the corresponding reporting period is over. For example, we use real-time satellite data to infer how much companies are emitting, rather than waiting for one and a half years to know what happened today. This allows us to provide fresher data to our users.
Chris Ciompi: How does data freshness at Clarity AI influence the capabilities of the tech platform?
Ángel Agudo: Data freshness is critical for our users to make informed decisions. It enables us to show that the most up-to-date information has been included, which is important for explainability. Our goal is to be as efficient as possible in publishing data and making it available to users. As we capture data and update it faster, we can show users how quickly new information is available and help them be more proactive in their decision-making process.
Juan Diego Martín: Our ability to detect when new information is published every year allows us to predict when the information will be available and streamline our processes for updating it. This is valuable to our customers because they can plan around when the information they need will be available in our product.
Ángel Agudo: All the dimensions we mentioned, such as data accuracy, completeness, and timeliness, are essential for our users to make informed decisions. Achieving them is complex, and that is precisely where we innovate and provide value, because these dimensions are what decision-making depends on. Without the right data, or with errors in the data, users may not make the right decisions.
Chris Ciompi: That makes sense of course, but let’s bring it back to the dimension of freshness as it relates to data quality for the time being. Anything else to wrap us up?
Patricia Pina: The last point I wanted to make is that some clients have had to invest significant resources, effort, and time with sustainability data providers to clean the data and ensure its freshness, just to make sure they were buying data that fit their needs. We, on the other hand, proactively ensure our clients don’t need to go through those costly and unnecessary processes.
Chris Ciompi: Thanks, everyone!