Spotlight: Data Quality – Dimension 8, Feedback

Data Quality | July 20, 2023 | Ángel Agudo, Patricia Pina, Juan Diego Martín, Ron Potok, Chris Ciompi

An Interview Series with Clarity AI Executive Team on the 8 Dimensions of Data Quality

How does Clarity AI ensure its data is of the highest quality?

Clarity AI uses an 8-dimension framework to ensure data is of the highest quality. Those dimensions are coverage, freshness / timeliness, accuracy, data updates, explainability, consistency, point-in-time, and feedback. In this series of interviews with Clarity AI executives, each of these dimensions is explored and explained. Clarity AI’s expert team creates scientific- and evidence-based methodologies that then leverage powerful, scalable artificial intelligence (e.g., machine learning) to collect, clean, analyze and expand existing data sets to power its sustainability tech platform or to integrate directly into users’ existing workflows.

Dimension 8 – Feedback

Clarity AI’s VP of Product, Ángel Agudo; Head of Product Research & Innovation, Patricia Pina; Head of Data Strategy, Juan Diego Martín; and Head of Data Science, Ron Potok, discuss the critical dimension of feedback and its relationship to data quality with Chris Ciompi, Clarity AI’s Chief Marketing Officer.

Chris Ciompi: Hi again, team. We’re talking about feedback today as it relates to data quality. Ángel, can you please explain the relationship between feedback and data quality?

Ángel Agudo: Of course. Improving data quality is an iterative process, and we need to ensure that we capture any potential feedback about our data. It’s essential to maintain an open channel with our clients, so they can provide additional information or raise concerns about a data point. That way, we can learn from them and provide additional explanations or quickly address any issues.

Chris Ciompi: I understand. Could you explain the process for receiving feedback?

Ángel Agudo: Sure, within our tool, there is an option to report feedback for every data point. Users can select a predetermined answer or include their own message to raise concerns or provide additional information. Once we receive feedback from the client, we put in place our internal processes to act on it, or provide an explanation to help the user better understand the data. If we identify a mistake, we immediately react to correct the data point.
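
As a rough illustration of the per-data-point feedback option Ángel describes, a submission could be modeled as a small structured record. This is a hypothetical sketch; the field names and categories are assumptions for illustration, not Clarity AI's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical sketch of a per-data-point feedback record; field names
# and categories are illustrative, not Clarity AI's actual schema.
@dataclass
class DataPointFeedback:
    company_id: str                # identifier of the covered company
    metric: str                    # e.g. "scope_1_emissions"
    category: str                  # predetermined answer chosen by the user
    message: Optional[str] = None  # optional free-text concern or context
    submitted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

# A user flags a data point with a predetermined category plus a note.
ticket = DataPointFeedback(
    company_id="ACME-001",
    metric="scope_1_emissions",
    category="value_looks_outdated",
    message="This figure matches the company's 2021 report, not 2022.",
)
print(ticket)
```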

Chris Ciompi: Thank you for the clarification. Patricia, why is feedback important for consumers of sustainability data?

Patricia Pina: As mentioned in previous interviews, consumers of sustainability data need to trust the data to feel comfortable using it in their decision-making processes. In the asset management industry, for example, we often talk to fundamental research analysts who are deeply familiar with the companies we cover. They will assess our data against their priors on a company and challenge everything that doesn’t sound right to them. It is critical that they have an easy-to-access channel where they can share their questions and get answers. In some cases we encounter differences in how clients define certain sustainability-related concepts. Let’s use the example of exposures: is a company engaged in typically controversial activities such as tobacco, alcohol, or fossil fuels? Different clients have different ways of defining what constitutes an exposure when it comes to financial subsidiaries, operating subsidiaries, or minority investments. Those nuances are often uncovered when specific data points for specific companies are challenged through the feedback mechanisms. These discussions help us ensure that our products meet the needs of our clients, and they often lead us to expand our products and add further flexibility and customization.

Chris Ciompi: Great. Juan Diego, can you explain how Clarity AI ensures its data incorporates feedback?

Juan Diego Martín: Yes, in addition to channels like our customer advisory board and ongoing conversations with our customers, we have a feature that allows instant feedback on missing data or data discrepancies. We immediately address any issues and often find that it’s due to different methodological approaches. We sit with our customers to ensure alignment and we learn from any feedback we receive.

Chris Ciompi: Can you give me an example of something that has come through recently that illustrates what you just said?

Juan Diego Martín: Yes, there are instances where, for example, a customer is also using another provider and identifies discrepancies in the data. We have been able to show that the other provider pulled data from a previous year and attributed it to the current year without notice or explanation. Or perhaps total CO2 emissions were reported, but the split between Scope 1 and Scope 2 wasn’t. In that case, the other provider wasn’t transparent, whereas we make sure the gap is explicitly flagged. Once, we reached out to a well-known data provider and identified 400 data points with methodological errors. This happens continuously. Going back to my example of emissions reported only as a total, without the scope split: that is a big deal, and it has an impact on the company’s score. It’s a conversation I’d say we have every week.

Chris Ciompi: Great, perfect. Thank you. Ron, over to you. How does the feedback we receive about data quality influence AI, and how does AI influence that feedback?

Ron Potok: First, I think reframing feedback as responsiveness might be useful. In my eyes, how responsive are we to a request or a change compared to our competitors and to industry standards? We have several channels for response, and we’ve mentioned being able to challenge or ask questions about specific data points. In our web app, you can access the raw data associated with each data point and challenge it, and those challenges are resolved very fast. One of the unique things we have, for instance in our controversies news engine, is very high responsiveness to controversies that are missed or misclassified. Those examples actually improve our model: humans correct them immediately, and then the model learns from those examples so it won’t make the same mistake again. That is a very important example of how we fold customer, user, and human feedback back into the system, so that next time the model understands a bit more about the controversy. What is its severity? How is it attributed? Another example of responsiveness has been our advantage in responding to regulatory changes. That’s an aside, but something Clarity AI has been very good at is keeping up with the latest changes in regulation; an example of that is our SFDR product.
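
To make the retraining loop Ron describes concrete, here is a minimal sketch of folding human-corrected labels back into a news classifier so the same mistake is not repeated. The articles, labels, and model choice are invented for illustration; Clarity AI's actual controversy models and training pipeline are not public.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Original training data: article snippets with controversy labels
# (invented examples, not real training data).
articles = [
    "Regulator fines company over bribery allegations",
    "Factory spill threatens protected wetland",
]
labels = ["bribery_corruption", "biodiversity"]

# Human-reviewed corrections: articles the model missed or misclassified.
corrected_articles = ["Executive charged in kickback scheme"]
corrected_labels = ["bribery_corruption"]

# Retrain on the combined set so the model learns from the corrections.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(articles + corrected_articles, labels + corrected_labels)

print(model.predict(["Court documents reveal bribery of officials"]))
```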

Chris Ciompi: Let’s talk about feedback that’s coming from regulators and feedback that’s not directly related to our data but to the ecosystem where we’re playing. How does our system work on that?

Ron Potok: I think there are many ways of looking at this. I’ll start with how we do it. Some of the SFDR PAIs [Principal Adverse Impact indicators], for instance, relate to controversies involving companies engaged in, say, bribery or corruption, or putting biodiversity at risk. Those are good examples of indicators where the regulator said: “Yes, news can provide information on companies being in violation of those PAIs.” But the regulation was very specific in the language of the violations. The regulators, rather than mapping it exactly to Clarity AI’s definition of a bribery and corruption incident, used very specific language. Instead of training thousands of analysts to reread all of the articles associated with those incidents and tag whether each one is a PAI violation or not, we retrained the model to tag articles appropriately, aligned with the latest regulatory guidelines. On top of that, we had to assess the level of violation. We were able to do all of this quickly, providing very high coverage and very timely information in a short period of time.
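
As a loose sketch of the mapping step Ron outlines, internally tagged controversy incidents could be rolled up into regulator-defined PAI violation flags with a severity cutoff. The tag names, PAI labels, and threshold below are assumptions for illustration, not regulatory text or Clarity AI's actual mapping.

```python
# Hypothetical mapping from internal controversy tags to SFDR PAI
# violation indicators; tag names, PAI labels, and the severity
# threshold are illustrative assumptions.
TAG_TO_PAI = {
    "bribery_corruption": "PAI: violations of UNGC/OECD principles",
    "biodiversity": "PAI: activities negatively affecting biodiversity",
}
SEVERITY_THRESHOLD = 3  # only incidents at or above this level count

def pai_flags(incidents):
    """Return the PAI indicators triggered by (tag, severity) incidents."""
    return sorted({
        TAG_TO_PAI[tag]
        for tag, severity in incidents
        if tag in TAG_TO_PAI and severity >= SEVERITY_THRESHOLD
    })

incidents = [("bribery_corruption", 4), ("biodiversity", 2)]
print(pai_flags(incidents))  # only the severe bribery incident is flagged
```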

Chris Ciompi: It’s a really interesting angle on feedback, and not one that comes to mind right away when we hear the word. We think of it as person to person, rather than the ecosystem itself changing and giving feedback on what we need to do. Okay, are there any other ways that customer feedback influences what we actually see in the software platform itself?

Ángel Agudo: I believe the clearest connection is through the process I described before. We continuously capture feedback from all of our clients on every feature, and that feedback is included in our process of product validation, discovery of new features, and ultimately the incorporation of new functionality.

Chris Ciompi: Is there anything about feedback in one module within the platform that may affect all of the modules?

Ángel Agudo: Coming back to the point around consistency, when one dimension is affected by feedback, we immediately make the necessary changes across all the modules. It could be in the data per se, in the methodology…

Chris Ciompi: What I’m hearing is that, from a customer perspective, there’s an assurance that if one module changes, it’s recalibrated fairly quickly, if not instantly, with the other modules. Whereas if different human teams handled each module, or pieces of each module, a discrepancy might not surface from a governance standpoint until the next quarter. All clear. I want to ask one more question about this. There’s something about scale, right? Whether it’s one piece of feedback from one customer, or feedback coming from the ecosystem as regulations change, how can our platform address all of these things at scale?

Ángel Agudo: We can split this into pieces. The first is operational. When we are capturing different views, for example when we are talking to clients, we might receive feedback from multiple clients at the same time about the same topic. We use technology to put everything together in the most efficient manner, to aggregate the information, and to evaluate the different points of view so we can make a timely decision. From a process standpoint, it’s about consistently translating that feedback into the tool. We have built the tool in a way that connects all the pieces, and the technology helps us immediately incorporate feedback into every single place where that specific data point, metric, or company appears within the platform.
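
As a minimal sketch of the aggregation step Ángel describes, feedback items arriving from multiple clients can be grouped by the data point they concern, so a single decision resolves all of them at once and the correction propagates everywhere that data point appears. All names and structure here are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: group incoming feedback by the (company, metric)
# data point it concerns, so one review resolves every report at once.
feedback_queue = [
    {"client": "A", "company": "ACME-001", "metric": "scope_2_emissions"},
    {"client": "B", "company": "ACME-001", "metric": "scope_2_emissions"},
    {"client": "C", "company": "BETA-042", "metric": "board_diversity"},
]

grouped = defaultdict(list)
for item in feedback_queue:
    grouped[(item["company"], item["metric"])].append(item["client"])

for (company, metric), clients in grouped.items():
    # One decision per data point, regardless of how many clients report it;
    # the resulting correction is applied everywhere the metric is used.
    print(f"{company}/{metric}: review once, notify clients {clients}")
```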

Chris Ciompi: Thanks for the insights, everyone!
