Spotlight: Data Quality – Dimension 1, Coverage
An Interview Series with Clarity AI Executive Team on the 8 Dimensions of Data Quality
How does Clarity AI ensure its data is of the highest quality?
Clarity AI uses an 8-dimension framework to ensure data is of the highest quality. Those dimensions are coverage, accuracy, freshness / timeliness, data updates, explainability, consistency, point-in-time, and feedback. In this series of interviews with Clarity AI executives, each of these dimensions is explored and explained. Clarity AI’s expert team creates scientific- and evidence-based methodologies that then leverage powerful, scalable artificial intelligence (e.g., machine learning) to collect, clean, analyze and expand existing data sets to power its sustainability tech platform or to integrate directly into users’ existing workflows.
Dimension 1 – Coverage
Clarity AI’s VP of Product, Ángel Agudo, Head of Product Research & Innovation, Patricia Pina, Head of Data Strategy, Juan Diego Martín, and Head of Data Science, Ron Potok, discuss – with Chris Ciompi, Clarity AI’s Chief Marketing Officer – the critical dimension of coverage and its relationship to data quality, and the importance of having the right capabilities in AI to manage large and complex data sets. One key insight from the conversation was the need for methodologies and tools to move from data to information, enabling insights to be drawn from vast amounts of data.
Another important point emphasized by the panelists was the impact of AI in increasing data quality as it relates to data coverage, highlighting the need for AI capabilities to navigate and manage high volumes of information effectively.
Chris Ciompi: Let’s talk specifically about coverage. Ángel, can you please define coverage as it relates to data quality?
Ángel Agudo: The must-have regarding coverage is to ensure that we offer the right combination of metrics that are relevant for our clients.
Chris Ciompi: Why is coverage important for consumers of sustainability data?
Patricia Pina: Offering broad coverage is important to ensure that different users, with different needs, have access to data on the companies that are relevant to them. Additionally, the universe of relevant companies is dynamic and may change, so broad coverage helps to identify potential gaps and ensures clients are not missing anything important.
Chris Ciompi: Juan Diego, how broad is the coverage that Clarity AI can provide?
Juan Diego Martín: Clarity AI's coverage currently varies by module. We have modules that include 40,000 to 70,000 companies. We also have the ability to do portfolio aggregation, where we cover 80,000 primary funds and over 360,000 share classes. The combination of those is super powerful for our customers to assess their portfolios from different perspectives.
Chris Ciompi: Can you talk a little bit more about why that’s so powerful?
Juan Diego Martín: Customers have problems with data that partially covers either the information on the funds or the information on the companies. Clarity AI is able to combine and aggregate that information, looking at the ultimate composition of a portfolio, and provide information at company level, at fund level, at portfolio level, or even at entity level, through the aggregation of all the portfolios within a specific asset manager.
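The look-through aggregation described here can be illustrated with a minimal Python sketch. It is not Clarity AI's implementation: the holdings, weights, scores, and the function names `look_through` and `weighted_score` are all hypothetical, chosen only to show how fund positions might be expanded into company exposures before a portfolio-level figure is computed.

```python
from collections import defaultdict

# Hypothetical data: every name, weight, and score below is invented for illustration.
company_score = {"A": 80.0, "B": 50.0, "C": 20.0}   # company-level metric
fund_holdings = {"FundX": {"A": 0.5, "B": 0.5}}     # fund -> company weights
portfolio = {"FundX": 0.4, "C": 0.6}                # portfolio -> fund or company weights


def look_through(portfolio, fund_holdings):
    """Expand fund positions into their underlying company weights."""
    exposure = defaultdict(float)
    for holding, weight in portfolio.items():
        if holding in fund_holdings:
            # A fund: distribute its portfolio weight across its constituents.
            for company, inner_weight in fund_holdings[holding].items():
                exposure[company] += weight * inner_weight
        else:
            # A direct company position.
            exposure[holding] += weight
    return dict(exposure)


def weighted_score(exposure, company_score):
    """Weighted average over covered companies, plus the weight that is covered."""
    covered = {c: w for c, w in exposure.items() if c in company_score}
    covered_weight = sum(covered.values())
    if covered_weight == 0:
        return None, 0.0
    total = sum(w * company_score[c] for c, w in covered.items())
    return total / covered_weight, covered_weight


exposure = look_through(portfolio, fund_holdings)
score, covered_weight = weighted_score(exposure, company_score)
print(exposure, score, covered_weight)
```

Returning the covered weight alongside the score is a design choice in this sketch: it makes visible how much of the portfolio the figure actually rests on, which is the gap that partial fund or company data would otherwise hide.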
Chris Ciompi: Ron, how is the amount of coverage that Clarity AI has influenced by artificial intelligence?
Ron Potok: When we talk about coverage, we first think of reported data. Companies publish sustainability reports, and those reports are typically PDF documents with unstructured data, including text, pictures, and tables. We can leverage AI to extract information in a fast and normalized fashion from those documents. The second way we can impact coverage with machine learning and AI is by modeling certain metrics that not all companies report. We can greatly enhance coverage of certain metrics by building machine learning models that correlate things a company does with their emissions. For instance, we can provide CO2 estimates for a large fraction of the companies that do not report their CO2 emissions.
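The second approach, modeling a metric for companies that do not report it, can be sketched in a few lines of Python. This is an illustrative toy only and not the model discussed in the interview: it assumes revenue as the single explanatory feature, uses a simple log-log least-squares fit, and all company names and figures are hypothetical.

```python
import math

# Hypothetical companies: (revenue in USD millions, reported CO2 in kilotonnes).
# None means the company does not report its emissions.
companies = {
    "A": (100.0, 50.0),
    "B": (1000.0, 480.0),
    "C": (10000.0, 5200.0),
    "D": (500.0, None),
    "E": (2000.0, None),
}


def fit_log_log(pairs):
    """Ordinary least squares of log(emissions) on log(revenue)."""
    xs = [math.log(revenue) for revenue, _ in pairs]
    ys = [math.log(emissions) for _, emissions in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Train only on companies that report emissions.
reported = [(r, e) for r, e in companies.values() if e is not None]
slope, intercept = fit_log_log(reported)

# Estimate emissions for the non-reporting companies.
estimates = {
    name: math.exp(intercept + slope * math.log(revenue))
    for name, (revenue, emissions) in companies.items()
    if emissions is None
}

# Coverage of the metric before and after adding modeled estimates.
coverage_before = len(reported) / len(companies)
coverage_after = (len(reported) + len(estimates)) / len(companies)
print(estimates, coverage_before, coverage_after)
```

In this toy universe the share of companies with an emissions value rises from three in five to all five once estimates are added. A production model would use many more features and would report the uncertainty of each estimate.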
Chris Ciompi: In the machine learning piece, can you say a little bit more about exactly how that’s happening and if it improves over time?
Ron Potok: When we started in 2018, around 5,000 companies reported emissions; now 8,000 do. First, as more companies report, we learn more about them, which improves the accuracy of our model. Secondly, we constantly improve the models by giving them more informative features. For instance, we are now working with satellite imagery techniques to enhance our estimation models, which in turn expands our coverage. Knowing which power plants and cement factories are producing emissions helps us get more granular and timely data into the models so they can predict more accurately.
Chris Ciompi: Thanks, Ron. Patricia, how does coverage help drive product innovation at Clarity AI?
Patricia Pina: We strive for efficiency, scalability, and leveraging technology to innovate at the product level. For reported data, we offer full traceability of data points to their source and links to reports. We also leverage estimation models and machine learning techniques to fill in the gaps, and we provide confidence intervals for each model, as well as any additional details on the type of model and estimate we offer. This allows clients to understand whether the estimate is fit for purpose for their use case. We also innovate in terms of data collection and related services for coverage, depending on changing market needs.
Chris Ciompi: Can you give me a specific example?
Patricia Pina: Sure. For example, we have built an end-to-end sustainable investment product with broad coverage because we had the necessary building blocks in place. We have the first building block in our EU Taxonomy or SDGs solution, which measures and quantifies a client’s contribution to environmental and social objectives. We also have a solution that provides values for the Principal Adverse Impact indicators required for SFDR reporting. And we have a controversies module or an exposures module that gives clients a good sense of governance practices.
Chris Ciompi: Thank you, Patricia. Ángel, can you clarify how the amount of coverage that Clarity AI has affects the capabilities of its tech platform?
Ángel Agudo: In my opinion, what is crucial here is to move from data to actionable information, especially when you manage large volumes of data across different organizations, metrics, and industries. This is where methodologies and tech-based tools come in. There are two pieces to this. First, we need methodologies that can help extract insights from the data, and second, tools that can help navigate and sift through that information. The tools need to be able to break down the information into highlights and details, or vice versa, depending on the user’s needs.