How Data Science Can Enable SFDR Reporting

AI December 10, 2021

Delivering an SFDR reporting solution which overcomes SFDR PAIs coverage gaps with reliable data

The European Union’s Sustainable Finance Disclosure Regulation (SFDR) could revolutionize sustainability reporting—and, in turn, rescope the data that companies track to measure their ESG performance.


The purpose of SFDR is to improve the transparency of ESG disclosures by financial product and service providers. Because not every piece of relevant data is available at scale yet, compliance with SFDR requires financial market participants to run before they can walk. In the interim phase, while data providers are still ramping up their offerings, this could be counterproductive and result in poor reporting or misreporting. Data science can help.

Making the right data available at scale requires overcoming two main obstacles:

  • Low reliability of reported data. This can occur because reported company data are fragmented and non-standardized, and conflicting or unreliable values exist across different providers.
  • Incomplete data coverage of metrics and industry sectors. This occurs because of partial or nonexistent reporting.
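The first obstacle, conflicting values across providers, can be made concrete with a small check. The following is a minimal sketch, not a description of Clarity AI's pipeline: the provider names, the 10% tolerance, and the use of the median as a consensus value are all assumptions chosen for illustration.

```python
from statistics import median
from typing import Dict, List, Optional, TypedDict


class ConflictReport(TypedDict):
    consensus: Optional[float]
    conflict: bool
    outliers: List[str]


def flag_conflicts(
    values_by_provider: Dict[str, Optional[float]],
    rel_tolerance: float = 0.10,  # assumed tolerance, not a regulatory value
) -> ConflictReport:
    """Compare one metric for one company across several data providers.

    Providers that do not cover the metric are passed as None and ignored.
    A provider is an outlier when its value deviates from the median of all
    available values by more than rel_tolerance (relative to that median).
    """
    values = [v for v in values_by_provider.values() if v is not None]
    if len(values) < 2:
        # Nothing to compare against: no conflict can be detected.
        return {"consensus": values[0] if values else None,
                "conflict": False, "outliers": []}

    consensus = median(values)
    scale = abs(consensus) or 1.0  # avoid dividing by zero
    outliers = sorted(
        provider
        for provider, value in values_by_provider.items()
        if value is not None and abs(value - consensus) / scale > rel_tolerance
    )
    return {"consensus": consensus, "conflict": bool(outliers), "outliers": outliers}


# Hypothetical carbon emissions figures (tonnes CO2e) for a single company.
example = {
    "provider_a": 1000.0,
    "provider_b": 1020.0,
    "provider_c": 1500.0,
    "provider_d": None,  # metric not covered by this provider
}
report = flag_conflicts(example)
```

In this made-up example, `provider_c` is flagged because its value sits far from the other two. A real pipeline would also need to handle unit and reporting-boundary differences before comparing values.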

Figure 1 illustrates SFDR principal adverse impacts (PAI) coverage gaps by geography, size, and selected indicators based on a Clarity AI data science–enabled analysis of 29,000 companies.[1] Globally, only 3% of companies analyzed reported more than 70% of the 14 mandatory PAIs. Europe leads the way with 10% of firms meeting this coverage threshold, while just 3% of US firms and 1% of APAC firms did so. One in five large-cap firms met the threshold, but just one in 50 small-cap firms did. Coverage also varies widely by indicator: 20% of the 29,000 firms disclose carbon emissions data, but just 3% do the same on gender pay gap data.
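The coverage figures above follow from a simple calculation: for each company, count how many of the 14 mandatory PAIs have a reported value, then count the companies whose share exceeds 70%. The sketch below illustrates that arithmetic only; the indicator names (`pai_1` to `pai_14`) and the four companies are invented, not taken from the underlying analysis.

```python
from typing import Dict, Mapping, Optional, Sequence

COVERAGE_THRESHOLD = 0.70  # "more than 70%" of the mandatory PAIs

# Placeholder names for the 14 mandatory PAIs (hypothetical identifiers).
INDICATORS = [f"pai_{i}" for i in range(1, 15)]


def pai_coverage(reported: Mapping[str, Optional[float]],
                 indicators: Sequence[str]) -> float:
    """Fraction of the indicators for which the company reports a value."""
    filled = sum(1 for name in indicators if reported.get(name) is not None)
    return filled / len(indicators)


def share_above_threshold(
    companies: Mapping[str, Mapping[str, Optional[float]]],
    indicators: Sequence[str],
    threshold: float = COVERAGE_THRESHOLD,
) -> float:
    """Fraction of companies whose PAI coverage is strictly above the threshold."""
    if not companies:
        return 0.0
    meeting = sum(
        1 for reported in companies.values()
        if pai_coverage(reported, indicators) > threshold
    )
    return meeting / len(companies)


def make_company(n_reported: int) -> Dict[str, Optional[float]]:
    """Build an invented company that reports the first n_reported indicators."""
    return {name: (1.0 if i < n_reported else None)
            for i, name in enumerate(INDICATORS)}


# Four invented companies reporting 11, 10, 9 and 0 of the 14 indicators.
companies = {
    "company_a": make_company(11),
    "company_b": make_company(10),
    "company_c": make_company(9),
    "company_d": make_company(0),
}
share = share_above_threshold(companies, INDICATORS)
```

Here 11 of 14 and 10 of 14 both exceed 70%, while 9 of 14 does not, so half of the invented companies meet the threshold.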

Our analysis indicates significant coverage gaps in SFDR principal adverse impacts (PAIs).

Companies reporting more than 70% of PAIs (n = 29,000 companies), %

[Figure 1: chart of coverage gaps in SFDR PAIs]

In addition to reporting on the PAIs, SFDR requires evidence that good governance practices are in place and that companies account for the Do No Significant Harm (DNSH) test, each of which comes with its own data coverage challenges. The implementation of the European Commission’s Corporate Sustainability Reporting Directive (CSRD) will help bridge the gap, eventually compelling close to 50,000 companies to report sustainability performance on a comprehensive set of metrics. CSRD will be fully implemented by 2025, and non-European jurisdictions are likely to lag even further behind.

Clarity AI is a sustainability tech firm and platform with the mission of bringing societal impact to markets.

SFDR reporting requirements add another layer of data to the 200 metrics that Clarity AI already provides to evidence performance on ESG risk and impact on the world, as well as alignment with climate targets (including those of the Task Force on Climate-related Financial Disclosures) and alignment with the UN’s Sustainable Development Goals (SDGs). As a one-stop shop, Clarity AI also provides clients with robust and comprehensive solutions to meet their SFDR disclosure and product design requirements, leveraging our data science capabilities.

In this paper, we provide specific examples of the merits of our data science approach for SFDR reporting. We address the prerequisites for sound modeling and give recommendations on how to use the data. We also highlight current limitations and how we intend to further develop the SFDR analysis and reporting module in the coming months. The paper is framed around three specific use cases leveraging different data science techniques, illustrated through SFDR requirements:

  1. How data science can improve reliability of reported data
  2. How machine learning can expand data coverage
  3. How natural language processing can inform metric development

Whereas SFDR covers several asset classes, including sovereign bonds and real estate, we will focus on corporates.

Clarity AI achieves optimized outcomes thanks to three key differentiators.

  • Assemble the largest collection of structured and unstructured sources to cover all key topics and industry sectors
  • Aggregate, clean, and standardize the assembled database to improve data quality, and continuously improve models through in-house and external expertise (partnerships with academia and consulting firms)
  • Implement state-of-the-art machine learning and data science techniques with scalability in mind for automatic best source selection and to obtain accurate estimates for non-reported data, increasing reliability and coverage
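To give a feel for what estimating non-reported data can mean in practice, here is a deliberately simple sketch. It is not Clarity AI's model, which the text describes only as machine learning; it swaps in a plain sector-median intensity rule so that the idea fits in a few lines. The field names (`sector`, `revenue`, `emissions`) and all figures are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Any, Dict, List


def impute_by_sector_intensity(companies: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Fill missing emissions using the median emissions-per-revenue of sector peers.

    Each returned row carries an 'estimated' flag so that modeled values stay
    distinguishable from reported ones in any downstream disclosure.
    """
    # Step 1: collect emissions intensity (emissions / revenue) of reporting peers.
    intensities: Dict[str, List[float]] = defaultdict(list)
    for c in companies:
        if c["emissions"] is not None and c["revenue"] > 0:
            intensities[c["sector"]].append(c["emissions"] / c["revenue"])
    sector_median = {sector: median(vals) for sector, vals in intensities.items()}

    # Step 2: estimate missing values where at least one sector peer reports.
    result = []
    for c in companies:
        row = dict(c)
        row["estimated"] = False
        if c["emissions"] is None and c["sector"] in sector_median:
            row["emissions"] = sector_median[c["sector"]] * c["revenue"]
            row["estimated"] = True
        result.append(row)
    return result


# Invented data: three reporting utilities, one non-reporting utility,
# and one software company with no reporting peers.
sample = [
    {"name": "A", "sector": "Utilities", "revenue": 100.0, "emissions": 500.0},
    {"name": "B", "sector": "Utilities", "revenue": 100.0, "emissions": 300.0},
    {"name": "C", "sector": "Utilities", "revenue": 100.0, "emissions": 800.0},
    {"name": "D", "sector": "Utilities", "revenue": 40.0, "emissions": None},
    {"name": "E", "sector": "Software", "revenue": 60.0, "emissions": None},
]
imputed = impute_by_sector_intensity(sample)
```

Company D receives an estimate based on its peers' median intensity, while company E stays empty because no peer in its sector reports. Keeping the `estimated` flag alongside each value is a design choice that makes it easier to document which figures were modeled.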

To what extent can modeled data be used in SFDR reporting?

The first question we had to answer was to what extent SFDR allows financial market participants to use modeled data. To answer this fundamental question, we reached out to Eco:Fact, our external advisor and partner on sustainability regulation.

One of the key aims of the SFDR, and other sustainable finance regulations, is to reduce the asymmetry of information between financial market participants and investors. Consequently, financial market participants are expected to support their reporting and make decisions based on data that might not yet be available. This became apparent, for example, when the EU’s innovative sustainable finance regulations introduced requirements for reporting on sustainability risks and adverse impacts on sustainability factors.

Although these two categories are closely related, they require financial market participants to assess sustainability topics, such as climate change and human rights violations, from different perspectives:

  • a “sustainability risk” analysis focuses on potential material negative impacts on the value of an investment that stem from sustainability factors (e.g., the impact of sea level rise on property values).
  • consideration of “adverse sustainability impacts” concentrates on an investment’s negative effects on sustainability factors (e.g., investment in highly polluting companies that negatively impact ecosystems and individuals’ health).

Appropriate data is needed to conduct assessments such as those described above; data availability, accessibility, and reliability are central to financial institutions’ efforts to answer questions about sustainability risks and adverse impacts, and thus to meet the SFDR’s expectations. Regulators are aware of the data-related challenges institutions face, and they are currently designing solutions to bridge this data gap via, for example, the proposed Corporate Sustainability Reporting Directive (CSRD). In the meantime, the European Union regulator provides tools that financial market participants can use to address the issue of data availability.

One tool is mentioned in Article 7(2) of the Regulatory Technical Standards (RTS) (commonly referred to as level 2). (The RTS under the SFDR are expected to become applicable sometime in 2022.) This provision is relevant for situations where financial market participants are requested to disclose data on principal adverse impact indicators but that data is not readily available. In this context, financial market participants are expected to use “best efforts” to obtain the information they need, either directly from investee companies or by carrying out additional research. They can also cooperate with third-party data providers or external experts or make reasonable assumptions. It should be noted that financial market participants must also report what constitutes their best efforts.

In this situation, financial market participants’ use of modeled data is one solution to tackle the challenge of data gaps—this strategy fulfills the criteria of the expectation to use best efforts. For example, modeled data enables financial market participants to base their disclosures on an approach that is verifiable and that is used by other market participants to make reasonable assumptions about impacts on sustainability factors.


1. For key SFDR PAIs, our data science expertise has allowed us to multiply data coverage by approximately five times, on average. In some cases, when no data were available, we developed entirely new data sets. Alongside coverage issues, financial market participants also face reported-data reliability issues, which is another area where our data science approach provides significant improvements.
