Putting the rigour into data and AI engineering practices

Technology leaders, particularly CTOs and CIOs, come in different guises and focus on different areas. To generalise, some gravitate more to the business or the politics, others to the creation process or the code, and the rest to every point in between.

Software systems that fail or are difficult to deliver and support, or worse still, cyber security incidents, are a direct consequence of not caring enough about engineering practices. That neglect often stems from poor business ethics and management dysfunctions.

Through my own experience with data, I have found that shifting testing to the left (testing from the outset) and testing upstream data sources help you to detect issues early and overcome hidden problems in data sources, transformation, or ingestion logic. Approaches like Extreme Programming (XP) and Test-Driven Development (TDD) are more important than ever, despite being underappreciated compared with popular agile approaches such as Scrum, Kanban or, dare I say it, SAFe.

With Gen AI lowering the barriers to entry to being an engineer, are we asking for more trouble?

Introducing XP values and practices

Coming from a cloud and software engineering background (I’ve done this for longer than I’d like to admit), I’m used to working with teams that take a ‘write tests before code’ approach as part of the development lifecycle and it’s always been a non-negotiable to me regardless of business pressure or management dysfunctions.

XP is often synonymous with TDD, but it is about so much more than that in terms of its underlying philosophy. XP values communication, simplicity, feedback, courage, and respect.

Each value contributes to a cohesive, quality-inducing development environment:

  1. Communication: Ensures clear, concise, and consistent understanding among team members, typically working in small, multidisciplinary teams.
  2. Simplicity: Encourages simple, straightforward solutions to complex problems.
  3. Feedback: Offers continuous insights into the development process to improve morale, cohesion, and productivity.
  4. Courage: Empowers engineers to make necessary changes without fear, blame or assigning fault.
  5. Respect: Promotes a collaborative, diverse, and supportive team environment.

From a practical perspective, teams that embrace XP will typically adopt the practices that are relevant to them, such as Continuous Integration (CI), Pair Programming, and Coding Standards, with TDD being one of the key practices.

A straightforward way to describe TDD is through three steps ‘Red, Green, and Refactor’:

  1. Red: Create a test that fails e.g., define functional needs.
  2. Green: Write code that passes the test e.g., satisfy functional needs.
  3. Refactor: Clean up and improve the code without breaking functionality.
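
The three steps can be sketched with a small example. The function name `normalise_postcode` and its formatting rule are hypothetical, chosen only to illustrate the cycle; the tests are plain assert-based functions of the kind a runner such as pytest would collect.

```python
# Hypothetical example: the function name and its rule are assumptions for illustration.

# Red: the tests are written first. Before normalise_postcode existed,
# calling them failed with a NameError.
def test_strips_whitespace_and_uppercases():
    assert normalise_postcode("  sw1a 1aa ") == "SW1A 1AA"


def test_inserts_missing_space():
    assert normalise_postcode("sw1a1aa") == "SW1A 1AA"


# Green, then Refactor: the simplest code that passes, tidied afterwards
# without changing behaviour, so the tests stay green throughout.
def normalise_postcode(raw: str) -> str:
    """Upper-case the input and put one space before the last three characters."""
    compact = "".join(raw.split()).upper()
    return f"{compact[:-3]} {compact[-3:]}"
```

The point of the ordering is that the tests state the functional need before any implementation exists, so the refactor step can be done with confidence.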

How XP-style approaches apply to data and AI

A TDD-based approach is not relevant to every aspect of data or AI projects. Parts of data science work, for example, are exploratory by nature, and there is little value in writing tests if you don’t know what the expected output is.

However, I think it’s applicable in ways that might not be obvious at first. It is still underutilised in the data industry and, as in software, it will take time to become more commonplace.

Here are some of the ways XP practices, such as TDD, can be applied to data and AI work:

Data engineering

TDD can help to avoid breaking changes, which are a common issue in the data industry, and to create robust data pipelines through:

  • Data validation: Creating tests that validate data quality, ensuring accuracy and consistency as part of a data quality strategy.
  • Pipeline reliability: Running automated tests can quickly find issues in data pipelines, reducing downtime and enhancing reliability.
  • Code quality: By taking a test first approach, engineers are encouraged to write simple, maintainable code that is continuously improved.
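
As a minimal sketch of the data validation point, the tests below check a batch of ingested rows for missing keys, duplicates, and invalid amounts. The column names, sample rows, and rules are assumptions made for illustration, not a prescribed data quality strategy.

```python
from datetime import date

# Hypothetical batch of ingested rows; column names are assumptions.
ROWS = [
    {"order_id": 1, "amount": 120.50, "order_date": date(2024, 1, 5)},
    {"order_id": 2, "amount": 15.00, "order_date": date(2024, 1, 6)},
]


def find_quality_issues(rows):
    """Return a list of human-readable issues; an empty list means the batch is clean."""
    issues = []
    seen_ids = set()
    for index, row in enumerate(rows):
        order_id = row.get("order_id")
        if order_id is None:
            issues.append(f"row {index}: missing order_id")
        elif order_id in seen_ids:
            issues.append(f"row {index}: duplicate order_id {order_id}")
        else:
            seen_ids.add(order_id)
        amount = row.get("amount")
        if amount is None or amount < 0:
            issues.append(f"row {index}: invalid amount {amount!r}")
    return issues


def test_clean_batch_has_no_issues():
    assert find_quality_issues(ROWS) == []


def test_duplicates_and_negative_amounts_are_reported():
    bad = ROWS + [{"order_id": 2, "amount": -1.0, "order_date": date(2024, 1, 7)}]
    assert find_quality_issues(bad) == [
        "row 2: duplicate order_id 2",
        "row 2: invalid amount -1.0",
    ]
```

Returning a list of issues rather than raising on the first one means a failing pipeline run reports everything that is wrong with the batch at once.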

Data Science

While data science emphasises exploratory analysis, TDD can still offer significant advantages:

  • Model validation: Checks that models perform as expected under different scenarios, promoting greater trust in data outcomes.
  • Reproducibility: Tests traceability and repeatability of model development, making experiments reproducible.
  • Integration testing: Ensures that the components of a data science pipeline work seamlessly together.
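
The reproducibility point can be illustrated with a small sketch: a train/test split that uses its own seeded random generator, with tests that the same seed always gives the same split and that no record is lost or duplicated. The function `split_dataset` and its parameters are hypothetical.

```python
import random


def split_dataset(records, test_fraction, seed):
    """Shuffle with a local, seeded generator so the split is repeatable."""
    rng = random.Random(seed)  # local generator: no dependence on global random state
    shuffled = list(records)
    rng.shuffle(shuffled)
    test_size = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - test_size
    return shuffled[:cut], shuffled[cut:]


def test_same_seed_gives_same_split():
    data = list(range(100))
    assert split_dataset(data, 0.2, seed=42) == split_dataset(data, 0.2, seed=42)


def test_split_keeps_every_record_exactly_once():
    data = list(range(100))
    train, test = split_dataset(data, 0.2, seed=42)
    assert len(train) == 80 and len(test) == 20
    assert sorted(train + test) == data
```

Even when the modelling itself is exploratory, the steps around it, such as splitting, feature preparation, and scoring, usually have a known expected behaviour that can be tested.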

Artificial intelligence

AI systems, especially those based on machine learning, can benefit from TDD:

  • Model accuracy: Continuous testing supports model accuracy by finding data drifts and other issues early on.
  • Scalability: As AI/ML models grow larger, robust testing practices help to ensure they remain reliable and scalable (promoting production-ready solutions).
  • Ethics: Tests that check AI/ML models adhere to ethical guidelines and standards, helping to prevent unwanted and unexpected biased outcomes.
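
Two of these checks can be sketched in a few lines. The first measures how far a feature's mean has moved from a reference sample, which is a deliberately simple stand-in for a proper drift test. The second measures the gap in positive prediction rates between groups, sometimes called the demographic parity difference. The function names, sample data, and thresholds are assumptions for illustration, and real thresholds would need to be agreed for the use case.

```python
from statistics import mean, pstdev


def mean_shift_in_std_units(reference, current):
    """How far the current mean has moved, in reference standard deviations."""
    spread = pstdev(reference)
    if spread == 0:
        raise ValueError("reference data has no variation")
    return abs(mean(current) - mean(reference)) / spread


def positive_rate_gap(predictions, groups):
    """Largest difference in positive prediction rate between any two groups."""
    rates = {}
    for group in set(groups):
        selected = [p for p, g in zip(predictions, groups) if g == group]
        rates[group] = sum(selected) / len(selected)
    return max(rates.values()) - min(rates.values())


def test_feature_has_not_drifted():
    reference = [10.0, 12.0, 11.0, 13.0, 9.0]
    current = [10.5, 11.5, 12.0, 10.0, 12.0]
    assert mean_shift_in_std_units(reference, current) < 0.5  # assumed threshold


def test_positive_rate_gap_within_tolerance():
    predictions = [1, 0, 1, 0, 1, 0, 0, 1]  # 1 = positive outcome
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    assert positive_rate_gap(predictions, groups) <= 0.1  # assumed tolerance
```

Run on a schedule against fresh data, checks like these turn drift and bias from something discovered after the fact into a failing test.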

Integrating DevSecOps and Cyber practices

As data and AI systems become increasingly popular and important to businesses (resulting in an increase in cyber-attacks), the need for robust security measures grows.

Note: an attack on a data platform or centralised data store could have more devastating consequences than an attack on data held across several independent systems, which is why this is an increasingly important concern.

DevSecOps integrates security practices into a typical DevOps/DataOps workflow, ensuring that security is a core component of the development process:

  • Data protection: Ensures sensitive data is encrypted and access is controlled, protecting against breaches.
  • Threat mitigation: Regular vulnerability assessments and threat modelling to help find and mitigate risks.
  • Compliance: Helps satisfy stringent regulatory requirements, avoiding legal issues and supporting customer trust.
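
Security expectations can be written as tests too. The sketch below checks that sensitive fields never leave a pipeline step in plain text. It uses keyed hashing (HMAC-SHA-256) as a stand-in for whatever encryption or masking a platform provides; the field classification, the `pseudonymise` function, and the hard-coded test key are assumptions, and a real key would come from a secrets store.

```python
import hashlib
import hmac

SENSITIVE_FIELDS = {"email", "national_id"}  # assumed data classification


def pseudonymise(record, key):
    """Replace sensitive values with an HMAC-SHA-256 token; leave other fields as-is."""
    protected = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            token = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256)
            protected[field] = token.hexdigest()
        else:
            protected[field] = value
    return protected


def test_no_sensitive_value_leaves_in_plain_text():
    record = {"customer_id": 7, "email": "jo@example.com", "national_id": "AB123456C"}
    protected = pseudonymise(record, key=b"test-only-key")
    for field in SENSITIVE_FIELDS:
        assert protected[field] != record[field]
        assert len(protected[field]) == 64  # length of a SHA-256 hex digest
    assert protected["customer_id"] == 7


def test_tokens_are_stable_so_joins_still_work():
    first = pseudonymise({"email": "jo@example.com"}, key=b"test-only-key")
    second = pseudonymise({"email": "jo@example.com"}, key=b"test-only-key")
    assert first == second
```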

Note: the testing approaches described focus on business or user requirements; however, non-functional requirements (NFRs), such as performance, monitoring, and data quality, need to be considered as part of a holistic, shift-left approach to testing.

Expanding on testing approaches, good practice, and anti-patterns

Maintaining reliable, accurate data pipelines is crucial in data projects. In software engineering, teams strive for close to full test coverage of code; however, this is not realistic in data engineering because of the complexities of data pipelines – you need to be a lot more focussed, specific, and selective for testing approaches to add real value.

The coverage mindset from software doesn’t map in a linear way to data work, because data work requires both code and data testing and a much more selective approach to where you place effort, e.g., placing greater emphasis on data sources and interdependencies.

In data work, most of the testing effort should go into data integration and validation tests (flow, source, and contract tests, for example). These areas involve dependent data sources and carry the highest risk: data may be missing once migrated, mappings may be wrong, or data integrity may be in question. Focusing here also helps to avoid breaking changes if a source system changes its schema.

Note: other tests are still important and applicable, but it is about risk-based prioritisation and return on effort.

Testing core data architectural layers

Higher risk or critical areas should be identified/prioritised according to business priorities, risk assessments and domain knowledge. Engineers can therefore focus their efforts on ensuring the most critical aspects of the data pipeline are thoroughly tested.

To explain approaches to distinct types of testing, in simple terms we can think of a data platform as being decomposed into three core architectural layers:

  • Source: Receiving, validating, and storing data from various sources, such as data files, APIs, or other mechanisms.
  • Integration: Processing, validating, and integrating different datasets.
  • Functional: Presenting data to business intelligence systems and dashboards.

Data quality starts and ends with data ingestion from source systems. I’ve seen lots of issues with broken data pipelines or systems caused by an interface being changed where there is a dependency.

The key to effective integration testing is to define interface contracts or data contracts early on, in collaboration with other teams (data contracts call for a dedicated blog post).

Development should not start without a clear contract in place that sets out the expectations for data structure, quality, and business rules.
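
A data contract can start as something very small. The sketch below, with a hypothetical `CUSTOMER_CONTRACT` and field names chosen for illustration, expresses the agreed structure as data and reports every way an incoming record breaks it, including the classic case of a source team renaming a column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRule:
    name: str
    expected_type: type
    required: bool = True


# Hypothetical contract agreed with the team that owns the source system.
CUSTOMER_CONTRACT = (
    FieldRule("customer_id", int),
    FieldRule("email", str),
    FieldRule("signup_date", str),
    FieldRule("marketing_opt_in", bool, required=False),
)


def contract_violations(record, contract):
    """Return every way the record breaks the contract; empty means it conforms."""
    violations = []
    for rule in contract:
        value = record.get(rule.name)
        if value is None:
            if rule.required:
                violations.append(f"missing required field: {rule.name}")
            continue
        if not isinstance(value, rule.expected_type):
            violations.append(
                f"{rule.name}: expected {rule.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    known = {rule.name for rule in contract}
    for name in sorted(set(record) - known):
        violations.append(f"unexpected field: {name}")
    return violations


def test_renamed_source_column_is_caught():
    record = {"customer_id": 1, "email_address": "jo@example.com", "signup_date": "2024-01-05"}
    assert contract_violations(record, CUSTOMER_CONTRACT) == [
        "missing required field: email",
        "unexpected field: email_address",
    ]
```

Keeping the contract as plain data makes it easy to review with the other team and to version alongside the pipeline code.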

Data integration is critical to the success of an organisation: it enables interoperability, provides actionable insights, and outlines business risk. The integration layer is where data is processed, transformed, and combined.

Effective integration testing is about writing tests that ensure that ingested data matches the expected formats and conforms to defined schemas. This helps to find inconsistencies or issues that could affect downstream analytics or applications. Tests should be triggered when a pipeline changes or when an underlying schema changes, for example.

The integration layer can become large and complex, so tests should be based on predefined business logic rather than user experience alone or gut feel. Integration logic and presentational logic should also be kept separate, as mixing these two different concerns can create complexities.
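
As an illustration of testing against predefined business logic, the sketch below joins orders to customers and derives a net amount, with one test per business rule. The function `integrate_orders`, the field names, and the two rules are assumptions; note that the function returns plain data and does no formatting, which is left to the presentational layer.

```python
def integrate_orders(orders, customers):
    """Join orders to customers and derive net_amount; data logic only, no formatting."""
    customers_by_id = {c["customer_id"]: c for c in customers}
    integrated, rejected = [], []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        if customer is None:
            rejected.append(order)  # orphaned orders are kept aside, never silently dropped
            continue
        integrated.append({
            "order_id": order["order_id"],
            "region": customer["region"],
            "net_amount": round(order["gross_amount"] - order["discount"], 2),
        })
    return integrated, rejected


CUSTOMERS = [{"customer_id": 1, "region": "North"}]
ORDERS = [
    {"order_id": 10, "customer_id": 1, "gross_amount": 100.0, "discount": 10.0},
    {"order_id": 11, "customer_id": 99, "gross_amount": 50.0, "discount": 0.0},
]


def test_net_amount_is_gross_minus_discount():
    integrated, _ = integrate_orders(ORDERS, CUSTOMERS)
    assert integrated == [{"order_id": 10, "region": "North", "net_amount": 90.0}]


def test_orders_without_a_known_customer_are_rejected():
    _, rejected = integrate_orders(ORDERS, CUSTOMERS)
    assert [order["order_id"] for order in rejected] == [11]
```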

The last stage of the lifecycle is the presentational layer where data is presented to users in reports or dashboards. Functional tests are used to verify the quality of the delivered output.

Functional tests should be defined upfront with end users and are used to capture expectations in terms of data quality, acceptable thresholds, and expected behaviour.

Functional testing can lose value if the tests become too technically focused and don’t consider user needs and functionality, or if they become too generic, e.g., where they broadly try to check that everything works.
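
A functional test is most useful when it encodes a specific expectation agreed with users rather than a broad "everything works" check. The sketch below assumes a hypothetical sales report and two agreed thresholds: at most 2% of rows may lack a region, and the report total must reconcile with the source total to within 0.01 currency units.

```python
import math

# Hypothetical thresholds agreed upfront with the report's end users.
MAX_MISSING_REGION_RATE = 0.02
TOTAL_TOLERANCE = 0.01  # currency units


def missing_rate(rows, field):
    """Share of rows where the field is absent, None, or an empty string."""
    if not rows:
        raise ValueError("cannot assess an empty report")
    missing = sum(1 for row in rows if row.get(field) in (None, ""))
    return missing / len(rows)


def reconciles(report_rows, source_total, tolerance):
    """True if the report total is within an absolute tolerance of the source total."""
    report_total = sum(row["net_amount"] for row in report_rows)
    return math.isclose(report_total, source_total, rel_tol=0.0, abs_tol=tolerance)


# 100 sample rows, one of which has no region.
REPORT = [{"region": "North", "net_amount": 10.0} for _ in range(99)]
REPORT.append({"region": "", "net_amount": 10.0})


def test_region_completeness_meets_agreed_threshold():
    assert missing_rate(REPORT, "region") <= MAX_MISSING_REGION_RATE


def test_report_total_reconciles_with_source():
    assert reconciles(REPORT, source_total=1000.0, tolerance=TOTAL_TOLERANCE)
```

Each test name reads as a statement a user could confirm or dispute, which keeps the suite tied to user needs rather than implementation detail.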

A case study of where it can pay dividends

Testing practices with data platforms

In my last blog post, I introduced data platforms, and one of the benefits of data platforms when adopting robust engineering practices is standardisation and consistency.

Platforms like Azure, Databricks, and Snowflake each have standardised ways and integrated tools for writing unit tests for notebooks or data pipelines, which is inherently more efficient than having multiple teams doing it in diverse ways with different tools.

In some cases, specialist data testing tools can be useful. dbt is one such tool, and it is often supported natively in popular data platforms.

