Data Trends for 2025
Data has often been described as the new oil because its effective use can revolutionize our society much as fossil fuels did in the past. Consumers continue to create and share data at a faster pace, and businesses continue to accumulate this information.
The convergence of artificial intelligence (AI) breakthroughs, such as the implementation of large language models (LLMs) and the adoption of generative AI tools, is changing consumers’ perspective on big data. Consumers want complex questions solved and demand a more engaging user experience with results they can trust. This is possible when you can connect vast amounts of disparate data sources to create a fuller context and leverage technology to analyze the data.
The returns on successful implementations are very high: Companies can build an intimate understanding of their consumer profiles, purchasing patterns and responses to incentives. This can lead to better customer loyalty and ultimately more revenue.
The Case for Data Ethics
We all use AI to some degree, from asking personal assistants like Siri for directions to the nearest gas station to asking Zoom to summarize meeting notes. I assume we don’t worry too much about the “technology” that makes it happen.
Now, imagine your insurance company leveraging AI to understand your driving history for quoting a policy on your new car, or your bank leveraging AI to score your account history and credit for a new home loan. What happens if your premium doubles or you get denied for the new mortgage? And how much is at stake when AI is used in healthcare to scan medical images to look for disease and used for drug discovery?
Data ethics is rooted in moral principles and promotes values such as privacy, fairness, transparency and accountability. These tenets form the foundation of frameworks and laws such as the Federal Data Strategy or the California Consumer Privacy Act (CCPA). For data practitioners, an ethics framework offers best practices for collecting, processing and using information in software development. Some obvious benefits include:
- Consistent application of ethics: Companies can design architectures that meet federal, state or international policies. Everyone should be abiding by the same regulations so consumers can expect the same level of protection across systems.
- Reduce risk: By assessing impacts at every phase of the project lifecycle, developing proactive solutions and preparing mitigation plans, companies can adapt better to changing ethical regulations and environments.
- Increase transparency: With clear communication to stakeholders and documentation on the collection, testing and use of data, organizations can share how they make decisions. This can include explaining limitations with datasets, as well as the steps taken to remove biases.
- Improve trust: Collectively, these are some of the steps that will contribute to increasing public trust and a willingness for consumers to continue engaging with the company.
Data Ethics Tips to Consider
Data ethics may seem like a distant, academic theme for bureaucrats to argue about. The reality is we all have some ownership and responsibility, and we can certainly influence the application of good ethical practices with our data sources. We’ve provided a list of questions for you to consider before starting any project. The goal is to ask these questions, keep them in mind and increase awareness. We understand that some of these policies are baked into your user agreement.
1. Did We Get User Consent?
User consent is the authorization to collect, store and use a customer’s personal data. It’s often described as the cornerstone of legal and ethical business practices. How do you know if customers have consented? Perhaps your customer was presented with a banner asking them to check a box agreeing to the privacy policy. Informed user consent is critical to establishing trust between the consumer and your organization and signals that consumers are protected by data protection laws.
Other considerations with user consent:
- Does the agreement clearly describe what the user is consenting to? Are there separate checkboxes for granting additional permissions?
- How will the data be used and will it be shared with other parties?
- Does it provide an ability to revoke consent at any point?
- What regulations bind the consent, e.g. GDPR?
- What happens to user data if the company is acquired?
2. How Will We Protect User Privacy and Confidentiality?
According to a 2023 Pew Research Center study, 81% of Americans familiar with AI are not comfortable with how information collected from them will be used in the future. The bar is high, and companies must be deliberate about embedding privacy and confidentiality into the business lifecycle. This means integrating them seamlessly into products and services, giving them the same priority as features or functionality.
Privacy
Privacy is the ability for someone to determine when, how and to what extent their personal information is shared. Personal attributes can include a name, date of birth or location. The challenge for organizations is to balance the use of personal data, for example to provide location-specific recommendations, while staying within user expectations. The consequences of a data privacy breach are significant, potentially leading to identity theft or unrestricted monitoring on digital platforms.
The organization should consider the following when planning for or processing personal data:
- How are we storing personal data, who has access to it and how is it encrypted?
- Are we complying with regulations such as General Data Protection Regulation (GDPR) or California Consumer Privacy Act (CCPA)?
- How much personal data are we collecting? Is it necessary for performing core functions?
- Do consumers or stakeholders understand what data is being collected and how the data will be disposed of when it’s no longer needed?
- Does the organization have robust policies, procedures and a mitigation plan for dealing with cyber threats or bad actors?
Confidentiality
Confidentiality ensures that only authorized parties can access organizational data. It starts with designing a secure environment where data can be separated by expected usage and roles are defined with specific permissions. The organization should consider the following:
- Do we have proper access control measures such as passwords or multi-factor authentication?
- Is sensitive data such as personally identifiable information (PII) encrypted?
- Do we have a process and tools to manage data loss prevention?
- Do we have proper training to educate employees on data confidentiality?
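One technique that supports several of the items above is pseudonymization: replacing a direct identifier with a keyed hash so records stay joinable without exposing the raw value. A minimal sketch, with the caveat that in practice the key would come from a key-management service rather than being generated in the script:

```python
import hashlib
import hmac
import os

# Assumption for this sketch: a fresh random key. In production the
# key would live in a key-management service and be rotated under policy.
SECRET_KEY = os.urandom(32)


def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a keyed HMAC-SHA256 token.

    The same input always yields the same token, so joins and
    de-duplication still work, but the token cannot be reversed
    without the key.
    """
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


record = {"email": "jane@example.com", "purchases": 7}
safe_record = {**record, "email": pseudonymize(record["email"])}
```

Note that pseudonymized data is still considered personal data under GDPR, so this reduces exposure rather than removing the obligation to protect the data.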
These confidentiality measures allow the organization to adhere to protection laws such as FERPA or GDPR. They can help prevent harm and legal risk from unauthorized access and increase trust with stakeholders.
3. How Are We Processing Our Data?
When selecting data for an upcoming project, we should ask ourselves whether the information is free of bias. Bias can manifest in many different forms, and some examples are provided later in the blog. Organizations should consider a robust review process, ranging from peer reviews of datasets to formal review by an established board for bigger projects sourced mostly from human data. The board should consist of a diverse set of stakeholders, including ethics specialists.
Some considerations to include:
- Is there an established governance board or team approving the scope of data for projects?
- Provenance of the data: What is the source of the data, and was it acquired with proper consent?
- Purpose of the data: What will the data be used for and would the original source approve this re-use?
- Protection of data: What steps have been taken to protect the data? How long will it be needed? Will it be destroyed after the project is completed?
- Privacy: Will the data be aggregated or anonymized? Who will have access to the raw and processed data?
- Preparation: How is the data cleaned and transformed? Will any downstream data sources preserve anonymity? Are there processes to validate accuracy of the data? Are there rules for orphan or missing data?
- Are there procedures and policies for data sharing? Have outside vendors who receive access to the data been properly onboarded?
- Are there data attributes that can present unfair bias, such as demographic information, and can that be removed without compromising the analysis?
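A first pass on that last question can be automated by flagging attribute names for the review board before any analysis begins. A rough sketch; the attribute list is an assumption to tune per project, and a real review would also look for proxy variables:

```python
# Assumed set of attributes that warrant review; adjust per project.
SENSITIVE_ATTRIBUTES = {"gender", "race", "age", "zip_code"}


def review_columns(columns: list[str]) -> tuple[list[str], list[str]]:
    """Split dataset columns into those safe to use directly and
    those flagged for the governance board to review."""
    flagged = [c for c in columns if c.lower() in SENSITIVE_ATTRIBUTES]
    safe = [c for c in columns if c.lower() not in SENSITIVE_ATTRIBUTES]
    return safe, flagged


safe, flagged = review_columns(
    ["customer_id", "Gender", "purchase_total", "zip_code"]
)
# flagged == ["Gender", "zip_code"]; dropping them outright can still
# leave proxy variables behind, so treat this as a screen, not a fix.
```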
Examples of Ethical Conflicts with Data
Analysis Bias and Discrimination
Biased data can lead to inaccurate conclusions about the population. For example, an Amazon AI-based candidate evaluation tool was scrapped after it was found to use past hiring decisions as input and disproportionately exclude women from the qualified pool.
1. Cognitive Bias
Humans make hundreds of decisions a day, often leaning on past experience to reach conclusions quickly. These shortcuts are known as heuristics and can create cognitive bias. When working with data, there is a chance this pattern will manifest subconsciously, for example picking data that aligns with our belief system. If we already have a conclusion in mind, we may seek out data that supports our argument.
To avoid cognitive bias, it is important that we articulate the research question, our hypothesis and define the goals for our analysis. Once this is documented, we should gather information that supports the arguments on both sides. This deliberate process will produce a more neutral result set we can compare against our original question and hypothesis.
2. Historical Bias
Historical bias occurs when the data used for analysis no longer represents current reality. For example, consider an automotive safety analysis that uses only a male driver dummy and a female passenger dummy. This may have reflected typical usage when cars were first manufactured but is no longer representative today. Another example is drawing conclusions from gender income data: inequality may still be a problem today, but it was historically even larger, so older datasets may no longer be relevant.
To avoid historical bias, we can audit the data set, ensure we are establishing inclusivity parameters and be mindful of existing bias when selecting historical datasets.
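One concrete audit along these lines is measuring how much of a dataset predates a relevance cutoff. A minimal sketch; the cutoff date is an arbitrary assumption and should come from the project’s inclusivity parameters:

```python
from datetime import date


def stale_fraction(record_dates: list[date], cutoff: date) -> float:
    """Return the fraction of records older than the cutoff date.

    A high fraction is a warning sign that the dataset may reflect
    historical conditions rather than current reality.
    """
    if not record_dates:
        return 0.0
    stale = sum(1 for d in record_dates if d < cutoff)
    return stale / len(record_dates)


dates = [date(2012, 1, 1), date(2019, 6, 1), date(2024, 3, 15)]
fraction = stale_fraction(dates, cutoff=date(2018, 1, 1))  # one of three records is stale
```

A threshold like this is only a screen; a record from last year can still carry historical bias if the process that generated it did.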
Lack of Transparency in Data Analysis
This can occur when the data selection, methods, assumptions and limitations are not clearly documented or communicated, or are left out on purpose.
An example is a customer satisfaction survey that is skewed by targeting high-revenue customers. In this instance, there is likely a positive correlation between repeat purchases and high customer satisfaction, but the survey clearly leaves out a critical population segment necessary for a complete analysis. This can mislead stakeholders into making wrong marketing campaign decisions.
To avoid a lack of transparency in data processing, create documentation for the entire process lifecycle, from data acquisition and data profiling to data validation. Any assumptions and limitations should be reviewed with stakeholders, including discussions of mitigations and known risks to the analysis.
Balancing Innovation and Ethics
AI is changing our landscape; however, we need to think about the future and that means exploring the ethics of AI. Many of us are advising our clients with new AI technology and are supporting platforms that already have embedded AI solutions. One can argue that at a minimum, companies should be transparent with systems that leverage AI for decision-making. A good example is a resume platform that matches candidates to job openings, or a financial system that pre-approves loans.
The solution to this challenge is evolving. It starts with a good data ethics governance framework, asking the questions we outlined in this blog and applying some of the tactics we discussed for handling ethical conflicts with data. It will require constant focus to keep building our data ethics skills as we keep pace with innovation.
The good news is that you don’t have to do this alone. Data ethics are a major consideration in our strategic data discussions with clients. If data ethics are particularly salient for your organization, we can tailor a Strategy, Vision and Roadmap session to emphasize data ethics principles and best practices.