Getting Started: Data Strategy
What Does This Article Cover?
Access to data does not make it useful. Industrial data is raw and must be made fit for purpose to extract its true value. Furthermore, the tools used to make the data fit for purpose must operate at the scale of an industrial enterprise.
With these realities in mind, this article is a brief introduction to data strategy. It is intended to help manufacturing and industrial leaders understand how to apply proven data engineering practices and technologies to design an industrial data architecture for scale. The article covers the following topics:
- Align the solution with business goals
- Define a specific use case
- Identify the target systems
- Identify the data sources
- Select the integration architecture
- Include a Data Engineer on the team
- Other related material
1. Align the solution with business goals:
As with any major initiative, the first step is to ensure that your project is aligned with corporate goals and strategy. Because industrial data will be used by IT, operations, and line-of-business users, your objectives must support overarching business goals while being easily understood by data users across the organization. Cross-departmental collaboration will be necessary, as data architecture tends to reach across a great deal of the enterprise. Make sure the right cross-functional stakeholders are engaged from the beginning of the project. For example, ask whether the solution will provide the data needed to enable decisions that improve product quality, improve asset utilization, or supply the traceability data required to comply with regulations.
2. Define a specific use case:
A successful data strategy begins with a clearly defined use case and business goals. For many manufacturing companies, projects may focus on machine maintenance, process improvements, or product analysis to improve quality or traceability. As part of the use case, company stakeholders should identify the project scope and applicable data that will be required. A use case has a target persona who will consume the information and act on it. This persona is critical to identify because their knowledge, experience, and background will all impact how the data will be delivered to them and ultimately the success of the project.
3. Identify the target systems:
With the business goals and use cases identified, your next step is to identify the target applications that will be used to accomplish these goals. This approach is contrary to traditional data acquisition approaches that would have you begin with source systems. Focusing on your target systems first allows you to identify exactly what data you need to send, how it must be sent, and at what frequency, so you can determine which sources and structures are best suited to deliver that data. Considering the persona alongside the target system also helps identify the context the data may require. Some target systems require data from multiple sources blended into a single payload. Most require the data to be formatted in a specific way for consumption. By starting with the target system and persona, you can ensure that you are architecting your solution in a way that delivers not just the right data but the right data with the right context at the right time.
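As an illustration of working backward from the target, the sketch below blends values from two sources into the single payload a target application might expect. It is a minimal example under stated assumptions: the field names, the two source dictionaries, and the `build_payload` helper are all hypothetical and do not come from any particular product.

```python
import json
from datetime import datetime, timezone


def build_payload(machine_values, erp_values, timestamp):
    """Blend values from two hypothetical sources into one target payload."""
    return {
        # Context from a business system (assumed field names).
        "asset": erp_values["asset_name"],
        "workOrder": erp_values["work_order"],
        # Process values from the machine, formatted for the target.
        "temperatureC": round(machine_values["temp_c"], 1),
        "cycleCount": machine_values["cycle_count"],
        # ISO 8601 timestamp so the target can order and align records.
        "timestamp": timestamp.isoformat(),
    }


# Hypothetical readings from a machine controller and an ERP system.
machine = {"temp_c": 71.26, "cycle_count": 1042}
erp = {"asset_name": "Press-07", "work_order": "WO-1138"}
ts = datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc)

payload = build_payload(machine, erp, ts)
print(json.dumps(payload, indent=2))
```

The shape of the payload is dictated by the target, not by the sources. That is the practical consequence of identifying the target system first.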
4. Identify the data sources:
Identifying the right data sources for your use case is a crucial step. However, there are often significant barriers to accessing automation-level data sources:
- Volume - The typical modern industrial factory has hundreds to thousands of pieces of machinery and equipment constantly creating data.
- Correlation - Automation data is correlated for process control, not for asset maintenance, product quality, or traceability purposes.
- Context - Data structures on PLCs and machine controllers have minimal descriptive information.
- Standardization - Automation in a factory evolves over time, with machinery and equipment sourced from a wide variety of hardware vendors.
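The context and standardization barriers above can be made concrete with a small sketch. It maps raw controller addresses from two vendors onto one shared model that carries names and units. The register addresses, scale factors, and the `contextualize` helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagMapping:
    attribute: str      # standardized attribute name in the model
    unit: str           # engineering unit added as context
    scale: float = 1.0  # multiplier applied to the raw value


# Hypothetical register addresses for controllers from two vendors.
VENDOR_A = {
    "N7:0": TagMapping("motor_speed", "rpm"),
    "F8:2": TagMapping("oil_temperature", "degC"),
}
VENDOR_B = {
    "DB10.DBW4": TagMapping("motor_speed", "rpm", scale=0.1),
    "DB10.DBD8": TagMapping("oil_temperature", "degC"),
}


def contextualize(asset, raw_values, mapping):
    """Turn raw address/value pairs into a named model with units."""
    model = {"asset": asset}
    for address, value in raw_values.items():
        tag = mapping.get(address)
        if tag is None:
            continue  # skip addresses that are not part of the model
        model[tag.attribute] = {"value": value * tag.scale, "unit": tag.unit}
    return model


pump_a = contextualize("Pump-01", {"N7:0": 1750, "F8:2": 64.5}, VENDOR_A)
pump_b = contextualize(
    "Pump-02",
    {"DB10.DBW4": 17480, "DB10.DBD8": 66.0, "DB10.DBX0.0": 1},
    VENDOR_B,
)

# Both assets now expose the same attribute names despite different hardware.
print(sorted(pump_a) == sorted(pump_b))
```

Keeping the vendor-specific details in the mapping tables means downstream consumers only ever see the standardized attribute names.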
5. Select the integration architecture:
The DataOps approach to data integration and security aims to improve data quality and reduce time spent preparing and maintaining data for use throughout the enterprise.
An integration hub is purpose-built to move high volumes of data at high speeds with transformations being performed in real time while the data is in motion. Since a DataOps integration hub is an application itself, it provides a platform to identify impact when devices or applications are changed, perform data transformations, and provide visibility to these transformations.
As your use case scales, the integration hub methodology easily scales with it. With an integration hub, you can templatize connections, models, and flows, allowing you to easily reuse integrations and onboard similar assets. Integration hubs can also share configurations, projects, and more, so when you are ready to expand your use case to another site, you can simply launch another integration hub and carry your work over.
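To show what templatizing can look like in principle, the sketch below defines one flow template and instantiates it for several similar assets. It is not the configuration format of any specific integration hub. The template keys, parameter names, addresses, and the `instantiate` helper are assumptions made for this example, which uses only Python's standard `string.Template`.

```python
from string import Template

# One hypothetical flow definition with placeholders for what varies per asset.
FLOW_TEMPLATE = {
    "connection": "opc.tcp://$host:4840",
    "source": "$line/$asset/Temperature",
    "target": "site/$site/$line/$asset/temperature",
}


def instantiate(template, **params):
    """Fill every placeholder; a missing parameter raises KeyError."""
    return {key: Template(text).substitute(params) for key, text in template.items()}


# Onboarding similar assets only requires listing what differs between them.
ASSETS = [
    {"asset": "press07", "host": "10.0.2.17"},
    {"asset": "press08", "host": "10.0.2.18"},
]

flows = [instantiate(FLOW_TEMPLATE, site="plant1", line="line2", **a) for a in ASSETS]
for flow in flows:
    print(flow["target"])
```

Using `substitute` rather than `safe_substitute` is a deliberate choice here: an asset with an incomplete definition fails loudly instead of producing a flow with an unresolved placeholder.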
Once your project plan is in place, you can begin system integration in earnest by establishing secure connections to the source and target systems. It is vital that you wholly understand the protocols you will be working with and the security risks and benefits that come with them.
6. Include a Data Engineer on the team:
There are many approaches to creating a team for a DataOps project. The team should, of course, be cross-functional and have top-down sponsorship. A large organization might decide to create a Center of Excellence for the solution. The team should obtain guidance from IT architects. We see the Data Engineer role as crucial for DataOps projects. The Data Engineer role may be fulfilled by one or more individuals who configure data flow solutions using Intelligence Hub. The role may be a corporate or site resource, and the individuals who fill it may vary by use case and/or site. Individual aptitude is probably more important than specific job function. We often find that the Data Engineer is someone from a corporate or site Information Technology team, a Controls Engineer, or someone from a Data Science or Analytics team.