Creating a mapping document for migrating Netezza tables to Azure Data Lake requires a thorough understanding of the source (Netezza) and target (Azure Data Lake) environments, as well as the data, schema, and processes involved. Below is a comprehensive list of questions to ask to ensure the mapping document is detailed, accurate, and effective for the migration process. These questions are grouped into categories for clarity.
### 1. Source System (Netezza) Analysis
– **Table Inventory**:
– What is the complete list of tables in the Netezza database to be migrated?
– Are there any tables that are redundant, unused, or can be archived instead of migrated?
– Which tables are actively used, and how can we verify usage (e.g., via query history or metadata)?
– **Schema and Metadata**:
– What are the schema details for each table (e.g., column names, data types, constraints, primary keys, foreign keys)?
– Are there any unsupported data types in Netezza (e.g., INTERVAL, ST_GEOMETRY) that need special handling in Azure Data Lake?
– Are there materialized views, clustered base tables (CBTs), or zone maps in Netezza, and how will they be mapped to Azure?
– What metadata views (e.g., _V_TABLE, _V_RELATION_COLUMN, _V_OBJECTS) are available to extract schema information?
– **Data Characteristics**:
– What is the volume of data per table (e.g., row count, size in GB/TB)?
– Are there any tables with sensitive or PII data requiring encryption or masking?
– What is the data distribution (e.g., skewed data, partitioning) in Netezza tables?
– Are there watermark columns for incremental data loads, and how are they defined?
– **Usage Patterns**:
– How are tables accessed (e.g., read/write patterns, frequency of queries)?
– Which scripts or applications load data into these tables, and which pull data from them?
– Are there any Netezza-specific utilities (e.g., nzsql, nzload) used for data operations?
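The metadata views mentioned above can drive an automated inventory. A minimal sketch follows; the catalog column names (`TABLENAME`, `OBJTYPE`, `ATTNAME`, `ATTNUM`, `FORMAT_TYPE`) follow the commonly documented layout, so verify them against your Netezza version, and the connection string is a placeholder:

```python
# Sketch: build queries against Netezza catalog views to extract the table
# inventory and per-table column metadata. Column names are assumptions to
# check against your Netezza release's catalog documentation.
def inventory_query() -> str:
    """Query _V_TABLE for the user tables to inventory."""
    return (
        "SELECT TABLENAME, OWNER, CREATEDATE "
        "FROM _V_TABLE "
        "WHERE OBJTYPE = 'TABLE' "
        "ORDER BY TABLENAME"
    )

def column_query(table_name: str) -> str:
    """Query _V_RELATION_COLUMN for one table's columns, in position order."""
    return (
        "SELECT ATTNAME, FORMAT_TYPE, ATTNOTNULL "
        "FROM _V_RELATION_COLUMN "
        f"WHERE NAME = '{table_name}' "
        "ORDER BY ATTNUM"
    )

# Usage requires a live Netezza connection, e.g. via pyodbc (DSN is a placeholder):
# import pyodbc
# conn = pyodbc.connect("DSN=NETEZZA_DSN;UID=user;PWD=secret")
# cur = conn.cursor()
# cur.execute(inventory_query())
# tables = [row[0] for row in cur.fetchall()]
```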
### 2. Target System (Azure Data Lake) Requirements
– **Storage Configuration**:
– Will Azure Data Lake Storage Gen2 be used as the primary storage, and what file formats are preferred (e.g., Parquet, Avro, CSV)?
– How will the data be organized in the Data Lake (e.g., folder structure, partitioning strategy)?
– Are there specific naming conventions for files or folders in the Data Lake?
– **Schema Mapping**:
– How will Netezza data types be mapped to Azure Data Lake formats (e.g., VARCHAR to string, TIMESTAMP to datetime)?
– Are there any transformations required during migration (e.g., data type conversions, column renaming)?
– Will the schema be preserved as-is, or will it be normalized/denormalized in the Data Lake?
– **Performance Optimization**:
– How will Netezza features like zone maps or CBTs be replaced in Azure Data Lake (e.g., partitioning, indexing)?
– Are there specific performance requirements for data access in the Data Lake (e.g., query speed, concurrency)?
– **Security and Compliance**:
– What security measures are required (e.g., encryption at rest, access control lists, row-level security)?
– Are there compliance requirements (e.g., GDPR, HIPAA) that impact data storage or access?
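The type-mapping question above is easiest to answer as a lookup table that the mapping document can reference. The dictionary below is illustrative, not exhaustive; extend it from your own catalog extract, and treat the deliberate failure on unmapped types (e.g. INTERVAL, ST_GEOMETRY) as a prompt for a manual decision:

```python
import re

# Illustrative mapping from common Netezza data types to Parquet-friendly
# logical types. Extend from your catalog extract before relying on it.
NETEZZA_TO_PARQUET = {
    "BYTEINT": "int8",
    "SMALLINT": "int16",
    "INTEGER": "int32",
    "BIGINT": "int64",
    "REAL": "float",
    "DOUBLE PRECISION": "double",
    "BOOLEAN": "boolean",
    "DATE": "date",
    "TIMESTAMP": "timestamp",
    "CHARACTER": "string",
    "CHARACTER VARYING": "string",
    "NUMERIC": "decimal",
}

def map_type(netezza_type: str) -> str:
    """Map a Netezza type string (e.g. 'CHARACTER VARYING(100)') to a target type."""
    base = re.sub(r"\(.*\)", "", netezza_type).strip().upper()
    if base not in NETEZZA_TO_PARQUET:
        # Unsupported types (e.g. INTERVAL, ST_GEOMETRY) need a manual decision.
        raise ValueError(f"No default mapping for Netezza type: {netezza_type}")
    return NETEZZA_TO_PARQUET[base]
```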
### 3. Data Migration Process
– **Migration Strategy**:
– Will the migration be a one-time historical load, incremental, or both?
– Should data be extracted to files (e.g., CSV) or moved directly via network connection?
– Will the migration be orchestrated from Netezza or Azure (e.g., using Azure Data Factory)?
– **Tools and Technologies**:
– Which tools will be used for migration (e.g., Azure Data Factory, third-party ETL tools like Informatica, or custom scripts)?
– Are there existing ETL/ELT processes in Netezza that need to be re-engineered for Azure?
– Will Azure Data Factory’s Netezza connector be used, and how will parallel copying be configured (e.g., parallelCopies setting)?
– **Partitioning and Parallelism**:
– How will large tables (>100 GB) be partitioned for migration (e.g., data slice, dynamic range partitioning)?
– What is the optimal number of partitions per table to balance performance and resource usage?
– How will the parallelCopies setting in Azure Data Factory be tuned for throughput?
– **Data Validation**:
– How will data integrity be validated post-migration (e.g., row counts, checksums, column-level comparisons)?
– Are there tools like Onix’s Pelican for automated data validation?
– What is the process for logging and handling incompatible rows during migration?
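Dynamic range partitioning, asked about above, amounts to splitting a table's key range into non-overlapping predicates that parallel copy activities can each use as a source filter. A minimal sketch under the assumption of an integer partition column (the column name and bounds are examples):

```python
# Sketch of dynamic range partitioning for a large-table copy: split an
# integer key range into N non-overlapping WHERE-clause fragments, one per
# parallel copy activity. Column name and bounds are placeholders.
def range_predicates(column: str, lo: int, hi: int, partitions: int) -> list[str]:
    """Return WHERE-clause fragments covering [lo, hi] in at most `partitions` slices."""
    if partitions < 1 or hi < lo:
        raise ValueError("need at least one partition and a non-empty range")
    span = hi - lo + 1
    step = -(-span // partitions)  # ceiling division
    preds = []
    start = lo
    while start <= hi:
        end = min(start + step - 1, hi)
        preds.append(f"{column} BETWEEN {start} AND {end}")
        start = end + 1
    return preds

# e.g. range_predicates("ORDER_ID", 1, 100, 4) yields four predicates:
# "ORDER_ID BETWEEN 1 AND 25" ... "ORDER_ID BETWEEN 76 AND 100"
```

In practice the bounds would come from a `MIN`/`MAX` query against the source table before the copy starts.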
### 4. ETL and Processing Considerations
– **ETL Pipeline Mapping**:
– What existing Netezza ETL processes (e.g., nzsql scripts, third-party tools) need to be migrated?
– How will Netezza utilities like nzload be replaced in Azure (e.g., Azure Data Factory pipelines)?
– Are there dependencies between tables or ETL jobs that need to be preserved?
– **Transformation Logic**:
– Are there transformations applied in Netezza that need to be replicated in Azure (e.g., in Data Factory or Spark)?
– Will transformations be handled during migration (ELT) or post-migration in the Data Lake?
– **Scheduling and Orchestration**:
– How will migration pipelines be scheduled (e.g., using Azure Data Factory triggers)?
– Are there specific time windows for migration to minimize impact on Netezza performance?
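Preserving dependencies between ETL jobs, raised above, is a topological-ordering problem: every job must run after its prerequisites. A sketch using the standard library (the job names and edges are placeholders):

```python
from graphlib import TopologicalSorter

# Sketch: compute a valid execution order for migrated ETL jobs so that
# dependencies between them are preserved. Job names are placeholders.
def execution_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return job names ordered so every job follows its prerequisites."""
    return list(TopologicalSorter(depends_on).static_order())

# Example: the fact load must wait for both dimension loads.
jobs = {
    "load_dim_customer": set(),
    "load_dim_product": set(),
    "load_fact_sales": {"load_dim_customer", "load_dim_product"},
}
order = execution_order(jobs)  # fact load comes last
```

`TopologicalSorter` also raises `CycleError` on circular dependencies, which is a useful early check on the dependency section of the mapping document.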
### 5. Performance and Optimization
– **Network and Bandwidth**:
– What is the available network bandwidth between Netezza and Azure, and how does it impact migration time?
– Is a private connection like Azure ExpressRoute required for secure data transfer?
– **Resource Configuration**:
– What is the configuration of the self-hosted Integration Runtime (IR) for Netezza-to-Azure migration (e.g., CPU, memory)?
– How will IR resources be scaled based on migration performance monitoring?
– **Throughput Goals**:
– What is the target throughput for data migration (e.g., MBps per copy activity)?
– How will concurrency settings (e.g., ForEach concurrency in Azure Data Factory) be optimized?
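The throughput and time-window questions above can be answered with a back-of-the-envelope estimate before any tuning. The numbers in the example are assumptions to replace with measured values from a pilot copy:

```python
# Back-of-the-envelope migration time estimate from total volume, per-copy
# throughput, and copy-activity concurrency. All inputs are assumptions to
# replace with figures measured during a pilot run.
def estimated_hours(total_gb: float, mbps_per_copy: float, parallel_copies: int) -> float:
    """Hours to move total_gb at mbps_per_copy MB/s across parallel_copies activities."""
    if mbps_per_copy <= 0 or parallel_copies < 1:
        raise ValueError("throughput and parallelism must be positive")
    total_mb = total_gb * 1024
    effective_mbps = mbps_per_copy * parallel_copies
    return total_mb / effective_mbps / 3600

# e.g. 10 TB at an assumed 30 MB/s per copy with 8 parallel copies:
# estimated_hours(10 * 1024, 30, 8) is roughly 12 hours
```

An estimate like this feeds directly into the migration-window and partition-count decisions in the sections above.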
### 6. Post-Migration Considerations
– **Data Access and Consumption**:
– How will data in Azure Data Lake be accessed post-migration (e.g., Azure Synapse Analytics, Spark, Power BI)?
– Are there applications or scripts dependent on Netezza that need to be reconfigured for Azure Data Lake?
– **Testing and QA**:
– What is the QA process to ensure data accuracy and application compatibility post-migration?
– Will parallel versions of applications (Netezza and Azure) be run during transition?
– **Decommissioning**:
– What is the plan for decommissioning Netezza tables after successful migration?
– Are there archival requirements for Netezza data not migrated to Azure?
### 7. Governance and Documentation
– **Metadata Management**:
– How will metadata (e.g., table schemas, ETL logic) be documented and maintained in Azure?
– Will a control table be used to track watermark values for incremental loads?
– **Stakeholder Requirements**:
– Who are the key stakeholders (e.g., DBAs, data engineers, business users) and what are their requirements?
– Are there specific reporting or auditing requirements for the migration process?
– **Change Management**:
– How will changes to the mapping document be tracked and communicated?
– What is the process for handling schema changes during migration?
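The watermark control table mentioned above can be sketched as a small read-before-run, advance-after-run store. This example uses SQLite purely as a stand-in for whatever database the pipeline actually uses; the table and column names are illustrative:

```python
import sqlite3
from datetime import datetime, timezone

# Sketch of a watermark control table for incremental loads. SQLite is a
# stand-in backing store; table/column names are illustrative only.
def init_control_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS watermark_control ("
        " table_name TEXT PRIMARY KEY,"
        " watermark_value TEXT NOT NULL,"
        " updated_at TEXT NOT NULL)"
    )

def get_watermark(conn, table_name, default="1970-01-01T00:00:00"):
    """Read the last successful watermark for a table, or a default for first runs."""
    row = conn.execute(
        "SELECT watermark_value FROM watermark_control WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    return row[0] if row else default

def set_watermark(conn, table_name, value):
    """Advance a table's watermark after a successful incremental load."""
    conn.execute(
        "INSERT INTO watermark_control (table_name, watermark_value, updated_at) "
        "VALUES (?, ?, ?) "
        "ON CONFLICT(table_name) DO UPDATE SET "
        "watermark_value = excluded.watermark_value, updated_at = excluded.updated_at",
        (table_name, value, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
```

Each incremental run reads the watermark, extracts rows with a newer watermark-column value, and only advances the stored value once the load and validation succeed.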
### Sample Mapping Document Structure
Based on the questions above, the mapping document should include:
1. **Source Table Details**: Table name, schema, columns, data types, constraints, volume, watermark column.
2. **Target Mapping**: Azure Data Lake folder path, file format, data type mapping, partitioning strategy.
3. **Transformation Rules**: Any data transformations or cleansing required.
4. **Migration Strategy**: Full/incremental load, partitioning approach, tools used.
5. **Validation Rules**: Checks for data integrity (e.g., row count, checksum).
6. **Dependencies**: ETL job dependencies, application dependencies.
7. **Performance Notes**: Parallelism settings, expected throughput.
8. **Security/Compliance**: Encryption, access control requirements.
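One entry of the mapping document outlined above could be expressed as a machine-checkable record. Every value below is an example, not a real project setting, and the field names are one possible naming scheme:

```python
# One illustrative mapping-document entry covering the sections above.
# All names and values are examples, not real project settings.
REQUIRED_FIELDS = {
    "source_table", "target_path", "file_format", "type_mappings",
    "transformations", "load_strategy", "validation_rules",
    "dependencies", "performance_notes", "security",
}

sample_entry = {
    "source_table": "PROD.SALES_FACT",
    "target_path": "raw/sales/sales_fact/",  # example folder layout
    "file_format": "parquet",
    "type_mappings": {"SALE_TS": "timestamp", "AMOUNT": "decimal(18,2)"},
    "transformations": ["trim CUSTOMER_NAME", "rename SALE_TS -> sale_timestamp"],
    "load_strategy": {"mode": "incremental", "watermark_column": "SALE_TS"},
    "validation_rules": ["row_count", "checksum(AMOUNT)"],
    "dependencies": ["PROD.CUSTOMER_DIM"],
    "performance_notes": {"partitions": 8},
    "security": {"pii_columns": ["CUSTOMER_NAME"], "encryption": "at-rest"},
}

def missing_fields(entry: dict) -> set[str]:
    """Fields from the template that an entry still needs to fill in."""
    return REQUIRED_FIELDS - entry.keys()
```

A completeness check like `missing_fields` lets the team validate every entry of the mapping document automatically before the migration starts.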
### Additional Notes
– Leverage Netezza metadata views (_V_TABLE, _V_RELATION_COLUMN) to automate schema extraction.
– Use Azure Data Factory’s Netezza connector for efficient data movement, with parallel copying for large tables.
– Consider third-party tools (e.g., Informatica, Talend, or Onix’s Datametica Birds) for automation if already in use.
– Plan for a phased migration, starting with low-impact tables to validate the process.
These questions ensure the mapping document captures all necessary details for a successful Netezza-to-Azure Data Lake migration. If you need a template for the mapping document or assistance with specific sections, let me know!