Type: Suggestion
Resolution: Unresolved
Component/s: Enterprise Insights - Data File Retrieval
User Problem
The Customer is transitioning from an API-based data extraction model to a more secure and efficient S3 Parquet-based replication system for data storage and disaster recovery.
The following challenges were identified:
- The customer's requested replication model is non-standard for Atlassian and requires internal review, testing, and automation updates.
- AWS replication delays for large data volumes risk disrupting sequential file processing and compromising data accuracy.
- S3 versioning requirements increase storage costs and operational complexity.
- The Parquet schema must remain compatible with the Customer's hourly incremental updates.
- Security and setup configurations require significant customization and approval.
Suggested Solution
- Optimize Replication for Large Data Volumes
  - Investigate premium AWS replication options, such as S3 Replication Time Control, or alternative approaches to reduce replication delays for large data volumes.
  - Implement monitoring to track replication progress and surface bottlenecks early, ensuring timely data processing (see the monitoring sketch after this list).
- Ensure Data Integrity with Versioning and Backup Strategies
  - Confirm that both parties enable versioning on the source and target S3 buckets, with clear lifecycle policies to keep versioning costs under control (see the versioning sketch after this list).
  - Develop a strategy for sequential file processing to avoid data drift, including error-handling mechanisms for delayed or incomplete file transfers (see the sequencing sketch after this list).
- Validate Data Compatibility with the Parquet Schema
  - Ensure all required tables and columns are present and usable in the new Parquet-based replication model.
  - Map the Customer's existing hourly API-based incremental updates to the Parquet schema, ensuring compatibility with the new system.
- Strengthen Security and Streamline Configuration
  - Customize the replication setup templates provided by the Customer to align with Atlassian's environment and security requirements.
  - Conduct a security review with Atlassian's security team to identify and address potential vulnerabilities in the replication process.
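As a concrete starting point for the monitoring item above, here is a minimal sketch that polls per-object replication status with boto3. The bucket name and prefix are hypothetical placeholders, and it assumes a replication rule is already configured on the source bucket; S3 reports PENDING, COMPLETED, or FAILED on source objects covered by a rule.

```python
"""Minimal sketch: poll replication status of exported objects.

Assumes boto3 credentials are configured; bucket and prefix are
hypothetical placeholders, not the actual Customer configuration.
"""
import boto3

s3 = boto3.client("s3")

SOURCE_BUCKET = "customer-export-source"  # hypothetical name
EXPORT_PREFIX = "parquet/daily/"          # hypothetical prefix


def replication_report(bucket: str, prefix: str) -> dict:
    """Return a map of object key -> S3 ReplicationStatus."""
    statuses = {}
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            head = s3.head_object(Bucket=bucket, Key=obj["Key"])
            # Objects not covered by a replication rule carry no status.
            statuses[obj["Key"]] = head.get("ReplicationStatus", "NOT_REPLICATED")
    return statuses


if __name__ == "__main__":
    for key, status in replication_report(SOURCE_BUCKET, EXPORT_PREFIX).items():
        # The API may report either COMPLETE or COMPLETED depending on region/age.
        if status not in ("COMPLETED", "COMPLETE"):
            print(f"lagging: {key} -> {status}")
```

A report like this can run on a schedule and alert when PENDING objects exceed an agreed lag threshold, which is the "identify bottlenecks early" part of the item above.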
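For the versioning item, a minimal sketch of the two bucket-level calls involved, assuming boto3 and a hypothetical bucket name; the 30-day noncurrent-version retention is an illustrative value, not an agreed policy.

```python
"""Minimal sketch: enable versioning and cap noncurrent-version costs.

Bucket name and retention window are illustrative assumptions; the
same configuration would need to be applied on both source and target.
"""
import boto3

s3 = boto3.client("s3")
BUCKET = "customer-export-source"  # hypothetical name

# S3 replication requires versioning on both source and target buckets.
s3.put_bucket_versioning(
    Bucket=BUCKET,
    VersioningConfiguration={"Status": "Enabled"},
)

# A lifecycle rule that expires noncurrent versions keeps versioning
# costs bounded while preserving a recovery window.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "NoncurrentVersionExpiration": {"NoncurrentDays": 30},  # assumed window
            }
        ]
    },
)
```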
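For the sequencing item, a sketch of gap-aware ordering on the consuming side. It assumes exported keys embed a sortable sequence number (e.g. part-00001.parquet); that naming convention, like the checkpointing approach, is an assumption about the export layout rather than a confirmed design.

```python
"""Minimal sketch: apply replicated files strictly in sequence order.

Stops at the first gap instead of skipping ahead, so a delayed or
incomplete transfer cannot cause data drift. Key naming is assumed.
"""
import boto3

s3 = boto3.client("s3")


def sequence_of(key: str) -> int:
    # "parquet/daily/part-00003.parquet" -> 3 (assumed naming convention)
    return int(key.rsplit("-", 1)[-1].split(".")[0])


def apply_in_order(bucket: str, prefix: str, apply_file, next_seq: int) -> int:
    """Apply files in sequence; return the next expected sequence number
    so the caller can persist it and resume once delayed files arrive."""
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    for key in sorted(keys, key=sequence_of):
        seq = sequence_of(key)
        if seq < next_seq:
            continue  # already applied in an earlier run
        if seq > next_seq:
            # Delayed or incomplete transfer: stop rather than skip,
            # since applying later files first would cause drift.
            print(f"gap: expected seq {next_seq}, found {key}; retry later")
            break
        apply_file(bucket, key)
        next_seq += 1
    return next_seq
```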
Current Workarounds
- Continue API-Based Incremental Updates: Use the existing hourly API process until the S3 replication model is fully implemented.
- Schedule Transfers Strategically: Perform exports during off-peak hours to minimize AWS replication delays.
- Small-Scale Parquet Testing: Validate the Parquet schema with limited data before scaling up (see the schema-check sketch after this list).
- Manual Monitoring: Assign oversight for critical data transfers to address delays or issues.
- Temporary Non-Versioned Buckets: Use non-versioned buckets for initial testing to reduce costs.
- Iterative Template Refinement: Adjust provided templates to fit Atlassian’s standards and expedite security approval.
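To support the small-scale Parquet testing above (and the schema-validation item under Suggested Solution), a minimal schema check using pyarrow. The file names and required-column sets are hypothetical stand-ins for the tables and columns the Customer's incremental updates actually need.

```python
"""Minimal sketch: check sample Parquet exports for required columns.

The required-column map is a hypothetical stand-in; the real list comes
from the agreed table/column requirements. Paths point at local sample
files downloaded from the bucket for small-scale testing.
"""
import pyarrow.parquet as pq

# Hypothetical requirements; replace with the agreed table/column list.
REQUIRED = {
    "issues.parquet": {"issue_id", "project_key", "updated_at"},
    "worklogs.parquet": {"worklog_id", "issue_id", "time_spent_seconds"},
}


def missing_columns(path: str, required: set) -> set:
    """Return the required columns absent from the file's schema."""
    schema = pq.read_schema(path)  # reads the footer only; cheap on large files
    return required - set(schema.names)


if __name__ == "__main__":
    for filename, columns in REQUIRED.items():
        missing = missing_columns(filename, columns)
        if missing:
            print(f"{filename}: missing columns {sorted(missing)}")
        else:
            print(f"{filename}: schema OK")
```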
- relates to: ALIGNSP-28721