Google BigQuery Cloning of Datasets & Tables across GCP Projects.
We searched the internet and could not find a simple script for cloning/copying tables and datasets across projects (PROD --> TEST --> DEV) in Google BigQuery. We needed a utility that could copy complete datasets between projects. The BigQuery UI can copy one table at a time, but that would take forever. Our client follows an SDLC and works through three environments, and we needed to clone data from Production to our Test environment for shakedown testing of Airflow deliverables (DAGs etc.). There are probably a lot of customers out there who started loading their data into an initial environment and now need to copy their datasets to other projects. Whatever the reason, this script does the heavy lifting, and it can easily be amended and scheduled to run on evenings or weekends.
- Pip library: `google-cloud-bigquery` (imported as `from google.cloud import bigquery`)
Usage:
```
python3 bq_dataset_migrator.py source_project source_dataset target_project target_dataset
```
1 - Source project ID
2 - Source dataset ID
3 - Target project ID
4 - Target dataset ID
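
For reference, below is a minimal sketch of what such a migrator can look like using the official `google-cloud-bigquery` client: it lists every table in the source dataset and submits a BigQuery copy job for each. The structure and names here are illustrative; the actual `bq_dataset_migrator.py` may differ in detail.

```python
"""Sketch of a cross-project BigQuery dataset copy (illustrative, not the exact repo script)."""
import sys

from google.cloud import bigquery


def migrate(source_project, source_dataset, target_project, target_dataset):
    client = bigquery.Client(project=source_project)

    # Look up the source dataset so the target can be created in the same
    # location (copy jobs require source and target to share a location).
    source_ref = bigquery.DatasetReference(source_project, source_dataset)
    source = client.get_dataset(source_ref)

    # Create the target dataset if it does not exist yet.
    target = bigquery.Dataset(f"{target_project}.{target_dataset}")
    target.location = source.location
    client.create_dataset(target, exists_ok=True)

    # Copy every table in the source dataset with a BigQuery copy job.
    for table in client.list_tables(source_ref):
        source_table_id = f"{source_project}.{source_dataset}.{table.table_id}"
        target_table_id = f"{target_project}.{target_dataset}.{table.table_id}"
        job = client.copy_table(source_table_id, target_table_id)
        job.result()  # Block until the copy job finishes.
        print(f"Copied {source_table_id} -> {target_table_id}")


if __name__ == "__main__":
    if len(sys.argv) != 5:
        sys.exit("Usage: python3 bq_dataset_migrator.py "
                 "source_project source_dataset target_project target_dataset")
    migrate(*sys.argv[1:])
```

Note that the credentials used must have read access to the source project and write access to the target project, and BigQuery copy jobs only work between datasets in the same location.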