use airbyte to migrate data from tidb cloud to snowflake

Authors: Qiang Wu, Xiang Zhang (TiDB Cloud Engineers) 
Editors: Fendy Feng, Tom Dewan 

TiDB Cloud is a fully-managed Database-as-a-Service (DBaaS) that brings TiDB, an open source Hybrid Transactional and Analytical Processing (HTAP) database, to your cloud. 

Airbyte is an open-source data integration engine that helps you consolidate your data in data warehouses, data lakes, and databases. 

Snowflake provides Software-as-a-Service (SaaS) service with a single and global platform for data warehousing with seamless data collaboration.

Airbyte can help you connect TiDB Cloud with Snowflake, migrate data from TiDB Cloud, the source, to Snowflake, the destination, and leverage Snowflake’s strong analytical capability to respond to your multi-dimensional analytical queries. (The source data can come from either TiDB Cloud or a self-deployed TiDB cluster.) Airbyte can also synchronize data from TiDB Cloud to Snowflake periodically and give you a macro-progressive data view. 

In this tutorial, we will walk you through how to use Airbyte to connect TiDB Cloud with Snowflake, and how to make data migration from the former to the latter.

Get started with TiDB Cloud Dev Tier

  1. Sign up for a TiDB Cloud account free of charge and log in to it. If you already have a TiDB Cloud account, you can also log in to your account directly.
  2. On the Active Clusters page, click Create Cluster. Then, in the Developer Tier box, select 1 year Free Trial.
    Note: You can also select the paid TiDB Dedicated Tier to get better performance. image
  3. Set up your cluster name, cloud provider (only AWS is available for the Dev Tier), and region. (A nearby region is recommended.) Then, click Create to create your cluster.
    Note: It usually takes 1 to 3 minutes to create your TiDB Cloud cluster.
  4. After you create a cluster, you will be directed to the Active Clusters page. Click the cluster name you just created and go to the cluster control panel. image
  5. Click Connect to create a traffic filter. image
  6. On the pop-up window, set 0.0.0.0/0as the IP Address to allow access from any other IP address. image

Import sample data to TiDB Cloud

After you create a TiDB Cloud cluster, you can migrate data to TiDB Cloud. Here, we use Capital Bikeshare’s system dataset as the sample data to walk you through how to import data to TiDB Cloud.

  1. Click Import and you will be directed to the Data Import Task page. image image
  2. Fill the following parameters in the fields on the Data Import Task page.
    • Data Source Type: AWS S3
    • Bucket URL: s3://tidbcloud-samples/data-ingestion/
    • Data Format: TiDB Dumpling
    • Role-ARN: arn:aws:iam::385595570414:role/import-sample-access
    • Target Cluster Username and password: Same as that of your TiDB cluster.
  3. After you fill in all the fields, click Import. Now, the sample data can be imported to TiDB Cloud. It usually takes around 3 minutes to finish the import process.

Set up Airbyte-specific entities in Snowflake

To write data into Snowflake, you need to create Airbyte-specific Snowflake entities, including a warehouse, a database, a schema, a user, and a role, with the OWNERSHIP permission.

  1. Sign up for a Snowflake account first, which allows you a 30-day free trial. If you already have one, you can sign in to your account directly.
  2. After you sign in to your Snowflake account, click Worksheets to create a new worksheet. image
  3. Change the password, aribyte_role, airbyte_username, airbyte_warehouse, airbyte_database, and airbyte_schema in the following script. Here, we use the default parameter as an example. Then, copy the whole script and paste it into the Worksheets page.
    -- set variables (these need to be uppercase)
    set airbyte_role = 'AIRBYTE_ROLE';
    set airbyte_username = 'AIRBYTE_USER';
    set airbyte_warehouse = 'AIRBYTE_WAREHOUSE';
    set airbyte_database = 'AIRBYTE_DATABASE';
    set airbyte_schema = 'AIRBYTE_SCHEMA';
    -- set user password
    set airbyte_password = 'password';
    begin;
    -- create Airbyte role
    use role securityadmin;
    create role if not exists identifier($airbyte_role);
    grant role identifier($airbyte_role) to role SYSADMIN;
    -- create Airbyte user
    create user if not exists identifier($airbyte_username)
    password = $airbyte_password
    default_role = $airbyte_role
    default_warehouse = $airbyte_warehouse;
    grant role identifier($airbyte_role) to user identifier($airbyte_username);
    -- change role to sysadmin for warehouse / database steps
    use role sysadmin;
    -- create Airbyte warehouse
    create warehouse if not exists identifier($airbyte_warehouse)
    warehouse_size = xsmall
    warehouse_type = standard
    auto_suspend = 60
    auto_resume = true
    initially_suspended = true;
    -- create Airbyte database
    create database if not exists identifier($airbyte_database);
    -- grant Airbyte warehouse access
    grant USAGE
    on warehouse identifier($airbyte_warehouse)
    to role identifier($airbyte_role);
    -- grant Airbyte database access
    grant OWNERSHIP
    on database identifier($airUse existing destinationbyte_database)
    to role identifier($airbyte_role);
    commit;
    begin;
    USE DATABASE identifier($airbyte_database);
    -- create schema for Airbyte data
    CREATE SCHEMA IF NOT EXISTS identifier($airbyte_schema);
    commit;
    begin;
    -- grant Airbyte schema access
    grant OWNERSHIP
    on schema identifier($airbyte_schema)
    to role identifier($airbyte_role);
    commit;       
    

image 4. Click Query. image 5. Go back to the Airbyte home page and click Database. You’ll see that a worksheet named AIRBYTE_DATABASE has been created with an ACCOUNTADMIN and AIRBYTE_ROLE. image
Note: Make a note of these parameters. You will be entering some of them again later when you create a Snowflake destination.You can also learn more at the Airbyte documentation.

Deploy Airbyte in your own workspace

You only need to take a few steps to make local deployment of Airbyte.

  1. Install Docker, an application container, onto your workspace.
  2. Clone the Airbyte source code shown below to your Docker.
    $ git clone [https://github.com/airbytehq/airbyte.git](https://github.com/airbytehq/airbyte.git)
    $ cd airbyte
    
  3. Run the Docker images with docker-compose. When you see the Airbyte banner, the UI is ready to go at http://localhost:8000.
    $ docker-compose up
    

Create a TiDB Cloud Source in Airbyte

After you deploy Airbyte, you can set up a new source connector for TiDB Cloud.

  1. Click Sources on your Docker panel, and select TiDB as your source type.
  2. Fill in all the fields including the Name, Host, Port, Database, Username, and Password on the panel. image
  3. In the JDBC URL Params field, fill in either enabledTLSProtocols=TLSv1.2 or enabledTLSProtocols=TLSv1.3 to set your TLS protocols to TLSv1.2 or TLSv1.3.
  4. Enable SSL Connection.
    Note: If you want to disable the TLS connection with TiDB Cloud, you need to change the previous parameter in the JDBC URL Params field to useSSL=false, and disable SSL Connection.
  5. Click Set up source to create the TiDB Cloud source.

Create a Snowflake destination in Airbyte

  1. Click Destinations on your Docker panel and select Snowflake as the destination type.
  2. Fill in all the fields including the Name, Host, Role, Warehouse, Database, Default Schema, and Username. These are parameters in the Set up Airbyte-specific entities in Snowflake section.
  3. In the Authorization Method drop-down box, select Username and Password to associate the Password with the Username. image
  4. In the Loading Method drop-down box, select [Recommended] Internal Staging.
  5. Click Set up destination to create the Snowflake destination.

Build the connection and transform data from TiDB Cloud to Snowflake

After you set up the TiDB Cloud source and Snowflake destination, you can build the connection between the two.

  1. Select Destinations on your Docker panel and click New Connection.
  2. Click Create a source and select Select an existing source. Then, in the drop-down box, choose TiDB which you previously created and click Use existing source. image
  3. Similarly, select Snowflake we created previously as the destination, and click Use existing destination. image
  4. After you select the source and destination, go to the Set up connection panel. Create a name to help you identify this connection. Here, we use TiDB-Snowflake as the connection name. image
  5. In the Replication frequency field, select Manual. This means the data will be migrated manually.
  6. In the Destination Namespace field, select Mirror source structure to make the data stored in the same location in the destination as in the source.
  7. Select and activate the schema you want to migrate from TiDB Cloud to Snowflake. Here, as an example, we select the trips and type_count tables.
    Notes:
    • TiDB source connector supports two types of synchronization: Incremental and Full Refresh. In the Incremental sync mode, Airbyte only reads records added to the source since the last sync task. (In the first sync task, the Incremental sync mode is equivalent to the Full Refresh sync mode.) In the Full Refresh sync mode, Airbyte reads all the records in the source and migrates them to the destination during every sync task.
    • You can set a sync mode for each table seperately in the Namespace section in Airbyte.
  8. In the Normalization & Transformation section, select Normalized tabular data. image
  9. Click Set up connection to start the data synchronization task. It usually takes a few minutes to complete this process. Congratulations! After you finish all the steps above, you have successfully migrated the data from TiDB Cloud to Snowflake, and stored the data in the AIRBYTE_DATABASE.BIKESHARE schema. Now, you can make analytical queries in Snowflake. image

Summary

In this tutorial, you’ve learned how to configure TiDB Cloud, Snowflake, and Airbyte, and how to use Airbyte to connect the other two. You also learned how to migrate data from TiDB Cloud and Snowflake for multi-dimensional analytics. TiDB source connector is already available in Airbyte local deployment, and soon we will release the TiDB destination connector to support data migration from other sources.

If you encounter any problem during the configuration or migration, you can contact us through Twitter, our Slack channel, or you can join TiDB Internals to share your thoughts and feedback with us.

Keep reading:
Analytics on TiDB Cloud with Databricks
How to Achieve High-Performance Data Ingestion to TiDB in Apache Flink
Data Transformation on TiDB Made Easier

Qiang Wu

About the Author

Qiang Wu

More From Qiang Wu

Subscribe to Stay Informed!

TiDB Cloud logo-white

TiDB Cloud

Get the massive scale and resiliency of TiDB databases in a fully managed cloud service

TiDB logo-white

TiDB

TiDB is effortlessly scalable, open, and trusted to meet the real-time needs of the digital enterprise