Record of a simple process of migrating Oracle offline data to TiDB

created at 01-19-2022

Background

I have recently been supporting a project that migrates from Oracle to TiDB. To make application-side compatibility testing easier, the table structures and data of the Oracle test environment need to be synchronized to TiDB. Since the data volume is not large, the priority was convenience, so the CSV export and import approach was used.

The whole process can be divided into three steps:

  • Schema and table structure conversion
  • Source data export
  • Import into the target database

Schema and table structure conversion

As is well known, TiDB is compatible with the MySQL protocol, so Oracle's table structure definitions cannot necessarily be used in TiDB as-is. Some conversion is needed for field types, keywords, system functions, and so on. If there were only a few tables, converting them by hand would be feasible, but the Oracle user in this test owns nearly 900 tables, so manual conversion is clearly out of the question.

The tool I use here is TransferDB, which supports converting table structures from Oracle to heterogeneous databases such as MySQL/TiDB. The project homepage is https://github.com/wentaojin/transferdb.

Although this tool has not been officially released to the public, it is genuinely useful. TransferDB is part of TiDBA, a common toolset for TiDB operations and maintenance. The others include collecting statistics, parsing keys with Mok, generating SPLIT statements based on region keys, data ranges, or data estimates, viewing the region leader distribution of table data and indexes, version upgrades, comparing 3.0 and 4.0 configuration files and TiDB system variables, and so on; all very practical.

Anyone who has used Lightning will find this tool familiar: from the configuration file to running the program it works almost the same way, and the manual that comes with the project is very detailed.

It includes the following core functions: schema conversion, table structure checking, migration cost assessment, data migration (full or incremental), CSV export, and so on. Some of these are still experimental features; here I only use the core schema conversion feature.

Its configuration file offers rich parameters with very clear comments and is easy to use. For the schema conversion scenario, you only need to modify the connection information in the [source] and [target] sections. The detailed configuration list can be found here: https://github.com/wentaojin/transferdb/blob/main/conf/config.toml
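As a rough sketch only, the part that needs editing boils down to the two connection blocks. The key names below are hypothetical placeholders; the authoritative names are in the linked config.toml:

# Hypothetical sketch of the connection sections; see conf/config.toml in the
# transferdb repository for the real key names.
[source]
host = "192.168.1.1"        # Oracle host
port = 1521
username = "user"
password = "pwd"
service-name = "orcl"

[target]
host = "192.168.1.2"        # TiDB host
port = 4000                 # TiDB default port
username = "root"
password = ""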

After the configuration file is modified, the conversion can be achieved by executing the following two commands:

# This step generates a migration meta-information database in the target to store conversion rules, breakpoint information, etc., similar to the dm_meta database in DM
./transferdb --config config.toml --mode prepare
# This step performs the schema conversion and outputs SQL files
./transferdb --config config.toml --mode reverse

After successful execution, two SQL files are generated: one is reverse_${sourcedb}.sql, which contains the SQL that can be executed directly in TiDB; the other is compatibility_${sourcedb}.sql, which contains the SQL that is not compatible with TiDB, such as foreign keys and constraints. This part needs to be evaluated manually and implemented with other solutions.
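For reference, the statements that typically end up in compatibility_${sourcedb}.sql look like the following (the table and constraint names here are invented for illustration):

-- Hypothetical example of SQL that lands in compatibility_${sourcedb}.sql
ALTER TABLE orders
  ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id) REFERENCES customers (customer_id);

At the time this post was written, TiDB parses foreign key syntax but does not enforce it, so constraints like this are usually dropped and enforced on the application side instead.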

Next, import reverse_${sourcedb}.sql into TiDB. There are two commonly used methods:

  • mysql -h <host> -P <port> -u <user> < reverse.sql
  • source reverse.sql (from inside the mysql client)
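Taking the first method as an example, assuming TiDB is listening on its default port 4000 and the converted file is named reverse_orcltest.sql (host, user, and file name are placeholders to adjust):

mysql -h 127.0.0.1 -P 4000 -u root -p < reverse_orcltest.sql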

Source data export

I use sqluldr2 to export Oracle data to CSV files. It is a data export tool widely used in the Oracle world; its strengths are that it is small, light, fast, cross-platform, and supports custom SQL.

There is a lot of information on the Internet, so I won't go into details on how to use it here.

Although sqluldr2 is very powerful, it puzzlingly does not support batch export, so I had to find another way to achieve that.

I first put the list of tables to be exported into a txt file:

./sqluldr2linux64.bin user=user/pwd@192.168.1.1:1521/orcl query="select table_name from all_tables where owner='test'" file=/tmp/tidb/sqluldr_tables.sql
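The resulting file is just a plain list of table names, one per line, for example (the table names here are made up):

T_USER
T_ORDER
T_ORDER_DETAIL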

Write a batch script to export all tables:

#!/bin/bash

# Export every table listed in sqluldr_tables.sql to a CSV file named orcltest.<table>.csv
cat /tmp/tidb/sqluldr_tables.sql | while read -r line
do
    echo "$line"
    /tmp/tidb/sqluldr2linux64.bin user=user/pwd@192.168.1.1:1521/orcl query="$line" charset=UTF8 field=0x7c0x260x7c record=0x3d0x25 null=null file=/tmp/tidb/data/orcltest.${line}.csv
done
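Assuming the script is saved as export_all.sh (the file name is arbitrary), it can be run like this:

chmod +x export_all.sh
./export_all.sh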

There are a few things to note here:

  • It is recommended to use uncommon characters, preferably multi-character combinations specified as ASCII codes, for the field and record separators, to minimize the risk of the separators appearing inside the data and throwing off the column count (a sample record is shown below).
  • Set the exported file character set to UTF8 to avoid garbled Chinese characters after the data is imported into TiDB.
  • Since Lightning will be used to import the CSV files later, the file names must follow Lightning's naming convention, i.e. {dbname}.{tablename}.csv.

At this point, the data is ready.
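For reference, with the separators above, the export of a hypothetical three-column table (id, name, remark) would look like this: fields are separated by |&|, each record ends with =% instead of a newline, and NULL values appear as the literal text null:

1|&|Jack|&|hello=%2|&|Rose|&|null=%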

Import into TiDB

There are two common ways to import CSV files into TiDB: the first is Lightning and the second is Load Data. I recommend giving priority to Lightning, in terms of both operation and performance, for the following reasons:

  • Load Data can only import a single file at a time, while Lightning can import in batches.
  • Lightning is known for its performance and supports multiple backend modes; Load Data can only go through the computing layer, and memory amplification can occur.
  • If something goes wrong, Lightning can resume from a breakpoint, while Load Data requires cleaning up the data and re-importing it.

Judging by the 900 CSV files alone, Lightning is clearly the first choice.
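For comparison only, here is roughly what the Load Data route would look like for a single file (the table name and file name are hypothetical, and the separators must match what sqluldr2 produced):

LOAD DATA LOCAL INFILE '/tmp/tidb/data/orcltest.t_user.csv'
INTO TABLE orcltest.t_user
FIELDS TERMINATED BY '|&|'
LINES TERMINATED BY '=%';

Every file would need its own statement like this, which is exactly why Lightning's batch import wins here.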

Here are a few core configurations of Lightning:

[tikv-importer]
# Use the local backend
backend = "local"
# Directory for temporarily storing sorted key-value pairs; the target path must be an empty directory
sorted-kv-dir = "/tmp/tidb/lightning_dir"

[mydumper]
data-source-dir = "/tmp/tidb/data"
no-schema = true
filter = ['*.*']

[mydumper.csv]
# Field separator, supports one or more characters, the default value is ','.
separator = '|&|'
# Quote delimiter, set to empty to indicate that the string is not quoted.
delimiter = ''
# End-of-line delimiter, supports one or more characters. Set to empty (default) to indicate "\n" (line feed) and "\r\n" (carriage return + line feed), both indicating end of line.
terminator = "=%"
# Whether the CSV file contains headers.
# If header = true, the first line will be skipped.
header = false
# Whether the CSV file contains NULLs.
# If not-null = true, all columns of the CSV cannot be parsed as NULL.
not-null = false
# If not-null = false (i.e. the CSV can contain NULL),
# fields with the following value will be parsed as NULL.
null = 'null'
# Whether to escape "\" in fields
backslash-escape = true
# If a line ends with the field separator, remove that trailing separator.
trim-last-separator = false

Notice:

  • Local backend mode is recommended; it handles blob-type data more gracefully.
  • There is no need to import the table structure again, so set no-schema = true.
  • The field separator, record terminator, and NULL representation must match what was set for sqluldr2.

Finally run Lightning:

./tidb-lightning --config tidb.toml --check-requirements=false

I also found a Lightning bug in the process, which I will try to fix myself later.

Other

As for the other Oracle objects, such as stored procedures and custom functions, there is no point trying to migrate them directly; just bite the bullet and reimplement that logic in the application code.

For views, I first used PL/SQL to export the view definitions into SQL files and then imported them into TiDB, but there were many errors, essentially all due to incompatible system functions and syntax. This part involves SQL rewriting, and there is no good solution yet.

Summary

In previous projects I have also used DSG to migrate data from Oracle to TiDB, but it is a purely commercial tool after all and not a good choice for this kind of testing phase.

Of course, the official book "TiDB in Action" also gives a migration case from Oracle to TiDB: it is based on Oracle's own OGG component, which is a bit troublesome to deploy and use.
