I have recently been supporting an Oracle-to-TiDB migration project. To facilitate compatibility testing on the application side, the table structures and data of the Oracle test environment had to be synchronized to TiDB. Since the data volume is not large, the most convenient approach was to export and import via CSV.
The whole process can be divided into three steps: converting the table structures, exporting the Oracle data to CSV files, and importing the CSV files into TiDB.
As is well known, TiDB is compatible with the MySQL protocol, so Oracle's table structure definitions cannot all be used in TiDB as-is. Some conversion is needed for field types, keywords, system functions, and so on. With only a few tables, converting them by hand would not be impossible, but the Oracle schema in this test has nearly 900 tables, so manual conversion is clearly out of the question.
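To make the kind of conversion concrete, here is a hypothetical illustration of an Oracle DDL and a hand-written TiDB-compatible equivalent (the table and columns are made up, and the exact type mapping a tool generates may differ):
cat <<'EOF' > /tmp/tidb/convert_example.sql
-- Oracle original:
--   CREATE TABLE t_order (
--     id        NUMBER(10) PRIMARY KEY,
--     key       VARCHAR2(200),
--     create_dt DATE DEFAULT SYSDATE
--   );
-- TiDB-compatible version: NUMBER -> BIGINT, VARCHAR2 -> VARCHAR,
-- SYSDATE -> CURRENT_TIMESTAMP, and the MySQL keyword `key` quoted.
CREATE TABLE t_order (
  id        BIGINT PRIMARY KEY,
  `key`     VARCHAR(200),
  create_dt DATETIME DEFAULT CURRENT_TIMESTAMP
);
EOF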
The tool I use here is TransferDB, which supports structural conversion from Oracle to MySQL/TiDB for heterogeneous migrations. The project homepage is https://github.com/wentaojin/transferdb.
Although this tool has not been officially released, it is genuinely useful. TransferDB is part of TiDBA, a common toolset for TiDB operation and maintenance. The other tools in the set cover collecting statistics, Mok for parsing keys, generating split statements based on region keys, data ranges, and data estimates, viewing the region leader distribution of table data and indexes, version upgrades, comparing configuration files and tidb system variables between 3.0 and 4.0, and more, so it can fairly be called very practical.
Anyone who has used Lightning will find this tool familiar: from the configuration file to running the program, the workflow is almost the same, and the operation manual that comes with the project is very detailed.
It includes the following core functions: schema conversion, table structure checking, migration cost assessment, data migration (full or incremental), CSV export, and so on. Some of these are still experimental features; here I only use its core feature, schema conversion.
Its configuration file offers rich parameters with very clear comments, and it is very easy to use. For the schema conversion scenario, you only need to modify the connection information in the [source] and [target] sections. The detailed configuration list can be found here: https://github.com/wentaojin/transferdb/blob/main/conf/config.toml
After the configuration file is modified, the conversion can be achieved by executing the following two commands:
# This step generates a migration metadata database in the target instance to store conversion rules, breakpoint information, etc., similar to the dm_meta database in DM
./transferdb --config config.toml --mode prepare
# This step performs the schema conversion and outputs the SQL files
./transferdb --config config.toml --mode reverse
After the commands complete, two SQL files are generated. One is reverse_${sourcedb}.sql, which contains the SQL that can be executed directly in TiDB; the other is compatibility_${sourcedb}.sql, which contains SQL that is not compatible with TiDB, such as foreign keys and constraints. This part needs to be evaluated manually and implemented with other solutions.
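Before deciding how to handle the incompatible part, it can help to get a rough sense of how much of it there is. A minimal check, assuming the source schema is named orcltest:
# Count lines mentioning foreign keys or constraints in the incompatible SQL file
grep -Eic 'foreign key|constraint' compatibility_orcltest.sql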
Next, import reverse_${sourcedb}.sql into TiDB. There are two commonly used methods:
mysql -h <host> -u <user> -P <port> < reverse.sql
# or, from inside the mysql client:
source reverse.sql
I used sqluldr2 to export the Oracle data to CSV files. It is a data export tool widely used in the Oracle world: small, light, fast, cross-platform, and it supports custom SQL. There is plenty of information about it online, so I will not go into detail on how to use it here.
Although sqluldr2 is very powerful, it is rather puzzling that it does not support batch export, so I had to work around it.
I first put the list of tables to be exported into a text file:
./sqluldr2linux64.bin user=user/pwd@192.168.1.1:1521/orcl query="select table_name from all_tables where owner='test'" file=/tmp/tidb/sqluldr_tables.sql
Write a batch script to export all tables:
#!/bin/bash
# Read the table list and export each table to its own CSV file
cat /tmp/tidb/sqluldr_tables.sql | while read -r line
do
    echo "$line"
    /tmp/tidb/sqluldr2linux64.bin user=user/pwd@192.168.1.1:1521/orcl query="${line}" charset=UTF8 field=0x7c0x260x7c record=0x3d0x25 null=null file=/tmp/tidb/data/orcltest.${line}.csv
done
There are a few things to note here:
- The field separator and record (line) terminator should use character combinations that are unlikely to appear in the data, so that the value count of each exported row matches the column count of the table as much as possible.
- Export with the UTF8 character set to avoid garbled Chinese characters after the data is imported into TiDB.
- Name the exported files {dbname}.{tablename}.csv so that Lightning can tell which database and table each file belongs to (a quick completeness check is sketched after this list).
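As a rough completeness check (assuming the naming convention above and that everything was exported into /tmp/tidb/data), the number of CSV files can be compared against the table list:
# The two numbers should match if every table was exported successfully
ls /tmp/tidb/data/orcltest.*.csv | wc -l
wc -l < /tmp/tidb/sqluldr_tables.sql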
At this point, the data is ready. Next, import it into TiDB. There are two common ways to import CSV files into TiDB: the first is Lightning, and the second is Load Data. I recommend giving priority to Lightning in terms of both ease of operation and performance; judging from the 900 CSV files alone, Lightning is clearly the first choice.
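For comparison, here is a rough sketch of what the Load Data route would look like (the connection details are hypothetical, and support for multi-character separators in LOAD DATA may depend on the TiDB version): one statement per CSV file, all going through the SQL layer.
#!/bin/bash
# Load each exported CSV through the SQL layer, one LOAD DATA statement per file
for f in /tmp/tidb/data/*.csv; do
    tbl=$(basename "$f" .csv | cut -d'.' -f2)   # {dbname}.{tablename}.csv -> table name
    mysql --local-infile=1 -h 192.168.1.2 -P 4000 -u root -p'xxx' orcltest -e \
      "LOAD DATA LOCAL INFILE '$f' INTO TABLE \`$tbl\` FIELDS TERMINATED BY '|&|' LINES TERMINATED BY '=%'"
done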
Here are a few core configurations of Lightning:
[tikv-importer]
# Select the local backend to use
backend = "local"
# Set the temporary storage address of sorted key-value pairs, the target path needs to be an empty directory
sorted-kv-dir = "/tmp/tidb/lightning_dir"
[mydumper]
data-source-dir = "/tmp/tidb/data"
no-schema = true
filter = ['*.*']
[mydumper.csv]
# Field separator, supports one or more characters, the default value is ','.
separator = '|&|'
# Quote delimiter, set to empty to indicate that the string is not quoted.
delimiter = ''
# End-of-line delimiter, supports one or more characters. Set to empty (default) to indicate "\n" (line feed) and "\r\n" (carriage return + line feed), both indicating end of line.
terminator = "=%"
# Whether the CSV file contains headers.
# If header = true, the first line will be skipped.
header = false
# Whether the CSV file contains NULLs.
# If not-null = true, all columns of the CSV cannot be parsed as NULL.
not-null = false
# If not-null = false (i.e. CSV can contain NULL),
# Fields with the following values will be parsed as NULL.
null = '\N'
# Whether to escape "\" in the field
backslash-escape = true
# If any line ends with a delimiter, remove the trailing delimiter.
trim-last-separator = false
Notice:
- It is recommended to use local mode, which is more friendly to blob type data processing
- No need to import table structure, so set no-schema=true
- Delimiters and line terminators must be the same as those set in sqluldr2 (a quick way to verify this is sketched below)
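To verify, the hex values passed to sqluldr2 can be decoded in the shell and compared with the Lightning settings above:
printf '\x7c\x26\x7c\n'   # field=0x7c0x260x7c  ->  |&|  (the Lightning separator)
printf '\x3d\x25\n'       # record=0x3d0x25     ->  =%   (the Lightning terminator)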
Finally run Lightning:
./tidb-lightning --config tidb.toml --check-requirements=false
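After the import finishes, a simple spot check is to compare row counts with Oracle, table by table. A minimal sketch, assuming hypothetical TiDB connection details and reusing the table list file generated earlier:
#!/bin/bash
# Print the row count of each imported table in TiDB for comparison with Oracle
while read -r tbl; do
    cnt=$(mysql -h 192.168.1.2 -P 4000 -u root -p'xxx' -N -e "select count(*) from orcltest.\`${tbl}\`")
    echo "orcltest.${tbl}: ${cnt}"
done < /tmp/tidb/sqluldr_tables.sql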
I also found a Lightning bug in the process, which I will try to fix myself later.
As for other Oracle objects such as stored procedures and custom functions, there is no point trying to migrate them as-is; be honest about it and rewrite the logic on the application side.
For views, I first used PLSQL to export the view definitions into SQL files and then imported them into TiDB, but many errors occurred, essentially all caused by incompatible system functions and syntax. This part involves SQL rewriting, and there is no good shortcut.
In previous projects I have also used DSG to migrate data from Oracle to TiDB, but it is a purely commercial tool after all, and not a good fit for this kind of testing phase.
Of course, the official book "TiDB in Action" also gives an Oracle-to-TiDB migration case: it is based on Oracle's own OGG component, which is somewhat troublesome to deploy and use.