How to migrate varbinary UTF-8 data to varchar, going from MySQL to MSSQL

This post talks about converting the Bitnami MediaWiki v1.34 on Ubuntu from MySQL v8 database to Microsoft SQL Server 2017.  The solution probably applies to any conversion of varbinary to varchar data from MySQL v8 to MSSQL 2014, 2016 and 2017.



The challenge is that these fields, TEXT.old_text and TEXT.old_flags, contain UTF-8 data typed as varbinary.  This is the text of your page content.  It is pretty important. 

When you bring these fields into MSSQL, yes, the text exists in an array of bytes, but it is not human readable.  Evidence: when I did this, 

select top 3 * from text

the old_text was just 0x.... and a long stream of hex codes.  

NB: Some SQL viewers such as HeidiSQL will do an automatic conversion which can make it seem like everything is fine. This is an illusion that bites when you try to use the data elsewhere.

While you can use MSSQL CAST to make the text human-readable, 

SELECT TOP (10) 
     [old_id],
     cast([old_text] as varchar(max)),
     cast([old_flags] as varchar(10))
 FROM text 

that is not something you would want to be doing every single time you worked with the data.

We had used a product from SpectralCore called FullConvert for other conversion projects in the past.  We tried it on this MediaWiki project and initially had trouble.  Fortunately their support department let us upload a sample and they quickly added a feature to support automatic conversion to UTF-8. 

It is necessary to manually override the target datatype to varchar(max) for the TEXT.old_text field in order to trigger the new feature. That is very quick, and then the FullConvert project can be saved and re-used as needed.

Note the free trial version has limitations, so only some data is converted. 

We also had success doing the data migration using the command line version of SQLines, but only on a system that happened to have the correct combination of drivers and DLL files for both MySQL and MSSQL.  The FullConvert solution is standalone and much easier to operate, and has great support.  It seems ( right now, March 2020 ) that there is no more active development on the SQLines project on github.


- from the technical team at HREF Tools Corp.    If you need an experienced, compassionate hosting partner, give us a yell.




Comments

Popular Posts