Talend Open Studio Cookbook
上QQ阅读APP看书,第一时间看更新

Creating schemas from lists

This next recipe doesn’t make use of Talend at all. Rather, it is a technique to save lots of tedious typing when creating schemas from documents and/or spreadsheets.

Getting ready

Open the MS Word document customerFieldList.docx. As you can see, there are a reasonable number of field descriptions that would take a reasonable amount of time to define individually.

How to do it...

  1. Select all the column names from the word document and paste into an Excel spreadsheet:
    How to do it...
  2. Now select all the fields, right click it, and select Copy.
  3. Go to the second worksheet and click the top-left cell.
  4. Then, right-click and select Paste Special, and select the option Transpose:
    How to do it...
  5. This will copy the previous vertical list into a horizontal list.
  6. Delete the initial worksheet and save the file as a CSV file named TransposedCustomer.csv
  7. You can then import the CSV file using the wizard for File delimited and stating that the file has a heading row.
    How to do it...
  8. Set the field separator as Comma, and tick the box Set heading row as column names.
    How to do it...
  9. Click on Next, and you should see the individual fields listed in the schema.You are now able to add the field types and lengths.
  10. If you wish, you can then copy the delimited schema to a generic schema.

How it works…

The transpose facility of the spreadsheet enables a vertical list of fields to be converted into a horizontal list. By saving this list as a CSV file, the horizontal field list can be highlighted as a heading row during an import into Talend. This automatically fills in the field names in the schema, thus avoiding the need to type in the names of the columns individually.

There’s more…

Even after importing a list using this method, you will still have to ensure that column types and lengths are populated, however, if you also add data to the CSV file prior to importing it, Talend will try to guess the type and length of each column during the import stage

It is possible to force Talend to guess correctly by adding data to the file that matches the type exactly. There are two methods that can be used:

Transpose the data

Starting with the original list, add a second column to the list, and populate it with data values for each of the fields.

When transposing the data in the spreadsheet, copy both the column of field names and the data; and transpose both list columns, so that they become a heading row and a row of data.

Edit the CSV file

The second method is to add a row of data either to excel or CSV files manually prior to importing the metadata.

What data to add? If you take care to add data that is the maximum representative size of the column, then Talend will usually guess the correct types and lengths.

For example, if the field is a ten character string, for example, then ensure that you add ten characters to the data in either the list column or the CSV file. For numbers, ensure that you use numbers to let Talend know that the field is numeric.

In the preceding example, if you only set the number fields to 99999999.999 prior to import, it will save significant time. This is easy to do in Word or Excel and can save time when defining large schemas.