Dictionary.com offers a very clear definition of diacritics:
“Also called diacritical mark. a mark, point, or sign added or attached to a letter or character to distinguish it from another of similar form, to give it a particular phonetic value, to indicate stress, etc., as a cedilla, tilde, circumflex, or macron.”
Deciding whether to include diacritics in your database is not as straightforward. Depending on your system and the character encoding you use to handle data, diacritics can be a non-issue or a complete nightmare. Proper handling of diacritics does not stop at the database, either. Some printers are not equipped to handle diacritics, so your perfectly maintained data can still be corrupted at the moment it is printed.
Some postal administrations, such as La Poste (French Post), removed diacritics from their postal data sets over a decade ago, and this trend continues. Colombia Post’s instructions state that address data should be written “sin tildes”, meaning without tildes, where tildes are the accents above letters.
Removing diacritics from data simplifies OCR scanning, so it is understandable that some postal administrations have chosen to remove them from their data sets.
Here are three things you should know about diacritics.
- People will enter data into web forms using their native keyboard and spelling. If you have global contacts or prospects, be prepared to handle diacritics.
- Personal names are especially tricky. Removing diacritics from personal names poses a unique spelling problem. For example, the family name Müller contains a ü. If your system cannot properly handle diacritics, you can replace the ü with ue, which is a widely accepted convention. However, replacing the ü with just the letter u would create a spelling error.
- Think before you act. Serious thought should be given to the preservation of diacritics, especially in personal names. A misspelled personal name risks insulting the recipient.
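The Müller example can be sketched in a few lines of Python. This is a minimal illustration, not a complete transliteration scheme: the `GERMAN_EXPANSIONS` table and both function names are hypothetical, the mapping covers only German conventions, and other languages have their own rules.

```python
import unicodedata

# Hypothetical, German-only mapping used for illustration.
GERMAN_EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def strip_diacritics(text: str) -> str:
    """Naive removal: decompose, then drop combining marks (ü becomes u)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def transliterate_german(text: str) -> str:
    """Expand German umlauts first (ü becomes ue), then strip what is left."""
    # Compose first so that "u" + combining diaeresis matches the table too.
    composed = unicodedata.normalize("NFC", text)
    expanded = "".join(GERMAN_EXPANSIONS.get(ch, ch) for ch in composed)
    return strip_diacritics(expanded)


print(strip_diacritics("Müller"))      # Muller  (a spelling error)
print(transliterate_german("Müller"))  # Mueller (the accepted substitute)
```

Applying the German mapping to every record would be wrong for non-German names, so in practice the choice of mapping would have to depend on what you know about the origin of each name.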
If you choose to maintain diacritics in your database, follow your data through the various paths it will travel. I have handled a great deal of data that has been passed from one business partner to another and it is easy to spot when someone in the chain is unfamiliar with the proper handling of data.
Here is one example.
Data is meticulously maintained in the company's CRM system. The company decides to send out a promotional mailer.
The data is sent to the partner advertising agency, which opens the file in Excel without taking the proper steps to preserve the original data. The data suffers its first round of corruption.
The data is forwarded to the mail house, which imports the corrupted file into its system.
The data is printed on the mail piece.
The end result is that many records in the mailing have corrupted characters. In some records the corruption may be severe enough to destroy the delivery address.
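The kind of corruption described in this example can be reproduced in Python. The sketch below assumes the file was saved as UTF-8 and then read as Windows-1252, which is one common cause but not the only one. The helper name `to_excel_friendly_csv_bytes` is hypothetical; it writes the CSV with a byte order mark (`utf-8-sig`), which helps Excel recognise the file as UTF-8 when it is opened directly.

```python
import csv
import io

original = "Müller, Bogotá"

# The file is saved as UTF-8 but opened as if it were Windows-1252.
corrupted = original.encode("utf-8").decode("cp1252")
print(corrupted)  # MÃ¼ller, BogotÃ¡

# If the damage is caught before anyone re-saves the file, reversing the
# mistake may restore the text. This does not work for every character.
repaired = corrupted.encode("cp1252").decode("utf-8")
print(repaired == original)  # True


def to_excel_friendly_csv_bytes(rows):
    """Serialise rows as CSV bytes in UTF-8 with a byte order mark."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode("utf-8-sig")


payload = to_excel_friendly_csv_bytes([["name", "city"], ["Müller", "Bogotá"]])
```

Once a corrupted file has been re-saved and passed further down the chain, the damage can compound and may no longer be reversible, which is why it is worth agreeing on an encoding with every partner before the data leaves your system.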