If you have been learning about ENS and the underlying technology, you have most likely come across the term “normalization”, or seen ENS domains that have an error message about a “normalization” conflict.
Many within the ENS community ask what normalization means in terms of different ENS domains.
Normalization will allow some currently invalid ENS domains to become valid, provided the ENS name falls within the approved normalization criteria. Once a name becomes valid, it will have the full utility of the ENS protocol.
In this technical article, we will explain what normalization is, its importance, and why it’s silently flying under the radar, to create clarity and utility for decentralized and human-readable Web3 naming.
What is ENS Normalization
The term normalization also appears in database design, where it means structuring tables to reduce redundancy and keep data consistent. ENS normalization is a different process: it applies to the characters in a name, not to database tables.
ENS (Ethereum Name Service) normalization refers to the process of ensuring that domain names in the ENS system are standardized and consistent in format.
In ENS, the same visible character can be encoded by several technically different characters. Normalization determines which of those versions is the one actually used in the ENS domain name.
This matters because it lets users rely on a single standard form of each character when using easy-to-remember ENS names for the Web3 addresses they are looking for, rather than having to remember long and complex hexadecimal strings. Additionally, ENS normalization can help prevent phishing and other types of fraud by making it easier for users and software to detect attacks and to verify that users are interacting with the intended address.
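As a small illustration of "different characters for the same visible character", the sketch below uses Python's standard `unicodedata` module to compare two encodings of the same accented letter. The sample strings and the helper name `to_nfc` are chosen for illustration only. NFC is one building block of Unicode text handling; this is not the full ENS normalization algorithm.

```python
import unicodedata

# Two different code point sequences that render as the same word.
precomposed = "caf\u00e9"   # ends with U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # ends with "e" + U+0301 COMBINING ACUTE ACCENT

def to_nfc(label: str) -> str:
    """Return the NFC (canonical composition) form of a label."""
    return unicodedata.normalize("NFC", label)

print(len(precomposed), len(decomposed))          # 4 5
print(precomposed == decomposed)                  # False
print(to_nfc(precomposed) == to_nfc(decomposed))  # True
```

Without a shared canonical form, the two strings would be treated as different names even though they look identical on screen.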
Key Benefits for Normalization
- The “Beautifier” allows emojis to be restored to their actual appearance.
- Decreases the chances of spoofs or scams.
- Prevents “combining marks” from being stacked.
- Disallows “dash-like marks” that aren’t actual hyphens.
- Eliminates whole-script confusables, where the entire label has to be inspected rather than individual characters.
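The "stacked combining marks" item can be sketched with the standard `unicodedata` module. The function names and the limit `MAX_MARKS_PER_BASE` are assumptions made for this example; the actual rule is whatever the ENS normalization criteria define.

```python
import unicodedata

# Hypothetical limit for illustration; not taken from the ENS specification.
MAX_MARKS_PER_BASE = 1

def longest_mark_run(label: str) -> int:
    """Length of the longest run of consecutive combining marks in a label."""
    longest = run = 0
    for ch in label:
        # Mn = nonspacing mark, Me = enclosing mark
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest

def has_stacked_marks(label: str) -> bool:
    """True if any base character carries more marks than the limit."""
    return longest_mark_run(label) > MAX_MARKS_PER_BASE

print(has_stacked_marks("e\u0301"))        # False: one accent on "e"
print(has_stacked_marks("e\u0301\u0308"))  # True: two marks stacked on "e"
```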
What Does ENS Normalization Solve?
There’s a group of characters that have been verbally approved for normalization.
- $ Cashtag
- _ Underscore
- ’ Apostrophe
- – Negative
The ENS Ethmoji community is excited about how beneficial normalization will be for the emoji keycaps. Currently, the keycaps are jumbled when displayed on desktops. Removing FE0F (U+FE0F, Variation Selector-16, an invisible code point that requests emoji presentation) from keycaps allows them to become “beautified”.
The normalization update will also resolve discrepancies with hidden characters and combining marks. There are many moving parts to launching this update: deciding which characters and marks to allow, whether to whitelist certain names, how beautification should work, and more. Given the complexity of configuring normalization, and with many of the developers working on it part-time, launching something of this caliber requires patience.
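To make the keycap issue concrete, the sketch below shows the code points in a keycap emoji with and without the invisible U+FE0F selector. The helper names are made up for this example, and how the normalized and beautified forms treat the selector is defined by the ENSIP, not by this snippet.

```python
VS16 = "\uFE0F"    # U+FE0F VARIATION SELECTOR-16 (invisible)
KEYCAP = "\u20E3"  # U+20E3 COMBINING ENCLOSING KEYCAP

def code_points(text: str) -> list[str]:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

def strip_vs16(text: str) -> str:
    """Remove every U+FE0F from the string."""
    return text.replace(VS16, "")

with_selector = "1" + VS16 + KEYCAP
without_selector = strip_vs16(with_selector)

print(code_points(with_selector))         # ['U+0031', 'U+FE0F', 'U+20E3']
print(code_points(without_selector))      # ['U+0031', 'U+20E3']
print(with_selector == without_selector)  # False
```

Both strings can render as the same keycap, yet they are different sequences, which is why a single agreed form is needed.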
Who is Responsible for the ENS Normalization?
Delivering the normalization function takes a dedicated team. Many would agree that Raffy.eth is the prominent developer leading the way in the evolution of normalization.
Raffy has stated he’s been working on the normalization project on and off for over a year. He does this part-time and views it as a hobby. He’s best known in the Ethmoji community for the utility and enthusiasm he brings to it. Raffy has developed many tools for ENS that help define what currently will or won’t be normalized.
He’s also written Punycode and NFC contracts. Punycode is a representation of Unicode with a limited ASCII character subset used for internet hostnames.
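For readers who have not seen Punycode before, here is a minimal sketch using the `punycode` codec built into Python. It only illustrates the encoding itself and is unrelated to Raffy's contracts; the function names and the sample label are chosen for this example.

```python
def to_punycode(label: str) -> str:
    """Encode a Unicode label as an ASCII-only Punycode string."""
    return label.encode("punycode").decode("ascii")

def from_punycode(encoded: str) -> str:
    """Decode a Punycode string back to the Unicode label."""
    return encoded.encode("ascii").decode("punycode")

label = "b\u00fccher"  # "bücher"
encoded = to_punycode(label)

print(encoded)                          # bcher-kva
print("xn--" + encoded)                 # xn--bcher-kva, the form used in hostnames
print(from_punycode(encoded) == label)  # True
```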
Once normalization is fully launched, Raffy would like to help people integrate his library where he can.
- Raffy.eth: ENS Resolver
- Raffy.eth: Keccak Hasher
- Raffy.eth: Punycode coder
- Raffy.eth: ENS Emoji Frequency
Raffy is one of the main communicators about normalization on the ENS DAO forum. He consistently provides updates while also seeking feedback from the community.
There are many ways to type the same character in different formats. For example, there are numerous ways to type a zero, which can make it confusing to tell which ENS name is the valid one. Some names contain invisible characters that can’t be seen by the naked eye. Normalizing characters will better safeguard people who are buying, selling, and minting ENS names.
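The two problems described above, invisible characters and multiple zeros, can be sketched with Unicode general categories from the standard `unicodedata` module. The function names are made up for this example, and a real implementation cannot simply reject every format character, since some (such as the zero-width joiner) are legitimately used inside emoji sequences.

```python
import unicodedata

ASCII_DIGITS = set("0123456789")

def invisible_chars(label: str) -> list[str]:
    """Code points in category Cf (format), e.g. zero-width space or joiner."""
    return [f"U+{ord(ch):04X}" for ch in label
            if unicodedata.category(ch) == "Cf"]

def non_ascii_digits(label: str) -> list[tuple[str, int]]:
    """Decimal digits (category Nd) outside ASCII, with their numeric value."""
    return [(f"U+{ord(ch):04X}", unicodedata.decimal(ch)) for ch in label
            if unicodedata.category(ch) == "Nd" and ch not in ASCII_DIGITS]

# "vitalik" with a zero-width space hidden in the middle
print(invisible_chars("vita\u200blik"))   # ['U+200B']
# ASCII 1, then a fullwidth zero and an Arabic-Indic zero
print(non_ascii_digits("1\uff10\u0660"))  # [('U+FF10', 0), ('U+0660', 0)]
```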
Many are familiar with hyphens and how they’re used in many scenarios. The bulk of the different combining marks isn’t normalized, and this gap is used to create alternative names with additional marks, some of them unnoticeable. During his Town Hall Q3 presentation with ENS DAO, Raffy.eth stated that anything that is dash-like but isn’t a hyphen will be disallowed.
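A rough way to spot "dash-like but not a hyphen" characters is the Unicode general category Pd (dash punctuation). The sketch below is an approximation under that assumption, not Raffy's actual rule set; the extra set `EXTRA_DASH_LIKE` is a hand-picked example, since some look-alikes such as the minus sign fall outside category Pd.

```python
import unicodedata

HYPHEN = "-"  # U+002D HYPHEN-MINUS, the ordinary ASCII hyphen

# Assumption: look-alikes outside category Pd, hand-picked for illustration.
EXTRA_DASH_LIKE = {"\u2212"}  # U+2212 MINUS SIGN (category Sm)

def dash_like_chars(label: str) -> list[str]:
    """Code points that look like a dash but are not the ASCII hyphen."""
    found = []
    for ch in label:
        if ch == HYPHEN:
            continue
        if unicodedata.category(ch) == "Pd" or ch in EXTRA_DASH_LIKE:
            found.append(f"U+{ord(ch):04X}")
    return found

print(dash_like_chars("my-name"))       # []
print(dash_like_chars("my\u2013name"))  # ['U+2013'] (en dash)
print(dash_like_chars("my\u2212name"))  # ['U+2212'] (minus sign)
```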
Confusables is a Python package for analyzing and matching words that look similar, such as Hello and H3llo. In other words, it deals with different fonts and characters that can look similar or identical. Configuring this properly is time-consuming and takes trial and error.
Many of the characters that can be confused are disallowed by Raffy’s rules. Raffy provided me with an example of the character “2” being confusable.
Raffy has preprocessed normalization to remove all the sequences that IDNA (Internationalizing Domain Names in Applications) already maps or disallows. Only the final output can fully determine what’s confusable.
Raffy’s method for editing confusables is to make a few changes and then rerun the derived code. Doing this reveals any flaws introduced during the editing.
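The general idea behind confusable matching can be sketched without any third-party package: map each look-alike character to a representative and compare the results. The table below is a tiny, hand-picked assumption for illustration only. It does not reproduce the API or data of the Confusables package, nor the Unicode confusables data.

```python
# Hypothetical look-alike table; real confusable data is far larger.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "3": "e",       # digit used in place of a letter, as in H3llo
}
_TABLE = str.maketrans(LOOKALIKES)

def skeleton(text: str) -> str:
    """Lowercase the text and replace each look-alike with its representative."""
    return text.lower().translate(_TABLE)

def looks_alike(a: str, b: str) -> bool:
    """True if two strings collapse to the same skeleton."""
    return skeleton(a) == skeleton(b)

print(looks_alike("Hello", "H3llo"))            # True
print(looks_alike("hello", "h\u0435ll\u043e"))  # True (Cyrillic e and o)
print(looks_alike("hello", "help"))             # False
```

Every entry in such a table is a judgment call, which is part of why configuring confusables takes trial and error.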
Finding Information on Normalization
Finding information on normalization can be challenging if you’re not already established within the ENS community. I’ve determined a few places to consume the latest information.
The ENS DAO Forum hosts great discussions on the implementation of the normalization features and on community feedback, including which characters and symbols should be approved or disapproved for normalization and why. A recent hot discussion was the potential whitelisting of already minted non-RGI emoji names.
Zombiehacker recently made a point about why whitelisting a few ENS names could be beneficial to the ecosystem. He pointed out that a 1922 penny is worth $500 because it was mistakenly produced without the “D mark”. His argument, that a few mistakes can create value and are part of human society, gives community members grounds for a valid discussion.
- Confusable emojis are an unsolved problem
- All invisible characters are addressed by Raffy’s library
- A massive amount of confusables need to be sorted out
- Raffy has rewritten the ENSIP three times
I was fortunate enough to be able to ask Raffy.eth a series of general questions about normalization and its updates.
While updating the ENSIP document, an issue was discovered with characters shared between multiple scripts. Two Unicode script adjustments are involved: allowing a restricted script access to its currency symbol, and allowing a character assigned to one script to be used universally. There are also confusable characters, which are sets of different contiguous sequences that compose into visually indistinguishable text.
Mapping certain letters (treating them identically) has been discussed, because some languages treat them as a digraph (a combination of two letters representing one sound). However, this can lead down an endless rabbit hole.
Cleaning up the Latin characters has allowed many of the unnecessary Greek and Cyrillic whole-script confusables to disappear. Raffy.eth has continued to try to remove all of the Greek characters from Latin.
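The sketch below shows why whole-script confusables are harder to catch than mixed-script ones. Python's standard library does not expose the Unicode Script property, so this example approximates a character's script by the first word of its Unicode name. That shortcut, the `KNOWN_SCRIPTS` list, and the function names are all assumptions made for illustration.

```python
import unicodedata

# Only these three scripts are recognized in this simplified sketch.
KNOWN_SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")

def scripts_in(label: str) -> set[str]:
    """Approximate the set of scripts used, via the Unicode character names."""
    scripts = set()
    for ch in label:
        first_word = unicodedata.name(ch, "").split(" ", 1)[0]
        if first_word in KNOWN_SCRIPTS:
            scripts.add(first_word)
    return scripts

def is_mixed_script(label: str) -> bool:
    """True if the label draws letters from more than one known script."""
    return len(scripts_in(label)) > 1

# Latin "r", "f", "f", "y" with a Cyrillic "а" in second position
print(sorted(scripts_in("r\u0430ffy")))        # ['CYRILLIC', 'LATIN']
print(is_mixed_script("r\u0430ffy"))           # True

# Entirely Cyrillic, yet it resembles the Latin word "pay"
print(sorted(scripts_in("\u0440\u0430\u0443")))  # ['CYRILLIC']
print(is_mixed_script("\u0440\u0430\u0443"))     # False
```

The second label passes a mixed-script check because every character comes from one script, so catching it requires comparing the whole label against look-alikes in other scripts.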
What’s clear is the importance of successfully launching normalization for ENS and the benefits that every participant will be able to access.
What isn’t clear is the timeframe of the launch and what it will include. This is a tedious task that takes many gigabrains and trial-and-error testing to complete. Once the next ENSIP update is finished, hopefully the ENS community will be given a timeline for launching normalization.