What is Unicode?

Offline kazi shahin

  • Hero Member
  • *****
  • Posts: 607
  • Fear is not real
    • View Profile
    • Personal website of kazi shahin
What is Unicode?
« on: August 22, 2010, 10:02:05 PM »
Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number to each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English, no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.

These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially servers) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption.
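
To make the conflict concrete, here is a small Python sketch. The byte value 0xE9 and the three legacy encodings are just examples I picked for illustration, and `describe` is a made-up helper name. It shows one and the same number being read as three different characters.

```python
import unicodedata

# One raw byte, as it might arrive in a file or over the network.
raw = bytes([0xE9])

def describe(encoding: str) -> tuple[str, str]:
    """Decode the byte with the given legacy encoding and return the
    resulting character together with its official Unicode name."""
    ch = raw.decode(encoding)
    return ch, unicodedata.name(ch)

# Western European, Cyrillic (Windows) and Greek single-byte encodings.
for enc in ("latin-1", "cp1251", "iso8859_7"):
    ch, name = describe(enc)
    print(f"{enc:10} -> U+{ord(ch):04X} {name}")
```

If the receiver guesses the wrong encoding, the text is silently turned into different characters, which is the corruption risk described above.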
Kazi Shahin                   
092-15-795
Department of CSE   
Cell : 01718 699 590
Blood Group: O+
Google + :  https://plus.google.com/u/0/101741817431143727344/about?hl=en
Facebook : http://www.facebook.com/kazishahin.rahman
Web : http://www.kazishahin.com/

Offline kazi shahin

Re: What is Unicode?
« Reply #1 on: August 22, 2010, 10:04:16 PM »
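In order for a computer to be able to store text and numbers that humans can understand, there needs to be a code that transforms characters into numbers. The Unicode standard defines such a code: it assigns every character a unique number, called a code point, and its character encodings specify how those numbers are stored as bytes.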
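
As a rough illustration of the character-to-number mapping, Python's built-in `ord` and `chr` convert between a character and its Unicode number. The helper name `code_point` and the sample characters are my own choices for this sketch.

```python
import unicodedata

def code_point(ch: str) -> str:
    """Format a character's Unicode number in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"

# Character -> number
print(ord("A"))          # 65
print(code_point("A"))   # U+0041

# Number -> character (0x0995 is a letter from the Bengali block)
ka = chr(0x0995)
print(code_point(ka), unicodedata.name(ka))
```

The number is the same on every platform and in every program; only the way it is stored as bytes depends on the encoding.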

Offline kazi shahin

Re: What is Unicode?
« Reply #2 on: August 22, 2010, 10:13:41 PM »
ASCII, which stands for American Standard Code for Information Interchange, became the first widespread encoding scheme. However, it defines only 128 characters, which is fine for the most common English letters, digits and punctuation but is limiting for the rest of the world. Other regions naturally wanted to be able to encode their characters too. For a while, depending on where you were, a different character might be displayed for the same code, because vendors and countries extended ASCII in incompatible ways. In the end, other parts of the world began creating their own encoding schemes and things got confusing. Not only did the encoding schemes use different numbers of bytes per character, but programs also needed to figure out which encoding scheme they were meant to be using.
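
The 128-character limit is easy to see in practice. The following Python sketch uses a hypothetical helper, `fits_in_ascii`, that simply tries to encode a string as ASCII and reports whether it worked. The sample strings are my own.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text has an ASCII code (0-127)."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        # At least one character lies outside the 128 ASCII definitions.
        return False
    return True

print(fits_in_ascii("Hello, world!"))   # True
print(fits_in_ascii("caf\u00e9"))       # False: e with acute accent is not in ASCII
```

Plain English text passes, but a single accented letter is already enough to fall outside ASCII.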

It became apparent that a new character encoding scheme was needed, and the Unicode standard was created. The objective of Unicode is to unify all the different encoding schemes so that the confusion between computers can be limited as much as possible. These days the Unicode standard defines values for over 100,000 characters and is published by the Unicode Consortium. It has several character encoding forms, with UTF standing for Unicode Transformation Format:

Offline kazi shahin

Re: What is Unicode?
« Reply #3 on: August 22, 2010, 10:14:56 PM »
UTF-8: uses only one byte (8 bits) to encode ASCII characters, which covers basic English text. It uses a sequence of two to four bytes to encode all other characters. UTF-8 is widely used in email systems and on the Internet.

UTF-16: uses two bytes (16 bits) to encode the most commonly used characters. If needed, the additional characters are represented by a pair of 16-bit numbers, called a surrogate pair.

UTF-32: uses four bytes (32 bits) to encode every character. As the Unicode standard grew, it became apparent that a 16-bit number is too small to represent all the characters. UTF-32 is capable of representing every Unicode character as one number.
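
The size differences between the three forms can be checked with a short Python sketch. The four sample characters and the helper name `encoded_lengths` are my own. I use the "-le" codec variants so that Python does not prepend a byte order mark, which would otherwise inflate the UTF-16 and UTF-32 counts.

```python
# Sample characters: a Latin letter, an accented letter, the euro sign,
# and an emoji that lies outside the 16-bit range.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]

ENCODINGS = ("utf-8", "utf-16-le", "utf-32-le")

def encoded_lengths(ch: str) -> dict[str, int]:
    """Return the number of bytes ch occupies in each encoding form."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}

for ch in SAMPLES:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))
```

UTF-8 grows from one to four bytes across these samples, UTF-16 needs four bytes (two 16-bit numbers) only for the emoji, and UTF-32 always uses four bytes.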