Unicode Preparation

Overview

As you know, bits can be interpreted as pretty much anything! One thing we need them to represent is the characters that human languages rely on. There are several standard ways of encoding characters into bits, and we’ll be thinking about two of them in depth: ASCII and UTF-8. Today, you’ll read about the ideas behind these two so that we can think more deeply about them in class.
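
To make "characters are just bits" concrete, here is a minimal C sketch. It assumes an ASCII-compatible system (true of essentially every current platform), and the helper names `ascii_code` and `is_ascii` are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical helper: the number stored for a character.
   In C a char is just a small integer, so this is only a conversion. */
int ascii_code(char c) {
    return (unsigned char)c;
}

/* Hypothetical helper: ASCII defines only the codes 0 through 127,
   so an ASCII byte always has its top bit clear. */
bool is_ascii(unsigned char byte) {
    return (byte & 0x80) == 0;
}
```

With these definitions, `ascii_code('A')` evaluates to 65 (binary 01000001) on an ASCII-based system, and `is_ascii(0xC3)` is false because 0xC3 has its top bit set.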

Basic Learning Objectives

Before class you should be able to:

Explain ASCII at a high level
Explain the problem with ASCII and the idea of UTF-8 at a high level

Advanced Learning Objectives

After class, you should be able to:

Encode and decode ASCII characters
Encode and decode Unicode codepoints using the UTF-8 encoding
Implement UTF-8 in C
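
As a rough preview of the last objective, the sketch below shows one possible way to encode a single codepoint as UTF-8 in C. It follows the standard UTF-8 bit layout. The function name `utf8_encode`, its signature, and its return convention are assumptions made for illustration, not a required interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the UTF-8 encoding of codepoint cp into out
   (which must have room for 4 bytes) and returns the number of bytes
   written, or 0 if cp cannot be encoded. */
size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp <= 0x7F) {                    /* 0xxxxxxx: same as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {  /* surrogates are not valid codepoints to encode */
        return 0;
    }
    if (cp <= 0xFFFF) {                  /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                            /* beyond U+10FFFF, the largest codepoint */
}
```

For example, `utf8_encode(0xE9, buf)` (the codepoint of é) returns 2 and fills `buf` with the bytes 0xC3 0xA9, that is, 11000011 10101001. Decoding reverses the process: read the leading bits of the first byte to learn the length, then strip the `10` prefix from each continuation byte and concatenate the remaining bits.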

Resources

Read the following:

Checks

Submit answers to the following on Moodle (I recommend you do them by hand, with this table as reference, to practice for the exam):

Convert the codepoint 1F (hexadecimal) to its UTF-8 binary encoding.
Decode the following UTF-8 binary encoding to its codepoint: 11000110 10111010.
Have you been able to run the starter code for homework 2?