The Pug Automatic

The delightful char data

Written December 21, 2015. Tagged Elixir.

Many Elixir functions (including most of IO) accept "char data".

I love char data. It's one of those small pieces of delight in Elixir – it makes me smile, and I miss it in other languages.

I'll tell you what I love about it, but before we get to that…

What is char data?

Char data is:

  • Strings, like "a"
  • Lists of codepoints, like [97, 0x00E6] (which can also be written as [97, 230], [?a, ?æ] or 'aæ')
  • Lists of codepoints and strings like [97, "a"]
  • Lists of char data (recursively!), like ["a", [97, [["a"]]]]

You can for example pass char data to IO.puts:

IO.puts ["a", [97, [["a"]]]]
# outputs: aaa

Or if you want to turn it into a straight string:

IO.chardata_to_string ["a", [97, [["a"]]]]
# => "aaa"

The char data type comes from Erlang, where it's part of the :unicode module.

IO data

There is another quite similar concept called IO data (in Elixir) or IO lists (in Erlang). The difference is that these by default treat integers as Latin-1 bytes (allowing values 0–255) instead of as UTF-8 codepoints, so you might end up with binaries that are not UTF-8 strings:

IO.iodata_to_binary ["a", 230]
# => <<97, 230>>

IO.chardata_to_string ["a", 230]
# => "aæ"

# The codepoint 230 becomes the bytes 195 and 166.
IO.chardata_to_string(["a", 230]) <> <<0>>
# => <<97, 195, 166, 0>>

# IO data doesn't allow values above 255.
IO.iodata_to_binary ["a", 256]
# ** (ArgumentError) argument error
# :erlang.iolist_to_binary(["a", 256])

# Char data allows values above 255.
IO.chardata_to_string ["a", 256]
# => "aĀ"

So what is it for?

Char data lets you build up output strings by nesting lists, instead of constantly concatenating strings yourself. If you pass this value to a function that accepts char data, you never have to concatenate it yourself.

You can use it in all sorts of situations. Here are some I've run into.

IO.ANSI

The IO.ANSI module provides ANSI codes to format text in a terminal. Colors and such.

You can use it with char data like this:

status = [IO.ANSI.green, "ON", IO.ANSI.reset]
IO.puts ["The status is: ", status]

I use this quite extensively in the API (and implementation) of my progress_bar library.

Phoenix.HTML

Phoenix.HTML provides HTML helpers for the Phoenix web framework.

You want to pass user-provided input through its helpers to avoid security issues. And of course you can do that as char data:

import Phoenix.HTML

html_escape [
content_tag(:span, label_text, class: "my-label"),
" ",
value_text,
]

I've used this in my BEAM Toolbox project.

Strictly speaking, Phoenix.HTML works with IO lists, not char data, at the time of writing – I'm looking into why.

Code golf

While this is admittedly of limited use, char data is great if you're golfing and want to keep the character count down.

These are equivalent:

IO.puts [a,?b,c,d,e,10]
IO.puts "#{a}b#{c}#{d}#{e}\n"

(10 is the ASCII code for a newline.)

I used this for my plain Christmas tree.

I also used this for my insane blinking Christmas tree, where it was a particularly good fit together with IO.ANSI, Stream.cycle and Enum.take.

Performance

Performance hasn't been my reason for using char data over string concatenation, but that is also a factor, supposedly.

Benchmarking the performance is outside the scope of this blog post, but I'd love to hear about it if anyone wants to experiment.

As I understand it, Erlang strings (single-quoted strings in Elixir, 'foo') are particularly expensive to concatenate because they are implemented as linked lists (slow traversal and lots of garbage collection); concatenating binaries (double-quoted strings in Elixir, "foo") is cheaper but still not ideal. For more details, see the "IO Lists" section of Learn You Some Erlang for Great Good!.

Also, supposedly the eventual concatenation of an IO list (or char data?) happens in C, for speed. Thank you to Vienna BEAMers for tweeting me this link and others.

If you have more insight into the performance side of things, or more examples of the delight of char data (or IO data), please let me know in the comments or on Twitter!