
replace string methods with unicode variants


The default string methods do not handle unicode characters correctly (unicode.char(0x25A0):len() returns 3, for example).  Is it possible to replace the default string methods so that :len() references unicode.len()?  If not, is it possible to add methods for the unicode functions (perhaps :ulen(), :usub())?  I looked but could not find an answer to this.


5 answers to this question


  • Solution


7 hours ago, Elijahlorden said:

But getmetatable does not return the string metatable.  I am assuming this is because the string metatable is protected for some reason.

You are correct: string metatables are protected, so you can't change or add functions.

I assume unicode.len(string) is not what you want, or is it?

If you want an object you can interact with, you could build one.

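local unicode = require("unicode")
local ucode = {}

local mt = {
  -- #obj then reports UTF-8 characters instead of bytes
  __len = function(self)
    return unicode.len(self.string)
  end
}

-- wrap a plain string in an object that uses the metatable above
function ucode.new(str)
  local obj = {
    string = str
  }
  return setmetatable(obj, mt)
end

return ucode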

Sorry for the code being just plain text; I'm on my mobile and don't know how to insert code properly, so I'll make a proper one as soon as I get to my PC.

You then add whichever functions of the unicode or string lib you want to the metatable (plain methods such as :sub() need to live in a table that the metatable's __index points to), and then you are good to go.

Btw, with the __len function, you should be able to call #objname, and it should return the correct length.
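To make the "add the missing functions" step concrete, here is a minimal sketch of such a wrapper with two methods. The names ustring, methods, and box are made up for illustration, and the fallback onto the stock utf8 library (Lua 5.3+) is only an assumption so that the snippet also runs outside OpenComputers; inside OpenComputers the real unicode library is used.

```lua
-- Use OpenComputers' unicode library when present; otherwise fall back to the
-- stock Lua 5.3+ utf8 library (an assumption made for this sketch only).
local ok, unicode = pcall(require, "unicode")
if not ok then
  unicode = {
    len = utf8.len,
    sub = function(s, i, j)
      local n = utf8.len(s)
      j = j or -1
      if i < 0 then i = n + i + 1 end   -- negative indexes count from the end
      if i < 1 then i = 1 end
      if j < 0 then j = n + j + 1 end
      if j > n then j = n end
      if i > j then return "" end
      -- translate character positions into byte positions
      return s:sub(utf8.offset(s, i), utf8.offset(s, j + 1) - 1)
    end,
  }
end

local ustring = {}   -- hypothetical module table
local methods = {}   -- methods reachable as obj:name() through __index

function methods:len()
  return unicode.len(self.string)
end

function methods:sub(i, j)
  -- return a new wrapper so calls can be chained
  return ustring.new(unicode.sub(self.string, i, j))
end

local mt = {
  __index = methods,
  __len = function(self) return unicode.len(self.string) end,  -- #obj, Lua 5.2+
  __tostring = function(self) return self.string end,
}

function ustring.new(str)
  return setmetatable({ string = str }, mt)
end

-- usage: "a", U+25A0 (three bytes in UTF-8), "b"
local box = ustring.new("a\226\150\160b")
```

With this in place, box:len() and #box both count characters rather than bytes, and tostring(box:sub(2, 2)) gives back just the square. Raw strings are untouched, so you still have to wrap values explicitly.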


A combo of a wrapper and a custom program env might be able to do what you want, but you're right that it might just be a lot of overhead for a few convenience functions. It also wouldn't cover method calls on raw string values like ("yolo"):len().
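A rough sketch of the custom env idea, and of where it stops working. It assumes Lua 5.2+ for load with an environment argument; ustringlib, env, and the demo chunk are invented names, and the utf8 fallback (Lua 5.3+) is only there so the snippet runs outside OpenComputers.

```lua
-- Use OpenComputers' unicode library when present; otherwise fall back to the
-- stock Lua 5.3+ utf8 library (an assumption made for this sketch only).
local ok, unicode = pcall(require, "unicode")
if not ok then
  unicode = { len = utf8.len }
end

-- A stand-in for the string library whose len counts UTF-8 characters;
-- every other function is inherited from the real string table.
local ustringlib = setmetatable({ len = unicode.len }, { __index = string })

-- Environment for the wrapped program: same globals, different `string`.
local env = setmetatable({ string = ustringlib }, { __index = _G })

local source = [[
  local s = ...
  return string.len(s), s:len(), ("yolo"):len()
]]
local chunk = assert(load(source, "=demo", "t", env))

-- pass U+25A0, which is three bytes in UTF-8
local viaLibrary, viaMethod, literalLen = chunk("\226\150\160")
```

Here viaLibrary is 1 because string.len resolves through the env, while viaMethod is still 3: method calls on string values go through the string metatable, which the env cannot reach.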


It is better to think of these string functions as raw byte array methods, and our unicode methods as UTF-8 sequence methods.

string.len is not a count of glyphs rendered on the screen; rather, it is the number of chars (bytes) in the string. A UTF-8 sequence may take one or more chars for a single glyph.

string.sub selects a sequence of chars in a byte stream from [1, string.len].

unicode.len can be used for all UTF-8 strings to measure the number of logical UTF-8 sequences/glyphs.

unicode.wlen measures the physical width of a rendered UTF-8 string.

unicode.sub deals with positions of sequence sets or glyphs. These indexes are in the range of [1, unicode.len].
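A small illustration of the byte versus sequence distinction, using the same U+25A0 character from the question. The variable names are arbitrary, and the utf8 fallback (Lua 5.3+) is only an assumption so the snippet runs outside OpenComputers.

```lua
-- Use OpenComputers' unicode library when present; otherwise fall back to the
-- stock Lua 5.3+ utf8 library (an assumption made for this sketch only).
local ok, unicode = pcall(require, "unicode")
if not ok then
  unicode = { len = utf8.len }
end

-- "a", U+25A0 (bytes 226 150 160), "b"
local s = "a\226\150\160b"

local byteCount = string.len(s)    -- counts chars (bytes) in the byte array
local charCount = unicode.len(s)   -- counts UTF-8 sequences

-- string.sub indexes bytes, so this cut lands inside the 3-byte sequence
local cut = string.sub(s, 1, 2)
```

byteCount and charCount differ by two because the square takes three bytes, and cut ends with a lone lead byte that is no longer valid UTF-8, which is why unicode.sub is the one to use when slicing by glyph.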
