• mina86.com

    Escaping markup with CDATA

    I’ve previously written about escaping special characters in HTML using character entities. This article discusses another method of achieving a similar result: CDATA sections. It explains what CDATA is, how it’s used in XML, SVG, MathML etc., when it can be included in an HTML5 file and how it impacts XHTML.

    What is CDATA?

    A CDATA section in an XML document instructs the parser to interpret the enclosed text literally, as pure character data, not as markup. It starts with the <![CDATA[ sequence and ends with the ]]> sequence. Within a CDATA section, characters which normally have special meaning (namely the less-than sign and the ampersand) are processed verbatim without the need to escape them. It provides a convenient way to embed styles and scripts without having to worry about special characters as seen in Fig. 1:
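As a quick check of the behaviour described above, here is a minimal Python sketch. It assumes the standard-library xml.etree.ElementTree parser and a made-up script fragment; it only illustrates that a CDATA section and entity-escaped text yield the same character data.

```python
import xml.etree.ElementTree as ET

# Hypothetical script body containing both special characters.
SCRIPT = "if (a < b && c) { run(); }"

# The same content written with a CDATA section and with entities.
with_cdata = "<script><![CDATA[" + SCRIPT + "]]></script>"
with_entities = "<script>if (a &lt; b &amp;&amp; c) { run(); }</script>"

text_from_cdata = ET.fromstring(with_cdata).text
text_from_entities = ET.fromstring(with_entities).text

# Without either form of escaping the document is not well-formed.
try:
    ET.fromstring("<script>" + SCRIPT + "</script>")
    unescaped_parses = True
except ET.ParseError:
    unescaped_parses = False

print(text_from_cdata == text_from_entities == SCRIPT, unescaped_parses)
```

Both escaped forms give the parser the same text; the difference is only in how convenient the source is to write and read.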

    Styling external links, yea or nay?

    Dear Reader, I have a question: should external (or outbound) links be styled differently from internal ones? For example, in a sentence like: ‘the recent events made me reconsider my earlier position on the matter,’ should the first link (to an external website) have a different style than the second (to another page on the same website)? I’d love to hear your thoughts in the comments below. I’m particularly interested in existing UX research on the subject.

    At the time of writing, this website uses an icon after the link to indicate it sends the user to a third-party website. This is inspired by Wikipedia which uses the same design. It adds information but also visual clutter and cognitive load. It doesn’t appear to be a common feature on the Internet. Hence my inquiry.


    Rather than leaving you, Dear Reader, with only a question, I’ll now present how to achieve this styling using three different CSS methods. The effect needs two elements: selecting external links and changing their appearance.

    Writing dates

    While browsing PWN’s Poradnia językowa (a language advice service), I came across Dr Adam Wolański’s opinion on writing dates; specifically on the YYYY-MM-DD form, which Dr Wolański rejected outright. Going beyond the Polish language, he further argued that ‘in correspondence with Great Britain we write the date numerically as 21/6/2009 or 21/06/2009, or 21.6.2009, and with the USA as 6/21/2009 or 06/21/2009.’ This opinion misses the heart of the problem and the point of the reader’s comment. As a rule I am in favour of following language rules, but in this case it must be clearly accepted that when corresponding with an international audience, we write dates as YYYY/MM/DD or, if preferred, YYYY-MM-DD.

    In international projects, the identity and language competence of collaborators are often unknown. It could be a Briton in New York, an American in Switzerland, a Frenchman in London, a Japanese person in San Francisco and so on. Moreover, when the notation is ambiguous, the reader may try to guess what the author meant: ‘The message came from a European, so it probably used the day/month order.’

    The primary function of language is communication. The XX/YY/YYYY notation (or XX/YY with no year given) does not serve communication. Quite the opposite; it is a source of misunderstandings and delays. For this reason it should be rejected.
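The ambiguity can be demonstrated with a minimal Python sketch. The date string below is a made-up example, and the variable names are only for illustration; the point is that one numeric string yields two different dates depending on the convention the reader assumes, while the year-first form has a single reading.

```python
from datetime import date, datetime

# Hypothetical date as it might appear in an e-mail.
ambiguous = "06/07/2009"

# The same text read with the day-first and the month-first convention.
day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()
month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()

# The year-first (ISO 8601) form round-trips without guessing.
iso_text = date(2009, 7, 6).isoformat()
iso_parsed = date.fromisoformat(iso_text)

print(day_first, month_first, iso_text)
```

The two readings are a month apart, which is exactly the kind of misunderstanding the year-first notation avoids.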

    Stop using pickle already. Seriously, stop it!

    Perusing glossy magazines, I was made aware of CVE-2024-2912, which describes how a POST request could lead to remote code execution (RCE) in BentoML servers. A feature most users would rather live without. Bugs happen and I don’t want to criticise the developers unjustly, but knowing the root of the issue was Python’s pickle module, I can only wonder: How the fuck is this still happening?

    pickle is insecure by design

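    import pickle
    # Danger: loading this payload runs `echo evil` in a shell.
    # `c` pushes os.system, `(S` and `t` build the args, `R` calls it.
    pickle.loads(b'cos\nsystem\n'
                 b'(S"echo evil"\ntR.')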
    Example of how ‘unpickling’ untrusted data leads to execution of a shell command.

    pickle serialisation uses a stack-based virtual machine with a ‘reduce’ operation which allows calling arbitrary Python functions (as shown in the example above). It’s no surprise it keeps popping up in security vulnerabilities. It’s been known for decades that using pickle invites trouble. The documentation highlights the dangers quite clearly, but that’s apparently not enough.
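To see the virtual machine’s instructions without running them, the payload can be disassembled rather than loaded. This is a minimal sketch using the standard-library pickletools module; the payload is the one from the example, joined onto a single line.

```python
import pickletools

# The payload from the example. It is only inspected here, never loaded.
payload = b'cos\nsystem\n(S"echo evil"\ntR.'

# genops() walks the opcode stream without executing any of it.
ops = [(op.name, arg) for op, arg, _pos in pickletools.genops(payload)]

for name, arg in ops:
    print(name, arg)
```

The listing shows a GLOBAL opcode naming os.system followed by REDUCE, which is the step that would call the function if the data were passed to pickle.loads.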

    Call to action

    I call upon you to stop this madness. There are easy steps you can take to make everyone safer:

    • If you see a junior developer type import pickle, mentor them and explain the module must never be used due to security holes.
    • If you see a +import pickle line during a code review, reject the patch.
    • If you write Python code yourself, use an alternative serialisation method. Some options are listed below.
    • And finally, if you’re a Python project member, deprecate pickle. Many Python features have been deprecated already, so backwards compatibility by itself is not a valid excuse. C managed to get rid of gets, I believe it’s possible to heal Python as well.
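As an illustration of the third point, here is a minimal sketch that assumes JSON is an acceptable substitute for the data at hand (it handles only plain data such as dictionaries, lists, strings and numbers). The record and the load_untrusted helper are hypothetical names made up for this example.

```python
import json

# Hypothetical record made of plain data types only.
record = {"name": "model-a", "weights": [0.1, 0.2], "epochs": 3}

encoded = json.dumps(record)
decoded = json.loads(encoded)


def load_untrusted(data: bytes):
    """Parse untrusted bytes as JSON; return None if they are not valid."""
    try:
        return json.loads(data)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return None


# The malicious pickle payload is just invalid input to a JSON parser.
rejected = load_untrusted(b'cos\nsystem\n(S"echo evil"\ntR.')

print(decoded == record, rejected)
```

Parsing JSON builds data and never calls functions named in the input, so a hostile document can at worst fail to parse or contain unexpected values.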

    Are GMT and UTC the same thing?

    No. But yes. But no. It’s complicated, let me explain.

    Greenwich Mean Time (GMT) is the local mean time in Greenwich, London. Local indicates it tracks the position of the Sun in the sky. However, because Earth’s rotation speed varies, a second according to GMT has different lengths on different days.

    Meanwhile, Coordinated Universal Time (UTC) is based on atomic clocks which guarantee constant length of a second. Unfortunately, Earth refuses to conform to human standards. To account for that, leap seconds are occasionally applied to UTC. As necessary, one second can be added (resulting in time 23:59:60) or removed (resulting in the day ending at 23:59:58) at the end of June or December.
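One practical consequence of the 23:59:60 notation is that not every time library can represent it. The following minimal sketch uses Python’s standard datetime type as one example; the is_representable helper is a hypothetical name, and the date is that of the leap second added at the end of 2016.

```python
from datetime import datetime, timezone


def is_representable(second: int) -> bool:
    """Check whether datetime accepts 2016-12-31 23:59:<second> UTC."""
    try:
        datetime(2016, 12, 31, 23, 59, second, tzinfo=timezone.utc)
        return True
    except ValueError:
        # datetime only allows seconds in the range 0..59.
        return False


ordinary_second_ok = is_representable(59)
leap_second_ok = is_representable(60)

print(ordinary_second_ok, leap_second_ok)
```

Code that needs to handle timestamps falling on a leap second therefore has to decide explicitly how to treat them.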

    While GMT and UTC use different methods for tracking time and adjusting to the Earth’s irregular rotation, they are synchronised to within 0.9 s and for everyday purposes they are the same.

    Unfortunately, things can get more convoluted. Someone may incorrectly use GMT to refer to the time zone in London, which is UTC+1 during daylight saving time. Furthermore, throughout history there were additional conflicting definitions of GMT.

    Conclusion

    All in all, it’s best to use UTC to avoid ambiguity.

    No Nick, 7-bit colour depth is not enough

    In the ‘Your Screen is Secretly 30 Years Old’ video on The Science Asylum channel, Nick Lucid argues that 7-bit colour depth is sufficient for screens, claiming that 2 million colours (vs 16 million at 8-bit colour depth) ‘would be more than enough for most people’:

    Can screens make fewer [than 16 million] colours without us noticing? The answer is absolutely yes. 16 million is an overkill. 2 million would be more than enough for most people. It’s just that controlling a screen with 16 million colours costs the same as the screen with 2 million because they use the same number of bytes.

    This article interrogates that statement. To provide a visual baseline, Fig. 1 compares two grey gradients: one uses 8 bits per component (bpc) while the other uses 7 bpc. Having fewer discrete colour levels causes the 7 bpc gradient to exhibit clearly visible banding (where, rather than a smooth transition between colours, the places where colours change can be identified), immediately undermining the premise that 7-bit colour depth is perceptually indistinguishable from higher bit depths.

    Fig. 1 Grey gradients using 8- and 7-bit colour depths. Each goes from colour #10 10 10 to #7f 7f 7f but uses different quantisation. The top gradient uses 112 distinct grey levels while the bottom one uses only 54.
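The round numbers in the quote, and the 112 grey levels of the 8 bpc gradient, can be checked with a little arithmetic. This is a minimal sketch assuming three colour components per pixel; the function names are made up for the example.

```python
def colour_count(bits_per_component: int, components: int = 3) -> int:
    """Number of distinct colours for a given depth per component."""
    return 2 ** (bits_per_component * components)


def bytes_per_pixel(bits_per_component: int, components: int = 3) -> int:
    """Whole bytes needed to store one pixel (rounded up)."""
    return (bits_per_component * components + 7) // 8


colours_8bpc = colour_count(8)  # the '16 million' from the quote
colours_7bpc = colour_count(7)  # the '2 million' from the quote

# Distinct 8-bit grey levels between #101010 and #7f7f7f inclusive.
grey_levels_8bpc = 0x7F - 0x10 + 1

print(colours_8bpc, colours_7bpc, grey_levels_8bpc)
print(bytes_per_pixel(8), bytes_per_pixel(7))
```

A pixel needs three whole bytes at either depth, which matches the remark in the quote that both variants ‘use the same number of bytes’.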