Malformed vs. Invalid

It’s a small nuisance, but I often see error messages using the words “malformed” and “invalid” interchangeably. These words actually have different meanings. The distinction can become important when an issue is being debugged.

Malformed

If a payload is “malformed”, it is not syntactically valid: some syntax error keeps it from being well-formed, so trying to parse it will result in an error. If the payload happens to be in a binary format, it could equally be said that it cannot be deserialized.

For example, the following JSON is malformed for several reasons:

{
  "key": "value,
  "key2:" value",
  "list": [
}
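As a rough sketch of what “malformed” looks like from a program's point of view, the snippet below feeds that payload to Python's standard `json` module. The helper name `is_well_formed` is made up for this illustration; the only library behaviour relied on is that `json.loads` raises `json.JSONDecodeError` when the text cannot be parsed.

```python
import json

# The malformed payload from above, kept exactly as written.
MALFORMED = '''{
  "key": "value,
  "key2:" value",
  "list": [
}'''


def is_well_formed(text: str) -> bool:
    """Return True if text parses as JSON, False if it is malformed.

    Hypothetical helper: it only answers the syntax question and says
    nothing about whether the parsed payload is acceptable.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_well_formed(MALFORMED))           # False
print(is_well_formed('{"key": "value"}'))  # True
```

Note that the parser stops at the first problem it meets, so a single error message will not list every reason the payload is malformed.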

Invalid

If a payload is “invalid”, it has failed validation. The payload could be parsed and is therefore well-formed, but it does not meet certain validation constraints. Since a program must be able to parse a payload in order to validate it, all invalid payloads are also well-formed.[1]

The fundamental difference is that these constraints determine whether a program is willing to accept a payload, not whether it can understand it.

For example, age is not allowed to be negative in the following payload:

{
  "age": -12
}

The JSON communicates that the age key maps to the number -12 but, knowing that ages must be non-negative, a program may choose to reject it.
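One way to keep the two failure modes apart in code is to give each its own exception. The sketch below is only an illustration: `InvalidPayload` and `parse_age` are hypothetical names, and the rule that age must be a non-negative integer is the assumption taken from the example above.

```python
import json


class InvalidPayload(Exception):
    """Raised when a well-formed payload fails a validation rule (hypothetical)."""


def parse_age(text: str) -> int:
    # Step 1: syntax. json.loads raises json.JSONDecodeError if malformed.
    payload = json.loads(text)

    # Step 2: validation. Only reachable once the payload has been parsed.
    age = payload.get("age") if isinstance(payload, dict) else None
    if not isinstance(age, int) or isinstance(age, bool):
        raise InvalidPayload("age must be an integer")
    if age < 0:
        raise InvalidPayload("age must be non-negative")
    return age


for text in ('{"age": 30}', '{"age": -12}', '{"age": '):
    try:
        print("accepted:", parse_age(text))
    except json.JSONDecodeError as err:
        print("malformed:", err)
    except InvalidPayload as err:
        print("invalid:", err)
```

With this split, an error message can say “malformed” only when the parser failed and “invalid” only when a rule rejected a payload that parsed.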

Edge Cases

What about this payload, where count is expected to be an integer?

{
  "count": "123"
}

This case may seem ambiguous, but even though an integer is expected, the payload is syntactically valid.[2] The program reading this payload can choose whether it is invalid or not. JSON syntax places no restriction on which type of value a key maps to, so there are no problems with well-formedness.

The previous payload is distinct from this one below where the value of count is a number.

{
  "count": 123
}

It’s worth noting that many JSON libraries can automatically convert strings to integers when possible, but this only changes what a program is willing to accept, not the definition of JSON syntax.[3]
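To make the choice concrete, here is a small sketch using Python's standard `json` module, which does not convert the string on its own: `"123"` comes back as a `str` and `123` as an `int`. The function `read_count` and its `lenient` flag are hypothetical names for this example; whether to coerce is exactly the acceptance decision described above.

```python
import json


def read_count(text: str, lenient: bool = False) -> int:
    """Read the "count" field, optionally accepting a numeric string.

    Hypothetical helper: strict mode rejects the string form as invalid,
    lenient mode converts it with int().
    """
    value = json.loads(text)["count"]  # both payloads parse without error
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if lenient and isinstance(value, str):
        return int(value)  # raises ValueError if the string is not an integer
    raise ValueError(f"count must be an integer, got {type(value).__name__}")


print(read_count('{"count": 123}'))                  # 123
print(read_count('{"count": "123"}', lenient=True))  # 123
```

In strict mode the second payload raises `ValueError`, which is a validation failure and not a parsing failure: `json.loads` accepted both documents.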

Not all formats have the same approach to mismatched types. For example, in a protobuf message the types are specified by the schema, which makes a mismatch a matter of well-formedness rather than validity.

message Payload {
  int32 count = 1;
}

What about this payload with duplicated keys?

{
  "key": "value1",
  "key": "value2"
}

Again, it is syntactically valid, but this time it is also ambiguous. I would argue that the most reasonable thing to do is to reject the payload as invalid due to the ambiguity. That said, there are many cases where the payload must be interpreted anyway and some reasonable choice must be made.
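Both options can be sketched with Python's standard `json` module. By default `json.loads` makes the “reasonable choice” silently and keeps the last value for a repeated key; passing an `object_pairs_hook` lets a program reject the payload instead. The hook name `reject_duplicates` is made up for this example.

```python
import json

DUPLICATED = '{"key": "value1", "key": "value2"}'


def reject_duplicates(pairs):
    """Build a dict from (key, value) pairs, refusing repeated keys.

    Hypothetical hook: json.loads calls it once for every JSON object,
    passing the pairs in document order.
    """
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


# Default behaviour: the payload is accepted and the last value wins.
print(json.loads(DUPLICATED))  # {'key': 'value2'}

# Stricter behaviour: treat the ambiguity as a validation failure.
try:
    json.loads(DUPLICATED, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print("invalid:", err)
```

Either way the document parses, so the rejection in the second case is a matter of validity, not well-formedness.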

  1. It’s possible (and performant) to write a parser that validates fields in a streaming fashion as they are parsed, rather than all at once after parsing is complete. Such a parser could reject a document as invalid even though the document turns out to be malformed further along. For example:

    {
      "age": -12,
      "list": [
    }
    

    In this case, either message is useful, since both issues must be solved eventually.

  2. https://www.json.org/json-en.html

    https://stackoverflow.com/questions/15368231/can-json-numbers-be-quoted 

  3. There are situations where a number must be encoded as a string, but again, this is beyond the scope of syntax.