A Gentle Introduction to JMESPath - an intuitive way to parse JSON documents

I discovered JMESPath while using the AWS CLI --query option. This post is an earnest attempt to spread the word about JMESPath expressions.
Coding
Bash Scripting
Author

Senthil Kumar

Published

January 12, 2023

A better version of this blog post is published on the Toyota Connected India Medium page (Link)

What will you see in this post?

  • A Brief Introduction to JMESPath
    • What is a json? Where is it used?
    • Is JMESPath the only way to parse json documents?
  • How to install JMESPath CLI tool jp
  • What are the common JMESPath expressions?
  • Application of JMESPath in AWS CLI
    • We will take practical, tough AWS cases and use a combination of jp commands to parse the json outputs

Brief Introduction

In this article, we are going to discuss how to leverage JMESPath expressions for extracting elements from JSON documents. The command-line tool jp and the Python module jmespath (import jmespath) are two popular interfaces to JMESPath. Besides Python, JMESPath libraries are also available for JavaScript, Ruby and Go.


In this article, we will learn JMESPath expressions through the jp tool.


What is JSON?

  • JSON (JavaScript Object Notation) is used everywhere. It is typically seen/used
    • as a data interchange format while developing
    • as logs in JSON format
    • as configurations in JSON format
    • while transferring data in cloud serverless services

What are the different value types in JSON?

  • JSON (JavaScript Object Notation) represents structured data as key-value pairs
  • Value types in JSON
    • string: "Senthil"
    • object (dictionary): {"name":"Senthil"}
    • array (list)
    • number (float or int)
    • boolean
    • null
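As a quick illustration of these types, the sketch below uses Python's standard json module to show which Python type each JSON value is decoded into. The sample document and its keys are made up for this illustration.

```python
import json

# A made-up document that uses every JSON value type once.
sample = (
    '{"name": "Senthil", "tags": ["a", "b"], "price": 899.5, '
    '"count": 3, "in_stock": true, "notes": null, "specs": {"ram_gb": 8}}'
)

parsed = json.loads(sample)

# Map each key to the name of the Python type its value was decoded into.
type_names = {key: type(value).__name__ for key, value in parsed.items()}

for key, name in type_names.items():
    print(f"{key}: {name}")
```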

Examples of Valid JSON

  • A typical dictionary type
{"name":"Senthil"}
  • A nested dictionary with list data type values
{
  "Android Phones": [
    [
      {
        "name": "Samsung Galaxy",
        "price": 899
      }
    ]
  ]
}
  • A dictionary with null value
{"name":null}

Checking that the above is valid JSON

echo "{\"name\":null}" > string_json.json 
python -c "import json; dict_list = json.load(open('string_json.json','r')); print(dict_list)"
{'name': None}
  • A JSON document containing only a list of values is also valid
[
  "iPhone",
  "Samsung Galaxy",
  "Google Pixel"
]
  • Checking that the above is valid JSON
>> cat list_json.json
[
  "iPhone",
  "Samsung Galaxy",
  "Google Pixel"
]
>> python -c "import json; dict_list = json.load(open('list_json.json','r')); print(dict_list)"
['iPhone', 'Samsung Galaxy', 'Google Pixel']
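The one-line checks above can be wrapped in a small helper. This is a minimal sketch using only the standard json module; the function name is_valid_json is a name chosen for this illustration, not part of any library.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json('{"name": null}'))                # object with a null value
print(is_valid_json('["iPhone", "Samsung Galaxy"]'))  # top-level list
print(is_valid_json("{'name': None}"))                # Python repr, not JSON
```

The last input is what Python prints for the parsed dictionary, which is why it fails: JSON requires double quotes and spells the empty value as null.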

What is JMESPath?

  • JMESPath stands for JSON Matching Expression Paths source
  • JMESPath is a query expression language for searching in JSON documents

How to install jp

  • If you use a Mac
brew install jmespath/jmespath/jp
  • Please note that I have used a macOS terminal for all the examples below

  • If you use Linux

sudo wget https://github.com/jmespath/jp/releases/latest/download/jp-linux-amd64 \
  -O /usr/local/bin/jp  && sudo chmod +x /usr/local/bin/jp
  • If you use Windows
scoop install jp

source

Common JP Expressions

1. Simple Retrieval of Keys

echo '{"field_1":30, "field_2":50}' | jp 'field_2'
50
echo '{"field":{"sub_field":30}}' | jp 'field.sub_field'
30
  • In the above examples, we have extracted a specific key; nested keys are reached with the . operator, as in field.sub_field
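To build a mental model of what a dotted expression does, here is a rough plain-Python equivalent. The helper get_path is hypothetical (written for this sketch) and covers only dotted keys, not the rest of the JMESPath grammar. It returns None for a missing key, loosely mirroring how JMESPath yields null instead of raising an error.

```python
def get_path(document, dotted_path):
    """Follow a dotted path such as 'field.sub_field' through nested dicts."""
    current = document
    for key in dotted_path.split("."):
        # Stop early when the path cannot be followed any further.
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


doc = {"field": {"sub_field": 30}}
print(get_path(doc, "field.sub_field"))
print(get_path(doc, "field.missing"))
```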

2. Slicing array or list type field

echo '{"field_1":30, "field_2":50, "field_3":[1,2,3]}' | jp 'field_3[*]'
[
  1,
  2,
  3
]
echo '{"field_1":30, "field_2":50, "field_3":[1,2,3]}' | jp 'field_3[0]'
1
echo '{"field_1":30, "field_2":50, "field_3":[1,2,3]}' | jp '[field_3[0], field_3[2]]'
[
  1,
  3
]
  • In the above examples, we have used syntax such as
    • [*] to extract all elements in an array
    • [field[index], field[another_index]] to extract specific indices of an array

3. Slicing an array of dictionaries products[{...}] to fetch one of the keys name

cat data.json
{
  "products": [
    {
      "name": "iPhone",
      "price": 999
    },
    {
      "name": "Samsung Galaxy",
      "price": 899
    },
    {
      "name": "Google Pixel",
      "price": 799
    },
    {
      "name": "OnePlus",
      "price": 699
    }
  ]
}
jp -f data.json 'products[*].name'
[
  "iPhone",
  "Samsung Galaxy",
  "Google Pixel",
  "OnePlus"
]
  • In the above example, we have used [*] to look into all values in an array and then show only one field name
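In plain Python, this projection corresponds roughly to a list comprehension. One detail worth knowing is that a JMESPath projection leaves out elements for which the projected value is null; the sketch below imitates that with a made-up fifth entry that has no name.

```python
products = [
    {"name": "iPhone", "price": 999},
    {"name": "Samsung Galaxy", "price": 899},
    {"name": "Google Pixel", "price": 799},
    {"name": "OnePlus", "price": 699},
    {"price": 599},  # made-up entry without a name, for illustration only
]

# Collect the name of every product, skipping entries where it is missing.
names = [p["name"] for p in products if p.get("name") is not None]
print(names)
```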

4. Filtering based on condition

  • Retrieve all values of a specific key name from an array products where price is greater than or equal to a specified value
jp -f data.json 'products[?price >= `799`].name'
[
  "iPhone",
  "Samsung Galaxy",
  "Google Pixel"
]
  • In the above example, we have used a condition on a field price to retrieve from an array products and then display only field name
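For comparison, the same filter can be sketched in plain Python. The condition inside [? ... ] becomes the if clause of a list comprehension, and the trailing .name becomes the value that is collected.

```python
products = [
    {"name": "iPhone", "price": 999},
    {"name": "Samsung Galaxy", "price": 899},
    {"name": "Google Pixel", "price": 799},
    {"name": "OnePlus", "price": 699},
]

# Equivalent in spirit to: products[?price >= `799`].name
names = [p["name"] for p in products if p["price"] >= 799]
print(names)
```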

5. Retrieve multiple values and make a new json

jp -f data.json '{"AndroidPhones":products[?name != `"iPhone"`].[{"android_phone_name":name, "price":price}]}' > android_phones_data.json && cat android_phones_data.json
{
  "AndroidPhones": [
    [
      {
        "android_phone_name": "Samsung Galaxy",
        "price": 899
      }
    ],
    [
      {
        "android_phone_name": "Google Pixel",
        "price": 799
      }
    ],
    [
      {
        "android_phone_name": "OnePlus",
        "price": 699
      }
    ]
  ]
}
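The extra level of nesting in the output above comes from the square brackets around the hash: .[{...}] builds a one-element list for every match. The plain-Python sketch below reproduces both shapes. The flat variant corresponds to writing .{...} without the brackets, which I would expect to give the flatter output in jp as well.

```python
products = [
    {"name": "iPhone", "price": 999},
    {"name": "Samsung Galaxy", "price": 899},
    {"name": "Google Pixel", "price": 799},
    {"name": "OnePlus", "price": 699},
]

android = [p for p in products if p["name"] != "iPhone"]

# Shape produced by .[{...}]: each match is wrapped in its own list.
nested = {
    "AndroidPhones": [
        [{"android_phone_name": p["name"], "price": p["price"]}] for p in android
    ]
}

# Shape produced by .{...}: one dictionary per match, no extra list.
flat = {
    "AndroidPhones": [
        {"android_phone_name": p["name"], "price": p["price"]} for p in android
    ]
}

print(nested["AndroidPhones"][0])
print(flat["AndroidPhones"][0])
```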

6. Pipe Expressions

A similar (and flatter) result can be built with pipes, which give a sense of modularized expressions

jp -f data.json '{"Android Phones":products[?name != `"iPhone"`]} | "Android Phones"[*]'

[
  {
    "name": "Samsung Galaxy",
    "price": 899
  },
  {
    "name": "Google Pixel",
    "price": 799
  },
  {
    "name": "OnePlus",
    "price": 699
  }
]

7. Built-in Functions

There are many built-in JMESPath functions (refer here). Let us cover some of them; the rest follow a similar template.

A. sort_by, min_by, max_by

  • Sort an array in ascending order

jp -f data.json 'products[*] | sort_by(@,&price)' 
[
  {
    "name": "OnePlus",
    "price": 699
  },
  {
    "name": "Google Pixel",
    "price": 799
  },
  {
    "name": "Samsung Galaxy",
    "price": 899
  },
  {
    "name": "iPhone",
    "price": 999
  }
]

Note:
- &key_name is an expression reference. It tells a built-in function such as sort_by which key to evaluate for each element

  • Sort an array in descending order
jp -f data.json 'products | sort_by(@,&price) | reverse(@)'

[
  {
    "name": "iPhone",
    "price": 999
  },
  {
    "name": "Samsung Galaxy",
    "price": 899
  },
  {
    "name": "Google Pixel",
    "price": 799
  },
  {
    "name": "OnePlus",
    "price": 699
  }
]

Note: - @ symbolizes the current node, i.e. the output of the previous stage of the pipe that is passed to the next stage

  • Maximum Element in an array
jp -f data.json 'products | max_by(@,&price)'

{
  "name": "iPhone",
  "price": 999
}
  • Minimum Element in an array
jp -f data.json 'products | min_by(@,&price) | name'

"OnePlus"
jp -u -f data.json 'products | min_by(@,&price) | name'

OnePlus
  • The Pipe expressions are modularized and easy to handle.
  • Note the argument -u (unquoted) to get string without quotes
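For readers who think in Python, the four functions above map closely onto sorted, reversed, max and min with a key argument. In this sketch, itemgetter("price") plays the role that &price plays in the JMESPath expressions.

```python
from operator import itemgetter

products = [
    {"name": "iPhone", "price": 999},
    {"name": "Samsung Galaxy", "price": 899},
    {"name": "Google Pixel", "price": 799},
    {"name": "OnePlus", "price": 699},
]

by_price = itemgetter("price")

ascending = sorted(products, key=by_price)   # like sort_by(@, &price)
descending = list(reversed(ascending))       # like ... | reverse(@)
priciest = max(products, key=by_price)       # like max_by(@, &price)
cheapest = min(products, key=by_price)       # like min_by(@, &price)

print([p["name"] for p in ascending])
print([p["name"] for p in descending])
print(priciest["name"], cheapest["name"])
```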

B. contains official docs

jp -u -f data.json 'products | contains([].name,`"OnePlus"`)'

true
  • contains returns true or false; the simplest example, contains('foobar','bar'), gives true
jp -u -f data.json 'products[?contains(name, `"Plus"`)]'

[
  {
    "name": "OnePlus",
    "price": 699
  }
]
  • We can use contains to match a portion of text in a variable inside an array
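Both uses of contains above have a close analogue in Python's in operator: on a list it tests for an exact element, and on a string it tests for a substring. A small sketch with the same data:

```python
products = [
    {"name": "iPhone", "price": 999},
    {"name": "Samsung Galaxy", "price": 899},
    {"name": "Google Pixel", "price": 799},
    {"name": "OnePlus", "price": 699},
]

names = [p["name"] for p in products]

# List membership: is there a product named exactly "OnePlus"?
has_oneplus = "OnePlus" in names

# Substring match: which products have "Plus" somewhere in their name?
plus_products = [p for p in products if "Plus" in p["name"]]

print(has_oneplus)
print(plus_products)
```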

C. join official docs

>> jp -f data.json 'products[*].name'

[
  "iPhone",
  "Samsung Galaxy",
  "Google Pixel",
  "OnePlus"
]

>> jp -f data.json 'join(`","`,products[*].name)'
"iPhone,Samsung Galaxy,Google Pixel,OnePlus"

>> jp -u -f data.json 'join(`","`,products[*].name)'
iPhone,Samsung Galaxy,Google Pixel,OnePlus
  • You can use the -u argument when you want the output to be displayed as plain, unquoted strings instead of valid JSON
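The difference between the quoted and unquoted output can be reproduced in Python: the default output behaves like a JSON-encoded string, while -u behaves like printing the raw string. A minimal sketch:

```python
import json

names = ["iPhone", "Samsung Galaxy", "Google Pixel", "OnePlus"]

joined = ",".join(names)

# JSON-encoded form, with the surrounding double quotes.
quoted = json.dumps(joined)
print(quoted)

# Raw string form, comparable to what -u displays.
print(joined)
```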

D. keys official docs

>> jp -f data.json 'keys(@)'

[
  "products"
]

>> jp -f data.json 'products[0] | keys(@)'

[
  "name",
  "price"
]

8. Logical AND (&&) and OR (||)

jp -f data.json 'products[?(price >  `699` && price < `999`)]'

[
  {
    "name": "Samsung Galaxy",
    "price": 899
  },
  {
    "name": "Google Pixel",
    "price": 799
  }
]
jp -f data.json 'products[?(contains(name, `"Sam"`) || price < `899`)]'

[
  {
    "name": "Samsung Galaxy",
    "price": 899
  },
  {
    "name": "Google Pixel",
    "price": 799
  },
  {
    "name": "OnePlus",
    "price": 699
  }
]

Practical AWS Cases

1. Let us analyze an example of an AWS cli output json

aws lambda list-functions --output json > aws_example.json && cat aws_example.json

{
  "Functions": [
    {
      "FunctionName": "my-function-1",
      "FunctionArn": "arn:aws:lambda:us-east-1:1234567890:function:my-function-1",
      "Runtime": "nodejs12.x",
      "MemorySize": 128,
      "Timeout": 3,
      "LastModified": "2023-06-18T10:15:00Z"
    },
    {
      "FunctionName": "my-function-2",
      "FunctionArn": "arn:aws:lambda:us-east-1:1234567890:function:my-function-2",
      "Runtime": "python3.8",
      "MemorySize": 256,
      "Timeout": 5,
      "LastModified": "2023-06-17T14:30:00Z"
    },
    {
      "FunctionName": "my-function-3",
      "FunctionArn": "arn:aws:lambda:us-east-1:1234567890:function:my-function-3",
      "Runtime": "java11",
      "MemorySize": 512,
      "Timeout": 10,
      "LastModified": "2023-06-16T09:45:00Z"
    }
  ]
}

Q1. Query all lambda functions running python

  • Based on how you want to parse the output, you can have it as a list or just the first element by accessing [0]
jp -f aws_example.json 'Functions[?starts_with(Runtime,`"python"`)]'
[
  {
    "FunctionArn": "arn:aws:lambda:us-east-1:1234567890:function:my-function-2",
    "FunctionName": "my-function-2",
    "LastModified": "2023-06-17T14:30:00Z",
    "MemorySize": 256,
    "Runtime": "python3.8",
    "Timeout": 5
  }
]

jp -f aws_example.json 'Functions[?starts_with(Runtime,`"python"`)] | [0]' 

{
  "FunctionArn": "arn:aws:lambda:us-east-1:1234567890:function:my-function-2",
  "FunctionName": "my-function-2",
  "LastModified": "2023-06-17T14:30:00Z",
  "MemorySize": 256,
  "Runtime": "python3.8",
  "Timeout": 5
}

Q2. Query all lambda functions configured with more than 128 MB of memory

jp -f aws_example.json 'Functions[?MemorySize > `128`]'

[
  {
    "FunctionArn": "arn:aws:lambda:us-east-1:1234567890:function:my-function-2",
    "FunctionName": "my-function-2",
    "LastModified": "2023-06-17T14:30:00Z",
    "MemorySize": 256,
    "Runtime": "python3.8",
    "Timeout": 5
  },
  {
    "FunctionArn": "arn:aws:lambda:us-east-1:1234567890:function:my-function-3",
    "FunctionName": "my-function-3",
    "LastModified": "2023-06-16T09:45:00Z",
    "MemorySize": 512,
    "Runtime": "java11",
    "Timeout": 10
  }
]

2. Let us analyze a more complicated example from the official JMESPath tutorial.

  • It resembles a description of the state of EC2 instances
cat official_example_for_nested.json
{
  "reservations": [
    {
      "instances": [
        {"type": "small",
         "state": {"name": "running"},
         "tags": [{"Key": "Name",
                   "Values": ["Web"]},
                  {"Key": "version",
                   "Values": ["1"]}]},
        {"type": "large",
         "state": {"name": "stopped"},
         "tags": [{"Key": "Name",
                   "Values": ["Web"]},
                  {"Key": "version",
                   "Values": ["1"]}]}
      ]
    }, {
      "instances": [
        {"type": "medium",
         "state": {"name": "terminated"},
         "tags": [{"Key": "Name",
                   "Values": ["Web"]},
                  {"Key": "version",
                   "Values": ["1"]}]},
        {"type": "xlarge",
         "state": {"name": "running"},
         "tags": [{"Key": "Name",
                   "Values": ["DB"]},
                  {"Key": "version",
                   "Values": ["1"]}]}
      ]
    }
  ]
}

Q1. Find all instances that are running and give me a count of them

jp -f official_example_for_nested.json 'reservations[].instances[?state.name == `"running"`][]'  
[
  {
    "state": {
      "name": "running"
    },
    "tags": [
      {
        "Key": "Name",
        "Values": [
          "Web"
        ]
      },
      {
        "Key": "version",
        "Values": [
          "1"
        ]
      }
    ],
    "type": "small"
  },
  {
    "state": {
      "name": "running"
    },
    "tags": [
      {
        "Key": "Name",
        "Values": [
          "DB"
        ]
      },
      {
        "Key": "version",
        "Values": [
          "1"
        ]
      }
    ],
    "type": "xlarge"
  }
]
jp -f official_example_for_nested.json 'length(reservations[].instances[?state.name == `"running"`][])'  

2
  • Two instances are running

  • Note the [] at the end, which flattens the list. A simpler example below:

echo "[[0,1],2,3,[4,5,6]]" | jp '[]'

[
  0,
  1,
  2,
  3,
  4,
  5,
  6
]
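The flatten operator merges only one level of nesting: sub-lists are spliced into the result and other values are kept as they are. A plain-Python sketch of that behaviour, with a helper name made up for this illustration:

```python
def flatten_one_level(items):
    """Splice sub-lists into the result; keep non-list values unchanged."""
    flattened = []
    for item in items:
        if isinstance(item, list):
            flattened.extend(item)
        else:
            flattened.append(item)
    return flattened


print(flatten_one_level([[0, 1], 2, 3, [4, 5, 6]]))

# Deeper nesting survives a single flatten.
print(flatten_one_level([[0, [1, 2]], 3]))
```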

Q2. Find the status of instances of type large or xlarge

jp -f official_example_for_nested.json 'reservations[].instances[?(type==`"xlarge"` || type==`"large"`)][]'

[
  {
    "state": {
      "name": "stopped"
    },
    "tags": [
      {
        "Key": "Name",
        "Values": [
          "Web"
        ]
      },
      {
        "Key": "version",
        "Values": [
          "1"
        ]
      }
    ],
    "type": "large"
  },
  {
    "state": {
      "name": "running"
    },
    "tags": [
      {
        "Key": "Name",
        "Values": [
          "DB"
        ]
      },
      {
        "Key": "version",
        "Values": [
          "1"
        ]
      }
    ],
    "type": "xlarge"
  }
]

Q3. For the instance that is used for DB (a tag whose Values is ["DB"]), give its type and state details alone

  • Well, if you happen to find a good answer in jp, let me know :)
  • I would rather use the json module from Python for more complex things
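import json

# Load the sample document; the context manager closes the file afterwards
with open('official_example_for_nested.json', 'r') as json_file:
    instances_dict = json.load(json_file)

# Walk reservations -> instances -> tags -> Values and report any instance tagged "DB"
for element in instances_dict["reservations"]:
    for instance in element["instances"]:
        for tag in instance["tags"]:
            for value in tag["Values"]:
                if value == "DB":
                    print(f'Instance Type: {instance["type"]}')
                    print(f'Instance State: {instance["state"]}')
                    print(f'Instance Tags: {instance["tags"]}')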
Instance Type: xlarge
Instance State: {'name': 'running'}
Instance Tags: [{'Key': 'Name', 'Values': ['DB']}, {'Key': 'version', 'Values': ['1']}]
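If only the type and state are wanted, the loop above can be turned into a small reusable function. This is a sketch under a few assumptions: the function name find_instances_by_name_tag is made up, the match is restricted to the tag whose Key is "Name", and a trimmed copy of the sample document is inlined so that the block runs on its own.

```python
import json

# Trimmed copy of the sample document (version tags left out for brevity).
DOCUMENT = json.loads("""
{
  "reservations": [
    {"instances": [
      {"type": "small", "state": {"name": "running"},
       "tags": [{"Key": "Name", "Values": ["Web"]}]},
      {"type": "large", "state": {"name": "stopped"},
       "tags": [{"Key": "Name", "Values": ["Web"]}]}
    ]},
    {"instances": [
      {"type": "medium", "state": {"name": "terminated"},
       "tags": [{"Key": "Name", "Values": ["Web"]}]},
      {"type": "xlarge", "state": {"name": "running"},
       "tags": [{"Key": "Name", "Values": ["DB"]}]}
    ]}
  ]
}
""")


def find_instances_by_name_tag(document, wanted):
    """Return type and state of every instance whose Name tag contains wanted."""
    matches = []
    for reservation in document["reservations"]:
        for instance in reservation["instances"]:
            for tag in instance["tags"]:
                if tag["Key"] == "Name" and wanted in tag["Values"]:
                    matches.append(
                        {"type": instance["type"], "state": instance["state"]["name"]}
                    )
                    break  # one match per instance is enough
    return matches


print(find_instances_by_name_tag(DOCUMENT, "DB"))
```

Returning a list of small dictionaries keeps the result easy to serialize back to JSON with json.dumps if another tool needs it.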

Conclusion

  • JMESPath is a great tool to have in your arsenal, especially if you are a heavy cloud user

    • Major cloud providers such as AWS, Azure and Oracle Cloud use JMESPath in their CLIs. (Google Cloud has its own approach to parsing JSON that has a lot of similarities to JMESPath)
  • But in one of the examples, we used Python dictionary parsing instead of jp. This is because the jp command would be hard to interpret and debug in tough cases. In such complicated cases, you could parse the JSON in your programming language of choice. The idea is to use jp commands where they stay intuitive and not to overcomplicate them at the cost of maintenance/interpretation.

  • There are some alternatives such as jq and JSONPath

    • jq is a feature-rich command-line JSON processor specifically designed for JSON manipulation
    • JMESPath seems easier to adopt than jq, while jq is more feature-rich than JMESPath | opinionated source

Good Sources