A better version of this blog piece is published on Toyota Connected India's Medium page.

What will you see in this post?

A Brief Introduction to JMESPath
- What is JSON? Where is it used?
- Is JMESPath the only way to parse JSON documents?
How to install the JMESPath CLI tool jp
What are the common JMESPath expressions?
Application of JMESPath in AWS CLI
- We will take practical, tough AWS cases and use a combination of jp commands to parse the JSON outputs

Brief Introduction

In this article, we are going to discuss how to leverage JMESPath expressions for extracting elements from JSON documents. The command-line tool jp and the Python module jmespath (import jmespath) are the two popular interfaces to JMESPath. Beyond Python, JMESPath libraries are also available for JavaScript, Ruby and Go.

In this article, we will learn JMESPath expressions through the jp tool.

What is JSON?

JSON (JavaScript Object Notation) is used everywhere. Typically it is seen/used
- as a data interchange format while developing
- as logs in JSON format
- as configurations in JSON format
- while transferring data in cloud serverless services

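What are the different types of JSON values?

JSON (JavaScript Object Notation) represents structured data, most often as key-value pairs
Types of JSON values
- string: "Senthil"
- dictionary (called an object in JSON): {"name":"Senthil"}
- list (called an array in JSON): ["iPhone","Google Pixel"]
- float or int (both are a single number type in JSON): 899
- boolean: true or false
- null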
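
As a rough illustration of the value types listed above, here is a minimal Python sketch that loads one document containing each type and prints the Python type that the standard json module maps it to. The sample document and its keys are made up for this illustration.

```python
import json

# A made-up document that contains every JSON value type once.
sample = (
    '{"name": "Senthil", "price": 899, "rating": 4.5, '
    '"tags": ["a", "b"], "specs": {"ram": 8}, '
    '"in_stock": true, "discount": null}'
)

parsed = json.loads(sample)

# Print each key with the Python type that json.loads produced for it.
for key, value in parsed.items():
    print(f"{key}: {type(value).__name__}")
```

Running it should show str, int, float, list, dict, bool and NoneType, which is why the article uses the Python words dictionary and list for JSON objects and arrays.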

Examples of Valid JSON

A typical dictionary type

{"name":"Senthil"}

A nested dictionary with list data type values

{
  "Android Phones": [
    [
      {
        "name": "Samsung Galaxy",
        "price": 899
      }
    ]
  ]
}

A dictionary with null value

{"name":null}

Checking that the above is valid JSON

echo "{\"name\":null}" > string_json.json 
python -c "import json; dict_list = json.load(open('string_json.json','r')); print(dict_list)"
{'name': None}

A JSON document containing only a list of values is also valid

[
  "iPhone",
  "Samsung Galaxy",
  "Google Pixel"
]

Checking that the above is valid JSON

>> cat list_json.json
[
  "iPhone",
  "Samsung Galaxy",
  "Google Pixel"
]
>> python -c "import json; dict_list = json.load(open('list_json.json','r')); print(dict_list)"
['iPhone', 'Samsung Galaxy', 'Google Pixel']

What is JMESPath?

JMESPath stands for JSON Matching Expression Paths (source).
JMESPath is a query expression language for searching in JSON documents.

How to install `jp`

If you use a Mac

brew install jmespath/jmespath/jp

Note: I have used a macOS terminal for all the examples below.

If you use Linux

sudo wget https://github.com/jmespath/jp/releases/latest/download/jp-linux-amd64 \
  -O /usr/local/bin/jp  && sudo chmod +x /usr/local/bin/jp

If you use Windows

scoop install jp

source

Common JP Expressions

1. Simple Retrieval of `Keys`

echo '{"field_1":30, "field_2":50}' | jp 'field_2'
50

echo '{"field":{"sub_field":30}}' | jp 'field.sub_field'
30

In the above examples, we have extracted a specific key. For a nested key, we chain the key names with the . operator, as in field.sub_field
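
To make the idea of a dotted path concrete, here is a minimal Python sketch of the same lookup over nested dictionaries. The helper name get_path is hypothetical and written only for this illustration; it is not how jp is implemented.

```python
import json
from functools import reduce


def get_path(document, dotted_path):
    """Follow a dotted path such as 'field.sub_field' through nested dicts.

    Returns None when a key is missing or a non-dict value is reached.
    """
    def step(current, key):
        return current.get(key) if isinstance(current, dict) else None

    return reduce(step, dotted_path.split("."), document)


doc = json.loads('{"field": {"sub_field": 30}}')
print(get_path(doc, "field.sub_field"))  # 30
print(get_path(doc, "field.missing"))    # None
```

Each part of the path narrows the document by one level, which is the same mental model to keep when reading longer jp expressions.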

2. Slicing `array` or `list` type field

echo '{"field_1":30, "field_2":50, "field_3":[1,2,3]}' | jp 'field_3[*]'
[
  1,
  2,
  3
]

echo '{"field_1":30, "field_2":50, "field_3":[1,2,3]}' | jp 'field_3[0]'
1

echo '{"field_1":30, "field_2":50, "field_3":[1,2,3]}' | jp '[field_3[0], field_3[2]]'
[
  1,
  3
]

In the above examples, we have used syntax such as
- [*] to extract all elements in an array
- [field[index], field[another_index]] to extract specific indices of an array

3. Slicing an array of dictionaries `products[{...}]` to fetch one of the keys `name`

cat data.json
{
  "products": [
    {
      "name": "iPhone",
      "price": 999
    },
    {
      "name": "Samsung Galaxy",
      "price": 899
    },
    {
      "name": "Google Pixel",
      "price": 799
    },
    {
      "name": "OnePlus",
      "price": 699
    }
  ]
}

jp -f data.json 'products[*].name'
[
  "iPhone",
  "Samsung Galaxy",
  "Google Pixel",
  "OnePlus"
]

In the above example, we have used [*] to look into all elements of the array and then show only one field, name

4. Filtering based on condition

Retrieve all values of a specific key name from an array products where price is greater than or equal to a specified value

jp -f data.json 'products[?price >= `799`].name'
[
  "iPhone",
  "Samsung Galaxy",
  "Google Pixel"
]

In the above example, we have filtered the array products with a condition on the field price and then displayed only the field name
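
If the filter syntax feels unfamiliar, it may help to read it as a list comprehension. The following is a small Python sketch of the same filter, with the contents of data.json inlined so that the block runs on its own.

```python
# Same data as data.json, inlined so the example is self-contained.
products = [
    {"name": "iPhone", "price": 999},
    {"name": "Samsung Galaxy", "price": 899},
    {"name": "Google Pixel", "price": 799},
    {"name": "OnePlus", "price": 699},
]

# Keep products priced at 799 or more, then project only the name.
names = [p["name"] for p in products if p["price"] >= 799]
print(names)
```

The part inside [? ... ] plays the role of the if clause, and the trailing .name plays the role of the expression at the front of the comprehension.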

5. Retrieve multiple values and make a new json

jp -f data.json '{"AndroidPhones":products[?name != `"iPhone"`].[{"android_phone_name":name, "price":price}]}' > android_phones_data.json && cat android_phones_data.json
{
  "AndroidPhones": [
    [
      {
        "android_phone_name": "Samsung Galaxy",
        "price": 899
      }
    ],
    [
      {
        "android_phone_name": "Google Pixel",
        "price": 799
      }
    ],
    [
      {
        "android_phone_name": "OnePlus",
        "price": 699
      }
    ]
  ]
}
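
The extra level of nesting in the output comes from the .[ ... ] part of the expression, which wraps each reshaped product in its own one-element list. A rough Python sketch of the same reshaping, with the data inlined so that it runs on its own, makes this visible:

```python
import json

# Same data as data.json, inlined so the example is self-contained.
products = [
    {"name": "iPhone", "price": 999},
    {"name": "Samsung Galaxy", "price": 899},
    {"name": "Google Pixel", "price": 799},
    {"name": "OnePlus", "price": 699},
]

reshaped = {
    "AndroidPhones": [
        # The inner [ ... ] mirrors the .[{...}] in the jp expression:
        # every product becomes a list holding one new dictionary.
        [{"android_phone_name": p["name"], "price": p["price"]}]
        for p in products
        if p["name"] != "iPhone"
    ]
}

print(json.dumps(reshaped, indent=2))
```

Dropping the inner square brackets in this sketch would give a flat list of dictionaries instead.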

6. Pipe Expressions

A similar result (a flat list of the non-iPhone products) can be produced with pipes, which give a sense of modularized expressions

jp -f data.json '{"Android Phones":products[?name != `"iPhone"`]} | "Android Phones"[*]'

[
  {
    "name": "Samsung Galaxy",
    "price": 899
  },
  {
    "name": "Google Pixel",
    "price": 799
  },
  {
    "name": "OnePlus",
    "price": 699
  }
]

7. Built-in Functions

There are many built-in JMESPath functions (refer here). Let us cover some of them. The rest follow a similar template.

A. sort_by, min_by, max_by

Sort an array in ascending order

jp -f data.json 'products[*] | sort_by(@,&price)' 
[
  {
    "name": "OnePlus",
    "price": 699
  },
  {
    "name": "Google Pixel",
    "price": 799
  },
  {
    "name": "Samsung Galaxy",
    "price": 899
  },
  {
    "name": "iPhone",
    "price": 999
  }
]

Note:
- The &key_name form is an expression reference. It tells the built-in function which key to evaluate for each element, so it is required here

Sort an array in descending order

jp -f data.json 'products | sort_by(@,&price) | reverse(@)'

[
  {
    "name": "iPhone",
    "price": 999
  },
  {
    "name": "Samsung Galaxy",
    "price": 899
  },
  {
    "name": "Google Pixel",
    "price": 799
  },
  {
    "name": "OnePlus",
    "price": 699
  }
]

Note:
- The @ symbolizes the output from the previous portion of the pipe, which is passed on to the next stage

Maximum Element in an array

jp -f data.json 'products | max_by(@,&price)'

{
  "name": "iPhone",
  "price": 999
}

Minimum Element in an array

jp -f data.json 'products | min_by(@,&price) | name'

"OnePlus"

jp -u -f data.json 'products | min_by(@,&price) | name'

OnePlus

The pipe expressions are modularized and easy to handle.
Note the -u (unquoted) argument, which prints the string without quotes
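
For readers who think in Python, sort_by, min_by and max_by map roughly onto sorted, min and max with a key function. This is only a sketch of the analogy with the data inlined; the key function plays the role of &price.

```python
from operator import itemgetter

# Same data as data.json, inlined so the example is self-contained.
products = [
    {"name": "iPhone", "price": 999},
    {"name": "Samsung Galaxy", "price": 899},
    {"name": "Google Pixel", "price": 799},
    {"name": "OnePlus", "price": 699},
]

# The key function tells Python what to compare, like &price in jp.
by_price = itemgetter("price")

ascending = sorted(products, key=by_price)                 # sort_by(@, &price)
descending = sorted(products, key=by_price, reverse=True)  # ... | reverse(@)
cheapest = min(products, key=by_price)                     # min_by(@, &price)
priciest = max(products, key=by_price)                     # max_by(@, &price)

print([p["name"] for p in ascending])
print([p["name"] for p in descending])
print(cheapest["name"], priciest["name"])
```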

B. contains official docs

jp -u -f data.json 'products | contains([].name,`"OnePlus"`)'

true

contains returns true or false. The simplest example, contains('foobar','bar'), returns true

jp -u -f data.json 'products[?contains(name, `"Plus"`)]'

[
  {
    "name": "OnePlus",
    "price": 699
  }
]

We can use contains to match a portion of text in a variable inside an array
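
The two uses of contains above (membership in an array, and a substring inside a string) behave much like the Python in operator. A small sketch of the analogy, with the data inlined so that it runs on its own:

```python
# Same data as data.json, inlined so the example is self-contained.
products = [
    {"name": "iPhone", "price": 999},
    {"name": "Samsung Galaxy", "price": 899},
    {"name": "Google Pixel", "price": 799},
    {"name": "OnePlus", "price": 699},
]

names = [p["name"] for p in products]

# Array membership, like contains([].name, "OnePlus")
has_oneplus = "OnePlus" in names

# Substring match, like contains('foobar', 'bar')
has_bar = "bar" in "foobar"

# Substring match used as a filter, like products[?contains(name, "Plus")]
plus_products = [p for p in products if "Plus" in p["name"]]

print(has_oneplus, has_bar, plus_products)
```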

C. join official docs

>> jp -f data.json 'products[*].name'

[
  "iPhone",
  "Samsung Galaxy",
  "Google Pixel",
  "OnePlus"
]

>> jp -f data.json 'join(`","`,products[*].name)'
"iPhone,Samsung Galaxy,Google Pixel,OnePlus"

>> jp -u -f data.json 'join(`","`,products[*].name)'
iPhone,Samsung Galaxy,Google Pixel,OnePlus

You can use the -u argument when you want the output to be displayed as plain, unquoted strings instead of valid JSON

D. keys official docs

>> jp -f data.json 'keys(@)'

[
  "products"
]

>> jp -f data.json 'products[0] | keys(@)'

[
  "name",
  "price"
]

8. Logical AND (&&) and OR (||)

jp -f data.json 'products[?(price >  `699` && price < `999`)]'

[
  {
    "name": "Samsung Galaxy",
    "price": 899
  },
  {
    "name": "Google Pixel",
    "price": 799
  }
]

jp -f data.json 'products[?(contains(name, `"Sam"`) || price < `899`)]'

[
  {
    "name": "Samsung Galaxy",
    "price": 899
  },
  {
    "name": "Google Pixel",
    "price": 799
  },
  {
    "name": "OnePlus",
    "price": 699
  }
]

Practical AWS Cases

1. Let us analyze an example of AWS CLI JSON output

aws lambda list-functions --output json > aws_example.json && cat aws_example.json

{
  "Functions": [
    {
      "FunctionName": "my-function-1",
      "FunctionArn": "arn:aws:lambda:us-east-1:1234567890:function:my-function-1",
      "Runtime": "nodejs12.x",
      "MemorySize": 128,
      "Timeout": 3,
      "LastModified": "2023-06-18T10:15:00Z"
    },
    {
      "FunctionName": "my-function-2",
      "FunctionArn": "arn:aws:lambda:us-east-1:1234567890:function:my-function-2",
      "Runtime": "python3.8",
      "MemorySize": 256,
      "Timeout": 5,
      "LastModified": "2023-06-17T14:30:00Z"
    },
    {
      "FunctionName": "my-function-3",
      "FunctionArn": "arn:aws:lambda:us-east-1:1234567890:function:my-function-3",
      "Runtime": "java11",
      "MemorySize": 512,
      "Timeout": 10,
      "LastModified": "2023-06-16T09:45:00Z"
    }
  ]
}

Q1. Query all lambda functions running `python`

Based on how you want to parse the output, you can have it as a list or just the first element by accessing [0]

jp -f aws_example.json 'Functions[?starts_with(Runtime,`"python"`)]'
[
  {
    "FunctionArn": "arn:aws:lambda:us-east-1:1234567890:function:my-function-2",
    "FunctionName": "my-function-2",
    "LastModified": "2023-06-17T14:30:00Z",
    "MemorySize": 256,
    "Runtime": "python3.8",
    "Timeout": 5
  }
]

jp -f aws_example.json 'Functions[?starts_with(Runtime,`"python"`)] | [0]' 

{
  "FunctionArn": "arn:aws:lambda:us-east-1:1234567890:function:my-function-2",
  "FunctionName": "my-function-2",
  "LastModified": "2023-06-17T14:30:00Z",
  "MemorySize": 256,
  "Runtime": "python3.8",
  "Timeout": 5
}

Q2. Query all lambda functions configured with more than 128 MB of memory

jp -f aws_example.json 'Functions[?MemorySize > `128`]'

[
  {
    "FunctionArn": "arn:aws:lambda:us-east-1:1234567890:function:my-function-2",
    "FunctionName": "my-function-2",
    "LastModified": "2023-06-17T14:30:00Z",
    "MemorySize": 256,
    "Runtime": "python3.8",
    "Timeout": 5
  },
  {
    "FunctionArn": "arn:aws:lambda:us-east-1:1234567890:function:my-function-3",
    "FunctionName": "my-function-3",
    "LastModified": "2023-06-16T09:45:00Z",
    "MemorySize": 512,
    "Runtime": "java11",
    "Timeout": 10
  }
]

2. Let us analyze a more complicated example from the official JMESPath tutorial.

It looks like a description of the state of EC2 instances

cat official_example_for_nested.json
{
  "reservations": [
    {
      "instances": [
        {"type": "small",
         "state": {"name": "running"},
         "tags": [{"Key": "Name",
                   "Values": ["Web"]},
                  {"Key": "version",
                   "Values": ["1"]}]},
        {"type": "large",
         "state": {"name": "stopped"},
         "tags": [{"Key": "Name",
                   "Values": ["Web"]},
                  {"Key": "version",
                   "Values": ["1"]}]}
      ]
    }, {
      "instances": [
        {"type": "medium",
         "state": {"name": "terminated"},
         "tags": [{"Key": "Name",
                   "Values": ["Web"]},
                  {"Key": "version",
                   "Values": ["1"]}]},
        {"type": "xlarge",
         "state": {"name": "running"},
         "tags": [{"Key": "Name",
                   "Values": ["DB"]},
                  {"Key": "version",
                   "Values": ["1"]}]}
      ]
    }
  ]
}

Q1. Find all instances that are `running` and give me a count of them

jp -f official_example_for_nested.json 'reservations[].instances[?state.name == `"running"`][]'  
[
  {
    "state": {
      "name": "running"
    },
    "tags": [
      {
        "Key": "Name",
        "Values": [
          "Web"
        ]
      },
      {
        "Key": "version",
        "Values": [
          "1"
        ]
      }
    ],
    "type": "small"
  },
  {
    "state": {
      "name": "running"
    },
    "tags": [
      {
        "Key": "Name",
        "Values": [
          "DB"
        ]
      },
      {
        "Key": "version",
        "Values": [
          "1"
        ]
      }
    ],
    "type": "xlarge"
  }
]

jp -f official_example_for_nested.json 'length(reservations[].instances[?state.name == `"running"`][])'  

2

Two instances are running.
Note the [] at the end to flatten the list. A simpler example is below:

echo "[[0,1],2,3,[4,5,6]]" | jp '[]'

[
  0,
  1,
  2,
  3,
  4,
  5,
  6
]
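
The flatten operator removes one level of nesting only. A minimal Python sketch of that behaviour is below; the function name flatten_once is hypothetical and written only for this illustration.

```python
def flatten_once(items):
    """Flatten a list by one level, leaving deeper nesting untouched."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(item)   # unpack a nested list into the result
        else:
            flat.append(item)   # keep non-list values as they are
    return flat


print(flatten_once([[0, 1], 2, 3, [4, 5, 6]]))  # [0, 1, 2, 3, 4, 5, 6]
print(flatten_once([[1, [2]], 3]))              # [1, [2], 3]
```

In the EC2 example, the filter produces one list of matching instances per reservation, so a single flatten is enough to merge them into one list.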

Q2. Find the status of instances of type `large` or `xlarge`

jp -f official_example_for_nested.json 'reservations[].instances[?(type==`"xlarge"` || type==`"large"`)][]'

[
  {
    "state": {
      "name": "stopped"
    },
    "tags": [
      {
        "Key": "Name",
        "Values": [
          "Web"
        ]
      },
      {
        "Key": "version",
        "Values": [
          "1"
        ]
      }
    ],
    "type": "large"
  },
  {
    "state": {
      "name": "running"
    },
    "tags": [
      {
        "Key": "Name",
        "Values": [
          "DB"
        ]
      },
      {
        "Key": "version",
        "Values": [
          "1"
        ]
      }
    ],
    "type": "xlarge"
  }
]

Q3. For the instance used for `DB` (a tag whose `Values` is `["DB"]`), give its `type` and `state` details alone

Well, if you happen to get a good answer in jp, let me know :)
I would rather use the json module from Python for more complex things

import json

# Load the nested EC2-style document used in the examples above
with open('official_example_for_nested.json', 'r') as f:
    instances_dict = json.load(f)

# Walk every instance and print those having a tag whose Values include "DB"
for element in instances_dict["reservations"]:
    for instance in element["instances"]:
        for tag in instance["tags"]:
            if "DB" in tag["Values"]:
                print(f'Instance Type: {instance["type"]}')
                print(f'Instance State: {instance["state"]}')
                print(f'Instance Tags: {instance["tags"]}')

Instance Type: xlarge
Instance State: {'name': 'running'}
Instance Tags: [{'Key': 'Name', 'Values': ['DB']}, {'Key': 'version', 'Values': ['1']}]

Conclusion

JMESPath is a great tool to have in your arsenal, especially if you are a heavy cloud user
- Major cloud providers such as AWS, Azure and Oracle Cloud use JMESPath. (Google Cloud has its own variation for parsing JSON that has a lot of similarities to JMESPath)
But in one of the examples, we used Python dictionary parsing instead of jp. This is because the jp command would be hard to interpret and debug in tough cases. In those complicated cases, you could parse the JSON in your programming language of choice. The idea is to use jp commands where they are intuitive and not to overcomplicate them at the cost of maintenance and interpretation.
There are some alternatives such as jq and JSONPath
- jq is a feature-rich command-line JSON processor specifically designed for JSON manipulation
- JMESPath seems to be easier to adopt than jq, while jq is more feature-rich than JMESPath | opinionated source

Good Sources

JMESPath Official Page source
Official Examples used source1
YCombinator Discussion between JMESPATH and JQ | Source
Some tough examples sources for JP
- filters-and-multiselect-hashes
- list-and-slice-projections
Want to practice with a different example?
- Use the data in JQ Tutorial here: JQ Tutorial