Customize Vegetation Indices

Objectives

Derive a index that can better associate with the canopy area of alfalfa than NDVI does. The input image used in this tutorial is an orthoimage with 6 spectral channels. We will demonstrate how to utilize the output files to perform a simple feature selection.

Inspect output files

[1]:
# imports
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from scipy.stats import pearsonr
[2]:
# inspect the output files
os.listdir()
[2]:
['GRID_bin.png',
 'GRID_seg_border.png',
 'GRID.shx',
 '.DS_Store',
 'GRID.shp',
 'GRID_data.csv',
 'GRID.dbf',
 'GRID_image.npy',
 'GRID_seg_ID.png',
 '.ipynb_checkpoints',
 'GRID_kmeans.png',
 'GRID_raw.png']
[3]:
# inspect the input image
img = mpimg.imread('GRID_raw.png')
plt.figure(figsize=(8, 8))
plt.imshow(img)
[3]:
<matplotlib.image.AxesImage at 0x7f94b1002e90>
../_images/ch4_adv_custom_index_6_1.png
[4]:
# inspect the tabular output
dt = pd.read_csv("GRID_data.csv"); dt
[4]:
var row col area_all area_veg NDVI NDVI_std GNDVI GNDVI_std CNDVI ... ch_2 ch_2_std ch_3 ch_3_std ch_4 ch_4_std ch_5 ch_5_std ch_6 ch_6_std
0 unnamed_0 1 1 14910 6205 0.680722 0.101848 0.481553 0.086146 0.955003 ... 59.423529 15.689919 26.239484 8.893769 167.117808 21.852656 69.889927 10.829972 30.540088 24.633806
1 unnamed_23 1 2 10251 6074 0.676454 0.094532 0.475063 0.077119 0.942506 ... 59.718966 14.534593 26.431841 8.389850 165.603721 20.710229 69.595324 9.956617 26.050244 23.631348
2 unnamed_46 1 3 12996 8037 0.679914 0.089733 0.474278 0.074406 0.944246 ... 60.750404 14.811752 26.750280 8.541490 167.954585 21.471309 69.989051 10.539928 31.216141 24.766432
3 unnamed_69 1 4 11100 5435 0.643977 0.114235 0.442635 0.088716 0.879109 ... 65.185097 15.769556 28.936155 9.405345 166.668813 21.403908 68.871757 10.314017 20.444095 21.586253
4 unnamed_92 1 5 12506 5645 0.659209 0.118899 0.460301 0.092747 0.913120 ... 62.443933 16.140226 27.421612 9.592832 166.453853 19.131734 69.195748 9.295681 17.890222 21.159997
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
271 unnamed_183 23 8 13851 6455 0.743537 0.114225 0.598427 0.093205 1.154553 ... 44.106274 12.926948 20.407901 8.120920 173.637180 21.825689 72.545778 11.594452 38.396902 19.955167
272 unnamed_206 23 9 12877 6293 0.735538 0.113924 0.594051 0.090958 1.140933 ... 44.571746 13.482742 21.143334 8.533524 172.578738 20.879160 72.617670 11.297813 34.843954 19.919131
273 unnamed_229 23 10 12880 7028 0.751853 0.099948 0.605144 0.079713 1.168800 ... 43.526323 11.750768 20.231929 7.710727 175.287137 22.499670 74.050370 12.060664 40.409932 21.961529
274 unnamed_252 23 11 13122 6433 0.734485 0.123072 0.590896 0.098009 1.137487 ... 45.554018 14.221262 21.560547 9.116043 174.559304 22.556524 73.487331 12.153618 44.907508 16.025357
275 unnamed_275 23 12 15252 7576 0.740249 0.106438 0.593371 0.087980 1.144174 ... 45.191526 12.338402 21.199842 8.073755 175.554250 23.091342 74.060058 12.693901 43.955649 18.627713

276 rows × 29 columns

[5]:
# subset columns of canopy area, NDVI, and spectral signals
dt_idx = dt[["area_veg", "NDVI"] + ["ch_%d" % i for i in range(1, 7)]].dropna(); dt_idx
[5]:
area_veg NDVI ch_1 ch_2 ch_3 ch_4 ch_5 ch_6
0 6205 0.680722 32.572442 59.423529 26.239484 167.117808 69.889927 30.540088
1 6074 0.676454 32.683240 59.718966 26.431841 165.603721 69.595324 26.050244
2 8037 0.679914 32.748165 60.750404 26.750280 167.954585 69.989051 31.216141
3 5435 0.643977 36.915731 65.185097 28.936155 166.668813 68.871757 20.444095
4 5645 0.659209 35.175554 62.443933 27.421612 166.453853 69.195748 17.890222
... ... ... ... ... ... ... ... ...
271 6455 0.743537 26.120372 44.106274 20.407901 173.637180 72.545778 38.396902
272 6293 0.735538 27.076593 44.571746 21.143334 172.578738 72.617670 34.843954
273 7028 0.751853 25.338788 43.526323 20.231929 175.287137 74.050370 40.409932
274 6433 0.734485 27.561324 45.554018 21.560547 174.559304 73.487331 44.907508
275 7576 0.740249 26.680042 45.191526 21.199842 175.554250 74.060058 43.955649

270 rows × 8 columns

Design index formulas

[6]:
# make a dictionary of formulas
# where x0 and x1 stand for channel signals from two different (same) columns
ls_formula = {"(x0 - x1) / (x0 + x1)": lambda x0, x1: (x0 - x1) / (x0 + x1),
              "x0 / x1":               lambda x0, x1: x0 / x1,
              "(x0 ^ 2) + (x1 ^ 2)":   lambda x0, x1: (x0 ** 2) + (x1 ** 2)}

Compute correlations

We can compute Pearson’s correlation coefficients between the canopy areas (i.e., the column of “area_veg”) and derived indices

[15]:
# instantiate pandas dataframe
dt_results = pd.DataFrame(columns=["x0", "x1", "formula", "r2"])

# define the loop
cols_ignored = 2  # we won't loop over the first 2 columns, which are canopy and NDVI
p = 6  # number of channels we'd like to loop over

# pairwise iterate through 6 spectral bands
for i in range(p):
    for j in range(p):
        # get band signals and store them in x0 and x1
        x0 = dt_idx.iloc[:, cols_ignored + i]
        x1 = dt_idx.iloc[:, cols_ignored + j]

        # compute 3 indices from x0 and x1
        ls_idx = [list(ls_formula.values())[i](x0, x1) for i in range(3)]

        # store results into a temporary dataframe
        dt_temp = pd.DataFrame({
            "x0":      i,
            "x1":      j,
            "formula": list(ls_formula.keys()),
            "r":       [pearsonr(dt_idx["area_veg"], ls_idx[i])[0]
                        for i in range(3)],
            "r2":      [pearsonr(dt_idx["area_veg"], ls_idx[i])[0] ** 2
                        for i in range(3)]
        })

        # append our result dataframe
        dt_results = dt_results.append(dt_temp)
[8]:
# handle results from zero division error
dt_results.loc[dt_results.r2.isna(), ["r2", "r"]] = 0
dt_results
[8]:
x0 x1 formula r2 r
0 0 0 (x0 - x1) / (x0 + x1) 0.000000 0.000000
1 0 0 x0 / x1 0.000000 0.000000
2 0 0 (x0 ^ 2) + (x1 ^ 2) 0.230163 -0.479753
0 0 1 (x0 - x1) / (x0 + x1) 0.124206 -0.352429
1 0 1 x0 / x1 0.134024 -0.366092
... ... ... ... ... ...
1 5 4 x0 / x1 0.010826 0.104048
2 5 4 (x0 ^ 2) + (x1 ^ 2) 0.050595 0.224933
0 5 5 (x0 - x1) / (x0 + x1) 0.000000 0.000000
1 5 5 x0 / x1 0.000000 0.000000
2 5 5 (x0 ^ 2) + (x1 ^ 2) 0.003587 0.059888

108 rows × 5 columns

Heatmaps of correlations

This visualization can show how derived indices associate with canopy areas

[9]:
# define font size
size_title = 48
size_label = 32
size_tick  = 36

# generate 3 heatmaps for each formula
fig, axes = plt.subplots(1, 3, figsize=(30, 10), constrained_layout=True)
for i in range(3):
    # get r2 values by each formula
    formula = list(ls_formula.keys())[i]
    ls_r2 = dt_results.loc[dt_results.formula == formula, "r2"]

    # reshape the list to a 2D matrix
    M = np.array(ls_r2).reshape((p, p))

    # plot interface
    axes[i].set_title(list(ls_formula.keys())[i], fontsize=size_title)
    axes[i].set_xlabel("x1", fontsize=size_tick)
    axes[i].set_ylabel("x0", fontsize=size_tick)
    for x0 in range(p):
        for x1 in range(p):
            text = axes[i].text(x0, x1, round(M[x1, x0], 2),
                                ha="center", va="center",
                                color="w", fontsize=size_label)
    plt.sca(axes[i])
    plt.xticks(ticks=np.arange(p), fontsize=size_tick)
    plt.yticks(ticks=np.arange(p), fontsize=size_tick)

    # show heatmap
    axes[i].imshow(M)
../_images/ch4_adv_custom_index_15_0.png

Tabular Results of selected indices

Also we can use in a dataframe to inspect which indices perform better than NDVI

[10]:
# compute correlation between NDVI and canopy area
r2_ndvi = pearsonr(dt_idx["area_veg"], dt_idx["NDVI"])[0] ** 2
r2_ndvi
[10]:
0.1975636107759065
[11]:
# extract rows containing better indices
dt_results.loc[dt_results.r2 > r2_ndvi].sort_values(by="r2", ascending=False)
[11]:
x0 x1 formula r2 r
1 0 4 x0 / x1 0.238634 -0.488502
1 0 2 x0 / x1 0.230562 -0.480169
2 0 0 (x0 ^ 2) + (x1 ^ 2) 0.230163 -0.479753
0 0 2 (x0 - x1) / (x0 + x1) 0.220862 -0.469959
0 2 0 (x0 - x1) / (x0 + x1) 0.220862 0.469959
0 0 4 (x0 - x1) / (x0 + x1) 0.214008 -0.462610
0 4 0 (x0 - x1) / (x0 + x1) 0.214008 0.462610
1 2 0 x0 / x1 0.213051 0.461574
1 0 3 x0 / x1 0.209906 -0.458155
2 0 2 (x0 ^ 2) + (x1 ^ 2) 0.199044 -0.446143
2 2 0 (x0 ^ 2) + (x1 ^ 2) 0.199044 -0.446143
0 0 3 (x0 - x1) / (x0 + x1) 0.198422 -0.445445
0 3 0 (x0 - x1) / (x0 + x1) 0.198422 0.445445

Validate the index

Based on the results shown above, now we know that by dividing the 1st channel by the 5th channel can derive the best canopy-tracking index. we can validate this result by visualize the index as an image.

[13]:
# load image
img = np.load("GRID_image.npy")
img_idx = img[:, :, 0] / (img[:, :, 4] + 1e-9) # adding 1e-9 to avoid zero division error

# as we see the negative correlation,
# we have to treat the largest values as the smallest ones,
# and turn the smallest values to the largest values. (I call it "flip")
img_idx_flip = abs(img_idx - img_idx.max())

# and we can compare the result with the RGB image
fig, axes = plt.subplots(1, 2, figsize=(18, 8), constrained_layout=True)
axes[0].imshow(img[:, :, :3])
axes[1].imshow(img_idx_flip)
[13]:
<matplotlib.image.AxesImage at 0x7f94d05a9c10>
../_images/ch4_adv_custom_index_20_1.png