This initial clustering analysis explores agricultural commodity loss nationwide, and how the causes of damage are similar or dissimilar by commodity and year.  Our steps:

  1. We aggregate and summarize all insurance loss for an individual commodity over a set of years, creating a sparse matrix of damage causes by county. We first use PCA to reduce the number of dimensions, then estimate the optimal number of clusters using several approaches (silhouette, within-cluster sum of squares, and the gap statistic).  After evaluating these options, we choose an optimal cluster number.
  2. Using this optimal cluster number, we visually examine the clusters (which, again, are for a single commodity, per year, at the county level).  Each county that has claims is assigned a cluster number, and we then map the clusters spatially.  The example plots and maps below are for CORN.  This methodology can be run for any commodity across the full time range of insurance claims.  Finally, the analysis below is for total loss ($).  We are also running analyses for total acres, loss per acre, and loss per claim.
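
The two steps above can be sketched in a few lines of Python. This is a simplified illustration only, not the code behind the plots below: the claim records, county names, and function names are all hypothetical, the PCA step is skipped, and only the silhouette criterion is used to pick the cluster number (within-cluster sum of squares and the gap statistic are left out). It uses a plain k-means with deterministic farthest-point seeding so the result is reproducible.

```python
from collections import defaultdict
from math import dist

# Hypothetical claims for one commodity and year: (county, damage cause, loss in $).
claims = [
    ("A", "Drought", 900.0), ("A", "Heat", 100.0),
    ("B", "Drought", 800.0), ("B", "Drought", 50.0), ("B", "Heat", 150.0),
    ("C", "Hail", 700.0), ("C", "Excess Moisture", 300.0),
    ("D", "Hail", 650.0), ("D", "Excess Moisture", 350.0),
    ("E", "Frost", 1000.0),
    ("F", "Frost", 900.0), ("F", "Heat", 100.0),
]

def county_cause_matrix(records):
    """Sum loss per (county, cause) and return one row of cause totals per county."""
    totals = defaultdict(float)
    for county, cause, loss in records:
        totals[(county, cause)] += loss
    counties = sorted({county for county, _ in totals})
    causes = sorted({cause for _, cause in totals})
    rows = [[totals.get((c, d), 0.0) for d in causes] for c in counties]
    return counties, causes, rows

def kmeans(rows, k, iters=100):
    """Plain k-means; seeds are chosen by farthest-point selection (an assumption)."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(dist(r, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: dist(r, centers[j])) for r in rows]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(rows, labels):
    """Mean silhouette width; singleton clusters score 0 by convention."""
    scores = []
    for i, r in enumerate(rows):
        by_cluster = defaultdict(list)
        for j, other in enumerate(rows):
            if j != i:
                by_cluster[labels[j]].append(dist(r, other))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)

counties, causes, rows = county_cause_matrix(claims)
scores = {k: silhouette(rows, kmeans(rows, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
cluster_of = dict(zip(counties, kmeans(rows, best_k)))
print(best_k, cluster_of)
```

In this toy data the counties fall into drought, hail, and frost groups, so the silhouette score peaks at three clusters. The resulting `cluster_of` mapping (county to cluster number) is the kind of table that would then be joined to county boundaries for mapping.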

Cluster Plots and Maps