
  • [CS231n]
    E1Q2: Training a Support Vector Machine

    A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane; it is used with associated learning algorithms to analyze data for classification and regression. Here we will implement and optimize a fully-vectorized loss function and analytic gradient to better classify the images in the CIFAR-10 image set.


    ARTICLE 6
  • [CS231n]
    E1Q1: k-Nearest Neighbor Classifier

    In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In this assignment we utilize a very basic implementation to predict image labels in the CIFAR-10 Dataset.


    [CS231n]
    E1Q1: k-Nearest Neighbor Classifier

    The kNN classifier consists of two stages:

    - During training, the classifier takes the training data and simply remembers it

    - During testing, kNN classifies every test image by comparing it to all training images and transferring the labels of the k most similar training examples

    - The value of k is cross-validated

    In the following we will implement these steps to better understand the basic image classification pipeline and cross-validation, and to gain proficiency in writing efficient, vectorized code.
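    As a toy illustration of these two stages (a minimal sketch on small random arrays, entirely separate from the CIFAR-10 pipeline below):

    import numpy as np

    # "Training": a kNN classifier simply memorizes the data.
    rng = np.random.RandomState(0)
    X_tr = rng.randn(100, 3072)          # 100 fake flattened "images"
    y_tr = rng.randint(10, size=100)     # 10 fake class labels
    x_te = rng.randn(3072)               # a single fake test "image"

    # "Testing" with k = 3: compute the L2 distance to every training point,
    # take the k closest, and let their labels vote.
    d = np.sqrt(((X_tr - x_te) ** 2).sum(axis=1))
    votes = y_tr[np.argsort(d)[:3]]
    predicted_label = np.bincount(votes).argmax()
    print(predicted_label)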

    INITIAL SETUP

    Q1: k-Nearest Neighbor classifier

    In [1]:
    # Run some setup code for this notebook.
    
    # A __future__ import must precede other statements if this cell is run as a plain script.
    from __future__ import print_function
    
    import random
    import numpy as np
    from cs231n.data_utils import load_CIFAR10
    import matplotlib.pyplot as plt
    
    # This is a bit of magic to make matplotlib figures appear inline in the notebook
    # rather than in a new window.
    %matplotlib inline
    plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
    plt.rcParams['image.interpolation'] = 'nearest'
    plt.rcParams['image.cmap'] = 'gray'
    
    # Some more magic so that the notebook will reload external python modules;
    # see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
    %load_ext autoreload
    %autoreload 2
    
    In [2]:
    # Load the raw CIFAR-10 data.
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)
    
    # As a sanity check, we print out the size of the training and test data.
    print('Training data shape: ', X_train.shape)
    print('Training labels shape: ', y_train.shape)
    print('Test data shape: ', X_test.shape)
    print('Test labels shape: ', y_test.shape)
    
    Training data shape:  (50000, 32, 32, 3)
    Training labels shape:  (50000,)
    Test data shape:  (10000, 32, 32, 3)
    Test labels shape:  (10000,)
    
    In [3]:
    # Visualize some examples from the dataset.
    # We show a few examples of training images from each class.
    classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
    num_classes = len(classes)
    samples_per_class = 7
    for y, cls in enumerate(classes):
        idxs = np.flatnonzero(y_train == y)
        idxs = np.random.choice(idxs, samples_per_class, replace=False)
        for i, idx in enumerate(idxs):
            plt_idx = i * num_classes + y + 1
            plt.subplot(samples_per_class, num_classes, plt_idx)
            plt.imshow(X_train[idx].astype('uint8'))
            plt.axis('off')
            if i == 0:
                plt.title(cls)
    plt.show()
    
    In [4]:
    # Subsample the data for more efficient code execution in this exercise
    num_training = 5000
    mask = list(range(num_training))
    X_train = X_train[mask]
    y_train = y_train[mask]
    
    num_test = 500
    mask = list(range(num_test))
    X_test = X_test[mask]
    y_test = y_test[mask]
    
    In [5]:
    # Reshape the image data into rows
    X_train = np.reshape(X_train, (X_train.shape[0], -1))
    X_test = np.reshape(X_test, (X_test.shape[0], -1))
    print(X_train.shape, X_test.shape)
    
    (5000, 3072) (500, 3072)
    
    In [6]:
    from cs231n.classifiers import KNearestNeighbor
    
    # Create a kNN classifier instance. 
    # Remember that training a kNN classifier is a noop: 
    # the Classifier simply remembers the data and does no further processing 
    classifier = KNearestNeighbor()
    classifier.train(X_train, y_train)
    

    We would now like to classify the test data with the kNN classifier. Recall that we can break down this process into two steps:

    1. First we must compute the distances between all test examples and all train examples.

    2. Given these distances, for each test example we find the k nearest examples and have them vote for the label

    Let's begin by computing the distance matrix between all training and test examples. For example, if there are Ntr training examples and Nte test examples, this stage should result in an Nte x Ntr matrix where each element (i, j) is the distance between the i-th test and j-th training example.
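    As a quick shape check (toy arrays only, not the CIFAR-10 data; np is the alias from the setup cell above), broadcasting reproduces the Nte x Ntr layout described here:

    Ntr, Nte, D = 5, 3, 4
    toy_train = np.arange(Ntr * D, dtype=float).reshape(Ntr, D)
    toy_test = np.ones((Nte, D))
    # (Nte, 1, D) - (1, Ntr, D) broadcasts to (Nte, Ntr, D); summing over D leaves (Nte, Ntr)
    toy_dists = np.sqrt(((toy_test[:, None, :] - toy_train[None, :, :]) ** 2).sum(axis=2))
    print(toy_dists.shape)  # (3, 5): element (i, j) is the distance from test i to train j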

    First, open `cs231n/classifiers/k_nearest_neighbor.py` and implement the function `compute_distances_two_loops` that uses a (very inefficient) double loop over all pairs of (test, train) examples and computes the distance matrix one element at a time.


    In [7]:
    # Open cs231n/classifiers/k_nearest_neighbor.py and implement
    # compute_distances_two_loops.
    
    # Test your implementation:
    dists = classifier.compute_distances_two_loops(X_test)
    print("Shape: " + str(dists.shape))
    
    Computing Distance:  0
    Computing Distance:  25
    Computing Distance:  50
    Computing Distance:  75
    Computing Distance:  100
    Computing Distance:  125
    Computing Distance:  150
    Computing Distance:  175
    Computing Distance:  200
    Computing Distance:  225
    Computing Distance:  250
    Computing Distance:  275
    Computing Distance:  300
    Computing Distance:  325
    Computing Distance:  350
    Computing Distance:  375
    Computing Distance:  400
    Computing Distance:  425
    Computing Distance:  450
    Computing Distance:  475
    Shape: (500, 5000)
    
    In [23]:
    # We can visualize the distance matrix: each row is a single test example and
    # its distances to training examples
    plt.imshow(dists, interpolation='none')
    plt.show()
    

    INLINE QUESTION #1:

    1. What in the data is the cause behind the distinctly bright rows?

    Each row corresponds to one test image, and each column in that row gives its L2 distance to one of the 5000 training images. If the pixel at row i, column j is white, the distance is large, i.e., that test image has little similarity to that training image. A distinctly bright row therefore means the test image has very few similar images in the entire training set.

    2. What causes the columns?

    Similarly, a bright column corresponds to a training image that is similar to very few of the test images: most of the 500 test images show a large (bright) L2 distance to it.

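    One quick way to back up this reading of the plot (a small sketch using the dists matrix from above; this ranking is not part of the assignment) is to rank rows and columns by their mean distance:

    row_means = dists.mean(axis=1)  # mean distance from each test image to all training images, shape (500,)
    col_means = dists.mean(axis=0)  # mean distance from each training image to all test images, shape (5000,)
    print('Brightest rows (test images):', np.argsort(row_means)[-5:])
    print('Brightest columns (training images):', np.argsort(col_means)[-5:])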

    In [42]:
    # Now implement the function predict_labels and run the code below:
    # We use k = 1 (which is Nearest Neighbor).
    y_test_pred = classifier.predict_labels(dists, k=1)
    
    # Compute and print the fraction of correctly predicted examples
    num_correct = np.sum(y_test_pred == y_test)
    accuracy = float(num_correct) / num_test
    print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))
    
    Got 137 / 500 correct => accuracy: 0.274000
    
    In [109]:
    y_test_pred = classifier.predict_labels(dists, k=5)
    num_correct = np.sum(y_test_pred == y_test)
    accuracy = float(num_correct) / num_test
    print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))
    
    Got 145 / 500 correct => accuracy: 0.290000
    
    In [46]:
    # Now let's speed up distance matrix computation by using partial vectorization
    # with one loop. Implement the function compute_distances_one_loop and run the
    # code below:
    dists_one = classifier.compute_distances_one_loop(X_test)
    
    Computing Distance:  0
    Computing Distance:  25
    Computing Distance:  50
    Computing Distance:  75
    Computing Distance:  100
    Computing Distance:  125
    Computing Distance:  150
    Computing Distance:  175
    Computing Distance:  200
    Computing Distance:  225
    Computing Distance:  250
    Computing Distance:  275
    Computing Distance:  300
    Computing Distance:  325
    Computing Distance:  350
    Computing Distance:  375
    Computing Distance:  400
    Computing Distance:  425
    Computing Distance:  450
    Computing Distance:  475
    
    In [47]:
    # To ensure that our vectorized implementation is correct, we make sure that it
    # agrees with the naive implementation. There are many ways to decide whether
    # two matrices are similar; one of the simplest is the Frobenius norm. In case
    # you haven't seen it before, the Frobenius norm of the difference of two matrices
    # is the square root of the sum of squared differences of all elements; in other
    # words, reshape the matrices into vectors and compute the Euclidean distance between them.
    difference = np.linalg.norm(dists - dists_one, ord='fro')
    print('Difference was: %f' % (difference, ))
    if difference < 0.001:
        print('Good! The distance matrices are the same')
    else:
        print('Uh-oh! The distance matrices are different')
    
    Difference was: 0.000000
    Good! The distance matrices are the same
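    Equivalently (a quick sanity check, not a separate notebook cell), the same number can be computed the "reshape into vectors" way described in the comment above:

    # Same quantity as np.linalg.norm(dists - dists_one, ord='fro')
    manual_difference = np.sqrt(np.sum((dists - dists_one) ** 2))
    print(manual_difference)  # 0.0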
    
    In [59]:
    # Now implement the fully vectorized version inside compute_distances_no_loops
    # and run the code
    dists_two = classifier.compute_distances_no_loops(X_test)
    
    # check that the distance matrix agrees with the one we computed before:
    difference = np.linalg.norm(dists - dists_two, ord='fro')
    print('Difference was: %f' % (difference, ))
    if difference < 0.001:
        print('Good! The distance matrices are the same')
    else:
        print('Uh-oh! The distance matrices are different')
    
    Difference was: 0.000000
    Good! The distance matrices are the same
    
    In [53]:
    # Let's compare how fast the implementations are
    def time_function(f, *args):
        """
        Call a function f with args and return the time (in seconds) that it took to execute.
        """
        import time
        tic = time.time()
        f(*args)
        toc = time.time()
        return toc - tic
    
    two_loop_time = time_function(classifier.compute_distances_two_loops, X_test)
    print('Two loop version took %f seconds' % two_loop_time)
    
    one_loop_time = time_function(classifier.compute_distances_one_loop, X_test)
    print('One loop version took %f seconds' % one_loop_time)
    
    no_loop_time = time_function(classifier.compute_distances_no_loops, X_test)
    print('No loop version took %f seconds' % no_loop_time)
    
    # you should see significantly faster performance with the fully vectorized implementation
    
    FINISHED WITH TEST IMAGE:  0
    FINISHED WITH TEST IMAGE:  50
    FINISHED WITH TEST IMAGE:  100
    FINISHED WITH TEST IMAGE:  150
    FINISHED WITH TEST IMAGE:  200
    FINISHED WITH TEST IMAGE:  250
    FINISHED WITH TEST IMAGE:  300
    FINISHED WITH TEST IMAGE:  350
    FINISHED WITH TEST IMAGE:  400
    FINISHED WITH TEST IMAGE:  450
    Two loop version took 32.644345 seconds
    FINISHED WITH TEST IMAGE:  0
    FINISHED WITH TEST IMAGE:  50
    FINISHED WITH TEST IMAGE:  100
    FINISHED WITH TEST IMAGE:  150
    FINISHED WITH TEST IMAGE:  200
    FINISHED WITH TEST IMAGE:  250
    FINISHED WITH TEST IMAGE:  300
    FINISHED WITH TEST IMAGE:  350
    FINISHED WITH TEST IMAGE:  400
    FINISHED WITH TEST IMAGE:  450
    One loop version took 75.630863 seconds
    No loop version took 0.413558 seconds
    

    CROSS VALIDATION

    We have implemented the k-Nearest Neighbor classifier but we set the value k = 5 arbitrarily. We will now determine the best value of this hyperparameter with cross-validation.

    In [142]:
    num_folds = 5
    k_choices = [1, 3, 5, 8, 10, 12, 15, 20, 50, 100]
    
    X_train_folds = []
    y_train_folds = []
    
    #use numpy array_split function to split the training data into num_folds folds
    X_train_folds = np.array_split(X_train, num_folds)
    y_train_folds = np.array_split(y_train, num_folds)
    
    # A dictionary holding the accuracies for different values of k that we find
    # when running cross-validation. After running cross-validation,
    # k_to_accuracies[k] should be a list of length num_folds giving the different
    # accuracy values that we found when using that value of k.
    k_to_accuracies = {}
    
    for k in k_choices:
        k_to_accuracies[k] = []
        # Run kNN algorithm num_folds times
        for i in range(num_folds):
            X_train = []
            y_train = []
            for j in range(num_folds):
                if i != j:
                    X_train.extend(X_train_folds[j])
                    y_train.extend(y_train_folds[j])
    
            X_train = np.array(X_train)
            y_train = np.array(y_train)
            classifier = KNearestNeighbor()
            classifier.train(X_train, y_train)
            dists = classifier.compute_distances_no_loops(X_test)
            y_test_pred = classifier.predict_labels(dists, k=k)
    
            num_correct = np.sum(y_test_pred == y_test)
            accuracy = float(num_correct) / num_test
    
            k_to_accuracies[k].append(accuracy)
    
    # Print out the computed accuracies
    for k in sorted(k_to_accuracies):
        for accuracy in k_to_accuracies[k]:
            print('k = %d, accuracy = %f' % (k, accuracy))
    
    k = 1, accuracy = 0.258000
    k = 1, accuracy = 0.276000
    k = 1, accuracy = 0.260000
    k = 1, accuracy = 0.250000
    k = 1, accuracy = 0.254000
    k = 3, accuracy = 0.276000
    k = 3, accuracy = 0.280000
    k = 3, accuracy = 0.262000
    k = 3, accuracy = 0.272000
    k = 3, accuracy = 0.252000
    k = 5, accuracy = 0.284000
    k = 5, accuracy = 0.294000
    k = 5, accuracy = 0.272000
    k = 5, accuracy = 0.268000
    k = 5, accuracy = 0.280000
    k = 8, accuracy = 0.280000
    k = 8, accuracy = 0.282000
    k = 8, accuracy = 0.282000
    k = 8, accuracy = 0.250000
    k = 8, accuracy = 0.290000
    k = 10, accuracy = 0.274000
    k = 10, accuracy = 0.286000
    k = 10, accuracy = 0.278000
    k = 10, accuracy = 0.260000
    k = 10, accuracy = 0.270000
    k = 12, accuracy = 0.282000
    k = 12, accuracy = 0.266000
    k = 12, accuracy = 0.272000
    k = 12, accuracy = 0.276000
    k = 12, accuracy = 0.280000
    k = 15, accuracy = 0.278000
    k = 15, accuracy = 0.270000
    k = 15, accuracy = 0.250000
    k = 15, accuracy = 0.262000
    k = 15, accuracy = 0.270000
    k = 20, accuracy = 0.274000
    k = 20, accuracy = 0.254000
    k = 20, accuracy = 0.242000
    k = 20, accuracy = 0.258000
    k = 20, accuracy = 0.274000
    k = 50, accuracy = 0.240000
    k = 50, accuracy = 0.234000
    k = 50, accuracy = 0.234000
    k = 50, accuracy = 0.246000
    k = 50, accuracy = 0.234000
    k = 100, accuracy = 0.230000
    k = 100, accuracy = 0.218000
    k = 100, accuracy = 0.224000
    k = 100, accuracy = 0.224000
    k = 100, accuracy = 0.224000
    
    In [143]:
    # plot the raw observations
    for k in k_choices:
        accuracies = k_to_accuracies[k]
        plt.scatter([k] * len(accuracies), accuracies)
    
    # plot the trend line with error bars that correspond to standard deviation
    accuracies_mean = np.array([np.mean(v) for k,v in sorted(k_to_accuracies.items())])
    accuracies_std = np.array([np.std(v) for k,v in sorted(k_to_accuracies.items())])
    plt.errorbar(k_choices, accuracies_mean, yerr=accuracies_std)
    plt.title('Cross-validation on k')
    plt.xlabel('k')
    plt.ylabel('Cross-validation accuracy')
    
    plt.show()
    
    In [152]:
    # Based on the cross-validation results above, choose the best value for k,   
    # retrain the classifier using all the training data, and test it on the test
    # data. You should be able to get above 28% accuracy on the test data.
    best_k = 8
    
    classifier = KNearestNeighbor()
    classifier.train(X_train, y_train)
    y_test_pred = classifier.predict(X_test, k=best_k)
    
    # Compute and display the accuracy
    num_correct = np.sum(y_test_pred == y_test)
    accuracy = float(num_correct) / num_test
    print('Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))
    
    Got 145 / 500 correct => accuracy: 0.290000
    

    K-NEAREST NEIGHBOR CLASS

    The following code contains the kNN class and its member functions that I use in the above analysis.

    In [ ]:
    import numpy as np
    from past.builtins import xrange
    from collections import Counter
    
    class KNearestNeighbor(object):
        """ a kNN classifier with L2 distance """
    
        def __init__(self):
            pass
    
        def train(self, X, y):
            """
            Train the classifier. For k-nearest neighbors this is just 
            memorizing the training data.
    
            Inputs:
            - X: A numpy array of shape (num_train, D) containing the training data
              consisting of num_train samples each of dimension D.
            - y: A numpy array of shape (num_train,) containing the training labels, where
                 y[i] is the label for X[i].
            """
            self.X_train = X
            self.y_train = y
    
        def predict(self, X, k=1, num_loops=0):
            """
            Predict labels for test data using this classifier.
    
            Inputs:
            - X: A numpy array of shape (num_test, D) containing test data consisting
                 of num_test samples each of dimension D.
            - k: The number of nearest neighbors that vote for the predicted labels.
            - num_loops: Determines which implementation to use to compute distances
              between training points and testing points.
    
            Returns:
            - y: A numpy array of shape (num_test,) containing predicted labels for the
              test data, where y[i] is the predicted label for the test point X[i].  
            """
            if num_loops == 0:
                dists = self.compute_distances_no_loops(X)
            elif num_loops == 1:
                dists = self.compute_distances_one_loop(X)
            elif num_loops == 2:
                dists = self.compute_distances_two_loops(X)
            else:
                raise ValueError('Invalid value %d for num_loops' % num_loops)
    
            return self.predict_labels(dists, k=k)
    
        def compute_distances_two_loops(self, X):
            """
            Compute the distance between each test point in X and each training point
            in self.X_train using a nested loop over both the training data and the 
            test data.
    
            Inputs:
            - X: A numpy array of shape (num_test, D) containing test data.
    
            Returns:
            - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
              is the Euclidean distance between the ith test point and the jth training
              point.
            """
    
            num_test = X.shape[0]
            num_train = self.X_train.shape[0]
            dists = np.zeros((num_test, num_train))
            for i in xrange(num_test):
                if i%50 == 0:
                    print("FINISHED WITH TEST IMAGE: ",i)
                for j in xrange(num_train):
                    dists[i, j] = np.sqrt(np.sum((X[i, :] - self.X_train[j, :]) ** 2))
    
            return dists
    
        def compute_distances_one_loop(self, X):
            """
            Compute the distance between each test point in X and each training point
            in self.X_train using a single loop over the test data.
    
            Input / Output: Same as compute_distances_two_loops
            """
            num_test = X.shape[0]
            num_train = self.X_train.shape[0]
            dists = np.zeros((num_test, num_train))
    
            for i in xrange(num_test):
    
                if i%50 == 0:
                    print("FINISHED WITH TEST IMAGE: ",i)
                dists[i, :] = np.sqrt(np.sum(np.square(self.X_train - X[i, :]), axis=1))
    
            return dists
    
        def compute_distances_no_loops(self, X):
            """
            Compute the distance between each test point in X and each training point
            in self.X_train using no explicit loops.
    
            Input / Output: Same as compute_distances_two_loops
            """
            num_test = X.shape[0]
            num_train = self.X_train.shape[0]
            dists = np.zeros((num_test, num_train))
    
    
            # (x-y)^2 = x^2 + y^2 - 2xy --> test_sum + train_sum - 2*inner_product
            test_sum = np.sum(np.square(X), axis=1) # shape -> (500,)
            train_sum = np.sum(np.square(self.X_train), axis=1) # shape -> (5000,)
            inner_product = np.dot(X, self.X_train.T) # shape -> (500,5000)
    
            # reshape test_sum from (500,) to (500,1) while keeping same data
            # the -1 infers same shape as before (500)
            dists = np.sqrt(test_sum.reshape(-1, 1) + train_sum - 2*inner_product)
    
            return dists
    
        def predict_labels(self, dists, k=1):
            """
            Given a matrix of distances between test points and training points,
            predict a label for each test point.
    
            Inputs:
            - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
              gives the distance between the ith test point and the jth training point.
    
            Returns:
            - y: A numpy array of shape (num_test,) containing predicted labels for the
              test data, where y[i] is the predicted label for the test point X[i].  
            """
            num_test = dists.shape[0]
            y_pred = np.zeros(num_test)
            for i in xrange(num_test):
                # A list of length k storing the labels of the k nearest neighbors to
                # the ith test point.
                closest_y = []
    
                # dists[i] holds the distances from test image i to every
                # training image; save it as a new numpy array "dists_i"
                dists_i = dists[i]

                # dists_i.argsort() gives the indices of the distances sorted low to high;
                # dists_i.argsort()[:k] gives the indices of the k smallest distances (the k nearest neighbors);
                # indexing self.y_train with those indices gives their labels, so closest_y has length k
                closest_y = self.y_train[dists_i.argsort()[:k]]
    
                # choose the most common label in closest_y
                # (Counter.most_common breaks ties in favor of the label that appears
                # first in closest_y, i.e. the one belonging to the nearest neighbor)
                y_pred[i] = Counter(closest_y).most_common(1)[0][0]
    
    
            return y_pred
    
    ARTICLE 5
  • Stanford University's CS231n:
    Convolutional Neural Networks for Visual Recognition

    Feeling a little rusty on my basic deep learning skills, I have decided to spend the next few weeks working through Stanford University's class, CS231n: Convolutional Neural Networks for Visual Recognition. Much of the material, especially the raw Python, is simply a refresher for me, but a lot of the application is new, as I have not worked extensively on applying machine learning to imaging problems. As I work through the class I will be putting my thoughts, notes, assignments, and IPython Notebooks up here as a way to keep myself honest and motivated to progress through the class in its entirety.


    Stanford University's CS231n:
    Convolutional Neural Networks for Visual Recognition

    Notes, Thoughts, Assignments, and Progress

    ARTICLE 4
  • Embedding Jupyter Notebooks

    As discussed in my post on underlying site structure, I have opted to build out every piece of this site from scratch instead of working with static-page generators such as Pelican or Hyde. I discuss the logic behind this choice there, so I won't reiterate it, but because of this choice I need to set up processes for displaying my work moving forward. Jupyter Notebooks will be the most important and recurring format, so I want to explain how I've decided to go about this.


    Embedding IPython Notebooks

    ARTICLE 3
  • JaVale By the Numbers

    Just prior to the start of the 2016-2017 NBA season, when the big-man-depleted Golden State Warriors signed JaVale McGee to a one-year minimum contract, the critics quickly surfaced. Articles bashing JaVale were numerous, and praise nonexistent. Looking at his raw averages of 6.1 points, 3.2 rebounds, and 0.9 blocks in only 9.6 minutes per game won’t blow you away, but his per-36-minute stats are absurd. In this article we take a look at just how impressively a starting, full-minutes JaVale would stack up against the rest of the league.


    JAVALE BY THE NUMBERS
    May 3rd 2017

    JaVale By the Numbers

    Just prior to the start of the 2016-2017 NBA season, when the big-man-depleted Golden State Warriors signed JaVale McGee to a one-year minimum contract, the critics quickly surfaced. Articles bashing JaVale were numerous, and praise nonexistent. An article on Complex ranked JaVale as the 12th worst player in the NBA and wrote “The Golden State Warriors need help defending the paint so they’d figure, ‘Hey, JaVale is a giant human with a heartbeat, let’s sign him!’ Problem is he really isn’t that good. Maybe the organization of the future will somehow turn JaVale into an elite big man, or maybe they’re just desperate for some rim protection.”

    Well, seven months later, four wins into the 2017 playoffs, it looks like the Golden State organization may have done exactly that. Not only did JaVale produce important big-man minutes off the bench all season, he started ten games in place of the injured Zaza Pachulia and put together one of the most impressive per-minute statistical seasons for a center in recent memory. JaVale only averaged 9.6 minutes a game, and much of his statistical dominance may be attributed to the strength of the league-leading team surrounding him, but anyone who actually watched this season got a chance to see a whole new JaVale: a powerfully athletic, rim-destroying, ball-swatting monster having a career year.

    Looking at his raw averages of 6.1 points, 3.2 rebounds, and 0.9 blocks in only 9.6 minutes per game won’t blow you away, but his per-36-minute stats are absurd. In this article we take a look at just how impressively a starting, full-minutes JaVale would stack up against the rest of the league. I’m not suggesting that increasing his playing time would affect his stats linearly like this, and with Golden State’s depth and small-ball play we most likely aren’t going to find out any time soon, but the Warriors are without a doubt playing some of their best basketball when he’s on the floor, and so far JaVale has silenced most if not all of his critics; looking at you, Shaq.
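    The per-36 adjustment used in the tables below is simple arithmetic: divide a season total by minutes played and scale to 36 minutes. A minimal sketch (the helper name is just for illustration; the totals are JaVale's season numbers from the tables that follow):

    def per_36(stat_total, minutes_played):
        # Extrapolate a season total to a per-36-minute rate.
        return stat_total * 36.0 / minutes_played

    print(round(per_36(121, 739), 2))  # dunks:  5.89
    print(round(per_36(67, 739), 2))   # blocks: 3.26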

    Shooting

    Dunks

    Of all the stats I am going to talk about today, this one should be the least surprising and is without a doubt the most dominant. Through all his blunders and mistakes over the years, JaVale has never failed to throw down with authority. At 7’ tall and sporting an 8’ wingspan, there aren’t many people on the planet who have that kind of range around the rim. Hometown bias and NBA sponsorship (KIA) aside, JaVale probably should have won the 2011 Slam Dunk Contest, and he annually puts up highlight-reel dunks. With a fraction of the minutes played, JaVale still ranked 12th in total dunks this season. Adjusted, JaVale is averaging 5.89 dunks/36min, a whopping 2.35 more than this season’s dunk leader DeAndre Jordan. The Clippers might still hold the title of Lob City, but nobody has dominated the alley-oop this season like Draymond, Andre, and Steph chucking up bombs to JaVale McGee.

    RANK PLAYER TEAM MINUTES DUNKS DUNKS / 36MIN
    1 JaVale McGee GSW 739 121 5.89
    2 Clint Capela HOU 1551 163 3.78
    3 DeAndre Jordan LAC 2570 253 3.54
    4 Montrezl Harrell HOU 1064 98 3.32
    5 Dwight Howard ATL 2199 199 3.26
    6 Rudy Gobert UTA 2744 235 3.08
    7 Richaun Holmes PHI 1193 92 2.78
    8 Giannis Antetokounmpo MIL 2845 194 2.45
    9 Hassan Whiteside MIA 2513 163 2.34
    10 Mason Plumlee DEN 2147 132 2.21
    11 Marquese Chriss PHX 1743 103 2.13
    12 Jabari Parker MIL 1728 92 1.92
    13 Tristan Thompson CLE 2336 122 1.88
    14 LeBron James CLE 2794 145 1.87
    15 Kevin Durant GSW 2070 107 1.86
    16 Anthony Davis NOP 2708 135 1.79
    17 Andre Drummond DET 2409 118 1.76
    18 Steven Adams OKC 2389 112 1.69
    19 Aaron Gordon ORL 2298 99 1.55
    20 Karl-Anthony Towns MIN 3030 130 1.54
    Dunks Per 36 Min

    Source: NBA.com/stats

    Still unconvinced? Check out all 121 of JaVale's Dunks below.

    Points in the Paint

    When his per-36-adjusted dunk numbers are this far ahead of the competition, it’s not much of a surprise that his adjusted points in the paint reflect the same trend. Playing on a team that dished it at historic rates this year and constantly looks for the third pass doesn’t hurt either. Add that to the absurd gravitational pull of Golden State’s guards at the 3-point line and you have yourself a big-man PITP feeding frenzy on a nightly basis, something JaVale has happily taken advantage of this season. JaVale’s 20.00 PITP per-36 is the highest since Shaquille O’Neal’s 20.00 in 2001-2002. In fact, in the past 20 years no other player has even eclipsed 15.5 PITP per-36, with Shaq pulling off the feat for a dominant 11 straight years between 1996 and 2007.

    RANK PLAYER TEAM PITP MIN PITP / 36MIN
    1 JaVale McGee GSW 410 739 20.00
    2 Clint Capela HOU 720 1551 16.71
    3 Enes Kanter OKC 704 1533 16.53
    4 Hassan Whiteside MIA 970 2513 13.90
    4 Karl-Anthony Towns MIN 1154 3030 13.71
    6 LeBron James CLE 1032 2794 13.30
    7 Giannis Antetokounmpo MIL 1044 2845 13.21
    8 Andre Drummond DET 882 2409 13.18
    9 Anthony Davis NOP 970 2708 12.90
    10 Nikola Jokic DEN 730 2038 12.89
    11 DeMarcus Cousins NOP 848 2465 12.38
    12 Dwight Howard ATL 748 2199 12.25
    13 Brook Lopez BKN 732 2222 11.86
    14 DeAndre Jordan LAC 824 2570 11.54
    15 Steven Adams OKC 742 2389 11.18
    16 Rudy Gobert UTA 822 2744 10.78
    17 Russell Westbrook OKC 816 2802 10.48
    18 DeMar DeRozan TOR 760 2620 10.44
    19 Isaiah Thomas BOS 726 2569 10.17
    20 John Wall WAS 774 2835 9.83
    Points in the Paint Per 36 Min

    Source: NBA.com/stats

    Offensive Rating

    Individual offensive rating is a tough stat to separate from the play of the team as a whole, but ranking #1 even when the rest of your team is in the top 10 still tells an impressive story. Not only does JaVale fit into the offensive juggernaut that is the Golden State Warriors, he makes them the most potent version of themselves when he’s on the floor.

    5-MAN LINEUP OFFRTG DEFRTG NETRTG
    GSW Big 4 + JaVale 124.4 92.2 32.1

    The Warriors lineup with McGee/Curry/Thompson/Durant/Green has the highest net rating of any five-man combo in the NBA this season with a minimum of 100 min played.

    RANK PLAYER TEAM OFFRTG
    1 JaVale McGee GSW 121.4
    2 Stephen Curry GSW 118.1
    3 Pierre Jackson DAL 117.9
    4 Kevin Durant GSW 117.2
    5 Chris Paul LAC 116.2
    6 Zaza Pachulia GSW 115.8
    7 Klay Thompson GSW 115.6
    8 Draymond Green GSW 115.2
    9 Blake Griffin LAC 115.2
    10 Gary Harris DEN 115.0
    11 Nikola Jokic DEN 114.9
    12 LeBron James CLE 114.9
    13 JJ Redick LAC 114.6
    14 Jordan Farmar SAC 114.5
    15 Andre Iguodala GSW 114.3
    16 Kyrie Irving CLE 114.2
    17 Ryan Anderson HOU 113.8
    18 Clint Capela HOU 113.7
    19 Isaiah Thomas BOS 113.6
    20 James Harden HOU 113.6
    Offensive Rating

    Source: NBA.com/stats

    PLUS/MINUS

    Plus/Minus further illustrates how the already outstanding Warriors are even better when JaVale is on the floor. It’s not surprising that the team with the NBA’s best record, #1 Offensive Efficiency and #2 Defensive Efficiency would dominate the plus/minus category.

    RANK PLAYER TEAM +/- MIN PLUSMINUS / 36MIN
    1 JaVale McGee GSW 312 739 15.20
    2 Stephen Curry GSW 1015 2638 13.85
    3 Kevin Durant GSW 711 2070 12.37
    4 Draymond Green GSW 820 2471 11.95
    5 Zaza Pachulia GSW 418 1268 11.87
    6 Klay Thompson GSW 801 2649 10.89
    7 Chris Paul LAC 577 1921 10.81
    8 Andre Iguodala GSW 527 1998 9.50
    9 Patty Mills SAS 410 1754 8.42
    10 JJ Redick LAC 470 2198 7.70
    11 Blake Griffin LAC 440 2076 7.63
    11 Ryan Anderson HOU 407 2116 6.92
    13 DeAndre Jordan LAC 459 2570 6.43
    14 Kawhi Leonard SAS 436 2474 6.34
    15 LeBron James CLE 483 2794 6.22
    16 Patrick Beverley HOU 353 2058 6.17
    17 Kyle Lowry TOR 358 2244 5.74
    18 Rudy Gobert UTA 436 2744 5.72
    19 Jae Crowder BOS 349 2335 5.38
    20 James Harden HOU 425 2947 5.19
    Plus/Minus Per 36 Min

    Source: NBA.com/stats

    SECOND CHANCE POINTS

    RANK PLAYER TEAM MIN 2ND PTS 2ND PTS / 36MIN
    1 Enes Kanter OKC 1533 260 6.11
    2 Hassan Whiteside MIA 2513 375 5.37
    3 Andre Drummond DET 2409 355 5.31
    4 Dwight Howard ATL 2199 315 5.16
    5 Zach Randolph MEM 1786 255 5.14
    6 JaVale McGee GSW 739 96 4.68
    7 Karl-Anthony Towns MIN 3030 386 4.59
    8 Nikola Jokic DEN 2038 246 4.35
    9 Jonas Valanciunas TOR 2066 240 4.18
    10 Rudy Gobert UTA 2744 301 3.95
    11 LaMarcus Aldridge SAS 2335 247 3.81
    12 Russell Westbrook OKC 2802 292 3.75
    13 Anthony Davis NOP 2708 282 3.75
    14 DeAndre Jordan LAC 2570 266 3.73
    14 Kevin Love CLE 1885 195 3.72
    16 Robin Lopez CHI 2271 225 3.57
    16 DeMarcus Cousins NOP 2465 225 3.29
    18 Steven Adams OKC 2389 210 3.17
    19 Carmelo Anthony NYK 2538 210 2.98
    20 Jimmy Butler CHI 2809 198 2.54
    2nd Chance Points Per 36 Min

    Source: NBA.com/stats

    Defense & Shot Blocking

    Blocks

    JaVale’s class-A airspace around the hoop doesn’t just exist on the offensive end. With premier ball stoppers like Draymond Green, Klay Thompson, and Andre Iguodala hounding the opposing offense, JaVale consistently waits a step away to thunderously deny all shot attempts. With a 2017 block highlight reel almost as long as his dunk tape, JaVale has been nothing but impenetrable around the rim, a defensive presence Golden State thought it would be sorely lacking after the departures of Andrew Bogut and Festus Ezeli. In fact, pre-season this was the chink in the armor that many thought could fell the reigning Western Conference champions. JaVale has done more than his part to change that tune.

    RANK PLAYER TEAM MINUTES BLOCKS BLOCKS / 36MIN
    1 JaVale McGee GSW 739 67 3.26
    2 Kyle O'Quinn NYK 1229 104 3.05
    3 Rudy Gobert UTA 2744 214 2.81
    4 Myles Turner IND 2541 172 2.44
    5 Hassan Whiteside MIA 2513 161 2.31
    6 Alex Len PHX 1560 98 2.26
    7 Anthony Davis NOP 2708 167 2.22
    8 Kristaps Porzingis NYK 2164 129 2.15
    9 Brook Lopez BKN 2222 124 2.01
    10 Giannis Antetokounmpo MIL 2845 151 1.91
    11 DeAndre Jordan LAC 2570 134 1.88
    12 Robin Lopez CHI 2271 117 1.85
    13 Serge Ibaka TOR 2422 124 1.84
    14 Kevin Durant GSW 2070 99 1.72
    15 Draymond Green GSW 2471 106 1.54
    16 Mason Plumlee DEN 2147 92 1.54
    17 Dwight Howard ATL 2199 92 1.51
    18 Marc Gasol MEM 2531 99 1.41
    19 DeMarcus Cousins NOP 2465 93 1.36
    20 Gorgui Dieng MIN 2653 95 1.29
    Blocks Per 36 Min

    Source: NBA.com/stats

    Watching segments like this one, it's not hard to see how JaVale's impact on both sides of the ball, coupled with Golden State's long-ball wizardry, leads to league-leading +/- statistics.

    Defensive Win Shares

    Win Shares is a player statistic which attempts to divvy up credit for team success to the individuals on the team.

    RANK PLAYER TEAM MIN DEF WS DEF WS / 36MIN
    1 Draymond Green GSW 2471 4.7 0.0685
    2 Patty Mills SAS 1754 3.3 0.0677
    3 Stephen Curry GSW 2638 4.6 0.0628
    4 Rudy Gobert UTA 2744 4.7 0.0617
    5 Andre Iguodala GSW 1998 3.4 0.0613
    6 Klay Thompson GSW 2649 4.5 0.0612
    7 Kevin Durant GSW 2070 3.5 0.0609
    8 James Johnson MIA 2085 3.2 0.0553
    9 Victor Oladipo OKC 2222 3.4 0.0551
    10 Gordon Hayward UTA 2516 3.8 0.0544
    11 JaVale McGee GSW 739 1.1 0.0536
    12 Anthony Davis NOP 2708 4 0.0532
    13 Jrue Holiday NOP 2190 3.2 0.0526
    14 Paul Millsap ATL 2343 3.4 0.0522
    15 LaMarcus Aldridge SAS 2335 3.3 0.0509
    16 Solomon Hill NOP 2374 3.3 0.0500
    17 Andre Roberson OKC 2376 3.3 0.0500
    18 Jimmy Butler CHI 2809 3.9 0.0500
    19 DeAndre Jordan LAC 2570 3.5 0.0490
    20 Kawhi Leonard SAS 2474 3.2 0.0466
    Defensive Win Shares Per 36 Min

    Source: NBA.com/stats

    Rebounding

    JaVale’s defensive rebound statistics are impressive for a team that defaults its rebounds to pace-pushing forwards like Green, Iguodala, and Durant; similar to why we don’t see Houston or Oklahoma City bigs on this list. What really stands out is his adjusted OREB per-36, which would sit at #2 in the league at 4.87 if he played starter’s minutes. Many of the statistics we discussed above are due to offensive rebounding. JaVale doesn’t come into the game expecting to be integral to the Golden State offense aside from lobs to the rim, instead taking advantage of the top perimeter-shooting offense in the league and the way it pulls opposing defenders to the 3-point line. The Warriors hit quite a lot of their long balls, but when they don’t, JaVale has been there to pull down boards and put them back in with authority. This consistent combination of dominant physical athleticism and mental awareness is a JaVale the league hasn’t seen before.

    RANK PLAYER TEAM MIN REB / 36MIN OREB / 36MIN DREB / 36MIN
    1 Andre Drummond DET 2409 16.66 5.16 11.51
    2 DeAndre Jordan LAC 2570 15.60 4.17 11.43
    3 Hassan Whiteside MIA 2513 15.59 4.20 11.39
    4 Dwight Howard ATL 2199 15.39 4.85 10.54
    5 Rudy Gobert UTA 2744 13.58 4.12 9.46
    6 Jonas Valanciunas TOR 2066 13.23 3.94 9.29
    7 Nikola Vucevic ORL 2163 12.97 2.93 10.04
    8 Kevin Love CLE 1885 12.72 2.83 9.89
    9 Nikola Jokic DEN 2038 12.68 3.74 8.94
    10 Karl-Anthony Towns MIN 3030 11.96 3.52 8.45
    11 Marcin Gortat WAS 2555 11.96 3.35 8.61
    12 JaVale McGee GSW 739 11.89 4.87 7.01
    13 Anthony Davis NOP 2708 11.75 2.29 9.47
    14 DeMarcus Cousins NOP 2465 11.60 2.22 9.38
    15 Russell Westbrook OKC 2802 11.10 1.76 9.34
    16 Tristan Thompson CLE 2336 11.02 4.41 6.61
    17 Julius Randle LAL 2132 10.74 2.53 8.21
    18 Giannis Antetokounmpo MIL 2845 8.86 1.80 7.06
    19 Gorgui Dieng MIN 2653 8.78 2.55 6.23
    20 LeBron James CLE 2794 8.23 1.25 6.98
    Rebounding Per 36 Min

    Source: NBA.com/stats

    Have the Warriors turned JaVale McGee into an elite big man?
    To fully answer yes, we would need to see consistent starting minutes, something that most likely won’t happen with the current team structure. Did the Warriors solve their supposed rim-protection issues and in the process get an offensive juggernaut of a seven-footer on a minimum contract? Without a doubt. Stats adjusted for minutes played will always lead to hypothetical conclusions when extrapolated, and being surrounded by a team full of generational talent like the Warriors certainly makes everyone look better. What I can say is that Golden State has gotten above and beyond what it expected out of its veteran center, a player who nightly has a very tangible positive impact on a team surging towards its second title in three years. Through the first round of the playoffs JaVale shows no sign of slowing down in his quest to add even more on to what has been a career year.


    ARTICLE 2
  • Underlying Structure Choices

    I want to take a second to talk about some of the choices I have made in regards to the structure of this site moving forward.


    Underlying Structure Choices

    I want to take a second to talk about some of the choices I have made in regards to the structure of this site moving forward. These days there are endless paths to go down depending on what kind of site or blog you are trying to set up. Packages such as Pelican and Hyde make it extremely easy to write, publish, and push content in a simple, effective, and reproducible way, and platforms like Wordpress and GitHub Pages make publishing static content a breeze for beginners and experts alike.

    Normally, since I plan to consistently produce material, I would go with one of these templated options, but the point of this site is much more than simply a blog. First off, there are portions of this site that will not be static, and that pretty much rules out any of the former options. Second, a large part of this site is about design, not just function, and I want complete control over every line of code and pixel, even if that makes each post's creation a little more tedious and drawn-out.

    ........

    ARTICLE 1
  • Mercurial Analytics:
    Iteration Zero

    Hi there! This is the first article on Mercurial Analytics. It isn't data, analytics, or programming related. This is an introduction to what I do here and what the point of this website is, which, although still quite vague and abstract, is starting to coalesce into something tangible.

    UNFINISHED


    Mercurial Analytics:
    Iteration Zero

    Hello. My name is Cole Page.

    ARTICLE 0
"TO CONDENSE FACT FROM THE VAPOR OF NUANCE"