Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences

Image cognition using contour curvature statistics

Andrew Marantan¹, Irina Tolkova² and L. Mahadevan¹,²,³,*

¹Department of Physics, Harvard University, Cambridge, MA 02138, USA

²School of Engineering and Applied Sciences, Harvard University, Cambridge, MA 02138, USA

³Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA

*Corresponding author: [email protected]

    Abstract

    Drawing on elementary invariance principles, we propose that a statistical geometric object, the probability distribution of the normalized contour curvatures (NCC) in the intensity field of a planar image, has the potential to categorize objects. We show that NCC is sufficient for discriminating between cognitive categories such as animacy, size and type, and demonstrate the robustness of this metric to variation in illumination and viewpoint, consistent with psychological experiments. A generative model for producing artificial images with the observed NCC distributions highlights the key features that our metric captures, and those that it does not. More broadly, our study points to the need for statistical geometric approaches to cognition that build in both the statistics and the natural invariances of the visual world.

    1. Introduction

    Humans and other primates recognize objects visually with remarkable speed and accuracy, forming neural representations in the inferotemporal (IT) cortex that are spatially organized by cognitive categories, such as animacy, size, type (faces versus places), etc. How this happens is still not well understood. There are at least two ways to address this question: a bottom-up approach that seeks to understand the responsible neural areas and circuits through systematic microscopic studies, or a top-down approach that seeks an image representation agreeing with neural activation patterns in response to coarse-grained image features [1]. Of course, the ultimate goal is to merge these levels of understanding into a unified whole that accounts for the natural statistics of categories in the visual world, and links the development of the visual cortex with the evolutionary necessity to compress, classify and comprehend the external environment. At a high level, visual processing in primates is roughly divided across the ventral and dorsal pathways, with the former responsible for characterizing objects in the visual field (what), and the latter for guiding interactions with those objects (how) [2,3]. Within the ventral stream, object recognition is known to occur in the IT cortex [4,5]. Studies of this region using fMRI in both humans and non-human primates viewing different stimuli have uncovered multiple cognitive categories or dimensions—such as size (from small to large objects), animacy (from animals to inanimate objects) and body parts (such as faces, hands, bodies)—which elicit neural activation in different spatial domains of the IT cortex [6–9]. At the neural level, these analyses have iteratively simplified stimuli to recover ‘critical features’ which maximally activate a cell [10,11]. In some regions of the visual pathway, such features permit intuitive explanations—for instance, the middle temporal (MT) cortex and middle superior temporal (MST) cortex are known to represent visual motion through a collection of neurons encoding direction and speed [12,13]. Interestingly, within the intermediate subregions of the ventral cortical pathway leading to the IT cortex—areas V1, V2 and V4—the critical features associated with images of varying contours are found to represent not only simple properties, such as position and orientation, but also a higher-order property: curvature [14–17]; metrics based on this quantity are strongly correlated with neural dynamics [18,19].

    In psychology and psychophysics, the significance of contour curvature in perception has been alluded to at least since the 1950s [20], when it was suggested that perceptual information along a shape boundary is not equally distributed, but concentrated in regions of high curvature—a proposal that has since been supported both empirically and through an information-theoretic framework [21,22]. The observation that a single contour can be sufficient for recognition of some objects [23] has led to a number of studies of contour perception in humans and primates, e.g. hyper-acuity in the perception of curved lines [24–26], the near-ideal ability to detect contours resembling those in natural images [27], the ability to detect closed contours more easily than open contours [28] and the anti-correlation between the perception of closed contours and contour complexity [29,30]. Additionally, the mechanisms for discriminating contour curvature are selective for spatial frequency and orientation, agreeing very closely with studies of neural features [31], consistent with the evidence for orientational selectivity in the visual cortex and with the fact that curvature (i.e. the spatial rate of change of orientation) is a relatively easy feature to extract. Yet there is a gap between the understanding of the perception of contours and that of visual fields, driving the need for a broader mathematical framework to extend the analysis of contours to whole shapes and natural images [30].

    In machine vision, while the notion of curvature has been used widely [32–37], most image descriptors for object recognition and classification tasks are composed of histograms of pixel-based metrics—e.g. the histogram of oriented gradients (HoG) approach [38]. While several studies have combined HoG with locally binned histograms of curvature to show improved performance in numerical tasks, they use curvature only to augment existing methods [39,40], rather than as a distinct metric. A promising alternative is that of curvature scale space (CSS)—a representation of shape through contour curvature calculated at different magnitudes of Gaussian smoothing [41]—which has proved efficient and successful in problems of corner detection, clustering, shape indexing and retrieval, and silhouette-based object recognition [42–45]. However, while CSS is rooted in and motivated by the mathematical invariances desirable for shape analysis, it does not seem to have been extended to quantifying two-dimensional images, or to our understanding of biological perception. Finally, and most recently, while neural networks have been successful at solving many problems in computer vision [46], and are promising models of biological processing [47,48], they do not usually provide an interpretable understanding of the intermediary image representations that underlie perception.

    Given these insights from neurobiology, psychology and computer vision, how might one construct a computationally meaningful and interpretable image descriptor? Here, guided by the need for a description invariant under Euclidean motions, we propose the use of a simple statistical geometric measure, the ‘normalized contour curvature’ (NCC) distribution: a probability distribution of curvatures within smoothed natural images. We construct the NCC through pooling of nonlinear transformations of an image’s contour curvature content—a simple calculation with plausible implementability within neural circuitry—emphasizing that it is important to consider a statistical measure of this geometric quantity given the noisy nature of images. We show that the NCC satisfies certain desired properties of shape characterization, use it to interpret example stimuli, and demonstrate that this metric carries sufficient information content for distinguishing between cognitive categories. Finally, we derive a generative model for constructing images corresponding to a given NCC distribution, which helps us understand when the metric works and, perhaps more importantly, when it does not.

    2. Contour curvature computation

    For the compression, classification and comprehension of images treated as shapes, any meaningful perception mechanism should satisfy some basic invariances: (i) global translation invariance, i.e. shape is independent of location in space; (ii) global rotation invariance, i.e. shape is independent of orientation; (iii) resizing/scale invariance, i.e. shape is independent of overall scale; and (iv) image representation invariance, i.e. shape is independent of rescaling the intensity map. While there are known cognitive exceptions (such as squares/diamonds, upside-down faces, etc.), these are specialized and we will not consider them here.

    For an image characterized by a two-dimensional intensity field, the contours of constant intensity typically form closed curves. In a smooth differential-geometric setting, the curvature of these curves is invariant to global translation and rotation and thus forms a natural candidate for an invariant description. Letting $f(x,y)$ represent the intensity at pixel $(x,y)$ and $f_x, f_y, f_{xx}, f_{yy}$ and $f_{xy}$ represent the first and second derivatives, we can write the contour curvature (CC) at point $(x,y)$ as

    $$\kappa = \frac{2 f_x f_y f_{xy} - f_y^2 f_{xx} - f_x^2 f_{yy}}{\left(f_x^2 + f_y^2\right)^{3/2}}. \tag{2.1}$$
    We note that this approach is different from using the intensity values of a given image to define a height map, computing the curvature tensor of the resulting surface, and thence the Gaussian or mean curvature at each pixel [49], although the two are of course related. Furthermore, we note that the calculation of curvature follows naturally from orientational information that the retina is well known to respond to; curvature is just the spatial variation in orientational information and can be deduced approximately via a differencing scheme analogous to a difference of Gaussians. Though the contour curvature is invariant under translation, rotation and intensity scaling, it is not invariant to scale changes. To overcome this issue, we take the largest dimension of an image to be of unit length, so that a circle fitting just inside the (square) image has radius $r = 1/2$ and hence curvature $\kappa = 2$. Finally, it is numerically useful to map the contour curvature, defined on the whole real line, to a finite interval, which we choose to be $[-1, 1]$. Thus, we define the normalized contour curvature (NCC) as
    $$\hat{\kappa} = \frac{\kappa}{\kappa_{1/2} + |\kappa|}. \tag{2.2}$$

    The parameter $\kappa_{1/2}$ sets the value of $\kappa$ that maps to $\hat{\kappa} = 1/2$, and makes the NCC easier to interpret; by taking $\kappa_{1/2} = 2$, a circle with curvature $\kappa = 2$ (i.e. a circle inscribed in a square image) is mapped to $\hat{\kappa} = 1/2$. Note that this definition has a closed-form inverse, $\kappa = \kappa_{1/2}\,\hat{\kappa}/(1 - |\hat{\kappa}|)$.

    In a discrete computational setting, pixel intensities can be used to construct level sets of constant intensity, and the curvature of these contours can be computed at every pixel. While rescaling the intensity map does change the intensities of the contours, it does not change their overall shape, so this method is manifestly invariant under intensity rescaling. To obtain a smooth surface interpolant from the image intensities, we filter the image using a Gaussian kernel with zero mean and standard deviation $\rho$; this lets us avoid the computational task of constructing the contour passing through each pixel, since we can instead calculate the contour curvature directly from numerical derivatives [50] of the filtered image intensity by applying equation (2.1). This produces an ‘image’ of the contour curvature, which can be converted to the normalized contour curvature via equation (2.2) (figure 1).
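    To make this pipeline concrete, a minimal Python sketch of the per-pixel NCC computation (equations (2.1) and (2.2)) might read as follows. This is illustrative rather than the released code: we assume $\rho$ is specified as a fraction of the largest image dimension, use finite differences for the derivatives, and add a small epsilon to regularize flat regions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ncc_image(f, rho=0.04, kappa_half=2.0, eps=1e-12):
    """Per-pixel normalized contour curvature (a sketch of eqs (2.1)-(2.2)).

    f          : 2D array of pixel intensities
    rho        : Gaussian filter width as a fraction of the largest image
                 dimension (assumed convention)
    kappa_half : curvature value mapped to NCC = 1/2 (kappa_{1/2} in eq. (2.2))
    """
    n = max(f.shape)
    g = gaussian_filter(f.astype(float), sigma=rho * n)

    # Derivatives with the largest image dimension taken as unit length,
    # so that a circle inscribed in the image has curvature kappa = 2.
    h = 1.0 / n
    fy, fx = np.gradient(g, h)       # np.gradient returns (d/d_row, d/d_col)
    _, fxx = np.gradient(fx, h)
    fyy, fxy = np.gradient(fy, h)

    # Contour curvature, eq. (2.1); eps avoids division by zero in flat regions.
    kappa = (2 * fx * fy * fxy - fy**2 * fxx - fx**2 * fyy) \
            / (fx**2 + fy**2 + eps) ** 1.5

    # Normalized contour curvature, eq. (2.2): maps the real line into (-1, 1).
    return kappa / (kappa_half + np.abs(kappa))
```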

    Figure 1.

    Figure 1. Definition of normalized contour curvature (NCC). (a) The NCC is defined (by equation (2.2)) such that its value along a circle inscribed in a square of side length 1 is mapped to 1/2. (b) When calculating NCC, we consider the level curves of a three-dimensional surface defined by the pixel intensities of an image. This figure shows the level curves for the lightbulb image in (c). (c) This sequence shows the pipeline for calculating NCC. Starting with an image of a lightbulb, we apply Gaussian smoothing, calculate NCC for each pixel following equation (2.2), and finally histogram the values to derive a probability density.

    Finally, we use the normalized contour curvature image to construct a histogram for the original image, which we then convert to a probability density to produce the NCC distribution. We use equally spaced bins spanning from $\hat{\kappa} = -1$ to $\hat{\kappa} = 1$, choosing an odd number of bins in order to have a bin centred on $\hat{\kappa} = 0$. In order to count only curvatures corresponding to the object in the image, we ignore pixels corresponding to background elements. Altogether, this conceptual framework and image processing pipeline is shown in figure 1, with additional details provided in Section A of the electronic supplementary material.
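    A corresponding sketch of the histogramming step (function name and mask convention are hypothetical); the object mask would come from the dataset’s segmentation, e.g. by excluding the white background pixels.

```python
import numpy as np

def ncc_histogram(ncc, mask=None, n_bins=101):
    """Convert a per-pixel NCC image into a probability density over [-1, 1].

    An odd n_bins places one bin centred on NCC = 0; mask (boolean array)
    selects object pixels, excluding the background.
    """
    vals = ncc[mask] if mask is not None else ncc.ravel()
    density, edges = np.histogram(vals, bins=n_bins, range=(-1, 1), density=True)
    return density, edges
```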

    3. Bayesian image classification of animacy and size

    To evaluate the role of the NCC as an image classifier, we investigate whether the NCC metric carries sufficient information content for a simple binary classifier to distinguish between different cognitive categories. Our data were drawn from those used in fMRI studies on human participants who were shown real images distinguished by features such as animacy and size [7], which led to the fMRI-based detection of spatially localized responses in the IT cortex, and from an artificial image set [51] of ‘texforms’—images which, while designed to be unrecognizable, preserve enough low-level structure to elicit a neural response similar to the images from which they were generated (figure 3).

    (a) Methods

    For the natural images, the stimulus set contained 60 objects in each of four subcategories [7]—small animate, small inanimate, large animate and large inanimate—for a total of 240 images. The animate images spanned a large part of the phylogenetic tree of animals, including mammals, reptiles, birds and fish; the inanimate objects featured everyday items varying from a thimble to a firetruck. In all images, objects were centred on a white background. Examples of images from each category, together with their NCC, are shown in figure 2.

    Figure 2.

    Figure 2. Computing NCC. NCC probability densities calculated for example images from the stimulus dataset presented in [7], across four categories: a moose (large animate), mouse (small animate), telephone booth (large inanimate) and computer mouse (small inanimate). Notice that the prevalence of straight lines in the intensity contours of the telephone booth and computer mouse results in peaks near $\hat{\kappa} = 0$, and that both mice contain higher probability mass for intermediate positive values (around $\hat{\kappa} = 0.5$). We find both of these features to be characteristic of animacy and size (figure 4).

    To classify a given image into one of two predefined classes $C_1$ or $C_2$, we use a slightly modified log-likelihood scheme that relies on the supervised learning of two probability distributions, $P(\hat{\kappa}|C_1)$ and $P(\hat{\kappa}|C_2)$, representing the probability for a pixel in a given image to have normalized curvature $\hat{\kappa}$ given that it belongs to either $C_1$ or $C_2$. In practice, we bin the normalized curvatures, and the aforementioned distributions become probability vectors: $P(\hat{\kappa}|C_1) = \mathbf{p}^{C_1}$ and $P(\hat{\kappa}|C_2) = \mathbf{p}^{C_2}$, where the $n$th element $p_n^{C_1}$ (or $p_n^{C_2}$) describes the probability for $\hat{\kappa}$ to be in the $n$th bin.

    To construct $\mathbf{p}^{C_1}$ (or $\mathbf{p}^{C_2}$), we simply calculate NCC histograms for all images in the $C_1$ (or $C_2$) training set, add them together, and normalize by the total number of counts. Then, we can classify an image by calculating its NCC histogram $p_n$, dividing by the total number of counts to obtain $q_n$, and computing the log-likelihood:

    $$\mathcal{L} = \sum_{n=1}^{N} q_n \log\!\left[\frac{p_n^{C_1}}{p_n^{C_2}}\right], \tag{3.1}$$
    where $N$ is the number of bins. If $\mathcal{L} > 0$, the image is classified as belonging to $C_1$; if $\mathcal{L} \leq 0$, it is classified as belonging to $C_2$. We note that $\mathcal{L}$ can also be thought of as the difference between the Kullback–Leibler (KL) divergence of $q$ from $\mathbf{p}^{C_1}$ and the KL divergence of $q$ from $\mathbf{p}^{C_2}$. Although we only use the sign of $\mathcal{L}$ for predicting the category of an image, the magnitude of $\mathcal{L}$ can inform the classification likelihood; large values of $|\mathcal{L}|$ are linked with a higher confidence of classification, while values with $|\mathcal{L}| \approx 0$ have a lower confidence.
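    In code, the training and classification steps of equation (3.1) reduce to a few lines; the following sketch (function names are our own) pools training histograms into class probability vectors and classifies by the sign of the log-likelihood, with a small epsilon guarding against empty bins.

```python
import numpy as np

def train_class_distribution(histograms):
    """Pool NCC count histograms from all training images of one class
    and normalize into a probability vector (p^C in the text)."""
    total = np.sum(histograms, axis=0).astype(float)
    return total / total.sum()

def log_likelihood(q, p_c1, p_c2, eps=1e-12):
    """Eq. (3.1): L > 0 assigns the image to C1, L <= 0 to C2.
    q is the normalized NCC histogram of the test image."""
    return np.sum(q * np.log((p_c1 + eps) / (p_c2 + eps)))
```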

    In addition to analysing natural images, we also considered the paired image/texform stimulus set introduced in [51], which contains a dataset of 120 objects (30 from each of the four size/animacy sub-categories), along with corresponding texforms. We perform the same pre-processing steps as described previously, with one difference in the processing of the background, whereby we take advantage of the provided green-screen variant to isolate the background pixels and set them to white to match prior analysis; applying this same ‘background mask’ introduces an outline and background to the texform.

    (b) Natural image classification

    To test the efficacy of the normalized contour curvature as an image statistic and visualize the distinctions between classes, we compare the mean and variance of NCC distributions between classes in figure 3, and separately for each sub-class in figure 4. We highlight two important features. First, unlike animals, inanimate objects have a high prevalence of straight lines and edges, leading to a peak at $\hat{\kappa} = 0$. Second, small objects contain a higher density of intermediate curvature values, while the NCC of larger objects is more heavily concentrated at the ends of the distribution. This is likely because the characteristics and details of small objects are proportionately larger, resulting in lower absolute curvatures, while the fine-scale detail of images of large objects results in higher absolute curvatures.

    Figure 3.

    Figure 3. Cognitive categories of size and animacy show characteristic NCC histograms. The solid lines indicate NCC probability densities for each of the four image categories from the stimulus dataset presented in [7], and the shaded region indicates data that are within one standard deviation. We see that NCC for the inanimate category is characterized by high density at $\hat{\kappa} = 0$ (representing the amount of straightness in the image) and NCC for the small category is characterized by slightly higher probability density in intermediate positive NCC values (around $\hat{\kappa} = 0.5$).

    Figure 4.

    Figure 4. Pairwise cognitive categories of size and animacy also show characteristic NCC histograms. Here, we consider distributions for each of the four sub-categories of images from the stimulus dataset presented in [7], separated both by size and by animacy. We observe similar characteristics as described in figure 3, though the distinction between large and small is much more prominent for inanimate images, which is consistent with the tripartite cognitive organization found in [7].

    To classify animacy within both large and small objects, and size within both animate and inanimate objects, we ran 1000 randomized trials for each task, adhering to a 30%/70% training/testing split; a sketch of this protocol appears below. Our aggregate results are shown in figure 5. From the comparison of true positive and false positive rates in figure 5, we see that these distinctions are sufficient for a simple Bayesian classifier to distinguish animacy within both large and small objects, and size within inanimate objects (with poorer performance on classifying size within animals). This general tripartite organization—of small objects, animals and large objects—is consistent with studies of neural activation within the occipito-temporal cortex for human observers [7].
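    The randomized trial protocol, reusing the classifier sketch above; the 30%/70% split and the 1000 trials follow the text, while the bookkeeping details are our own.

```python
import numpy as np

def run_trials(hists_c1, hists_c2, n_trials=1000, train_frac=0.3, seed=0):
    """True/false positive rates over randomized train/test splits.
    hists_c1, hists_c2: lists of per-image NCC count histograms."""
    rng = np.random.default_rng(seed)
    rates = []
    for _ in range(n_trials):
        i1, i2 = rng.permutation(len(hists_c1)), rng.permutation(len(hists_c2))
        k1, k2 = int(train_frac * len(i1)), int(train_frac * len(i2))
        p1 = train_class_distribution([hists_c1[i] for i in i1[:k1]])
        p2 = train_class_distribution([hists_c2[i] for i in i2[:k2]])
        # Fraction of held-out C1 images classified as C1 (true positives)
        tp = np.mean([log_likelihood(hists_c1[i] / hists_c1[i].sum(), p1, p2) > 0
                      for i in i1[k1:]])
        # Fraction of held-out C2 images classified as C1 (false positives)
        fp = np.mean([log_likelihood(hists_c2[i] / hists_c2[i].sum(), p1, p2) > 0
                      for i in i2[k2:]])
        rates.append((tp, fp))
    return np.array(rates)  # columns: true positive rate, false positive rate
```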

    Figure 5.

    Figure 5. Performance of a Bayesian classifier based on NCC. We examine the accuracy of a binary classifier (equation (3.1)) by visualizing two-dimensional histograms of the false positive rate against true positive rate over 1000 randomized trials. Classification is performed separately across the four categories for the stimulus dataset presented in [7]; for instance, the top right histogram shows results for binary classification tasks in which images with large objects are classified as animate or inanimate. We find that it is significantly more difficult to differentiate large animate and small animate images, while the other categories have high classification accuracy. These results are consistent with the tripartite domain separation observed in [7] across large inanimate, small inanimate and animate objects.

    The failures of the classifier are clearly exposed by its misclassified samples; false positives and false negatives for each of the four classification tasks are shown in figure 6. Interpreting these mistakes of the NCC-based classifier is illuminating. For instance, in classifying large objects by animacy, most errors occur in inanimate objects with thin protruding ‘appendages’. This is likely because our image processing approach causes the thin components to fade under Gaussian blurring, resulting in high-magnitude curvatures at their endpoints, which align more closely with the animate distributions than with the characteristically straight, curvature-free inanimate distributions. When classifying small objects, most errors occur in inanimate objects, such as a pinecone and floral-patterned household items, all of which have rounded features and thus a resemblance to animacy. On the other hand, the snail is a false negative, suggesting that its shell is uncharacteristic of animate images. In this context, we find that the most difficult task is distinguishing size within animate images: misclassified samples include both large animals with the rounded shape usually associated with small objects, and small animals with more irregular and elongated shapes. Inanimate objects can more successfully be separated by size, and the mistakes often correspond to box-like large objects and elongated small objects.

    Figure 6.

    Figure 6. Classification errors using NCC are informative about cognitive categories of images. To understand the sources of error within our classifier (equation (3.1)), we consider and interpret misclassified images. Specifically, we show a false negative and a false positive sample for each of the four classification tasks. We find that these mistakes are consistent with our observed differences in large/small and animate/inanimate NCC distributions. Since small objects are characterized by a peak in intermediate positive curvature, large objects are often misclassified when they have a rounded form, and small objects are often misclassified when they have an elongated or irregular form. On the other hand, inanimate objects can be misclassified due to textured structure or ‘appendages’. Animate objects are rarely misclassified as inanimate.

    All our results so far use a particular width of the Gaussian blurring filter that removes high-frequency information, namely $\rho = 0.04$, and a training fraction of 30%. Classification success rates exhibit some dependence on $\rho$ and a weak dependence on the training fraction; performance suffers when $\rho$ is very small (image noise dominates) and when $\rho$ is very large (image details are entirely washed out), but the success rates are not sensitive to small changes in $\rho$ near the optimally performing value (see electronic supplementary material, Section B, and figures S2, S3 and S4 for visualizations of these dependencies).

    (c) Texform image classification

    Images of natural objects are distinguished by their correlation statistics. Texforms attempt to scramble these [51] and thus serve as a test of our binary Bayesian classifier for distinguishing animacy and size. In figure 7, we show that texforms preserve features similar to natural images and, moreover, yield very similar classification results. The top row of figure 7 shows the NCC distributions for texforms across both animacy and size. While texforms do not retain the large straight or flat regions characteristic of inanimate images, we do see slightly higher zero-curvature content in the inanimate distribution, and find that this difference is sufficient for distinguishing animate from inanimate texforms with relatively high confidence. In addition, the primary difference between large and small texforms is in intermediate positive curvatures, as in natural images. The bottom row of figure 7 shows the true positive/false positive results from 1000 trials of a Bayesian classifier (see electronic supplementary material, Section B, and figure S5 for results across all four animacy/size subcategories).

    Figure 7.

    Figure 7. Cognitive categories of texforms show characteristic NCC histograms. Similar to figure 3, the top two subfigures show the mean ± s.d. of the NCC distributions for texforms corresponding to images of animate/inanimate and large/small objects (adapted from [51]). The bottom two subfigures show the corresponding histograms of true positives and false positives from 1000 runs of a binary classifier (equation (3.1)).

    Overall, we see that the accuracy of the classifier for texforms is very similar to that for natural images; in fact, two of the categories yield marginally higher accuracy. Additionally, figure 8 shows false positive and false negative samples across each of the four categories. When classifying samples by animacy, we find that some images with considerable straight regions—such as the bull and bird—are confused for inanimate objects, while some highly textured samples—such as the stroller and wreath—are confused for animals. On the other hand, the samples mistakenly classified as small—such as the seal and armchair—are very rounded, contributing intermediate, positive curvatures, while the samples mistakenly classified as large—such as the Chihuahua and mouse running wheel—have a disproportionate amount of high or near-zero curvatures for their category.

    Figure 8.

    Figure 8. Classification errors using NCC are informative about cognitive categories of texforms. Following our analysis of natural images, we examine the errors made by a Bayesian classifier over the texform dataset (adapted from [51]). We find similar qualitative characteristics in misclassified samples to those described in figure 6: round large objects may be misclassified as small; inanimate objects with textured features may be misclassified as animate; animals are more difficult to classify by size.

    4. Normalized contour curvatures distribution as an image feature

    Our results so far suggest that NCC statistics are consistent with experimental observations of cognitive categories across size and animacy, and thus raise the question of their use as an image feature in other downstream neural classification tasks. Complementing fMRI studies with human subjects, studies on macaque monkeys that directly measure neural activity in the brain have shown spatial localization and distinct topographical ordering of the neural responses in the visual cortex [54]. These results show that stimulus classes such as alpha-numeric characters in the Helvetica font, Tetris-like shapes and simple cartoon faces [52], each of which is clearly distinguished by different geometric statistics, as shown in figure 9a,b, are topographically ordered within distinctive regions of the IT cortex. This raises the question: can normalized contour curvature reveal any structure across these categories? In line with standard image classification approaches, we can think of the calculated NCC distribution as a ‘feature vector’ representing an image. Thus, the NCC distribution can be combined with any other image features (such as average intensity, colour information, etc.) by concatenation into a larger feature vector.

    Figure 9.

    Figure 9. NCC provides an interpretation of the topographical ordering of image categories in monkey IT cortex. (a) Images of Tetris, Helvetica and cartoon face stimuli used by Srihasam et al. [52] to demonstrate proto-structure of the macaque IT cortex. (b) Spatial structure of neural activation measured in macaques in response to these stimuli, with cartoon faces in cyan, Helvetica fonts in blue, Tetris motifs in green and monkey faces in red (from Srihasam et al. [52]). (c) We apply principal component analysis to the NCC distributions across all stimuli. The first three components admit interpretation: component 1 represents the amount of zero-curvature content, or straightness; component 2 represents relative concave content; and component 3 represents relative convex content. (d) Visualizing the three-dimensional space of PCA coefficients corresponding to the first three components, we see clustering across categories consistent with the neural activation shown in (b).

    (a) Methods

    To visualize the relative clustering of NCC distributions across the different image categories, we project the features into a lower-dimensional space by applying principal component analysis (PCA). We first construct a data matrix in which each column corresponds to the NCC distribution of a specific image, and calculate its singular value decomposition (SVD). Then, any NCC distribution can be approximated as a linear combination of orthonormal basis vectors (components). As we will see, the first three principal components have a natural geometric interpretation, and the projection of the data vectors into the three-dimensional space spanned by these components allows us to evaluate separability across categories using the NCC as a feature vector.
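    A minimal sketch of this projection using the SVD; we centre the columns before decomposing, though the paper does not state whether centring was applied.

```python
import numpy as np

def ncc_pca(ncc_matrix, k=3):
    """Project NCC distributions onto their first k principal components.

    ncc_matrix: array of shape (n_bins, n_images), one NCC distribution
    per column. Returns the component basis and per-image coefficients.
    """
    mean = ncc_matrix.mean(axis=1, keepdims=True)
    centred = ncc_matrix - mean
    U, S, Vt = np.linalg.svd(centred, full_matrices=False)
    components = U[:, :k]              # basis vectors over curvature bins
    coeffs = components.T @ centred    # (k, n_images) projection coefficients
    return components, coeffs
```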

    (b) Results

    In calculating the NCC for these stimuli, we use a relative filter size of $\rho = 0.018$ (as for the analysis of other images, the choice of this parameter does not change our results qualitatively). In figure 9c, we see clear structure in the principal components. In particular, the first component, which captures 89% of the variation in the data, represents a simple peak at zero curvature, therefore describing the amount of ‘straightness’ in the data. The second component can be interpreted to represent concave content—from regions of the intensity surface which contain both positively and negatively curved contours—and the third component can be interpreted to represent convex content—which contains positive but not negative curvature content. This observation agrees with the finding that convexity and concavity drive differential neural responses to visual stimuli [53].

    Visualizing the projections of the images onto the first three principal components shown in figure 9d, we see that the Tetris pieces with straight sides and corners cluster tightly due to a high contribution from the first component. Similarly, the cartoon faces also cluster tightly due to their distinct structure—large positive curvatures (due to the round face), with small amounts of negative and zero-curvature content due to the eyes, nose and other facial features. Finally, the Helvetica glyphs span the space between the latter two categories, as they contain a varied combination of straight rectangular portions and curved segments. In particular, a few are structurally very similar to ‘Tetris pieces’ (such as the letters ‘I’, ‘L’ and ‘H’). This ordering is particularly evident when we project the images onto the first two singular vectors; in fact, it roughly matches the topographical organization observed in the brain (see [52], fig. 3). Overall, we see that using the NCC statistics as a feature vector allows for linear separability of image classes in terms of a few geometrically interpretable principal components. We note that NCC permits interpretable geometric explanation for qualitative dimensions of ‘animate–inanimate’ and ‘stubby–spiky’ discovered by an artificial neural network in [54]: animate–inanimate distinctions are defined by zero-curvature content, while stubby–spiky distinctions are defined through high-curvature content.

    5. Normalized contour curvatures as a classifier under varying illumination and viewpoint

    Robust cognitive classification based on vision not only needs to be invariant to transformations of scaling, translation and rotation, but also to variations in illumination and viewpoint. In the current context, we can ask whether classifiers based on NCC statistics are robust to these variations. While it is difficult to provide theoretical guarantees, we performed an experimental evaluation by quantifying the variation in NCC across illumination/viewpoint changes for individual objects, and comparing against inter-object variation using the Amsterdam Library of Object Images (ALOI) [55]. The ALOI is a dataset of 1000 household items, systematically photographed under varying viewing angle, lighting direction and lighting colour, for a total of 24 lighting conditions and 72 viewpoints; samples are shown in figure 10a,c.

    Figure 10.

    Figure 10. Robustness of NCC to viewpoint and illumination. We examine robustness of NCC to two additional perceptual factors—viewpoint and illumination—by analysing images of household objects taken under varying conditions from the Amsterdam Library of Object Images (ALOI) [55]. We consider six objects: a shoe, spool of thread, cat statue, small clock, tennis shuttlecock/birdie and a seashell. After calculating NCC for images of the objects taken at varying conditions (varying viewpoint in the top row, or illumination in the bottom row), we project the NCC distributions to two dimensions through multi-dimensional scaling (MDS). The relative clustering of points in two dimensions can be considered to be a representation of the similarity of the NCC distributions for a given object. For instance, in the analysis of viewpoint variation, we find the birdie (black) and the spool of thread (green) have a very tightly clustered distribution, while objects such as the clock (cyan) and the seashell (purple) do not. This is understandable, as the former pair are radially symmetric, while the latter pair are difficult to recognize from the back. In the analysis of illumination variation, we find that reflective objects (such as the clock and shell) once again have more variance in the distributions, while objects with a more matte texture (such as the shoe and cat) have more consistent NCC.

    (a) Methods

    For our analysis, we choose six example object categories from the ALOI dataset, and randomly sample 15 images of varying lighting and viewpoint per object. To measure the similarity between probability distributions, we calculate the Jensen–Shannon divergence—a symmetric and smooth extension of the KL divergence—between the NCC distributions for each pair of images. We then perform a multi-dimensional scaling (MDS) analysis to find a two-dimensional mapping which best preserves the ‘distances’ between every pair of image samples, allowing us to visually evaluate inter- and intra-object similarity under varying illumination and viewpoint conditions.
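    A sketch of this computation; we use scipy’s jensenshannon (which returns the square root of the Jensen–Shannon divergence) and scikit-learn’s MDS as plausible stand-ins for whatever implementations the authors used.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.manifold import MDS

def mds_embedding(ncc_dists, seed=0):
    """2D MDS embedding of pairwise Jensen-Shannon distances.
    ncc_dists: list of NCC probability vectors, one per image."""
    n = len(ncc_dists)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = jensenshannon(ncc_dists[i], ncc_dists[j])
    mds = MDS(n_components=2, dissimilarity='precomputed', random_state=seed)
    return mds.fit_transform(D)  # (n, 2) coordinates preserving distances
```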

    (b) Results

    In figure 10b,d, we show that the two-dimensional representations derived from NCC distributions exhibit consistent, and almost separable, clustering of images within object categories under variations in viewpoint and illumination. This is notable, as NCC is a simple metric which retains no correlative spatial information for an image. For example, in the case of illumination, there is larger variability for the seashell—for which some illumination angles cause part of the shell to fall into deep shadow, making it difficult to recognize—and the clock—in which the light casts shadows through the glass surface. In the case of changing viewpoint, we see very low variability for the spool of thread and the shuttlecock, as these are radially symmetric, and higher variability for objects that look very different from behind, such as the cat and clock. Altogether, our results show that NCC is relatively robust to changes in illumination and viewpoint. Additionally, the spread of the clusters reflects sources of error that may be similar to those of human observers.

    6. A generative model for normalized contour curvatures distributions

    Our use of the normalized contour curvature distribution as a cognitive classifier has highlighted how its invariance properties allow for an interpretable differentiation between image categories. We now ask whether it is possible to create a generative model for constructing artificial images with specified NCC statistics related to known cognitive categories. To do so, we consider the theoretical distributions for the curvature statistics of a spatially varying Gaussian field with a correlation length distribution corresponding to a given NCC probability density, determined through an optimization procedure. This allows us to generate artificial images patch-by-patch with sampled correlation lengths, and thus compare the empirical, theoretical and generated NCC statistics (see electronic supplementary material, Section F for more details on Gaussian random fields).

    (a) Methods

    We start with the assumption that Gaussian-filtered natural images can be roughly described in terms of distinct local patches drawn from Gaussian-correlated Gaussian fields with different correlation lengths ξ. Formally, this can be expressed as

    $$P(\hat{\kappa}) = \int_0^{\infty} \mathrm{d}\xi \, P(\hat{\kappa}\,|\,\xi)\, P(\xi), \tag{6.1}$$
    where $P(\hat{\kappa})$ is the NCC distribution, $P(\xi)$ is the correlation length distribution for a given cognitive category (e.g. animate), and $P(\hat{\kappa}|\xi)$ represents the single-pixel distribution for the NCC given that the local patch is described by a Gaussian-correlated Gaussian field with correlation length $\xi$. In discrete form, the probability distribution $P(\hat{\kappa})$ becomes the probability vector $\mathbf{p}^{\hat{\kappa}}_{\mathrm{model}}$, $P(\xi)$ becomes $\mathbf{p}^{\xi}$, and $P(\hat{\kappa}|\xi)$ becomes $\Xi$, a matrix whose columns represent the NCC distribution for a Gaussian field with the corresponding correlation length $\xi$, such that
    $$\mathbf{p}^{\hat{\kappa}}_{\mathrm{model}} = \Xi\, \mathbf{p}^{\xi}. \tag{6.2}$$
    Formally, the elements of $\Xi$ can be written in terms of the conditional cumulative distribution function $C(\hat{\kappa}|\xi)$ (see electronic supplementary material, Section G):
    $$\Xi_{nm} = C(\hat{\kappa}_{n+1}\,|\,\xi_m) - C(\hat{\kappa}_n\,|\,\xi_m). \tag{6.3}$$
    Then, we fit the correlation length probability vector $\mathbf{p}^{\xi}$ by taking it to be the probability vector minimizing the Kullback–Leibler divergence of the model distribution $\mathbf{p}^{\hat{\kappa}}_{\mathrm{model}}$ from the measured distribution $\mathbf{p}^{\hat{\kappa}}_{\mathrm{data}}$:
    $$\mathbf{p}^{\xi}_{\mathrm{fit}} = \underset{\mathbf{p}^{\xi}}{\operatorname{arg\,min}}\; D_{\mathrm{KL}}\!\left(\mathbf{p}^{\hat{\kappa}}_{\mathrm{data}} \,\Big\|\, \mathbf{p}^{\hat{\kappa}}_{\mathrm{model}}\right). \tag{6.4}$$

    Note that this is a convex problem, which can be solved with standard optimization tools (we used the ‘cvxpy’ toolbox). Once we have extracted $\mathbf{p}^{\xi}_{\mathrm{fit}}$, we can generate artificial images. We first divide the image into patches of a chosen size. Then, for each patch, we draw a correlation length $\xi$ from $\mathbf{p}^{\xi}_{\mathrm{fit}}$, and fill in pixel values from a Gaussian-correlated Gaussian field with this correlation length, conditioning on pixels in the neighbouring patches to maintain continuity across patches. Finally, once all patches have been filled in, we threshold the image, ignoring all pixels with intensities less than $\sigma_f$ (where $\sigma_f$ is the standard deviation of the image intensities).
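    Since the fit is a convex problem solved with cvxpy, a minimal sketch of equation (6.4) might look as follows; the matrix $\Xi$ is assumed to be precomputed from the conditional CDFs of equation (6.3), and the function name is our own.

```python
import cvxpy as cp
import numpy as np

def fit_correlation_lengths(p_data, Xi):
    """Fit p^xi by minimizing D_KL(p_data || Xi @ p_xi), eq. (6.4).

    p_data : measured NCC probability vector (length n_curvature_bins)
    Xi     : (n_curvature_bins, n_xi_bins) matrix of conditional NCC
             distributions, one column per correlation length
    """
    p_xi = cp.Variable(Xi.shape[1], nonneg=True)
    p_model = Xi @ p_xi
    # cp.kl_div(x, y) = x*log(x/y) - x + y elementwise; summed over bins,
    # this equals the KL divergence when both arguments are normalized.
    objective = cp.Minimize(cp.sum(cp.kl_div(p_data, p_model)))
    problem = cp.Problem(objective, [cp.sum(p_xi) == 1])
    problem.solve()
    return p_xi.value
```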

    (b) Results

    In figure 11a, we show the fit for the correlation length distribution and see that it has a sparse three-peak structure. The first peak, occurring close to $\xi = 0$, is a boundary artefact: calculated NCC distributions have non-zero densities in the very first and last bins, which is theoretically impossible for a field with non-zero correlation length. The second peak captures the majority of the distribution, while the last peak contributes to the bump at intermediate positive curvature values (around [0.5, 0.75])—which, as we have argued earlier, is indicative of the circular characteristics of small objects. The relative weights associated with the peaks in the correlation length distribution allow $P(\xi)$ to effectively capture the differences across both dimensions of animacy and size. Inanimate images have an additional peak at the maximum normalized correlation length, corresponding to a peak at $\hat{\kappa} = 0$, while large images have higher second peaks relative to third peaks, indicating greater probability density at the edges of the NCC distribution (see electronic supplementary material, Section E, figure S7, for additional intuition and analysis of the structure of $P(\xi)$). In figure 11b, we show that there is a good match between the NCC distribution of images from a category (e.g. animate), the theoretical distribution computed using the above optimization procedure, and the NCC of the images created using the generative algorithm. However, when the intensity fields produced by the generative approach are visualized, as shown in figure 11c, we note that they do not correspond to either natural images or even texforms. This is because the simple generative procedure captures only local—not global—spatial correlations, leading to generated images that exhibit smooth characteristics but irregular global geometric and topological structure, and do not represent individual objects.

    Figure 11.

    Figure 11. Generative model for NCC distributions. (a) Correlation length probability distribution for the animate image class, extracted according to equation (6.4). (b) Comparison of the animate NCC distribution, the model fit $\mathbf{p}^{\mathrm{A}}_{\mathrm{fit}}$ from equation (6.2) and the NCC distribution calculated from 100 thresholded (at an intensity of $\sigma_f$) artificial images. (c) Three images generated by constructing a Gaussian-correlated Gaussian random field with correlation lengths drawn from the distribution in (a).

    7. Discussion

    There is a need for simple, interpretable, properly invariant metrics to describe cognitive categorization and thereby illuminate the underlying mechanisms in both artificial neural networks and the neural processing behind human cognition. Our use of the normalized contour curvature distribution aims to rejuvenate an old geometric idea, but in a probabilistic setting, recognizing that we need a metric that is invariant to the Euclidean group of rotations and translations, as well as to intensity scaling, while accounting for statistical variability within and across images. We have shown that a simple metric based on the NCC distribution is consistent with multiple experimental findings in humans and monkeys: it can distinguish between cognitive categories, is robust to changes in lighting conditions and viewpoint, and has plausible implementability in neural circuitry, given that curvature has roots in characterizing orientational information that can be pooled.

    Though the origins of neural networks are deeply rooted in neurobiology, many recent computational methods have diverged from their biological motivations, inspiring questions about how to reconnect these disciplines [56–58], both in a biological context and in the development and refinement of computer vision. For instance, encoding transformation invariances within network architectures has been shown to improve performance and sample complexity in some computer vision tasks [59,60]. Moreover, curvature has been incorporated into deep networks for recognition of three-dimensional objects [61]. Additionally, while efforts to understand and interpret trained deep learning systems are still underway [62,63], a number of studies point to the significance of curvature: e.g. convolutional neural networks have revealed tuning of neurons to boundary curvature within simple images [64], curvature-driven mid-level network structure [65] and differences in the perception of curvature between human vision and computer vision [66].

    Our study of biologically plausible image representations using normalized contour curvature distributions is a step towards improving interpretability using a probabilistic framework for a simple geometric object. By considering intensity level-set contours within an object, NCC extends the study of boundary contour stimuli to the perception of natural images. We note that the curvature of internal contours is likely to be correlated with that of the boundary contour—but there are also many examples where this is not the case. For instance, within the illumination dataset demonstrated in figure 10, the outlines of some objects blend into the background and are therefore not well defined, but the objects are still identifiable by visible internal structure. On the other hand, while the boundary of the floral-patterned cup in figure 6 (top right) has the low curvature associated with inanimacy, its internal texture likely contributes to its misclassification as an animate object. Further work could consider the integration of NCC within network architectures, examine correlations of input NCC with corresponding layer activations to understand primitives learned by networks, and assess the consistency of cognitive categorization between NCC and deep learning.

    One limitation of the proposed NCC distribution as a metric is that it does not account for the spatial distribution of curvature content. When perceiving complex natural scenes, or even more detailed objects, we clearly rely on non-local and/or higher-order statistics of the visual field [67]. To account for this, future development of this metric could integrate NCC within a pyramid framework, in which a feature descriptor is applied at varying locations and scales [68]. However, it has also been shown that the neural response to an image is not unique to that image: by estimating the size of receptive fields, it is possible to construct artificial images with distorted peripheral intensities which are perceptually indistinguishable for a human observer, known as metamers [69], as well as alternative artificial stimuli, such as texforms [51,70]. Both metamers and texforms preserve some (scale-dependent) measures of spatial correlations in contour curvature, and may be analogous to cubist, pointillist and other styles of representation in art. Our analysis sits at one extreme of the coarse-graining of spatial information and suggests that local pooling of neural activation is an important aspect of image processing, consistent with our finding that the NCC distributions of modified texforms contain defining curvature characteristics across animacy and size similar to those of natural images. Finally, our generative model for NCC highlights a particularly difficult problem by failing to capture global geometric and topological considerations in images; how we might augment a statistical-geometric approach such as the one used here with ideas from integral geometry [71] remains an open question.

    Data accessibility

    The code and materials are available from the Zenodo digital repository: https://zenodo.org/badge/latestdoi/339746850 [72].

    The data are provided in electronic supplementary material [73].

    Authors' contributions

    A.M.: conceptualization, formal analysis, investigation, methodology, software, validation, visualization, writing—original draft; I.T.: data curation, formal analysis, investigation, methodology, software, validation, visualization, writing—original draft, writing—review and editing; L.M.: conceptualization, formal analysis, funding acquisition, methodology, project administration, resources, supervision, validation, writing—original draft, writing—review and editing.

    All authors gave final approval for publication and agreed to be held accountable for the work performed therein.

    Conflict of interest declaration

    We declare we have no competing interests.

    Funding

    This work was supported in part by the US National Science Foundation (grant no. DMS-1764269 to A.M., I.T. and L.M.), the Simons Foundation (L.M.) and the Henri Seydoux Fund (L.M.).

    Acknowledgements

    We thank Margaret Livingstone for asking a question that launched this study, and Talia Konkle for providing us with the animate/inanimate and large/small image sets and both of them for discussions.

    Footnotes

    Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.6644081.

    These authors are first co-authors and contributed equally to this study.

    Published by the Royal Society. All rights reserved.