By Abhisesh Silwal
Bud counts, or node counts, are important dormant-season measurements that influence how severely to prune for yield management. Pruners often use the count of buds or nodes to decide where to prune and how many buds to retain after pruning. Bud counts are also used to calibrate mechanical pre-pruners for desired outcomes. Currently, these counts are made by hand, one bud at a time, and are therefore seldom practiced. So the question here is whether computers can speed up the process by counting nodes from images.
Before answering that question, let’s see how computers actually detect objects in images. Deep learning, which is built on deep neural networks, is a hot research topic in computer vision. Deep neural nets are becoming increasingly popular and are taking over many artificial intelligence tasks, including but not limited to language understanding, image recognition, and autonomous driving. A core component of the deep neural nets used for vision is the convolutional neural network (CNN). In brief, CNNs use special filters to extract features from images, and the computer learns those filters by itself during the training process. The architecture stacks multiple layers of convolution along with other specialized layers, hence the term “deep”. The outcome of this process is often referred to as a feature map. Localization of objects, i.e. finding where the object of interest is in a given image, builds on top of this feature map. Again in short, each region of interest in the feature map goes through a classifier (a technique to decide what type of object it is) and a bounding box regressor (literally, to draw a box around the object).
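To make the idea of a filter producing a feature map concrete, here is a minimal sketch in plain Python. It is only an illustration, not our detection system: the toy image, the hand-picked vertical-edge filter, and the function name `convolve2d` are all assumptions made for this example. In a real CNN the filter values are learned during training rather than chosen by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch under the filter.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 4x4 grayscale image (illustrative): dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked 2x2 filter that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The feature map is largest in the middle column, exactly where the image changes from dark to bright. A deep network stacks many such filters in layers, so later layers can respond to more complex shapes than simple edges.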
To detect buds in images, our vision team at CMU adopted the process described above. Deep neural networks are a data-driven approach and require a large amount of data to train. As described in our earlier blog post, we imaged dormant vines during the winter of 2017 and hand-labeled a large data set of training samples from multiple vine varieties.
Figure: The result of the deep-learning-based vision system for detecting buds in images. Each yellow rectangle encloses an individual node. Riesling (left), Vignoles (right).
Typically, a node consists of a compound bud with a primary, secondary and tertiary shoot. In normal healthy vine development, we assume the primary shoot will grow and bear fruit and suppress the auxiliary shoots from growing. If the primary shoot somehow gets damaged, the auxiliary shoots can grow as a replacement for the primary. However, the secondary and tertiary buds generally bear significantly less fruit than the primary. Thus, in this study, we assume that each node has a single compound bud which will grow one fruiting shoot. The technique described above requires approximately 0.5 seconds of computation time per image, and typically 3 to 4 images are enough to cover an entire vine, which significantly reduces counting time. We are further testing this system at a larger scale at CLEREL.
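As a quick back-of-the-envelope check on the timing figures quoted above, the short sketch below works out the computation time per vine. The constant and function names are made up for this example, and the numbers cover computation only, not the time needed to capture the images.

```python
SECONDS_PER_IMAGE = 0.5   # approximate computation time per image, from the text
IMAGES_PER_VINE = (3, 4)  # typical number of images needed to cover one vine


def seconds_per_vine(num_images, seconds_per_image=SECONDS_PER_IMAGE):
    """Estimated computation time to process every image of one vine."""
    return num_images * seconds_per_image


low, high = (seconds_per_vine(n) for n in IMAGES_PER_VINE)
print(f"Roughly {low:.1f} to {high:.1f} seconds of computation per vine")
```

In other words, the computation for a whole vine should take roughly 1.5 to 2 seconds.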