
In this tutorial you will learn how to train a custom deep learning model to perform object detection via bounding box regression with Keras and TensorFlow.

Today's tutorial is inspired by a message I received from PyImageSearch reader Kyle:

Hi Adrian,

Thank you for your four-part series of tutorials on region proposal object detectors. It helped me understand the basics of how R-CNN object detectors work.

But I'm a bit confused by the term "bounding box regression." What does that mean? How does bounding box regression work? And how does bounding box regression predict locations of objects in images?

Great questions, Kyle.

Basic R-CNN object detectors, such as the ones we covered on the PyImageSearch blog, rely on the concept of region proposal generators.

These region proposal algorithms (e.g., Selective Search) examine an input image and then identify where a potential object could be. Keep in mind that they have absolutely no idea if an object exists in a given location, only that the area of the image looks interesting and warrants further inspection.

In the classic implementation of Girshick et al.'s R-CNN, these region proposals were used to extract output features from a pre-trained CNN (minus the fully-connected layer head) and then were fed into an SVM for final classification. In this implementation the location from the region proposal was treated as the bounding box, while the SVM produced the class label for the bounding box region.

Essentially, the original R-CNN architecture didn't really "learn" how to detect bounding boxes — it was not end-to-end trainable (future iterations, such as Faster R-CNN, actually were end-to-end trainable).

But that raises the questions:

  • What if we wanted to train an end-to-end object detector?
  • Is it possible to construct a CNN architecture that can output bounding box coordinates, that way we can actually train the model to make better object detection predictions?
  • And if so, how do we go about training such a model?

The key to all those questions lies in the concept of bounding box regression, which is exactly what we'll be covering today. By the end of this tutorial, you'll have an end-to-end trainable object detector capable of producing both bounding box predictions and class label predictions for objects in an image.

To learn how to perform object detection via bounding box regression with Keras, TensorFlow, and Deep Learning, just keep reading.


Object detection: Bounding box regression with Keras, TensorFlow, and Deep Learning

In the first part of this tutorial, we'll briefly discuss the concept of bounding box regression and how it can be used to train an end-to-end object detector.

We'll then discuss the dataset we'll be using to train our bounding box regressor.

From there, we'll review our directory structure for the project, along with a simple Python configuration file (since our implementation spans multiple files). Given our configuration file, we'll be able to implement a script to actually train our object detection model via bounding box regression with Keras and TensorFlow.

With our model trained, we'll implement a second Python script, this one to handle inference (i.e., making object detection predictions) on new input images.

Let's get started!

What is bounding box regression?

Figure 1: Bounding box regression, a form of deep learning object detection, explained (image source: Cogneethi). In this tutorial, we'll build such a system with Keras, TensorFlow, and Deep Learning.

We are all likely familiar with the concept of image classification via deep neural networks. When performing image classification, we:

  1. Present an input image to the CNN
  2. Perform a forward pass through the CNN
  3. Output a vector with N elements, where N is the total number of class labels
  4. Select the class label with the largest probability as our final predicted class label

Fundamentally, we can think of image classification as predicting a class label.

But unfortunately, that type of model doesn't translate to object detection. It would be impossible for us to construct a class label for every possible combination of (x, y)-coordinate bounding boxes in an input image.

Instead, we need to rely on a different type of machine learning model called regression. Unlike classification, which produces a label, regression enables us to predict continuous values.

Typically, regression models are applied to problems such as:

  • Predicting the price of a home (which we actually did in this tutorial)
  • Forecasting the stock market
  • Determining the rate of a disease spreading through a population
  • etc.

The point here is that a regression model's output isn't limited to being discretized into "bins" like a classification model's is (remember, a classification model can only output a class label, nothing more).

Instead, a regression model can output any real value in a specific range.

Typically, we scale the output range of values to [0, 1] during training and then scale the outputs back after prediction (if needed).
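
As a concrete illustration, here is a minimal sketch (using a made-up image size and a made-up box, not values from our dataset) of normalizing bounding box coordinates to [0, 1] for training and scaling a prediction back to pixels afterwards:

# minimal sketch (hypothetical values): normalize a box for training and
# scale a prediction back to pixel coordinates afterwards
w, h = 640, 480                          # image width and height in pixels
box = (49, 30, 349, 137)                 # (startX, startY, endX, endY) in pixels

# scale to [0, 1] relative to the image dimensions (used as a training target)
target = (box[0] / w, box[1] / h, box[2] / w, box[3] / h)

# after prediction, scale the network's [0, 1] output back to pixels
pred = target                            # pretend the network predicted perfectly
pixels = (int(pred[0] * w), int(pred[1] * h), int(pred[2] * w), int(pred[3] * h))
print(target)
print(pixels)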

In order to perform bounding box regression for object detection, all we need to do is adjust our network architecture (see the condensed sketch after this list):

  1. At the head of the network, place a fully-connected layer with 4 neurons, corresponding to the top-left and bottom-right (x, y)-coordinates, respectively.
  2. Given that 4-neuron layer, implement a sigmoid activation function such that the outputs are returned in the range [0, 1].
  3. Train the model using a loss function such as mean-squared error or mean-absolute error on training data that consists of (1) the input images and (2) the bounding box of the object in the image.
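
To make the idea concrete, here is a condensed Keras sketch of that head swap, assuming a frozen VGG16 backbone and a single-object regression head; the full implementation we actually use appears later in the train.py walkthrough:

# condensed sketch: frozen backbone + 4-neuron sigmoid bounding box head
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.models import Model

backbone = VGG16(weights="imagenet", include_top=False,
	input_tensor=Input(shape=(224, 224, 3)))
backbone.trainable = False                     # freeze the convolutional body

x = Flatten()(backbone.output)
x = Dense(128, activation="relu")(x)
bboxHead = Dense(4, activation="sigmoid")(x)   # normalized (x, y) box corners

model = Model(inputs=backbone.input, outputs=bboxHead)
model.compile(loss="mse", optimizer="adam")    # mean-squared error loss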

After training, we can present an input image to our bounding box regressor network. Our network will then perform a forward pass and actually predict the output bounding box coordinates of the object.

We'll be covering object detection via bounding box regression for a single class in this tutorial, but next week we'll extend it to multi-class object detection as well.

Our object detection and bounding box regression dataset

Figure 2: An airplane object detection subset is created from the CALTECH-101 dataset. This dataset, including its bounding box annotations, will enable us to train an object detector based on bounding box regression.

The example dataset we are using here today is a subset of the CALTECH-101 dataset, which can be used to train object detection models.

Specifically, we'll be using the airplane class consisting of 800 images and the corresponding bounding box coordinates of the airplanes in the images. I have included a subset of the airplane example images in Figure 2.

Our goal is to train an object detector capable of accurately predicting the bounding box coordinates of airplanes in the input images.

Note: There's no need to download the full dataset from CALTECH-101's website. I've included the subset of airplane images, including a CSV file of the bounding boxes, in the "Downloads" section associated with this tutorial.

Configuring your development environment

To configure your system for this tutorial, I recommend following either of these tutorials:

  • How to install TensorFlow 2.0 on Ubuntu
  • How to install TensorFlow 2.0 on macOS

Either tutorial will help you configure your system with all the necessary software for this blog post in a convenient Python virtual environment.

That said, are you:

  • Short on time?
  • Learning on your employer's administratively locked laptop?
  • Wanting to skip the hassle of fighting with package managers, bash/ZSH profiles, and virtual environments?
  • Ready to run the code right now (and experiment with it to your heart's content)?

Then join PyImageSearch Plus today! Gain access to PyImageSearch tutorial Jupyter Notebooks that run on Google's Colab ecosystem in your browser, no installation required!

Project structure

Go ahead and grab the .zip from the "Downloads" section of this tutorial. Inside, you'll find the subset of data as well as our project files:

$ tree --dirsfirst --filelimit 10
.
├── dataset
│   ├── images [800 entries]
│   └── airplanes.csv
├── output
│   ├── detector.h5
│   ├── plot.png
│   └── test_images.txt
├── pyimagesearch
│   ├── __init__.py
│   └── config.py
├── predict.py
└── train.py

4 directories, 8 files

As previously discussed, I'm providing the dataset/ — an airplanes-only subset of CALTECH-101 — in the project directory. The subset consists of 800 images and one CSV file of bounding box annotations.

We'll review three Python files today:

  • config.py : A configuration settings and variables file.
  • train.py : Our training script, which loads the data and fine-tunes our VGG16-based bounding box regression model. This training script outputs each of the files in the output/ directory, including the model, a plot, and a listing of test images.
  • predict.py : A demo script, which loads input images and performs bounding box regression inference using the previously trained model.

We'll dive into the config.py file in the next section to get the party started.

Creating our configuration file

Before we can implement our bounding box regression training script, we need to create a simple Python configuration file that will store variables reused across our training and prediction scripts, including image paths, model paths, etc.

Open up the config.py file, and let's take a peek:

# import the necessary packages
import os

# define the base path to the input dataset and then use it to derive
# the path to the images directory and annotation CSV file
BASE_PATH = "dataset"
IMAGES_PATH = os.path.sep.join([BASE_PATH, "images"])
ANNOTS_PATH = os.path.sep.join([BASE_PATH, "airplanes.csv"])

Python's os module (Line 2) allows us to build dynamic paths in our configuration file. Our first two paths are derived from the BASE_PATH (Line 6):

  • IMAGES_PATH : A path to our subset of CALTECH-101 images
  • ANNOTS_PATH : The path to our bounding box annotations file in CSV format

We have three more paths to define:

# define the path to the base output directory
BASE_OUTPUT = "output"

# define the path to the output serialized model, model training plot,
# and testing image filenames
MODEL_PATH = os.path.sep.join([BASE_OUTPUT, "detector.h5"])
PLOT_PATH = os.path.sep.join([BASE_OUTPUT, "plot.png"])
TEST_FILENAMES = os.path.sep.join([BASE_OUTPUT, "test_images.txt"])

Our next three paths will be derived from the BASE_OUTPUT (Line 11) path and include:

  • MODEL_PATH : The path to our TensorFlow-serialized output model
  • PLOT_PATH : The output training history plot, consisting of the training and validation loss curves
  • TEST_FILENAMES : A text file of image filenames selected for our testing set

Finally, we have three deep learning hyperparameters to set:

# initialize our initial learning rate, number of epochs to train
# for, and the batch size
INIT_LR = 1e-4
NUM_EPOCHS = 25
BATCH_SIZE = 32

Our deep learning hyperparameters include the initial learning rate, number of epochs, and batch size. These parameters are in one convenient place so that you can keep track of your experimental inputs and results.

Implementing our bounding box regression training script with Keras and TensorFlow

Figure 3: Bounding box annotations in CSV format extracted from the CALTECH-101 dataset are used in this tutorial for deep learning object detection via bounding box regression with Keras and TensorFlow.

With our configuration file implemented, we can move on to creating our bounding box regression training script.

This script will be responsible for:

  1. Loading our airplane training data from disk (i.e., both class labels and bounding box coordinates)
  2. Loading VGG16 from disk (pre-trained on ImageNet), removing the fully-connected classification layer head from the network, and inserting our bounding box regression layer head
  3. Fine-tuning the bounding box regression layer head on our training data

I'll be assuming that you're already comfortable with modifying the architecture of a network and fine-tuning it.

If you are not already comfortable with this concept, I suggest you read the article linked above before continuing.

Bounding box regression is a concept best explained through code, so open up the train.py file in your project directory, and let's get to work:

# import the necessary packages
from pyimagesearch import config
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import cv2
import os

Our training script begins with a selection of imports. These include:

  • config : The configuration file we developed in the previous section, consisting of paths and hyperparameters
  • VGG16 : The CNN architecture to serve as the base network for our fine-tuning approach
  • tf.keras : Imports from TensorFlow/Keras consisting of layer types, optimizers, and image loading/preprocessing routines
  • train_test_split : Scikit-learn's convenience utility for slicing our data into training and testing subsets
  • matplotlib : Python's de facto plotting package
  • numpy : Python's standard numerical processing library
  • cv2 : OpenCV

Again, you'll need to follow the "Configuring your development environment" section to ensure that you have all the necessary software installed, or elect to run this script in a Jupyter Notebook.

Now that our environment is ready and packages are imported, let's work with our data:

# load the contents of the CSV annotations file
print("[INFO] loading dataset...")
rows = open(config.ANNOTS_PATH).read().strip().split("\n")

# initialize the list of data (images), our target output predictions
# (bounding box coordinates), along with the filenames of the
# individual images
data = []
targets = []
filenames = []

Here, we load our bounding box annotations CSV data (Line 19). Each record in the file consists of an image filename and any object bounding boxes associated with that image.

We then initialize three lists:

  • data : Will soon hold all of our images
  • targets : Will soon hold all of our target bounding box coordinates
  • filenames : The filenames associated with the actual image data

These are three separate lists that correspond to one another. We'll now begin a loop that populates the lists from the CSV data:

# loop over the rows
for row in rows:
	# break the row into the filename and bounding box coordinates
	row = row.split(",")
	(filename, startX, startY, endX, endY) = row

Looping over all rows in the CSV file (Line 29), our first step is to unpack the particular entry's filename and bounding box coordinates (Lines 31 and 32).

To get a feel for the CSV data, let's take a peek inside:

image_0001.jpg,49,30,349,137
image_0002.jpg,59,35,342,153
image_0003.jpg,47,36,331,135
image_0004.jpg,47,24,342,141
image_0005.jpg,48,18,339,146
image_0006.jpg,48,24,344,126
image_0007.jpg,49,23,344,122
image_0008.jpg,51,29,344,119
image_0009.jpg,50,29,344,137
image_0010.jpg,55,32,335,106

As you can see, each row consists of five elements:

  1. Filename
  2. Starting x-coordinate
  3. Starting y-coordinate
  4. Ending x-coordinate
  5. Ending y-coordinate

These are exactly the values that Line 32 of our script has unpacked into convenience variables for this loop iteration.

Still working through our loop, next we'll load an image:

	# derive the path to the input image, load the image (in OpenCV
	# format), and grab its dimensions
	imagePath = os.path.sep.join([config.IMAGES_PATH, filename])
	image = cv2.imread(imagePath)
	(h, w) = image.shape[:2]

	# scale the bounding box coordinates relative to the spatial
	# dimensions of the input image
	startX = float(startX) / w
	startY = float(startY) / h
	endX = float(endX) / w
	endY = float(endY) / h

Line 36 concatenates our configuration IMAGES_PATH with the CSV filename, and subsequently Line 37 loads the image into memory using OpenCV.

We then quickly grab the image dimensions (Line 38) and scale the bounding box coordinates to the range [0, 1] (Lines 42-45).

Let's wrap up our loop:

	# load the image and preprocess it
	image = load_img(imagePath, target_size=(224, 224))
	image = img_to_array(image)

	# update our list of data, targets, and filenames
	data.append(image)
	targets.append((startX, startY, endX, endY))
	filenames.append(filename)

Now, using TensorFlow/Keras' load_img method, we overwrite the image we loaded with OpenCV. This time, we ensure that our image size is 224×224 pixels for training with VGG16, followed by converting to array format (Lines 48 and 49).

And finally, we populate the three lists that we initialized previously: (1) data, (2) targets, and (3) filenames.

Now that we've loaded the data, let's partition it for training:

# convert the data and targets to NumPy arrays, scaling the input
# pixel intensities from the range [0, 255] to [0, 1]
data = np.array(data, dtype="float32") / 255.0
targets = np.array(targets, dtype="float32")

# partition the data into training and testing splits using 90% of
# the data for training and the remaining 10% for testing
split = train_test_split(data, targets, filenames, test_size=0.10,
	random_state=42)

# unpack the data split
(trainImages, testImages) = split[:2]
(trainTargets, testTargets) = split[2:4]
(trainFilenames, testFilenames) = split[4:]

# write the testing filenames to disk so that we can use them
# when evaluating/testing our bounding box regressor
print("[INFO] saving testing filenames...")
f = open(config.TEST_FILENAMES, "w")
f.write("\n".join(testFilenames))
f.close()

Here we:

  • Convert data and targets to NumPy arrays (Lines 58 and 59)
  • Construct training and testing splits (Lines 63 and 64)
  • Unpack the data split (Lines 67-69; see the short sketch after this list)
  • Write all testing filenames to disk at the destination filepath specified in our configuration file (Lines 74-76); these filenames will be useful to us later in the predict.py script
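
If the split[:2], split[2:4], and split[4:] slicing looks cryptic, here is a small illustrative sketch (with toy stand-in arrays, not our actual dataset) of how scikit-learn's train_test_split orders its return values when given multiple arrays:

# toy example: train_test_split returns a (train, test) pair for each input
# array, in the order the arrays were passed
import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(10).reshape(10, 1)             # stand-in for images
targets = np.arange(10) * 0.1                   # stand-in for bounding boxes
filenames = ["image_%04d.jpg" % i for i in range(10)]

split = train_test_split(data, targets, filenames, test_size=0.10,
	random_state=42)
(trainImages, testImages) = split[:2]
(trainTargets, testTargets) = split[2:4]
(trainFilenames, testFilenames) = split[4:]
print(len(trainImages), len(testImages))        # 9 1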

Shifting gears, let's prepare our VGG16 model for fine-tuning:

# load the VGG16 network, ensuring the head FC layers are left off
vgg = VGG16(weights="imagenet", include_top=False,
	input_tensor=Input(shape=(224, 224, 3)))

# freeze all VGG layers so they will *not* be updated during the
# training process
vgg.trainable = False

# flatten the max-pooling output of VGG
flatten = vgg.output
flatten = Flatten()(flatten)

# construct a fully-connected layer header to output the predicted
# bounding box coordinates
bboxHead = Dense(128, activation="relu")(flatten)
bboxHead = Dense(64, activation="relu")(bboxHead)
bboxHead = Dense(32, activation="relu")(bboxHead)
bboxHead = Dense(4, activation="sigmoid")(bboxHead)

# construct the model we will fine-tune for bounding box regression
model = Model(inputs=vgg.input, outputs=bboxHead)

Accomplishing fine-tuning is a four-step process:

  1. Load VGG16 with pre-trained ImageNet weights, chopping off the old fully-connected classification layer head (Lines 79 and 80).
  2. Freeze all layers in the body of the VGG16 network (Line 84).
  3. Perform network surgery by constructing a new fully-connected layer head that will output four values corresponding to the top-left and bottom-right bounding box coordinates of an object in an image (Lines 87-95).
  4. Finish network surgery by suturing the new trainable head (bounding box regression layers) to the existing frozen body (Line 98).

And now let's train (i.e., fine-tune) our newly formed beast:

# initialize the optimizer, compile the model, and show the model
# summary
opt = Adam(lr=config.INIT_LR)
model.compile(loss="mse", optimizer=opt)
print(model.summary())

# train the network for bounding box regression
print("[INFO] training bounding box regressor...")
H = model.fit(
	trainImages, trainTargets,
	validation_data=(testImages, testTargets),
	batch_size=config.BATCH_SIZE,
	epochs=config.NUM_EPOCHS,
	verbose=1)

Lines 102 and 103 compile the model with mean-squared error (MSE) loss and the Adam optimizer.

Training commences by making a call to the fit method with our training and validation sets (Lines 108-113).

Once our bounding box regression model is ready, we'll serialize it and plot the training history:

# serialize the model to disk
print("[INFO] saving object detector model...")
model.save(config.MODEL_PATH, save_format="h5")

# plot the model training history
N = config.NUM_EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.title("Bounding Box Regression Loss on Training Set")
plt.xlabel("Epoch #")
plt.ylabel("Loss")
plt.legend(loc="lower left")
plt.savefig(config.PLOT_PATH)

Closing out this training script calls for serializing and saving our model to disk (Line 117) and plotting training loss curves (Lines 120-129).

Note: For TensorFlow 2.0+, I recommend explicitly setting save_format="h5" (HDF5 format).

Training our basic bounding box regressor and object detector

With our bounding box regression network implemented, let's move on to training it.

Begin by using the "Downloads" section of this tutorial to download the source code and example airplane dataset.

From there, open up a terminal, and execute the following command:

$ python train.py
[INFO] loading dataset...
[INFO] saving testing filenames...

Our script starts by loading our airplane dataset from disk.

We then construct our training/testing split and save the filenames of the images inside the testing set to disk (so we can use them later when making predictions with our trained network).

From there, our training script outputs the model summary of our VGG16 network with the bounding box regression head:

Model: "model" _________________________________________________________________ Layer (blazon)                 Output Shape              Param # ================================================================= input_1 (InputLayer)         [(None, 224, 224, 3)]     0 _________________________________________________________________ block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792 _________________________________________________________________ block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928 _________________________________________________________________ block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0 _________________________________________________________________ block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856 _________________________________________________________________ block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584 _________________________________________________________________ block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0 _________________________________________________________________ block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168 _________________________________________________________________ block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080 _________________________________________________________________ block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080 _________________________________________________________________ block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0 _________________________________________________________________ block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160 _________________________________________________________________ block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808 _________________________________________________________________ block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808 _________________________________________________________________ block4_pool (MaxPooling2D)   (None, xiv, xiv, 512)       0 _________________________________________________________________ block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808 _________________________________________________________________ block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808 _________________________________________________________________ block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808 _________________________________________________________________ block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0 _________________________________________________________________ flatten (Flatten)            (None, 25088)             0 _________________________________________________________________ dense (Dense)                (None, 128)               3211392 _________________________________________________________________ dense_1 (Dumbo)              (None, 64)                8256 _________________________________________________________________ dense_2 (Dense)              (None, 32)                2080 _________________________________________________________________ dense_3 (Dumbo)              (None, 4)                 132 ================================================================= Total params: 17,936,548 Trainable params: 3,221,860 Non-trainable params: 14,714,688

Pay attention to the layers following block5_pool (MaxPooling2D)these layers correspond to our bounding box regression layer caput.

When trained, these layers will learn how to predict the bounding box (x, y)-coordinates of an object in an paradigm!

Next comes our actual training process:

[INFO] preparation bounding box regressor... Epoch ane/25 23/23 [==============================] - 37s 2s/step - loss: 0.0239 - val_loss: 0.0014 Epoch ii/25 23/23 [==============================] - 38s 2s/stride - loss: 0.0014 - val_loss: 8.7668e-04 Epoch 3/25 23/23 [==============================] - 36s 2s/step - loss: 9.1919e-04 - val_loss: 7.5377e-04 Epoch 4/25 23/23 [==============================] - 37s 2s/step - loss: 7.1202e-04 - val_loss: eight.2668e-04 Epoch 5/25 23/23 [==============================] - 36s 2s/stride - loss: half-dozen.1626e-04 - val_loss: 6.4373e-04 ... Epoch 20/25 23/23 [==============================] - 37s 2s/stride - loss: half dozen.9272e-05 - val_loss: 5.6152e-04 Epoch 21/25 23/23 [==============================] - 36s 2s/step - loss: 6.3215e-05 - val_loss: five.4341e-04 Epoch 22/25 23/23 [==============================] - 37s 2s/step - loss: 5.7234e-05 - val_loss: 5.5000e-04 Epoch 23/25 23/23 [==============================] - 37s 2s/step - loss: 5.4265e-05 - val_loss: 5.5932e-04 Epoch 24/25 23/23 [==============================] - 37s 2s/stride - loss: 4.5151e-05 - val_loss: 5.4348e-04 Epoch 25/25 23/23 [==============================] - 37s 2s/step - loss: 4.0826e-05 - val_loss: 5.3977e-04 [INFO] saving object detector model...

After training the bounding box regressor, the following training history plot is produced:

Figure 4: Bounding box regression object detection training plot. We trained this deep learning model with TensorFlow and Keras.

Our object detection model starts off with high loss but is able to descend into areas of lower loss during the training process (i.e., where the model learns how to make better bounding box predictions).

After training is complete, your output directory should contain the following files:

$ ls output/
detector.h5	plot.png	test_images.txt

The detector.h5 file is our serialized model after training.

We'll be using this model in the next section, where we learn how to make predictions with our bounding box regressor.

The plot.png file contains our training history plot, while test_images.txt contains the filenames of the images in our testing set (which we'll make predictions on later in this tutorial).

Implementing our bounding box predictor with Keras and TensorFlow

At this point we have our bounding box predictor serialized to disk — but how do we use that model to detect objects in input images?

We'll be answering that question in this section.

Open up a new file, name it predict.py, and insert the following code:

# import the necessary packages
from pyimagesearch import config
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.models import load_model
import numpy as np
import mimetypes
import argparse
import imutils
import cv2
import os

At this point, you should recognize all imports except imutils (my computer vision convenience package) and potentially mimetypes (built into Python; it can recognize filetypes from filenames and URLs).

Let's parse our command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--input", required=True,
	help="path to input image/text file of image filenames")
args = vars(ap.parse_args())

We have just one command line argument, --input, for providing either (1) a single image filepath or (2) the path to your listing of test filenames. The test filenames are contained in the text file generated by running the training script in the previous section. Assuming you haven't changed settings in config.py, the path will be output/test_images.txt.

Let's handle our --input accordingly:

# determine the input file type, but assume that we're working with
# single input image
filetype = mimetypes.guess_type(args["input"])[0]
imagePaths = [args["input"]]

# if the file type is a text file, then we need to process *multiple*
# images
if "text/plain" == filetype:
	# load the filenames in our testing file and initialize our list
	# of image paths
	filenames = open(args["input"]).read().strip().split("\n")
	imagePaths = []

	# loop over the filenames
	for f in filenames:
		# construct the full path to the image filename and then
		# update our image paths list
		p = os.path.sep.join([config.IMAGES_PATH, f])
		imagePaths.append(p)

In order to determine the filetype, we take advantage of Python's mimetypes functionality (Line 21).

We then have two options:

  1. Default: Our imagePaths consist of one lone image path from --input (Line 22).
  2. Text File: If the conditional check for a text filetype on Line 26 holds True, then we override and populate our imagePaths from all the filenames (one per line) in the --input text file (Lines 29-37).

Given one or more testing images, let's start performing bounding box regression with our deep learning TensorFlow/Keras model:

# load our trained bounding box regressor from disk
print("[INFO] loading object detector...")
model = load_model(config.MODEL_PATH)

# loop over the images that we'll be testing using our bounding box
# regression model
for imagePath in imagePaths:
	# load the input image (in Keras format) from disk and preprocess
	# it, scaling the pixel intensities to the range [0, 1]
	image = load_img(imagePath, target_size=(224, 224))
	image = img_to_array(image) / 255.0
	image = np.expand_dims(image, axis=0)

Upon loading our model (Line 41), we begin looping over images (Line 45). Inside, we first load and preprocess the image in the exact same manner we did for training. This includes:

  • Resizing the image to 224×224 pixels (Line 48)
  • Converting to array format and scaling pixels to the range [0, 1] (Line 49)
  • Adding a batch dimension (Line 50)

And from there, we can perform bounding box regression inference and annotate the result:

	# make bounding box predictions on the input image
	preds = model.predict(image)[0]
	(startX, startY, endX, endY) = preds

	# load the input image (in OpenCV format), resize it such that it
	# fits on our screen, and grab its dimensions
	image = cv2.imread(imagePath)
	image = imutils.resize(image, width=600)
	(h, w) = image.shape[:2]

	# scale the predicted bounding box coordinates based on the image
	# dimensions
	startX = int(startX * w)
	startY = int(startY * h)
	endX = int(endX * w)
	endY = int(endY * h)

	# draw the predicted bounding box on the image
	cv2.rectangle(image, (startX, startY), (endX, endY),
		(0, 255, 0), 2)

	# show the output image
	cv2.imshow("Output", image)
	cv2.waitKey(0)

Line 53 makes bounding box predictions on the input image. Notice that preds contains our bounding box prediction's (x, y)-coordinates; we unpack these values for convenience via Line 54.

Now we have everything we need for annotation. To annotate the bounding box on the image, we simply:

  • Load the original image from disk with OpenCV and resize it while maintaining aspect ratio (Lines 58 and 59)
  • Scale the predicted bounding box coordinates from the range [0, 1] to the range [0, w] and [0, h], where w and h are the width and height of the input image (Lines 60-67)
  • Draw the scaled bounding box (Lines 70 and 71)

Finally, we show the output on the screen. Pressing a key cycles through the loop, displaying results one by one until all testing images have been exhausted (Lines 74 and 75).

Great job! Let's inspect our results in the next section.

Bounding box regression and object detection results with Keras and TensorFlow

We are now ready to put our bounding box regression object detection model to the test!

Make sure you've used the "Downloads" section of this tutorial to download the source code, image dataset, and pre-trained object detection model.

From there, let's try applying object detection to a single input image:

$ python predict.py --input dataset/images/image_0697.jpg
[INFO] loading object detector...

Figure 5: Bounding box regression — a form of deep learning object detection — has correctly found the airplane in this picture. Using TensorFlow/Keras and OpenCV, we were able to detect the airplane and draw its bounding box.

As you can see, our bounding box regressor has correctly localized the airplane in the input image, demonstrating that our object detection model really learned how to predict bounding box coordinates just from the input image!

Next, let's apply the bounding box regressor to every image in the test set by supplying the path to the test_images.txt file as the --input command line argument:

$ python predict.py --input output/test_images.txt
[INFO] loading object detector...

Figure 6: Look at all those flying machines! We put our bounding box regression model to the test using multiple airplane testing images. Our Keras/TensorFlow model is working well. Be sure to read the "Limitations" section for a brief note about multi-class object detection via bounding box regression.

As Figure 6 shows, our object detection model is doing a great job of predicting the location of airplanes in our input images!

Limitations

At this point we've successfully trained a model for bounding box regression — but an obvious limitation of this architecture is that it can only predict bounding boxes for a single class.

What if we wanted to perform multi-class object detection, where we not only have an "airplanes" class but also "motorcycles," "cars," and "trucks"?

Is multi-class object detection even possible with bounding box regression?

You bet it is — and I'll be covering that very topic in next week's tutorial. We'll learn how multi-class object detection requires changes to the bounding box regression architecture (hint: two branches in our CNN) and train such a model. Stay tuned!

What's next? I recommend PyImageSearch University.

Course information:
35+ total classes • 39h 44m video • Last updated: April 2022
★★★★★ 4.84 (128 Ratings) • 3,000+ Students Enrolled

I strongly believe that if you had the right instructor you could master computer vision and deep learning.

Do you think learning computer vision and deep learning has to be time-consuming, overwhelming, and complicated? Or has to involve complex mathematics and equations? Or requires a degree in computer science?

That's not the case.

All you need to master computer vision and deep learning is for someone to explain things to you in simple, intuitive terms. And that's exactly what I do. My mission is to change education and how complex Artificial Intelligence topics are taught.

If you're serious about learning computer vision, your next stop should be PyImageSearch University, the most comprehensive computer vision, deep learning, and OpenCV course online today. Here you'll learn how to successfully and confidently apply computer vision to your work, research, and projects. Join me in computer vision mastery.

Inside PyImageSearch University you'll find:

  • 35+ courses on essential computer vision, deep learning, and OpenCV topics
  • 35+ Certificates of Completion
  • 39+ hours of on-demand video
  • Brand new courses released regularly, ensuring you can keep up with state-of-the-art techniques
  • Pre-configured Jupyter Notebooks in Google Colab
  • ✓ Run all code examples in your web browser — works on Windows, macOS, and Linux (no dev environment configuration required!)
  • ✓ Access to centralized code repos for all 450+ tutorials on PyImageSearch
  • Easy one-click downloads for code, datasets, pre-trained models, etc.
  • Access on mobile, laptop, desktop, etc.

Click here to join PyImageSearch University

Summary

In this tutorial you learned how to train an end-to-end object detector with bounding box regression.

To accomplish this task we utilized the Keras and TensorFlow deep learning libraries.

Unlike classification models, which output only class labels, regression models are capable of producing real-valued outputs.

Typical applications of regression models include predicting the price of homes, forecasting the stock market, and predicting the rate at which a disease spreads through a region.

However, regression models are not limited to price forecasting or disease spreading — we can use them for object detection as well!

The trick is to update your CNN architecture to:

  1. Place a fully-connected layer with four neurons (top-left and bottom-right bounding box coordinates) at the head of the network
  2. Put a sigmoid activation function on that layer (such that output values lie in the range [0, 1])
  3. Train your model by providing (1) the input image and (2) the target bounding boxes of the object in the image
  4. Subsequently, train your model using mean-squared error, mean-absolute error, etc.

The final result is an end-to-end trainable object detector, similar to the one we built today!

You'll note that our model can only predict one type of class label, though — how can we extend our implementation to handle multiple labels?

Is that possible?

You bet it is — stay tuned next week for part two in this series!

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!


Source: https://pyimagesearch.com/2020/10/05/object-detection-bounding-box-regression-with-keras-tensorflow-and-deep-learning/