
Object Detection


.json file
input_type

type: int or string (string only in the *.json file in the settings folder)
possible values (depends on input variable): 1 (or video), 2 (or image), 3 (or camera), 4 (or ip).
default value: camera (3)

Explanation:
Determines what kind of input the program will use. There are four possible integer values for this variable:
1 (video), 2 (image), 3 (camera), and 4 (ip). More details are in the input variable documentation.

Notes:
For your convenience it is possible to use a string instead of an integer, but only in the *.json file in the settings folder. For example, instead of writing 3 you could write camera or Camera in the default.json file. The load_settings function in your main.py file will take care of converting the input_type variable to an integer.

Also note that the input variable depends on the input_type variable, so if you change input_type you should make sure that input has a matching value. For example, if input_type is camera (or 3), then input should be an integer that identifies a camera on your device, such as 1. If input_type is video (or 1), then input should be a file path such as C:/filename.mp4.
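The string-to-integer conversion described above could be sketched as follows. The exact mapping inside load_settings is an assumption based on this documentation, but it illustrates how camera or Camera resolves to 3:

```python
# Hypothetical sketch of the conversion load_settings performs in main.py;
# the mapping below is assumed from the documented value list.
INPUT_TYPE_MAP = {"video": 1, "image": 2, "camera": 3, "ip": 4}

def normalize_input_type(value):
    """Accept 1-4 or a (case-insensitive) name and return the integer code."""
    if isinstance(value, int):
        if value in (1, 2, 3, 4):
            return value
        raise ValueError("input_type must be 1-4, got %r" % value)
    try:
        return INPUT_TYPE_MAP[value.strip().lower()]
    except KeyError:
        raise ValueError("unknown input_type: %r" % value)
```

With this sketch, normalize_input_type("Camera") and normalize_input_type(3) both yield 3.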

input

type: int, string
possible values (depends on input_type variable):
input_type is 1 or 2 (video or image): possible values are file paths (Example: C:/filename.mp4 or C:/filename.png).
input_type is 3 (camera): possible values are integers 0 or more (Example: 2).
input_type is 4 (ip): possible values are Live stream URLs (Example: https://192.168.100.25:8080/video).

default value: 0 (for camera)

Explanation:
This variable depends on the input_type variable.
Here is a brief description of the variable:

input_type is video or image:
The input will be the path of the image or video you want to use in your program. Usually a video or image is used for testing purposes.

input_type is camera:
The input variable will be an integer that describes which camera you are using (note that the count starts from 0, not 1). So if you have three cameras on your device and write 2 in the *.json file, you will use the third camera; if you write 3 you will get an error.

input_type IP:
The live stream IP address (URL) you want to use in your program. You could stream from your phone if you use the IP Webcam app for Android devices. Install the app, open it, scroll to the bottom, and select "Start server". Your live IP server on the local area network will appear after "IPv4:" (you could choose the first or second). The URL you need to paste is what comes after "IPv4:" plus "/video" (Example: http://192.168.100.25:8080/video). Make sure you are on the same local area network. You can test whether the stream is available by simply entering the URL in a web browser such as Chrome or Firefox.

Notes:
If you are using Windows and the input is an image or video, use "/" instead of "\" in the path, or escape each "\" as "\\". For example, C:\Users\Ibrahim\filename.mp4 should instead be written C:/Users/Ibrahim/filename.mp4 or C:\\Users\\Ibrahim\\filename.mp4.

Do not forget to add the file extension. Example: filename.mp4, not filename.

AI_model

type: int, string (only in the *.json file in the settings folder)
possible values: 1 (or SSD), 2 (or YOLO_v3_accurate), 3 (or YOLO_v3_not_accurate)
default value: 1 (SSD)

Explanation:
There are so far three AI models in this software:

SSD: usually the best choice. It is the fastest of the three models and is generally more accurate than the YOLO_v3_not_accurate model (3).

YOLO_v3_accurate: based on the You Only Look Once (YOLO) AI model. This is the most accurate model of the three, but its main disadvantage is that it is very slow; on a normal PC it can in the worst cases drop to 0.1 frames per second.

YOLO_v3_not_accurate: also based on the You Only Look Once AI model, with modifications that make it relatively faster at the cost of accuracy. We do not recommend this model because it is slower than SSD and in most cases SSD outperforms it. Only use it if the object you want to detect is not available in SSD, or if you notice it performs better than SSD in a certain circumstance.

Notes:
None of these models requires a GPU.

show_frames

type: boolean
possible values: true or false
default value: true

Explanation:
If false, the display window will not be shown. This is useful if you want to use less processing power.

Notes:
If you want to use the code for a certain project with the Raspberry Pi 4 (or any Linux machine) and would like the code to run immediately once you power on the device, follow the steps in the readMe.md file, which is generated when you click "Generate Code" in the "Control Window", and make sure to set this variable to false.

resize

type: boolean
possible values: true or false
default value: true

Explanation:
This variable allows you to scale your frame. If it is false, the scale variable will have no effect.

scale

type: float
possible values (depends on resize variable): more than 0 and preferred to be less than 1
default value: 1

Explanation:
This variable is the fractional scale of your original frame. If you have a frame of 1000 pixels width and 1000 pixels height and you set the scale variable to 0.5, you will get 500 pixels for height and width. This variable is useful if you have a frame with very high resolution and want to make your program run faster.

Notes:
This variable depends on the resize variable; if resize is false, its value has no effect.
It does not make sense to set this variable above 1: it will not increase the real resolution but will simply make your program much slower, since you have increased the number of pixels.
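The scaling arithmetic above can be sketched in two lines (this is plain arithmetic, not the program's actual code; integer truncation is an assumption):

```python
def scaled_dimensions(width, height, scale):
    """Return the (width, height) of the frame after applying the scale factor."""
    return int(width * scale), int(height * scale)
```

For example, a 1000x1000 frame with scale 0.5 becomes 500x500.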

delay_time

type: int
possible values: 1 and more (preferred 1)
default value: 1

Explanation:
Time in milliseconds the program waits after each frame.

ASCII_close_window_key

type: int
possible values: 0 to 127 (preferred 27)
default value: 27

Explanation:
The key on your keyboard that will close the program. Every key on your keyboard has an ASCII value; the ASCII value for the Esc key is 27.

confidence

type: float
possible values: 0 to 1 (0% - 100%)

Explanation:
The minimum confidence (accuracy) accepted for a detection. For example, if there is a person in a frame and the program is 0.6 (60%) sure of it while the confidence variable is 0.8 (80%), that person will not be recorded in your results; however, if the program is 0.95 (95%) sure, the object will be detected.

SSD_config_file

type: string
possible values: .pbtxt file path

Explanation:
configuration file for the SSD model

SSD_model

type: string
possible values: .pb file path

Explanation:
the trained SSD model

SSD_labels

type: string
possible values: .txt file path

Explanation:
the names of the objects (labels) for the SSD model

YOLO_v3_accurate_config_file

type: string
possible values: .cfg file path

Explanation:
configuration file for the YOLO accurate model

YOLO_v3_accurate_model

type: string
possible values: .weights file path

Explanation:
the trained YOLO Accurate model

YOLO_v3_accurate_blob_size

type: int
possible values: 0 or more (preferred 608)

Explanation:
We are not entirely sure what exactly this variable does, so we have attached a link that we hope is useful; it includes the YOLO paper.

https://opencv-tutorial.readthedocs.io/en/latest/yolo/yolo.html
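In OpenCV-based YOLO pipelines, the blob size is typically the square input resolution passed to cv2.dnn.blobFromImage, to which the frame is resized before inference. That interpretation is an assumption based on the linked tutorial; a dependency-free sketch of the resulting blob shape:

```python
def blob_shape(frame_shape, blob_size):
    """Shape of the 4-D blob cv2.dnn.blobFromImage would produce: NCHW.

    frame_shape: the numpy-style shape of the input frame,
    e.g. (height, width, channels) or (height, width) for grayscale.
    The frame is resized to blob_size x blob_size regardless of its
    original resolution.
    """
    channels = frame_shape[2] if len(frame_shape) == 3 else 1
    return (1, channels, blob_size, blob_size)
```

So with the preferred value 608, a 640x480 BGR frame would yield a blob of shape (1, 3, 608, 608).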

YOLO_v3_not_accurate_config_file

type: string
possible values: .cfg file path

Explanation:
configuration file for the YOLO not accurate model

YOLO_v3_not_accurate_model

type: string
possible values: .weights file path

Explanation:
the trained YOLO not accurate model

YOLO_v3_not_accurate_blob_size

type: int
possible values: 0 or more (preferred 608)

Explanation:
We are not entirely sure what exactly this variable does, so we have attached a link that we hope is useful; it includes the YOLO paper.

https://opencv-tutorial.readthedocs.io/en/latest/yolo/yolo.html

YOLO_v3_display_border_thickness

type: float
possible values: more than 0 and less than 1 (preferred 0.003)
default value: 0.003

Explanation:
The thickness of the rectangle (or border) displayed on the frame around a detected object for the YOLO model. This number is a proportion of the height of the frame. For example, if the frame height is 1000 pixels and the YOLO_v3_display_border_thickness variable is 0.003, the thickness will be 3 pixels. It is relative to the height of the frame so that the display does not change if the resolution changes.
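The height-relative calculation amounts to the following sketch (the exact rounding and 1-pixel floor are assumptions; the real code may truncate differently):

```python
def relative_thickness(frame_height, proportion):
    """Pixel thickness from a height-relative proportion, at least 1 px."""
    return max(1, round(frame_height * proportion))
```

A 1000-pixel-tall frame with the default 0.003 gives a 3-pixel border; a very small frame still gets at least 1 pixel. The same arithmetic applies to the text size and thickness variables below.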

YOLO_v3_display_text_size

type: float
possible values: more than 0 and less than 1 (preferred 0.0015)
default value: 0.0015

Explanation:
The size of the text that displays the object name. This number is a proportion of the height of the frame. For example, if the frame height is 1000 pixels and the YOLO_v3_display_text_size variable is 0.005, the size will be 5 units. It is relative to the height of the frame so that the display does not change if the resolution changes.

YOLO_v3_display_text_thickness

type: float
possible values: more than 0 and less than 1 (preferred value 0.0025)
default value: 0.0025

Explanation:
The thickness of the text that displays the object name. This number is a proportion of the height of the frame. For example, if the frame height is 1000 pixels and the YOLO_v3_display_text_thickness variable is 0.0025, the thickness will be 2.5 units. It is relative to the height of the frame so that the display does not change if the resolution changes.

SSD_display_border_thickness

type: float
possible values: more than 0 and less than 1 (preferred 0.003)
default value: 0.003

Explanation:
The thickness of the rectangle (or border) for the SSD model displayed on the frame around a detected object. This number is a proportion of the height of the frame. For example, if the frame height is 1000 pixels and the SSD_display_border_thickness variable is 0.003, the thickness will be 3 pixels. It is relative to the height of the frame so that the display does not change if the resolution changes.

location

type: string
possible values: any string
default value: KFUPM

Explanation:
This variable will be one of the results recorded once a detection is found.

Use case:
One useful case is when this software is used in a surveillance application. Say you decide to use 1000 cameras in a certain area, where several cameras (or each one) are connected to a computer running the software. You can make the location variable for each camera describe exactly where it is, for example using x and y GPS coordinates. Another example: if you have a drone, you could update the location variable as the drone moves, so that you know the exact coordinates of every detection you had.

font_TTF_file

type: string
possible values: .ttf file path
default value: resources/fonts/times.ttf (The Times New Roman Font)

Explanation:
the font that will be used in displaying most of the texts in the software.

display_text_sizes

type: python dictionary (or javascript object) with 5 variables each is a float
possible values: every variable in display_text_sizes could be from 0 to 100
default value: {"fps":5,"location":5,"SSD_objects":5,"date_and_time":5,"AI_model": 5}

Explanation:
The 5 variables will be displayed on the frame. Their values represent their size as a percentage of the frame height. So if the frame height is 1000 pixels and, for example, fps is 10, the fps text will be 10% of the frame height, which is 100 pixels.

Notes:
If you want to remove one of the 5 texts displayed on your frame, set its variable to 0.

display_text_colors

type: python dictionary (or javascript object) with 5 variables, each a list of three integers
possible values: every variable in display_text_colors has a list of three values, each from 0 to 255
default value: [255,0,0] for all (blue)

Explanation:
The values in the list inside each variable of display_text_colors describe the color of the text. The color order is BGR (Blue, Green, Red). For example, if fps is [0,0,255] it will be red, and if it is [0,255,255] it will be yellow, and so on.

display_text_locations

type: python dictionary (or javascript object) with 5 variables each has a list of two floats
possible values: every variable in display_text_locations has a list of two values, each from 0 to 1

Explanation:
The values in the list inside each variable of display_text_locations describe the location of the texts.

The first value is the x axis and the second value is the y axis.
The coordinate system starts from the upper left.
The x axis increases from left to right and the y axis increases from top to bottom.
The x and y values give the location of the upper-left corner of the text.

For example, if fps is [0.5, 0.5], the upper-left point of the text will be at the center.

Note:
If you set the x or y value of fps (or any variable) to 1, the text will disappear, because this is the location of its upper-left corner.
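Converting a normalized [x, y] pair into pixel coordinates is straightforward; a sketch (function name is illustrative):

```python
def text_position(norm_xy, frame_width, frame_height):
    """Map a normalized [x, y] (each 0-1, origin at the top left) to the
    pixel position of the text's upper-left corner."""
    x, y = norm_xy
    return int(x * frame_width), int(y * frame_height)
```

So [0.5, 0.5] on a 1920x1080 frame places the upper-left corner of the text at pixel (960, 540).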

display_objects_colors

type: list of 5 values, each a string
possible values: every string in the list should contain three integer values, each from 0 to 255

Explanation:
This variable is only applicable to the two YOLO models. The values represent colors in (B,G,R) (Blue, Green, Red) order; for example, (255,0,0) is blue. These colors are displayed when an object is detected. The display_objects_colors values are related to the YOLO labels; the file path is: resources/labels/YOLOv3_Labels.txt.

The YOLOv3_Labels.txt file contains 80 labels (object names). The display_objects_colors variable is mapped onto the object names in the YOLOv3_Labels.txt file so that every one of the 80 objects gets one of the 5 colors. For example, the first object in the YOLOv3_Labels.txt file is "person", so its color will be the first of the five colors, and the third object, "car", will have the third color. The colors then cycle: the sixth object, "bus", will have the first of the five colors, the seventh will have the second, and so on.
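The cyclic mapping of the 80 labels onto the 5 colors is a simple modulo; a sketch with abbreviated, hypothetical label and color lists:

```python
def color_for_label(label, labels, colors):
    """Cycle the colors over the label list: label i gets colors[i % len(colors)]."""
    return colors[labels.index(label) % len(colors)]

# First few YOLO labels and 5 example BGR colors (illustrative values).
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train"]
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 255, 255), (255, 0, 255)]
```

As the text describes, "person" (index 0) and "bus" (index 5) both get the first color, while "car" (index 2) gets the third.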

occurrence_interval_frames

type: int
possible values: 1 or more

Explanation:
After a certain object is detected, the program saves the image and then waits a number of frames equal to the occurrence_interval_frames variable (the waiting period includes the saved frame) before saving again, and so on.

For example, if occurrence_interval_frames is 10 and a car is detected at the first frame, the program counts frames 1 through 10 and then, starting at frame 11, searches for the car object again to save it if it is present.

Another example: if occurrence_interval_frames is 10, a car is first detected at frame 15 and then at 18, 19, 22, and 25, and a person is first detected at frame 16 and then at 17, 18, 19, and 100, then the frames that will be saved are 15, 16, 25, and 100.

Use case:
This variable is useful when you want to save the frames that detected what you want, but not every single one, to reduce the size of your results. For example, if there is a stationary car in the scene, the program detects it in every frame, and the program runs at 10 fps, then in just one hour there would be 10 × 60 × 60 = 36,000 images saved, which is a lot.

Notes:
Make sure that the save_results and save_images variables are true, otherwise the images will not be saved. occurrence_interval_frames applies per object, not per detection, as explained in the second example.
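The per-object saving schedule in the second example can be simulated with a short sketch (the detection list format here is hypothetical, not the program's internal representation):

```python
def frames_saved(detections, interval):
    """Given (frame_number, object_name) pairs in frame order, return the
    frames that would be saved: each object saves on its first detection,
    then again only after `interval` frames (counting the saved frame)."""
    last_saved = {}
    saved = set()
    for frame, obj in detections:
        if obj not in last_saved or frame >= last_saved[obj] + interval:
            last_saved[obj] = frame
            saved.add(frame)
    return sorted(saved)

# The second example from the text: car at 15, 18, 19, 22, 25;
# person at 16, 17, 18, 19, 100; interval 10.
detections = [(15, "car"), (16, "person"), (17, "person"), (18, "car"),
              (18, "person"), (19, "car"), (19, "person"), (22, "car"),
              (25, "car"), (100, "person")]
```

Running frames_saved(detections, 10) reproduces the documented answer: frames 15, 16, 25, and 100.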

save_results

type: boolean
possible values: true or false
default value: true

Explanation:
If a detection is found, this variable makes the program save the results in the Results folder.

Notes:
A .csv file will always be present if this value is true.
If this value is false, then save_images will have no effect even if it is true.

save_images

type: boolean
possible values (depends on save_results): true or false
default value: true

Explanation:
If this variable is true, images will be saved in the Results folder. Set it to false if you want to save space.

Notes:
If the save_results variable is false, the save_images variable will not save images even if it is true.

add_results_on_image

type: boolean
possible values: true or false
default value: true

Explanation:
If false, the result images in the Results folder will not display certain results such as frames per second, date and time, location, etc.

Notes:
Only the saved images are affected when this variable is false; the live display window will still show these results.

YOLO_v3_objects_chosen

type: list with string values
possible values: any values in the object_detection/resources/labels/YOLOv3_Labels.txt file.

Explanation:
If you choose one of the two YOLO models, you can choose which of the 80 objects in YOLOv3_Labels.txt should be detected. If you choose nothing, nothing will be detected; if you choose car and person, only those two will be detected.

Notes:
Do not change the YOLOv3_Labels.txt file; if you do, you will get wrong results.
The object names must exactly match those in the YOLOv3_Labels.txt file or you will get an error.

SSD_objects_chosen

type: list with string values
possible values: any values in the object_detection/resources/labels/SSD_Labels.txt file.

Explanation:
If you choose the SSD model, you can choose which objects in SSD_Labels.txt should be detected. If you choose nothing, nothing will be detected; if you choose car and person, only those two will be detected.

Notes:
Do not change the SSD_Labels.txt file; if you do, you will get wrong results.
This file used to contain many mislabeled objects; most have been fixed, but if you choose an uncommon object you might still get a different object name.

add_to_previous_results

type: boolean
possible values: true or false
default value: false

Explanation:
If this value is true and you close and reopen the program multiple times, the .csv file in the Results folder is not replaced; instead, the results accumulate.

add_header_to_next_results

type: boolean
possible values (depends on add_to_previous_results): true or false.
default value: true

Explanation:
This variable adds a header row in the .csv file in the Results folder for each new batch of accumulated results (if add_to_previous_results is true and you have opened the program more than once).

image_extension

type: string
possible values: jpg or png
default value: jpg

Explanation:
The format of the image that will be saved. Use jpg if you want smaller files.

results
results["objects"]

type: python dictionary

Example: {'person': [4, 0.713], 'car': [1, 0.82]}

Example Explanation:

The keys of the dictionary are the objects detected in the frame by the detect function. The value of each key is a list of two numbers: the first is the number of such objects detected in that frame, and the second is the maximum confidence among them. In the example, 4 people were detected in the frame, and of those 4 the highest confidence is 71.3%. The second object is a single car, so that car's confidence is obviously the maximum: 82% (the software is 82% sure it is a car).
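Building such a dictionary from raw per-frame detections could look like the following sketch (the raw (label, confidence) input format is an assumption):

```python
def summarize_frame(detections):
    """Collapse (label, confidence) pairs into {label: [count, max_confidence]}."""
    objects = {}
    for label, conf in detections:
        if label not in objects:
            objects[label] = [0, 0.0]
        objects[label][0] += 1                      # count of this object
        objects[label][1] = max(objects[label][1], conf)  # best confidence
    return objects
```

Four person detections with a best confidence of 0.713 plus one car at 0.82 would reproduce the documented example {'person': [4, 0.713], 'car': [1, 0.82]}.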

results["location"]

type: string

Example: KFUPM

Example Explanation:

This is simply one of the inputs that is also used as an output for getting more information about the results.

results["date_and_time"]

type: string

Example: 2022/08/03, Wed, 04:46:40 PM

Example Explanation:

The date and time at which detections occurred on the frame.

year/month/day, day name, hour:minute:seconds AM or PM
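If you need to reproduce or parse this timestamp in Python, the strftime pattern matching the documented layout would be as follows (assuming the software uses exactly this layout with English day abbreviations):

```python
from datetime import datetime

# Pattern derived from the documented example "2022/08/03, Wed, 04:46:40 PM".
TIMESTAMP_FORMAT = "%Y/%m/%d, %a, %I:%M:%S %p"

# Formatting the documented example's date and time:
stamp = datetime(2022, 8, 3, 16, 46, 40).strftime(TIMESTAMP_FORMAT)
```

datetime.strptime with the same pattern would parse such strings back into datetime objects.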

results["frame_number"]

type: int

Example: 5937

Example Explanation:

the frame number since the beginning of the program.

results["fps"]

type: float

Example: 5.58

Example Explanation:

the frames per second at the time of detection.

results["AI_model"]

type: string

Example: SSD

Example Explanation:

This is simply one of the inputs that is also used as an output for getting more information about the results.

results["objects_summary"]

type: string

Example: {'person': [14, 15, 0.613], 'cat': [6, 11, 0.83], 'car': [4, 21, 0.942]}

Example Explanation:

This result gives the details of the detected objects since the beginning of the program (unlike results["objects"], which only describes the detected objects in a single frame).

The keys of the dictionary are the objects detected since the beginning of the program. Every key has a list of three values.

The first value is how many times a particular object has occurred since the beginning of the program. This value depends on the occurrence_interval_frames input variable in the .json file. For example, if occurrence_interval_frames is 10 and the program detects a person on frames 1 through 10, the object is still considered to have occurred only once. If a person is detected at the 11th frame or any frame after that, it counts as the second occurrence; the third occurrence can begin no sooner than 10 frames after the second, the fourth no sooner than 10 frames after the third, and so on. Another example: if a car is detected at frames 1, 2, 6, 8, 15, 100, 200, 201, 204, and 2492, it is considered to have occurred 5 times if occurrence_interval_frames is 10.

The second value is the maximum number of such objects detected in any single frame since the beginning of the program (similar to the first value in the list in results["objects"], but considering all frames and taking the maximum).

The third value is the maximum confidence of the objects since the beginning of the program (similar to the second value in the list in results["objects"], but considering all frames and taking the maximum).

For the example, let us consider the car object. One possible scenario for the first value is that the car was detected at frames 1, 2, 3, 4, 11, 55, and 100. The second value means that among those frames there was a frame in which 21 cars were detected, and this is the maximum over all frames. The last value means that among all the cars detected over the whole run, there was one the program was 94.2% sure was a car, and this is the maximum confidence.
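The occurrence-counting rule behind the first value can be sketched as follows, reproducing the car example above with an interval of 10 (the list-of-frames input is illustrative):

```python
def count_occurrences(frames_detected, interval):
    """Count occurrences: a new occurrence starts only when at least
    `interval` frames have passed since the frame that began the
    previous occurrence."""
    count = 0
    last = None
    for frame in sorted(frames_detected):
        if last is None or frame >= last + interval:
            count += 1
            last = frame
    return count

# The car example from the first-value explanation above.
car_frames = [1, 2, 6, 8, 15, 100, 200, 201, 204, 2492]
```

count_occurrences(car_frames, 10) yields the documented 5 occurrences: frames 2, 6, and 8 fall within 10 frames of frame 1, and 201 and 204 within 10 frames of 200.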

results_frames

type: numpy object.

possible values: a three-dimensional (or sometimes two-dimensional) array (height, width, and sometimes color channels with three values, usually (b,g,r)). For three channels or one channel (a two-dimensional array), the values are usually 0 to 255 (uint8 type in the numpy library).

Notes:
1- The "type" and "possible values" above are true for all of the BV software, but to be specific, for this particular project (object_detection) it is a three-dimensional array with three (b,g,r) channels.
2- The names of the frames (or of a single frame) should be self-explanatory, so we will not go deep into explaining them. Someone with basic knowledge of the numpy library should be able to work with them without issues.

frames:
results_frames["original_with_results"]