1. What are the reasons encouraging the development of edge devices in computer vision applications?
Three concerns encourage the development of edge devices for computer vision. First, network load: sending high-resolution data from a vast number of IoT devices to a central computational unit can cause unwanted and unpredictable delays. Second, load on the computational unit: analyzing high-resolution data with current state-of-the-art models can make the system cost-inefficient. Third, safety: raw data sent to the cloud can be targeted by attackers, and users may lose trust if their data is stored on an undisclosed server. Running a feature extractor on the edge device and sending only anonymous features to the cloud addresses all three issues. This approach has driven the development of specialized inference chips and of scalable architectures that can extract features accurately. The best algorithm or platform still depends on the specific application, so choosing one is time-consuming for engineers. Profiling different models on various platforms and sharing the results on a moderated platform can make that selection more efficient. The work summarized here profiles popular models available in TensorFlow as feature extractors on multiple platforms, giving ML engineers data for informed decisions.
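To make the idea of profiling a feature extractor concrete, here is a minimal sketch in plain Python. It is not the measurement setup used in the work summarized above: `profile_latency` and `dummy_extractor` are hypothetical names, and the extractor is a toy stand-in for a real model.

```python
import statistics
import time


def profile_latency(fn, sample, warmup=3, runs=20):
    """Time fn(sample) and return median and worst-case latency in milliseconds."""
    for _ in range(warmup):
        fn(sample)  # warm-up calls are discarded (caches, lazy initialization)
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(sample)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


def dummy_extractor(pixels):
    """Hypothetical stand-in for a model: reduces an image to 4 summary features."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    return [mean, min(pixels), max(pixels), variance]


# A fake 64x64 RGB image flattened to a list of byte values (assumption for the demo).
fake_image = [i % 256 for i in range(64 * 64 * 3)]
result = profile_latency(dummy_extractor, fake_image)
```

Reporting the median alongside the maximum is a design choice here: the median is robust to occasional scheduler hiccups, while the maximum hints at the unpredictable delays the answer above mentions.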
2. What are the key features of MobileNetV1?
MobileNetV1, introduced by Howard et al. in 2017, is a streamlined architecture built from depthwise separable convolutions. It exposes two hyperparameters: a width multiplier and a resolution multiplier. The width multiplier uniformly scales the number of input channels (M) and output channels (N) of each layer, while the resolution multiplier scales the input resolution. Fast inference comes mainly from the dense 1x1 (pointwise) convolutions, which map directly onto highly optimized general matrix multiply (GEMM) routines. Unlike larger kernels, 1x1 convolutions need no initial reordering in memory before the GEMM call, which contributes to efficient computation on mobile devices.
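The effect of depthwise separable convolutions and of the two multipliers can be sketched with the multiply-add counts from the MobileNet paper. The function names below are hypothetical, and truncating scaled sizes with `int()` is an assumption made for this illustration.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds of a standard convolution: DK * DK * M * N * DF * DF."""
    return dk * dk * m * n * df * df


def separable_conv_cost(dk, m, n, df, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise separable convolution.

    alpha is the width multiplier (scales channels M and N),
    rho is the resolution multiplier (scales feature-map size DF).
    """
    m_scaled = int(alpha * m)   # assumption: truncate to an integer channel count
    n_scaled = int(alpha * n)
    df_scaled = int(rho * df)
    depthwise = dk * dk * m_scaled * df_scaled * df_scaled   # one filter per channel
    pointwise = m_scaled * n_scaled * df_scaled * df_scaled  # dense 1x1 convolution
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map.
full = standard_conv_cost(3, 512, 512, 14)
separable = separable_conv_cost(3, 512, 512, 14)
ratio = separable / full  # equals 1/N + 1/DK^2 when alpha = rho = 1
```

For this layer the ratio is 1/512 + 1/9, so the separable version needs roughly a ninth of the multiply-adds. Most of what remains sits in the pointwise term, which is why the GEMM-friendly 1x1 convolution dominates inference time.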
3. What is the purpose of EfficientNet V1 and V2?
EfficientNet V1 and V2 aim to improve model scaling and performance by balancing network depth, width, and input resolution. The authors, Tan and Le, optimized for FLOPs rather than latency because they did not target specific hardware. They used neural architecture search (NAS) to create a scalable baseline network, which they then scaled into the EfficientNetV1 family of eight models (B0 to B7). In their subsequent paper, they used NAS and scaling to optimize training speed and parameter efficiency, producing the EfficientNetV2 family of seven models (B0 to B3, plus S, M, and L). EfficientNetV2 achieved state-of-the-art top-1 accuracy on ImageNet, surpassing the Vision Transformer (ViT; Dosovitskiy et al., 2020).
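The balance between depth, width, and resolution can be illustrated with the compound scaling rule from the EfficientNet V1 paper. This is a sketch only: the constants are the values the paper reports for the B0 baseline, `compound_scale` is a hypothetical helper, and the released B1 to B7 configurations use rounded values, so this does not reproduce them exactly.

```python
# Constants reported in the EfficientNet V1 paper for the B0 baseline,
# chosen so that ALPHA * BETA**2 * GAMMA**2 is approximately 2.
ALPHA = 1.2   # depth base
BETA = 1.1    # width base
GAMMA = 1.15  # resolution base


def compound_scale(phi):
    """Return scaling factors for a given compound coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPs grow linearly with depth and quadratically with width and resolution.
    flops = depth * width ** 2 * resolution ** 2
    return {"depth": depth, "width": width, "resolution": resolution, "flops": flops}


for phi in range(4):
    factors = compound_scale(phi)
    print(phi, {name: round(value, 3) for name, value in factors.items()})
```

Because the product of the constants is close to 2, each increment of `phi` roughly doubles the FLOPs budget, which matches the paper's stated focus on FLOPs rather than hardware-specific latency.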
4. What are the key features of ResNet V1 and V2?
The ResNet families are built on residual connections, which allow much deeper networks than their predecessors. ResNet V1 includes ResNet50, ResNet101, and ResNet152, which achieved state-of-the-art results on ImageNet when they were published. ResNet V2 rethought the residual block: it removed the ReLU that followed the addition, so the 'easiest' (shortcut) path carries the signal through unchanged. Both ResNet families were profiled because of their similarities with InceptionV3 and the VGG models, and because of their importance in deep learning research.
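The difference between the two block designs can be shown with a toy sketch on plain Python lists. It omits convolutions and batch normalization entirely; `residual_v1`, `residual_v2`, and `zero_branch` are hypothetical names, and the `branch` argument stands in for the learned layers.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def add(a, b):
    """Element-wise addition of two equally sized lists."""
    return [x + y for x, y in zip(a, b)]


def residual_v1(x, branch):
    """V1 ordering: ReLU is applied after the addition, so it also acts on the shortcut."""
    return relu(add(x, branch(x)))


def residual_v2(x, branch):
    """V2 ordering: activation moves into the branch, nothing follows the addition."""
    return add(x, branch(relu(x)))


def zero_branch(values):
    """Stand-in for learned layers that currently output zeros."""
    return [0.0 for _ in values]


x = [-1.0, 2.0]
out_v1 = residual_v1(x, zero_branch)  # the negative entry is clipped by the final ReLU
out_v2 = residual_v2(x, zero_branch)  # the input passes through the shortcut unchanged
```

With a branch that contributes nothing, the V2 block returns its input exactly, while the V1 block still clips negative values. That clean identity path is what the removal of the post-addition ReLU buys.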