Edge Devices Inference Performance Comparison

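Questions

1. What are the reasons encouraging the development of edge devices in computer vision applications?

2. What are the key features of MobileNetV1?

3. What is the purpose of EfficientNet V1 and V2?

4. What are the key features of ResNet V1 and V2?

5. How is inference time measured in the benchmarking process?

6. What is the performance difference between Coral and other platforms for MobileNetV2?

7. What are the limitations of Google Coral for model size and activation functions?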

Accepted Answer

Several factors encourage the development of edge devices for computer vision applications. The first is network load: sending high-resolution data from a vast number of IoT devices to a central computational unit can introduce unwanted and unpredictable delays. The second is the load on the computational unit itself: analyzing high-resolution data with current state-of-the-art models can make the system cost-inefficient. The third is security: raw data sent to the cloud may be targeted by attackers, and users may lose trust if their data is stored on an undisclosed server. A better approach is to run a feature extractor on the edge device and send only anonymous features to the cloud. This has led to the development of specialized inference chips and of various scalable architectures that can extract features correctly. However, the best algorithm or platform depends on the specific application, so choosing one is a time-consuming task for engineers. Profiling different models on various platforms and sharing the results on a moderated platform can make the selection process more efficient. The paper profiles popular models available in TensorFlow as feature extractors on multiple platforms, giving ML engineers data on which to base informed decisions.

Accepted Answer

MobileNetV1, introduced by Howard et al. in 2017, is a streamlined architecture built on depthwise separable convolutions. It exposes two hyperparameters: a width multiplier and a resolution multiplier. The width multiplier scales the number of input channels (M) and output channels (N) of each layer, while the resolution multiplier scales the input resolution. Its fast inference comes from dense 1x1 convolutions, which map directly onto highly optimized general matrix multiply (GEMM) functions. These 1x1 convolutions need no initial reordering in memory before the GEMM call, which contributes to efficient computation on mobile devices.
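
The saving from depthwise separable convolutions can be illustrated with the multiply-add counts given in the MobileNetV1 paper. The sketch below is only an illustration: the function names are hypothetical, and `dk` (kernel size), `df` (feature-map size), `alpha` (width multiplier) and `rho` (resolution multiplier) are labels chosen here, not identifiers from any library.

```python
def standard_conv_cost(dk, m, n, df):
    """Multiply-adds of a standard convolution: dk * dk * m * n * df * df."""
    return dk * dk * m * n * df * df


def separable_conv_cost(dk, m, n, df, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise separable convolution.

    alpha scales the channel counts m and n; rho scales the feature-map size df.
    """
    m_scaled = round(alpha * m)
    n_scaled = round(alpha * n)
    df_scaled = round(rho * df)
    # Depthwise step: one dk x dk filter per input channel.
    depthwise = dk * dk * m_scaled * df_scaled * df_scaled
    # Pointwise step: a dense 1x1 convolution that mixes the channels.
    pointwise = m_scaled * n_scaled * df_scaled * df_scaled
    return depthwise + pointwise


def cost_ratio(dk, m, n, df):
    """Separable cost divided by standard cost; equals 1/n + 1/dk**2."""
    return separable_conv_cost(dk, m, n, df) / standard_conv_cost(dk, m, n, df)
```

For a 3x3 layer with 512 input and 512 output channels, `cost_ratio(3, 512, 512, 14)` is roughly 0.11, so the separable layer needs about one ninth of the multiply-adds.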

Accepted Answer

EfficientNet V1 and V2 aim to improve model scaling and performance by balancing network depth, width, and resolution. The authors, Tan and Le, optimized for FLOPs rather than latency because they did not target specific hardware. They used neural architecture search (NAS) to create a scalable baseline network, which resulted in the EfficientNetV1 family of eight models (B0 to B7). In their subsequent paper, they optimized training speed and parameter efficiency using NAS and scaling, which led to the EfficientNetV2 family of seven models (B0 to B3 and S, M, L). EfficientNetV2 achieved state-of-the-art top-1 accuracy on ImageNet, surpassing ViT (Dosovitskiy et al., 2020).
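
The balancing of depth, width, and resolution can be sketched with the compound scaling rule from the EfficientNetV1 paper, where a single coefficient phi scales all three dimensions. The constants below are the values reported in that paper for the B0 baseline; the function name is hypothetical, and the released B1 to B7 configurations need not match these raw multipliers exactly.

```python
# Base multipliers reported for the EfficientNet-B0 baseline:
# depth (ALPHA), width (BETA), and resolution (GAMMA).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi):
    """Return (depth, width, resolution, flops) multipliers for coefficient phi.

    FLOPs grow roughly with depth * width**2 * resolution**2, and the constants
    are chosen so that this product is close to 2 for phi = 1.
    """
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    flops = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth, width, resolution, flops
```

Under these assumptions, each unit increase of phi roughly doubles the FLOPs budget while keeping the three dimensions in a fixed proportion.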

Accepted Answer

ResNet V1 and V2 are built on residual connections, which allow much deeper networks than their predecessors. The ResNet V1 family includes ResNet50, ResNet101, and ResNet152 and achieved state-of-the-art results on ImageNet. ResNet V2 reworked the residual block and removed the ReLU that followed the addition, so the 'easiest' (shortcut) path is no longer passed through an activation. Both ResNet families were profiled because of their similarities with InceptionV3 and the VGG networks and because of their importance in deep learning research.
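
The difference between the two block designs can be shown with a toy scalar sketch. It is a deliberate simplification: `residual_fn` stands in for the convolution and normalization layers of the residual branch, and both function names are hypothetical.

```python
def relu(x):
    return max(0.0, x)


def residual_block_v1(x, residual_fn):
    # V1: the ReLU is applied after the addition, so it also acts on the
    # shortcut path.
    return relu(x + residual_fn(x))


def residual_block_v2(x, residual_fn):
    # V2: the activation moves inside the residual branch and nothing follows
    # the addition, so the shortcut path stays a plain identity.
    return x + residual_fn(relu(x))
```

With a residual branch that outputs zero, the V1 block clips a negative input to zero, while the V2 block passes it through unchanged.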

Accepted Answer

Inference time is measured using the timeit module from the Python standard library. The process involves creating a model, compiling it, and then running warmup inferences followed by the timed ("proper") inferences. The results are recorded in a CSV file that contains the model parameters and the minimum, maximum, mean, and standard deviation of the inference time; for Jetson Nano, the median inference time is recorded as well. The default number of proper inferences is 1024, but it varies in specific cases: for the MobileNetV3 family on the Neural Stick it was set to 128 because of the inference duration.
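
The measurement loop described above could look roughly like the sketch below. It is not the authors' script: `infer` is a placeholder callable standing in for a single model inference, the warmup count of 16 and the CSV column names are assumptions, and the sample standard deviation is used because the source does not say which variant was recorded.

```python
import csv
import statistics
import timeit


def benchmark(infer, warmup=16, runs=1024):
    """Time `infer` after a warmup phase and return summary statistics in seconds."""
    for _ in range(warmup):
        infer()  # warmup inferences are executed but not timed
    # One call per measurement, repeated `runs` times.
    times = timeit.repeat(infer, number=1, repeat=runs)
    return {
        "min": min(times),
        "max": max(times),
        "mean": statistics.mean(times),
        "std": statistics.stdev(times),
        "median": statistics.median(times),
    }


def write_results(path, rows):
    """Write one CSV row per benchmarked model (column names are assumptions)."""
    fieldnames = ["model", "input_size", "min", "max", "mean", "std", "median"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

A slow configuration could pass a smaller `runs` value, for example `benchmark(infer, runs=128)`, which mirrors the reduced count mentioned for MobileNetV3 on the Neural Stick.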

Accepted Answer

The performance difference between Coral and the other platforms for MobileNetV2 is significant. Figure 1 shows that the Coral devices achieve the highest frames-per-second performance. For an input size of 224, Coral USB was 4.42 times faster than Jetson Nano and 9.08 times faster than Neural Stick. The Corals also outperformed the other devices for an input size of 512, where Coral was 1.16 times faster than Jetson Nano and 2.39 times faster than Neural Stick. The authors report that the performance gap between Google's platforms and the others deepens with increasing input size, which they attribute to better computation optimization on Google's platforms. This insight is valuable for designing systems with strict maximum inference time constraints.
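
The quantities used in this comparison follow from the measured inference times by simple arithmetic. The helpers below are a hypothetical sketch with names chosen here; they contain no measured values from the paper.

```python
def fps(mean_seconds):
    """Frames per second for a given mean inference time in seconds."""
    return 1.0 / mean_seconds


def speedup(mean_seconds_a, mean_seconds_b):
    """How many times faster platform A is than platform B."""
    return mean_seconds_b / mean_seconds_a


def meets_deadline(max_seconds, deadline_seconds):
    """Check a strict latency budget against the worst observed inference time."""
    return max_seconds <= deadline_seconds
```

For a system with a strict maximum inference time, the maximum recorded time is the relevant input to `meets_deadline`, not the mean.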

Accepted Answer

Google Coral has limitations in handling large models and certain activation functions. When the model size exceeds the on-chip memory, data must be fetched from external memory, which adds latency. Exceeding the (unspecified) model size limit results in a compilation failure. Coral also does not support the hard-swish activation function used by MobileNetV3, so the model cannot be computed fully on the TPU. Consequently, some operations are executed off-chip, which increases model latency. Neural Stick behaves differently from Google's platform: it throws errors for large models. However, it can still work with more models than Coral.
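
For reference, hard-swish as defined in the MobileNetV3 paper is x * ReLU6(x + 3) / 6. The plain-Python sketch below only states that definition; it says nothing about how any compiler or runtime maps the operation to hardware.

```python
def relu6(x):
    """ReLU clipped at 6."""
    return min(max(x, 0.0), 6.0)


def hard_swish(x):
    """Piecewise approximation of swish: x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0
```

The function is zero for inputs at or below -3 and equals the identity for inputs at or above 3.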

References

Deep Residual Learning for Image Recognition

Very Deep Convolutional Networks for Large-Scale Image Recognition

ImageNet: A large-scale hierarchical image database

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
