Smart Robot Uses Vision AI to Automate Sorting Tasks
In a significant advancement for automation in manufacturing, a team of engineering students and researchers from Guangdong Ocean University has developed an intelligent robotic system capable of autonomously identifying, sorting, and transporting spherical objects using advanced computer vision technology. The robot, designed primarily for competitive robotics challenges, demonstrates a high degree of accuracy, efficiency, and real-time adaptability—qualities that could translate into broader industrial applications in logistics, quality control, and smart factory environments.
The project, led by undergraduate student Zhipeng Chen and supervised by senior experimental engineer Derong Li, introduces a novel integration of open-source vision libraries with embedded hardware platforms to achieve robust object recognition and navigation under dynamic conditions. Published in the March 2021 issue of Computer Technology and Development, the research outlines a comprehensive design that combines mechanical engineering, real-time image processing, and wireless monitoring into a cohesive autonomous system.
At its core, the robot relies on a dual-processing architecture: a central STM32 microcontroller manages motor control, motion planning, and coordination of physical components, while a Raspberry Pi 4B handles complex visual computation using the OpenCV library. This division of labor allows the system to maintain responsiveness in movement while performing intensive image analysis—a critical balance for real-world deployment where timing and precision are paramount.
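In practice, such a split architecture needs a lightweight link between the two boards, typically a UART. The article does not detail the protocol, so the sketch below assumes a simple fixed-size frame sent from the Pi to the STM32 via pyserial; the header byte, field layout, and port name are all illustrative.

```python
import serial
import struct

# Assumed UART link between the Pi (vision) and the STM32 (motion).
# Port name, baud rate, and frame format are illustrative, not the
# team's actual protocol.
link = serial.Serial("/dev/ttyAMA0", baudrate=115200, timeout=0.1)

def send_target(x_px, y_px, color_id):
    """Send the detected ball's pixel coordinates and color class
    to the motion controller as a fixed-size little-endian frame:
    1 header byte, two signed 16-bit coordinates, 1 class byte."""
    link.write(struct.pack("<BhhB", 0xAA, x_px, y_px, color_id))
```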
One of the most compelling aspects of the design is its ability to operate entirely without external guidance or human intervention. Once activated, the robot begins a self-directed search routine, rotating in place to scan its surroundings for target objects. These targets, in the context of the original competition setting, are red and blue table tennis balls scattered randomly across a defined arena. However, the underlying methodology is scalable and adaptable to various object types and environments.
The visual recognition pipeline begins with color-based segmentation—a technique that isolates pixels within a specific hue range from the rest of the scene. Instead of relying on the standard RGB color model, which can be sensitive to lighting variations such as shadows or glare, the team opted for the HSV (Hue, Saturation, Value) and Lab color spaces. These models better align with human perception of color and offer greater stability under fluctuating illumination, a common challenge in industrial settings.
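A minimal version of this segmentation step in OpenCV, using the HSV space, might look like the following. The hue bounds are illustrative, since the paper's exact thresholds are not given; note that red wraps around the hue axis in HSV, so it typically needs two ranges.

```python
import cv2

def segment_color(frame_bgr, lower_hsv, upper_hsv):
    """Return a binary mask of pixels inside the given HSV band."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    return cv2.inRange(hsv, lower_hsv, upper_hsv)

def segment_red(frame_bgr):
    """Red straddles hue 0, so two bands are combined with OR.
    Threshold values here are typical defaults, not the paper's."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    low = cv2.inRange(hsv, (0, 120, 70), (10, 255, 255))
    high = cv2.inRange(hsv, (170, 120, 70), (180, 255, 255))
    return cv2.bitwise_or(low, high)
```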
Using thresholding techniques, the system converts the captured video feed into binary images where only the desired color regions remain visible. From there, contour detection algorithms analyze the shape and size of each segmented blob. Since the targets are spherical, the algorithm prioritizes circularity and selects the largest qualifying contour as the primary candidate. This geometric filtering ensures that distant or partially obscured balls do not trigger false positives, enhancing the reliability of the detection process.
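Selecting the ball among the segmented blobs can be done with contour extraction plus a circularity test, 4&pi;A/P&sup2;, which equals 1 for a perfect circle. A sketch of that filter is below; the minimum area and circularity threshold are assumptions.

```python
import math
import cv2

def find_ball(mask, min_area=200, min_circularity=0.7):
    """Pick the largest sufficiently circular contour in a binary
    mask. Returns its bounding box (x, y, w, h), or None."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    best, best_area = None, 0.0
    for c in contours:
        area = cv2.contourArea(c)
        perimeter = cv2.arcLength(c, True)
        if area < min_area or perimeter == 0:
            continue
        # 4*pi*A / P^2 is 1 for a circle, lower for ragged shapes.
        circularity = 4 * math.pi * area / (perimeter ** 2)
        if circularity >= min_circularity and area > best_area:
            best, best_area = c, area
    return cv2.boundingRect(best) if best is not None else None
```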
To maintain continuous tracking once a target is identified, the team implemented the CAMshift (Continuously Adaptive Mean-Shift) algorithm—an evolution of the classic mean-shift method that dynamically adjusts the tracking window’s size and orientation based on the object’s movement and scale changes. This capability is essential when the robot approaches a ball; as the object grows larger in the camera’s field of view, the tracking window expands accordingly, preventing loss of focus due to rapid motion or perspective shifts.
However, the researchers discovered through experimentation that the initial window size used in the CAMshift process significantly affects tracking stability. If too small, fast-moving objects may escape the frame before the algorithm can reacquire them. Conversely, if too large, the window risks capturing nearby distractors—such as another ball of similar color—leading to merged detections and misidentification. After extensive testing, the optimal expansion factor was found to be between 1.2 and 1.5 times the original window size, striking a balance between responsiveness and selectivity.
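A sketch of this tracking loop using OpenCV's built-in CamShift is shown below. The histogram masking values are typical defaults rather than the team's, and the 1.3 expansion factor sits inside the 1.2 to 1.5 range the paper reports as stable.

```python
import cv2

# Stop after 10 iterations or once the window shifts by < 1 px.
TERM_CRIT = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

def expand_window(x, y, w, h, factor=1.3):
    """Grow the initial window around its center; 1.2-1.5 is the
    stable range reported by the team."""
    nw, nh = int(w * factor), int(h * factor)
    return (x - (nw - w) // 2, y - (nh - h) // 2, nw, nh)

def make_hue_histogram(frame_bgr, window):
    """Build the normalized hue histogram CamShift tracks against."""
    x, y, w, h = window
    hsv = cv2.cvtColor(frame_bgr[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    # Ignore dark or washed-out pixels so the histogram stays stable.
    mask = cv2.inRange(hsv, (0, 60, 32), (180, 255, 255))
    hist = cv2.calcHist([hsv], [0], mask, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    return hist

def track_step(frame_bgr, hist, window):
    """Advance the tracker by one frame; returns the rotated box
    fitted to the target and the updated search window."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    rot_box, window = cv2.CamShift(back_proj, window, TERM_CRIT)
    return rot_box, window
```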
Once the robot locks onto a target, it navigates toward the object using its omnidirectional wheel configuration. Mounted in a circular arrangement beneath the chassis, these wheels enable lateral movement, rotation in place, and smooth trajectory adjustments—maneuvers that would be difficult or impossible with conventional two-wheel drive systems. This agility allows the robot to approach the ball from any angle, minimizing path length and conserving time during repeated sorting cycles.
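The article does not specify the wheel count or geometry, but for a common three-wheel omni layout at 120-degree spacing the inverse kinematics are simple enough to sketch; the angles and chassis radius below are assumptions.

```python
import math

# Assumed geometry: three omni wheels at 120-degree spacing, 0.15 m
# from the chassis center (the article gives neither value).
WHEEL_ANGLES = [math.radians(a) for a in (90, 210, 330)]
CHASSIS_RADIUS = 0.15  # meters

def wheel_speeds(vx, vy, omega):
    """Map a desired body velocity (vx, vy in m/s, omega in rad/s)
    to the drive-surface speed of each omni wheel."""
    return [-math.sin(a) * vx + math.cos(a) * vy + CHASSIS_RADIUS * omega
            for a in WHEEL_ANGLES]

# Example: pure sideways translation, impossible for two-wheel drive.
print(wheel_speeds(vx=0.3, vy=0.0, omega=0.0))
```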
Upon reaching the target, a custom-built mechanical arm equipped with a fully integrated gripper engages the ball. Unlike modular claw designs, this integrated structure—emphasized in the competition rules—ensures mechanical simplicity and durability. Driven by a high-torque servo motor, the gear-linked fingers close around the ball with sufficient force to lift it without deformation. A secondary articulation joint enables the arm to tilt upward, allowing the robot to reposition the object for verification or transfer.
Color verification is a crucial step in ensuring sorting accuracy. After picking up a ball, the robot performs a secondary check by directing the camera back toward the held object. The main controller passes the vision system the expected position of the ball relative to the gripper, and the vision system evaluates whether the pixel values in that region match the intended color profile. This redundancy reduces error rates caused by misclassification during the initial scan, particularly under uneven lighting or when balls are partially occluded.
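A hedged sketch of that check: sample the image region where the gripper should be holding the ball and require that most of its pixels fall inside the expected HSV band. The region, thresholds, and acceptance ratio below are illustrative.

```python
import cv2

def verify_held_ball(frame_bgr, roi, lower_hsv, upper_hsv, min_ratio=0.6):
    """Return True if at least `min_ratio` of the pixels inside the
    gripper region roi = (x, y, w, h) match the expected color band.
    All parameter values are assumptions, not the paper's."""
    x, y, w, h = roi
    hsv = cv2.cvtColor(frame_bgr[y:y+h, x:x+w], cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower_hsv, upper_hsv)
    return cv2.countNonZero(mask) / float(w * h) >= min_ratio
```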
If the verification passes, the robot proceeds to the next phase: locating the designated drop zone. The same vision system that identifies balls is repurposed to detect large colored areas on the ground—specifically, pink collection bins and white launch zones marked on the competition floor. By applying the same color segmentation and contour analysis techniques, the robot determines the spatial coordinates of these zones and aligns itself accordingly.
Rather than simply dropping the ball, the robot employs a unique launching mechanism consisting of two counter-rotating friction wheels set at an inclined angle. Depending on the speed differential between the wheels, the ball can be propelled forward with variable force and trajectory, enabling precise delivery into elevated containers. This feature eliminates the need for the robot to physically enter the collection area, reducing the risk of collision and increasing operational efficiency.
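As a first-order model (not given in the paper), the ball's exit speed is roughly the mean of the two wheels' surface speeds, while the differential between them imparts spin. The wheel radius, arrangement, and sign convention below are all assumptions.

```python
import math

WHEEL_RADIUS = 0.03   # m, friction-wheel radius (assumed)
BALL_DIAMETER = 0.04  # m, standard table tennis ball

def launch_estimate(rpm_a, rpm_b):
    """First-order estimate: exit speed ~ mean surface speed of the
    two counter-rotating wheels; their differential imparts spin."""
    v_a = rpm_a * 2 * math.pi / 60 * WHEEL_RADIUS  # surface speed, m/s
    v_b = rpm_b * 2 * math.pi / 60 * WHEEL_RADIUS
    exit_speed = (v_a + v_b) / 2                   # m/s
    spin = (v_a - v_b) / BALL_DIAMETER             # rad/s, sign assumed
    return exit_speed, spin
```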
Throughout the entire mission cycle—from detection to delivery—the robot operates in complete autonomy. No remote commands or external data feeds are permitted during operation, as per the competition guidelines. All decisions are made onboard using sensor input and pre-programmed logic, making the system a true example of edge computing in robotics.
Beyond its functional capabilities, the robot incorporates a real-time monitoring system that enhances debugging and performance evaluation. Using Wi-Fi connectivity, the Raspberry Pi hosts a lightweight web server built on the Flask framework, allowing nearby devices to access a live dashboard through any standard web browser. This interface displays the current video feed with overlaid detection markers, system status indicators, and coordinate data, providing engineers with immediate feedback during field tests.
This remote visualization capability does not interfere with the robot’s autonomous operation, as it runs on a separate network stack and does not rely on cloud processing. Instead, image frames are encoded and served locally within the same subnet, ensuring low latency and uninterrupted service even in environments with limited internet access. The implementation serves as a model for how diagnostic tools can coexist with real-time control systems without compromising performance.
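The standard way to build such a dashboard with Flask is an MJPEG endpoint that streams JPEG-encoded frames over a multipart HTTP response. The sketch below shows that general pattern; the route name, camera index, and port are assumptions rather than the team's exact setup.

```python
from flask import Flask, Response
import cv2

app = Flask(__name__)
camera = cv2.VideoCapture(0)  # USB camera index assumed

def mjpeg_stream():
    """Yield JPEG-encoded frames as a multipart HTTP stream."""
    while True:
        ok, frame = camera.read()
        if not ok:
            break
        ok, jpeg = cv2.imencode(".jpg", frame)
        if not ok:
            continue
        yield (b"--frame\r\n"
               b"Content-Type: image/jpeg\r\n\r\n" + jpeg.tobytes() + b"\r\n")

@app.route("/video")
def video():
    return Response(mjpeg_stream(),
                    mimetype="multipart/x-mixed-replace; boundary=frame")

if __name__ == "__main__":
    # Serve on the local interface only; frames never leave the
    # subnet, matching the paper's cloud-free design.
    app.run(host="0.0.0.0", port=5000)
```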
During experimental trials, the robot was subjected to repeated sorting tasks under conditions mirroring the original competition setup. In timed runs lasting two minutes, it successfully retrieved and deposited an average of 15 to 17 balls per session, with error rates consistently below 3%. This translates to a processing speed of approximately one object every 7.64 seconds—an impressive throughput given the complexity of visual search, navigation, and precision launching.
Notably, the system maintained high accuracy even when multiple objects of similar color were present in close proximity. The combination of selective color filtering, geometric validation, and adaptive tracking minimized confusion between adjacent targets, a common failure point in less sophisticated vision systems.
The success of this project underscores the growing accessibility of advanced robotics technologies. By leveraging affordable, off-the-shelf components—such as the Raspberry Pi, USB cameras, and open-source software libraries—the team achieved performance levels typically associated with far more expensive industrial equipment. This democratization of automation tools opens new possibilities for educational institutions, startups, and small manufacturers seeking to innovate without massive capital investment.
Moreover, the modular nature of the system suggests numerous avenues for future enhancement. For instance, integrating depth sensing via stereo cameras or time-of-flight sensors could allow the robot to estimate distances more accurately, further improving navigation and manipulation. Machine learning models trained to recognize non-spherical or irregularly shaped objects could expand its utility beyond simple sorting tasks.
Another potential upgrade involves multi-robot coordination. If multiple units like this one were deployed in parallel, they could cover larger areas and distribute workloads dynamically. With appropriate communication protocols, such a swarm could adapt to changing conditions, avoid collisions, and optimize task allocation—features increasingly relevant in modern warehouse automation.
From an educational standpoint, the project exemplifies the value of hands-on, interdisciplinary learning. The development team included students from both electronic information engineering and mechanical power engineering disciplines, reflecting the collaborative nature of real-world engineering challenges. The integration of hardware design, control theory, computer vision, and network programming provided a holistic experience that goes beyond textbook knowledge.
Faculty advisor Derong Li emphasized the importance of bridging theoretical concepts with practical implementation. “What sets this project apart is not just the technical outcome, but the depth of understanding gained through iterative testing and refinement,” he noted. “Students had to confront real-world variables—lighting inconsistencies, mechanical backlash, signal latency—and develop solutions grounded in physics and mathematics.”
Indeed, the journey from concept to working prototype involved numerous iterations. Early versions struggled with inconsistent color detection under fluorescent lighting, prompting the switch from RGB to HSV space. Initial attempts at object tracking using optical flow failed under rapid motion, leading to the adoption of CAMshift. Even the mechanical gripper underwent several redesigns to ensure reliable pickup without excessive power consumption.
These challenges, while demanding, contributed to a deeper mastery of system integration—the art of making disparate technologies work together seamlessly. As automation continues to reshape industries, the ability to troubleshoot and optimize complex systems will become an increasingly vital skill.
The broader implications of this research extend into the realm of Industry 4.0, where smart machines are expected to communicate, learn, and make decisions with minimal human oversight. While this robot does not employ artificial intelligence in the form of neural networks or deep learning, its rule-based autonomy and sensor-driven behavior represent a foundational step toward more intelligent systems.
In manufacturing environments, similar robots could be deployed for inbound inspection, where raw materials are sorted by type or quality before entering production lines. In pharmaceutical settings, they might handle vials or capsules according to labeling or color coding. In recycling facilities, such systems could separate plastics, metals, or paper based on visual signatures—reducing contamination and improving yield.
Unlike fixed automation systems that require extensive reprogramming for new tasks, this robot’s vision-based approach offers inherent flexibility. By simply adjusting color thresholds or detection parameters, operators can reconfigure the system for different object types or sorting criteria. This adaptability makes it particularly suitable for small-batch or custom manufacturing scenarios where changeovers are frequent.
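Concretely, a changeover could be as small as editing a threshold table like this hypothetical one; the names and HSV bands are illustrative.

```python
# Hypothetical sorting profiles: switching tasks means editing these
# HSV bands and destinations, not rewriting the vision pipeline.
SORTING_PROFILES = {
    "red_ball":  {"lower": (0, 120, 70),   "upper": (10, 255, 255),  "bin": "pink_zone"},
    "blue_ball": {"lower": (100, 120, 70), "upper": (130, 255, 255), "bin": "pink_zone"},
}
```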
Security and privacy considerations were also addressed in the design. Since all data processing occurs locally and no internet connection is required for operation, the system avoids many of the vulnerabilities associated with cloud-dependent devices. The optional Wi-Fi monitoring feature can be disabled when not needed, preserving network integrity and minimizing attack surface.
Energy efficiency was another focus. The STM32 microcontroller operates in low-power modes when idle, waking only when triggered by sensor input or internal timers. The Raspberry Pi, though more power-hungry, enters sleep states during inactive periods. Together, these strategies extend battery life and support longer operational cycles—important factors for mobile robots.
Looking ahead, the research team plans to explore integration with robotic arms featuring six degrees of freedom, enabling more complex manipulation tasks. They are also investigating the use of semantic segmentation techniques to interpret entire scenes rather than isolated objects, potentially allowing the robot to understand spatial relationships and context.
The project has already received recognition beyond the laboratory. At the 2019 Guangdong Provincial Engineering Students’ Comprehensive Skills Competition, the robot earned a first-place award, validating its technical merits in a rigorous, peer-reviewed environment. Its publication in Computer Technology and Development with DOI 10.3969/j.issn.1673-629X.2021.03.034 ensures that the methodology is accessible to other researchers and practitioners.
As industries continue to seek ways to enhance productivity while reducing labor costs, autonomous sorting systems like this one will play an increasingly important role. What began as a student competition entry has evolved into a proof-of-concept with tangible real-world relevance. It demonstrates that innovation does not always require billion-dollar labs or proprietary algorithms—sometimes, it emerges from a garage, a handful of sensors, and a deep commitment to problem-solving.
The work of Zhipeng Chen, Guanglai Xu, Xianyang Chen, Yunlyu Chen, Ximing Chen, and Derong Li at Guangdong Ocean University stands as a testament to the power of applied engineering education and open technology ecosystems. Their robot may have started as a tool for a contest, but its principles may well influence the next generation of automated solutions.
Source: Zhipeng Chen, Guanglai Xu, Xianyang Chen, Yunlyu Chen, Ximing Chen, and Derong Li (Guangdong Ocean University), "Smart Robot Uses Vision AI to Automate Sorting Tasks," Computer Technology and Development, DOI: 10.3969/j.issn.1673-629X.2021.03.034.