Amazon Robotics Challenge 2017, Nagoya Japan: "Respect CARTMAN'S Authoritah!"

The winning CARTMAN and his Australian Centre for Robotic Vision creators: a low cost, custom made solution that was built to win…and did. But as the basis for a real world mechanical design for an Amazon FC robot picker? Not likely with their current warehouse layout…IMAGE COURTESY OF ACRV.


I have to admit, the win by the Australian Centre for Robotic Vision’s CARTMAN (CARTesian MANipulator) took more than a few observers by surprise…including this writer: the writing seemed to be on the wall for a very quick exit from the competition. CARTMAN found his first few days plagued with major malfunctions and a relatively low score during the qualification rounds – hardly the signs of a surefire winner. But it all came together when it counted, demonstrating that the best approach to the competition was to purpose-build a solution to the exact problem from scratch, rather than take an existing apparatus and try to make it do something it was not optimally designed for – the most common approach adopted by other teams. And it also demonstrated it’s possible to do it all on a shoestring budget…with the help of some well-placed cable ties and the very liberal application of duct tape, the engineer’s best friend.

But there was more to it than that: this is one of the most important competitions in the world, addressing one of the most perplexing issues in AI – robotic vision. Below, you’ll find a little more about the background to the competition, the current state of robotics and its Artificial Intelligence branches, Amazon’s thinking, and what the future holds in the field.


The inaugural Amazon Robotics Challenge (first called the Amazon Picking Challenge) was held in 2015 in Seattle, WA, as part of the International Conference on Robotics and Automation. It was set up to encourage expert teams to develop the technologies that would lead to practical benefits for Amazon Fulfilment Centers (warehouses), as well as act as a talent scouting ground for Amazon Robotics itself. It focuses on resolving one of the most challenging real world tasks for robotics: that of identifying, sorting, picking and stowing various objects in a warehouse.

The competition is now held in sync with RoboCup, an international robotics competition founded in 1997 – which first took the form of a robotic soccer match. But since then, the organizers have recognized the need for broader real world appeal (as if soccer didn’t have enough on its own!), and have created various leagues to focus efforts on real world challenges where robotics could have a real positive impact. These are broadly classed as:

RoboCupSoccer: The “classic” challenge, with a soccer game played between autonomous robots on a 9x6m field.
RoboCup@Home: Robots making lives easier in a domestic setting – a big issue in Japan with its rapidly aging population (and increasingly in many advanced OECD nations).
RoboCupIndustrial: Exploring how robots can work in smarter factories, including a logistics competition focused on the movement of Work In Progress goods between different machines.
RoboCupRescue: Emergency service robot designs navigating obstacles that simulate earthquake after-effects – again of great practical need to the Japanese.

And of course, the Amazon Robotics Challenge itself, which was carefully watched by a number of players not just in the field, but in the supply chain, retail and investment communities...and the odd consultant.


The word robot comes from the old Slavic word robota, originally implying low level manual labour, in some cases even forced servitude. That has indeed been the reality of the robotics field for many decades: the automation of repetitive, manual tasks. By contrast, the robotics depicted by the entertainment industry has focused on technologies, capabilities and – of course – styles far in advance of the reality of the technologies available (from “False Maria” in Metropolis to Bishop in Aliens to TARS of Interstellar). However, the reality of Amazon’s business needs has pushed the concept of robotics to new demands.

Robotics itself is a term that covers a vast array of enabling technologies, something perhaps not yet fully appreciated by the supply chain – and broader – community: robotics is an expansive, interdisciplinary collection of disparate fields. They include mechanical engineering, electrical engineering and computer science, as well as the dozens of sub-fields that each of these areas encompasses.

Depending on the application, a robotics solution still requires various breakthroughs and levels of maturity, which must then be integrated into a workable, commercially viable system. One thing that was clear during the competition is that systems integration remains the key, and is perhaps the greatest challenge of robotics. Individual technical breakthroughs are challenging enough, but getting an entire field of technologies to reach a certain level and then work together is something that the very capable teams here – and far beyond – will still need considerable time and resources to master.


Amazon’s success in upending traditional retail models is well known, as is the general understanding of its supply chain strengths being a key part of that success. To ensure that edge continues, Amazon purchased Kiva Systems in 2012, making it a leader in the automated warehouse field; the soon-renamed Amazon Robotics began to play a key role in shaping the future of logistics automation for the company.

Right now, Amazon’s 8th generation Fulfilment Center (warehouse) reflects the leading edge of robotics best practice, with the giant Robo-stow and the Kiva robots giving the company a real edge in efficiency. However, the fact that Amazon has been investing massively in Robotics, Artificial Intelligence and Supply Chain over the past few years has led some to conclude that its successful incorporation of all three fields has been responsible for its success.

The reality today, however, is a lot more mundane.

In truth, Amazon’s – indeed just about every company’s – utilization of robotics is focused on working around robots’ current limitations as much as exploiting their capabilities. Thus far, robotics is limited to the movement of trays and pallets at the Amazon FC. Using Artificial Intelligence at the warehouse level for picking, sorting and stowing remains a dream just out of reach.

Whilst Robo-stow is an impressively powerful and dexterous arm, it has about as much true intelligence as an alarm clock. And the Kiva robots that scurry around carrying the sorting trays are little more than “a bunch of Roombas on steroids, except they suck less,” as one anonymous Amazon executive so colourfully put it.

The brilliance of the Amazon 8th generation fulfilment centre is its integration of existing technologies with human capabilities, courtesy of outstanding process design. It is a mastery of human and machine systems engineering, rather than Artificial Intelligence, which remains in its infancy in the robotics field.

The Amazon Robotics Challenge is in part designed to help Amazon take the next step in the evolution of the technology ahead of everyone else.


Let’s get this out of the way now: CARTMAN is a brilliant way to win a contest and a tribute to the outstanding team behind it, but it has about as much chance of serving as an advanced prototype for a real world picking and stowing robot in an Amazon FC as a home built go-kart has of serving as the base for a championship winning F1 car. CARTMAN was designed to a budget, for a limited amount of weight and a limited movement lifespan. It was held together by tape, ties and hope. The concepts of Design for Manufacture/Assembly and Design for Reliability were non-existent in its execution. That said, the gantry crane design intrigued some of the Amazon executives and others: some experimentation is no doubt being mooted at present.

The ARC is a proof of concept competition that allows researchers to demonstrate new thinking and technologies in a competitive – but still very controlled – environment, one where (thus far) budgets have been measured in tens of thousands, rather than tens of millions. But the basic ideas, technologies and thinking can give direction as to where the industry is going. CARTMAN showed that the ideal design for an Amazon FC could be quite different to what most expect...though the next generation of Amazon FCs may need very different configurations to take full advantage of the ideal robotic picker design. It will be interesting to follow how Amazon configures its future warehouses, and whether it adopts a different approach to what we have seen so far with the 8th generation FC.

Look beyond the hardware, though, and of course it’s the software that matters: the robotic vision CARTMAN employed and the algorithms that interpreted the data and made the choices to identify, pick and stow. By the standards of the competition (see below), CARTMAN was fast in identifying objects, though compared to a human picker it ran at about 25% of the speed, at best.

Much work remains to be done.


In observing the teams and their entries over the 3 days, I was in two minds, the result of having two very different frames of reference from my prior working experience. The part of me that comprehends the technologies associated with robotics (though certainly not to the level of the world class researchers and students who surrounded me) marvelled at how far the experts in the field have come over the past few years in having software and hardware come closer to mimicking the capabilities of people. But the other part of me – the former Lean Six Sigma program manager who had led the creation of countless process maps and time and motion studies of workers at dozens of warehouses around the world – lamented the sheer, mind numbing weakness of the robots compared to their human forebears. The typical Amazon FC worker is able to pick just over 200 items per hour with error rates below 2% (under idealized conditions). The better pickers push that to just over 250 per hour with a sub 1% error rate, and the best pickers sustain 300 items per hour with an error rate of around 2%.
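To put those figures side by side, here is a rough back-of-the-envelope sketch. The numbers are the approximations quoted above, and the 25% robot speed factor is my own rough best-case reading of what I saw – nothing here is an official benchmark:

```python
# Rough effective-throughput comparison using the pick rates quoted above.
# "Effective" rate discounts picks lost to errors.

def effective_rate(picks_per_hour: float, error_rate: float) -> float:
    """Good picks per hour after subtracting erroneous picks."""
    return picks_per_hour * (1.0 - error_rate)

pickers = {
    "typical human": (200, 0.02),
    "better human": (250, 0.01),
    "best human": (300, 0.02),
}

for name, (rate, err) in pickers.items():
    print(f"{name}: ~{effective_rate(rate, err):.0f} good picks/hour")

# A robot at ~25% of the typical human's speed, even assuming zero errors:
robot = effective_rate(200 * 0.25, 0.0)
print(f"robot (generous best case): ~{robot:.0f} good picks/hour")
```

Even granting the robot a zero error rate it cannot sustain, the gap is stark – roughly a quarter of the weakest human benchmark.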

The robots I saw, by contrast, took an eternity in object recognition, showed real hesitation in movement, painful slowness of articulation and “wing and a prayer” ways of gripping, picking and stowing objects (at times dropping or damaging them). Add to this the fact that transparent objects and certain arrangements of objects pushed our current understanding of AI, algorithms and robotic vision to – and often beyond – the limit.

The really tough challenge I witnessed involved the wine glass and the full mineral water bottle. As easily as our human minds can differentiate the two, transparent objects remain a stumbling block for robotic vision: they confound the best software, which became obvious as the robotic arms crashed into them, misidentified objects around them and, even when an object was identified, failed to pick it up correctly. With a bottle of mineral water, I was advised that even when the object is detected, there is uncertainty as to whether it is one whole object, two separate objects separated by the label, or whether the label portion alone is the object. Potential fixes include ultrasonic sensors, stereoscopic vision and UV light, but it is still early days for a reliable solution.

Whilst progress has been remarkable over the past few years overall, we have a long way to go – many breakthroughs are still needed across the technological value stream before the fully automated warehouse becomes even remotely feasible for Amazon and its vast product array.


There were 16 teams in total from around the world at the Nagoya International Exhibition Hall. They were mainly University affiliated (graduate and postgraduate level), but with some big commercial teams too, such as Applied Robotics and Panasonic. The teams are listed here.

The details of what the teams had to contend with can be found in the official rules. In summary, the rules create a competitive environment much closer to the reality of an Amazon FC than in the past two years…but it is still far more controlled than what robots would experience if they were dumped into the real world of an Amazon FC. Especially important to note: the teams were allowed to photograph new items 30 minutes before the start of the competition. In the real world of an FC, most items passing through would be unknown in that level of detail of size and shape, or there would be changes in products and packaging with no warning. Those of you in logistics know well the issues in a warehouse: picking, sorting, stowing, packing and moving an ever changing array of near infinitely variable objects in a dynamic environment of varying noise, lighting, obstacles and layout. It’s at times tricky for humans – for robotics, it’s on another level.

This year, Amazon increased the prize money (which was perhaps a bit less than adequate in the past) to a USD 250,000 total prize pool, with USD 80,000 for the winner. This certainly led to more genuine enthusiasm among all participants – especially some of the less than generously funded University teams...

In quick summary:

Day 1 was for practice rounds only…and just as well, as almost half the teams failed to score any points: machines simply did not work, acted strangely, dropped objects…you name it. But there were two bright spots: very strong initial performances by the NimbRo team from the University of Bonn and by Nanyang Technological University from Singapore. The latter had some very impressive algorithms to identify objects correctly amidst “clutter” (a background/foreground that made the objects challenging to identify), something that drew a lot of attention from the Amazon executives and other teams.

Day 2 was the Stow run. The joint MIT-Princeton team won the day with a solid performance that featured minimal mistakes, though at slow speed. Nanyang again demonstrated great promise with the day’s runner-up award. The ACRV team scored 55 points, despite a fun technical mishap when CARTMAN went a little off the farm and decided to fire the first shots in The Robot Apocalypse by partially self-destructing. A lot of hard work, sweat and tape got him back up.

Day 3 was the Picking run. ACRV got their act together and scored 150…but Nanyang stunned with 257, closely followed by NimbRo with 245, both way ahead of everyone else. Speed had shown some improvement, though there were a few drops that again highlighted just how challenging it is to beat the human eye, mind and hand. Most of the teams performed better, but technical gremlins of various severity and form plagued just about everyone, showing that this is indeed technology largely at the advanced Alpha build stage…

Day 4 was the Final round, for which 8 teams had qualified. Simply, it all came together best for ACRV: CARTMAN behaved (never underestimate the utility of duct tape, Red Bull and sleepless nights for Gen Ys) and showed just how well the team had developed their Robotic Vision solution. Nanyang and NimbRo were the runners up: they performed well, but had just a few too many drops and errors to win. The potential of their solutions has been noted by all, however.


It was a privilege to meet and talk with many of the teams, the commentary on which I could write far more than most are willing to read. Some key observations:

  • Most of the teams bought standard industrial robot arms and relied on a combination of software and optimized cameras (which is what ACRV did last year) to give them an advantage. Workable indeed…but not optimal for the contest. It looks likely that next year there will be a lot more customization of solutions, assuming the conditions are similar to this year’s. All the teams I spoke to were keen to re-examine the mechanical movement side of the equation, given CARTMAN’s success.
  • Virtually everyone relied on a combination of vacuum suction and grippers to grab the objects. This is an area in which, all the teams told me, much work remains: picking up a stuffed plush toy, a soft packet of biscuits, a plastic bottle of mineral water, a crystal champagne flute, a packet of batteries or a softcover book vs. a hardcover demands very different techniques and tools to ensure a reliable grip that neither damages the goods nor creates a risk of damage by dropping them – something that befell just about everyone at some point. The human mind, eye and hand, and the way they work together when we grasp an object, are indeed a marvel that the brightest minds in the field still struggle to emulate.
  • This competition was MUCH closer than the final scores indicated: in the end, solid teamwork and a systems approach got ACRV over the line. In reality, the scores did not reflect how close many teams were, nor how innovative some of the approaches to AI and robotic vision were. Again, we come to the fact that this was an exercise in systems management: numerous separate concepts, ideas and technologies interacting to deliver outcomes in a fairly dynamic environment. However, it was pleasing to see that Amazon and other executives were looking at the individual technical strengths of all the teams.
  • In terms of robot articulation speed, we have great capability now, even if it was not on display at the ARC. One team included individuals who had worked on a Staubli robot whose demonstrations impressed: they clearly show the speed of articulation possible, one that few humans could approach, and certainly none for any sustained period with any accuracy. But that alone is not enough. The Staubli robot has no real intelligence, and its gripping solution only works on a very limited range of products.
  • The experts said that we should not hold out hope of a total “Eureka moment”: some miraculous, singular, transcendental breakthrough that somehow brings everything together. What now appears to be the winning combination is a fusion of technologies…and the hideously complex task of making them work together. There need to be developments across a chain of fields, which will happen over time, but there will need to be a lot of tireless work to turn them into a real solution.
  • A key focus needs to be sensor fusion (including stereoscopic [3D] vision, lasers for ranging, ultrasonics for shape confirmation and touch sensors) via Artificial Neural Networks, using software that can fully leverage ANNs to give the machine general intelligence and the ability to learn quickly – to infer from existing observations. General intelligence and automated reasoning are the ultimate capabilities being sought, though most of the teams stated that we could be further away from these goals than many think. I spoke to people at length about these technologies, and the general message was the same: we need them to make this work, but they are proving far harder than we ever imagined.
  • There was comparative talk about the progress being made with driverless cars and on-road automation. It was surprising to hear many people who work in Robotic Vision say that the challenges of getting a car to navigate a road safely are relatively easy compared to getting an arm to pick up items from a tray.
  • The work of Google in the field was being discussed, as was that of IBM, Microsoft and other firms. But Google’s work was noted especially as being of a high order and of great interest to everyone in the competition.
  • This was especially intriguing: Amazon is exploring an entirely new area of knowledge focused on how humans and robots can optimally work together. It includes such aspects as task breakdown according to lean principles, safety standards and protocols, human psychology, learning and performance measurement. Whilst courses in this field are available from a number of universities, Amazon feels its real world experience makes it an early expert whose knowledge is at the cutting edge.
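The sensor-fusion idea mentioned in the observations above can be sketched in miniature. Everything here is illustrative – the sensor names, feature counts and untrained weights are my own assumptions for the sake of the sketch, not any team’s actual pipeline:

```python
# Toy sensor-fusion sketch: readings from several modalities are concatenated
# into one feature vector and passed through a single sigmoid unit, standing
# in for a trained fusion network that would score a candidate grasp.
import math
import random

random.seed(0)  # deterministic illustration

def fuse(camera: list, ultrasonic: list, touch: list) -> list:
    """Concatenate per-sensor features into one fused feature vector."""
    return camera + ultrasonic + touch

def grasp_confidence(features: list, weights: list, bias: float) -> float:
    """Weighted sum plus bias, squashed by a sigmoid into a 0..1 score."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical readings: 3 camera features, 2 ultrasonic echoes, 1 touch value.
features = fuse(camera=[0.8, 0.1, 0.4], ultrasonic=[0.3, 0.9], touch=[0.0])
weights = [random.uniform(-1, 1) for _ in features]  # untrained, illustrative
score = grasp_confidence(features, weights, bias=0.0)
print(f"grasp confidence: {score:.2f}")
```

A real system would learn the weights from labelled pick attempts; the point of the sketch is only the shape of the problem – many noisy modalities reduced to one decision.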


Discussions with individuals familiar with the situation at Amazon FCs indicated that the ARC developments are partly designed to inform the Amazon teams designing future generations of FCs (in this case, the 9th generation) of the key features that will need to be incorporated into the layout, so they can take advantage of the latest developments in robotics and the associated technologies relevant to logistics.

Around 2014, it was expected that by 2018, when the 9th generation FC would be introduced, there would be significant developments in robotics permitting closer interactions between humans and “fairly” autonomous robots. At present, this goal does not appear close to being met, at least not to the level initially expected. But the design of the FC proceeds. Thus, the 9th generation FC may represent less of a leap than was originally expected. There is apparently some debate within Amazon as to whether it is worthwhile to delay the introduction of a real 9th generation FC until robotic technology allows for a breakthrough that provides a clear advantage over the current generation. The outcome of this debate is not certain at this time.

Whatever the case may be, warehouse workers should breathe easier. They are not an endangered species.

Not even close, and for some time yet. Whilst many may dismiss Amazon’s claims that robots and humans work better together as sugar coating – placating people over the fact their jobs will eventually be done by robots – the reality is that Amazon has gone on a hiring spree for warehouse workers as full time employees, not temporary contractors. Eventually, this may cease…but the key word is “eventually”. And that could mean another decade away. Despite the incredible collection of intellect I have seen in robotics, we have yet to come close to replicating the skills needed in the commercially viable, real world environment of picking and stowing. Whilst an official answer will not be provided, there is evidence that Amazon is moving towards a hybrid future of humans and robots working alongside each other in the FCs for much longer than originally thought necessary.

In the end, what happened at the ARC this year is a step towards a destination that is as inevitable as its timeline is unpredictable. Of course, it will not happen all at once, with a “Day 0” where staff find out they have all been replaced by robots. Rather, robots will over time undertake more and more roles formerly done by people. As to what happens to those people, that is a question that cannot be answered for some time to come.


Whilst I cannot readily see Amazon even starting to fully automate its warehouses for at least another seven years – and likely much longer – the process has commenced. It will, however, be a slow take-up…and in the meantime, there will be many more opportunities for humans to benefit by working with robots. The ARC demonstrated the progress made – impressive in many ways – the limitations that still exist, the challenge that lies ahead…and the monumentally costly amount of research and development that will likely be needed. The race to robotic automation is a marathon, not a sprint, with a timeline that will take longer – and involve far more cost and disappointment – than many perhaps envisaged a few years ago.

But what was clear is this: humans are still incredibly capable in many tasks where one would expect robotics to reign supreme. The human link between the eye, the brain and the hand offers a level of system integration that our current best technologies are still struggling to approach, despite the large number of brilliant people working on the problem. The need for true general intelligence, automated reasoning, deep learning and a myriad of other breakthroughs that will likely depend on Artificial Neural Networks was made clear watching these robots try to perform tasks that human children easily master. It will require time and huge financial resources – something that Amazon Robotics is already proving it is willing to commit… as are many others.

But for now, Amazon Fulfillment Centers function at the limit of our best application of the technology, and the ARC provides a significant boost to help push the technology along just a bit faster – 2018 will be fascinating to watch.