Right now, most global mapping and positioning technology has a few weaknesses. For starters, most systems are two-dimensional. And even the most advanced system usually can’t pinpoint your location to anything more precise than a few meters, and that’s only when you’re connected.
In 2014, Fantasmo set out to reinvent how we approach global positioning. By using camera positioning technology, they were able to build real-time semantic 3-D maps of cities using images from any camera. And now, thanks to an innovative new project using machine learning, they can do it without any kind of connection.
“There are all these verticals with the same need for very precise 3-D maps and accurate positioning technology,” says Ryan Measel, Fantasmo CEO and co-founder. “Micromobility, same day logistics, accessibility, autonomy, augmented reality. We see computer vision as this step to enable this new functionality for all these platforms and enable these verticals to enter into our society.”
When a scooter company approached the Fantasmo team about using this new technology to solve a compliance issue, Fantasmo jumped on board. But there was one caveat: they needed to find a way to put their complex algorithm onto a chip that wouldn’t require connectivity to operate.
The scooter revolution arrived in cities all over the world. These dockless, electric, and wildly convenient scooters were all controlled from an app, so users could hop on and hop off wherever they wanted. Unfortunately, the result was also scooters littering sidewalks and creating hazards for pedestrians.
It wasn’t long before cities cracked down. “Cities are creating regulations to enforce that scooters can only park in certain areas and stay off sidewalks,” says Ryan. “So micromobility companies – scooters, ebikes, mopeds – needed a way to enforce compliance. The current technology couldn’t do it, so they came to us.”
Fantasmo already had the technology to collect georeferenced imagery from cities and build that into semantic 3-D maps. They also had a positioning solution that could use images from any camera – say, a mobile device – to give users a hyper accurate position and orientation for that image. “So we could say this photo is at this latitude and that longitude and, hey, that corresponds to a sidewalk,” says Ryan.
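To make the last step concrete, here is a minimal sketch of how a position could be checked against a mapped region. It is not Fantasmo's implementation: the `SIDEWALK` polygon, its coordinates, and the `classify` helper are hypothetical, and the example treats longitude and latitude as flat x and y values, which is only a reasonable approximation over very short distances.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a ray going right crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical sidewalk outline; a real map would hold many such regions.
SIDEWALK: List[Point] = [
    (-118.4000, 34.0000),
    (-118.3990, 34.0000),
    (-118.3990, 34.0001),
    (-118.4000, 34.0001),
]


def classify(point: Point) -> str:
    """Label a position as 'sidewalk' or 'other' using the outline above."""
    return "sidewalk" if point_in_polygon(point, SIDEWALK) else "other"


print(classify((-118.3995, 34.00005)))
print(classify((-118.3980, 34.00005)))
```

The first position falls inside the hypothetical outline and the second falls outside it. A production system would also need to account for positioning error near region boundaries.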
Fantasmo also had a cloud API service to provide this functionality to customers, but they wanted to take their offering one step further.
“What was really interesting to our customers was the ability for this entire process to run directly on their hardware platforms,” says Ryan. “So they don’t have to rely on a data connection or bandwidth limitation to be sending this data back and forth over a cellular link.”
Fantasmo saw the opportunity to take the algorithm they’d created and optimize it to run in an embedded environment that could be deployed in the field. And this kind of technology would be applicable across everything from sidewalk robots to augmented reality. But first they had to find the right engineer for the job.
“Beyond any kind of optimization – at first, it was really about, can we get this model to run on our target architecture at all?” says Ryan. “We knew we needed a specific skill set to make that happen. And hiring for it was going to take time.”
So the Fantasmo team explored other options. “Tribe was able to look inside their network and find someone with a highly relevant skill set and field expertise to attack this problem,” says Ryan.
Shalom, the Tribe machine learning engineer and researcher on the project, already had experience deploying a machine learning model on a small device in a setting where the user experience had to be seamless. As one of the first hires at Inokyo, a Y Combinator-backed Amazon Go competitor, Shalom had worked on using cameras to track shoppers throughout a store.
“Each camera had to know who they were looking at, correlating all the different angles across the entire time the shopper was in the store,” says Shalom. “We had to be able to draw the path of the person and create a seamless experience. The experience is simple, but the technology is hair raising. If a single camera gets misadjusted by a tenth of an inch, we had to make changes to the numbers of how we calibrate every other camera.”
For Shalom and the Fantasmo team, the first challenge was to get the machine learning algorithm to run on the chip.
“The team wanted to use a Luxonis chip, which would reduce costs by around 75%, but came with certain challenges. The chip itself is very low power and has limited capabilities,” says Shalom. “It’s less capable than an iPhone’s processor. But the model is huge. So we had to look for tricks and methods to reduce the load. I had to try different approaches to get it to work.”
The other challenge was the model was based on a machine learning paper from 2019, but Fantasmo was the first team to actually put it into practice. “There was a small bit of code, but really we had to innovate completely to fit the model on the chip,” says Shalom. “There were no ready-to-use solutions in the literature or on the internet.”
“The model wasn't easily trainable,” says Shalom. “So I built something to help the Fantasmo team train the model. There’s nuances in the way you train it that makes it very fragile, which is unusual. It needed an intuitive understanding of computer vision models and the model’s unique loss function to make it work.”
From there, Shalom built a version of the model with a different architecture that drastically reduced the compute and memory needed to run it. “It was really about unblocking their team and getting them to a point where they could take it from there,” says Shalom.
“We really felt Shalom was an extension of the team,” says Ryan. “Tribe’s ability to quickly identify the needs and personnel that can fulfill those needs is hugely impactful. Because, when you look at other contracting firms or options for bringing on engineering support, oftentimes you get a long start up phase where you’re not sure they’re the right fit. Or a really long onboarding process before they’re doing meaningful work.”
“I think the benefit with Tribe is they’re so experienced in the field, they can come in and very quickly assess what needs to be done and start making progress,” says Ryan. “I think that’s something really special.”
The result is a state-of-the-art product that’s new for the industry and positioned to change the way multiple industries approach global positioning, opening up opportunities across verticals.
“We set out to accomplish what we were looking for and even more than that,” says Ryan.
Shalom and the team were able to shrink the machine learning model from 15MB to 0.6MB, reducing the memory footprint by 96%. That, combined with making the model OpenVINO-compatible, reduced compute costs by 72% and made the model 3.6 times faster than the original.
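The figures above can be checked with a few lines of arithmetic. This is a sketch, not part of the project: the function names are made up for illustration, and treating the 72% compute saving as the time saved by a 3.6-fold speedup is an assumption about how the two numbers relate, not something the team has stated.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which a quantity shrinks when going from before to after."""
    return (before - after) / before * 100.0


def time_saved_from_speedup(speedup: float) -> float:
    """Percentage of time saved per run when a task becomes `speedup` times faster."""
    return (1.0 - 1.0 / speedup) * 100.0


# Model size in MB, as reported in the text.
memory_saved = percent_reduction(15.0, 0.6)

# Assumption: the speedup and the compute saving describe the same improvement.
time_saved = time_saved_from_speedup(3.6)

print(f"memory footprint reduced by {memory_saved:.0f}%")
print(f"time per inference reduced by {time_saved:.0f}%")
```

Going from 15MB to 0.6MB works out to a 96% reduction, matching the reported figure. A 3.6-fold speedup means each run takes about 28% of the original time, a saving of roughly 72%, which is in line with the reported drop in compute costs.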
“Shalom’s expertise in the space was leveraged greatly to identify areas that we hadn’t even considered yet. So we were able to learn about the system and make progress beyond what we’d initially hoped.”
“There’s really been nothing like it,” says Ryan. “An infrastructure-less, connection-less global positioning solution in a box. We’re very excited to see how the market is going to receive it as we get ready to launch it this year.”