Something interesting always starts with something very stupid
Here's a list of features I originally planned for the robot:
- Cute face with emotions and reactions, like a Tamagotchi or M5Stack
- Tells jokes
- Finds the user's face with a camera or IR sensor
- Detects emotions
- Keeps the conversation going
- Recognizes speech (records and transcribes conversations)
- Voice generation
- Simple movements like waving or driving on a flat surface
- Tracks user’s tasks, shows reminders
But then I estimated that, given the current state of machine learning, it would not be possible to fit all these features (read: AI models) on a low-power, low-cost (<$100) board. That's why I started revising my idea.
Then I left only one feature on the table: cute. Which means that I need to work on face animations. In my notes, I had a plan that looked very simple:
- Design 3D body
- Manufacture PCB for motor/power/sensor board
- Order components: servo, DC, brushless motors, sensors, battery, logic board
- Face animations on LCD
A simple and short plan doesn't mean it's fast or complete; it can miss some important, time-consuming aspects. I have a hunch that my plan is already stuck on the first step, because "designing a 3D model" without previous experience is not a task for one evening.
Designing capabilities
The realization that my original list of ideas had undergone scope creep made me sad, and I asked myself: what do people find interesting about robots?
If the robot is just a toy, then probably:
- emotional connection
- interactivity
- technological fascination
- novelty
People love to explore the capabilities of robots. I guess many successful products try to provide as many different capabilities and combinations as possible. But is there a way to make designs that offer an unlimited number of capability combinations? This is what we should keep in mind:
- modular design
- programmability (like Arduino or Raspberry Pi)
- open to new functionality or hackable (like robot dogs or the radio-transmitter gadget Flipper Zero)
- integrations
- interconnectivity (unlock functionality with more than one robot)
- adaptable to the environment (with the help of machine learning)
Encourage customizing robots with accessories or tools. It's also feasible to make a series of models that will stimulate people to collect the complete set. For example, I will do different robot designs: one will look like a Viking, another like a cat. But what collective activities will this imply?
The Viking set could be continued with different historical eras: medieval knights, ancient Egyptians, and samurai. Other collections could be mythical creatures, nature-inspired designs, and steampunk.
One weird concept was an eco-friendly assistant: tracking challenges such as reducing water usage or trying meatless meals, helping with waste management by giving composting and recycling tips, and providing gardening reminders for planting, watering, and maintaining a sustainable garden. Although the gardening part is not novel, I will start by making a smart plant pot (like this one). The rest may come later.
Also, the following reasoning is for the future, when I'll be ready for outdoor off-road traveling.
Precision farming
Did you know that diverse, well-integrated farms support significantly more native biodiversity than big fields of monocultures? As plant diversity and habitat decline, animal species diminish too.
Illustration from the article "Bringing diversity back to agriculture: Smaller fields and non-crop elements enhance biodiversity in intensively managed arable farmlands" by Martin Šálek
Precision farming involves targeted, data-driven agricultural practices. The robot can use its sensors and data processing to apply water or fertilizer precisely where and when needed, and to selectively remove weeds, minimizing waste and environmental impact. And it can easily alternate between different species that grow mixed together in more natural habitats.
Let's consider an agricultural robot focused on the following tasks: sowing, watering the fields, locating and removing weeds, and harvesting. We start with the basics: how does it know when to do all these tasks?
There's an interesting technique: transfer learning. We train the robot on potatoes, and it will figure out how to handle similar species like tomatoes and strawberries. With trial and error, of course, but that's where the following techniques will help:
- scheduling based on agricultural best practices for the crop type, growth stage, weather conditions, and regional guidelines
- sensor-based monitoring of soil moisture, temperature, humidity, and light (see the sketch after this list)
- computer vision to identify growth stage and health by analyzing leaf size, color, and fruit development
- machine learning to find patterns in historical data
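Here is a minimal sketch of how the first two items could combine into a watering decision. Everything in it is a hypothetical assumption: the `SensorReading` fields, the `should_water` helper, and the moisture thresholds are illustrative placeholders, not agronomic recommendations.

```python
from dataclasses import dataclass

@dataclass
class SensorReading:
    soil_moisture: float   # volumetric water content, 0.0-1.0
    temperature_c: float
    light_lux: float

def should_water(reading: SensorReading, crop: str) -> bool:
    # Per-crop moisture thresholds would come from agricultural guidelines;
    # these numbers are placeholders.
    thresholds = {"potato": 0.30, "tomato": 0.35, "strawberry": 0.40}
    threshold = thresholds.get(crop, 0.30)
    # Skip watering in midday heat to reduce evaporation losses.
    if reading.temperature_c > 32 and reading.light_lux > 50_000:
        return False
    return reading.soil_moisture < threshold

print(should_water(SensorReading(0.25, 24.0, 20_000), "potato"))  # True
```

Rules like these are only the baseline; the learned components below are what let the robot go beyond fixed thresholds.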
How this transfer learning should be implemented is an open question. The knowledge every agronomist must have is semantic knowledge, and we need to find a way to use it in neural networks. Take a transformer architecture, pretrain it on broad knowledge, and fine-tune it on an agronomist's expertise. Technically, LLMs are neural networks, so this converts semantic data into neural features.
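Here is a minimal sketch of that pretrain-then-fine-tune pattern, assuming an image-based crop classifier built on a pretrained vision transformer from torchvision; the four crop classes and the random tensors standing in for field images are assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a transformer pretrained on broad data (ImageNet), freeze it,
# and fine-tune only a new classification head on the narrow agricultural task.
model = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False  # keep the broad knowledge frozen

# Replace the head; the 4 classes (e.g. potato, tomato, strawberry, weed)
# are hypothetical.
model.heads.head = nn.Linear(model.heads.head.in_features, 4)

optimizer = torch.optim.Adam(model.heads.head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One training step on a small labeled batch.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 4, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

The same pattern would apply to an LLM fine-tuned on agronomy texts; only the backbone and the data change.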
Another question that bothers me is which training method to choose for Transformers. This, of course, relates to other neural networks as well. Gradient descent is a very slow process. Spiking networks are more efficient and less resource-intensive. But how do you apply them to Transformers?
And while we theorize about training techniques, let's focus on methods that do not require thousands of training samples. So how about one-shot learning? Is there a method that doesn't require data augmentation, where the training process ends after exactly one sample has been presented to the model?
Metric learning techniques attempt to create embeddings that bring similar examples closer together in the embedding space while pushing dissimilar examples further apart. One example is the triplet loss, which needs an anchor sample, a positive sample, and a negative sample.
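Here is a minimal triplet loss sketch in PyTorch; the tiny embedding network and the random tensors standing in for plant images are assumptions for illustration.

```python
import torch
import torch.nn as nn

# A small placeholder embedding network.
embed = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 256),
    nn.ReLU(),
    nn.Linear(256, 64),
)

triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Hypothetical batch: anchor and positive come from the same plant species,
# negative from a different one.
anchor   = embed(torch.randn(8, 3, 64, 64))
positive = embed(torch.randn(8, 3, 64, 64))
negative = embed(torch.randn(8, 3, 64, 64))

# The loss pulls anchor and positive together and pushes the negative away.
loss = triplet_loss(anchor, positive, negative)
loss.backward()
```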
Memory-Augmented Neural Networks (MANNs) use memory to store information from previously seen examples and then access this stored knowledge to make predictions on new, unseen examples, even with limited data. One popular memory-augmented architecture is the Neural Turing Machine, introduced by Alex Graves et al. in 2014.
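To give a flavor of how such memory access works, here is a minimal sketch of NTM-style content-based addressing. It covers only the read path (the full paper also describes write and shift mechanisms), and all shapes and values are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

memory = torch.randn(128, 20)  # 128 slots holding features of past examples
key = torch.randn(20)          # query the controller emits for a new example
beta = 5.0                     # key strength; sharpens the attention

# Cosine similarity between the key and every memory slot,
# turned into soft addressing weights over the slots.
sim = F.cosine_similarity(memory, key.unsqueeze(0), dim=1)
weights = F.softmax(beta * sim, dim=0)

read_vector = weights @ memory  # blended recollection of similar past examples
print(read_vector.shape)        # torch.Size([20])
```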
No matter what your killer feature will be, don't forget about these steps:
- Find a gap in the market
- Focus on immersive and engaging user experience
- Make it simple, simple enough even for casual users
- Plan for content updates, replayability
- Add challenges, achievements, and a leaderboard (not always applicable)
- Build with quality materials (sometimes just this brings joy)
- Gather early feedback
- Do not forget about a solid marketing strategy; the goal must be clear
Robotics is a colander: gaps are everywhere. I want to make some type of assistant or companion that is emotionally connected to the user. I plan to spend as little as possible on movement and navigation in the first version. Since it will not have speech recognition, natural language processing, or text generation, the communication interface should be simple yet intuitive. I'm thinking about a few buttons or one touch area. But I will need to discuss UX in future posts and decide whether I want to implement inputs such as pat, knock, tap, and touch. Will it be clear what they all mean? How do you clearly distinguish between them without a handbook?