Tuesday 15 December Meeting — Knowledge Transfer Between Robotic Systems — Ndivhuwo Makondo

This week Ndivhuwo Makondo from the Tokyo Institute of Technology presented. This was our last meeting of the year.

Abstract: Learning of robot kinematic and dynamic models from data has attracted much interest recently as an alternative to manually defined models. However, the amount of data required to learn these models becomes large when the number of degrees of freedom increases and collecting it can be a time-intensive process. We employ transfer learning techniques in order to speed up learning of robot models, by using additional data obtained from other robots. We propose a method for approximating non-linear mappings between manifolds, which we call Local Procrustes Analysis (LPA), by adopting and extending the linear Procrustes Analysis method. Experimental results indicate that the proposed method offers an accurate transfer of data and significantly improves learning of the forward kinematics model. Furthermore, it allows learning a global mapping between two robots that can be used to successfully transfer trajectories.
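For readers unfamiliar with the building block behind LPA, below is a rough Python sketch (using numpy) of classical linear Procrustes Analysis, which finds the scale, rotation and translation that best align two sets of corresponding points. The local, non-linear extension proposed in the talk is not shown, and the function below is my own illustration rather than the authors' implementation.

```python
import numpy as np

def procrustes_align(X, Y):
    """Classical linear Procrustes Analysis: find the scale s, rotation R and
    translation t that best map the source configuration X onto the target Y
    (both are m x d arrays of corresponding points), minimising ||Y - s*X@R - t||.

    LPA extends this idea by fitting many such local linear maps; this sketch
    only shows the single global (linear) alignment it builds on.
    """
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mu_x, Y - mu_y
    U, S, Vt = np.linalg.svd(Xc.T @ Yc)            # SVD of the cross-covariance
    R = U @ Vt                                     # optimal rotation
    s = S.sum() / (Xc ** 2).sum()                  # optimal isotropic scale
    t = mu_y - s * mu_x @ R                        # translation after rotation and scale
    return s, R, t

# New source-robot points would then map into the target robot's space as
# Y_hat = s * X_new @ R + t.
```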


 

You can download his slides here.

Tuesday 17 November Meeting — The Mad Hatters — Jonathan Kariv

Jonathan Kariv presented during last week’s meeting.

Abstract: Hat games are a common type of recreational mathematics puzzle. They involve some set of players, each of whom has a coloured hat (or sometimes hats) placed on his head. This hat is visible to certain other players. Each player is then required to guess the colour of the hat on his own head. These puzzles vary widely in terms of their rules. In 2011 Tania Khovanova and Lionel Levine introduced some research-level versions of these problems. In this talk I will discuss both the standard puzzles and the new ones. In particular, I will give some upper and lower bounds on the best-case strategies in the new cases. I will also give some insight into how the ideas in the traditional puzzles translate into the new ones. Finally, I will give some ideas for future work. Joint work with Clint van Alten and Dmytro Yeroshikin.
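As a taste of the "standard puzzles" mentioned above, here is a small Python simulation of one classic variant: players stand in a line, guess from back to front, and use an agreed parity strategy so that everyone except possibly the back player is correct. This only illustrates the traditional setting, not the research-level variants from the talk.

```python
import random

def simulate_hat_line(n, rng=random):
    """One round of the classic sequential hat puzzle.

    n players stand in a line; player n-1 is at the back and sees everyone in
    front, player 0 sees nobody. Guesses are made from back to front and heard
    by all. With the agreed parity strategy, players 0..n-2 are always correct
    and the back player is right half the time.
    """
    hats = [rng.randint(0, 1) for _ in range(n)]          # 0/1 encode the two colours
    announced = sum(hats[:n - 1]) % 2                     # back player announces the parity seen
    guesses = {n - 1: announced}
    for i in range(n - 2, -1, -1):
        seen = sum(hats[:i])                              # hats in front of player i
        behind = sum(guesses[j] for j in range(i + 1, n - 1))  # correct guesses already heard
        guesses[i] = (announced - seen - behind) % 2      # deduce own hat from the parity
    return sum(guesses[i] == hats[i] for i in range(n))   # number of correct guesses

# Sanity check: at least n-1 players are always right.
assert all(simulate_hat_line(10) >= 9 for _ in range(100))
```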


 

You can download the slides here.

We continue next week, on the 1st of December, with our penultimate group meeting of the year.

Tuesday 3 November Meeting — Accelerating Decision Making Under Partial Observability Using Learned Action Priors — Ntokozo Jay Mabena

Hi guys,

We resumed our meetings after a six-week break with a talk by Ntokozo Jay Mabena.

Abstract: Operational uncertainty largely affects the performance of a reliable robot in real and complex environments. Identifying operational uncertainty as a significant problem, we consider motion planning in uncertain and dynamic environments as an essential capability for autonomous robots. Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical framework allowing a robot to reason about the consequences of actions and observations with respect to the agent’s perception of the environment. They address the problem of robot motion planning and allow an agent to plan and act optimally in uncertain and dynamic environments. Although they have been successfully applied to various robotic tasks, they are often disregarded in the field of robotics due to their high computational complexity. The intractability of these algorithms is to a large degree a result of the computation of an exact, optimal policy over the entire belief space. Often, computing an optimal policy over the entire belief space is not necessary for satisfactory robot control, and finding a good approximation of the optimal value function for only a subspace of the belief space can be less computationally expensive than computing the full value function. The Successive Approximations of the Reachable Space under Optimal Policies (SARSOP) algorithm, a point-based POMDP algorithm, takes advantage of this fact by sampling only the optimally reachable belief space from an initial belief point. We combine this algorithm with action priors that express the usefulness of each action in each state, because by considering the statistics of action choices over an agent’s operational life cycle, we obtain restrictions on the search that may not have been obvious if each task were solved in isolation. Our goal is to show how a mobile autonomous robot can benefit from the use of action priors in a setting where it is uncertain about the current state of the world. We aim to demonstrate the advantages of incorporating this prior knowledge over actions into the SARSOP algorithm by showing that action priors can lead to performance improvements. To this end, experiments will be conducted in a maze domain in which an agent will be required to solve various navigation tasks. We will compare the results of solving these tasks with and without action priors to illustrate the performance differences between the two approaches.
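For readers who haven't met POMDPs before, the "belief" referred to above is a probability distribution over states that the agent updates after every action and observation. Below is a minimal numpy sketch of that update for a discrete POMDP, together with one simple, hypothetical way an action prior could prune the actions considered at a belief point. This is an illustration only, not the SARSOP-with-priors algorithm from the talk.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Bayesian belief update for a discrete POMDP.

    b : current belief over states, shape (S,)
    a : index of the action just taken
    o : index of the observation just received
    T : transition model, T[a, s, s_next] = P(s_next | s, a), shape (A, S, S)
    Z : observation model, Z[a, s_next, o] = P(o | s_next, a), shape (A, S, O)
    """
    b_pred = b @ T[a]                 # predict: sum_s P(s_next | s, a) b(s)
    b_new = Z[a][:, o] * b_pred       # correct with the observation likelihood
    return b_new / b_new.sum()        # normalise

def prune_actions(action_prior, threshold=0.05):
    """One simple (hypothetical) use of an action prior: only consider actions
    whose prior usefulness at the current state/belief exceeds a threshold."""
    return [a for a, p in enumerate(action_prior) if p >= threshold]
```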


Here are the slides from Ntokozo’s presentation.

Jonathan Kariv will be presenting at our next meeting, on the 17th of November.

Tuesday 22 September Meeting — Adaptive Knowledge Injection for Monte Carlo Tree Search for Imperfect Information Games — Jeremy Lai Hong

Hey all,

This week we had a presentation by Jeremy Lai Hong. Here’s the abstract of his talk:
While agents for large perfect information games such as Chess and Checkers have approached expert-level play, agents for large imperfect information games such as full-scale poker are yet to exceed amateur level. This is largely due to the uncertainty present in imperfect information games. In some games, such as Skat, Bridge and Magic: The Gathering (MTG), the uncertainty naturally reduces as the game progresses. This property is known as disambiguation. The proposed research aims to use this disambiguation, along with prior knowledge, to create strong agents for imperfect information games. We identify MTG as an interesting testbed for research, as it has several properties of interest: a large amount of uncertainty, a high branching factor and disambiguation. This talk details the proposed research and discusses related work that might provide useful insights into completing it. Results from the preliminary phases of the research have been obtained, showing strong correlations between particular groups of cards in MTG.
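To make "knowledge injection" a little more concrete, here is a generic Python sketch of how prior knowledge about moves can be folded into the selection rule of Monte Carlo Tree Search, via a bias term that fades as a move accumulates visits. The exact scheme and any MTG-specific priors from the talk are not reproduced here; this is just one common pattern.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float = 0.0          # heuristic usefulness of the move leading to this node
    visits: int = 0
    total_value: float = 0.0
    children: list = field(default_factory=list)

def select_child(node, c_uct=1.4, c_prior=1.0):
    """UCT selection with an additive prior-knowledge bonus that decays with visits."""
    log_n = math.log(max(1, node.visits))

    def score(child):
        if child.visits == 0:
            return float("inf")                              # try unvisited moves first
        exploit = child.total_value / child.visits           # average simulation reward
        explore = c_uct * math.sqrt(log_n / child.visits)    # standard UCT exploration term
        bias = c_prior * child.prior / (1 + child.visits)    # injected knowledge, fading out
        return exploit + explore + bias

    return max(node.children, key=score)
```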


Here are the slides from Jeremy’s presentation.

The next iteration of this meeting will only take place in 6 weeks’ time, on the 3rd of November. Ntokozo Mabena will be presenting.

Tuesday 8 September Meeting — Knowledge Transfer in Reinforcement Learning — Benjamin Rosman

Hi guys,

This week’s presentation was given by Dr. Benjamin Rosman. Here’s the abstract:

Any long-lived autonomous agent faced with a changing environment can be made more effective via learning. However, learning how to act is a slow process, which risks exposing the agent to harm. Fortunately, a long-lived agent can ameliorate this problem by abstracting and reusing knowledge gained from prior learning experiences.

In this talk, I will first present a high-level overview of reinforcement learning, and introduce some of the important concepts and topics therein.

I will then discuss some of the work I’ve done on knowledge transfer. This will largely take the form of two questions. Firstly, given a set of previously learnt behaviours, what is the optimal way to select the best one to be re-used in a new environment or interaction? Secondly, how can an agent generalise from previous behaviours to solve new tasks in the same environment more quickly and with less risk? These approaches are presented in the context of reinforcement learning, but I will also discuss some preliminary results in extending them to other decision-making paradigms.
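As a very rough illustration of the first question, one can treat "which previously learnt behaviour should I reuse?" as a bandit problem over the library of policies, trading off trying under-tested policies against exploiting the ones that have worked so far. The sketch below, with made-up names such as run_episode, is only meant to make the selection problem concrete; it is not the method presented in the talk.

```python
import math

def select_policy_to_reuse(policies, run_episode, n_trials=100, c=1.0):
    """Pick a previously learnt policy for a new task via UCB1 over observed returns.

    policies    : list of candidate behaviours learnt on earlier tasks
    run_episode : callable(policy) -> return obtained by running it on the new task
    """
    counts = [0] * len(policies)
    means = [0.0] * len(policies)
    for t in range(1, n_trials + 1):
        ucb = [m + c * math.sqrt(math.log(t) / n) if n > 0 else float("inf")
               for m, n in zip(means, counts)]
        i = ucb.index(max(ucb))                   # most promising policy so far
        r = run_episode(policies[i])
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]    # incremental mean update
    return max(range(len(policies)), key=lambda i: means[i])
```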


Here are the slides.

Jeremy Lai Hong will be presenting next, on the 22nd of September.

Tuesday 25 August Meeting — Regularized Feature Selection in Reinforcement Learning — Dean Wookey

Hi all,

This week’s presentation was given by Dean Wookey. Here’s his abstract:

We introduce feature regularization during feature selection for value function approximation. Feature regularization introduces a prior into the selection process, improving function approximation accuracy and reducing overfitting. We show that the smoothness prior is effective in the incremental feature selection setting and present closed-form smoothness regularizers for the Fourier and RBF bases. We present two methods for feature regularization which extend the temporal difference orthogonal matching pursuit (OMP-TD) algorithm and demonstrate the effectiveness of the smoothness prior: smooth Tikhonov OMP-TD and smoothness-scaled OMP-TD. We compare these methods against OMP-TD, regularized OMP-TD and least-squares TD with random projections, across six benchmark domains using two different types of basis functions.
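For context, the Fourier basis mentioned above represents a value function as a weighted sum of cosine features. The sketch below shows the 1-D basis and a simple Tikhonov-style fit in which higher-frequency coefficients are penalised more, as a crude stand-in for a smoothness prior. The talk's actual closed-form regularizers and the OMP-TD selection procedure are not reproduced here.

```python
import numpy as np

def fourier_basis(s, order):
    """Fourier basis features for a 1-D state s normalised to [0, 1]:
    phi_k(s) = cos(pi * k * s) for k = 0..order."""
    return np.cos(np.pi * np.arange(order + 1) * s)

def fit_value_smooth(states, targets, order=10, lam=1e-2):
    """Least-squares fit of value targets with a Tikhonov penalty that grows with
    frequency (a crude smoothness prior; illustrative only, not the paper's
    closed-form regularizers or the OMP-TD selection step)."""
    Phi = np.vstack([fourier_basis(s, order) for s in states])
    freq_penalty = np.diag(np.arange(order + 1) ** 2.0)   # penalise high-frequency terms more
    w = np.linalg.solve(Phi.T @ Phi + lam * freq_penalty, Phi.T @ np.asarray(targets))
    return w

# Tiny example: noisy value targets on a 1-D state space.
states = np.linspace(0.0, 1.0, 50)
targets = np.sin(2 * np.pi * states) + 0.1 * np.random.randn(50)
w = fit_value_smooth(states, targets)
```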


You can download his slides here.

Dr. Benjamin Rosman will be presenting next, on the 8th of September.

Tuesday 11 August Meeting — Brain Computer Interface Technology – Are we there yet? — Phumlani Khoza

Hi all,

This week’s presentation was given by Phumlani Khoza. Here’s his abstract:

Decades of lesion-study based research have provided considerable amounts of information about the basic structure and functioning of the brain; but a general theory is not yet in sight. Armed with new technologies, we are now delving deeper into the structure of the brain and peering into its processes without being forced to call in the surgeon. While the search for the inner continues (and Trinity awaits the virtual return of THE ONE), engineering the outer is rapidly leaving the realm of metaphysical speculation.

We present work aimed towards development of an asynchronous electroencephalography (EEG) brain computer interface (BCI) for text input, using a non-clinical-grade EEG sensor. One of the applications of BCI technology is providing a mechanism for communicating with the outer environment in a manner that bypasses the standard neuro-muscular connection.

Although most of us would find communicating with the environment using this kind of technology to be simply cool, for other members of society it promises to modify the boundary separating a socially secluded life from a socially integrated one, come biology or injury.


You can download his slides here.

Dean Wookey will be presenting next on Tuesday the 25th of August.

Tuesday 28 July Meeting — Object Recognition Tutorial — Beatrice van Eden

Hi guys!

This week’s presentation was given by Beatrice van Eden. Here’s the abstract:

This talk will be presented as a tutorial on 2D object recognition, as a first step of my PhD work. After a brief introduction of my topic, the remainder of the tutorial will consist of three sections, each of which will cover a different approach to object recognition. This will include a discussion of the features that were used in training, followed by some implementation steps and results on the trained classifiers.

The three training methods are Cascading Classifiers, Convolutional Neural Networks and Support Vector Machines. The Cascading Classifiers used Haar-like features and Local Binary Patterns; these features capture texture and are more commonly used for face recognition. Convolutional Neural Networks were then used for their ability to learn features from the training data. The Support Vector Machine used Histogram of Oriented Gradients (HOG) features.

This work forms the first step in my research into concept formation, which will allow a robot to learn about its environment autonomously. By exposure to a kitchen setup and an office setup, the robot will be expected to build a concept of these environments. The robot must ultimately recognise any such environment, even if it has not seen that specific instance before.
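For the curious, here is a minimal Python sketch of the HOG-plus-SVM pipeline mentioned in the abstract, using scikit-image and scikit-learn. The dataset handling (loading and labelling image patches) is assumed to happen elsewhere, and the details of Beatrice's actual setup may well differ.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(images):
    """Histogram of Oriented Gradients descriptors for equally sized grayscale images."""
    return np.array([hog(img,
                         orientations=9,
                         pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for img in images])

def train_hog_svm(train_images, train_labels):
    """Train a linear SVM on HOG descriptors (train_images and train_labels are
    assumed to be prepared elsewhere, e.g. cropped object and background patches)."""
    clf = LinearSVC(C=1.0)
    clf.fit(hog_features(train_images), train_labels)
    return clf

# Prediction on new images: clf.predict(hog_features(test_images))
```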


You can find Beatrice’s slides here. Phumlani Khoza will be presenting next, on Tuesday the 11th of August.

Tuesday 14 July Meeting — Music Genre Classification – Single Label Automatic Music Genre Classification — Ritesh Ajoodha

Hi, this week’s presentation was given by Ritesh Ajoodha.
Here’s the abstract:
In this work we use content-based features to perform automatic classification of music pieces into genres. We are primarily motivated by a desire to improve music information retrieval and recommendation services, allowing users to download music and browse databases more naturally and effectively. The major problem with genre is that it is a subjective concept: people may use information other than just the audio content itself to classify genre; some pieces may belong to multiple genres; and there are expected to be considerable mislabellings in the available online music databases that may be used to train a classifier. Content-based features, in this paper, are categorised into four groups: features extracted from the Fourier transform’s magnitude spectrum; features designed to inform on tempo; pitch-related features; and chordal features. The optimal representation of each feature is explored: the work compares classification performance when a feature is represented by the mean and standard deviation of its distribution, by a histogram with various bin sizes, or by mel-frequency cepstral coefficients. Finally, the work uses information gain ranking to produce a pruned feature vector used by six off-the-shelf classifiers. Logistic regression achieves the best performance, with 81% accuracy on the 10 GTZAN genres.
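To illustrate one of the feature representations compared above (the mean and standard deviation of mel-frequency cepstral coefficients), here is a small Python sketch using librosa and scikit-learn. The file paths and labels are assumed to come from a labelled collection such as GTZAN; the full feature set and the information-gain pruning from the work are not shown.

```python
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def mfcc_summary(path, n_mfcc=13):
    """Summarise one track by the mean and standard deviation of its MFCCs."""
    y, sr = librosa.load(path, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_genre_classifier(paths, genres):
    """Fit a multi-class logistic regression on per-track MFCC summaries."""
    X = np.array([mfcc_summary(p) for p in paths])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, genres)
    return clf
```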

You can find Ritesh’s slides here. The next meeting’s presentation will be given by Beatrice van Eden.

Tuesday 30 June Meeting — Demystifying deep learning: Learning dynamics in deep linear neural networks — Andrew M. Saxe

Hello!

This week’s presentation was given by Andrew M. Saxe.

Here’s the abstract:

Humans and other organisms show an incredibly sophisticated ability to learn about their environments during their lifetimes. This learning is thought to alter the strength of connections between neurons in the brain, but we still do not understand the principles linking synaptic changes at the neural level to behavioral changes at the psychological level. Part of the difficulty stems from depth: the brain has a deep, many-layered structure that substantially complicates the learning process.

To understand the specific impact of depth, I develop the theory of gradient descent learning in deep linear neural networks. Despite their linearity, the learning problem in these networks remains nonconvex and exhibits rich nonlinear learning dynamics. I give new exact solutions to the dynamics that quantitatively answer fundamental theoretical questions such as how learning speed changes with depth. These solutions revise the basic conceptual picture underlying deep learning systems, both engineered and biological, with ramifications for a variety of phenomena.

Finally I will talk about depth in the context of reinforcement learning. I shall argue that the usual MDP formulation underlying most RL systems is ill-suited to learning compositional hierarchies of actions, and suggest that the recently developed Linearly Solvable Markov Decision Process (LMDP) is a more promising alternative. I will show how the LMDP permits subtasks to be composed together optimally to perform novel tasks; how the goals of an agent or the causal structure of the world can be readily inferred from intentional behavior; and how skills may be learned efficiently. The LMDP framework breaks down the dichotomy between model-based and model-free reinforcement learning, offering a middle road with characteristics of each.
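As a toy illustration of the deep linear network setting discussed above, the numpy sketch below trains a two-layer linear network with batch gradient descent. Even though the network only ever computes a linear map, the loss is non-convex in the two weight matrices and the training curve exhibits the kind of nonlinear, stage-like dynamics the talk analyses. This is my own small example, not the setup from the paper.

```python
import numpy as np

def train_deep_linear(X, Y, hidden=32, lr=0.01, steps=2000, seed=0):
    """Batch gradient descent in a two-layer linear network y = W2 @ W1 @ x."""
    rng = np.random.default_rng(seed)
    d_in, d_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(scale=0.01, size=(hidden, d_in))     # small random initial weights
    W2 = rng.normal(scale=0.01, size=(d_out, hidden))
    losses = []
    for _ in range(steps):
        pred = X @ W1.T @ W2.T                           # forward pass, shape (n, d_out)
        err = pred - Y
        losses.append(0.5 * np.sum(err ** 2) / len(X))   # mean squared error per example
        gW2 = err.T @ (X @ W1.T) / len(X)                # gradient w.r.t. W2
        gW1 = W2.T @ err.T @ X / len(X)                  # gradient w.r.t. W1
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1, W2, losses

# Toy data: a random linear teacher mapping 10 inputs to 3 outputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
Y = X @ rng.normal(size=(10, 3))
_, _, losses = train_deep_linear(X, Y)   # plot `losses` to see the plateau-then-drop dynamics
```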


Here are the slides from Andrew’s presentation.

The next meeting will be held on Tuesday the 14th of July. Ritesh Ajoodha will be presenting on content based features for genre classification.