2004/12/28

Introduction to Yamashita Lab.


Address: Tokyo Institute of Technology, Ookayama Campus
South 6th building: Room no. 209 (Server room), Room no. 210 (Experimental room), Room no. 217 (Yamashita), Room no. 218 (Students).

Lab slogan: Creating knowledge and developing technology to find a path forward for human beings.
Idea behind our research: To solve the various problems in the world, there is no way other than to develop science and technology. Only science and technology can be a common culture for human beings; even an alien civilization would have almost the same science and technology, or an improved version of it. Furthermore, since the Pandora's box of science and technology has already been opened, we must behave rationally according to science and technology in order to survive in the future. Based on this idea, we are developing technologies to improve the welfare of human beings and studying ways to provide new scientific ways of thinking.
Research themes: Based on this idea, we are studying the following themes.
  • Pattern recognition
  • Image coding
  • Computer Architecture
  • Text Oriented Bi-Stream Explaining (TOBE) System
We now explain each theme in turn.

Pattern recognition and learning

Recently, the number of digits in Japanese postal codes was increased from 5 to 7. The reason is that the recognition rate for Chinese characters (kanji) is still low. Since a 7-digit postal code carries the information of the part of the address written in kanji, computers no longer have to read the kanji, which improves the automation of postal services. This example shows that the accuracy of current pattern recognition is not sufficient. In order to improve recognition accuracy, we work on the following themes:
  • Extension of the kernel method and its applications
  • Normal distribution on a manifold
  • Relative principal component analysis

Extension of kernel method and its applications

In kernel methods, patterns are mapped nonlinearly into a very high (or infinite) dimensional space and are recognized by applying a linear classification method to the mapped patterns. In this way, linear methods can be extended to nonlinear ones. Let φ be such a nonlinear mapping. It is difficult to use φ directly, since the result of the mapping is a very high dimensional vector. However, as long as the inner product of mapped patterns is given by a kernel function, k(x, y) = ⟨φ(x), φ(y)⟩, we can compute many kinds of discriminant functions in the very high dimensional space indirectly. The technique of converting calculations in the very high dimensional space into calculations with the kernel function in the low dimensional input space is called the 'kernel trick'.
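
As a small illustration in Python (a sketch added for this page, with the degree-2 polynomial kernel chosen only because its explicit feature map is easy to write down), the kernel value computed in the input space equals the inner product in the mapped space:

# Kernel trick, toy example: for the degree-2 polynomial kernel the
# explicit feature map phi is known, so we can check that
# <phi(x), phi(y)> equals k(x, y) = (x . y)^2 without ever forming
# the high-dimensional vectors in practice.
import numpy as np

def phi(x):
    # Explicit quadratic feature map for a 2-dimensional input:
    # phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2)
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2.0) * x1 * x2])

def k(x, y):
    # Kernel function: evaluated in the low-dimensional input space.
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

print(np.dot(phi(x), phi(y)))  # inner product in the mapped space
print(k(x, y))                 # same value, computed by the kernel
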

In the traditional kernel method, the kernel function is given by the inner product of the results of the same nonlinear mapping. In our laboratory, we extend it to the following form:

k(x, y) = ⟨φ(x), ψ(y)⟩,

that is, we assume that the kernel function is given as the inner product of the results of two different nonlinear mappings φ and ψ. We call this the 'extended kernel method' and study its applications to pattern recognition.
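
The concrete mappings below are chosen only for illustration and are not the ones used in our research; the Python sketch just shows that a kernel built from two different explicit feature maps gives a Gram matrix that, unlike the ordinary case, need not be symmetric:

# Toy illustration of a kernel built from two *different* explicit
# feature maps phi and psi: K[i, j] = <phi(x_i), psi(x_j)>.
# The resulting Gram matrix is in general asymmetric.
import numpy as np

def phi(x):
    return np.array([x[0], x[1], x[0] * x[1]])

def psi(x):
    return np.array([x[0] ** 2, x[1] ** 2, x[0] + x[1]])

def extended_gram(xs):
    return np.array([[np.dot(phi(a), psi(b)) for b in xs] for a in xs])

xs = [np.array([1.0, 2.0]), np.array([0.5, -1.0]), np.array([2.0, 0.0])]
K = extended_gram(xs)
print(K)
print(np.allclose(K, K.T))  # generally False: the Gram matrix is asymmetric
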

The following table shows the error rates of recognition experiments on standard benchmark data sets for an SVM with the ordinary kernel method and for an SVM with the extended kernel method.
Method          Error rate (Banana)   Error rate (Diabetes)   Error rate (Thyroid)
SVM             11.53                 23.53                   4.80
Extended SVM    10.44                 23.24                   4.05

We can see the advantage of our method. On the other hand, since the extended kernel method uses two nonlinear mappings, the number of parameters increases, and how to choose them still has to be investigated.

Normal distribution on a manifold

The normal distribution is widely used in statistical methods. For example, the distributions of measurement errors and of the velocity of an ideal gas are given by normal distributions. Furthermore, many distributions, such as the chi-square and F distributions, are derived from the normal distribution. By the way, what is a normal distribution? In many books, it is defined by the following probability density function:

p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)).

Below, we introduce several characterizations of the normal distribution.
  • Characterization by C. F. Gauss
    Assume that a random variable x is distributed around a true value μ and that its probability density function depends only on x − μ. Assume that samples x_1, …, x_n are given independently. In this case, we are happy if the maximum likelihood estimator of μ is given by their mean (x_1 + ⋯ + x_n)/n. The distribution that satisfies this requirement must be a normal distribution.
  • Maxwell distribution of an ideal gas
    Assume that the pair of random variables (x, y) and the pair (x′, y′) are related by a rotation of the coordinate system, so that
    x² + y² = x′² + y′².
    Furthermore, if x and y are independent and x′ and y′ are also independent, then the distribution of x and y must be an isotropic normal distribution.
  • Maxwell-Boltzmann distribution
    Assume that N distinguishable particles are distributed into cells, and that putting a particle into cell i requires energy ε_i. Even when the number n_i of particles put into each cell is fixed, many arrangements are possible, since the particles are distinguishable. The number of arrangements is given as
    W = N! / (n_1! n_2! ⋯).
    When we fix the total energy E = Σ n_i ε_i as well as the total number of particles N = Σ n_i and maximize the number of arrangements, the number of particles in cell i becomes proportional to exp(−ε_i / kT). Since the kinetic energy is proportional to the square of the velocity, a normal distribution of the velocity is derived.
  • Maximum entropy
    The distribution that maximizes the entropy
    H = −∫ p(x) log p(x) dx
    under the condition that the variance ∫ (x − μ)² p(x) dx is fixed is a normal distribution. The proof is similar to that of the Maxwell-Boltzmann distribution.
  • Central limit theorem
    Assume that samples x_1, …, x_n are drawn independently from any distribution with mean μ and variance σ². Then the standardized sample mean
    √n (x̄ − μ) / σ
    converges to the normal distribution with variance one as the number of samples increases (see the small numerical illustration after this list).
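
As a quick numerical illustration of the last characterization (a sketch added here, not an experiment of our laboratory), the following Python code standardizes the means of uniform samples and checks that they approach the standard normal distribution:

# Central limit theorem, numerically: standardized means of samples from
# a uniform distribution (a very non-normal distribution) approach the
# standard normal distribution as the sample size grows.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)   # mean and std of Uniform(0, 1)

for n in (1, 2, 10, 100):
    means = rng.uniform(0.0, 1.0, size=(100000, n)).mean(axis=1)
    z = (means - mu) / (sigma / np.sqrt(n))   # standardized sample means
    # The fraction |z| <= 1 should approach about 0.683 for a standard normal.
    print(n, np.mean(np.abs(z) <= 1.0))
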
We are extending such normal distributions to distributions on a manifold. For the characterizations by Gauss and by the central limit theorem, we would have to define the mean, which is difficult on a manifold; for example, on a spherical surface the average defined with the distance along the surface does not give a suitable solution. For the Maxwell-Boltzmann and maximum entropy characterizations, we would have to define the variance of a distribution, which is also difficult on a manifold. Thus, we are extending Maxwell's characterization to a definition of a normal distribution on a manifold. However, the usual definition of independence,

p(x, y) = p(x) p(y),

requires global coordinates, so we have to change it to a definition in terms of local coordinates. We define geometrically local independence by the condition that the rate of change of the distribution function with respect to one coordinate does not change with respect to another coordinate. Furthermore, we express this condition in a form that is covariant under coordinate transformations, written with covariant derivatives and called the geometrically locally isotropic independence equation. A distribution that satisfies this equation in a Euclidean space is a normal distribution in the ordinary sense. We are investigating pattern recognition and the Mahalanobis metric using this equation. Although no advantage in the engineering sense has been obtained so far, we consider it a nice research theme because the equation is beautiful. (Don't you think so?)

Relative principal component analysis (RPCA)

Principal component analysis (PCA) is used to extract the principal components of a stochastic signal x. It can be obtained from the following criterion for a matrix A whose rank is restricted:

min_A E_x[ ‖x − A x‖² ]  subject to rank(A) ≤ r,

where E_x denotes the expectation with respect to x. Our laboratory is proposing relative principal component analysis (RPCA), an extension of PCA. RPCA extracts components that approximate one signal while suppressing the components of another signal; its criterion modifies the PCA criterion above so that the other signal is also taken into account.
PCA is used in pattern recognition to extract the features of a category. With RPCA, we can extract features that are included in one category but not in other categories. We have also made a kernelized version of the criterion and applied it to various problems. The following table shows the error rates of an experiment with the US Postal Service handwritten digit database.

Method          Error rate [%]
PCA             5.38
RPCA            4.91
kernel PCA      4.43
kernel RPCA     3.49

We can see that RPCA and kernel RPCA provide better performance than PCA and kernel PCA, respectively.
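
For reference, the following Python sketch illustrates ordinary PCA viewed as the rank-restricted reconstruction criterion given above (an illustration only; the relative and kernelized criteria are not reproduced here):

# Ordinary PCA as rank-restricted reconstruction: the best rank-r matrix
# A minimizing E||x - A x||^2 projects x onto the span of the top r
# eigenvectors of the covariance matrix.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))  # correlated data
X = X - X.mean(axis=0)                                   # center the signal

cov = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
r = 2
U = eigvecs[:, -r:]                           # top-r principal directions
A = U @ U.T                                   # rank-r projection matrix

reconstruction_error = np.mean(np.sum((X - X @ A.T) ** 2, axis=1))
print(r, reconstruction_error)                # small if r captures most variance
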


Image coding (Image compression)

Image coding, by which the data of an image can be compressed, is a very important technology for digital broadcasting and for handling images on computers. However, as shown in the following figures, block distortion and ringing around edges are observed in images restored by JPEG, the existing image coding standard. The left image is the original image.
The right image is the one restored from the data compressed by JPEG; its data size is reduced to 1/20 of the original. At this scale we may not notice the difference between the two images. The images below are magnified versions of the region around the parasol.
Here we can see the block distortion and the ringing around edges. The block distortion is due to the discrete cosine transform used by JPEG and MPEG, since the transform is applied to each block of the image independently. In order to solve this problem, we are developing image coding methods that use the subband transform. Since the blocks of a subband transform overlap with the neighboring blocks, the block distortion is decreased.
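
The following toy Python sketch (an illustration for this page, not our coder) shows where block distortion comes from in block-based transform coding: each 8x8 block is DCT-transformed and quantized independently, so quantization errors do not match across block boundaries:

# Sketch of JPEG-style block transform coding: each 8x8 block is
# DCT-transformed and coarsely quantized on its own, so errors are
# independent across block boundaries, which produces visible block
# distortion at low bit rates.
import numpy as np
from scipy.fft import dctn, idctn

def code_blockwise(image, block=8, step=20.0):
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for i in range(0, h, block):
        for j in range(0, w, block):
            b = image[i:i + block, j:j + block].astype(float)
            c = dctn(b, norm='ortho')            # 2-D DCT of the block
            c = np.round(c / step) * step        # coarse uniform quantization
            out[i:i + block, j:j + block] = idctn(c, norm='ortho')
    return out

# Example with a synthetic 64x64 gradient image.
img = np.tile(np.linspace(0, 255, 64), (64, 1))
decoded = code_blockwise(img)
print(np.abs(decoded - img).max())   # quantization error, computed per block
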
The images below are further examples. The left image is the original, the center one is an image decoded by an existing subband coding method, and the right one is an image decoded by CPAST (Convex Projections Adaptive Subband Transform), which we developed. The compression ratio is 1/16 in each case. We can see the advantage of CPAST.

In order to apply the subband transform to moving-picture coding, we are also investigating motion-compensated prediction for each pixel.


Computer architecture

A high-speed processor is very important for image processing, since the data size of an image is very large; the computations are simple but enormous in volume. The recent main approach to increasing performance is parallel processing, such as chip multiprocessing. However, the performance of a single processor is also essential even for parallel processing, since the non-parallelizable part of the computation limits the total performance (Amdahl's law): for example, if 10% of a computation cannot be parallelized, the speedup can never exceed 10 times, no matter how many processors are used. Our theme is therefore to increase the performance of a single processor.

We are now investigating a new type of instruction cache system. In this system, operation codes and operands are stored separately in different cache memories. The operand records carry data-dependency information and are arranged in dependency order, while each operation-code record has a pointer into the operand table, so that the operation codes can be rearranged to increase execution speed. The following figure illustrates the block diagram of PACCS-5.


We can see that the cache memories for operation codes and operands are separated. The following figure illustrates the pipeline of PACCS-5.

Its most distinctive feature is that the pipeline is decoupled between the system unit and the functional units.
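
The following deliberately simplified Python model (all names are hypothetical; this is not the actual PACCS-5 design) only illustrates the idea described above of keeping operation codes and operands in separate tables, with each opcode record pointing into an operand table that carries dependency information:

# Hypothetical, simplified model of a split instruction cache: operation
# codes and operands live in separate tables, operand records carry the
# data dependency they wait on, and each opcode record points into the
# operand table so opcodes can be reordered without moving operands.
from dataclasses import dataclass
from typing import Optional

@dataclass
class OperandRecord:
    value: Optional[int]        # None until the producing operation completes
    depends_on: Optional[int]   # index of the operand record it depends on

@dataclass
class OpcodeRecord:
    opcode: str                 # e.g. 'add', 'mul'
    operand_index: int          # pointer into the operand table

operand_table = [
    OperandRecord(value=3, depends_on=None),
    OperandRecord(value=None, depends_on=0),   # waits for record 0's result
]
opcode_cache = [
    OpcodeRecord(opcode='add', operand_index=0),
    OpcodeRecord(opcode='mul', operand_index=1),
]

# Issue only the opcodes whose operands are ready; the rest wait.
ready = [op for op in opcode_cache
         if operand_table[op.operand_index].value is not None]
print([op.opcode for op in ready])   # ['add']
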

In industry, a new processor is developed by many researchers (from several tens to several hundreds), so our development speed may be slower than that of a company. However, since various kinds of ideas can be introduced into our architecture, this theme is suitable for students who are interested in the microarchitecture of a processor, as there are not many laboratories where microarchitecture can be studied.


TOBE System (Text Oriented Bi-stream Explaining System)

We are developing a system for international development called the TOBE System. When the content of the blackboard, the speech, and the actions is described as tobeML (Text Oriented Bi-stream Explanation Markup Language) text, a teacher drawn by CG gives an explanation using the blackboard, speech, and actions. We are aiming at a system that can give a simple lecture, explain tips, or present the manual of a machine with explanations. We are also making a Wiki-like system that enables many writers to contribute tobeML text. Since the task can then be distributed among many teachers, the load on each individual teacher is decreased, and knowledge from a very wide area can be accumulated. We call this whole system the 'TOBE System' and are developing it.
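
The following Python sketch is purely hypothetical: the tag names are invented for illustration and are not the real tobeML vocabulary. It only shows the idea of describing the blackboard content, the speech, and the action of one explanation step as plain text that a CG teacher could later render:

# Hypothetical sketch only: the real tobeML tags are defined in our
# laboratory and are not reproduced here.  This just builds a small
# XML-like description of one step of an explanation.
import xml.etree.ElementTree as ET

lesson = ET.Element('lesson')                       # hypothetical root tag
step = ET.SubElement(lesson, 'step')
ET.SubElement(step, 'board').text = 'y = a x + b'
ET.SubElement(step, 'speech').text = 'This is the equation of a straight line.'
ET.SubElement(step, 'action').text = 'point at the formula on the board'

print(ET.tostring(lesson, encoding='unicode'))
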

We are dreaming as follows. When man invented characters, we became able to accumulate knowledge. When man invented printing, we became able to distribute knowledge to very many people. When man invented the Internet, everyone became able to distribute their knowledge to other people. However, in each case above, the knowledge is provided only as text or figures. Although man also invented video, it is difficult to create video content, and its data size is very large compared with text. A tobeML document is just text, so it can solve these problems. Moreover, since the teacher created by CG explains while pointing at the target on the blackboard, we can easily know what 'it' or 'that' in the speech refers to. Therefore, the TOBE System can improve the knowledge level of human beings and help realize a peaceful and wealthy world!!


Yamashita Lab in the newspaper
We want a student who
  • likes linear algebra,
  • likes differentiable manifolds,
  • wants to design a high-performance processor,
  • wants to create a new method for pattern recognition or image coding,
  • wants to make a system that contributes to human beings through Java programming,
  • worries about the future of human beings,
  • is cheerful.
  • We use Linux (Fedora Core or Vine Linux) as the OS for our personal computers, so it is convenient to have learned Linux before coming to our lab.


If you have any questions about this page, please send e-mail to