Transform Methods & Signal Processing Mini-project
Mini-project description
The goal of the project is to simulate a research project involving
the use of transform methods to solve a signal processing problem.
Note that the project does not require original work, but it is up to
the research group to
- Formulate the problem,
Describe what you want to achieve and why it is interesting.
- Work out an approach,
This approach should involve something you have learnt about in
class: transforms, filters, etc. However, you are expected to go into a
little more detail than is covered in class: you need to find out something
you haven't been given in your notes.
- Design a method for testing the approach,
Typically, this will involve implementing your idea and testing it to see
how well it performs. Tests can be subjective and qualitative, but preferably
they are objective and quantitative. For instance, for noise removal, one
approach would be to deliberately add noise to an image, then try to remove it,
and compare the final image to the original image. Both quantitative
(e.g. RMS error) and qualitative tests can be useful. You also need to be systematic in testing:
e.g. compare different algorithms on different images and types of noise, or simply repeat
experiments multiple times to get better statistics on the results.
- Write a report describing the results
Your report is what I will see, so make the effort to keep it
clear and concise.
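The noise-removal test described above can be sketched in a few lines. The following is a minimal illustration in Python rather than Matlab, under assumptions that are not part of the project description: a 1D sine wave stands in for the image, the noise is Gaussian, and the "algorithm" is a crude low-pass filter that zeroes high-frequency DFT bins. All function names and parameter values are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def lowpass(x, cutoff):
    """Zero every DFT bin whose frequency index exceeds `cutoff`."""
    n = len(x)
    X = dft(x)
    # Bins k and n-k form a conjugate pair, so keep or drop them together.
    kept = [X[k] if min(k, n - k) <= cutoff else 0 for k in range(n)]
    return idft(kept)


def rms_error(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


def run_trials(n=64, sigma=0.5, cutoff=4, trials=20, seed=1):
    """Mean RMS error before and after filtering, over repeated noise draws."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    clean = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
    before = after = 0.0
    for _ in range(trials):
        noisy = [c + rng.gauss(0.0, sigma) for c in clean]
        before += rms_error(noisy, clean)
        after += rms_error(lowpass(noisy, cutoff), clean)
    return before / trials, after / trials


if __name__ == "__main__":
    noisy_rms, filtered_rms = run_trials()
    print(f"mean RMS error: noisy {noisy_rms:.3f}, filtered {filtered_rms:.3f}")
```

Averaging over several noise realisations, as `run_trials` does, is one simple way to be systematic; a fuller test would also vary the signal, the noise level and the cutoff.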
The work will be undertaken in groups of two to three, decided by the
students, unless there is some problem.
The group needs to choose a type of data, and an
application on that data, and implement that application with
the help of the Fourier transform or a closely related transform (e.g. the DCT,
2D FT, STFT, Harmonic transform, Radon transform, Hankel transform,
Wavelets, or many others).
Example applications
A non-exhaustive list of possible applications is below.
- Denoising (e.g. add some noise to a data set and then test methods to remove the noise)
- Anomaly or change detection (e.g. add anomalies to a data set and then test methods to find these)
- Filtering (to produce a particular effect)
- Synthesis (artificial generation, for instance of musical tones, or monster noises for a kids' toy)
- Compression (don't bother with the encoding part)
- Steganography/Watermarking
- Audio fingerprinting
- Automated transcription (turning sounds into musical notation)
Note that the approach used to solve the particular problem does not
have to be successful, as long as it is well motivated, and well
tested. It is acceptable to say "we tried approach X, and showed using
methodology Y that the approach does not work". This is a negative
result in the sense of success of the algorithm, but not in terms of
the success of the project.
Project Logistics
The goal at the end of the day is to write a report, describing your
problem and results.
The project will be conducted in two phases. For the first,
I expect a draft report (3-5 pages) to be handed in before the
break. The goal of the draft is to make sure you are on the right
track (e.g. have chosen a reasonable problem) and have made appropriate
progress. The second phase of the project will involve extending
your work from the first stage, and therefore extending the report,
with the project being due by the end of week 12, i.e. 26th
Oct. (extensions will be considered for good reasons).
Final project submission should be in electronic form, either by
- putting all files (clearly labelled) onto a CD or DVD, and handing it up.
- sending a .tar.gz or .zip file to me via email. If you use this
option, include the string "TM&SP Project, 2006" in the subject
line. Only use this option if the total file size is less than 5MB.
In either case, you should also submit a hard copy of your report, and you
must submit the appropriate plagiarism cover sheet [DOC] [PDF].
Optionally, you may also submit a "permission to use for teaching" form, to allow me to display your project to
future students, to inspire them.
Supporting materials
Provide the data that supports your approach, unless you only use one of my
data sets, in which case you must still carefully describe the data
in the report. Externally sourced data should be carefully referenced,
and provided along with the report on a CD or other suitable
media.
Also, provide the code you develop for the project as an appendix to the
report, and include it on the disc containing the report. Software does not
have to be original, as long as you still clearly display your
understanding of the techniques used, and also carefully document
exactly what software you used, and how it was used. For instance, it
is acceptable to use software from the Internet. However, this will make
writing your report harder, because you must provide clearer and
stronger evidence that you understood the techniques and algorithms
being used. You do not have to document everything: it is quite
acceptable to use Matlab's canned FFT routine, for instance. Focus on
the parts of the algorithm specific to the application in question.
Learning and Teaching Goals
There are several goals for the project
- Learning to work with real data:
Real data always has issues: data format, noise of various
sorts, missing data, irregularities, or ambiguities. Learning to work
with such data can only be done through experience.
- Learning to work in a group
- Communication skills:
It is important that one be able to communicate results, or
they will have little impact on business or society.
- Practical knowledge of transform methods:
The project gives you a chance to explore one of the topics
described in lectures in more detail.
- Teaching me something:
I am always ready to learn something interesting and new, and
this provides a welcome opportunity.
- Learning to think critically through
- working out what project area is interesting
- working out what aspects are important
- working out what problem to solve
- critically assessing your results
Assessment
Assessment will be based on three factors:
- 50% demonstration of understanding of one or more techniques
relating to transform methods. You may demonstrate your understanding by
- careful description of principles
- implementation of an algorithm
- using an algorithm in practice to perform a particular application
- providing critical assessment of the results of the algorithm
- all of the above
- 40% presentation of the report, important factors being:
- a good overall report structure
- illustration of the understanding of the research methods
- motivation
- clear explanation of methodology
- accurate, concise description of results
- provision of appropriate citations
- 10% how interesting your choice of application is.
I am imagining that a successful project will take about 20 hours of work
from each participant (half in the first stage, and half in the
second). I do not expect you to solve any fundamental problems, or do
novel research. I do expect you to be able to think for yourself.
Typically, the marks will be the same for all members of the group,
though I reserve the right to give different marks. Also note that
material presented in the project is examinable!
Make sure the project involves transforms in some way: it is not enough
just to apply signal processing, without some illustration of why
transforms are needed.
Some starting points
I realize the non-specific nature of the project may cause
problems, so I am providing a number of data sets to help you get
going. Obviously real research would be based on larger data sets than
those provided here, but the size of these should give you a
realistic understanding of what I expect from the report.
Data
- ECG data: drawn from the PhysioBank Archive
- Images:
- Audio data:
- Financial data from http://au.finance.yahoo.com
- Internet traffic data
- Bellcore Ethernet data from the archive at LBL
- SNMP traffic data from Abilene
- Artificial test data: in various works it has become common to use certain test signals, listed below.
- Blocks
- Bumps
- HeaviSine
- Doppler
Get this for details of
each, and here is a Matlab snippet that generates
each. These originate from the following paper:
- Donoho, D.L., & Johnstone, I.M., "Ideal spatial
adaptation by wavelet shrinkage", Biometrika, 81, 425-455, 1994.
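As an illustration of what such test signals look like, here is a minimal sketch of two of them, HeaviSine and Doppler, written in Python rather than Matlab. It assumes the definitions as they are commonly quoted from the Donoho and Johnstone paper, on the interval [0, 1] and without the amplitude rescaling used there; check the formulas against the paper before relying on them. The function names and the sampling grid are hypothetical.

```python
import math


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def heavisine(t):
    """HeaviSine: a sinusoid with two jumps, at t = 0.3 and t = 0.72."""
    return 4 * math.sin(4 * math.pi * t) - sign(t - 0.3) - sign(0.72 - t)


def doppler(t, eps=0.05):
    """Doppler: an oscillation whose frequency falls as t goes from 0 to 1."""
    return math.sqrt(t * (1 - t)) * math.sin(2 * math.pi * (1 + eps) / (t + eps))


def sample(f, n=1024):
    """Evaluate f on the grid t_i = i / n for i = 1..n."""
    return [f(i / n) for i in range(1, n + 1)]


if __name__ == "__main__":
    signal = sample(heavisine)
    print(len(signal), min(signal), max(signal))
```

Signals like these are useful because the clean version is known exactly, so the error of a denoising method can be measured directly.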
Software help
- ECG data: the data is already in a simple format, as converted
by me from the
PhysioBank Archive. You can get the original data, and
software to parse it at this location.
The data format is a series of numbers giving (sampling #,
sample value), so it should be trivial to process. The samples were taken at 720 Hz with 12-bit
resolution.
- Images: Matlab has a group of routines for reading and
writing various image formats.
More general image manipulation can be done with a free (open-source) tool called the
Gimp, which is available for
Unix, Windows and Mac systems.
Note that the images provided above are in TIFF format, because
this does not compress the image, so all information is present. These
may be converted to other formats using the various tools described
above, but for testing algorithms, it is preferable to use raw
data. It would be acceptable to reduce it to a gray-scale image
though.
- Audio data
- Formats:
- WAV format is a common uncompressed audio format
- MP3 is a common compressed audio format
- There are many others (e.g. .au)
Note that most audio files have two channels for stereo,
though some formats may include more.
- Audio capture may be done using many tools:
- from CD use a CD 'ripper'
(e.g. cdparanoia for Linux)
- From audio input, e.g. using arecord (for Unix), or GarageBand (for Mac),
or Matlab routine wavrecord for Windows.
- General audio manipulation:
- Matlab routines: there are Matlab routines for reading, writing and playing WAV files
- wavread to read a WAV file
- wavwrite to write a WAV file
- sound to play a sound.
- The following file
is an example of the use of these routines
- Financial data: The Yahoo data is in CSV (Comma Separated Values) format.
- CSV is easily read into a spreadsheet, e.g. Excel
- See the Matlab routine read_yahoo_data.m for reading the data into Matlab.
- Internet traffic data
- The Bellcore Ethernet data originally
consists of packet traces, giving the time of arrival and the packet
size for each packet over an extended period.
The archive
for this data contains the original 4 packet traces, along with
information for parsing this data.
I have simplified the traces by "binning" them, i.e. counting the
number of bytes and packets per time interval. Each record in the new data
format gives a timestamp together with byte and packet counts,
where the timestamp is a time in seconds from the start of the trace,
and the bytes and packets are just counts of these quantities during
the time interval. The bin size used is 100 ms, so there are 10 bins
per second, or a sampling rate of 10 Hz.
There is a simple Matlab script ethernet_plot.m for reading and
plotting this data (once it is uncompressed), and a log-log plot of
the power spectrum of the data.
- Misc
- Generation of white noise is easy in Matlab: simply use the function
randn(N,M), which generates an NxM matrix of independent
Gaussian random variables with zero mean and unit variance.
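The binning applied to the Bellcore traces above (counting bytes and packets per 100 ms interval) can be sketched as follows. This is a minimal illustration in Python, not the script actually used to prepare the data; the function name, the input layout of (arrival time, size) pairs and the example values are all assumptions.

```python
def bin_trace(packets, bin_size=0.1):
    """Count bytes and packets per interval of `bin_size` seconds.

    `packets` is an iterable of (arrival_time_seconds, size_bytes) pairs,
    with times measured from the start of the trace.
    Returns a list of (timestamp, byte_count, packet_count) rows, one per
    bin from time 0 up to the last arrival, including empty bins.
    """
    byte_counts = {}
    packet_counts = {}
    for arrival, size in packets:
        # Index of the bin containing this arrival. Arrivals that fall
        # exactly on a bin edge may be affected by floating-point rounding.
        index = int(arrival / bin_size)
        byte_counts[index] = byte_counts.get(index, 0) + size
        packet_counts[index] = packet_counts.get(index, 0) + 1
    if not packet_counts:
        return []
    n_bins = max(packet_counts) + 1
    return [(i * bin_size, byte_counts.get(i, 0), packet_counts.get(i, 0))
            for i in range(n_bins)]


if __name__ == "__main__":
    # Hypothetical four-packet trace, purely for illustration.
    trace = [(0.02, 64), (0.07, 1500), (0.12, 64), (0.35, 512)]
    for timestamp, n_bytes, n_packets in bin_trace(trace):
        print(f"{timestamp:.1f} {n_bytes} {n_packets}")
```

Keeping the empty bins matters: the result is then a regularly sampled series (10 samples per second for 100 ms bins), which is what a Fourier-based analysis of the traffic expects.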
Matthew Roughan
Last modified: Mon Jul 23 10:54:33 2007