UNIVERSITY OF MASSACHUSETTS,
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
Project 1
ECE 241: Advanced Programming I
1 INTRODUCTION
In today’s Internet-dominated world, we take for granted computers’ ability to search through
vast quantities of data to find information that is of interest to us. Search engines’ ability to
find data quickly results from many programmers’ and algorithm designers’ efforts over many
years. In Project 1, you will have the opportunity to learn the basics to become a successful
future programmer by implementing algorithms for the Million Song Dataset. These algorithms include sorting and searching through a database of information related to songs. In
this project, you will use a subset of the Million Song Dataset. This dataset is an example of
the mass quantities of data evaluated by computers many times a day.
In this assignment, you will read in a subset of the Million Song Dataset from an Excel file
(Downloadable from here.). It contains information for 9,709 songs. The information includes a number of fields including title, artist, song duration, track ID. You will then perform
search and sort operations on the data to demonstrate the performance of different sorting
and searching algorithms.
2 IMPLEMENTATION
In this project, you will perform the following specific tasks:
1. Manage your data in Python classes and objects. For example, define two classes, Song
and SongLibrary. You will use one instance of the Song class for each song and one
instance of the SongLibrary class for your song library.
1
a) In the Song class, you should have variables for the different attributes of the
songs (title, artist, song duration, track ID).
b) In the SongLibrary class, you should have the necessary information about the
library, which includes an array for all the Song objects, an integer that indicates
how many songs the library holds, and a Boolean variable to show whether the
library is sorted or not. We will introduce a few more variables later on. To make
it efficient to search the songs, you also need a Binary Search Tree (BST) for the
song library based on the song titles.
In your SongLibrary class:
2. Develop a loadLibrary method. This method takes the input data file name, opens it,
and loads the songs in the file into the library. Make sure the order of the songs is the
same as in the input file.
3. Implement a linear search algorithm to search the songs based on either song title or
artist. Return the number of the songs found in the song array that satisfy your search.
The search function should have two parameters, one for the query and one for the
attribute (string as ’title’ or ’artist’).
4. Implement a quick sort algorithm to sort the song database based on the song title. The
sorted songs should be stored in the same song array. Make sure to change the status
of the Boolean variable that shows whether the library is sorted or not.
5. Implement a function to build a balanced BST of the songs based on the song title.
Ideally, the height of a balanced BST for 9,709 songs is 14. In this task, you need to
build a tree with height AT MOST 24.
6. Implement a search function for the song title on the BST. The input arguments and
the output are the same as the linear search.
3 EXECUTION INSTRUCTIONS
In your main function, using proper methods and objects, read in the library and test the developed methods one by one. Think of convincing testing scenarios to prove the correctness
of your code.
When running the method to build the BST, time the execution of it. Add a variable to your
library class with the initial value of 0. The first time you build the BST, save the time it took
to build it, in that variable.
In the next 3 steps, we will compare the performance of searching on a BST against the linear
search algorithm.
1. Use a random number generator (uniform distribution) to select 100 song titles arbitrarily. Save them in a list. Perform a linear search in the sorted database for those 100
songs. Record the total time spent on the linear searches.
2
2. Perform the searches for the same 100 songs, now in the BST. Record the total time
spent.
3. To take advantage of the fast search operation on a BST, we first need to spend some
additional time building the BST. If we have many songs to search for, it is worthwhile
to spend the extra time. Using the information calculated above (Steps 1 and 2 and
the saved BST build time), determine the minimum number of searches, n, for a BST
to show better performance than linear search. In other words, n times the average
linear search time would be higher than the total time spent on building a BST (only
performed once) and n searches on the BST.
4 SUGGESTIONS
Completing the project and achieving a good grade requires completing the project as described above and clearly commenting on the code. As always, it makes sense to start the
project early. Even if you are already a fantastic programmer, you probably won’t be able to
finish the project in one day. Build your project code step by step. For example, verify that
you have successfully read in the database before attempting a linear search. Make sure the
linear search works before writing and testing code for QuickSort, BST, etc. Additional hints
are as follows:
1. Make sure that there are NO print statements in the code you are attempting to time.
The use of these statements will negatively affect recorded time values and lead to incorrect results. Your submitted QuickSort, linear search, and search on BST methods
should not include these statements.
2. For each line of the song database ("0,Qing Yi Shi,Leon Lai,203.38893,5237536"), you
can split it based on the ’,’.
3. Use the random module to identify the 100 random titles for searching. You can use the
random numbers to locate specific song indices in the song database array. Note that
you must save these song titles in a new array so you can perform the same search.
4. For a balanced BST, you can implement the AVL tree. Another solution you can try is
to randomize the song title to insert. Theoretically, you can get a better-balanced tree,
but you need to test the random seed to make sure the tree height is less than 24. (You
can consider implementing an additional function to compute the tree height.)
5. The QuickSort, BST, and linear search methods can follow in a similar format from the
book. Please do not use available library modules and functions for these functions
and classes.
5 SUBMISSION GUIDELINE
For this project, submit a single .zip file on Moodle. In this archive:
3
1. Place your code.
2. Place a single PDF document. This document should include your recorded running
time of building the BST, the average time for linear search, and search on the BST. In
this document, you should also explain how you calculate the number of searches, n,
for step 3.
0 comments:
Post a Comment