CST 338 - Software Design
Students learn important data structures in computer science and acquire fundamental algorithm design techniques for finding efficient solutions to computing problems from various disciplines. Topics include the analysis of algorithm efficiency, hash, heap, graph, tree, sorting and searching, brute force, divide-and-conquer, decrease-and-conquer, transform-and-conquer, dynamic programming, and greedy programming.
Word Analysis Application
Synopsis
We're going to create a new GUI application that does some basic book analysis. The application will take a book and count the number of unique words in it. The GUI will display a heat map of the top 50 words. We're going to process some big books, which means it could take some time. We're going to use multithreading to make sure that the application is not blocked. Additionally, we're going to refresh the UI every so often to keep the user informed.
We're going to apply a form of divide and conquer in this implementation. When a book starts the analysis, a number of threads will be created. Each thread will process a line, merge its work back into the book object, and then pick up the next available line. The idea is that the more threads the book uses, the faster it will finish the task.
Specifications
For this final project we're going to use more GUI components, threads, and some other patterns.
The application will have the following components:
- GUI Interface
- Safe multithreaded application.
- MVC pattern which will be hooked to the GUI.
- A book class which uses multiple threads to do a word count.
- A word count class that does the basic sanitation and keeps track of words.
Phase 1 - The WordCount class
The word count will be the most basic component. It takes a sentence, removes all punctuation, and splits it into words to generate a count for each unique word.
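The sanitizing and counting step described above can be sketched roughly as follows. This is a minimal illustration, not the required implementation: the class name SentenceCounter is hypothetical, the regular expression is the punctuation pattern listed under Variables, and lower-casing the words is an assumption the specification does not state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the name SentenceCounter is not part of the spec.
class SentenceCounter {
    // Same pattern as the spec's punctuation field: anything that is not a letter or a space.
    private static final String PUNCTUATION = "[^\\p{L} ]";

    // Strips punctuation, lower-cases (an assumption), splits on spaces, and tallies each word.
    static Map<String, Integer> count(String sentence) {
        Map<String, Integer> counts = new HashMap<>();
        String cleaned = sentence.replaceAll(PUNCTUATION, "").toLowerCase();
        for (String word : cleaned.split(" +")) {
            if (!word.isEmpty()) {                 // skip the empty token a leading space produces
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "the" is counted three times; every other word once.
        System.out.println(count("The cat, the hat; and THE bat!"));
    }
}
```

Note that this pattern also removes apostrophes and tabs, so "don't" is counted as "dont" and tab-separated words run together. Whether that is acceptable depends on the books being processed.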
We're also going to keep track of which words have the highest and lowest counts, which will let us generate useful stats. We're also going to implement the Cloneable interface to make sure data is safe when it moves from thread to thread or into the interface.
Variables
We're going to need some getters for other instances as well.
- private static String punctuation = "[^\\p{L} ]";
- private HashMap<String, Integer> wordMap;
- private String topWord;
- private String bottomWord;
Methods
Constructors
We're going to need a basic constructor, and a copy constructor.
- public WordCount(): Constructor
- public WordCount(WordCount wordCount)
Functions
- public void analizeSentence(String sentence): Breaks the sentence down into words and adds each one to the count.
- private void addRawWord(String word, int count): Adds the word to the dictionary. It will also keep track of the top and bottom words.
- public Map<String, Integer> getWordCount(): Returns a copy of the map.
- public void merge(WordCount wordCount): Merges an instance of the WordCount into the current instance. It will be used when we split it into different threads.
- public WordCount getTopWords(int number): Returns a WordCount instance that contains only the words with the highest counts. The GUI displays only a small subset, so it is useful to be able to fetch just a few.
- public int getNumberOfHits(String word): Returns the count for a given word.
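One possible shape for the copy constructor, merge, and getTopWords is sketched below. WordCountSketch is a trimmed-down, hypothetical stand-in for WordCount: it leaves out sentence parsing and the top/bottom word tracking, and sorting entries with a stream is just one way to pick the most frequent words.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for WordCount; only copying, merging and top-word selection are sketched.
class WordCountSketch {
    private final HashMap<String, Integer> wordMap = new HashMap<>();

    WordCountSketch() { }

    // Copy constructor: copies the entries so the two instances share no mutable state.
    WordCountSketch(WordCountSketch other) {
        wordMap.putAll(other.wordMap);
    }

    // Adds count to the word's running total, creating the entry if needed.
    void addRawWord(String word, int count) {
        wordMap.merge(word, count, Integer::sum);
    }

    // Adds every count from the other instance into this one.
    void merge(WordCountSketch other) {
        other.wordMap.forEach(this::addRawWord);
    }

    // Returns a new instance holding only the most frequent words (at most `number` of them).
    WordCountSketch getTopWords(int number) {
        WordCountSketch top = new WordCountSketch();
        wordMap.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(number)
                .forEach(e -> top.addRawWord(e.getKey(), e.getValue()));
        return top;
    }

    // Returns 0 for words that were never seen.
    int getNumberOfHits(String word) {
        return wordMap.getOrDefault(word, 0);
    }

    // Returns a copy so callers cannot modify the internal map.
    Map<String, Integer> getWordCount() {
        return new HashMap<>(wordMap);
    }
}
```

Words with equal counts come out in no particular order here, which fits the "no sorting" requirement for the display but means the exact cut-off at the limit can vary between runs.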
Phase 2 - The Book Class
The Book class will be the coordinator for a word count. It will implement the Runnable interface, which will allow us to work with multithreading. The Book class takes a full book as a single string, splits it at line breaks into a string array, and processes the lines using a number of threads. Each thread then merges its work into a single instance of the word count.
We will use the synchronized keyword to make sure that reads from the book do not overlap each other, and that there are no collisions when data is modified.
Variables
We're going to need some getters and setters.
- private int threadCount;: The number of threads to spin up for processing.
- private boolean running;: Set only while the analysis is running.
- private int currentLine;: The current line being read.
- private String[] book;: The book to analyse.
- private WordCount wordCount;: The instance of word count that holds the book's data.
Methods
Constructor
- public Book(String continousBook, int threadCount)
Functions
- private void init(String[] book): Initializes the book.
- public String nextLine(): A synchronized method that reads the next line of the array and moves the cursor forward.
- private void analyzeBook(): Runs the analysis of a book in a thread, and merges the results into a current data set.
- public void doNothing(int milliseconds): Sleeps the current thread for some time. Reading books can be very fast, so this simulates a large data set.
- public boolean setThreadCount(int count): This method needs to check the analysis is not running before updating the number of threads to run.
- public void analyzeMultithread(): Runs the multithreaded analysis. Spins the set number of threads, and joins them at the end safely.
- public int getUniqueWordCount(): Gets the unique word count for the book.
- public boolean isRunning(): An accessor method to check if the analysis is currently running.
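The way nextLine, analyzeBook, and analyzeMultithread fit together can be sketched as below. BookSketch is a hypothetical, simplified stand-in for Book: it tallies into a plain map instead of a WordCount, omits the running flag and doNothing, and the mergeCounts helper is an invented name for the synchronized merge step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for Book; counts go into a plain map instead of a WordCount.
class BookSketch {
    private final String[] book;
    private final int threadCount;
    private int currentLine = 0;
    private final Map<String, Integer> totals = new HashMap<>();

    BookSketch(String continuousBook, int threadCount) {
        this.book = continuousBook.split("\\R");   // split on any line break
        this.threadCount = threadCount;
    }

    // Only one thread at a time may advance the cursor; returns null when no lines are left.
    synchronized String nextLine() {
        return currentLine < book.length ? book[currentLine++] : null;
    }

    // Also synchronized, so two workers never update the shared map at the same time.
    private synchronized void mergeCounts(Map<String, Integer> local) {
        local.forEach((word, n) -> totals.merge(word, n, Integer::sum));
    }

    // Worker loop: take a line, count it in a thread-local map, merge, repeat.
    private void analyzeBook() {
        String line;
        while ((line = nextLine()) != null) {
            Map<String, Integer> local = new HashMap<>();
            for (String word : line.replaceAll("[^\\p{L} ]", "").toLowerCase().split(" +")) {
                if (!word.isEmpty()) {
                    local.merge(word, 1, Integer::sum);
                }
            }
            mergeCounts(local);
        }
    }

    // Starts the workers, then waits for all of them to finish.
    void analyzeMultithread() {
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(this::analyzeBook);
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt and stop waiting
                return;
            }
        }
    }

    synchronized int getUniqueWordCount() {
        return totals.size();
    }

    synchronized int getNumberOfHits(String word) {
        return totals.getOrDefault(word, 0);
    }
}
```

Each worker counts a line in its own local map and only takes the lock briefly to merge, so the expensive work happens outside the synchronized sections.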
Phase 3 - MVC
We're going to create a GUI to run the book analysis. The GUI should have the following:
- Area that displays the title of the book being processed.
- The ability to select how many threads to run for the analysis.
- A button to start the analysis.
- An area displaying a heat map of the top 50 words (no sorting).
- An area displaying the top 50 words with their count (no sorting).
- An area displaying important information about the current analysis such as number of threads running, percentage completed, etc.
The BookModel
The book model can hold multiple instances of the book. The book model provides the layer of the interaction between the book and the controller. The model also keeps the data between books consistent.
Since we're going to run multiple book analyses, we're also going to make sure they all use the same data.
Variables
- public final static int BOOK_COUNT = 2;
- public final static int THREAD_COUNT = 5;
- public final static int MAX_THREADS = 10;
- private String title;
- private Book[] books;
Functions
- public BookModel(String path, String title) throws IOException: Reads the book from a file, so the constructor declares IOException and the caller must catch it. The data should be loaded into every instance of the book.
- public String getTitle(): Getter method for the title.
- public boolean setThreadCount(int index, int count): Sets the thread count for a specific book.
- public int getThreadCount(int index): Gets the current thread count for a given book.
- public void startAnalysis(int index): Starts the analysis for a given book. This will be used by the controller.
- public boolean isAnalysisRunning(): Checks every instance of the book to verify that no analysis is running.
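As a rough sketch of the model, the constructor can read the file once and hand the same text to every book instance. BookStub and BookModelSketch are hypothetical names; the stub only stands in for the Book class from Phase 2. The sketch also assumes isAnalysisRunning returns true while any book is still running, which is one reading of the description above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical minimal stub standing in for the Book class from Phase 2.
class BookStub {
    final String text;
    private volatile boolean running = false;

    BookStub(String text) { this.text = text; }

    boolean isRunning() { return running; }

    void setRunning(boolean running) { this.running = running; }
}

// Hypothetical sketch of the model; thread-count handling is left out.
class BookModelSketch {
    static final int BOOK_COUNT = 2;

    private final String title;
    private final BookStub[] books = new BookStub[BOOK_COUNT];

    // Reads the file once and gives the same text to every book, so all analyses use the same data.
    BookModelSketch(String path, String title) throws IOException {
        this.title = title;
        String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        for (int i = 0; i < BOOK_COUNT; i++) {
            books[i] = new BookStub(text);
        }
    }

    String getTitle() { return title; }

    BookStub getBook(int index) { return books[index]; }

    // Assumed semantics: true while at least one book is still being analyzed.
    boolean isAnalysisRunning() {
        for (BookStub book : books) {
            if (book.isRunning()) {
                return true;
            }
        }
        return false;
    }
}
```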
The BookController
The book controller will also need to be multithreaded. The controller should be able to spin up a book analysis and continue running/working as well as updating the data in the interface.
We'll need to keep in mind that the controller needs to start the analysis and also run a loop that checks on the progress and updates the view. Each of those will be on its own thread. Because each book needs its own thread, we will make things easier by using an inner class that implements the Runnable interface.
The buttons should also be blocked from any action while the analysis is running to prevent strange things from happening.
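A GUI-free sketch of that arrangement is shown below. Everything here is hypothetical: the Analysis interface stands in for a book, the Consumer callback stands in for the view update, and the 50 ms polling interval is arbitrary. The inner class runs the analysis on one thread, polls progress on another, and keeps the buttons blocked until every running analysis has finished.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical controller sketch with no GUI toolkit attached.
class ControllerSketch {
    // Hypothetical stand-in for a long-running book analysis.
    interface Analysis {
        void run();
        int percentDone();
    }

    // Number of analyses in flight; buttons stay blocked while this is above zero.
    private final AtomicInteger running = new AtomicInteger();

    boolean areButtonsEnabled() {
        return running.get() == 0;
    }

    // Inner class: one instance per book. Runs the analysis on a worker thread
    // and reports progress from its own thread until the worker finishes.
    private class AnalysisTask implements Runnable {
        private final Analysis analysis;
        private final Consumer<Integer> onProgress;

        AnalysisTask(Analysis analysis, Consumer<Integer> onProgress) {
            this.analysis = analysis;
            this.onProgress = onProgress;
        }

        @Override
        public void run() {
            Thread worker = new Thread(analysis::run);
            worker.start();
            try {
                while (worker.isAlive()) {
                    onProgress.accept(analysis.percentDone());
                    worker.join(50);                        // wait up to 50 ms, then poll again
                }
                onProgress.accept(analysis.percentDone());  // final update after the worker ends
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                running.decrementAndGet();                  // unblock buttons once all tasks end
            }
        }
    }

    // Blocks the buttons, then starts the task on its own thread so the caller is not blocked.
    Thread startAnalysis(Analysis analysis, Consumer<Integer> onProgress) {
        running.incrementAndGet();
        Thread t = new Thread(new AnalysisTask(analysis, onProgress));
        t.start();
        return t;
    }
}
```

If the view is built with Swing, the progress callback would typically wrap its component updates in SwingUtilities.invokeLater, since Swing components should only be touched from the event dispatch thread.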
The BookView
We're going to run the analysis on two books at the same time so we can compare speeds.
The view will display a few items:
- The title of the book to process.
- The top 50 words color coded and sized in a heat map / word cloud fashion for each book.
- The list of the 50 words with their corresponding counts for each book.
- Two sliders to select the number of threads for each side.
- A button to start the analysis.
- Some statistics such as progress.
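For the heat map, one simple option is to scale each word's count linearly between the lowest and highest counts and derive a font size and color from that. The sketch below is only an illustration: the class name HeatMapScale, the 12 to 48 point font range, and the blue-to-red color ramp are all assumptions, not part of the specification.

```java
// Hypothetical helper for sizing and coloring words in the heat map.
class HeatMapScale {
    static final int MIN_FONT = 12;   // assumed smallest font size, in points
    static final int MAX_FONT = 48;   // assumed largest font size, in points

    // Linear scale: 0.0 for the lowest count, 1.0 for the highest, clamped to that range.
    static double heat(int count, int lowest, int highest) {
        if (highest <= lowest) {
            return 1.0;               // all counts equal: treat every word as hottest
        }
        double h = (double) (count - lowest) / (highest - lowest);
        return Math.max(0.0, Math.min(1.0, h));
    }

    // Font size grows with the word's heat.
    static int fontSize(int count, int lowest, int highest) {
        return (int) Math.round(MIN_FONT + heat(count, lowest, highest) * (MAX_FONT - MIN_FONT));
    }

    // Packs a blue (cold) to red (hot) color into a 0xRRGGBB integer.
    static int rgb(int count, int lowest, int highest) {
        int red = (int) Math.round(255 * heat(count, lowest, highest));
        int blue = 255 - red;
        return (red << 16) | blue;
    }
}
```

The lowest and highest values would come from the counts of the bottom and top words tracked by WordCount, and the packed integer can be passed to new java.awt.Color(rgb) if the view uses AWT or Swing colors.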