What is machine learning?

Machine Learning is a statistical science whose goal is to find the statistical regularities in an environment and to build a system that performs in that environment as well as, or even better than, a physical system would.

Just as any intelligent living being needs to be aware of its environment in order to learn, a Machine Learning (ML) system also needs information about its environment to learn these regularities. We provide this information to the ML system as a set of vectors called input pattern vectors. The input pattern vectors form a subset of the feature space, a vector space that represents the events in the environment in a transformed form. Such feature transformations are important because, in many cases, they reduce the dimensionality of the original vector space.
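To make the idea of a feature transformation concrete, here is a minimal sketch in Python. The data and the transform are made up purely for illustration: raw four-dimensional events from the environment are mapped to two-dimensional input pattern vectors.

```python
import numpy as np

# Hypothetical raw events from the environment: 3 events, 4 measurements each.
raw_events = np.array([
    [2.0, 4.0, 1.0, 3.0],
    [1.0, 2.0, 0.5, 1.5],
    [4.0, 8.0, 2.0, 6.0],
])

def to_feature_vector(event):
    """Transform a raw event into a lower-dimensional input pattern vector.

    Here we simply keep the mean and the spread of the measurements
    (4 dimensions reduced to 2). Any transform (PCA, hand-crafted features,
    and so on) could play this role.
    """
    return np.array([event.mean(), event.std()])

input_pattern_vectors = np.array([to_feature_vector(e) for e in raw_events])
print(input_pattern_vectors.shape)  # (3, 2): 3 patterns in a 2-D feature space
```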

We call the output or action of the ML system on the environment the output pattern vector, and the output we expect from the ML system the desired output vector (or desired output response).

Ok, now we have the inputs to the system and we know what to expect out of it. So how do we know if the system is working as expected? One good way is to find the difference between the output of our system and the desired output. If it were a static system, things would be simple: you would only have to do literally what the previous sentence says. But in a dynamic system, things are a bit different.

For the time being, let's consider our ML system to be a black box. Let's also assume that the system's action is determined by a parameter vector \theta. Then, given an input pattern vector \mathbf{s}, we can write our ML system's output response as \hat{y}(\mathbf{s},\theta).
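As one concrete, purely illustrative choice of black box, let \theta be a weight vector and the output response a weighted sum of the input pattern vector, i.e. a linear model:

```python
import numpy as np

def y_hat(s, theta):
    """Output response of a simple parameterized system: a linear model.

    s     : input pattern vector
    theta : parameter vector that determines the system's action
    """
    return np.dot(s, theta)

s = np.array([1.0, 2.0])        # an input pattern vector
theta = np.array([0.5, -0.3])   # one possible setting of the parameters
print(y_hat(s, theta))          # -0.1
```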

Here comes the idea of the empirical risk minimization framework. Given the input and desired output vectors, we can define a loss function that measures the error between the predicted response and the desired response; the expected value of this loss over the true data distribution is called the true risk. But in real-world scenarios we never have access to the whole population of data, so we assume that the data at hand follows the same distribution as the population and approximate the true risk by an average over our sample, hence the term empirical. We then try to find a function (equivalently, a parameter setting \theta) that minimizes this risk, i.e. the error between the output response and the desired response. This process is called empirical risk minimization.

So if \mathbf{c}(\mathbf{s},\theta) is the loss function that measures the error between the predicted response and the desired response, we can define our empirical risk function \hat{l}_n (\theta) as

\displaystyle \hat{l}_n (\theta) = \frac{1}{n} \sum_{i=1}^n \mathbf{c}(\mathbf{s}_i,\theta)
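In code, the empirical risk is just the average loss over the n samples. The sketch below assumes the linear model from the previous snippet and a squared-error loss; the framework itself fixes neither choice, and the data is a toy example.

```python
import numpy as np

def y_hat(s, theta):
    """Predicted response of the (assumed) linear black box."""
    return np.dot(s, theta)

def empirical_risk(S, D, theta):
    """Empirical risk: the average of the per-sample loss c(s_i, theta).

    Here c is taken to be the squared error between the predicted and
    the desired response.
    """
    losses = [(y_hat(s, theta) - d) ** 2 for s, d in zip(S, D)]
    return sum(losses) / len(losses)

# Toy data: n = 3 input pattern vectors and their desired responses.
S = np.array([[1.0, 2.0], [2.0, 1.0], [0.0, 3.0]])
D = np.array([1.0, 2.0, 0.5])
print(empirical_risk(S, D, np.array([0.5, -0.3])))
```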

Our objective here is to find the \theta that minimizes the above function. To start with, we give \theta a random value, call it \theta_{0}, and compute the empirical risk. By monitoring the risk we can see whether we are getting closer to the optimal \theta. The change in the risk with respect to \theta is given by its derivative, \displaystyle \frac{d\hat{l}_n(\theta)}{d\theta}.

So given an initial parameter \theta_{0} (remember, we choose this value), we can compute \theta at iteration k+1 as

\displaystyle \theta_{k+1} = \theta_{k} - \gamma_{k} \frac{d\hat{l}_n(\theta_k)}{d\theta}

where \gamma_k is called the learning rate.

This idea is called the method of gradient descent and is the essence of a huge number of practical machine learning algorithms.
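Here is a minimal sketch of gradient descent for the same toy setup (linear model, squared-error loss, both assumptions made only for illustration); the derivative of the empirical risk is written in closed form for this particular loss.

```python
import numpy as np

def gradient_descent(S, D, theta0, learning_rate=0.05, iterations=100):
    """Minimize the empirical risk of a linear model with squared-error loss.

    S      : (n, d) matrix of input pattern vectors
    D      : (n,) vector of desired responses
    theta0 : initial parameter vector (chosen by us)
    """
    theta = theta0.copy()
    n = len(S)
    for _ in range(iterations):
        residuals = S @ theta - D              # predicted minus desired responses
        grad = (2.0 / n) * (S.T @ residuals)   # derivative of the empirical risk
        theta = theta - learning_rate * grad   # the update rule above
    return theta

S = np.array([[1.0, 2.0], [2.0, 1.0], [0.0, 3.0]])
D = np.array([1.0, 2.0, 0.5])
print(gradient_descent(S, D, np.zeros(2)))
```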

On Publishing Research Articles

These are notes I scribbled down in my notepad, in the same order as they were mentioned (so some points might be misplaced), from the panel discussion that took place at the University of Texas at Dallas as part of its Graduate Professional Week 2016.

Ingredients for Publishable Article:

A subject worth writing about
— Study something novel, use a new methodology, or both
— Tell a nice story: where does this work fit?
— In the introduction, explain why this study is important
— The abstract should have great appeal to a general audience

— Good knowledge of the field
— Find the primary audience of the journal before you start writing the article, and address it accordingly

Advice for the 1st Research Article:

— Be ready to be rejected (expect a high rejection rate). Don’t take it personally; rejection is the norm
— Write every day. Get into the habit of writing
— Don’t publish the 1st one, publish the 2nd one. You can publish the 1st one later (pun)
— Writing a good paper is difficult. You can/will/should improve with time
— Break the process down and take one at a time (creating figures, making charts, creating story-line)
— Get organized with references (make use of RefWorks)
— Keep plagiarism in check
— Feedback is important. Try to get adequate feedback before sending your paper to journals
— Have multiple levels of feedback (peers/colleagues, mentors)
— Take comments from editors seriously
— Revise your paper. Revision is key
— Have a fabulous abstract/introduction/1st paragraph
— Don’t submit sloppy work. It affects your reputation as well as your institute’s
— Use RefWorks [UTD students have lifelong free access]
— Accept criticism
— Start writing only after understanding the work completely

Problems students face while trying to write the article:

— Perfectionism
— Procrastination

Advice for finding the right journals:

— Read & Read more
—  1 day of reading = 1 week at the lab
— Read what people are writing about in your field, understand where the field is going
— The more you read, the better writer you become
— Keep up with “methods”
— Look at citations to find its place in your field
— Find who else is working in your field
— The most cited journal in your bibliography is the best journal to publish your article
— Writing has to be specific as well as general. It should make sense to experts as well as newbies

For Read & Review sessions:

— Always thank your reviewer
— Never adopt an argumentative tone
— Spend time to understand the reviewer’s comment

Contents from the handout

Books on Academic Writing:

  1. Academic Writing: A Handbook for International Students by Stephen Bailey
  2. Destination Dissertation: A Traveler’s Guide to a Done Dissertation by Sonja K Foss
  3. From Inquiry to Academic Writing: A Practical Guide by Stuart Greene
  4. Academic Writing and Publishing: A Practical Guide by James Hartley
  5. Handbook for Academic Authors by Beth Luey
  6. Academic Writing: A Guide for Management Students & Researchers by Mathukutty M Monippally 
  7. Handbook of Academic Writing: A Fresh Approach by Rowena Murray
  8. How to Write A Lot: A Practical Guide to Productive Academic Writing by Paul J Silvia
  9. Academic Writing for Graduate Students: Essential Tasks and Skills by John M Swales
  10. Stylish Academic Writing by Helen Sword 

Panelists

  • Dr. Marion Underwood, Dean of Graduate Studies
  • Dr. Ellen Safley, Dean of Libraries 
  • Dr. Yves Chabal, Professor of Material Science & Engineering
  • Dr. Julia Chan, Professor of Chemistry
  • Dr. Frank Dufour, Professor of Arts and Technology
  • Dr. John Goosh, Associate Dean of Graduate Studies
  • Dr. Shayla Holub,  Associate Professor of Developmental Psychology 
  • Dr. Alex Piquero, Associate Dean of Graduate Programs
  • Dr. Karen Prager, Professor of Psychology
  • Dr. Sumit Sarkar, Professor of Information Systems