Hi,
I have worked on very similar projects (text classification, if I understand the description correctly) using a variety of representations (bag of words, n-grams, vector space models) and algorithms (SVM, deep neural networks, logistic regression).
I have placed a bid, but it would be beneficial to first know:
1) The amount of data available for training (statistics such as the number of column A - column B pairs, the count of column B values, the average length of column A phrases, and the vocabulary of column A phrases)
2) The technology that can be used. I am good with Python/R and standalone toolkits that implement ML algorithms.
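To make point 1 concrete, here is a minimal sketch of how those statistics could be computed in Python. It is only an illustration: the header names `column_a` and `column_b`, the sample rows, and the whitespace tokenisation are my assumptions, not something taken from your data.

```python
import csv
import io
from collections import Counter


def dataset_stats(rows):
    """Summarise (column A text, column B value) pairs."""
    rows = list(rows)
    # How often each column B value occurs.
    label_counts = Counter(label for _, label in rows)
    # Naive tokenisation (assumption): lowercase and split on whitespace.
    token_lists = [text.lower().split() for text, _ in rows]
    vocabulary = {tok for toks in token_lists for tok in toks}
    avg_len = sum(len(t) for t in token_lists) / len(rows) if rows else 0.0
    return {
        "pairs": len(rows),
        "label_counts": dict(label_counts),
        "avg_tokens_per_phrase": avg_len,
        "vocabulary_size": len(vocabulary),
    }


# Hypothetical sample standing in for the real training file.
sample = io.StringIO(
    "column_a,column_b\n"
    "cheap flights to Rome,travel\n"
    "best pizza in Rome,food\n"
    "cheap hotel deals,travel\n"
)
reader = csv.DictReader(sample)
stats = dataset_stats((r["column_a"], r["column_b"]) for r in reader)
print(stats)
```

Numbers like these would tell me whether there is enough data per column B value and how sparse the vocabulary is.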
For this kind of problem, the required steps are:
1) Data preprocessing/normalization
2) Feature selection
3) Algorithm evaluation
I have gone through this pipeline for a number of problems across different domains and languages.
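As a rough illustration of step 1, and of the n-gram representation that feature selection would then work on, here is a small Python sketch. The function names `normalize` and `ngram_counts` and the specific normalization rules are my own assumptions for the example; the real rules would depend on your data. Feature selection and algorithm evaluation are not shown.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def ngram_counts(text, max_n=2):
    """Count all word n-grams of length 1..max_n in the normalized text."""
    tokens = normalize(text).split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical phrase, not taken from the project data.
features = ngram_counts("Cheap flights, CHEAP hotels!")
print(features)
```

Each phrase becomes a sparse bag of unigram and bigram counts, which is the kind of input the algorithms mentioned above are typically trained on.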
The final product (if Python is used) would be a Python script that takes as input:
i) the trained model, provided by me (accompanied by the code used to create it)
ii) a CSV file (column A text)
and produces as output:
i) a CSV file (column A - column B)
If necessary, the output could also include confidence scores (but this would probably be a follow-up project).
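To give an idea of the shape of that script, here is a minimal sketch. Everything model-related is an assumption: `KeywordModel` is a hypothetical stand-in for the trained model, and I assume only that the real model exposes a `predict(texts)` method returning one column B value per input phrase. I also assume the input file has no header row and a single column.

```python
import csv
import io


class KeywordModel:
    """Hypothetical stand-in for the trained model; only predict() is assumed."""

    def predict(self, texts):
        return ["travel" if "flight" in t.lower() else "other" for t in texts]


def classify_csv(model, in_file, out_file):
    """Read column A from in_file; write column A and predicted column B."""
    reader = csv.reader(in_file)
    texts = [row[0] for row in reader if row]  # skip empty lines
    labels = model.predict(texts)
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerows(zip(texts, labels))


# In-memory files keep the example self-contained; real use would open files on disk.
source = io.StringIO("Cheap flights to Rome\nBest pizza in town\n")
result = io.StringIO()
classify_csv(KeywordModel(), source, result)
print(result.getvalue())
```

In the delivered version the model would be loaded from the file I provide instead of being defined inline, and the input and output would be regular CSV paths given on the command line.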
Feel free to contact me with any questions or if you need clarification.
Best regards,
giannis