I need a Java command-line program that performs binary text classification using Naive Bayes (with TF-IDF weighting, if possible).
This is very similar to a spam filter (i.e., ham or spam). Here are several examples:
[login to view URL]
[login to view URL]
The program should use Java 8 and an open-source machine learning library such as Apache Spark MLlib, Apache Mahout, DeepLearning4J, [login to view URL], JATECS, or another of your choice (do not use WEKA). If you know of another library that you want to use, let me know.
The program should have the following functionality:
- Reads in a CSV file (the first column is the binary outcome and the second column is the text)
- Preprocesses the data (e.g., stemming, stop-word removal)
- Trains on 70% of the data and tests on the remaining 30%
- Outputs a confusion matrix and an accuracy calculation. If a sentence is provided, it also prints the predicted binary outcome for that sentence.
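The steps listed above could be sketched roughly as follows. This is only a minimal illustration in plain Java 8 and does not use any of the libraries named above: it hand-rolls a multinomial Naive Bayes with add-one smoothing. The class name `NaiveBayesSketch`, its method names, and the tiny stop-word list are hypothetical, and the sketch assumes that labels are `0` or `1`, that the CSV has no header row, and that the text is everything after the first comma. Stemming and TF-IDF are left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;

/** Hypothetical sketch: two-class multinomial Naive Bayes (labels 0 and 1). */
class NaiveBayesSketch {
    // Tiny illustrative stop-word list (an assumption); a real list would be much longer.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "to", "and", "of"));

    final Map<String, int[]> wordCounts = new HashMap<>(); // token -> count per class
    final int[] docCounts = new int[2];                    // documents per class
    final int[] tokenTotals = new int[2];                  // tokens per class

    /** Splits "label,text" on the first comma only, so commas in the text survive. */
    static String[] parseLine(String line) {
        String[] parts = line.split(",", 2);
        if (parts.length < 2) throw new IllegalArgumentException("Expected label,text: " + line);
        return new String[] {parts[0].trim(), parts[1]};
    }

    /** Reads every non-blank line of the file as one label/text row (assumes no header). */
    static List<String[]> loadCsv(String path) throws IOException {
        List<String[]> rows = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8)) {
            if (!line.trim().isEmpty()) rows.add(parseLine(line));
        }
        return rows;
    }

    /** Lowercases, splits on non-letters, and drops stop words (no stemming). */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP_WORDS.contains(t)) tokens.add(t);
        }
        return tokens;
    }

    void train(int label, String text) {
        docCounts[label]++;
        for (String t : tokenize(text)) {
            wordCounts.computeIfAbsent(t, k -> new int[2])[label]++;
            tokenTotals[label]++;
        }
    }

    int predict(String text) {
        List<String> tokens = tokenize(text);
        int vocab = wordCounts.size();
        int totalDocs = docCounts[0] + docCounts[1];
        double[] score = new double[2];
        for (int c = 0; c < 2; c++) {
            // Smoothed log prior, so a class with no training rows does not yield log(0).
            score[c] = Math.log((docCounts[c] + 1.0) / (totalDocs + 2.0));
            for (String t : tokens) {
                int[] counts = wordCounts.get(t);
                if (counts == null) continue; // tokens never seen in training are ignored
                score[c] += Math.log((counts[c] + 1.0) / (tokenTotals[c] + vocab));
            }
        }
        return score[1] > score[0] ? 1 : 0;
    }

    /** Shuffles, trains on the first 70%, and returns matrix[actual][predicted] for the rest. */
    static int[][] evaluate(List<String[]> rows, NaiveBayesSketch model, long seed) {
        List<String[]> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * 0.7);
        for (String[] r : shuffled.subList(0, cut)) model.train(Integer.parseInt(r[0]), r[1]);
        int[][] matrix = new int[2][2];
        for (String[] r : shuffled.subList(cut, shuffled.size())) {
            matrix[Integer.parseInt(r[0])][model.predict(r[1])]++;
        }
        return matrix;
    }

    /** Accuracy = correct predictions (the diagonal) divided by all test rows. */
    static double accuracy(int[][] m) {
        int total = m[0][0] + m[0][1] + m[1][0] + m[1][1];
        return total == 0 ? 0.0 : (m[0][0] + m[1][1]) / (double) total;
    }
}
```

In this sketch, `evaluate(loadCsv(path), new NaiveBayesSketch(), seed)` returns the 2x2 confusion matrix, `accuracy` reduces it to a single number, and the trained model's `predict` can then be applied to an optional sentence. A library-based solution would replace `train` and `predict` with the chosen library's own classifier.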
The Java program should accept the following parameters:
java -jar [login to view URL] [login to view URL] "sentence"
Note: the "sentence" parameter is optional. When it is supplied, the program should also print the predicted binary outcome for that sentence.
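One way the argument handling could look is sketched below, assuming the first argument is the path to the CSV data file and the second, if present, is the sentence to classify. The class name `CliArgsSketch`, its fields, and the usage text are hypothetical; the actual jar and file names are not specified here, so placeholders are used.

```java
import java.util.Optional;

/** Hypothetical sketch of parsing: <csv-file> ["sentence"]. */
class CliArgsSketch {
    final String dataFile;           // required: path to the CSV data file (assumed first)
    final Optional<String> sentence; // optional: sentence to classify after training

    private CliArgsSketch(String dataFile, Optional<String> sentence) {
        this.dataFile = dataFile;
        this.sentence = sentence;
    }

    /** Accepts one or two arguments; anything else is reported as a usage error. */
    static CliArgsSketch parse(String[] args) {
        if (args.length < 1 || args.length > 2) {
            throw new IllegalArgumentException(
                    "Usage: java -jar <jar-file> <csv-file> [\"sentence\"]");
        }
        Optional<String> sentence =
                args.length == 2 ? Optional.of(args[1]) : Optional.empty();
        return new CliArgsSketch(args[0], sentence);
    }
}
```

A `main` method would call `CliArgsSketch.parse(args)`, always print the confusion matrix and accuracy, and print a prediction only when `sentence.isPresent()`. Quoting the sentence on the command line keeps it as a single argument.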
I have provided 3 sample data files here: [login to view URL]
I would prefer someone who has done text classification before. If you can suggest other techniques, let me know.
Deliverables include the following:
- Source code with documentation
- Jar file
Sir, I have gone through your project description and would like to offer my services for the Naive Bayes classification work.
I have worked in this field for more than 7 years.
Education: Masters in Machine Learning.
Hi,
I would love to do text classification using Naive Bayes. I am a professional Java developer and also have a team of Java developers. Please come to private chat, as I have some questions about your project.
Let's connect