As I am not able to continue work on this project at this time, I am releasing the source code into the public domain. Please note that it includes some source that I did not write: for convenience, I occasionally copied code from websites. Where I was able to identify such instances, I have noted them and, where possible, the source of the original code. However, I cannot promise that this information is accurate, or that I have identified all such code. (If you are the original author of some piece of code in my program and do not wish me to redistribute it, please let me know and I will remove it immediately.) For anything that I did not label otherwise, it is my belief that I wrote it, and anyone is free to use the code I wrote however they please.
Please note also that the source code is incomplete: I have not yet had time to review the learning code, so it is not included. What is included is just the code to download the opinions and extract the citations. The scraping code is also out of date; for some circuits it may need to be modified or reworked before it can resume retrieving opinions.
This code requires the FontBox and PDFBox libraries, as well as a Java-MySQL connector.
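To give a rough idea of what the citation-extraction step involves, here is a minimal sketch in Java. It is not taken from the released source: the class name CitationSketch, the method extract, and the short list of reporter abbreviations are all assumptions made for illustration, and real opinions use many more reporters and citation forms than this pattern covers. It works on plain text only, so it needs nothing beyond the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of reporter-citation matching; not the project's own code.
class CitationSketch {
    // Groups: (1) volume, (2) reporter abbreviation, (3) first page.
    // The reporter alternatives are a small assumed sample, longest forms first.
    private static final Pattern CITATION = Pattern.compile(
        "\\b(\\d{1,4})\\s+"
        + "(U\\.S\\.|S\\.\\s?Ct\\.|F\\.\\s?Supp\\.(?:\\s?[23]d)?|F\\.(?:[23]d)?)"
        + "\\s+(\\d{1,5})\\b");

    // Returns every matched citation in order of appearance,
    // with internal whitespace normalized to single spaces.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = CITATION.matcher(text);
        while (m.find()) {
            String reporter = m.group(2).replaceAll("\\s+", " ");
            found.add(m.group(1) + " " + reporter + " " + m.group(3));
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "See Roe v. Wade, 410 U.S. 113 (1973); accord 123 F.3d 456.";
        for (String citation : extract(sample)) {
            System.out.println(citation);
        }
    }
}
```

A pattern like this only finds the volume, reporter, and page; resolving each match to a specific opinion in the database would be a separate step.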
You can also download the data running the site so far here. To save space, the only part omitted is the text field of the opinions table. To see what the fields mean, please see the source code.