Error Checking and Edge Cases
What if there is no trace of a Middlebury ID anywhere in an input image?
We have built in handling for these situations.
​
For example, if pytesseract cannot find an 8-digit ID number at any of the thresholds, the function that looks for it returns both of its values blank. When the code after that first function is passed an empty value, it knows immediately that the input was invalid, and we return the string: "No text was found in this input. Please take another picture."
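As a rough illustration of that check, here is a minimal sketch. It assumes the OCR text from each threshold has already been collected into a list of strings; the names `find_id_and_text` and `handle_input` are hypothetical and are not taken from our actual code.

```python
import re

NO_TEXT_MESSAGE = "No text was found in this input. Please take another picture."


def find_id_and_text(ocr_texts):
    """Hypothetical helper: scan the OCR output of each threshold for an 8-digit ID.

    Returns (id_number, source_text), or two blank strings if nothing matches.
    """
    for text in ocr_texts:
        # Exactly 8 digits, not part of a longer run of digits.
        match = re.search(r"(?<!\d)\d{8}(?!\d)", text)
        if match:
            return match.group(0), text
    return "", ""


def handle_input(ocr_texts):
    """Hypothetical caller: return the ID, or the error string if none was found."""
    id_number, _source_text = find_id_and_text(ocr_texts)
    if not id_number:
        # Blank values mean the input was invalid, so ask for a new picture.
        return NO_TEXT_MESSAGE
    return id_number
```

Calling `handle_input` with text that contains no 8-digit number gives back the error string, so the caller only ever has to check one return value.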
​
Additionally, if this string is searched in the directory, it returns no information on any Middlebury student, and the user is prompted again to take another picture of his or her student ID.
​
We also added a section to our code that removes words that may skew our results, like "STUDENT", "Middlebury", and "College". That way, we don't accidentally return a student's name mixed in with another word on the ID. We also make sure to remove any numbers or stray punctuation that could possibly mess up a search on a student's name. Even the best OCR engines are not 100% accurate; in our results, pytesseract would often return an "I" as a bar, or "|", due to the font type or input lighting. There's no perfect way to correct this using pytesseract alone, so we figured that removing all irrelevant punctuation, like slashes and parentheses, would reduce how often we run into those results.
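The cleanup described above could look something like the sketch below. The function name `clean_name`, the exact word list, and the choice to keep apostrophes and hyphens are assumptions made for illustration, not a copy of our code.

```python
import re

# Words printed on the ID that are not part of a student's name (assumed list).
STOP_WORDS = {"student", "middlebury", "college"}


def clean_name(raw_text):
    """Hypothetical helper: strip digits, punctuation, and ID boilerplate words."""
    # Replace everything except letters, whitespace, apostrophes, and hyphens
    # with a space. This drops digits, bars, slashes, and parentheses.
    letters_only = re.sub(r"[^A-Za-z\s'-]", " ", raw_text)
    # Drop the boilerplate words, ignoring case.
    words = [w for w in letters_only.split() if w.lower() not in STOP_WORDS]
    return " ".join(words)
```

Note that this only removes a misread "|"; it does not restore the "I" it stood for, so a name containing that letter may still come back incomplete.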
​
Click here to return to our results page to see the rest of our application.
​
Click here to see how we scheduled our workflow and tasks over the semester.
