
ISSUES

One big issue we ran into early on was getting our input images and the pytesseract OCR to work together so that it returned legitimate results rather than blank strings. When devising how to handle overexposed and underexposed images, we first tried many different filtering techniques. In our original test file, for example, we applied Gaussian filters to certain thresholded images, but the OCR was unresponsive to them because they were usually too blurry to read (see fig. 1). In other cases we highlighted the edges in the image before running it through pytesseract, but the edge-highlighted images had too many dark spots or holes in the text we wanted to read (see fig. 2). We also tried changing the contrast, both manually with an Apple Photos slider and automatically with Pillow, a Python library for editing images, before sending the picture to the OCR, but changing the contrast alone never gave consistent results. We eventually figured out that applying thresholds and then median filters to those images worked. These issues did not affect our timeline for developing our Python script: we knew from the start that creating it would require a lot of trial-and-error experimentation with how pytesseract behaves, and we set aside that time in our schedule accordingly.
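
A minimal sketch of the threshold-plus-median-filter pipeline that ended up working for us (the threshold value 128 and the filter size 3 here are illustrative, not the exact values from our script):

```python
import pytesseract
from PIL import Image, ImageFilter

def ocr_id_card(path, threshold=128):
    """Binarize an ID photo, smooth out speckle noise, then run OCR."""
    img = Image.open(path).convert("L")                      # grayscale
    img = img.point(lambda p: 255 if p > threshold else 0)   # hard threshold
    img = img.filter(ImageFilter.MedianFilter(size=3))       # median filter removes salt-and-pepper noise
    return pytesseract.image_to_string(img)
```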

 

However, one image-processing issue we were never able to solve completely was getting consistent OCR output from damaged IDs. As you can see in figure 3, Reid's ID has suffered severe damage, and his information is only readable by human interpretation. With more time to develop, figuring out a way to fill in the holes (literally) of damaged cards so that our scanner could still return directory information would be something to look at.
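
One direction would be a simple morphological pass that fills small bright holes inside the dark text strokes before running OCR. This is only a sketch of the idea, not something we implemented, and badly damaged cards like the one in figure 3 would likely need true inpainting:

```python
from PIL import Image, ImageFilter

def fill_small_holes(img, size=3):
    """Grayscale 'closing' with respect to dark text: MinFilter grows the
    dark strokes over small bright holes, MaxFilter restores stroke width."""
    img = img.convert("L")
    img = img.filter(ImageFilter.MinFilter(size))
    return img.filter(ImageFilter.MaxFilter(size))
```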


On the front end and back end of our application, the main issue we came across was integrating npm libraries into React Native; because React Native is a relatively new framework, we often ran into compatibility issues. Another challenge from a development standpoint was working across two different languages, JavaScript and Python. The context switching was a real cost, but we needed a backend language in which we could do the image processing with libraries we were familiar with.
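
To make the split concrete, the backend reduces to a Flask route that accepts an image upload from the React Native app and returns the OCR text as JSON. This is a hedged sketch, with a hypothetical route name and form field, not our exact server code:

```python
import pytesseract
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)

@app.route("/scan", methods=["POST"])       # hypothetical route name
def scan():
    # The React Native app posts the photo as multipart form data
    img = Image.open(request.files["photo"].stream).convert("L")
    return jsonify({"text": pytesseract.image_to_string(img)})

if __name__ == "__main__":
    app.run()
```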



fig. 1: Gaussian filtering on a student ID


fig. 2: edge highlighting on a student ID

FUTURE WORK


fig. 3: a case of a damaged ID

We found a React Native library called get-pixels (https://www.npmjs.com/package/get-pixels) that handles image processing within JavaScript. Using this package would let us process user uploads inside our React Native code before sending the image to our Flask server, which would allow us to pre-process images and reduce the runtime of our Python script. In particular, we could check the exposure level of an image before sending it to Flask and, if the image is over- or underexposed, throw an error to the user in the UI and have them take a new picture. This would eliminate some of the edge cases that currently have to be handled by the image-processing Python script on the Flask server.
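
The exposure check itself is just a mean-brightness heuristic. Sketched here in Python for consistency with our other examples (the cutoffs 40 and 215 are illustrative placeholders), it would port directly to JavaScript using the pixel array that get-pixels returns:

```python
from PIL import Image, ImageStat

def exposure_ok(path, low=40, high=215):
    """Reject images whose mean brightness suggests under- or overexposure."""
    mean = ImageStat.Stat(Image.open(path).convert("L")).mean[0]
    return low <= mean <= high   # False -> ask the user to retake the photo
```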

We would also have liked to identify more edge cases, both related to image-processing techniques and not. For example, our ID scanner recognizes a Middlebury ID through its text format of first name, last name, and an eight-digit college ID number, but it would not recognize a picture of another college's ID. We would want to handle that case and return a special error message that the given ID is not registered to Middlebury's directory, even before searching the directory for a name; a format check like the sketch below could do this. Another case we would want to work out is when two people in the Middlebury directory have the same listed first and last name. We don't have a method that gives consistent results there, but then again, the only instance of this we are aware of is a student and a professor both named "Will Nash."
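
Validating the text format could be as simple as a regular-expression check on the OCR output before any directory lookup. A rough sketch, where the exact pattern is an assumption about how names and the eight-digit ID appear in our OCR text:

```python
import re

# Hypothetical format check: capitalized name words plus an 8-digit ID number
NAME_RE = re.compile(r"^[A-Z][A-Za-z'-]+$")
ID_RE = re.compile(r"\b\d{8}\b")

def looks_like_midd_id(ocr_text):
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    has_name = sum(1 for ln in lines if NAME_RE.match(ln)) >= 2
    # False -> "this ID is not registered to Middlebury's directory"
    return has_name and ID_RE.search(ocr_text) is not None
```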

Lastly, given more time, an area for improvement in our application would be how the code itself runs and how we could increase its efficiency and robustness. One edit suggested by Prof. Vacarri was to compile the OCR output from all 18 thresholds and then vote on which result was clearest. This would increase the robustness of our code, since no single badly chosen threshold could decide the output of the ID scanning process.
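
A sketch of that voting idea, building on the threshold-and-median-filter pipeline above (spacing the 18 thresholds evenly and voting by the most common non-empty OCR string are our interpretation of the suggestion):

```python
from collections import Counter

import pytesseract
from PIL import Image, ImageFilter

def ocr_with_voting(path, n_thresholds=18):
    """Run OCR at several binarization thresholds and keep the
    result that the most thresholds agree on."""
    gray = Image.open(path).convert("L")
    votes = Counter()
    for i in range(1, n_thresholds + 1):
        t = i * 255 // (n_thresholds + 1)            # evenly spaced cutoffs
        binary = gray.point(lambda p, t=t: 255 if p > t else 0)
        binary = binary.filter(ImageFilter.MedianFilter(size=3))
        text = pytesseract.image_to_string(binary).strip()
        if text:
            votes[text] += 1
    return votes.most_common(1)[0][0] if votes else ""
```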

