Data science on the iPad with RethinkDB and Pythonista
Pythonista is an iOS application that brings Python development to Apple’s mobile devices. It includes a built-in Python runtime, an interactive REPL console, and a text editor with features like syntax highlighting and autocompletion. It also comes with its own canvas system and user interface toolkit, offering developers a way to build games and simple graphical applications.
In addition to the standard Python libraries, Pythonista bundles some useful extras that extend its functionality. For example, it comes with NumPy and matplotlib, which support scientific computing and advanced graphing. It also comes with a set of libraries that expose native platform and device capabilities, including geolocation and the system camera.
Used to its full potential, Pythonista is a surprisingly capable environment for mobile data science. I can use Pythonista on my iPad to crunch numbers at the coffee shop or on my living room couch. I couldn’t resist the temptation to add RethinkDB to the mix, giving me backend persistence and expressive queries.
Install the RethinkDB driver with Pipsta
With a little bit of tinkering, I figured out how to get RethinkDB’s Python client driver into the Pythonista environment. Pythonista has its own internal site-packages directory that you can use to store reusable libraries. Of course, adding an entire framework to the built-in site-packages directory by hand would prove prohibitively time-consuming. Fortunately, a third-party script called pipsta offers a lightweight pip implementation for Pythonista. You can use the script to install libraries, including RethinkDB’s Python driver, from the Python Package Index.
I copied the script from GitHub and pasted it into a new file in Pythonista’s site-packages directory. After adding the script, I typed the following lines in the Pythonista REPL:
>>> import pipsta
>>> pipsta.pypi_install("rethinkdb")
The pipsta script automatically downloaded the RethinkDB package and extracted the module into a pypi-modules subdirectory inside the Pythonista directory structure. After installing the module, I decided to move it directly to the site-packages directory so that I could import it in the REPL or any Pythonista script without altering the import path.
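Moving the module is not the only option. As a minimal sketch of the alternative, a script could extend the import path before importing the driver. The `~/Documents/pypi-modules` location and the `add_module_dir` helper are assumptions for illustration; check where pipsta actually extracted the package on your device.

```python
import os
import sys


def add_module_dir(path):
    """Append path to sys.path once so packages stored there become importable."""
    path = os.path.abspath(os.path.expanduser(path))
    if path not in sys.path:
        sys.path.append(path)
    return path


# Assumed location of the pipsta download directory (hypothetical path);
# adjust it to match the directory layout on your own device.
add_module_dir("~/Documents/pypi-modules")
```

The trade-off is that every script needs this preamble, which is why moving the package into site-packages is the more convenient choice.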
Use RethinkDB in Pythonista
You can access RethinkDB in Pythonista on iOS in exactly the same manner that you would access RethinkDB in any other Python environment. I have a RethinkDB server running on my local network, so all I have to do is specify its internal network IP when I set up a connection in Pythonista:
import rethinkdb as r
conn = r.connect("192.168.0.10", 28015)
print r.db("rethinkdb").table_list().run(conn)
conn.close()
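Because the server address is only valid on my local network, a connection attempt from elsewhere will fail. The following is a small sketch, using only the standard library, of a reachability check that a script could run before calling `r.connect`. The `can_reach` helper and its default timeout are hypothetical names and values chosen for illustration.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, or timed out: treat all of these as "not reachable".
        return False
    sock.close()
    return True
```

Calling `can_reach("192.168.0.10", 28015)` before connecting lets a script print a friendly message instead of raising a driver error when the iPad is on a different network.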
Now I’m going to try a more substantive example, one that uses the same USGS earthquake data that I used last year to demonstrate RethinkDB’s geospatial features. I’m going to create a table to store the earthquakes, set up a geospatial index, and then use the r.http command to fetch the data. Next, I’ll use a simple merge transformation to turn the epicenter coordinates for each earthquake into an actual r.point object:
import rethinkdb as r
url = "earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_month.geojson"
conn = r.connect("192.168.0.10", 28015)
r.db("test").table_create("quakes").run(conn)
r.table("quakes").index_create("geometry", geo=True).run(conn)
r.table("quakes").insert(
    r.http(url)["features"].merge(lambda quake: {"geometry": r.point(
        quake["geometry"]["coordinates"][0],
        quake["geometry"]["coordinates"][1])})).run(conn)
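To make the merge step easier to follow, here is a plain-Python sketch of the coordinate extraction it performs on each feature. The `to_lon_lat` helper and the sample record are made up for illustration; the real transformation runs on the server inside the ReQL query.

```python
def to_lon_lat(feature):
    """Return (longitude, latitude) from a GeoJSON Point feature."""
    coords = feature["geometry"]["coordinates"]
    # GeoJSON orders positions as [longitude, latitude, optional third value],
    # which matches the argument order of r.point(longitude, latitude).
    return coords[0], coords[1]


# Made-up sample record for illustration, not real USGS data.
sample = {
    "properties": {"title": "M 3.1 - example quake", "mag": 3.1},
    "geometry": {"type": "Point", "coordinates": [-122.8, 38.8, 2.5]},
}
print(to_lon_lat(sample))
```

Only the first two elements are used, so any third coordinate in the feed is dropped from the indexed geometry.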
Now that I have the earthquake records in a table, I’m going to use Pythonista’s geolocation module with the r.get_intersecting command to find all of the earthquakes that occurred within 100 miles of the user:
import rethinkdb as r
import location
conn = r.connect("192.168.0.10", 28015)
location.start_updates()
where = location.get_location()
nearby_quakes = r.table("quakes").get_intersecting(
    r.circle([where["longitude"], where["latitude"]], 100, unit="mi"),
    index="geometry").run(conn)
for quake in nearby_quakes:
    print quake["properties"]["title"]
location.stop_updates()
Pythonista’s built-in location module provides an abstraction layer over the platform’s built-in geolocation APIs, making it easy to access the user’s GPS coordinates. To activate location tracking, call location.start_updates(). After tracking begins, call location.get_location() to get the user’s coordinates.
In the example above, I pass the user’s coordinates into the get_intersecting command and then display the output in the console. Any content that you print to stdout in Pythonista will display in the REPL console. You can switch there now to see the output of the script.
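If you want to sanity-check the results on the device, a great-circle distance calculation is enough for a rough comparison. The sketch below uses the haversine formula with a mean Earth radius, which is a spherical approximation and an assumption on my part; RethinkDB's own geometry calculations may use a different Earth model, so small differences near the 100-mile boundary are expected. The helper names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; a spherical approximation


def haversine_miles(lon1, lat1, lon2, lat2):
    """Great-circle distance in miles between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def is_within(lon1, lat1, lon2, lat2, radius_miles=100):
    """Return True if the two points are no more than radius_miles apart."""
    return haversine_miles(lon1, lat1, lon2, lat2) <= radius_miles
```

Passing the device's longitude and latitude together with an epicenter's coordinates gives a quick client-side check of whether a returned earthquake is plausibly inside the search radius.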
Generate graphs with RethinkDB and matplotlib
As I previously mentioned, Pythonista includes matplotlib and NumPy right out of the box. I use RethinkDB with matplotlib to generate quick data visualizations.
In the following example, I use a complex ReQL query to compute the number of earthquakes that occur on each day of the month. I pass the output to matplotlib, which generates a bar graph:
from matplotlib import pyplot
import rethinkdb as r
conn = r.connect("192.168.0.10", 28015)
# The quakes table was created in the default "test" database above.
quakes = r.db("test").table("quakes") \
    .merge({"date": r.epoch_time(r.row["properties"]["time"] / 1000).date()}) \
    .filter(r.row["date"].month() == r.now().month()) \
    .group(r.row["date"].day()).count() \
    .ungroup().order_by(r.row["group"]) \
    .do([r.row["group"], r.row["reduction"]]).run(conn)
print quakes
pyplot.bar(quakes[0], quakes[1])
pyplot.show()
When I call the show method, Pythonista automatically displays the resulting image in its REPL console alongside the other application output. This makes it pretty easy to iterate on a script, experimenting with different chart styles and approaches to visualizing data.
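For readers less familiar with ReQL, the grouping stage of the query above can be approximated in plain Python. This is a sketch of the idea rather than what the server executes: it assumes epoch-millisecond timestamps like the `time` property in the feed, groups by UTC day of month, and omits the current-month filter. The `count_by_day` name and the sample timestamps are made up.

```python
import time
from collections import Counter


def count_by_day(times_ms):
    """Return [days, counts] for epoch-millisecond timestamps, grouped by UTC day of month."""
    counts = Counter(time.gmtime(ms / 1000.0).tm_mday for ms in times_ms)
    days = sorted(counts)
    return [days, [counts[day] for day in days]]


# Made-up timestamps: two on 1 Jan 2015 (UTC) and one on 2 Jan 2015 (UTC).
sample_times = [1420070400000, 1420074000000, 1420156800000]
print(count_by_day(sample_times))
```

The result has the same two-list shape that the script hands to `pyplot.bar`: the first list holds the days and the second holds the matching counts.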
Build interactive user interfaces in Pythonista
Pythonista provides a simple user interface toolkit that developers can use to build graphical frontends for their scripts. The toolkit includes a handful of user interface controls, a layout system, and support for triggering callbacks for user interface events.
Pythonista offers a built-in visual editor that you can use to create views and manage layouts. I used the visual editor to create a simple graphical frontend for the previous matplotlib example.
I made a form with two text inputs that lets the user specify a minimum and maximum earthquake magnitude. When the user hits a button, the application will perform the query, but filter for only the earthquakes that fall within the specified boundaries. It generates a bar graph, which it displays to users in an image viewer control in the user interface. This is what the user interface form looks like in Pythonista’s visual view designer:
I used the visual designer to set up an action for the “Find” button. When a user presses the button, the application will execute a function called find. The following is the source code for the application:
from matplotlib import pyplot
from cStringIO import StringIO
import rethinkdb as r
import ui
conn = r.connect("192.168.0.10", 28015)
def find_quakes(min_mag, max_mag):
    # Combine ReQL conditions with &, not Python's "and": the "and" keyword
    # cannot be overloaded, so it does not build a combined ReQL condition.
    return r.db("test").table("quakes") \
        .merge({"date": r.epoch_time(r.row["properties"]["time"] / 1000).date()}) \
        .filter((r.row["date"].month() == r.now().month())
            & (r.row["properties"]["mag"] >= min_mag)
            & (r.row["properties"]["mag"] <= max_mag)) \
        .group(r.row["date"].day()).count() \
        .ungroup().order_by(r.row["group"]) \
        .do([r.row["group"], r.row["reduction"]]).run(conn)
def find(sender):
    # Read the magnitude bounds from the two text fields.
    min_mag = float(view["text_min"].text)
    max_mag = float(view["text_max"].text)
    quakes = find_quakes(min_mag, max_mag)
    # Render the chart into an in-memory buffer instead of a file.
    sio = StringIO()
    pyplot.clf()
    pyplot.bar(quakes[0], quakes[1])
    pyplot.savefig(sio)
    view["image_result"].image = ui.Image.from_data(sio.getvalue())
view = ui.load_view()
view.present()
The ui.load_view method instantiates the view and all of the embedded controls. You can use Python’s subscript syntax to access a control by name. For example, the code above uses view["text_min"].text to extract the contents of the text box named text_min.
When the application activates the find function, it retrieves the minimum and maximum earthquake magnitudes so that it can incorporate those values into the ReQL query. When it uses matplotlib to produce the bar chart, it saves the resulting binary image content into a StringIO instance that I can pass to ui.Image.from_data to display the image in the user interface.
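The find function converts the text-field contents with float(), which raises a ValueError when a field is empty. As a hedged sketch of one way to make that step more forgiving, the helper below falls back to default bounds for blank fields and swaps the values if they were entered in reverse. The `parse_magnitude_range` name and the 0.0 to 10.0 defaults are assumptions for illustration, not part of the original application.

```python
def parse_magnitude_range(min_text, max_text, default=(0.0, 10.0)):
    """Convert two text-field strings into an ordered (low, high) float pair.

    Blank fields fall back to the matching default bound (assumed values).
    Raises ValueError if a non-blank field is not a number.
    """
    low = float(min_text) if min_text.strip() else default[0]
    high = float(max_text) if max_text.strip() else default[1]
    if low > high:
        # Tolerate bounds entered in the wrong order.
        low, high = high, low
    return low, high
```

Inside the button callback, the two float() calls could be replaced with a single call such as `parse_magnitude_range(view["text_min"].text, view["text_max"].text)`, wrapped in a try/except that reports invalid input to the user.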
Next steps
Although this earthquake example is somewhat contrived, it should give you a sense of what you can build with Pythonista. You can make user interfaces that let the user specify the input parameters for a query. You can take advantage of matplotlib or the UI toolkit to display rich query results. You can also take advantage of Pythonista’s REPL to perform interactive experiments with data and data visualization. It’s a great way to explore your RethinkDB data while you are on the go.
If you want to enable remote access so that you can use your home RethinkDB database from the coffee shop, be sure to take reasonable security precautions. I use a VPN to connect my mobile device to my home network so that I can securely access my database without exposing the port to the public internet.
Want to try using RethinkDB with Pythonista yourself? Install RethinkDB today and check out our 10-minute quickstart guide.