# Bark Bark is a standard for electrophysiology data. [![Build Status](https://travis-ci.org/kylerbrown/bark.svg?branch=master)](https://travis-ci.org/kylerbrown/bark) Version: 0.2 ## The Bark philosophy 1. **minimal specification and implementation** 2. **simple file formats** 3. **small, chainable utilities** By emphasizing filesystem directories, plain text files and a common binary array format, Bark makes it easy to use both large external projects and simple command-line utilities. Bark's [small specification](specification.md) and Python implementation are easy to use in custom tools. These tools can be chained together using GNU Make to build data pipelines. ## Why use Bark? Bark takes the architecture of ARF and replaces HDF5 with common data storage formats, the advantages of this approach are: - Use standard Unix tools to explore your data (cd, ls, grep, find, mv) - Build robust data processing pipelines with shell scripting or [make](http://kbroman.org/minimal_make/). - Edit metadata with a standard text editor - Leverage any third party tools that use Bark's common data formats. - Raw binary tools: [Aplot](https://github.com/melizalab/aplot), [Neuroscope](http://neurosuite.sourceforge.net/), [Plexon Offline Sorter](http://www.plexon.com/products/offline-sorter), [Wave_clus](https://github.com/csn-le/wave_clus), [spyking circus](https://spyking-circus.readthedocs.io), [phy](https://github.com/kwikteam/phy), [sox](http://sox.sourceforge.net/sox.html). - CSV tools: R, Pandas (Python), Excel, csvkit and Unix tools like sed and awk. - Include non-standard data such as images or video in Bark entries. ## The elements of Bark Bark trees are made from the following elements: - **SampledData** stored as raw binary arrays. Metadata is stored in another file with ".meta.yaml" appended to the dataset's filename. - **EventData** stored in CSV files. As above, metadata is stored in a "X.meta.yaml" file. - **Entries** (often trials) are directories containing Datasets that share a common time base. These directories contain a `meta.yaml` file and any number of Datasets. - **Root** is a directory containing a collection of related Entries. Often a Root is used to contain all data from an experimental session. This repository contains: - The specification for bark (in [specification.md](specification.md)) - A python interface for reading and writing Bark files - Tools to accomplish common Bark tasks ## Installation The python interface requires Python 3.5+. Installation with [Conda](http://conda.pydata.org/miniconda.html) is recommended. git clone https://github.com/kylerbrown/resin cd resin pip install . cd .. git clone https://github.com/margoliashlab/bark cd bark pip install -r requirements.txt pip install . # optional tests pytest -v You'll also probably want to install [Neuroscope](http://neurosuite.sourceforge.net/). [Sox](http://sox.sourceforge.net/sox.html) is also useful. ## Shell Commands Every command has help accessible with the flag `-h` (e.g. `bark-entry -h`). ### Transformations - `bark-entry` -- create entry directories for datasets - `bark-attribute` -- create or modify an attribute of a bark entry or dataset - `bark-column-attribute` -- create or modify an attribute of a bark dataset column - `bark-clean-orphan-metas` -- remove orphan `.meta.yaml` files without associated data files - `dat-select` -- extract a subset of channels from a sampled dataset - `dat-join` -- combine the channels of two or more sampled datasets - `dat-split` -- extract a subset of samples from a sampled dataset - `dat-cat` -- concatenate sampled datasets, adding more samples - `dat-filter` -- apply zero-phase Butterworth or Bessel filters to a sampled dataset - `dat-decimate` -- down-sample a sampled dataset by an integer factor, you want to low-pass filter your data first. - `dat-diff` -- subtract one sampled dataset channel from another - `dat-ref` -- for each channel: subtract the mean of all other channels, scaled by a coefficient such that the total power is minimized - `dat-artifact` -- removes sections of a sampled dataset that exceed a threshold - `dat-enrich` -- concatenates subsets of a sampled dataset based on events in an events dataset - `dat-spike-detect` -- detects spike events in the channels of a sampled dataset - `dat-envelope-classify` -- classifies acoustic events, such as stimuli, by amplitude envelope - `dat-segment` -- segments a sampled dataset based on a band of spectral power, as described in [Koumura & Okanoya](dx.doi.org/10.1371/journal.pone.0159188) There are many external tools for processing CSV files, including [pandas](http://pandas.pydata.org/) and [csvkit](https://csvkit.readthedocs.io). ### Visualizations - `bark-scope` -- opens a sampled data file in [neuroscope](http://neurosuite.sourceforge.net/). (Requires an installation of neuroscope) Note for MacOS users: run this command in the terminal: `$ ln -s /Applications/neuroscope.app/Contents/MacOS/neuroscope /usr/local/bin/neuroscope` - `bark-label-view` -- Annotate or review events in relation to a sampled dataset, such as birdsong syllable labels on a microphone recording. - `bark-psg-view` -- Annotate or review on mutiply channels of .dat files. ### Conversion - `bark-db` -- adds the metadata from a Bark tree to a database - `bark-convert-rhd` -- converts [Intan](http://intantech.com/) .rhd files to datasets in a Bark entry - `bark-convert-openephys` -- converts a folder of [Open-Ephys](http://www.open-ephys.org/) .kwd files to datasets in a Bark entry - `bark-convert-arf` -- converts an ARF file to entries in a Bark Root - `bark-convert-spyking` -- converts [Spyking Circus](https://spyking-circus.readthedocs.io/en/latest/) spike-sorted event data to a Bark event dataset - `csv-from-waveclus` -- converts a [wave_clus](https://github.com/csn-le/wave_clus) spike time file to a CSV - `csv-from-textgrid` -- converts a [praat](http://www.fon.hum.uva.nl/praat/) TextGrid file to a CSV - `csv-from-lbl` -- converts an [aplot](https://github.com/melizalab/aplot) [lbl](https://github.com/kylerbrown/lbl) file to a CSV - `csv-from-plexon-csv` -- converts a [Plexon OFS](http://www.plexon.com/products/offline-sorter) waveform CSV to a bark CSV - `dat-to-wave-clus` -- convert a sampled dataset to a [wave_clus](https://github.com/csn-le/wave_clus) compatible Matlab file - `dat-to-audio` -- convert a sampled dataset to an audio file. Uses [SOX](http://sox.sourceforge.net/) under the hood, and so it can convert to any file type SOX supports. ### Control Flow - `bark-for-each` -- apply a command to a list of Entries. ### bark-extra More tools with less generality can be found in the [bark-extra](https://github.com/gfetterman/bark-extra) repository. ## Python interface ```python import bark root = bark.read_root("black5") root.entries.keys() # dict_keys(['2016-01-18', '2016-01-19', '2016-01-17', '2016-01-20', '2016-01-21']) entry = root['2016-01-18'] entry.attrs # {'bird': 'black5', # 'experiment': 'hvc_syrinx_screwdrive', # 'experimenter': 'kjbrown', # 'timestamp': '2017-02-27T11:03:21.095541-06:00', # 'uuid': 'a53d24af-ac13-4eb3-b5f4-0600a14bb7b0'} entry.datasets.keys() # dict_keys(['enr_emg.dat', 'enr_mic.dat', 'enr_emg_times.csv', 'enr_hvc.dat', 'raw.label', 'enr_hvc_times.csv', 'enr.label']) hvc = entry['enr_hvc.dat'] hvc.data.shape # (7604129, 3) ``` The `Stream` object in the `bark.stream` module exposes a powerful data pipeline design system for sampled data. Example usage: ![Example usage](bark-stream-example.png) ## Pipelines with GNU Make Some links to get started with Make: + http://kbroman.org/minimal_make/ + https://bost.ocks.org/mike/make/ + https://swcarpentry.github.io/make-novice/ ## Related projects - NEO - NWB - NIX - neuroshare () is a set of routines for reading and writing data in various proprietary and open formats. ## Authors Dan Meliza created ARF. Bark was was written by Kyler Brown so he could finish his damn thesis in 2017. Graham Fetterman also made significant contributions.