API reference

Opening files

open(path_url_or_file_like, mode='r')

Open a BigWig or BigBed file for reading or writing.

Parameters:
  • path_url_or_file_like (str or file-like object) – A file path or the HTTP URL of a remote file, given as a string, or a Python file-like object with read and seek methods.

  • mode (Literal["r", "w"], optional [default: "r"]) – The mode to open the file in. "r" opens a bigWig/bigBed for reading and does not allow writing. "w" opens a bigWig/bigBed for writing and does not allow reading.

Returns:

The object for reading or writing the BigWig or BigBed file.

Return type:

BigWigWrite or BigBedWrite or BBIRead

Notes

For writing, only a file path is currently accepted.

If passing a file-like object, concurrent reading of different intervals is not supported and may result in incorrect behavior.
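
The constraints above can be checked before open() is called. The helper below is a hypothetical pre-flight check and is not part of the library; its name, return values, and error messages are assumptions made for illustration.

```python
import io


def check_open_args(source, mode="r"):
    """Validate arguments against the constraints documented for open().

    Hypothetical helper for illustration only; not part of the library.
    Returns "path" for strings and "file-like" for readable, seekable objects.
    """
    if mode not in ("r", "w"):
        raise ValueError(f"mode must be 'r' or 'w', got {mode!r}")

    # A string is treated as a file path or a URL.
    if isinstance(source, str):
        return "path"

    # Writing is documented as accepting only a file path.
    if mode == "w":
        raise TypeError("for writing, only a file path is accepted")

    # A file-like object needs both read and seek methods.
    has_read = callable(getattr(source, "read", None))
    has_seek = callable(getattr(source, "seek", None))
    if has_read and has_seek:
        return "file-like"

    raise TypeError("expected a string or an object with read and seek methods")


print(check_open_args("example.bigWig"))      # path
print(check_open_args(io.BytesIO(b""), "r"))  # file-like
```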

Reading

class BBIReader(rust_reader)

Interface for reading a BigWig or BigBed file.

Returned by open() in read mode. Use the methods below to query chromosomes, intervals, records, zoom levels, and summary statistics, or extract values as NumPy arrays. Supports the context-manager protocol.

average_over_bed(bed, names=None, stats=None)

Gets the average values from a bigWig over the entries of a bed file.

Parameters:
  • bed (str or Path) – The path to the BED file.

  • names (bool or int, optional) –

    If None, then no name is returned and the return value is only the statistics value (see the stats parameter).

    If True, then each return value will be a 2-length tuple of the value of column 4 and the statistics value.

    If False, then each return value will be a 2-length tuple of the interval in the format {chrom}:{start}-{end} and the statistics value.

    If 0, then each return value will match as if False was passed.

    If an integer of 1 or greater, then each return value will be a 2-length tuple of the value of the column at that index (1-based) and the statistics value.

  • stats (str or List[str], optional) –

    Calculate specific statistics for each bed entry.

    If not specified, mean will be returned.

    If "all" is specified, all summary statistics are returned in a named tuple.

    If a single statistic is provided as a string, that statistic is returned as a float or int depending on the statistic.

    If a list of statistics is provided, a tuple is returned containing those statistics, in order.

    Possible statistics are:
    • size: Size of bed entry (int)

    • bases: Bases covered by bigWig (int)

    • sum: Sum of values over all bases covered (float)

    • mean0: Average over bases with non-covered bases counting as zeroes (float)

    • mean or None: Average over just covered bases (float)

    • min: Minimum over all bases covered (float)

    • max: Maximum over all bases covered (float)

Return type:

Generator of float or tuple.

Notes

If no name field is specified, returns a generator of statistics (either floats or tuples, as specified by the stats field). If a name column is specified, returns a generator of 2-length tuples of the form ({name}, {average}). Importantly, if the statistics value is itself a tuple, then that tuple will be nested as the second value of the outer tuple.
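
The nesting described above can be awkward to consume directly. The sketch below flattens (name, statistics) pairs into dictionaries. The function name is hypothetical, and the sample list only stands in for the generator returned when a name column and a list of statistics are requested; its values are made up.

```python
def flatten_named_stats(results, stat_names):
    """Turn (name, stats_tuple) pairs into flat dictionaries.

    `results` is any iterable whose items look like
    (name, (stat_1, stat_2, ...)), with the statistics in the same
    order as `stat_names`.
    """
    for name, stats in results:
        row = {"name": name}
        row.update(zip(stat_names, stats))
        yield row


# Made-up sample data standing in for the generator described above.
sample = [("geneA", (1.5, 4.0)), ("geneB", (0.25, 1.0))]
rows = list(flatten_named_stats(sample, ["mean", "max"]))
print(rows[0])
```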

chroms(chrom=None)

Return the names of chromosomes in a BBI file and their lengths.

Parameters:

chrom (str or None) – The name of the chromosome to get the length of. If None, then a dictionary of all chromosome sizes will be returned. If the chromosome doesn’t exist, returns None.

Returns:

Chromosome length or a dictionary of chromosome lengths.

Return type:

int or Dict[str, int] or None
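
As a sketch of how the return values of chroms() can be used, the helper below fills in missing query bounds from the chromosome length. The helper name and the stub class are hypothetical; the stub only imitates the documented behaviour of chroms() so the example runs without a file.

```python
def resolve_range(reader, chrom, start=None, end=None):
    """Return (start, end) with missing bounds filled from the chromosome length."""
    length = reader.chroms(chrom)
    if length is None:
        # chroms() is documented to return None for unknown chromosomes.
        raise KeyError(f"chromosome not found: {chrom}")
    if start is None:
        start = 0
    if end is None:
        end = length
    return start, end


class StubReader:
    """Stand-in that imitates chroms(); for demonstration only."""

    def __init__(self, sizes):
        self._sizes = dict(sizes)

    def chroms(self, chrom=None):
        if chrom is None:
            return dict(self._sizes)
        return self._sizes.get(chrom)


reader = StubReader({"chr1": 1000, "chr2": 500})
print(resolve_range(reader, "chr1"))           # (0, 1000)
print(resolve_range(reader, "chr2", start=10)) # (10, 500)
```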

info()

Return a dict of information about the BBI file.

records(chrom, start=None, end=None)

Return the records of a given range on a chromosome.

The result is an iterator of tuples. For BigWigs, these tuples are in the format (start: int, end: int, value: float). For BigBeds, these tuples are in the format (start: int, end: int, …), where the “rest” fields are split by whitespace.

Parameters:
  • chrom (str) – Name of the chromosome.

  • start (int, optional) – Start of the query range. If not provided, it defaults to the beginning of the chromosome.

  • end (int, optional) – End of the query range. If not provided, it defaults to the length of the chromosome.

Returns:

An iterator of tuples in the format (start: int, end: int, value: float) for BigWigs, or (start: int, end: int, *rest) for BigBeds.

Return type:

Iterator[tuple[int, int, float] or tuple[int, int, ...]]

Notes

Missing values in BigWigs will result in non-contiguous records.
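
Because records need not be contiguous, it can help to list the uncovered stretches explicitly. The sketch below assumes the records arrive sorted by start position, which is an assumption of this example rather than something stated here; the function name is hypothetical and the sample records are made up.

```python
def find_gaps(records, start, end):
    """Yield (gap_start, gap_end) for stretches of [start, end) with no record.

    `records` is an iterable of (start, end, value) tuples, assumed to be
    sorted by start position.
    """
    cursor = start
    for rec_start, rec_end, _value in records:
        if rec_start > cursor:
            yield (cursor, min(rec_start, end))
        cursor = max(cursor, rec_end)
        if cursor >= end:
            break
    if cursor < end:
        yield (cursor, end)


# Made-up BigWig-style records with a hole between 10 and 20.
sample_records = [(0, 10, 1.0), (20, 30, 2.0)]
print(list(find_gaps(sample_records, 0, 40)))  # [(10, 20), (30, 40)]
```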

See also

zoom_records

Get the zoom records of a given range on a chromosome.

values

Get the values of a given range on a chromosome.

sql(parse=False)

Return the autoSql schema definition of this BBI file.

For BigBeds, this schema comes directly from the autoSql string stored in the file. For BigWigs, the schema generated describes a bedGraph file.

Parameters:

parse (bool, optional [default: False]) – If True, return the schema as a dictionary. If False, return the schema as a string. Default is False.

Returns:

schema – The autoSql schema of the BBI file. If parse is True, the schema is returned as a dictionary of the format:

{
    "name": <declared name>,
    "comment": <declaration comment>,
    "fields": [(<field name>, <field type>, <field comment>), ...],
}

Return type:

str or dict
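
The parsed form is convenient for listing fields. The sketch below formats a dictionary of the shape shown above; the function name is hypothetical and the example schema is made up for illustration rather than taken from a real file.

```python
def describe_schema(schema):
    """Render a parsed schema dictionary as indented text."""
    lines = [f"{schema['name']}: {schema['comment']}"]
    for field_name, field_type, field_comment in schema["fields"]:
        lines.append(f"  {field_name} ({field_type}) - {field_comment}")
    return "\n".join(lines)


# Illustrative schema in the documented dictionary format.
example_schema = {
    "name": "exampleTable",
    "comment": "An example declaration",
    "fields": [
        ("chrom", "string", "Chromosome name"),
        ("chromStart", "uint", "Start position"),
        ("chromEnd", "uint", "End position"),
    ],
}
print(describe_schema(example_schema))
```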

See also

is_bigwig

Check if the BBI file is a bigWig.

is_bigbed

Check if the BBI file is a bigBed.

info

Get information about the BBI file.

zooms

Get the zoom levels of the BBI file.

values(chrom, start, end, bins=None, summary='mean', exact=False, uncovered=None, oob=nan, fillna=None, arr=None)

Return the values of a given range on a chromosome as a numpy array.

For BigWigs, the returned values or summary statistics are derived from the unique signal values associated with each base.

For BigBeds, the returned values or summary statistics instead are derived from the number of BED intervals overlapping each base.

Parameters:
  • chrom (str) – Name of the chromosome.

  • start (int, optional) – Start of the range to get values for. If not provided, it defaults to the beginning of the chromosome.

  • end (int, optional) – End of the range to get values for. If not provided, it defaults to the length of the chromosome.

  • bins (int, optional) – If provided, the query interval will be divided into equally spaced bins and the values in each bin will be interpolated or summarized. If not provided, the values will be returned for each base.

  • summary (str, optional [default: "mean"]) – The summary statistic to use. One of mean, std, min, max, sum, sum_squares, bases_covered, bin_covered.

  • exact (bool, optional [default: False]) – If True and bins is specified, return exact summary statistic values instead of interpolating from the optimal zoom level. Default is False.

  • uncovered (float or None, optional [default: None]) – The value assigned to all uncovered bases. If None, uncovered bases are excluded from summary statistic calculations, and empty positions or bins will be returned as NaN (subject to fillna). To treat uncovered bases as having a value of zero in summary statistics (like UCSC’s mean0) set this parameter to 0.0. Empty positions or bins will also be returned as 0.0. Other finite values are also valid and will be used in the same way. This parameter is ignored in the cases of bases_covered and bin_covered summaries since they exclude uncovered bases by definition.

  • oob (float, optional [default: NaN]) – Fill-in value for out-of-bounds regions. Default is NaN.

  • fillna (float or None, optional [default: None]) – Post-rasterization fill applied to in-bounds positions or bins that are returned as NaN due to being empty. Default None leaves NaN values untouched.

  • arr (numpy.ndarray, optional) – If provided, the values will be written to this array or array view. The array must be of the correct size and type.

Returns:

The signal values of the bigwig or bigbed in the specified range.

Return type:

numpy.ndarray

Notes

A BigWig file encodes a step function, and the value at a base is given by the signal value of the unique interval that contains that base.

A BigBed file encodes a collection of (possibly overlapping) intervals which may or may not be associated with quantitative scores. The “value” at a given base used here summarizes the number of intervals overlapping that base, not any particular score.

If a number of bins is requested and exact is False, the summarized data is interpolated from the closest available zoom level. If you need accurate summary data and are okay with a small trade-off in speed, set exact to True.
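
To make the binning and uncovered semantics concrete, here is a pure-Python sketch of an exact per-bin mean over a step function. It is an illustration of the documented behaviour under simplifying assumptions (integer bin edges, bins no larger than the range, no out-of-bounds handling), not the library's implementation; the function name is hypothetical.

```python
import math


def binned_mean(intervals, start, end, bins, uncovered=None):
    """Exact mean per bin for (start, end, value) intervals over [start, end).

    With uncovered=None, uncovered bases are ignored and empty bins are NaN.
    With a number, uncovered bases take that value before averaging.
    """
    span = end - start
    # Per-base values; None marks a base not covered by any interval.
    per_base = [None] * span
    for iv_start, iv_end, value in intervals:
        for pos in range(max(iv_start, start), min(iv_end, end)):
            per_base[pos - start] = value

    result = []
    for i in range(bins):
        # Integer bin edges; assumes bins <= span.
        lo = (i * span) // bins
        hi = ((i + 1) * span) // bins
        chunk = per_base[lo:hi]
        if uncovered is None:
            used = [v for v in chunk if v is not None]
        else:
            used = [uncovered if v is None else v for v in chunk]
        result.append(sum(used) / len(used) if used else math.nan)
    return result


signal = [(0, 2, 1.0), (4, 6, 3.0)]
print(binned_mean(signal, 0, 8, bins=2))                 # [1.0, 3.0]
print(binned_mean(signal, 0, 8, bins=2, uncovered=0.0))  # [0.5, 1.5]
```

With uncovered=0.0 the result corresponds to the mean0 convention, since every base in the bin contributes to the denominator.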

See also

records

Get the records of a given range on a chromosome.

zoom_records

Get the zoom records of a given range on a chromosome.

zoom_records(reduction_level, chrom, start=None, end=None)

Return the zoom records of a given range on a chromosome for a given zoom level.

The result is an iterator of tuples. These tuples are in the format (start: int, end: int, summary: dict).

Parameters:
  • reduction_level (int) – The zoom level to use, as a resolution in bases. Use the zooms method to get a list of available zoom levels.

  • chrom (str) – Name of the chromosome.

  • start (int, optional) – Start of the query range. If not provided, it defaults to the beginning of the chromosome.

  • end (int, optional) – End of the query range. If not provided, it defaults to the length of the chromosome.

Returns:

An iterator of tuples in the format (start: int, end: int, summary: dict).

Return type:

Iterator[tuple[int, int, dict]]

Notes

The summary dictionary contains the following keys:

  • total_items: The number of items in the interval.

  • bases_covered: The number of bases covered by the interval.

  • min_val: The minimum value in the interval.

  • max_val: The maximum value in the interval.

  • sum: The sum of all values in the interval.

  • sum_squares: The sum of the squares of all values in the interval.

For BigWigs, the summary statistics are derived from the unique signal values associated with each base in the interval.

For BigBeds, the summary statistics instead are derived from the number of BED intervals overlapping each base in the interval.
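
The keys listed above are enough to derive a mean and a standard deviation for a zoom record. The sketch below uses the population form of the variance, which is a choice made for this example; other tools may use the sample form and give slightly different numbers. The function name is hypothetical and the sample dictionary is made up.

```python
import math


def summary_mean_std(summary):
    """Return (mean, std) over covered bases from a zoom summary dictionary."""
    n = summary["bases_covered"]
    if n == 0:
        return math.nan, math.nan
    mean = summary["sum"] / n
    # Population variance: E[x^2] - E[x]^2, clamped against rounding error.
    variance = max(summary["sum_squares"] / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)


# Made-up summary for four covered bases with values 1, 1, 3, 3.
example_summary = {
    "total_items": 2,
    "bases_covered": 4,
    "min_val": 1.0,
    "max_val": 3.0,
    "sum": 8.0,
    "sum_squares": 20.0,
}
print(summary_mean_std(example_summary))  # (2.0, 1.0)
```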

See also

zooms

Get a list of available zoom levels.

records

Get the records of a given range on a chromosome.

values

Get the values of a given range on a chromosome.

zooms()

Return a list of sizes in bases of the summary intervals used in each of the zoom levels (i.e. reduction levels) of the BBI file.
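
When calling zoom_records directly, a reduction level has to be chosen from this list. The sketch below shows one plausible selection rule: take the coarsest level whose interval size does not exceed the desired bin size. This rule and the function name are assumptions for illustration, not necessarily what the library does internally.

```python
def pick_zoom(zoom_sizes, bin_size):
    """Return the largest zoom interval size that is <= bin_size, or None."""
    candidates = [size for size in zoom_sizes if size <= bin_size]
    return max(candidates) if candidates else None


# Made-up list of zoom interval sizes, in bases.
print(pick_zoom([10, 40, 160, 640], 100))  # 40
print(pick_zoom([10, 40, 160, 640], 5))    # None
```

A result of None means every zoom level is coarser than the requested bin size, in which case the full-resolution records are the only exact source.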

Writing

class BigWigWriter

Interface for writing to a BigWig file.

close()

Close the file.

No other operations will be allowed after it is closed. This is done automatically after write is performed.

write(chroms, vals)

Write values to the BigWig file.

Parameters:
  • chroms (Dict[str, int]) – A dictionary with keys as chromosome names and values as their length.

  • vals (Iterable[tuple[str, int, int, float]]) – An iterable with values that represents each value to write in the format (chromosome, start, end, value).

Notes

The underlying file will be closed automatically when the function completes, and no other operations will be able to be performed.
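
Since the file is closed once write returns, it can be worth validating the input beforehand. The generator below is a hypothetical helper, not part of the library, that checks each (chromosome, start, end, value) tuple against the chroms dictionary before it is passed on.

```python
def checked_intervals(chroms, vals):
    """Yield (chrom, start, end, value) tuples after basic bounds checks."""
    for chrom, start, end, value in vals:
        if chrom not in chroms:
            raise ValueError(f"unknown chromosome: {chrom}")
        if not (0 <= start < end <= chroms[chrom]):
            raise ValueError(f"interval out of bounds: {chrom}:{start}-{end}")
        yield (chrom, start, end, float(value))


chrom_sizes = {"chr1": 1000}
intervals = [("chr1", 0, 100, 1), ("chr1", 100, 250, 2.5)]
print(list(checked_intervals(chrom_sizes, intervals)))

# Hypothetical usage, where `writer` is the object returned by open() in
# write mode:
#     writer.write(chrom_sizes, checked_intervals(chrom_sizes, intervals))
```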

class BigBedWriter

Interface for writing to a BigBed file.

close()

Close the file.

No other operations will be allowed after it is closed. This is done automatically after write is performed.

write(chroms, vals, autosql=None)

Write values to the BigBed file.

Parameters:
  • chroms (Dict[str, int]) – A dictionary with keys as chromosome names and values as their length.

  • vals (Iterable[tuple[str, int, int, str]]) – An iterable with values that represents each value to write in the format (chromosome, start, end, rest). The rest string should consist of tab-delimited fields.

Notes

The underlying file will be closed automatically when the function completes, and no other operations will be able to be performed.
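
The rest string has to be assembled from the extra BED columns. The helper below is a hypothetical convenience, not part of the library, that joins those columns with tabs to produce a tuple in the documented (chromosome, start, end, rest) format.

```python
def bed_entry(chrom, start, end, *fields):
    """Build a (chrom, start, end, rest) tuple with tab-delimited rest fields."""
    rest = "\t".join(str(field) for field in fields)
    return (chrom, start, end, rest)


entry = bed_entry("chr1", 0, 100, "peak1", 960, "+")
print(entry)  # ('chr1', 0, 100, 'peak1\t960\t+')
```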

Iterators

class BigWigIntervalIterator

An iterator for intervals in a bigWig.

It returns only values that exist in the bigWig, skipping any missing intervals.

class BigBedEntriesIterator

An iterator for the entries in a bigBed.

Summary statistics

class SummaryStatistics(size, bases, sum, mean0, mean, min, max)

bases

Alias for field number 1

max

Alias for field number 6

mean

Alias for field number 4

mean0

Alias for field number 3

min

Alias for field number 5

size

Alias for field number 0

sum

Alias for field number 2
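
The field order above means a result can be read by name or by position. The sketch below builds a stand-in named tuple with the same fields, so it runs without the library, and shows how mean and mean0 relate to sum, bases, and size. The numbers are made up.

```python
import math
from collections import namedtuple

# Stand-in with the same field order as the class documented above.
SummaryStatistics = namedtuple(
    "SummaryStatistics",
    ["size", "bases", "sum", "mean0", "mean", "min", "max"],
)

stats = SummaryStatistics(
    size=10, bases=4, sum=8.0, mean0=0.8, mean=2.0, min=1.0, max=3.0
)

# Access by name and by position refer to the same value.
print(stats.mean, stats[4])

# mean averages over covered bases only; mean0 counts uncovered bases as zero.
print(math.isclose(stats.mean, stats.sum / stats.bases))
print(math.isclose(stats.mean0, stats.sum / stats.size))
```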

Exceptions

exception BBIFileClosed

BBI File is closed.

exception BBIReadError

Error reading BBI file.