
proper/best type for storing latitude and longitude

In a system level programming language like C, C++ or D, what is the best type/encoding for storing latitude and longitude?

The options I see are:

IEEE-754 FP as degrees or radians

degrees or radians stored as a fixed point value in a 32 or 64 bit int

mapping of an integer range to the degree range: -> deg = (360/2^32)*val

degrees, minutes, seconds and fractional seconds stored as bit fields in an int

a struct of some kind.

The easy solution (FP) has the major downside that its resolution is highly non-uniform (somewhere in England it can measure in microns; over in Japan, it can't). It also has all the usual issues of FP comparison and whatnot. The other options require extra effort in different parts of the data's life cycle (generation, presentation, calculations, etc.).

One interesting option is a floating-precision type in which, as the latitude increases, the latitude gets more bits and the longitude gets fewer (since the meridians get closer together towards the poles).

Related questions that don't quite cover this:

What is the ideal data type to use when storing latitude / longitudes in a MySQL

Working with latitude/longitude values in Java

BTW: 32 bits gives you an E/W resolution at the equator of about 0.3 in. This is close to the scale that high grade GPS setups can work at (IIRC they can get down to about 0.5 in in some modes).
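
To make the integer-range mapping from the option list concrete, here is a minimal C sketch. The function names and the 40,075,017 m equatorial circumference are assumptions made for illustration; nothing here comes from a particular library.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed equatorial circumference in metres (illustrative value). */
#define EQUATOR_M 40075017.0

/* Map a longitude in [-180, 180) degrees onto the full uint32_t range.
   One step corresponds to 360 / 2^32 degrees; the value is truncated. */
static uint32_t lon_to_u32(double lon_deg)
{
    assert(lon_deg >= -180.0 && lon_deg < 180.0);
    double scaled = (lon_deg + 180.0) / 360.0 * 4294967296.0;
    /* Guard against rounding up to 2^32 just below +180 degrees. */
    return scaled >= 4294967295.0 ? UINT32_MAX : (uint32_t)scaled;
}

/* Inverse mapping: deg = (360 / 2^32) * val, shifted back to [-180, 180). */
static double u32_to_lon(uint32_t v)
{
    return (double)v * (360.0 / 4294967296.0) - 180.0;
}

/* East-west size of one step at the equator, in metres. */
static double u32_step_metres_at_equator(void)
{
    return EQUATOR_M / 4294967296.0;
}
```

Under these assumptions one step is roughly 9.3 mm at the equator, about a third of an inch, which is in line with the figure quoted above.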

OTOH if the 32 bits is uniformly distributed over the earth's surface, you can index squares of about 344m on a side, 5 Bytes give 21m, 6B->1.3m and 8B->5mm.

I don't have a specific use in mind right now but have worked with this kind of thing before and expect to again, at some point.

You've noted in a couple of answer comments and in this question the issue of resolution. What resolution do you require? It'd also be worth stating what operations you need to perform as well. If you're going to be doing Great Circle calcs then you'll need to convert to a double/float anyway.
Isn't a 64-bit double precision floating point value a better choice than the 32 bit int because it has a greater granularity? This is something you've commented on. It's also way easier to work with.

Community

The easiest way is just to store it as a float/double in degrees. Positive for N and E, negative for S and W. Just remember that minutes and seconds are out of 60 (so 31° 45' N is 31.75). It's easy to understand what the values are by looking at them and, where necessary, conversion to radians is trivial.
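
As a small illustration of that convention, the following C sketch converts degrees, minutes and seconds to signed decimal degrees and then to radians. The function names and the hemisphere-letter parameter are hypothetical, chosen only for this example.

```c
#include <assert.h>

#define PI 3.14159265358979323846

/* Convert degrees/minutes/seconds plus a hemisphere letter to signed
   decimal degrees: positive for N and E, negative for S and W. */
static double dms_to_degrees(int deg, int min, double sec, char hemisphere)
{
    assert(deg >= 0 && min >= 0 && min < 60 && sec >= 0.0 && sec < 60.0);
    double value = deg + min / 60.0 + sec / 3600.0;
    return (hemisphere == 'S' || hemisphere == 'W') ? -value : value;
}

/* The C math functions expect radians, so convert just before calling them. */
static double degrees_to_radians(double deg)
{
    return deg * (PI / 180.0);
}
```

For example, `dms_to_degrees(31, 45, 0.0, 'N')` gives 31.75, matching the figure above.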

Calculations on latitudes and longitudes such as the Great Circle distance between two coordinates rely heavily on trigonometric functions, which typically use doubles. Any other format is going to rely on another implementation of sine, cosine, atan2 and square root, at a minimum. Arbitrary precision numbers (eg BigDecimal in Java) won't work for this. Something like the int where 2^32 is spread uniformly is going to have similar issues.

The point of uniformity has come up in several comments. On this I shall simply note that the Earth, with respect to longitude, isn't uniform. One arc-second longitude at the Arctic Circle is a shorter distance than at the Equator. Double precision floats give sub-millimetre precision anywhere on Earth. Is this not sufficient? If not, why not?

It'd also be worth noting what you want to do with that information as the types of calculations you require will have an impact on what storage format you use.


a valid point, but not addressing what I was hoping to have addressed.
this answer is what I would have given also - think you need to clarify your question if it does not answer it.
I agree that this answers the question as asked. Most commercial systems that I've used employ FP degrees or radians internally. The lack of uniformity is a function of the definitions of latitude and longitude, not the way they're stored.
Do not forget that the math functions in C (and most languages) expect arguments in radians. Any system storing the value in degrees will need to convert to radians before using the math functions. That doesn't mean that storing in degrees is wrong; indeed, it is probably best. But be aware.
John D. Cook

Longitudes and latitudes are not generally known to any greater precision than a 32-bit float. So if you're concerned about storage space, you can use floats. But in general it's more convenient to work with numbers as doubles.

Radians are more convenient for theoretical math. (For example, the derivative of sine is cosine only when you use radians.) But degrees are typically more familiar and easier for people to interpret, so you might want to stick with degrees.


a valid point, but not addressing what I was hoping to have addressed. (I've edited the question to clarify)
What resolution do you require?
@cletus, Uniform resolution is more of interest than high resolution but 32 bits uniformly spread is within an order of magnitude of anything I see needing.
Bear in mind that when it comes to longitude at least, the earth isn't uniform. 1 second of longitude at the Arctic Circle is a different distance than at the Equator. Why do you want/need uniformity?
FP works because most of the time the resolution that a number is known to is proportional to its magnitude. For lat/long the zero point is totally arbitrary, so that proportionality is not there. It seems bad to use n bits and then in some places burn some of them on useless resolution.
Pykler

A decimal representation with a precision of 8 decimal places should be more than enough, according to the Wikipedia article on Decimal Degrees.

0 decimal places, 1.0 = 111 km
...
7 decimal places, 0.0000001 = 1.11 cm
8 decimal places, 0.00000001 = 1.11 mm
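
The table can be turned into a quick calculation of how many bits a scaled integer would need per coordinate. This is only a sketch: the 111,320 m per degree figure is an assumed equatorial value and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define M_PER_DEG 111320.0   /* assumed metres per degree at the equator */

/* Ground distance represented by one unit in the last decimal place. */
static double metres_per_step(unsigned places)
{
    assert(places <= 15);
    double step = M_PER_DEG;
    for (unsigned i = 0; i < places; ++i)
        step /= 10.0;
    return step;
}

/* Bits needed to hold 360 * 10^places, i.e. a full 360-degree range
   counted in steps of 10^-places degrees. */
static unsigned bits_for_places(unsigned places)
{
    assert(places <= 15);    /* keeps 360 * 10^places inside uint64_t */
    uint64_t steps = 360;
    for (unsigned i = 0; i < places; ++i)
        steps *= 10;
    unsigned bits = 0;
    while (steps > 0) {      /* bit length of the largest value */
        ++bits;
        steps >>= 1;
    }
    return bits;
}
```

With these definitions, 7 decimal places need 32 bits per coordinate and 8 decimal places need 36.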

log_2(360*10^8) ~= 35, therefore that takes ~70 bits; rounded to bytes: 9 bytes. Uniformly distributed, 9 bytes can resolve to regions of 0.1 mm^2.
err what? It takes 35 bits, and for 7 decimal places it takes just under 32, so an int is enough for storing 360 degrees with 7 decimal places if you then convert to a double when making calculations.
Roland Pihlakas

http://www.esri.com/news/arcuser/0400/wdside.html
At the equator, an arc-second of longitude approximately equals an arc-second of latitude, which is 1/60th of a nautical mile (or 101.27 feet or 30.87 meters).

32-bit float contains 23 explicit bits of data.
180 * 3600 requires log2(648000) = 19.305634287546711769425914064259 bits of data. Note that the sign bit is stored separately and therefore we need to account only for 180 degrees.
If you normalize the value 648000 to some power of 2 then the following calculation applies.
After subtracting from 23 the bits for log2(648000) we have remaining extra 3.694365712453288230574085935741 bits for sub-second data.
That is 2 ^ 3.694365712453288230574085935741 = 12.945382716049382716049382716053 parts per second.
Therefore a float data type can have 30.87 / 12.945382716049382716049382716053 ~= 2.38 meters precision at equator.

The above calculation is precise in case you normalize the 180 degrees value to some power of 2. Else assuming that sub-degree precision is stored after the decimal point, the floating point representation will physically use all 8 bits for the degrees part. That leaves 15 bits for the sub-degree precision. Then 15 - log2(3600) makes 3.1862188087829629413518832531256 bits for sub-second data, or 3.3914794921875 ~= 3.39 meters precision at equator. That is about a meter less than normalization would have provided.


V. Wheeler

Great question!

I know this question is 9 years old now, and I only know a part of the answer you were seeking, but I just came here having a similar question, and many things have changed since that question was asked, such as hardware and GPSes available. I work with this subject frequently in firmware dealing with different kinds of GPSes in different kinds of applications, and have lost count of the hours (and days) I have spent working out "the best design" for different applications that I have worked with or developed.

As always, different solutions are going to provide benefits and costs, and ultimately, a "best design" is always going to be a "best fit" of the benefits and costs against system requirements. Here are some things that I have to consider when I ask the same question:

CPU Time Cost

If CPU does not have a built-in floating-point co-processor (as is the case with many microcontrollers), then dealing with 'float', 'double', and 'long double' can be extremely costly. For example, with one 16-bit microcontroller I work with regularly, a multiplication using 'double' values costs 326 CPU clock cycles, and a division costs 1193 clock cycles. Very expensive!

Accuracy Trade-Off

At the equator, consider a 'float' (IEEE-754 32-bit floating-point value) holding a signed degree value. Assuming 7 "clean" significant decimal digits, a change of one least-significant decimal digit (e.g. from 179.9999 to 180.0000) represents a distance of about 11.12 meters. This may or may not meet hard system accuracy requirements. A 'double', with 15 "clean" significant decimal digits (thus a change from 179.999999999999 to 180.000000000000), represents about 0.00011 mm.

Input Accuracy Limitations

If you're dealing with input from a GPS, how many digits of real accuracy are you getting, and how many do you need to preserve?

Development Time Costs

An IEEE-754 64-bit double-precision value ('double') and 32-bit single-precision value ('float') are VERY convenient to deal with in the C language since math libraries for both come with virtually every C compiler, and are usually very reliable. If your CPU comes with a hardware floating-point processor, this is an easy choice.

RAM and Storage Costs

If you have to keep a large number of these values in RAM (or in storage, e.g. MySQL), available RAM (and storage space) might have an impact on the workability of the solution.

Available Data vs Required Data

One example I'm dealing with at this writing (the reason I came here to this question) is a u-blox M8 GPS, which is able to give me binary GPS information (saving the CPU overhead of translating ASCII NMEA sentences). In this binary format (called "UBX Protocol"), latitude and longitude are represented as signed 32-bit integers in units of 1e-7 degrees, which gives a resolution (at the equator) of about 1.11 cm. For example, -105.0269805 degrees longitude is represented as -1050269805 (using all 32 bits), and one LSb change represents about 1.11 cm change in latitude anywhere, and 1.11 cm in longitude at the equator (and less at higher latitudes, in proportion to the cosine of the latitude).

The application this GPS is in does navigation tasks, for which the already existing and well-tested code requires 'double' data types. Unfortunately, converting this integer to an IEEE-754 64-bit 'double' cannot be done just by moving the base-2 bits of the integer into the internal representation bits of the 'double', since the scaling to be performed is a base-10 shift of the decimal point. Were it a base-2 shift instead, the bits of the integer could be moved into the bit-fields of the 'double' with very little translation required. But alas, this is not the case with the signed integer I have. So it is going to cost me a multiplication on a CPU that doesn't have a hardware floating-point processor: 326 CPU clock cycles.

double   ldLatitude;
int32_t  li32LatFromGps;
ldLatitude = (double)li32LatFromGps * 0.0000001;

Note this multiplication was chosen over this:

ldLatitude = (double)li32LatFromGps / 10000000.0;

because 'double' multiplication is about 3.6X faster than 'double' division on the CPU that I'm dealing with. Such is life in the microcontroller world. :-)

What would have been BRILLIANT (and may be in the future if I can spare the time on weekends) is if the navigation tasks could be done directly with the 32-bit signed integer! Then no conversion would be needed.... But would it cost more to do the navigation tasks with such an integer? CPU costs, probably much more efficient. Development time costs? That's another question, especially with a well-tested system already in place, that uses IEEE-754 64-bit 'double' values! Plus there is already-existing software that provides map data (using 'double' degree values), which software would have to be converted to use the signed integer as well -- not an overnight task!

One VERY interesting option is to directly (without translation) represent intersections between approximations of "rectangles" (actually trapezoids, which become triangles at the poles) using the raw latitude/longitude integers. At the equator these rectangles would have dimensions of approximately 1.11 cm east-west by 1.11 cm north-south, whereas at a latitude of say London, England, the dimensions would be approximately 0.69 cm east-west by 1.11 cm north-south. That may or may not be easy to deal with, depending on what the application needs.

Anyway, I hope these thoughts and discussion help others who are looking at this topic for "the best design" for their system.

Kind regards, Vic


in other words a double.
Note: I have since implemented the navigation tasks to directly use the 32-bit signed integer instead of 'double's. And it's SCREAMINGLY fast!
aij

What encoding is "best" really depends on your goals/requirements.

If you are performing arithmetic, floating point latitude,longitude is often quite convenient. Other times cartesian coordinates (ie x,y,z) can be more convenient. For example, if you only cared about points on the surface of earth, you could use an n-vector.

As for longer term storage, IEEE floating point will waste bits for ranges you don't care about (for lat/lon) or for precision you may not care about in the case of cartesian coordinates (unless you want very good precision at the origin for whatever reason). You can of course map either type of coordinates to ints of your preferred size, such that the entire range of said ints covers the range you are interested in at the resolution you care about.

There are of course other things to think about than merely not wasting bits in the encoding. For example, Geohashes (https://en.wikipedia.org/wiki/Geohash) have the nice property that it is easy to find other geohashes in the same area. (Most will have the same prefix, and you can compute the prefix the others will have.) Unfortunately, they maintain the same precision in degrees longitude near the equator as near the poles. I'm currently using 64-bit geohashes for storage, which gives about 3 m resolution at the equator.
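
The prefix property comes from interleaving the bits of the two coordinates. The sketch below is an integer variant of that idea, not the base-32 text form of a geohash; the function names and the choice of 32 bits per coordinate are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Spread the 32 bits of x onto the even bit positions of a 64-bit word. */
static uint64_t spread_bits(uint32_t x)
{
    uint64_t v = x;
    v = (v | (v << 16)) & 0x0000FFFF0000FFFFULL;
    v = (v | (v << 8))  & 0x00FF00FF00FF00FFULL;
    v = (v | (v << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    v = (v | (v << 2))  & 0x3333333333333333ULL;
    v = (v | (v << 1))  & 0x5555555555555555ULL;
    return v;
}

/* Scale value from [lo, hi] onto the uint32_t range (truncating). */
static uint32_t quantize(double value, double lo, double hi)
{
    assert(value >= lo && value <= hi);
    double scaled = (value - lo) / (hi - lo) * 4294967296.0;
    return scaled >= 4294967295.0 ? UINT32_MAX : (uint32_t)scaled;
}

/* Interleave longitude and latitude bits, longitude first, so that a
   shared high-order prefix means two points fall in the same cell. */
static uint64_t geohash64(double lat, double lon)
{
    uint64_t lon_bits = spread_bits(quantize(lon, -180.0, 180.0));
    uint64_t lat_bits = spread_bits(quantize(lat, -90.0, 90.0));
    return (lon_bits << 1) | lat_bits;
}
```

Two nearby points usually agree in their top bits, so `geohash64(a) >> k == geohash64(b) >> k` is a cheap same-cell test. Points on opposite sides of a cell boundary can still differ early, which is why only most neighbours share a prefix.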

The Maidenhead Locator System has some similar characteristics, but seems more optimized for communicating locations between humans rather than storing on a computer. (Storing MLS strings would waste a lot of bits for some rather trivial error detection.)

The one system I found that does handle the poles differently is the Military Grid Reference System, although it too seems more human-communications oriented. (And it seems like a pain to convert from or to lat/lon.)

Depending on what you want exactly, you could use something similar to the Universal Polar Stereographic coordinate system near the poles along with something more computationally sane than UTM for the rest of the world, and use at most one bit to indicate which of the two systems you're using. I say at most one bit, because it's unlikely most of the points you care about would be near the poles. For example, you could use "half a bit" by saying 11 indicates use of the polar system, while 00, 01, and 10 indicate use of the other system, and are part of the representation.

Sorry this is a bit long, but I wanted to save what I had learned recently. Sadly I have not found any standard, sane, and efficient way to represent a point on earth with uniform precision.

Edit: I found another approach which looks a lot more like what you wanted, since it more directly takes advantage of the lower precision needed for longitude closer to the poles. It turns out there is a lot of research on storing normal vectors. Encoding Normal Vectors using Optimized Spherical Coordinates describes such a system for encoding normal vectors while maintaining a minimum level of accuracy, but it could just as well be used for geographical coordinates.


Christoph

Might the problems you mentioned with floating point values become an issue? If the answer is no, I'd suggest just using the radians value in double precision - you'll need it if you'll be doing trigonometric calculations anyway.

If there might be an issue with precision loss when using doubles, or you won't be doing trigonometry, I'd suggest your solution of mapping to an integer range. This will give you the best resolution, can easily be converted to whatever display format your locale uses and, after choosing an appropriate 0-meridian, can be converted to floating point values of high precision.

PS: I've always wondered why there seems to be no one who uses geocentric spherical coordinates. They should be reasonably close to the geographical coordinates, and won't require all this fancy math on spheroids to do computations. For fun, I wanted to convert Gauss-Krüger coordinates (which are in use by the German land registry office, the Katasteramt) to GPS coordinates. Let me tell you, that was ugly: one uses the Bessel ellipsoid, the other WGS84, and the Gauss-Krüger mapping itself is pretty crazy on its own...


@Greg: Gauss-Krüger coordinates are derived from cylinder projections but 'enhanced' so that you get meaningful results when using your ruler on a map (for certain values of 'meaningful' ;)). It all began when I thought: hey, this shouldn't be too complicated...
Greg Hewgill

0.3 inch resolution is getting down to the point where earthquakes over a few years make a difference. You may want to reconsider why you believe you need such fine resolution worldwide.

Some of the spreading centres in the Pacific Ocean change by as much as 15 cm/year.


0.3in is for uniform cases. With a 32-bit FP, who knows what you get, as lots (most?) of values are no longer valid.
John W. Phillips

A Java program for computing the max rounding error in meters from casting lat/long values into Float/Double:

import java.util.*;
import java.lang.*;
import com.javadocmd.simplelatlng.*;
import com.javadocmd.simplelatlng.util.*;

public class MaxError {
  public static void main(String[] args) {
    Float flng = 180f;
    Float flat = 0f;
    LatLng fpos = new LatLng(flat, flng);
    double flatprime = Float.intBitsToFloat(Float.floatToIntBits(flat) ^ 1);
    double flngprime = Float.intBitsToFloat(Float.floatToIntBits(flng) ^ 1);
    LatLng fposprime = new LatLng(flatprime, flngprime);

    double fdistanceM = LatLngTool.distance(fpos, fposprime, LengthUnit.METER);
    System.out.println("Float max error (meters): " + fdistanceM);

    Double dlng = 180d;
    Double dlat = 0d;
    LatLng dpos = new LatLng(dlat, dlng);
    double dlatprime = Double.longBitsToDouble(Double.doubleToLongBits(dlat) ^ 1);
    double dlngprime = Double.longBitsToDouble(Double.doubleToLongBits(dlng) ^ 1);
    LatLng dposprime = new LatLng(dlatprime, dlngprime);

    double ddistanceM = LatLngTool.distance(dpos, dposprime, LengthUnit.METER);
    System.out.println("Double max error (meters): " + ddistanceM);
  }
}

Output:

Float max error (meters): 1.7791213425235692
Double max error (meters): 0.11119508289500799
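
For comparison, the same last-bit flip can be sketched in C without any geodesy library. This version only converts the longitude step at 180 degrees into metres using an assumed 111,319.5 m per degree at the equator; it does not reproduce the distance calculation of the library used above, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define M_PER_DEG 111319.5   /* assumed metres per degree at the equator */

/* Flip the lowest mantissa bit of a float, as the Java code does. */
static float flip_last_bit_f(float x)
{
    uint32_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    bits ^= 1u;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Flip the lowest mantissa bit of a double. */
static double flip_last_bit_d(double x)
{
    uint64_t bits;
    assert(sizeof bits == sizeof x);
    memcpy(&bits, &x, sizeof bits);
    bits ^= 1u;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Size in metres of the smallest representable change at deg (float). */
static double float_step_m(float deg)
{
    double d = (double)flip_last_bit_f(deg) - (double)deg;
    return (d < 0.0 ? -d : d) * M_PER_DEG;
}

/* Size in metres of the smallest representable change at deg (double). */
static double double_step_m(double deg)
{
    double d = flip_last_bit_d(deg) - deg;
    return (d < 0.0 ? -d : d) * M_PER_DEG;
}
```

Under these assumptions `float_step_m(180.0f)` comes out at about 1.7 m, the same order as the float figure above.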

Alex Medveshchek

As @Roland Pihlakas already pointed out, it depends on the precision at which you are going to use your coordinates.

I'd just suggest an alternative point of view:

Earth's equatorial circumference (perimeter) is about 40,000 km;

This equals 40 million metres, or 4 billion centimetres;

A 32-bit variable holds 2^32, or ~4.3 billion, different values, which is a bit more than the number of centimetres in that circumference.

That means that if we choose 32-bit integer values for latitude and longitude, we can address a point on Earth with a precision of less than 1 centimetre.

With float values:

float32 contains 23 significant bits => ~4.7 metres precision

float64 contains 52 significant bits => < 1 mm precision


natevw

If by "storing" you mean "holding in memory", the real question is: what are you going to do with them?

I suspect that before these coordinates do anything interesting, they will have been funnelled as radians through the functions in math.h. Unless you plan on implementing quite a few transcendental functions that operate on Deg/Min/Secs packed into a bit field.

So why not keep things simple and just store them in IEEE-754 degrees or radians at the precision of your requirements?


For in memory, yah, not much will beat IEEE. On the other hand, if you are storing lots of points (say high resolution vector maps) on disk or shipping them across a wire...
Augustin

The following code losslessly packs WGS84 coordinates into a long (i.e. into 8 bytes):

using System;
using System.Collections.Generic;
using System.Text;

namespace Utils
{
    /// <summary>
    /// Lossless conversion of OSM coordinates to a simple long.
    /// </summary>
    unsafe class CoordinateStore
    {
        private readonly double _lat, _lon;
        private readonly long _encoded;

        public CoordinateStore(double lon,double lat)
        {
            // Ensure valid lat/lon
            if (lon < -180.0) lon = 180.0+(lon+180.0); else if (lon > 180.0) lon = -180.0 + (lon-180.0);
            if (lat < -90.0) lat = 90.0 + (lat + 90.0); else if (lat > 90.0) lat = -90.0 + (lat - 90.0);

            _lon = lon; _lat = lat;

            // Shift to 0..360 (lon) and 0..180 (lat)
            var dlon = (decimal)lon + 180m;
            var dlat = (decimal)lat + 90m;

            // Calculate grid
            var grid = (((int)dlat) * 360) + ((int)dlon);

            // Get local offset
            var ilon = (uint)((dlon - (int)(dlon))*10000000m);
            var ilat = (uint)((dlat - (int)(dlat))*10000000m);

            var encoded = new byte[8];
            fixed (byte* pEncoded = &encoded[0])
            {
                ((ushort*)pEncoded)[0] = (ushort) grid;
                ((ushort*)pEncoded)[1] = (ushort)(ilon&0xFFFF);
                ((ushort*)pEncoded)[2] = (ushort)(ilat&0xFFFF);
                pEncoded[6] = (byte)((ilon >> 16)&0xFF);
                pEncoded[7] = (byte)((ilat >> 16)&0xFF);

                _encoded = ((long*) pEncoded)[0];
            }
        }

        public CoordinateStore(long source)
        {
            // Extract grid and local offset
            int grid;
            decimal ilon, ilat;
            var encoded = new byte[8];
            fixed(byte *pEncoded = &encoded[0])
            {
                ((long*) pEncoded)[0] = source;
                grid = ((ushort*) pEncoded)[0];
                ilon = ((ushort*)pEncoded)[1] + (((uint)pEncoded[6]) << 16);
                ilat = ((ushort*)pEncoded)[2] + (((uint)pEncoded[7]) << 16);
            }

            // Recalculate the 0..360 (lon) and 0..180 (lat) coordinates
            var dlon = (uint)(grid % 360) + (ilon / 10000000m);
            var dlat = (uint)(grid / 360) + (ilat / 10000000m);

            // Returns to WGS84
            _lon = (double)(dlon - 180m);
            _lat = (double)(dlat - 90m);
        }

        public double Lon { get { return _lon; } }
        public double Lat { get { return _lat; } }
        public long   Encoded { get { return _encoded; } }


        public static long PackCoord(double lon,double lat)
        {
            return (new CoordinateStore(lon, lat)).Encoded;
        }
        public static KeyValuePair<double, double> UnPackCoord(long coord)
        {
            var tmp = new CoordinateStore(coord);
            return new KeyValuePair<double, double>(tmp.Lat,tmp.Lon);
        }
    }
}

Source: http://www.dupuis.me/node/35


You can't "losslessly" pack even the mantissas for the two doubles (104 bits) into 64 bits. Pigeonhole principle.
@BCS: But GPS is not taking advantage of the full possible range of the doubles.
You didn't say anything about GPS. It might be true that the above preserves more accuracy than a typical GPS provides, but that's not what you said. -- Also note that, uniformly distributed, 8 bytes has a maximum precision over the earth's surface of ~5mm, which is close to the ~10mm that is practically obtainable from high end GPS (and longer than what's obtainable via other tools). -- Also, I know GPS doesn't use the full range; that's why I only counted the mantissa bits.
AlejandroAlis

Best precision at smallest size is int32.

Storing a longitude with 7 decimal places (1.11 cm error) gives a number within +/-1,800,000,000, which fits in an int32. You only have to multiply the double by 10 million, like this:

int32_t lng = (int32_t)(double_lng * 10000000);

Explanation (wikipedia)

The equator is divided into 360 degrees of longitude, so each degree at the equator represents 111,319.5 m (111.32 km). As one moves away from the equator towards a pole, however, one degree of longitude is multiplied by the cosine of the latitude, decreasing the distance, approaching zero at the pole. The number of decimal places required for 1 cm precision at the equator is 7. If you need to store 180º with 7 decimal places in an integer, the result is 1,800,000,000, which is within the range of a 32-bit integer.

As you can see in Google Maps, when you click on any place Google gives you a number with 6 decimal places, which fits in a 32-bit integer.
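
A slightly more careful version of the conversion above rounds to the nearest step instead of truncating, and also shows the way back. This is a sketch with hypothetical function names, not code from the answer.

```c
#include <assert.h>
#include <stdint.h>

/* Encode decimal degrees as a signed count of 1e-7 degree steps,
   rounding to the nearest step rather than truncating toward zero. */
static int32_t degrees_to_e7(double deg)
{
    assert(deg >= -180.0 && deg <= 180.0);   /* keeps the result in int32_t */
    double scaled = deg * 1e7;
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* Decode back to decimal degrees for use with the math library. */
static double e7_to_degrees(int32_t e7)
{
    return (double)e7 / 1e7;
}
```

The rounding matters because a value such as 12.3456789 may be stored in a double as slightly less than its decimal form, and a plain cast would then drop the last digit by one.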

Comparison:

vs double: half the size

vs float: float doesn't have enough precision

vs the 24-bit suggestion: 24 bits is not directly addressable by any 32- or 64-bit processor; you have to fetch the three bytes and then convert to int32 or double before operating, which costs a lot of cycles and many lines of code


That doesn't address latitude. 32 bits uniformly distributed over the surface of the earth gives about 0.1 km^2 granularity; ~300m. Which is rather granular. Extending your encoding to do the same for latitude requires 64 bits which can resolve down to 27mm^2 which is about half what your encoding would give. An ideal encoding for ~1cm resolution should be possible with ~62 bits.
What makes 6-7 digits per dimension ideal? If you don't mind ~2m accuracy you can go with 4-5 (i.e. 24bits/dim) and save two bytes. Why isn't that ideal? I'm not saying your encoding won't work, I'm saying you have made (very nearly) no attempt to say why it should be used over alternatives.
FWIW: using bit fields in C/C++ allows compact storage with a clean syntax at any bit width, and for many modern CPUs the bit twiddling to extract unaligned values is only visible after pipelining if the data is already in L1 cache, and possibly not even then.
tegtmeye

After coming across this question after searching for an answer myself, here is another possible scheme based on some precedent.

The Network Working Group, RFC 3825 proposed a coordinate-based geographic location option for DHCP (ie the system that hands out IP addresses on a network). See https://tools.ietf.org/rfc/rfc3825.txt

In this scheme, latitude and longitude are encoded in degrees with fixed-point values where the first 9 bits are the signed degrees, 25 bits are fractional degrees, and 6 bits are used for the accuracy. The value of the accuracy bits indicates the number of the 25 fractional bits that are considered to be accurate (e.g. coordinates collected via a consumer GPS vs a high-precision surveyor's GPS). Using WGS84, the accuracy is 8 decimal digits which is good to about a millimeter regardless of where you are on the globe.
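
A minimal sketch of that style of encoding in C follows. It packs a 6-bit resolution count above a 34-bit two's-complement fixed-point value (9 integer bits, 25 fractional bits) in the low 40 bits of a word. The function names are hypothetical, and the sketch does not try to reproduce the exact on-the-wire layout of the DHCP option.

```c
#include <assert.h>
#include <stdint.h>

#define FRACTION_BITS 25
#define VALUE_BITS    34    /* 9 integer bits + 25 fractional bits */
#define VALUE_MASK    ((UINT64_C(1) << VALUE_BITS) - 1)

/* Pack degrees and a resolution count into 40 bits: resolution in
   bits 39..34, fixed-point value (two's complement) in bits 33..0. */
static uint64_t pack_coord(double deg, unsigned resolution)
{
    assert(deg >= -180.0 && deg <= 180.0 && resolution <= VALUE_BITS);
    double scaled = deg * (double)(1 << FRACTION_BITS);
    int64_t fixed = (int64_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
    return ((uint64_t)resolution << VALUE_BITS) | ((uint64_t)fixed & VALUE_MASK);
}

/* Recover the signed degree value, sign-extending from 34 bits. */
static double unpack_degrees(uint64_t packed)
{
    int64_t fixed = (int64_t)(packed & VALUE_MASK);
    if (fixed & (INT64_C(1) << (VALUE_BITS - 1)))
        fixed -= INT64_C(1) << VALUE_BITS;
    return (double)fixed / (double)(1 << FRACTION_BITS);
}

/* Recover how many of the value bits the producer considers accurate. */
static unsigned unpack_resolution(uint64_t packed)
{
    return (unsigned)(packed >> VALUE_BITS) & 0x3Fu;
}
```

Carrying the resolution alongside the value is what lets a consumer tell a consumer-GPS fix from a survey-grade one, which a bare float or double cannot express.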

As a couple of others have posted, floating point encoding really isn't good for this type of thing. Yes, it can represent a very large number of decimal places, but the accuracy is either ignored or has to be dealt with somewhere else. For example, printing a float or a double with full floating-point precision results in a number with decimal digits very, very unlikely to be remotely accurate. Likewise, simply outputting a float or a double with 8 or 10 decimal digits of precision may not be a true representation of the source values, based on how floating point numbers are computed (e.g. why 1.2-1.0 does not equal 0.2 using floating point arithmetic).

For a humorous example of why you should care about coordinate-system precision, see https://xkcd.com/2170/.

Granted, the 40-bit encoding used in RFC 3825 is hardly convenient in a 32 or 64-bit world but this style can be easily extended to a 64-bit number where 9 bits are used for the signed degree, 6 bits are used for the accuracy, leaving 49 bits for the decimal portion. This results in 15 decimal digits of precision which is more than basically anyone will ever need (see humorous example).


bluish

You can use decimal datatype:

CREATE TABLE IF NOT EXISTS `map` (
  `latitude` decimal(18,15) DEFAULT NULL,
  `longitude` decimal(18,15) DEFAULT NULL 
);