Gapless Digital Audio Playback – One Solution

Mind That Gap

This has post has nothing whatsoever to do with developments in our T2A API, other than in that our developers like listening to digital music.

One long-recognised problem with the listening experience of an album which has been encoded as multiple mp3 files (or wma, aac etc) is that of unwanted gaps. If your preference is for a collection of separate songs, this does not affect you, but listeners of music collections where tracks segue into each other, would not want there to be any gap when listening to that album as a collection of digital audio files.

Why Gaps are Heard

The main reason is that when an mp3 or some other format lossless compression audio file is created, a short silence is created at the start and end of the track. Some audio formats include information to allow playing hardware to compensate for this, but mp3 does not.

Solving with Hardware

Some newer equipment is able to achieve gapless playback of multiple tracks, either by using  a crossfade, or by using information embedded in the audio file to allow it to compensate for any gaps at the start and end of each track, where the audio file format includes that information.

If your mp3 playing device leaves gaps in audio that should be gapless, read on.

Solving with Software – Create An Audio Book

Introduction

A cumbersome but otherwise completely successful method is now demonstrated.

By creating an audio book containing a single file with no gaps but with chapters to denote the positions of the former individual tracks, we will achieve gapless playback, provided that we have a player that supports the chosen format, and supports the selection of chapters for playback.

Audio Book Format

For this demonstration we created an .m4b file. This is actually idential to an .m4a, which is an mpeg-4 audio file using the AAC codec. The m4b extension was created so that Apple’s iTunes software and iPod players can recognize the file as an audio book rather than a normal audio track and thus allow “bookmarking” the file

An m4b is thus best suited, as one might expect, to Apple devices and Apple software on other devices, but other software support for m4b files with chapters does exist for other devices.

We looked briefly at other formats; .wma and .aac support chapters, as does .mp3 with a later id3v2 addition. Support for these formats is poor both in terms of encoding software and hardware compatibility.

Step by Step

We chose for this demonstration a well known album in which some tracks should have no gaps; The Dark Side of the Moon” by Pink Floyd opens with 3 gapless tracks.

Below is a screenshot of the audio book that we created, playing in Apple’sQuicktime, on a PC.

Note the chapters (which includes 2 extra ones to represent positions within original tracks).

Prepare The Full File

  1. Rip your copy of the CD to a lossless format, such as WAV. This should ensure that there are no unwanted gaps at the start or end of each track. The perfect reproduction of the CD will allow compression to the final format file; recompression of a compressed file should not be executed.
  2. Use a suitable editor to join the lossless tracks together.
  3. Play the joined tracks file, ensure there are no gaps.

You now have a single lossless file, which if it is a stereo 44k PCM audio file is about 650Mb for an hours worth of music.

Choose an Encoder

We used the multiformat video and audio encoder XMedia Recode is a free application. It supports the .m4a format with chapters. We created an m4a and then simply renamed the finished file to m4b.

One alternative means to add the chapters is mp4box which we have not tested.

Create the m4a

Load the complete file. Specify chapters by start and end time, and your chosen chapter name. You may wish, as we did with our Pink Floyd demonstration, insert some extra chapters to allow navigation to a point within a track.

Select a suitable quality for the m4a encoding; the default quality is 128Kkbps but we doubled the bandwidth to 256kbps for our m4a.

XMedia Recode will also allow you to specify title, year and other information about the audiobook.

Rename to m4b

As we have seen, m4a files are idential to m4b. When the m4a encoding is completed. rename the extension to .m4b.

The m4b file is now complete.

A short clip (27 seconds, approx 700Kb) from our m4b audiobook is available here to download. This comprises the 15 seconds before our extra chapter marking the “Breathe (reprise)” section of “Time”, and the first 12 seconds of this section.

Play with Quicktime

If you have Apple’s Quicktime application on your PC, play the m4b using QT; you should see and be able to select from the named chapters, and thus be able to select your track of choice.

Using in iTunes and with Apple Devices

Using iTunes in order to upload the m4b to an iPhone, iPod or iPad, you will note that the m4b does not appear in the “music” section – look in the “books” section. Drag it over to your connected device.

We tested our m4b on an iPhone 4S and an iPad. It works especially well on the former, allowing easy navigation to the chapters / tracks.

iTunes also facilitates the easy addition of album artwork to the m4b file.

Conclusion

This is an effective but quite cumbersome means to achieve gapless playback.

If you’re fed up with annoying gaps, have equipment that supports an audiobook format with chapters, and are dedicated enough, this approach may work for you.

Working with ISO-8859-1 and Unicode Character Sets

Introduction

This article gives a brief and not too technical explanation of character encoding, and of the titular character encoding methods. I also outline how to work with the methods, how to fix some common problems and how to choose which encoding system to use.

Why is Character Encoding Important?

If a web developer includes an image in some HTML markup, he/she does not have to specify in what fomat the media was saved – the browser rendering engine will interpret that using a signature in the media file; similarly a media player will interpret a video file to discover which format the file is in.

Unfortunately character strings have no signature that allows the processing engine to automatically determine the format of the character encoding, in situations where multiple formats may be encountered, such as a web page, or if .NET or Java process external text files. The developer needs to inform the relevant engine what the character encoding format is.

When Character Encoding Goes Bad

This is a common sight on web pages:-

The price is �100 or about �120

… or the same text showing a different error:-

The price is £100 or about €120

The correct text should displayed as:-

The price is £100 or about €120

See below for a detailed explaination of the problem and the solution.

What is Character Encoding?

Character encoding is the means by which the characters are stored in a sequence (or stream) of bytes.

One Byte Per Character

The simplest format is the use of a single byte for a character giving 255 possible characters, 0 is usually the terminating character..This is sufficient to display most characters in most western languages, or most characters in any given language.

Two Bytes Per Character

If you have ever programmed in Java or .NET, you will almost certainly have encountered 2 byte (or 16 bit) character encoding, since strings are handled internally in this format. This allows the representation of 65535 characters which may initially seem to be sufficient to represent every possible character in written worldwide culture, and it usually is, but not always.

Unicode

Unicode simplifies things by allowing any character to be displayed within a single and huge character encoding system, which includes thousands of characters, more than can be represented by a 16 bit character encoding.

It also provides a more space efficient format than the aforementioned 16 bit encoding scheme, the popular UTF-8, and you may also encounter  UTF-16 or even UTF-32.

For a more detailed explaination of Unicode see our earlier blog on the subject.

ISO-8859-1 Encoding

ISO-8859-1 is actually a subset of Unicode. It comprises the first 255 Unicode characters (see below for the full character set) and is also sometimes known as Latin-1 since it features most of the characters that are used by Western European languages.

(The developer should be aware that the first 127 characters are encoded identically in ISO-8859-1 and UTF-8, as a single byte).

Many web pages created by English and other Western European language speakers are still encoded in ISO-8859-1, since this is sufficient to represent any possible character that they wish to display.

ISO-8859-1 vs UTF-8

When faced with the choice of character encoding, the choice is between flexibility and storage space and simplicity.

If only ISO-8859-1 characters are to be used in a project (such as a website), then ISO-8859-1 does offer a slight benefit in terms of storage space, and therefore in the case of a web page, of download size.

Fans of the Swedish/Danish TV show The Bridge will be familiar with the events contained in this sample string:-

Saga Norén leaves Malmö and crosses the Øresund Bridge

The text above comprises 54 characters. All the characters are present within the ISO-8859-1 character set, and so the string can be stored as 54 bytes using a simple one character per byte encoding.

If however the string is stored in UTF-8, it requires 57 bytes. This is because the three non-English characters (which are outside of the lower 1-127 range) are stored in two bytes using UTF-8. There is thus a slight space advantage.

I would nevertheless choose UTF-8 to give flexibility to show any possible future characters. Unicode wins.

Web Page Character Encoding Errors Explained

Remember the incorrectly displayed web page text shown above?

Error 1 was:-

The price is �100 or about �120

Error 2 was:-

The price is £100 or about €120

What has gone wrong? Well, the first example shows what happens when text that has been encoded as ISO-8859-1 is displayed on a web page which has told the viewing web browser that the contents are encoded as UTF-8.

The characters £ and are outside of the lower range (1-127) and are therefore encoded differently in UTF-8 and ISO-8859-1.

The second example shows the opposite; text encoded as UTF-8 is displayed in a page which has informed the web browser that the contents are encoded in ISO-8859-1.

Put simply, the web page encoding information does not match the contents, and horrid errors are shown.

In order to display this correct text…

The price is £100 or about €120

.. the simple solution to both problems is to establish which encoding should be used, and then within the

<head>

…of an HTML 4 or earlier page, use

<meta http-equiv="content-type" content="text/html;
charset=utf-8" />

…to specify UTF-8 contents or

<meta http-equiv="content-type" content="text/html;
charset=iso-8859-1" />

…if the contents are ISO-8859-1.

For HTML 5 specifying the character set is simpler:-

<meta charset="utf-8" />

The above code fragments are suitable for flat HTML pages; PHP programmers would use

header("Content-Type: text/html;charset=utf-8");

and a JSP page would use

<%@ page contentType="text/html;charset=UTF-8" %>

…to show just a couple of common examples.

Working with Text Files

A simple text file, as we have seen, carries no header or signature to indicate in what encoding format the text was saved. The programmer should determine that encoding format carefully.

For example, to read an ISO-8859-1 text file containing our 54 character sentence above, in C#, you would:-

StreamReader tr = null;

try
{
   tr = new StreamReader("saga.txt",
                          Encoding.GetEncoding("iso-8859-1"));
   String testline = tr.ReadLine();
}
catch
{
}
finally
{
   tr.Close();

}

The above code will ensure that the non-English characters are read correctly into the .NET String class instance.

Reference: The ISO-8859-1 Character Set

These are the displayable characters in the ISO-8859-1 character set, with their Hexadecimal values. Characters 0x20 (space) to 0xff are shown.

Character Hex Character Hex Character Hex Character Hex
20 ! 21 22 # 23
$ 24 % 25 & 26 27
( 28 ) 29 * 2A + 2B
, 2C 2D . 2E / 2F
0 30 1 31 2 32 3 33
4 34 5 35 6 36 7 37
8 38 9 39 : 3A ; 3B
< 3C = 3D > 3E ? 3F
@ 40 A 41 B 42 C 43
D 44 E 45 F 46 G 47
H 48 I 49 J 4A K 4B
L 4C M 4D N 4E O 4F
P 50 Q 51 R 52 S 53
T 54 U 55 V 56 W 57
X 58 Y 59 Z 5A [ 5B
\ 5C ] 5D ^ 5E _ 5F
` 60 a 61 b 62 c 63
d 64 e 65 f 66 g 67
h 68 i 69 j 6A k 6B
l 6C m 6D n 6E o 6F
p 70 q 71 r 72 s 73
t 74 u 75 v 76 w 77
x 78 y 79 z 7A { 7B
| 7C } 7D ~ 7E  7F
80  81 82 ƒ 83
84 85 86 87
ˆ 88 89 Š 8A 8B
Œ 8C  8D Ž 8E  8F
 90 91 92 93
94 95 96 97
˜ 98 99 š 9A 9B
œ 9C  9D ž 9E Ÿ 9F
A0 ¡ A1 ¢ A2 £ A3
¤ A4 ¥ A5 ¦ A6 § A7
¨ A8 © A9 ª AA « AB
¬ AC ­ AD ® AE ¯ AF
° B0 ± B1 ² B2 ³ B3
´ B4 µ B5 B6 · B7
¸ B8 ¹ B9 º BA » BB
¼ BC ½ BD ¾ BE ¿ BF
À C0 Á C1 Â C2 Ã C3
Ä C4 Å C5 Æ C6 Ç C7
È C8 É C9 Ê CA Ë CB
Ì CC Í CD Î CE Ï CF
Ð D0 Ñ D1 Ò D2 Ó D3
Ô D4 Õ D5 Ö D6 × D7
Ø D8 Ù D9 Ú DA Û DB
Ü DC Ý DD Þ DE ß DF
à E0 á E1 â E2 ã E3
ä E4 å E5 æ E6 ç E7
è E8 é E9 ê EA ë EB
ì EC í ED î EE ï EF
ð F0 ñ F1 ò F2 ó F3
ô F4 õ F5 ö F6 ÷ F7
ø F8 ù F9 ú FA û FB
ü FC ý FD þ FE ÿ FF

Converting Fixed-Width Text Files To CSV in C++ (and for free)

C++ Logo

Large Padded Data

A recent data acquisition brought forth the requirement to process fixed-width text files that comprise the data. This would not have been much of a discussion point were it not for the fact that some of the files were huge – 60Gb in one case. Most of these large files comprise the space character, serving as padding for the fixed-width fields; this serves to illustrate how inefficient fixed-width text files are, but that is not the point we’re making here today.

Converting Using Existing Applications

We decided to convert each file into a CSV file, which can easily be read, edited and loaded into a database. There are applications that are able to convert massive text files to CSV, but during our brief trial of a few programs we found:-

  • One application was unreliable (crashed)
  • One application was buggy (unable to handle commas and/or quotes)
  • One application was expensive (several hundred dollars)

… and anyway, programming your own is much more fun. We created a C++ application in order to process these large files as quickly as possible.

Our C++ Application

We created a C++ command line application called texttocsv. We complied it as a Windows 32 executable, but the code is ANSI C++, used no Windows API code, uses no other libraries, and will compile for other operating systems.

texttocsv can read an 8 bit character fixed with text file (there is no support for Unicode) and quickly convert each row to CSV. It will enclose each field in quotes only where needed, and correctly escape quotes within fields.

A Short Example

Consider the following fixed-width text file (test.txt) with two fixed columns of 20 and 50 characters:-

Jupiter             A planet                 
Andromeda           A "nearby" galaxy
Sirius              A star
Eros                An asteroid, a rocky body
Titan               A moon

…we used texttocsv.exe to create the following CSV (test.csv):-

Jupiter,A planet
Andromeda,"A ""nearby"" galaxy"
Sirius,A star
Eros,"An asteroid, a rocky body"
Titan,A moon

Note that the “Andromeda” row has the quotes around “nearby” correctly escaped.

How To Use

The parameters for texttocsv are:-

  1. Destination file
  2. Source file
  3. List of column widths in the source file, separated by a comma
  4. Start column number, 0 based. This parameter is optional.

The example above was created with the following command:-

texttocsv.exe test.csv test.txt 20,50

Our C++ Source Code

Introduction

The application was created in ANSI C++. We followed the RAII programming technique, creating classes whose destructors release their resources, and which cannot be created using new.

Execute Sequence

The program is entered at main, and receives the parameters as entered by the user. The paramaters are read, and any missing paramters are reported to the user. If all mandatory parameters are present, an instance of CMyProcess (which is derived from CTextToCSV) is created on the stack. The member function Process is executed, and the returned error code is checked and reported.

CTextToCSV Class

The processing is done using an instance of a class derived from CTextToCSV. The derived class should include the method Progress which is invoked for each row processed; our application simply reports to console every 10000 rows, but if the class was used in a GUI, a progress bar could be displayed. The class must be created on the stack – new is not permitted.

CTextToCSV includes some error codes in an enum named ErrCode. The project code should normally interpret the errors and display to the user, as our simple application does.

CTextToCSV will close any open files in its destructor, and the instances of the other classes used (CCharBuffer and CWidthList) free their allocated buffers in their own destructors. Each class instance is created on the stack, guaranteeing that any resources are released.

The classes include buffer overwrite protection. Memory consumption is modest and is proportional to total width of the fixed text file, and memory allocation errors (if the system is short on memory) are handled, returning the ErrCode value OutOfMem.

All memory allocated is freed by the class destructors.

The Code

#include
#include
#include

//class to hold the array of column widths
class CWidthList
{
private:
   unsigned int* m_Column_Width;
   unsigned int m_Num_Cols;
   unsigned int m_Total_Width;

   //prevent new from being used - force
   //any instance to be on the stack
   void * operator new (size_t);
   void * operator new[] (size_t);

public:
   //return width of column number (zero based)
   //returns 0 if column number was invalid
   unsigned int GetWidth(unsigned int column_number)
   {
      if(column_number>=0 && column_numberm_Column_Width[column_number]);
      }

      return 0;
   }

   //return total width of columns
   unsigned int GetTotalWidth(void)
   {
      return(this->m_Total_Width);
   }

   //return the number of columns
   unsigned int GetNumCols(void)
   {
      return(this->m_Num_Cols);
   }

   //parse the widths string
   //returns false if out of memory
   bool ParseWidths(const char* p_widths)
   {
      unsigned int total = 0;

      unsigned int i=0;
      unsigned int len=strlen(p_widths);

      //count number of commas in the width string
      this->m_Num_Cols=1;
      while(im_Num_Cols++;
         }
         i++;
      }

      //create the buffer for the column widths

      try
      {
         this->m_Column_Width=new unsigned int[this->m_Num_Cols];
      }
      catch (std::bad_alloc)
      {
         //out of memory for the buffer
         return false;
      }

      //set all widths to 0
      i=0;
      while(i      {
         m_Column_Width[i]=0;
         i++;
      }

      char val[8];
      int valpos=0;
      int w;
      i=0;
      unsigned int cur_col=0;
      while(im_Column_Width[cur_col]=w;
            total+=w;
            cur_col++;
            valpos=0;
         }
         else if(c>='0' && cm_Column_Width[cur_col] = w;
      total += w;
      this->m_Total_Width=total;

      return true;
   }

};

//simple 8 bit character buffer class
class CCharBuffer
{
private:

   char* m_Data;
   unsigned int m_Max_Chars;      //maximum number of
                                  //chars allowed
   unsigned int m_Pos;            //current write pos

   //prevent new from being used -
   //force any instance to be on the stack
   void * operator new (size_t);
   void * operator new[] (size_t);

public:

   //constructor
   CCharBuffer()
   {
      try
      {
         //allocate num chars plus an extra
         this->m_Max_Chars = 32;
         this->m_Data=new char[this->m_Max_Chars+1];
      }
      catch (std::bad_alloc)
      {
         //out of memory when creating the buffer
         //so mark the buffer as not created
         this->m_Data=NULL;
         this->m_Max_Chars = 0;
      }

      this->m_Pos=0;
   }

   //destructor
   ~CCharBuffer()
   {
      //free the allocated buffer
      if(this->m_Data!=NULL)
      {
         delete this->m_Data;
         this->m_Data=NULL;
      }
   }

   //return false if failed to allocate a buffer
   bool CheckSpace(const unsigned int num_chars)
   {
      if(num_chars m_Max_Chars)
      {
         return true;
      }

      char* new_dest = NULL;

      try
      {
         //allocate num chars plus an extra
         new_dest = new char[num_chars+1];
      }
      catch (std::bad_alloc)
      {
         //out of memory when resizing destination buffer
         return false;
      }

      if(this->m_Pos>0 && this->m_Data!=NULL)
      {
         //copy the existing data into the new buffer
         memcpy(new_dest,this->m_Data,this->m_Pos);
      }

      if(this->m_Data!=NULL)
      {
         delete this->m_Data;      //delete the OLD buffer
                                   //(if it existed)
      }

      this->m_Data=new_dest;         //and use the new one
      this->m_Max_Chars=num_chars;

      return true;
   }

   //specify the current position
   void SetPos(const unsigned int val)
   {
      this->m_Pos=val;
      //ensure poos
      if(valm_Pos=0;
      }
      else if(val>=this->m_Max_Chars)
      {
         this->m_Pos=this->m_Max_Chars-1;
      }
   }

   const unsigned int GetPos(void)
   {
      return this->m_Pos;
   }

   //make space and add a character
   //returns false if failed to make space
   bool Add(const char c)
   {
      if(CheckSpace(this->m_Pos+1)==false)
      {
         return false;
      }

      this->m_Data[this->m_Pos]=c;
      this->m_Pos++;

      return true;
   }

   //make space and add a string
   bool Add(const char* src,const unsigned int num_chars)
   {
      if(CheckSpace(this->m_Pos+num_chars)==false)
      {
         return false;
      }

      memcpy(this->m_Data+this->m_Pos,src,num_chars);
      this->m_Pos+=num_chars;

      return true;
   }

   //read pointer to the buffer - only valid while the
   //instance is in scope
   char* Read(void)
   {
      return this->m_Data;
   }
};

//class to process a fixed width text file into a CSV
class CTextToCSV
{
public:

   //error codes
   enum ErrCode
   {
      None = 0,
      OutOfMem,
      FileNotFound,
      FileOpenForWriteFailed
   };

private:
//members
   FILE* m_Dest;            //destination (CSV) file
   FILE* m_Src;            //source text file

   CCharBuffer m_Src_Buffer;
   CCharBuffer m_Dest_Buffer;
   CWidthList m_Width;

   unsigned int m_Start_Col;   //start column (optional)

protected:
   //prevent new from being used - force any
   //instance to be on the stack
   void * operator new (size_t);
   void * operator new[] (size_t);

//private nethods
private:

   //read field into m_Dest_Buffer
   //returns OutOfMem if failed to resize dest buffer
   //
   ErrCode ReadField(const int curpos,const int width)
   {
   //first, scan src to get the trimmed extents
   //and to discover if comma is present

      int start=curpos;
      int end=curpos+width;

      //read
      const char* src_buf=this->m_Src_Buffer.Read();

      while(startstart)
      {
         if(src_buf[end]!=0x20)
         {
            //non space found
            break;
         }
         end--;
      }
      //start and end are inclusive

      bool enclose_in_commas = false;
      int i=start;
      while(im_Dest_Buffer.Add(
			 src_buf+start,bytes_to_copy)==false)
         {
            //insufficient space in the destination buffer
            return OutOfMem;
         }
      }
      else
      {
         //enclose in quotes and escape any double quote character

         //add opening quotes
         if(this->m_Dest_Buffer.Add('"')==false)
         {
            //insufficient space in the destination buffer
            return OutOfMem;
         }

         //copy all characters and escape any double quote
         while(startm_Dest_Buffer.Add('"');
               this->m_Dest_Buffer.Add('"');
            }
            else
            {
               //simply add the character
               if(this->m_Dest_Buffer.Add(src_buf[start])==false)
               {
                  //out of memory
                  return OutOfMem;
               }
            }
            start++;
         }

         //add closing quotes
         if(this->m_Dest_Buffer.Add('"')==false)
         {
            return OutOfMem;
         }
      }
      return None;
   }

   //process each row
   //returns ErrCode (normally None)
   ErrCode ProcessRow(void)
   {

      this->m_Dest_Buffer.SetPos(0);

      //if a CR is found, terminate the src buffer before it
      char* src_buf=this->m_Src_Buffer.Read();

      char* sp=strstr(src_buf,"\r");
      if(sp!=NULL)
      {
         sp[0]=0;
      }

      //pad the src buffer with spaces
      int len=strlen(src_buf);
      int pad_len=this->m_Width.GetTotalWidth()-len;
      if(pad_len>0)
      {
         //pad with spaces
         sp=src_buf + len;
         memset(sp,0x20,pad_len);
      }

      //read each field
      unsigned int x=0;
      int curpos=0;

      if(this->m_Start_Col>0 &&
      this->m_Start_Colm_Width.GetNumCols())
      {
         //specified start column is valid
         while(x < this->m_Start_Col)
         {
            curpos += this->m_Width.GetWidth(x);
            x++;
         }
      }

      while(x < this->m_Width.GetNumCols())
      {
         if(ReadField(curpos,
            this->m_Width.GetWidth(x))==OutOfMem)
         {
            //failed to read a field due to memory failure
            return OutOfMem;
         }

         x++;

         //add a comma UNLESS this is the last field
         if(xm_Width.GetNumCols())
         {
            if(this->m_Dest_Buffer.Add(',')==false)
            {
            //insufficient space in the destination buffer
               return OutOfMem;
            }
            //add the width of previous column
            curpos += this->m_Width.GetWidth(x-1);

         }

      }

      fwrite(this->m_Dest_Buffer.Read(),
             1,
             this->m_Dest_Buffer.GetPos(),
             this->m_Dest);

      fwrite("\r\n",1,2,this->m_Dest);

      return None;

   }

   //close files
   void Close(void)
   {
      //close src file
      if(this->m_Src!=NULL)
      {
         fclose(this->m_Src);
         this->m_Src=NULL;
      }

      //close dest file
      if(this->m_Dest!=NULL)
      {
         fclose(this->m_Dest);
         this->m_Dest=NULL;
      }

   }

protected:
   //process progress report as each row is read
   virtual void Progress(unsigned int row_num)
   {
   }

public:
   //constructor
   CTextToCSV()
   {
      this->m_Dest = NULL;
      this->m_Src = NULL;
   }

   //destructor
   ~CTextToCSV()
   {
      //close open files and free allocated buffers
      Close();
   }

   ErrCode Process(const char* p_dest_file,
                   const char* p_src_file,
                   const char* p_widths,
                   const char* p_start_col)
   {
      this->m_Start_Col=0;

      //parse the widths string
      if(this->m_Width.ParseWidths(p_widths)==false)
      {
         return OutOfMem;
      }

      //ensure the src buffer has
      //sufficient space to read total width
      if(this->m_Src_Buffer.CheckSpace(
		  this->m_Width.GetTotalWidth()*2)==false)
		  //ensure src buffer min size
      {
         return OutOfMem;
      }

      //open source file
      this->m_Src=fopen(p_src_file,"rb");
      if(this->m_Src==NULL)
      {
         //failed to open src file

         return FileNotFound;
      }

      //open destination file
      this->m_Dest=fopen(p_dest_file,"wb");
      if(this->m_Dest==NULL)
      {
         //failed to open dest file
         return FileOpenForWriteFailed;
      }

      //read start column number if set
      if(p_start_col!=NULL)
      {
         this->m_Start_Col=atoi(p_start_col);
      }

      unsigned int row=0;      //row counter
      while(1==1)
      {
         void* result=fgets(this->m_Src_Buffer.Read(),
                            this->m_Width.GetTotalWidth()*2,
                            this->m_Src);
         if(result==NULL)
         {
            break;
         }
         row++;

         ErrCode err = ProcessRow();
         if(err!=None)
         {
            return err;
         }

         Progress(row);

      }

      //and close files and buffers
      Close();

      return None;
   }

};

//class derived from CTextToCSV to allow
//bespoke progress handling
class MyProcess : public CTextToCSV
{
public:

protected:
   //process progress report as each row is read
   void Progress(unsigned int row_num)
   {
      if((row_num%10000)==0)
      {
         printf("Row %d\r\n",row_num);
      }
   }

};

int main(int argc, char* argv[])
{
   printf("TextToCSV Version 1.0.0.1 (c) 2014\r\n\r\n");

   //read params
   int num_param=argc;
   if(num_param   {
      printf("parameters:-\r\n\r\n");
      printf("dest filename (e.g. mydata.csv)\r\n");
      printf("source filename (e.g. mydata.txt\r\n");
      printf("column widths (e.g. 10,10,20,30,50\r\n");
      printf("start column position (optional, 0 based)\r\n");

      return(0);
   }

   const char* src=NULL;
   const char* dest=NULL;
   const char* widths=NULL;
   const char* start_col=NULL;

   int i=1;
   while(i   {
      const char* pr=(const char*)argv[i];
      //assign each paramater
      if(pr)
      {
         if(dest==NULL)
         {
            dest=pr;
         }
         else if(src==NULL)
         {
            src = pr;
         }
         else if(widths==NULL)
         {
            widths = pr;
         }
         else if(start_col==NULL)
         {
            start_col = pr;
         }
      }
      i++;
   }

   if(src == NULL)
   {
      printf("Missing source filename");
      return -1;
   }
   if(dest == NULL)
   {
      printf("Missing dest filename");
      return -1;
   }
   if(widths == NULL)
   {
      printf("Missing column widths");
      return -1;
   }

   printf("Processing file %s into file %s\r\n\r\n",src,dest);

   //an instance of our class, derived from CTextToCSV
   //note that this instance is created on the stack which is simpler and
   //safer than using new and delete.
   //
   MyProcess curpos;

   //and process the file
   MyProcess::ErrCode err = curpos.Process(dest,src,widths,start_col);

   //read error code if and print a report to console
   if(err!=MyProcess::None)
   {
      switch(err)
      {
         case MyProcess::OutOfMem:
            printf("Error: out of memory\r\n\r\n");
            break;
         case MyProcess::FileNotFound:
            printf("Error: source file not found\r\n\r\n");
            break;
         case MyProcess::FileOpenForWriteFailed:
            printf("Error: unable to open destination file\r\n\r\n");
            break;

         default:
            break;

      }
   }
   else
   {
      printf("process completed, no errors\r\n");

   }

   return 0;
}

Normalising Nationalities (via a good ISO-3166 Country List)

A recent development has seen the acquisition of some very large data sets which contain a “nationality” column. Unfortunately the contents of that column are inconsistent; sometimes the country name is used, sometimes the nationality (or demonym) is mis-spelled. We decided therefore to normalise the nationality columns, and to do so by converting them to the two character ISO-3166 country code list; common codes are GB, US, AU, CA, CN, ES etc.

Unfortunately many of the lists available are incomplete, out of date or have errors, We have therefore compiled a new ISO-3166 list which includes, where possible, up to three demonyms for each row. These include some common mis-spellings as well as the genuine alternatives. The data referred to above includes the unpleasant sounding nationality “Turk” as well as “Turkish”, so both denonyms are included in the list for the sake of completeness.

The latest changes to the world’s countries are represented here, including creation of South Sudan, and the separation of Serbia from Montenegro. Burma was renamed (to the Republic of the Union of Myanmar) by its government in 1989, but the old name is still commonly used by news agencies and other bodies, so our list gives the official name in brackets after the common name.

The table includes some rows which are not sovereign countries, such as Guam or Jersey. The table is sorted by ISO-3166 code.

We have sucessfully used this table to normalise the nationality data in the aforementioned large data sets, and the new column is a simple 2 latin character code, which represents a worthwhile space saving excercise in our large database, in addition to the new consistency of the data.

One problem with this approach is the occasional duplication of denonyms which would prevent the remapping of nationalities back from the ISO-3166 country code; look at The Virgin Islands (both countries) whose inhabitants are both described as “Virgin Islanders”. This description fails to clarify to which Virgin Islands the person belongs, so it is insufficient to accurately determine the ISO-3166 code anyway. We have special cased our normalisation in this instance.

The ISO-3166 CSV is available to download here, and the table is shown below:-

Code Name Demonym 1 Demonym 2 Demonym 3
AD Andorra Andorran
AE United Arab Emirates Emirian Emirati
AF Afghanistan Afghani Afghan
AG Antigua and Barbuda Antiguan
AI Anguilla Anguillan
AL Albania Albanian Alabanian
AM Armenia Armenian Hayastani
AO Angola Angolan
AQ Antarctica Antarctic
AR Argentina Argentine Argentinian Argentinean
AS American Samoa Samoan
AT Austria Austrian
AU Australia Australian
AW Aruba Arubian
AX Åland Islands Ålandic Ålandish
AZ Azerbaijan Azerbaijani
BA Bosnia and Herzegovina Bosnian Herzegovinian
BB Barbados Barbadian Barbadan Bajan
BD Bangladesh Bangladeshi
BE Belgium Belgian
BF Burkina Faso Burkinabe
BG Bulgaria Bulgarian
BH Bahrain Bahrainian
BI Burundi Burundian
BJ Benin Beninese
BL Saint Barthélemy Barthélemois
BM Bermuda Bermudan
BN Brunei Bruneian
BO Bolivia Bolivian
BQ Caribbean Netherlands
BR Brazil Brazilian
BS Bahamas Bahameese Bahamian
BT Bhutan Bhutanese
BV Bouvet Island
BW Botswana Motswana Batswana
BY Belarus Belarusian
BZ Belize Belizean
CA Canada Canadian
CC Cocos (Keeling) Islands Cocossian Cocos Islandia
CD Democratic Republic of the Congo Congolese
CF Central African Republic Central African
CG Congo (Republic of) Congolese
CH Switzerland Swiss
CI Côte d’Ivoire (Ivory Coast) Ivorian
CK Cook Islands Cook Islander
CL Chile Chilean
CM Cameroon Cameroonian
CN China Chinese
CO Colombia Colombian Columbian
CR Costa Rica Costa Rican
CU Cuba Cuban
CV Cape Verde Cape Verdean
CW Curaçao Curaçaoan
CX Christmas Island Christmas Islander
CY Cyprus Cypriot
CZ Czech Republic Czech
DE Germany German
DJ Djibouti Djiboutian Djibouti
DK Denmark Danish Dane
DM Dominica Dominican
DO Dominican Republic Dominican
DZ Algeria Algerian
EC Ecuador Ecuadorean Ecudorean
EE Estonia Estonian
EG Egypt Egyptian
EH Western Saharan Western Saharan Sahrawi
ER Eritrea Eritrean
ES Spain Spanish
ET Ethiopia Ethiopian
FI Finland Finnish
FJ Fiji Fijian
FK Falkland Islands Falkland Islander
FM Micronesia Micronesian
FO Faroe Islands Faroese
FR France French
GA Gabon Gabonese
GB United Kingdom British
GD Grenada Grenadian
GE Georgia Georgian
GF French Guiana French Guianese
GG Guernsey
GH Ghana Ghanaian Ghanian
GI Gibraltar Gibraltarian
GL Greenland Greenlander Greenlandic
GM Gambia Gambian
GN Guinea Guinean
GP Guadeloupe Guadeloupean
GQ Equatorial Guinea Equatorial Guinean Equatoguinean
GR Greece Greek
GS South Georgia and the South Sandwich Islands
GT Guatemala Guatemalan
GU Guam Guamanian
GW Guinea-Bissau Guinean
GY Guyana Guyanese
HK Hong Kong Hong Konger
HM Heard and McDonald Islands
HN Honduras Honduran
HR Croatia Croatian Croat
HT Haiti Haitian
HU Hungary Hungarian
ID Indonesia Indonesian
IE Ireland Irish
IL Israel Israeli
IM Isle of Man Manx
IN India Indian
IO British Indian Ocean Territory
IQ Iraq Iraqi
IR Iran Iranian
IS Iceland Icelander
IT Italy Italian
JE Jersey
JM Jamaica Jamaican
JO Jordan Jordanian
JP Japan Japanese
KE Kenya Kenyan
KG Kyrgyzstan Kyrgyzstani
KH Cambodia Cambodian
KI Kiribati I-Kiribati
KM Comoros Comoran
KN Saint Kitts and Nevis Kittian Nevisian
KP North Korea North Korean
KR South Korea South Korean
KW Kuwait Kuwaiti
KY Cayman Islands Caymanian
KZ Kazakhstan Kazakhstani Kazakh
LA Laos Laotian
LB Lebanon Lebanese
LC Saint Lucia Saint Lucian
LI Liechtenstein Liechtensteiner
LK Sri Lanka Sri Lankan
LR Liberia Liberian
LS Lesotho Mosotho Basotho
LT Lithuania Lithunian
LU Luxembourg Luxembourger
LV Latvia Latvian
LY Libya Libyan
MA Morocco Moroccan
MC Monaco Monacan
MD Moldova Moldovan
ME Montenegro Montenegrin
MF Saint Martin (France)
MG Madagascar Malagasy
MH Marshall Islands Marshallese
MK Macedonia Macedonian
ML Mali Malian
MM Burma (Republic of the Union of Myanmar) Myanmarese Burmese
MN Mongolia Mongolian
MO Macau Macanese
MP Northern Mariana Islands Northern Mariana Islander
MQ Martinique Martinican Martiniquaís
MR Mauritania Mauritanian
MS Montserrat Montserratian
MT Malta Maltese
MU Mauritius Mauritian
MV Maldives Maldivan
MW Malawi Malawian
MX Mexico Mexican
MY Malaysia Malaysian
MZ Mozambique Mozambican
NA Namibia Namibian
NC New Caledonia New Caledonian New Caledonians
NE Niger Nigerien
NF Norfolk Island Norfolk Islander
NG Nigeria Nigerian
NI Nicaragua Nicaraguan Nicoya
NL Netherlands Dutch
NO Norway Norwegian
NP Nepal Nepalese
NR Nauru Nauruan
NU Niue Niuean
NZ New Zealand New Zealander
OM Oman Omani
PA Panama Panamanian
PE Peru Peruvian
PF French Polynesia French Polynesian
PG Papua New Guinea Papua New Guinean
PH Philippines Filipino
PK Pakistan Pakistani
PL Poland Polish Pole
PM St. Pierre and Miquelon Saint-Pierrais Miquelonnais
PN Pitcairn Pitcairn Islander
PR Puerto Rico Puerto Rican
PS Palestine Palestinian
PT Portugal Portuguese Portugese
PW Palau Palauan
PY Paraguay Paraguayan
QA Qatar Qatari
RE Réunion
RO Romania Romanian
RS Serbia Serbian Serb
RU Russian Federation Russian
RW Rwanda Rwandan Rwandese
SA Saudi Arabia Saudi Arabian Saudi
SB Solomon Islands Solomon Islander
SC Seychelles Seychellois
SD Sudan Sudanese
SE Sweden Swedish Swede
SG Singapore Singaporean
SH Saint Helena Saint Helenian
SI Slovenia Slovenian Slovene
SJ Svalbard and Jan Mayen Islands
SK Slovakia Slovakian Slovak
SL Sierra Leone Sierra Leonean
SM San Marino Sanmarinese Sammarinese
SN Senegal Senegalese
SO Somalia Somali
SR Suriname Surinamer Surinamese
SS South Sudan Sudanese
ST São Tome and Príncipe São Tomean Sao Tomean
SV El Salvador Salvadorean Salvadoran
SX Saint Martin (Netherlands)
SY Syria Syrian
SZ Swaziland Swazi
TC Turks and Caicos Islands Turks and Caicos Islander
TD Chad Chadian
TF French Southern Territories
TG Togo Togolese
TH Thailand Thai
TJ Tajikistan Tajikistani
TK Tokelau Tokelauan
TL Timor-Leste Timorese
TM Turkmenistan Turkmen
TN Tunisia Tunisian
TO Tonga Tongan
TR Turkey Turkish Turk
TT Trinidad and Tobago Trinidadian Tobagonian
TV Tuvalu Tuvaluan
TW Taiwan Taiwanese
TZ Tanzania Tanzanian
UA Ukraine Ukrainian
UG Uganda Ugandan
UM United States Minor Outlying Islands
US United States of America American
UY Uruguay Uruguayan
UZ Uzbekistan Uzbekistani
VA Vatican
VC Saint Vincent and Grenadines Saint Vincentian Vincentian
VE Venezuela Venezuelan
VG British Virgin Islands Virgin Islander
VI United States Virgin Islands Virgin Islander
VN Vietnam Vietnamese
VU Vanuatu Ni-Vanuatu
WF Wallis and Futuna Islands Wallisian Futunan
WS Samoa Samoan
YE Yemen Yemeni Yemenese
YT Mayotte Mahoran
ZA South Africa South African
ZM Zambia Zambian
ZW Zimbabwe Zimbabwean

The United States in CSV Format

A recent development required an array of the U.S. States including the 2 character ANSI abbreviation, the English name and the Spanish translation. I was unable to find a CSV or similar resource which I could download, so here’s the one I made earlier.

The District of Columbia is not actually a U.S. State but is included in this list. A more complete data set would include the overseas U.S. territories.

This is a link to the CSV, which is just 1220 bytes in size.

https://t2a.co/blog/wp-content/uploads/2014/02/US_States.csv

The table is also shown below.

English Español ANSI Code
Alabama Alabama AL
Alaska Alaska AK
Arizona Arizona AZ
Arkansas Arkansas AR
California California CA
Colorado Colorado CO
Connecticut Connecticut CT
Delaware Delaware DE
District of Columbia Distrito de Columbia DC
Florida Florida FL
Georgia Georgia GA
Hawaii Hawai HI
Idaho Idaho ID
Illinois Illinois IL
Indiana Indiana IN
Iowa Iowa IA
Kansas Kansas KS
Kentucky Kentucky KY
Louisiana Luisiana LA
Maine Maine ME
Maryland Maryland MD
Massachusetts Massachusetts MA
Michigan Michigan MI
Minnesota Minesota MN
Mississippi Misisipi MS
Missouri Misuri MO
Montana Montana MT
Nebraska Nebraska NE
Nevada Nevada NV
New Hampshire Nuevo Hampshire NH
New Jersey Nueva Jersey NJ
New Mexico Nuevo México NM
New York Nueva York NY
North Carolina Carolina del Norte NC
North Dakota Dakota del Norte ND
Oklahoma Oklahoma OK
Oregon Oregón OR
Pennsylvania Pensilvania PA
Rhode Island Rhode Island RI
South Carolina Carolina del Sur SC
South Dakota Dakota del Sur SD
Tennessee Tennessee TN
Texas Texas TX
Utah Utah UT
Vermont Vermont VT
Virginia Virginia VA
Washington Washington WA
West Virginia Virginia Occidental WV
Wisconsin Wisconsin WI
Wyoming Wyoming WY

NEW SERVICE – Search for UK people using electoral roll and telephone directory data

We have just added a new service to T2A which allows you to search for UK people using electoral roll data and current UK telephone directory lists. Just enter a name or address, or partial name or address to find the name, address and telephone number(where available) of all the people that match your specified criteria. The Find UK people service costs 12 credits per search which works out as 18p per search on our biggest credit package.

Multiple value input field with jQuery

Are you a Facebook or Hotmail user? You’ve probably seen those input boxes that let you input multiple email addresses or names, adding them to a list as you go along and giving you the ability to remove them later by clicking their close button… If you’re not sure what I’m talking about, they look a bit like this:

Multiple value input field with jQuery

So, having had a fairly extensive search on google for an out-of-the-box solution, it turns out there aren’t really any so I set about writing my own, using Hotmail’s for inspiration. It didn’t take many lines of code to get the job done… It’s implemented as a jQuery plugin with a little bit of CSS, which you can edit to your liking. Feel free to improve the plugin too – I haven’t checked to see if it works when applied to multiple inputs on the same page, but it works very nicely with just the one.

Here’s the code for the jQuery plugin:

(function( $ ){

     $.fn.multipleInput = function() {

          return this.each(function() {

               // create html elements

               // list of email addresses as unordered list
               $list = $('<ul />');

               // input
               var $input = $('<input type="text" />').keyup(function(event) {

                    if(event.which == 32 || event.which == 188) {
                         // key press is space or comma
                         var val = $(this).val().slice(0, -1); // remove space/comma from value

                         // append to list of emails with remove button
                         $list.append($('<li class="multipleInput-email"><span>' + val + '</span></li>')
                              .append($('<a href="#" class="multipleInput-close" title="Remove" />')
                                   .click(function(e) {
                                        $(this).parent().remove();
                                        e.preventDefault();
                                   })
                              )
                         );
                         $(this).attr('placeholder', '');
                         // empty input
                         $(this).val('');
                    }

               });

               // container div
               var $container = $('<div class="multipleInput-container" />').click(function() {
                    $input.focus();
               });

               // insert elements into DOM
               $container.append($list).append($input).insertAfter($(this));

               // add onsubmit handler to parent form to copy emails into original input as csv before submitting
               var $orig = $(this);
               $(this).closest('form').submit(function(e) {

                    var emails = new Array();
                    $('.multipleInput-email span').each(function() {
                         emails.push($(this).html());
                    });
                    emails.push($input.val());

                    $orig.val(emails.join());

               });

               return $(this).hide();

          });

     };
})( jQuery );

Here’s the CSS:

.multipleInput-container {
     border:1px #ccc solid;
     padding:1px;
     padding-bottom:0;
     cursor:text;
     font-size:13px;
     width:100%;
}

.multipleInput-container input {
     font-size:13px;
     clear:both;
     width:250px;
     height:24px;
     border:0;
     margin-bottom:1px;
}

.multipleInput-container ul {
     list-style-type:none;
}

li.multipleInput-email {
     float:left;
     margin-right:2px;
     margin-bottom:1px;
     border:1px #BBD8FB solid;
     padding:2px;
     background:#F3F7FD;
}

.multipleInput-close {
     width:16px;
     height:16px;
     background:url(close.png);
     display:block;
     float:right;
     margin:0 3px;
}

Turn a regular input into a super, jQuery-powered multiple value input like so… The names/emails entered will be submitted as a comma separated list from the original form input!

$('#my_input').multipleInput();

Any questions, feel free to comment!

Using an XML SOAP Web Service from Visual Studio 2005, 2008 and 2010

Introduction

Here we show how to use external SOAP web services from within Microsoft Visual Studio versions 2005, 2008 and 2010.

The scope of this article extends only to XML SOAP web services which may have been created in Visual Studio (and are usually identified by the asmx extension at the end of the URL).

In order to demonstrate the use of external web services, we will use T2A’s SOAP service, which was created using Visual Studio 2005.

All code examples here are in C#.

Making Life Easy

If you’ve never made use of an external web service from Visual Studio, you’re about to discover just how easy it is. In comparison, parsing XML yourself can be quite time consuming, even using a powerful XML reader or parser. A SOAP web service allows Visual Studio to discover how its methods operate (their inputs and return formats) and using this information, allows your IDE to create code within your project, to allow your code to seamlessly use the external service, just as if it was a class in your own code.

Using a Web Service from Visual Studio 2005

In order to simply demonstrate the use of T2A’s SOAP service, we created a C# console application. The IDE created an empty project.

We then “told” our project about the web service. We right clicked on the project in the solution explorer, and clicked on Add Web Reference. This can be viewed below, for our project ws_test_2005:-

A new window opened, in which we specified the location of the web service, in this case, https://t2a.co/soap. We clicked the go button:-

The IDE was able to identify the web service. We named the web service t2a. We then clicked on add reference as seen below.

The IDE then created the necessary code to allow our simple project to use the external web service. You can see the information about the added web reference in the right hand pane below:-

We then created a simple console application which uses the T2A soap service, specifically the geo_code method, which as its name suggests, returns geographical co-ordinates for a given postcode, street or place.

The example code shown below uses T2A’s free test mode, during which it returns dummy data at no charge, for the benefit of developers in their initial integration stages.


using System;
using System.Collections.Generic;
using System.Text;

namespace ws_test_2005
{
    class Program
    {
        static void Main(string[] args)
        {
            string api_key = "test";

            t2a.T2A ws = new t2a.T2A();
            t2a.geo_code_res ws_res = ws.geo_code(api_key, null,"york");
            if (ws_res.status.CompareTo("ok") == 0)
            {
                 //print geo code data
                int i = 0;
                while (i < ws_res.geo_data_list.Length)
                {
                    Console.WriteLine(ws_res.geo_data_list[i].description +
                        " at (" +
                        ws_res.geo_data_list[i].latitude + "," +
                        ws_res.geo_data_list[i].longitude+")"
                        );

                    i++;
                }

            }

        }
    }
}

When the console application is executed in the debugger, we can see the class instance that the IDE has created; in this case an array of geo_data instances, this being a member of the result class geo_code_res. Note in the bottom pane the contents of the class instances, in this case, the dummy data returned by T2A in free test mode:-

Detailed Explanation of the Code Example

Normally you would use the API key associated with your T2A account; for this example we are using T2A’s free test mode; this is activated by an API key “test”:-

string api_key = "test";

Remember that we named the external web service t2a? The code created by the IDE is in a namespace t2a. The namespace includes the main class that contains the T2A methods; this is named T2A, or more correctly, t2a.T2A.

We now create an instance of that class.

t2a.T2A ws = new t2a.T2A();

The geo_code method returns an instance of the class t2a.geo_code_res. We now invoke the method; the third parameter is the place, street or postcode for which we want the geo co-ordinates.

t2a.geo_code_res ws_res = ws.geo_code(api_key, null,"york");

If the method has succeeded, the status is ok. If not, we normally would then read the error code.

We then display the contents of the t2a.geo_data instances held in the array t2a.geo_code_res.geo_data_list.

if (ws_res.status.CompareTo("ok") == 0)
{
   //print geo code data

   int i = 0;
   while (i < ws_res.geo_data_list.Length)
   {
      Console.WriteLine(ws_res.geo_data_list[i].description +
      " at (" +
      ws_res.geo_data_list[i].latitude + "," +
      ws_res.geo_data_list[i].longitude+")"
      );

Using a Web Service from Visual Studio 2008

We created a C# console application; it actually uses the same code as shown above. When adding the web reference, the first thing one notices is that whereas formerly beneath Add Reference there was an item Add Web Reference, this seems to be missing from Visual Studio 2008. In its place is Add Service Reference.

We clicked that instead (see below).

A new window opened, named Add Service Reference. In order to add a “legacy” web reference, we clicked the advanced button at the bottom.

Yet another new window opened, named Service Reference Settings. We hit the Add Web Reference button at the bottom.

From that point, adding the web reference is the same as above, for Visual Studio 2005. We created the reference by informing the dialog box that we wished to use https://t2a.co/soap as we had with Visual Studio 2005.

Using a Web Service from Visual Studio 2010

The procedure for adding the web reference is the virtually the same for the 2010 version as for 2008; the image below shows the default Solution Explorer pane for our console application.

Line Segment Intersection for Partial or Lapsed Mathematicians

Introduction

A common problem in 2D geometry is the determination of whether two line segments intersect, and following a positive answer, optionally deriving the intersection point. The most common solution is the use of cross product to derive the intersection point (if any), and whilst to a mathematician (or student thereof) this is a relatively simple solution, research for this article quickly revealed at least two published incorrect code implementations, flawed by their author’s failure to understand that maths.

The purpose of this article is therefore to present a less elegant but much simpler solution, easier to illustrate and comprehend.

The article is therefore written with the assumption of an average knowledge of school mathematics.

Lines and Line Segments

In common parlance a 2D line is the shortest distance between two points, that description actually applies to line segments. A line is of infinite length; a line segment is a portion of the line, between two given points. Unless the two lines are parallel they will by definition intersect somewhere, given their inifinte lengths; the question of intersection of line segments is therefore the concern of this article.

The image below shows two line segments AB and CD

A Simpler Solution

To solve the line segment intersection problem we will use the fact the the intersection calculation is very simple if one of the lines is vertical or horizontal.

We will translate and rotate the “world” so that this is the case; we imagine ourselves at point A, looking along the line towards B, and considering the “world” relative to our own position and orentation.

This solution, described below in stages, also allows for some trivial (i.e. fast) checks that can determine that the line segments do not cross, before the full calculations are executed.

Stage 1 – Initial Checks

Considering again two line segments AB and CD. If the lines are identical (i.e. A and C, and B and D are at the same position (or A and D and B and C) the line segments should be designated as crossing or not crossing, subject to the ultimate objective of the test. No further checks are needed.

If both line segments are actually points (i.e. A=B and C=D), execute a simple test to determine if the the points coincide. No further checks are needed.

Finally if one of the line segment ends is the same as one of the other ends (but only one, not both), the line segments would normally be designated as not crossing. This is shown below, where B and C are coincidental.

Stage 2 – Trivial Rejection

Consider the line segments below. We can clearly see that they do not cross.

If we check the bounding boxes of each line segment, we can quickly in this case deduce that the line segments cannot cross, and no further checks are needed. The follow pseudo-code is used:-

if (box1.xmax < box2.xmin OR box1.ymax < box2.ymin OR box1.xmin > box2.xmax OR box1.ymin > box2.ymax) then reject

In the case shown below, line1’s box.xmax is less than line2’s box xmin, so they can never cross.

Stage 3 – Translate and Rotate Relative to A

Considering again the line segments AB and CD. Clearly they cross;
we will now show how to determine whether they do, and optionally calculate the point at which they do.

In order to determine whether they cross, we will rotate and shift the “world” of AB and CD so that AB is vertical, with A being at the centre (0,0). We will view the “world” from the perspective of point A, looking at point B.

First, translate A to (0,0) and move B,C and D by the same amount:-
Subtract A’s X and Y coordinates from points B,C and D:-

Now we will rotate the “world” around 0,0 so that AB lies at X=0 (on the Y axis). We will rotate both AB and CD around (0,0) by the angle between AB and the Y axis (X=0); we will designate this angle a.

Here’s where the school maths comes in. The Pythagoras formula shows that for a right angled triangle, the square of the longest edge = the sum of the square of the other two edges. Considering the line segment AB shown below:-

If we complete the right angled triangle using point E which is (A.x,B.y), the angle a is that between AB and AE. Using Pythagoras, the length or magnitude of line segment AB is derived thus:-

sqrt( (B.x-A.x)^2 + (B.y-A.y>^2) )

In order to rotate points by a given angle, we need the sine and cosine of that angle. For angle a these are:-

sin a = length(EB) / length(AB) or sin = B.x / length(AB)
cos a = length(EA) / length(AB) or cos = B.y / length(AB)

Remember that we calculated length(AB) using Pythagoras, above.

To rotate points around (0,0) by angle a, use these formulae, which you may have encountered in school maths:-

x' = x cos a - y sin a
y' = y cos a + x sin a

When the above formula is applied to AB and CD we see the result below.

(A shrewd observer will note that A (0,0) does not need to be rotated, and the new position B’ of B is (0,length(AB)).)

Remember that this is the same “world”, merely viewed from the perspetive of A, looking towards B. We will calculate the intersection point (if any) relative to point A – the fact of intersection is not changed by the co-ordinates being relative to A.

Allowing Tolerance

Even when using high precision floating point code, slight errors must be allowed for. Furthermore, although lines and line segments are normally regarded as having zero with, your application may consider them to be “thick”. For these reasons we need to add a small tolerance to some checks applied after the rotation has been applied.

We will define a small value which will be used for the tolerance. For example:-

const double small_value = 0.00001;

Stage 4 – Further Trivial Checks

The new position of C’D’ relative to AB’ allows us to easily ascertain if C’D’ crosses A’B’. We can see from the above diagram that the following pseudo-code checks can be applied:-

if C'.y < 0 AND D'.y < 0 they do not cross
if C'.y > B'.y AND D'.y > B'.y they do not cross
if C'.x < 0 AND D'.x < 0 they do not cross
if C'.x > 0 AND D'.x > 0 they do not cross

Stage 5 – Check for Co-linear

If line segment C’D’ is now on the Y axis (abs(C’.x) < small_value) AND abs(D’.x) < small_value) then the line segments lie on the same infinite line. The only questions that now concerns is whether they share the same space on that line. The stage 4 trivial checks have determined that A’B’ and C’D’ are not above or below each other, we know that if C’D’ lies at or very close to X=0 it must cross A’B’.

Stage 6 – Check for Parallel

If line segment C’D’ is now vertical ((abs(C’.x – D’.x) < small_value) we know that they cannot cross, and we also know that it is not co-linear, having already performed the stage 5 check.

Stage 7 – Do They Cross?

Having completed the above trivial checks, if the question is still not settled, we must calculate where C’D’ crosses the Y axis (X=0).

Here is some more maths, but really simple, similar triangles. Look at C’D’ crossing the Y axis below at point F:-

The smaller shaded triangle is “similar” to the larger triangle formed by C’D’ and (D’.x,C’.y). To compute the Y position of point F:-

F.y = C'.y - (((D'.y - C'.y) * C'.x) / (D'.x - C'.x) )

Note that we already know that C’D’ is not vertical, so D’.x-C’x cannot be zero – there is no need to apply a division by zero check before executing the above calculation.

We now have the position of F. We simply need to check it against line A’B’.

Thus, if F.y < 0 or F.y> B’.y, the line segments do not cross.

Stage 8 – Calculate the Intersection Point (optional)

If we need the intersection position, we will need to reverse the rotation and translation of stage 3, upon point F.

First, rotate point F the other way, using the rotation through the angle -a, using again these formulae:-

x' = x cos a - y sin a
y' = y cos a + x sin a

…but since F.x=0, sin(-a) = -sin(a), and cos(-a) = cos(a), the operation becomes much simpler:-

F'.x = F.y * sin a
F'.y = F.y * cos a

Then, translate the rotated F’ by the position of A:-

F'.x = F'x + A.x
F'.y = F'y + A.y

… to derive the actual “world” position of the intersection point F, rather than relative to AB.

Disadvantages

Clearly this method is not as elegant as the cross product solution. The single square root required is relatively slow.

Advantages

This method much is easier to comprehend and illustrate. The trivial checks at different stages can save un-needed calculations.

Improvements

If a complex system, where many line segments must be checked against each other, a simple improvment to performance could be derived by caching the line segment lengths as they are calculated.

Adding Figures

  1. Using the line segments AB and CD in the worked example:-AB is (-5,-2) to (1,2)
    CD is (-8,-2) to (2,1)
  2. The boundary boxes cross, so we proceed to translate so that A is at (0,0).
  3. Subtract A from B,C,D i.e. subtract (-5,-2)
  4. AB is now (0,0) to (6,4)
    CD is now (-3,0) to (7,3)
  5. Calculate length(AB) using Pythagoras.
    Length(AB) = 7.2111025
  6. Calculate sin(a) = B.x / length(AB) = 0.83205029433784372
  7. Calculate cos(a) = B.y / length(AB) = 0.55470019622522915
  8. Rotate AB and CD around (0,0)
    A’B’ is (0,0) to (0,7.2111025)
    C’D’ is (-1.66410059,-2.49615) to (1.38675,7.4884)
  9. C’D’ is not co-linear or vertical, and crosses X=0, so calculate where it crosses.
  10. C’D’ crosses X=0 at point F.
  11. F.Y = 2.9499964
    (F.X = 0, for clarity)
  12. Check if F.Y is between 0 and B’.Y or 7.2111025. It is, so the line segments do cross.
  13. Rotating F the opposite way, and translating by -A, we calculate the intersection point(-2.5454545,-0.3636363740)

Code Example

This C# example uses two classes Point2D and LineSeg2D


public struct ConstantValue
{
    internal const double SmallValue = 0.00001;
}

public class Point2D
{
    private float X;
    private float Y;

    public Point2D(double src_x, double src_y)
    {
        this.X = (float)src_x;
        this.Y = (float)src_y;
    }

    public Point2D(float src_x, float src_y)
    {
        this.X = src_x;
        this.Y = src_y;
    }

    public float X
    {
        set
        {
            this.X = value;
        }
        get
        {
            return this.X;
        }
    }

    public float Y
    {
        set
        {
            this.Y = value;
        }
        get
        {
            return this.Y;
        }
    }
}

public class LineSeg2D
{
    private Point2D m_Start;
    private Point2D m_End;

    public LineSeg2D(Point2D start, Point2D end)
    {
        this.m_Start = startPoint;
        this.m_End = endPoint;
    }

    // returns true if intersection, and returns position
    // if a buffer is supplied (intersect_point)
    //
    public bool CheckIntersect(Point2D intersect_point, LineSeg2D other_line)
    {

        //check if points are same

        if (
            this.m_End.X == other_line.m_End.X &&
            this.m_End.Y == other_line.m_End.Y &&
            this.m_Start.X == other_line.m_Start.X &&
            this.m_Start.Y == other_line.m_Start.Y
            )
        {
            //same other_line - mark as crossing
            if (intersect_point != null)
            {
                intersect_point.X = this.m_End.X;
                intersect_point.Y = this.m_End.Y;
            }

            return (true);
        }

        if (
            this.m_End.X == other_line.m_Start.X &&
            this.m_End.Y == other_line.m_Start.Y &&
            this.m_Start.X == other_line.m_End.X &&
            this.m_Start.Y == other_line.m_End.Y
            )
        {
            //same other_line - mark as crossing
            if (intersect_point != null)
            {
                intersect_point.X = this.m_End.X;
                intersect_point.Y = this.m_End.Y;
            }
            return true;
        }

        //allow end points to coincide
        if (
            this.m_End.X == other_line.m_Start.X &&
            this.m_End.Y == other_line.m_Start.Y
            )
        {
            return false;
        }

        if (
            other_line.m_End.X == this.m_Start.X &&
            other_line.m_End.Y == this.m_Start.Y
            )
        {
            return false;
        }

        //check line1 is not a point

        if (
            this.m_End.X == this.m_Start.X &&
            this.m_End.Y == this.m_Start.Y
            )
        {
            //line1 is a point
            return false;
        }

        //check line2 is not a point
        if (
            this.m_End.X == this.m_Start.X &&
            this.m_End.Y == this.m_Start.Y
            )
        {
            //line1 is a point
            return false;
        }

        //get the magnitude of line1
        double line1_dx = this.m_Start.X - this.m_End.X;
        double line1_dy = this.m_Start.Y - this.m_End.Y;
        double line1_mag = Math.Sqrt((line1_dx * line1_dx)
        + (line1_dy * line1_dy));

        //shift line2 relative to line1 START POINT

        double line2_sx = other_line.m_End.X - this.m_End.X;
        double line2_sy = other_line.m_End.Y - this.m_End.Y;

        double line2_fx = other_line.m_Start.X - this.m_End.X;
        double line2_fy = other_line.m_Start.Y - this.m_End.Y;

        double sin = line1_dx / line1_mag;
        double cos = line1_dy / line1_mag;

        //rotate

        //x' = x cos f - y sin f
        //y' = y cos f + x sin f

        double line2a_sx = (line2_sx * cos) - (line2_sy * sin);
        double line2a_sy = (line2_sy * cos) + (line2_sx * sin);

        double line2a_fx = (line2_fx * cos) - (line2_fy * sin);
        double line2a_fy = (line2_fy * cos) + (line2_fx * sin);

        //check if co-linear
        if (Math.Abs(line2a_sx) < ConstantValue.SmallValue &&
           Math.Abs(line2a_fx) < ConstantValue.SmallValue         )         {                  //line2 lies on the Y axis                if (line2a_sy > line2a_fy)
            {
                double t = line2a_sy;
                line2a_sy = line2a_fy;
                line2a_fy = t;
            }

            if (line2a_sy > line1_mag)
            {
                //line2 is ABOVE line1
                return false;
            }
            if (line2a_fy < 0)
            {
                //line2 is BELOW line1
                return false;
            }

            //lines are co-linear and DO interesect

            if (intersect_point != null)
            {
                intersect_point.X = this.m_End.X;
                intersect_point.Y = this.m_End.Y;
            }
            return true;

        }

        //ensure line2 crosses Y axis
        if (line2a_sx < 0 && line2a_fx < 0)         {                 //both ends are BELOW x=0                 return false;         }                  if (line2a_fx > 0 && line2a_sx > 0)
        {
            //both ends ABOVE y=0
            return false;
        }

        //check if line2a is all
        if (line2a_sy < 0 && line2a_fy < 0)                      {                              return false;                     }                              //check if line2a is all >mag1
        if (line2a_sy >line1_mag && line2a_fy > line1_mag)
        {
            return false;
        }

        //check if line2a is parallel
        if (Math.Abs(line2a_sx - line2a_fx) < ConstantValue.SmallValue)
        {
            return false;
        }

        //now calculate where line2a crosses Y axis

        //using "similar trianges"
        double ry = ((line2a_fy - line2a_sy) * line2a_sx) /
             (line2a_fx - line2a_sx);

        ry = line2a_sy-ry;        //ry is where it crosses x=0;

        if (ry < 0 || ry > line1_mag)
        {
            //crosses Y axis OUTSIDE of line1
            return false;
        }

        if (intersect_point != null)
        {
            // calculate intersection point by roating (0,ry)
            // by the reverse of the original
            // rotation, and adding to line1 start point
            //
            intersect_point.X = (float)((ry * sin) + this.m_End.X);
            intersect_point.Y = (float)((ry * cos) + this.m_End.Y);
        }
        return true;
    }

}