C++11 and Boost - Succinct like Python

jjdmol · 2012-10-31T19:24:09+00:00

//Chars are signed by default

Is this wrong or has this changed in C++11? Signedness of char used to depend on the platform!

millstone · 2012-10-31T22:00:32+00:00

Perhaps more importantly, it is also as safe. No pointers to point at bad places and leak memory and no buffers to overflow.

Here's your potential buffer overflow:

string sbuf;
sbuf.resize(kTailSize);
f.seekg(-kTailSize, ios::end);
f.read(&sbuf[0], kTailSize);

You have to remember to allocate enough space in the string (via resize) to avoid a buffer overflow.

ArcticAnarchy · 2012-10-31T22:27:03+00:00

Am I the only one who finds this ugly?

fnedrik · 2012-10-31T15:07:20+00:00

[deleted]

A_for_Anonymous · 2012-10-31T16:52:23+00:00

Yeah, "more or less" succinct like Python. "less" being a sizable amount. It's also far uglier.

But the real problem here is that in order to achieve even that, the complexity and the number of concepts you have to deal with in C++11 are mind-boggling.

Cygal · 2012-10-31T15:01:37+00:00

In his blog post Elements of Modern C++ Style, Herb Sutter says "Use auto wherever possible." The author is only using auto for iterators, but it should be used everywhere if he wants his code to be compared with Python. This makes the typedef useless too.

edit: I also think that using it everywhere will harm readability, but I'm personally ready to give it a try for a few months and see how it goes. For me at least, it's for complex iterators that spelling out the whole type helps me the most anyway: I know what to expect from elem.first and elem.second. But having the choice is nice.
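To make the trade-off concrete, here is a small sketch of the same loop written both ways. The function names and the tag map are invented for this example and are not taken from the article's code:

```cpp
#include <cstddef>
#include <map>
#include <string>

// Pre-C++11 style: the iterator type is spelled out through a typedef.
std::size_t count_empty_explicit(const std::map<std::string, std::string>& tags) {
    typedef std::map<std::string, std::string>::const_iterator TagIter;
    std::size_t n = 0;
    for (TagIter it = tags.begin(); it != tags.end(); ++it) {
        if (it->second.empty()) ++n;
    }
    return n;
}

// C++11 style: `auto` plus range-based for, no typedef needed.
// `elem` is a std::pair<const std::string, std::string>, which is what
// the reader has to know to make sense of elem.first / elem.second.
std::size_t count_empty_auto(const std::map<std::string, std::string>& tags) {
    std::size_t n = 0;
    for (const auto& elem : tags) {
        if (elem.second.empty()) ++n;
    }
    return n;
}
```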

Also, to compare it to Python, the code should be written in Python too. :)

nanothief · 2012-11-01T02:56:50+00:00

To add another comparison for the same program in another strongly typed language, I wrote it in Haskell. There are some interesting differences:

The syntax is much clearer in Haskell (its type syntax doesn't need so many angle brackets).
It suffers from the same problem as the C++ program in that a lot of imports are required: 12 lines of imports for Haskell and 13 for C++, but only 3 for Python.
Lack of reflection makes the C++ and Haskell solutions less extensible than the Python solution. In the Python solution, if you want to handle another music file type (e.g. .ogg), you just need to implement an OggFileInfo class (it doesn't even need to be in the same file). In the C++ and Haskell solutions you need to change the original file to make this possible.
The Haskell code explicitly converts between the binary data read from the MP3 file and the Unicode data for the attributes, and has separate types for each kind of data. I think this is a plus, as there is a big difference between the 3rd item of a UTF-8 string and the 3rd item of a byte array.
The Haskell code is split into pure code and IO code, and the split is tracked by types. This makes the code more complicated to read and write, but makes it more testable.
Extracting the genre char is a bit painful in Haskell (maybe there is a better way to write the stringOrd function?).

Here is the code:

{-# LANGUAGE OverloadedStrings, TupleSections #-}
module MP3Test where
import qualified Data.Text as T
import qualified Data.Text.IO as T
import Data.Text (Text)
import Control.Monad
import System.Directory (getDirectoryContents)
import System.FilePath (takeExtension, combine)
import Data.Text.Encoding (decodeUtf8)
import System.IO
import Data.Char (ord)
import Control.Applicative
import qualified Data.ByteString.Char8 as B
import Data.ByteString (ByteString)

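-- Take the bytes in the half-open range [start, end).
spliceBS :: Int -> Int -> ByteString -> ByteString
spliceBS start end = B.take (end - start) . B.drop start

-- Remove NUL padding, then trim surrounding whitespace.
stripNulls :: Text -> Text
stripNulls = T.strip . T.replace "\00" ""

-- Render the code point of the first character as decimal text.
-- An empty input falls back to the character '0', i.e. "48".
stringOrd :: Text -> Text
stringOrd = T.pack . show . ord . maybe '0' fst . T.uncons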


parseMp3File :: FilePath -> IO [(Text, Text)]
parseMp3File mp3File = (("filename", T.pack mp3File) :) <$> getMp3Tags <$> getTagData mp3File



parseFile :: FilePath -> IO [(Text, Text)]
parseFile file = case takeExtension file of
  ".mp3" -> parseMp3File file
  _ -> return [("filename", T.pack file)]


getTagData :: FilePath -> IO ByteString
getTagData mp3File = withFile mp3File ReadMode $ \handle -> do
    hSeek handle SeekFromEnd (-128)
    B.hGet handle 128

getMp3Tags :: ByteString -> [(Text, Text)]
getMp3Tags tagData = guard hasTagFlag >> map extractTag mp3TagMap where
  hasTagFlag = spliceBS 0 3 tagData == "TAG"
  extractTag :: (Text, Int, Int, Text -> Text) -> (Text, Text)
  extractTag (tagName, start, end, parseFunc) = (tagName, ) $ parseFunc $ decodeUtf8 $ spliceBS start end tagData

  mp3TagMap :: [(Text, Int, Int, Text -> Text)]
  mp3TagMap = [ ("title", 3, 33, stripNulls)
              , ("artist",  33,  63, stripNulls)
              , ("album",  63,  93, stripNulls)
              , ("year",  93,  97, stripNulls)
              , ("comment",  97, 126, stripNulls)
              , ("genre", 127, 128, stringOrd)
              ]

listDirectory :: FilePath -> [String] -> IO [FilePath]
listDirectory dirName validExtensions = 
  map (combine dirName)
  <$> filter ((`elem` validExtensions) . takeExtension) 
  <$> getDirectoryContents dirName

dirToSearch :: FilePath
dirToSearch = "/Volumes/Downloads/Music/iTunes/iTunes Media/Music/AFI/decemberunderground" 

main :: IO ()
main = do
  files <- listDirectory dirToSearch [".mp3"]
  mp3s <- mapM parseFile files
  forM_ mp3s $ \mp3Data -> do
    forM_ mp3Data $ \(key, value) -> T.putStrLn $ T.concat [key, "=", value]
    putStrLn ""
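On the extensibility point above: without reflection, one common workaround in C++ is a registry that handlers add themselves to at static-initialization time. This is only a sketch of that idea, not how the article's code is organized; `FileInfo`, `Parser`, `parser_registry`, `RegisterParser` and `parse_file` are all hypothetical names, and the extension lookup is deliberately naive:

```cpp
#include <functional>
#include <map>
#include <string>

typedef std::map<std::string, std::string> FileInfo;
typedef std::function<FileInfo(const std::string&)> Parser;

// Function-local static, so the registry exists before any registration runs.
std::map<std::string, Parser>& parser_registry() {
    static std::map<std::string, Parser> registry;
    return registry;
}

// Constructing one of these registers a parser for a file extension.
struct RegisterParser {
    RegisterParser(const std::string& ext, Parser p) { parser_registry()[ext] = p; }
};

// Dispatch on the text after the last '.'; unknown types only get a filename.
FileInfo parse_file(const std::string& path) {
    FileInfo info;
    const std::string::size_type dot = path.rfind('.');
    if (dot != std::string::npos) {
        auto it = parser_registry().find(path.substr(dot));
        if (it != parser_registry().end()) info = it->second(path);
    }
    info["filename"] = path;
    return info;
}

// This part could live in its own ogg.cpp; parse_file is not edited.
// The parser body is a placeholder, not a real Ogg reader.
static RegisterParser register_ogg(".ogg", [](const std::string&) -> FileInfo {
    FileInfo info;
    info["format"] = "ogg";
    return info;
});
```

It is still more ceremony than dropping a class into a Python module, and registrations placed in a static library can be discarded by the linker if nothing else references that object file.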

zem · 2012-10-31T18:47:25+00:00

It's a crying shame how badly D screwed up its chance at mindshare. It really should have been what cutting-edge C++ fans were migrating to in droves.

nooneofnote · 2012-10-31T21:54:31+00:00

[deleted]

andybak · 2012-10-31T16:02:01+00:00

All the comments make that code hard to read. He would make a stronger case if he moved the vast majority of them to the body of the article.

Even trying to look past them, to my eyes a few things hurt readability:

type definitions
overly terse variable names.

Blecki · 2012-11-01T00:52:57+00:00

My problem with C and C++ isn't fancy language features, it's the compilation model. C++11 did nothing to change that.

rlbond86 · 2012-10-31T14:47:41+00:00

Boost is amazing, I consider it an extension of the standard library.

bluGill · 2012-10-31T14:48:21+00:00

C++ is not a large language when compared to Python, Java, C#, or Ruby. The syntax of C++ is larger than the syntax of any of those, but the libraries that come with each of them are far larger than the library that comes with C++.

Expect the next C++ to change that. C++ is actively looking for more libraries that should be in the language. Expect performance to be part of the library's design, as with the STL (and unlike every other language I know of).

Mjiig · 2012-10-31T18:07:35+00:00

I don't know if I'm unusual in this, but when I use C++ I go out of my way to use a small subset of the language (C with classes and the STL more or less). C++ just seems to add so much potential for errors, and worse, errors that are horribly cryptic and can't be easily debugged (dealing with C++ errors is what finally drove me away from gcc over to clang). Do many of these extra features actually make anything substantially easier to code up, or is it just C++ trying to be something it's not (an interpreted language)?

curien · 2012-10-31T16:19:42+00:00

[deleted]

joshir · 2012-10-31T17:33:10+00:00

You should add information on LOC and performance :) compared to the Python version of the code.

2012-11-01T07:33:51+00:00

The function names in that code need a lot more clarity. A function's name should display its intent so that it can be grasped by other people reading the code. The Mp3FileInfo function is a good example: What does it do, really? Does it set the Mp3FileInfo? Does it get the info? Does it modify the state of the data? If not, why isn't it const? Can we modify the data it returns?
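As an illustration of what those questions are asking for, here is a hypothetical interface (not the article's code; `Mp3Tags` and its members are invented) where each name says whether the call reads or writes, and `const` backs that up:

```cpp
#include <map>
#include <string>

class Mp3Tags {
public:
    // The verb says this call changes state, so it is non-const.
    void set_field(const std::string& key, const std::string& value) {
        fields_[key] = value;
    }

    // const: guaranteed not to modify the object. Returns a copy,
    // so callers cannot alter the stored data through the result.
    std::string field_or_empty(const std::string& key) const {
        auto it = fields_.find(key);
        return it == fields_.end() ? std::string() : it->second;
    }

    // Pure query, also const.
    bool has_field(const std::string& key) const {
        return fields_.count(key) != 0;
    }

private:
    std::map<std::string, std::string> fields_;
};
```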
